Hi,
I implemented something. If you haven't seen it yet, take a look at the
new HtmlConverterXmlTest for examples.
Best,
Peter
On 14.03.2015 at 14:49, Mario Gazzo wrote:
No problem. You can contact me anytime in case you have additional questions.
On 14 Mar 2015, at 14:34 , Peter Klügl pklu...@uni-wuerzburg.de wrote:
Hi,
thanks for the issue and sorry for the delayed response. I have not yet found
the time to look into it, but I will in the next few days.
The issue has now been created:
https://issues.apache.org/jira/browse/UIMA-4286
On 11 Mar 2015, at 14:47 , Mario Gazzo mario.ga...@gmail.com wrote:
Thanks, I understand the choices now. I would also probably prefer to use the
document
Hi Peter, while I don't think I will be using the HtmlConverter right away,
I would vote for using the length of the document annotation for
annotations that relate to the whole document (such as metadata). That
makes them show up nicely in the CasEditor/Viewer and you could maintain it
in all
Thanks, I can of course open an issue for this.
I have been playing with a modified version of the HTMLConverter, which is why
my reply is delayed. I disabled the ‘inBody’-flag inside the
HTMLConverterVisitor to get an idea of what the effects might be. It pretty
much did what I thought I
Hi,
the HtmlConverter was built to create an annotated document containing
the plain text of the html or xml source. It is intended to remove all
elements that would not be visible to someone looking at the
rendered html, e.g., in an html browser. Thus, it removes a lot of
text of the
We conducted some experiments with both the HtmlAnnotator and the HtmlConverter
but we ran into an issue with the converter. It appears to only convert tag
annotations that surround or are inside the body tag. Metadata elements like
citations are ignored. The only way to get around this seems
Thanks. Looks interesting, seems that it could fit our use case. We will have a
closer look at it.
On 18 Feb 2015, at 21:58 , Peter Klügl pklu...@uni-wuerzburg.de wrote:
Hi,
you might want to take a look at two analysis engines of UIMA Ruta:
HtmlAnnotator and HtmlConverter [1]
The