Re: Approach for keeping track of formatting associated with text views

2015-03-24 Thread Peter Klügl
Hi, I implemented something. If you haven't seen it yet, take a look at the new HtmlConverterXmlTest for examples. Best, Peter On 14.03.2015 at 14:49, Mario Gazzo wrote: No problem. You can contact me anytime in case you have additional questions. On 14 Mar 2015, at 14:34, Peter Klügl

2015-03-14 Thread Mario Gazzo
No problem. You can contact me anytime in case you have additional questions. On 14 Mar 2015, at 14:34, Peter Klügl pklu...@uni-wuerzburg.de wrote: Hi, thanks for the issue and sorry for the delayed response. I have not yet found the time to look into it, but I will in the next few days.

2015-03-13 Thread Mario Gazzo
The issue has now been created: https://issues.apache.org/jira/browse/UIMA-4286 On 11 Mar 2015, at 14:47, Mario Gazzo mario.ga...@gmail.com wrote: Thanks, I understand the choices now. I would also probably prefer to use the document

2015-03-12 Thread Jens Grivolla
Hi Peter, while I don't think I will be using the HtmlConverter right away, I would vote for using the length of the document annotation for annotations that relate to the whole document (such as metadata). That makes them show up nicely in the CasEditor/Viewer and you could maintain it in all
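A minimal sketch of Jens's suggestion, assuming the document-level metadata arrives as HTML `<meta name="..." content="...">` elements in the `<head>` (for example citation metadata). It uses Python's standard `html.parser` rather than the UIMA or Ruta APIs; `Annotation`, `MetaCollector`, `metadata_annotations`, and the `"Meta"` type name are hypothetical, and the plain text is assumed to come from a separate conversion step. Each metadata annotation gets the span `[0, len(plain_text))`, the same range a document annotation covers.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Annotation:
    """Hypothetical standoff annotation: a type name, character span, and features."""
    type_name: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements, wherever they appear."""

    def __init__(self):
        super().__init__()
        self.meta = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            values = dict(attrs)
            name, content = values.get("name"), values.get("content")
            # Skip <meta charset=...> and other elements without name/content.
            if name is not None and content is not None:
                self.meta.append((name, content))


def metadata_annotations(html_source, plain_text):
    """Turns each <meta> element into an annotation spanning the whole plain text."""
    collector = MetaCollector()
    collector.feed(html_source)
    collector.close()
    # [0, len(plain_text)) is the same span a document annotation would cover.
    return [
        Annotation("Meta", 0, len(plain_text), {"name": name, "content": content})
        for name, content in collector.meta
    ]
```

Because each metadata annotation covers the full text instead of a zero-length or out-of-range position, it stays attached to the plain-text document, which is what makes such annotations show up nicely in a viewer.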

2015-03-10 Thread Mario Gazzo
Thanks, I can of course open an issue for this. I have been playing with a modified version of the HTMLConverter, which is why my reply is delayed. I disabled the ‘inBody’ flag inside the HTMLConverterVisitor to get an idea of what the effects might be. It pretty much did what I thought I

2015-03-10 Thread Peter Klügl
Hi, the HtmlConverter was built to create an annotated document containing the plain text of the HTML or XML source. It is intended to remove all elements that would not be visible to someone looking at the rendered HTML, e.g., in a web browser. Thus, it removes a lot of text of the
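To illustrate the general idea of turning markup into plain text plus standoff tag annotations while dropping content a browser would not display, here is a minimal sketch using Python's standard `html.parser`. It is not the Ruta HtmlConverter: `PlainTextConverter` and the `INVISIBLE` tag set are hypothetical, and a real converter would also normalize whitespace, insert line breaks for block elements, and cope with more kinds of malformed markup.

```python
from html.parser import HTMLParser

# Hypothetical set of elements whose content is not rendered as visible text.
INVISIBLE = {"head", "title", "script", "style"}


class PlainTextConverter(HTMLParser):
    """Collects visible text and (tag, begin, end) annotations in plain-text offsets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # visible text fragments
        self.length = 0        # current length of the plain text
        self.stack = []        # open elements as (tag, begin offset)
        self.annotations = []  # finished elements as (tag, begin, end)

    def _hidden(self, elements):
        # Content is hidden if any enclosing element is invisible.
        return any(tag in INVISIBLE for tag, _ in elements)

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, self.length))

    def handle_endtag(self, tag):
        # Find the innermost open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                break
        else:
            return
        _, begin = self.stack[i]
        hidden = self._hidden(self.stack[: i + 1])
        # Closing an element also implicitly closes anything left open inside it.
        del self.stack[i:]
        if not hidden:
            self.annotations.append((tag, begin, self.length))

    def handle_data(self, data):
        if self._hidden(self.stack):
            return
        self.parts.append(data)
        self.length += len(data)

    def text(self):
        return "".join(self.parts)


converter = PlainTextConverter()
converter.feed("<html><head><title>T</title></head><body><p>Hello <b>world</b></p></body></html>")
converter.close()
print(converter.text())  # Hello world
```

Since everything under `head` is skipped, a converter built this way also drops head-level metadata such as citation `<meta>` elements, which is the limitation raised in this thread.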

2015-03-06 Thread Mario Gazzo
We conducted some experiments with both the HtmlAnnotator and the HtmlConverter, but we ran into an issue with the converter. It appears to convert only tag annotations that surround or are inside the body tag. Metadata elements like citations are ignored. The only way to get around this seems

2015-02-18 Thread Mario Gazzo
Thanks. Looks interesting; it seems that it could fit our use case. We will have a closer look at it. On 18 Feb 2015, at 21:58, Peter Klügl pklu...@uni-wuerzburg.de wrote: Hi, you might want to take a look at two analysis engines of UIMA Ruta: HtmlAnnotator and HtmlConverter [1] The