[ 
https://issues.apache.org/jira/browse/UIMA-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Klügl reassigned UIMA-5147:
---------------------------------

    Assignee: Peter Klügl

> RUTA leaves the contents of STYLE tags in plaintext
> ---------------------------------------------------
>
>                 Key: UIMA-5147
>                 URL: https://issues.apache.org/jira/browse/UIMA-5147
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.3.0ruta
>            Reporter: Dale Lane
>            Assignee: Peter Klügl
>            Priority: Minor
>             Fix For: 2.5.1ruta
>
>
> I'm using RUTA HtmlAnnotator and HtmlConverter to turn an HTML document into 
> the plain text extracted from it, with annotations to represent the markup 
> that was in the original HTML. 
> The contents of <STYLE> tags are showing up in the plaintext view, which 
> isn't helpful. As STYLE isn't part of the document contents, I think it'd be 
> better for this not to be added to plaintext, or at least for there to be an 
> option to allow this to be excluded. 
> (Apologies if I've missed a way to do this using the existing options.)
> As a simple way to reproduce this, a document like the following can be used:
> {code:xml}
> <html><head>
>     <style>
>         /*  */
>         .test {
>             text-align: left;
>         }
>     </style>
> </head><body>Hello world</body></html>
> {code}
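
Until the converter excludes STYLE contents itself, one possible stopgap is to remove STYLE elements from the HTML before it reaches the annotator. The following is only a minimal sketch under that assumption: the class and method names (StyleStripper, stripStyle) are hypothetical, it uses plain java.util.regex rather than any Ruta API, and a regular expression is not a full HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA Ruta: strips STYLE elements
// from an HTML string before it is handed to any HTML processing.
public class StyleStripper {

    // Matches a whole <style ...>...</style> element, including its contents.
    // CASE_INSENSITIVE covers <STYLE>; DOTALL lets '.' span line breaks.
    private static final Pattern STYLE_ELEMENT = Pattern.compile(
            "<style\\b[^>]*>.*?</style\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the HTML with every STYLE element (tags and contents) removed. */
    public static String stripStyle(String html) {
        return STYLE_ELEMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The sample document from the report above.
        String html = "<html><head>\n"
                + "    <style>\n"
                + "        /*  */\n"
                + "        .test {\n"
                + "            text-align: left;\n"
                + "        }\n"
                + "    </style>\n"
                + "</head><body>Hello world</body></html>";
        System.out.println(stripStyle(html));
    }
}
```

Note that removing text up front shifts character offsets, so annotations created afterwards would no longer line up with the original HTML source. That is acceptable only when the original offsets are not needed.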



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
