[ https://issues.apache.org/jira/browse/UIMA-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Klügl reassigned UIMA-5147:
---------------------------------
Assignee: Peter Klügl
> RUTA leaves the contents of STYLE tags in plaintext
> ---------------------------------------------------
>
> Key: UIMA-5147
> URL: https://issues.apache.org/jira/browse/UIMA-5147
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Affects Versions: 2.3.0ruta
> Reporter: Dale Lane
> Assignee: Peter Klügl
> Priority: Minor
> Fix For: 2.5.1ruta
>
>
> I'm using the RUTA HtmlAnnotator and HtmlConverter to turn an HTML document
> into the plain text extracted from it, with annotations to represent the
> markup that was in the original HTML.
> The contents of <STYLE> tags are showing up in the plaintext view, which
> isn't helpful. As STYLE isn't part of the document contents, I think it'd be
> better for this not to be added to plaintext, or at least for there to be an
> option to allow this to be excluded.
> (Apologies if I've missed a way to do this using the existing options)
> A simple document like this can be used to reproduce the problem:
> {code:xml}
> <html><head>
> <style>
> /* */
> .test {
> text-align: left;
> }
> </style>
> </head><body>Hello world</body></html>
> {code}
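
For illustration only, the behaviour requested above can be sketched outside of RUTA. The following is a minimal Python sketch using the standard library's html.parser; it is not the HtmlConverter implementation and says nothing about how the fix was made. The names PlainTextExtractor, html_to_plaintext, SKIPPED and SAMPLE are hypothetical, and the sketch does not keep the offset mapping between HTML and plain text that annotations would need.

```python
from html.parser import HTMLParser

# The sample document from the report above.
SAMPLE = """<html><head>
<style>
/* */
.test {
text-align: left;
}
</style>
</head><body>Hello world</body></html>"""


class PlainTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside skipped elements."""

    # Hypothetical setting: elements whose contents are not document text.
    SKIPPED = {"style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text pieces kept for the plain-text view

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <STYLE> matches as well.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_plaintext(html):
    """Return the text content of html without the contents of STYLE tags."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(html_to_plaintext(SAMPLE).strip())
```

With the sample document, the stylesheet rule is dropped and only the body text (plus surrounding whitespace) is left. Keeping the skipped tag names in a set is one way the exclusion could be made optional, as the report suggests.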