[ 
https://issues.apache.org/jira/browse/TIKA-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495754#comment-16495754
 ] 

Hudson commented on TIKA-2100:
------------------------------

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #262 (See 
[https://builds.apache.org/job/tika-2.x-windows/262/])
TIKA-2100 -- fix unit test (tallison: rev 
198d5ef995532f262c970f2ef76e64b852bed7f4)
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/html/HtmlParserTest.java


> Html Parser does not keep the html tag attributes
> -------------------------------------------------
>
>                 Key: TIKA-2100
>                 URL: https://issues.apache.org/jira/browse/TIKA-2100
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Gerard Bouchar
>            Priority: Major
>             Fix For: 1.19, 2.0.0
>
>
> When parsing a very simple HTML document such as
> <!DOCTYPE html>
> <html lang="en">
> <head>
> <title>Page Title</title>
> </head>
> <body>
> <h1 align="left">My First Heading</h1>
> <p>My first paragraph.</p>
> </body>
> </html> 
> you won't be able to access the html tag's attributes (here lang="en") in the 
> ContentHandler:
> * In the method startElement(String ns, String localName, String name,
>   Attributes atts), atts is empty.
> * Moreover, it seems that the html tag's attributes are not passed through the 
>   HtmlMapper.mapSafeAttribute method either.
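
The behaviour described above can be checked with a small handler that records whatever attributes arrive on the html element in startElement. The following is only a sketch under stated assumptions: the class name HtmlAttributeProbe and its probe helper are hypothetical, and the handler is driven here by the JDK's own SAX parser on well-formed markup (DOCTYPE omitted) so that it runs without Tika on the classpath. To reproduce the report, the same handler would instead be passed to Tika's HtmlParser.parse(InputStream, ContentHandler, Metadata, ParseContext), where the recorded map would be expected to come back empty on affected versions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: records the attributes seen on <html>.
class HtmlAttributeProbe extends DefaultHandler {
    private final Map<String, String> htmlAttributes = new LinkedHashMap<>();

    @Override
    public void startElement(String ns, String localName, String name, Attributes atts) {
        // Fall back to the qualified name if the parser is not namespace-aware.
        String element = (localName == null || localName.isEmpty()) ? name : localName;
        if ("html".equalsIgnoreCase(element)) {
            for (int i = 0; i < atts.getLength(); i++) {
                htmlAttributes.put(atts.getQName(i), atts.getValue(i));
            }
        }
    }

    Map<String, String> getHtmlAttributes() {
        return htmlAttributes;
    }

    // Assumption: drives the handler with the JDK SAX parser instead of Tika,
    // so the input must be well-formed XML.
    static Map<String, String> probe(String xhtml) throws Exception {
        HtmlAttributeProbe handler = new HtmlAttributeProbe();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getHtmlAttributes();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html lang=\"en\"><head><title>Page Title</title></head>"
                + "<body><h1 align=\"left\">My First Heading</h1>"
                + "<p>My first paragraph.</p></body></html>";
        // With the JDK parser the attribute is delivered: prints {lang=en}
        System.out.println(probe(page));
    }
}
```

A handler of this shape only shows what reaches startElement; it does not by itself say whether an attribute was dropped by the underlying HTML parser or by the HtmlMapper.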



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
