[ https://issues.apache.org/jira/browse/TIKA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871568#action_12871568 ]
Jukka Zitting commented on TIKA-430:
------------------------------------

Sounds reasonable, especially since, unlike extra content text, extra attributes are easily skipped.

> Automatically let all valid XHTML 1.0 attributes through from HTML documents
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-430
>                 URL: https://issues.apache.org/jira/browse/TIKA-430
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>
> Many consumers of parse output wouldn't want to process the raw
> (unnormalized) elements they'd get with the IdentityHtmlMapper, but they
> would want to get any standard attributes. For example, with <a> elements
> they would get any rel attributes.
> I believe this would require changing the DefaultHtmlMapper to "know" about
> valid attributes for different elements.
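The issue proposes that the DefaultHtmlMapper "know" which attributes are valid for which elements. A minimal sketch of that idea is a per-element whitelist consulted for each attribute. The class `XhtmlAttributeWhitelist` and its method `mapSafeAttribute` are hypothetical names chosen for illustration, not Tika's actual API, and the tables below list only a small subset of the XHTML 1.0 DTD, not the complete set of valid attributes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Tika's DefaultHtmlMapper: a per-element
// whitelist of attribute names, using only a SUBSET of XHTML 1.0.
public class XhtmlAttributeWhitelist {

    // A few of the attributes XHTML 1.0 allows on most elements
    // (core and i18n attributes); event attributes are omitted here.
    private static final Set<String> COMMON =
            set("id", "class", "style", "title", "lang", "dir");

    // Element-specific attributes (partial lists, for illustration only).
    private static final Map<String, Set<String>> PER_ELEMENT = new HashMap<>();

    static {
        PER_ELEMENT.put("a", set("href", "rel", "rev", "name", "type", "hreflang", "charset"));
        PER_ELEMENT.put("img", set("src", "alt", "width", "height", "longdesc"));
        PER_ELEMENT.put("td", set("colspan", "rowspan", "headers", "abbr", "scope"));
    }

    private static Set<String> set(String... names) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(names)));
    }

    /**
     * Returns the lower-cased attribute name if it is whitelisted for the
     * given element, or null if the attribute should be dropped.
     */
    public static String mapSafeAttribute(String element, String attribute) {
        String e = element.toLowerCase(Locale.ROOT);
        String a = attribute.toLowerCase(Locale.ROOT);
        if (COMMON.contains(a)) {
            return a;
        }
        Set<String> allowed = PER_ELEMENT.get(e);
        return (allowed != null && allowed.contains(a)) ? a : null;
    }

    public static void main(String[] args) {
        // rel is passed through on <a>, matching the example in the issue.
        System.out.println(mapSafeAttribute("A", "REL"));
        // A non-XHTML attribute is dropped.
        System.out.println(mapSafeAttribute("a", "data-tracking"));
    }
}
```

Returning the normalized name rather than a boolean lets the mapper both filter and lower-case attributes in one step, mirroring how element names are normalized. As the comment above notes, consumers that don't care about attributes can simply ignore them.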