[ https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707280#comment-16707280 ]

Tim Allison commented on TIKA-2791:
-----------------------------------

I'd want to focus on a handful of common tags: p, div, ul, ol, li, table, tr, td, u, i, b, a. Any others?

> Add structure tags to tika-eval
> -------------------------------
>
>                 Key: TIKA-2791
>                 URL: https://issues.apache.org/jira/browse/TIKA-2791
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in
> tika-eval. We could also detect and flag bad structure tags, e.g.:
> <i><u></i></u>
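
For illustration only, here is a minimal sketch of the two ideas in the issue: counting a fixed set of common structure tags and flagging badly nested ones. It is written in Python with the standard library's html.parser, not in Java against tika-eval, and it assumes the extracted XHTML is available as a string. The names STRUCTURE_TAGS, StructureTagChecker and check_structure are hypothetical and are not part of Tika.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags listed in the comment; an illustrative set, not tika-eval configuration.
STRUCTURE_TAGS = {"p", "div", "ul", "ol", "li", "table",
                  "tr", "td", "u", "i", "b", "a"}


class StructureTagChecker(HTMLParser):
    """Hypothetical helper: counts structure tags and records bad nesting."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()  # number of start tags seen per structure tag
        self.open_tags = []      # stack of structure tags that are still open
        self.problems = []       # descriptions of nesting problems found

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURE_TAGS:
            self.counts[tag] += 1
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in STRUCTURE_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            # Properly nested: the close matches the innermost open tag.
            self.open_tags.pop()
        elif tag in self.open_tags:
            # The close matches an outer tag while an inner one is still open.
            self.problems.append(
                f"</{tag}> closed while <{self.open_tags[-1]}> was still open")
            # Drop the most recent occurrence of this tag from the stack.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]
        else:
            self.problems.append(f"</{tag}> has no matching open tag")


def check_structure(xhtml):
    """Return (tag counts, list of nesting problems) for an XHTML string."""
    checker = StructureTagChecker()
    checker.feed(xhtml)
    checker.close()
    problems = list(checker.problems)
    # Anything left on the stack was opened but never closed.
    for tag in checker.open_tags:
        problems.append(f"<{tag}> was never closed")
    return checker.counts, problems


if __name__ == "__main__":
    counts, problems = check_structure("<i><u></i></u>")
    print(dict(counts))
    print(problems)
```

Run on the example from the issue description, <i><u></i></u>, this sketch counts one i and one u and reports a single problem, because </i> arrives while <u> is still the innermost open tag. Comparing two extraction runs would then amount to comparing the two Counter objects per tag.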