Aleksei Udalov created TIKA-2610:
------------------------------------

             Summary: Extend HtmlMapper isDiscardElement method with Attributes parameter
                 Key: TIKA-2610
                 URL: https://issues.apache.org/jira/browse/TIKA-2610
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.17
            Reporter: Aleksei Udalov
Currently, if we want to disregard HTML elements based on an attribute's value or presence, as in this example from one of our projects:

{code:html}
<div data-meta-no-index>Some content to be ignored by custom search indexer (Tika parser)</div>
{code}

we have to implement a custom handler whose logic closely duplicates what org.apache.tika.parser.html.HtmlHandler already does. Instead, this could easily be done by continuing to use HtmlHandler and setting into the ParseContext an HtmlMapper instance that overrides a (newly added) isDiscardElement(String name, Attributes attributes) method.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
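A minimal, self-contained sketch of the proposed behaviour follows. It does not use Tika classes: AttributeAwareMapper is a hypothetical stand-in for the extended HtmlMapper, and NoIndexMapper, TextCollector and extract are likewise illustrative names, not Tika API. The new overload is written as a default method that falls back to the name-only check, so existing mappers would keep compiling and behaving as before. TextCollector is a simplified handler that skips text nested anywhere inside a discarded element. Because the JDK's SAX parser requires well-formed XML, the attribute is written as data-meta-no-index="" here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DiscardByAttributeSketch {

    // Hypothetical stand-in for the proposed HtmlMapper extension; NOT Tika's actual interface.
    interface AttributeAwareMapper {
        // Name-only check, mirroring the existing isDiscardElement(String name).
        boolean isDiscardElement(String name);

        // Proposed overload: defaults to the name-only check so existing mappers keep working.
        default boolean isDiscardElement(String name, Attributes attributes) {
            return isDiscardElement(name);
        }
    }

    // Illustrative mapper: discards <script>/<style> by name, plus any element marked data-meta-no-index.
    static class NoIndexMapper implements AttributeAwareMapper {
        @Override
        public boolean isDiscardElement(String name) {
            return "script".equals(name) || "style".equals(name);
        }

        @Override
        public boolean isDiscardElement(String name, Attributes attributes) {
            return isDiscardElement(name) || attributes.getIndex("data-meta-no-index") >= 0;
        }
    }

    // Simplified handler: tracks how deep we are inside discarded elements and drops their text.
    static class TextCollector extends DefaultHandler {
        private final AttributeAwareMapper mapper;
        private final StringBuilder text = new StringBuilder();
        private int discardLevel = 0;

        TextCollector(AttributeAwareMapper mapper) {
            this.mapper = mapper;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Once inside a discarded element, every nested element deepens the discard level.
            if (discardLevel > 0 || mapper.isDiscardElement(qName, atts)) {
                discardLevel++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (discardLevel > 0) {
                discardLevel--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Keep text only when we are not inside any discarded element.
            if (discardLevel == 0) {
                text.append(ch, start, length);
            }
        }

        String getText() {
            return text.toString();
        }
    }

    // Parses well-formed XHTML with the JDK SAX parser and returns the text that was kept.
    static String extract(String xhtml, AttributeAwareMapper mapper) throws Exception {
        TextCollector collector = new TextCollector(mapper);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), collector);
        return collector.getText();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<body><p>Indexed text.</p>"
                + "<div data-meta-no-index=\"\">Ignored text.</div></body>";
        System.out.println(extract(doc, new NoIndexMapper()));
    }
}
```

Running main prints only "Indexed text.": the <div> is dropped because of its attribute alone, since "div" is not in the name-based discard list. A mapper that implements only the name-only method still works through the default overload, simply ignoring attributes.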