Aleksei Udalov created TIKA-2610:
------------------------------------

             Summary: Extend HtmlMapper isDiscardElement method with Attributes 
parameter
                 Key: TIKA-2610
                 URL: https://issues.apache.org/jira/browse/TIKA-2610
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.17
            Reporter: Aleksei Udalov


Currently, there is no simple way to discard HTML elements based on the value 
or existence of an attribute. An example from one of our projects:
{code:html}
<div data-meta-no-index>Some content to be ignored by custom search indexer 
(Tika parser)</div>
{code}
To do this today, we have to implement a custom handler with logic very similar 
to what is in org.apache.tika.parser.html.HtmlHandler. It could be done much 
more easily by continuing to use HtmlHandler and setting into the ParseContext 
an instance of HtmlMapper that overrides a (newly added) 
isDiscardElement(String name, Attributes attributes) method.
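
A minimal sketch of the idea, not a patch against Tika. The 
AttributeAwareMapper interface below is a hypothetical stand-in for an 
HtmlMapper extended with the proposed method (it does not exist in 1.17), and 
NoIndexMapper and shouldDiscard are illustrative names only. The sketch uses 
only the SAX classes shipped with the JDK, so it runs without Tika on the 
classpath:

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class NoIndexMapperSketch {

    // Hypothetical stand-in for HtmlMapper with the proposed overload.
    // This interface is an assumption for illustration, not Tika API.
    public interface AttributeAwareMapper {
        boolean isDiscardElement(String name, Attributes attributes);
    }

    // Discards any element that carries the data-meta-no-index attribute,
    // regardless of the attribute's value.
    public static class NoIndexMapper implements AttributeAwareMapper {
        @Override
        public boolean isDiscardElement(String name, Attributes attributes) {
            // Attributes.getIndex(qName) returns -1 when the attribute is absent.
            return attributes != null
                    && attributes.getIndex("data-meta-no-index") >= 0;
        }
    }

    // Helper for trying the mapper out: builds a SAX attribute list from
    // the given attribute names (with empty values) and asks the mapper.
    public static boolean shouldDiscard(String element, String... attributeNames) {
        AttributesImpl attrs = new AttributesImpl();
        for (String attr : attributeNames) {
            attrs.addAttribute("", attr, attr, "CDATA", "");
        }
        return new NoIndexMapper().isDiscardElement(element, attrs);
    }

    public static void main(String[] args) {
        System.out.println(shouldDiscard("div", "data-meta-no-index"));
        System.out.println(shouldDiscard("div", "class"));
    }
}
```

With the proposed change in place, a mapper like this would presumably be 
registered through ParseContext.set(HtmlMapper.class, mapper), and HtmlHandler 
would pass the element's attributes along when deciding whether to discard it.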



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
