Ah, ok, it looks like we used to include "body" in the AUTO set in 
XHTMLContentHandler...

There we go, TIKA-995

https://svn.apache.org/repos/asf/tika/trunk@1663513 
13f79535-47bb-0310-9956-ffa450edef68

-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]] 
Sent: Friday, June 17, 2016 2:09 PM
To: [email protected]
Subject: doubling of body tag in HTMLParser?

All,
  While integrating Tika 1.13 with Solr, I found that we are now 
suppressing the "body" tag in our HTMLParser because DefaultHtmlMapper no 
longer includes "body" among the SAFE_ELEMENTS.  The XHTMLContentHandler is 
responsible for this now.  Does this ring a bell?

Is this a bug in Tika, or do we expect people to suppress body in their 
implementations of HtmlMapper?

        Cheers,

                Tim

https://issues.apache.org/jira/browse/SOLR-8981
