Ah, ok, it looks like we used to include "body" in the AUTO set in XHTMLContentHandler...
There we go, TIKA-995

https://svn.apache.org/repos/asf/tika/trunk@1663513 13f79535-47bb-0310-9956-ffa450edef68

-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Friday, June 17, 2016 2:09 PM
To: [email protected]
Subject: doubling of body tag in HTMLParser?

All,

In working on integrating Tika 1.13 with Solr, I found that we are now suppressing the "body" tag in our HTMLParser via DefaultHtmlMapper not including "body" among the SAFE_ELEMENTS. The XHTMLHandler is responsible for this now.

Does this ring a bell? Is this a bug in Tika, or do we expect people to suppress body in their implementations of HtmlMapper?

Cheers,

Tim

https://issues.apache.org/jira/browse/SOLR-8981
