Julien Massiera created CONNECTORS-1656:
-------------------------------------------

             Summary: HTML extractor produces invalid XML
                 Key: CONNECTORS-1656
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
             Project: ManifoldCF
          Issue Type: Bug
          Components: HTML extractor
    Affects Versions: ManifoldCF 2.17
            Reporter: Julien Massiera


The HTML extractor connector produces a valid HTML document (when the 'Strip HTML' option is disabled) but invalid XML: some tags, such as img, have no closing tag. In some cases this is problematic. For example, when Tika is used downstream, it processes the document as an XML document, and most of the time a parse exception is raised.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
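
The failure mode described in this issue can be illustrated with a minimal sketch. It does not use ManifoldCF or Tika code: it assumes the JDK's built-in JAXP parser as a stand-in for any strict XML parser, and the class name `VoidTagXmlCheck`, the method `isWellFormedXml`, and the sample markup are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of ManifoldCF or Tika.
final class VoidTagXmlCheck {

    // Returns true if the markup parses as well-formed XML, false if the
    // XML parser rejects it (for example because of an unclosed tag).
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without logging to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations surface here as SAXParseException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Valid HTML: img is a void element and needs no closing tag.
        String htmlStyle = "<html><body><img src=\"logo.png\"></body></html>";
        // Same content with the img element self-closed, as XML requires.
        String xmlStyle = "<html><body><img src=\"logo.png\"/></body></html>";

        System.out.println(isWellFormedXml(htmlStyle)); // false
        System.out.println(isWellFormedXml(xmlStyle));  // true
    }
}
```

The first input is acceptable HTML but is rejected as XML because the img element is never closed. The second differs only in the self-closing `/>` and parses cleanly.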