Julien Massiera created CONNECTORS-1656:
-------------------------------------------
Summary: HTML extractor produces invalid XML
Key: CONNECTORS-1656
URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
Project: ManifoldCF
Issue Type: Bug
Components: HTML extractor
Affects Versions: ManifoldCF 2.17
Reporter: Julien Massiera
The HTML extractor connector produces a valid HTML document (when the
'Strip HTML' option is disabled) but invalid XML: some tags, such as img,
have no closing tag. In some cases this is problematic. For example, when
Tika is used downstream of the extractor, it processes the document as XML,
and most of the time a parse exception is raised.
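
The failure mode can be reproduced outside ManifoldCF. The following is a
minimal sketch, not ManifoldCF or Tika code: it uses only the JDK's built-in
JAXP parser, and the class name WellFormedCheck, the helper isWellFormedXml
and the sample markup are all made up for illustration. It shows that an
unclosed img tag, which is fine in HTML, is rejected by an XML parser, while
the self-closed form is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised for malformed input, e.g. a start tag with no end tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Invented sample markup: valid HTML, but img is never closed.
        String html = "<html><body><img src=\"a.png\"></body></html>";
        // Same content with img self-closed, as XML requires.
        String xhtml = "<html><body><img src=\"a.png\"/></body></html>";

        System.out.println("unclosed img:    " + isWellFormedXml(html));
        System.out.println("self-closed img: " + isWellFormedXml(xhtml));
    }
}
```

Whether Tika actually selects its XML parser for a given document depends on
how the content type is detected or declared, which this sketch does not
model.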
--
This message was sent by Atlassian Jira
(v8.3.4#803005)