Julien Massiera created CONNECTORS-1656:
-------------------------------------------

             Summary: HTML extractor produces invalid XML
                 Key: CONNECTORS-1656
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
             Project: ManifoldCF
          Issue Type: Bug
          Components: HTML extractor
    Affects Versions: ManifoldCF 2.17
            Reporter: Julien Massiera


When the 'Strip HTML' option is disabled, the HTML extractor connector 
produces a valid HTML document but invalid XML: some tags, such as img, have 
no closing tag. In some cases this is problematic. For example, when Tika 
runs downstream of the extractor, it processes the document as XML, and most 
of the time a parse exception is raised.
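
To illustrate the failure mode, here is a minimal sketch. It is not taken from
ManifoldCF or Tika: the sample markup and the helper name is_well_formed_xml
are made up for illustration, and Python's standard-library XML parser only
stands in for whatever strict XML parser ends up handling the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical extractor output: valid HTML, but <img> is never closed.
HTML_OUTPUT = '<html><body><p>Logo: <img src="logo.png" alt="logo"></p></body></html>'

# Same content with the void element self-closed, as XML requires.
XHTML_OUTPUT = '<html><body><p>Logo: <img src="logo.png" alt="logo"/></p></body></html>'


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed_xml(HTML_OUTPUT))   # False: </p> arrives while <img> is still open
print(is_well_formed_xml(XHTML_OUTPUT))  # True
```

The only difference between the two strings is the self-closing slash on img,
which is enough to decide whether a strict XML parser accepts the document.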



--
This message was sent by Atlassian Jira
(v8.3.4#803005)