[
https://issues.apache.org/jira/browse/CONNECTORS-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Massiera updated CONNECTORS-1679:
----------------------------------------
Description: The output of the HTML extractor is generated with escaped
entities (e.g. '&' becomes '&amp;'), which is not the desired behavior, as we
want this connector to extract text from HTML as is (was: The output of the
HTML extractor is generated with escaped entities (eg '&' becomes '&amp;'),
which is not the wanted behavior as we want this connector to extract text from
HTML as it is)
> HTML Extractor: output has escaped entities
> -------------------------------------------
>
> Key: CONNECTORS-1679
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1679
> Project: ManifoldCF
> Issue Type: Bug
> Components: HTML extractor
> Affects Versions: ManifoldCF 2.20
> Reporter: Julien Massiera
> Priority: Major
> Fix For: ManifoldCF 2.21
>
> Attachments: patch-CONNECTORS-1679.txt
>
>
> The output of the HTML extractor is generated with escaped entities (e.g. '&'
> becomes '&amp;'), which is not the desired behavior, as we want this
> connector to extract text from HTML as is.
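
To illustrate the behavior described above, here is a minimal Python sketch. It is not ManifoldCF code: the helper name, its flag, and the sample string are hypothetical, and the attached patch may take a different approach. It only shows the difference between escaped output and text emitted as is, using the standard library's `html.escape` and `html.unescape`.

```python
import html


def extracted_text(raw_text: str, escape_entities: bool) -> str:
    """Return text as an extractor might emit it (hypothetical helper)."""
    # escape_entities=True mimics the reported bug: '&' comes out as '&amp;'.
    # escape_entities=False yields the text as is, which the report asks for.
    if escape_entities:
        return html.escape(raw_text, quote=False)
    return raw_text


sample = "Tom & Jerry"  # assumed sample text, not taken from the issue
buggy = extracted_text(sample, escape_entities=True)
wanted = extracted_text(sample, escape_entities=False)

# Unescaping the buggy output recovers the text the report expects.
recovered = html.unescape(buggy)

print(buggy)
print(wanted)
print(recovered == wanted)
```

A downstream consumer could work around escaped output with a call like `html.unescape`, but the report asks for the extractor itself to emit unescaped text.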