[ 
https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575850#comment-16575850
 ] 

Olivier Tavard commented on CONNECTORS-1523:
--------------------------------------------

Hello,

In fact the connector does two jobs: it extracts the part of the HTML document 
that you want, using the englobing tag and the filters to remove, and it also 
extracts the metadata from the meta tags and from some other tags such as the 
title (the complete list is in the JsoupProcessing class).

For the englobing tag, only the first one is used. You can see this in the 
HtmlExtractor class, line 153, where includeFilters.get(0) is passed:
metadataExtracted = JsoupProcessing.extractTextAndMetadataHtmlDocument(
    document.getBinaryStream(), sp.includeFilters.get(0),
    sp.excludeFilters, sp.striphtml);
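
To illustrate the effect, here is a minimal sketch in plain Java. It is not 
the connector's code: the class FirstEnglobingTag and the method 
firstEnglobingTag are hypothetical, and I am assuming the include filters are 
held in a List<String>, as the get(0) call suggests. The error message is 
borrowed from the issue title; how the real connector raises it may differ.

```java
import java.util.List;

public class FirstEnglobingTag {

    // Hypothetical helper, not part of ManifoldCF: returns the only
    // englobing tag that would be used when several are configured.
    static String firstEnglobingTag(List<String> includeFilters) {
        // Guard against an empty configuration before calling get(0),
        // which would otherwise throw IndexOutOfBoundsException.
        if (includeFilters == null || includeFilters.isEmpty()) {
            throw new IllegalStateException("No englobing tag specified");
        }
        // Mirrors the includeFilters.get(0) call: later entries are ignored.
        return includeFilters.get(0);
    }

    public static void main(String[] args) {
        // Example selectors are made up for illustration.
        List<String> filters = List.of("div#content", "article");
        System.out.println(firstEnglobingTag(filters));
    }
}
```

With two tags configured, only the first is returned and the second is 
silently ignored. With none persisted, the list is empty, which would be 
consistent with the behaviour described in the issue.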
 
 

> HTML Extractor transformation connector - "No englobing tag specified"
> ----------------------------------------------------------------------
>
>                 Key: CONNECTORS-1523
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1523
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Steph van Schalkwyk
>            Priority: Major
>
> When adding Englobing tag to HTML Extractor transformation, Englobing tag is 
> not persisted. 
> Can add on config screen in job edit, but value is not persisted.
> Results in "No englobing tag specified".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
