[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575850#comment-16575850 ]
Olivier Tavard commented on CONNECTORS-1523:
--------------------------------------------

Hello,

The connector actually does two jobs. It extracts the part of the HTML document that you want, using an englobing tag together with filters for the elements to remove. It also extracts metadata from the meta tags and from some other tags, such as the title (the complete list is in the JsoupProcessing class).

For the englobing tag, it only uses the first one. You can see this in the HtmlExtractor class, line 153:

    metadataExtracted = JsoupProcessing.extractTextAndMetadataHtmlDocument(document.getBinaryStream(),*sp.includeFilters.get(0)*, sp.excludeFilters, sp.striphtml);

> HTML Extractor transformation connector - "No englobing tag specified"
> ----------------------------------------------------------------------
>
>                 Key: CONNECTORS-1523
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1523
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Steph van Schalkwyk
>            Priority: Major
>
> When adding Englobing tag to HTML Extractor transformation, Englobing tag is
> not persisted.
> Can add on config screen in job edit, but value is not persisted.
> Results in "No englobing tag specified".

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
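
For readers following the thread, the "first englobing tag only" behaviour described in the comment can be sketched roughly as below. This is only an illustration: the class EnglobingTagSketch and the method firstEnglobingTag are hypothetical names, not part of ManifoldCF, and the handling of an empty list is an assumption rather than what HtmlExtractor necessarily does.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the ManifoldCF code base.
public class EnglobingTagSketch {

    // Mirrors the includeFilters.get(0) call quoted in the comment:
    // only the first configured englobing tag is used, later entries are ignored.
    static Optional<String> firstEnglobingTag(List<String> includeFilters) {
        if (includeFilters == null || includeFilters.isEmpty()) {
            // Assumed handling: an empty list means no englobing tag was persisted.
            return Optional.empty();
        }
        return Optional.of(includeFilters.get(0));
    }

    public static void main(String[] args) {
        // Two tags configured: only the first one is picked.
        System.out.println(firstEnglobingTag(List.of("div#content", "article")));
        // Nothing persisted: no englobing tag is available.
        System.out.println(firstEnglobingTag(List.of()));
    }
}
```

If the configured value is never persisted, as the issue reports, the list would be empty at this point. An unguarded get(0) on an empty list throws IndexOutOfBoundsException, which is why the sketch checks for emptiness first.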