[jira] [Resolved] (CONNECTORS-1550) HTML Tag mapping

2018-10-19 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1550.
-
Resolution: Not A Problem

Hi [~DonaldVdD], please post questions like this to the 
us...@manifoldcf.apache.org mailing list.  Jira is meant for bugs and 
enhancement requests.  Thank you!


> HTML Tag mapping
> 
>
> Key: CONNECTORS-1550
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1550
> Project: ManifoldCF
> Issue Type: Wish
> Components: Elastic Search connector, Tika extractor, Web connector
> Affects Versions: ManifoldCF 2.10
> Reporter: Donald Van den Driessche
> Priority: Major
>
> I’ll be crawling a website with the standard Web connector. I want to extract 
> the content of only certain HTML tags, like ,  and . 
> I’ve set up an HTML extractor transformation connector and the internal Tika 
> transformation connector, but I can’t find anywhere to map the extracted 
> content to the output.
> 
> Do I have to write my own transformation connector to extract the content of 
> these tags, or is there a built-in solution?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CONNECTORS-1550) HTML Tag mapping

2018-10-19 Thread Donald Van den Driessche (JIRA)
Donald Van den Driessche created CONNECTORS-1550:


 Summary: HTML Tag mapping
 Key: CONNECTORS-1550
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1550
 Project: ManifoldCF
 Issue Type: Wish
 Components: Elastic Search connector, Tika extractor, Web connector
 Affects Versions: ManifoldCF 2.10
 Reporter: Donald Van den Driessche


I’ll be crawling a website with the standard Web connector. I want to extract 
the content of only certain HTML tags, like ,  and . 
I’ve set up an HTML extractor transformation connector and the internal Tika 
transformation connector, but I can’t find anywhere to map the extracted 
content to the output.
 
Do I have to write my own transformation connector to extract the content of 
these tags, or is there a built-in solution?
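
To make the request concrete, here is a minimal standalone sketch of the kind of 
extraction meant above: collecting the text inside a chosen set of tags. It uses 
only the Python standard library and is not ManifoldCF code or any ManifoldCF 
API; the class name, the function name, and the tag names in the usage example 
(title, h1) are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside a chosen set of tags (illustrative only)."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted = {t.lower() for t in wanted_tags}
        self.open_tags = []   # stack of wanted tags that are currently open
        self._buffers = []    # one text buffer per open wanted tag
        self.results = {}     # tag name -> list of extracted strings

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in self.wanted:
            self.open_tags.append(tag)
            self._buffers.append([])

    def handle_data(self, data):
        # Text belongs to every wanted tag that is currently open,
        # so nested markup such as <b> inside <h1> is still captured.
        for buf in self._buffers:
            buf.append(data)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            text = "".join(self._buffers.pop()).strip()
            self.results.setdefault(tag, []).append(text)


def extract_tag_text(html, wanted_tags):
    """Return a dict mapping each wanted tag to the text of its occurrences."""
    parser = TagTextExtractor(wanted_tags)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    page = "<html><head><title>Demo</title></head><body><h1>Hello</h1></body></html>"
    print(extract_tag_text(page, ["title", "h1"]))
```

Each entry in the resulting dict could then be mapped to a separate output 
field, which is the mapping step I cannot find in the existing connectors.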


