[ https://issues.apache.org/jira/browse/CONNECTORS-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068655#comment-13068655 ]

Karl Wright commented on CONNECTORS-214:
----------------------------------------

Based on your use case, I think this functionality belongs in the web 
connector itself rather than in the Solr connector.  So I'd propose 
implementing a feature along the lines of the first comment in this ticket.  
It's easy enough to do; I should have something to commit no later than 
Sunday evening.

> Add post-extraction inclusions and exclusions into the web connector
> --------------------------------------------------------------------
>
>                 Key: CONNECTORS-214
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-214
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>            Reporter: Erlend Garåsen
>            Assignee: Karl Wright
>             Fix For: ManifoldCF next
>
>
> If HTML files are excluded for a job, links in those files will not be 
> followed. If we add inclusion and exclusion filters that are applied after 
> link extraction, it will be possible to follow links in HTML pages while 
> indexing only certain types of documents, such as PDFs.
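
To illustrate the idea in the description above, here is a minimal Java
sketch of the two filtering stages. It is not ManifoldCF code: the class
name, the accepts() method, and the example URLs and patterns are all
hypothetical, chosen only to show how a crawl-stage filter and a
post-extraction filter could differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the ManifoldCF API: a regex-based
// inclusion/exclusion filter that can be used at two different stages.
final class PostExtractionFilterSketch {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    PostExtractionFilterSketch(List<String> includeRegexes, List<String> excludeRegexes) {
        for (String r : includeRegexes) includes.add(Pattern.compile(r));
        for (String r : excludeRegexes) excludes.add(Pattern.compile(r));
    }

    // A URL is accepted if no exclusion matches and at least one inclusion matches.
    boolean accepts(String url) {
        for (Pattern p : excludes) {
            if (p.matcher(url).find()) return false;
        }
        for (Pattern p : includes) {
            if (p.matcher(url).find()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Stage 1 (assumed): decides which URLs are fetched and scanned for links.
        PostExtractionFilterSketch crawl =
            new PostExtractionFilterSketch(List.of(".*"), List.of());
        // Stage 2 (assumed): applied after link extraction, decides what is indexed.
        PostExtractionFilterSketch index =
            new PostExtractionFilterSketch(List.of("(?i)\\.pdf$"), List.of());

        for (String url : List.of("http://example.org/index.html",
                                  "http://example.org/docs/report.pdf")) {
            System.out.println(url + " fetch=" + crawl.accepts(url)
                + " index=" + index.accepts(url));
        }
    }
}
```

With these example patterns, the HTML page is still fetched so that its
links can be followed, but only the PDF passes the second filter and would
be handed on for indexing.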

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira