[ 
https://issues.apache.org/jira/browse/CONNECTORS-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068554#comment-13068554
 ] 

Karl Wright commented on CONNECTORS-214:
----------------------------------------

It wasn't added to the Solr connector because it wasn't clear whether the MIME 
type filter would be adequate for people's needs, and the Solr connector had 
grown an uncomfortable number of tabs already.

So things were left like this: the infrastructure was written to support 
filtering by URL, but the Solr connector only had MIME type and length 
filtering support added.  Having said that, if you have a need, I would be 
willing to finish the job.  It would be good to understand your actual use case 
so I can be sure to cover it.


> Add post-extraction inclusions and exclusions into the web connector
> --------------------------------------------------------------------
>
>                 Key: CONNECTORS-214
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-214
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>            Reporter: Erlend Garåsen
>            Assignee: Karl Wright
>             Fix For: ManifoldCF next
>
>
> If HTML files are excluded for a job, links in these files will not be 
> followed. If we add inclusion and exclusion filters that are applied 
> post-extraction, it will be possible to fetch only certain types of 
> documents, such as PDFs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

