[ https://issues.apache.org/jira/browse/CONNECTORS-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068554#comment-13068554 ]
Karl Wright commented on CONNECTORS-214:
----------------------------------------

It wasn't added to the Solr connector because it wasn't clear whether the mime type filter would be adequate for people's needs, and the Solr connector had already grown an uncomfortable number of tabs. So where things were left is that the infrastructure was written to support filtering by URL, but the Solr connector only had mime type and length filtering support added.

Having said that, if you have a need, I would be willing to finish the job. It would be good to understand your actual use case so I can be sure to cover it.

> Add post-extraction inclusions and exclusions into the web connector
> --------------------------------------------------------------------
>
>                 Key: CONNECTORS-214
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-214
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>            Reporter: Erlend Garåsen
>            Assignee: Karl Wright
>             Fix For: ManifoldCF next
>
>
> If html files are excluded for a job, links in these files will not be
> followed. If we add inclusion and exclusion filters based on post-extraction,
> it will be possible to fetch only certain types of documents, such as PDFs.