Markus Jelsma created SOLR-3808:
-----------------------------------

             Summary: Extraction contrib to utilize Boilerpipe
                 Key: SOLR-3808
                 URL: https://issues.apache.org/jira/browse/SOLR-3808
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Solr Cell (Tika extraction)
            Reporter: Markus Jelsma
            Priority: Minor
             Fix For: 4.0
Solr's extraction contrib uses Tika for document parsing and should be able to use Boilerpipe. Tika comes with Boilerpipe, a library capable of removing boilerplate text from HTML pages.