Tika boilerpipe extractors

2018-06-27 Thread Arora, Madhvi
Hi All, Note: I am reposting my question since it looks like the earlier one was posted on the wrong thread. We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that come with Tika boilerpipe support. I am getting the best results for pages where there are only outlinks with

Tika boilerpipe extractors

2018-06-27 Thread Arora, Madhvi
Hi All, We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that come with Tika boilerpipe support. I am getting the best results with CanolaExtractor for pages that contain only outlinks, such as this one: https://support.automationdirect.com/faq/dl205.php But