Tika boilerpipe extractors

2018-06-27 Thread Arora, Madhvi
Hi All, Note: I am reposting my question since it looks like the earlier one got posted on the wrong thread. We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that come with Tika boilerpipe support. I am getting the best results for pages where there are only outlinks with

Tika boilerpipe extractors

2018-06-27 Thread Arora, Madhvi
Hi All, We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that come with Tika boilerpipe support. I am getting the best results for pages where there are only outlinks with CanolaExtractor in a page like this: https://support.automationdirect.com/faq/dl205.php But

Re: [MASSMAIL][ANNOUNCE] New Nutch committer and PMC -

2018-06-27 Thread Jorge Betancourt
Welcome on board, Roannel! Great to have you here! Best Regards, On Wed, Jun 27, 2018 at 9:17 AM Semyon Semyonov wrote: > Hi Roannel, > Congratulations and good luck! > Semyon. > Sent: Wednesday, June 27, 2018 at 3:42 AM > From: "Roannel Fernández Hernández" > To: user@nutch.apache.org

Nutch 2.x. Apache Gora backends survey

2018-06-27 Thread Alfonso Nishikawa
Hello. This is a survey for Nutch 2.x users. I am developing a web application to access any data persisted through Apache Gora. At the moment it supports HBase, and I would like to know which backends are most used so that I can prioritize the ones to add (each needs some work). The question is: What backend

Re: [MASSMAIL][ANNOUNCE] New Nutch committer and PMC -

2018-06-27 Thread Semyon Semyonov
Hi Roannel, Congratulations and good luck! Semyon. Sent: Wednesday, June 27, 2018 at 3:42 AM From: "Roannel Fernández Hernández" To: user@nutch.apache.org Subject: Re: [MASSMAIL][ANNOUNCE] New Nutch committer and PMC - Hi Folks Thank you very much for allowing me to be part of this