Nutch 1.x and Solr compatible versions

2017-05-02 Thread Arora, Madhvi
Hi, We currently use Nutch 1.10 and Solr 4.x. We are in the process of upgrading both. I wanted to find out whether the latest version of Nutch, 1.13, is compatible with Solr 6, and whether there is any documentation I can use for upgrading Nutch so that it is compatible with Solr 6. Thanks
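For context, a minimal sketch of pushing a Nutch 1.x crawl into Solr; the URL, core name, and directory layout below are assumptions for illustration, and `solr.server.url` is the property the Solr indexer plugin reads:

```shell
# Index an existing crawl into a Solr core (URL and core name are examples;
# adjust the crawldb/linkdb/segment paths to your own crawl directory)
bin/nutch index \
  -D solr.server.url=http://localhost:8983/solr/nutch \
  crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
```

This requires a running Solr instance with a schema matching the Nutch index fields, so it is a sketch of the invocation shape rather than something runnable as-is.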

RE: Wrong FS exception in Fetcher

2017-05-02 Thread Yossi Tamari
Hi, Issue created: https://issues.apache.org/jira/browse/NUTCH-2383. Thanks, Yossi. -Original Message- From: Sebastian Nagel [mailto:wastl.na...@googlemail.com] Sent: 02 May 2017 16:08 To: user@nutch.apache.org Subject: Re: Wrong FS exception in Fetcher Hi Yossi, > that 1.13

Re: Wrong FS exception in Fetcher

2017-05-02 Thread Sebastian Nagel
Hi Yossi, > that 1.13 requires Hadoop 2.7.2 specifically. That's not a hard requirement. Usually you have to use the Hadoop version of your running Hadoop cluster. This mostly causes no problems, but if there are problems it's a good strategy to try this first. Thanks for the detailed log.
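A sketch of the strategy Sebastian describes, assuming an Ant-based Nutch 1.x source checkout (in Nutch 1.x the dependency versions live in ivy/ivy.xml and `ant runtime` is the standard build target, but verify both against your checkout):

```shell
# 1. Find the Hadoop version your cluster actually runs
hadoop version

# 2. In the Nutch source tree, locate the Hadoop dependency declarations
#    so they can be aligned with the cluster version
grep -n 'hadoop' ivy/ivy.xml

# 3. Rebuild the runtime so the job artifacts match that Hadoop version
ant clean runtime
```

These commands require a Hadoop installation and a Nutch source tree, so treat them as an outline of the workflow rather than a verified recipe.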

RE: Wrong FS exception in Fetcher

2017-05-02 Thread Yossi Tamari
Thanks Sebastian, The output with set -x is below. I'm new to Nutch and was not aware that 1.13 requires Hadoop 2.7.2 specifically. While I see it now in pom.xml, it may be a good idea to document it on the download page and provide a download link (since the Hadoop releases page contains

Re: Wrong FS exception in Fetcher

2017-05-02 Thread Sebastian Nagel
Hi Yossi, strange error, indeed. Is it also reproducible in pseudo-distributed mode using Hadoop 2.7.2, the version Nutch depends on? Could you also add the line set -x to bin/nutch and run bin/crawl again to see how all steps are executed. Thanks, Sebastian On 04/30/2017 04:04 PM, Yossi
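The `set -x` suggestion can be demonstrated with a toy script (not the real bin/nutch; the variable names here are made up). Once the line is added near the top of bin/nutch, every command the script runs is echoed to stderr with a leading `+`, which shows exactly how each crawl step is invoked:

```shell
# Toy demonstration of shell tracing with `set -x`
set -x
SEED_DIR=urls     # hypothetical seed directory
CRAWL_DIR=crawl   # hypothetical crawl directory
echo "would crawl $SEED_DIR into $CRAWL_DIR"
set +x            # turn tracing back off
```

Running this prints the traced commands (prefixed with `+`) to stderr alongside the normal output, which is what produces the verbose log discussed in this thread.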

Re: indexer-elastic version bump runtime dep issue

2017-05-02 Thread Sebastian Nagel
Hi Jurian, thanks, that's great news! I'll have a look at your patch. Best, Sebastian On 05/01/2017 05:56 PM, Jurian Broertjes wrote: > Hi Sebastian, > > I've continued to struggle with this on several levels (both local and on > Hadoop), and in the end > tried to change the way the

Re: crawlDb speed around deduplication

2017-05-02 Thread Sebastian Nagel
Hi Michael, the easiest way is probably to check the actual job configuration as shown by the Hadoop resource manager webapp, see screenshot. It also indicates where each configuration property is set from. Best, Sebastian On 05/02/2017 12:57 AM, Michael Coffey wrote: > Thanks, I will do some

RE: indexer "possible analysis error"

2017-05-02 Thread Markus Jelsma
Hello - this means you have a broken analyzer: one of the token filters or char filters in your chain is broken. It is usually a startOffset being ahead of an endOffset, which is indeed not possible. Lucene detects this proactively and won't allow you to add erroneous input. Fix your