Hi All,
Note: I am reposting my question since it looks like the earlier one got posted on the wrong
thread.
We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that
come with Tika boilerpipe support. I am getting the best results for pages where
there are only outlinks with
Hi All,
We are using Nutch 1.13 and Solr 6. I am trying to use one of the parsers that
come with Tika boilerpipe support. I am getting the best results for pages where
there are only outlinks with CanolaExtractor in a page like this:
https://support.automationdirect.com/faq/dl205.php
But
Hi,
We currently use Nutch 1.10 and Solr 4.x. We are in the process of upgrading
both. I wanted to find out if the latest version, Nutch 1.13, is
compatible with Solr 6. Also, is there any documentation that I can use for
upgrading Nutch so that it will be compatible with Solr 6?
Thanks
>
>On Thu, Aug 18, 2016 at 7:08 AM, <user-digest-h...@nutch.apache.org> wrote:
>
>>
>> From: "Arora, Madhvi" <mar...@automationdirect.com>
>> To: "user@nutch.apache.org" <user@nutch.apache.org>
>> Cc:
>> Date: Wed, 1
Hi,
I wanted to find out how to correct the issue below and would appreciate any
help.
I am trying to upgrade to Nutch 1.12. I am using Solr 5.3.1. The reasons I am
upgrading are:
1: https crawling
2: Boilerplate Canola extraction through Tika
The only problem so far I am having is an
s kind of related to what I need.
On 8/5/16, 2:18 PM, "Arora, Madhvi" <mar...@automationdirect.com> wrote:
>Thank you very much!
>
>
>
>
>On 8/5/16, 2:13 PM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:
>
>>I am not sure which versio
Thank you very much!
On 8/5/16, 2:13 PM, "Markus Jelsma" wrote:
>I am not sure in which version it was added, you'd have to check CHANGES.txt, but
>upgrading is usually a good idea and very simple.
>Markus
>
>
>
>-Original message-
>> From:Arora, Madhvi
Markus, so to crawl https and http URLs successfully we just need to switch to a
newer version of Nutch, i.e. higher than Nutch 1.10?
On 8/5/16, 12:47 PM, "Markus Jelsma" wrote:
>Hello - see inline.
>Markus
>
>-Original message-
>> From:Arora, Madhvi
Hi,
We are using Nutch 1.10 and Solr 5. We have around 10 different web sites that
are crawled regularly. We are changing the protocol of a few websites from http to
https, so we will have a mixed bag of http and https protocols.
I checked the Nutch user mailing list archive and gather that we need to change