Hello Shubham,
If you have set fetcher.parse to true, then do not execute the parse job,
because the segment is already parsed. I don't know about 2.x, but in 1.x you cannot
parse a segment that was already parsed. If you use some crawl script that has
parsing hardcoded in it, despite fetcher.parse,
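For a custom crawl script, the advice above amounts to checking whether a segment already holds parse output before running the parse job. This is a minimal, hypothetical sketch (SegmentParseCheck and isAlreadyParsed are made-up names, not Nutch classes). It assumes a Nutch 1.x segment on the local filesystem, where a parsed segment contains the crawl_parse, parse_data and parse_text subdirectories. Segments stored on HDFS would need the Hadoop FileSystem API instead of java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical pre-check for a custom crawl script: skip the parse step when a
 * Nutch 1.x segment already contains parse output (e.g. because fetcher.parse=true).
 * Assumes the segment lives on the local filesystem.
 */
public class SegmentParseCheck {

    // Subdirectories a Nutch 1.x segment gains once it has been parsed.
    static final List<String> PARSE_DIRS = List.of("crawl_parse", "parse_data", "parse_text");

    /** Returns true if any parse output directory already exists in the segment. */
    static boolean isAlreadyParsed(Path segment) {
        for (String dir : PARSE_DIRS) {
            if (Files.isDirectory(segment.resolve(dir))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a segment that has been fetched but not parsed.
        Path segment = Files.createTempDirectory("segment");
        Files.createDirectories(segment.resolve("crawl_fetch"));
        Files.createDirectories(segment.resolve("content"));
        System.out.println(isAlreadyParsed(segment)); // false: run the parse job

        // Simulate the output that parsing during fetch would leave behind.
        Files.createDirectories(segment.resolve("parse_data"));
        System.out.println(isAlreadyParsed(segment)); // true: skip the parse job
    }
}
```

A crawl script would call the check once per segment and only invoke the parse step when it returns false.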
Thank you very much!
On 8/5/16, 2:13 PM, "Markus Jelsma" wrote:
>I am not sure in which version it was added; you'd have to check CHANGES.txt, but
>upgrading is usually a good idea and very simple.
>Markus
-Original message-
> From:Arora, Madhvi
> Sent: Friday 5th August 2016 19:53
> To: user@nutch.apache.org
> Subject:
Markus, so to crawl https and http URLs successfully, we just need to switch to a
newer version of Nutch, i.e. higher than Nutch 1.10?
On 8/5/16, 12:47 PM, "Markus Jelsma" wrote:
>Hello - see inline.
>Markus
-Original message-
> From:Arora, Madhvi
> Sent: Friday 5th August 2016 18:03
> To: user@nutch.apache.org
> Subject: Protocol change to https
Hi,
We are using Nutch 1.10 and Solr 5. We have around 10 different websites that
are crawled regularly. We are changing the protocol of a few websites from http to
https, so we will have a mixed bag of http and https protocols.
I checked the Nutch user mailing list archive and gathered that we need to change
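If the seed list should start from the new https addresses rather than relying on redirects from the old http ones, the mixed http/https seeds can be produced mechanically. This is a minimal, hypothetical sketch (SeedSchemeRewriter and rewriteSeeds are illustrative names, not part of Nutch) that switches http:// seeds to https:// only for hosts known to have migrated. It does not touch the protocol plugin or Nutch version question discussed in this thread, and it assumes host names in the migrated set are lowercase.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: switch seeds of migrated hosts from http to https. */
public class SeedSchemeRewriter {

    /**
     * Returns a copy of the seed list in which http:// URLs whose host is in
     * httpsHosts (lowercase) are switched to https://; other seeds are kept.
     */
    static List<String> rewriteSeeds(List<String> seeds, Set<String> httpsHosts) {
        List<String> out = new ArrayList<>();
        for (String seed : seeds) {
            String s = seed.trim();
            URI uri = URI.create(s); // throws IllegalArgumentException on malformed URLs
            String host = uri.getHost();
            boolean migrate = "http".equalsIgnoreCase(uri.getScheme())
                    && host != null
                    && httpsHosts.contains(host.toLowerCase(Locale.ROOT));
            // "http:" is 5 characters, so the remainder starts at "//host/...".
            // An explicit port such as :80 is kept and would need fixing by hand.
            out.add(migrate ? "https:" + s.substring(5) : s);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> seeds = List.of("http://www.example.com/", "http://docs.example.org/index.html");
        // Prints [https://www.example.com/, http://docs.example.org/index.html]
        System.out.println(rewriteSeeds(seeds, Set.of("www.example.com")));
    }
}
```

Rewriting only the scheme prefix, rather than rebuilding the URI from its parts, leaves any percent-encoding in the path and query exactly as it was in the original seed.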
CLASSIFICATION: UNCLASSIFIED
Is there a particular schema.xml file I should be using with Nutch 1.12 to
index into Solr 6.1.0?
I'm trying to debug an indexing error:
Indexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
	at