Vladimir,

There is duplication between the Crawl and Indexer patches on one hand and NUTCH-442_v5.patch on the other. I simply replaced in 442_v5 the sections that are also modified by the Crawl and Indexer patches, then applied the modified patch to the code. That worked fine.
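Concretely, the merge keeps the Crawl.patch version of the call, i.e. `solrUrl` stays as the second argument of `Indexer.index(...)` rather than being replaced with `null` as NUTCH-442_v5.patch does. The following is a minimal sketch with stub types (not the real Nutch API; the names `IndexerStub` and its return strings are invented here purely for illustration) of the shape of the merged call:

```java
// Sketch only: stub types standing in for the Nutch classes, to show the
// shape of the merged call. The real method is
// org.apache.nutch.indexer.Indexer.index(...), where solrUrl is the second
// parameter added by Crawl.patch; passing null there (as NUTCH-442_v5.patch
// does) builds fine but nothing reaches Solr.
import java.util.Arrays;
import java.util.List;

public class IndexerStub {

    // Mirrors the post-merge parameter order:
    // index(indexes, solrUrl, crawlDb, linkDb, segments)
    public static String index(String indexes, String solrUrl, String crawlDb,
                               String linkDb, List<String> segments) {
        // In this stub, a non-null solrUrl selects the Solr back-end,
        // mirroring the behaviour reported in the thread.
        return (solrUrl == null)
                ? "lucene-only"
                : "posting " + segments.size() + " segment(s) to " + solrUrl;
    }

    public static void main(String[] args) {
        List<String> segments = Arrays.asList("crawl/segments/20080621200352");
        // Keep solrUrl (from Crawl.patch); do not replace it with null:
        System.out.println(index("indexes", "http://localhost:8984/solr/",
                                 "crawl/crawldb", "crawl/linkdb", segments));
        // prints "posting 1 segment(s) to http://localhost:8984/solr/"
    }
}
```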
J.

2008/6/21 Vladimir Garvardt (JIRA) <[EMAIL PROTECTED]>:
>
> [ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607005#action_12607005 ]
>
> Vladimir Garvardt commented on NUTCH-442:
> -----------------------------------------
>
> Hello.
>
> I'm trying to apply this patch and have run into a problem that I cannot
> solve by myself.
>
> I checked out nutch trunk (rev 670194), downloaded the attachments from
> this issue and started patching.
> First I applied Crawl.patch, then Indexer.patch and then
> NUTCH-442_v5.patch. On applying the last patch I got a warning message.
> This happened because of a conflict between Crawl.patch and
> NUTCH-442_v5.patch.
>
> Crawl.patch makes the following change:
>
>   // index, dedup & merge
> + indexer.index(indexes, solrUrl, crawlDb, linkDb,
> +     Arrays.asList(fs.listPaths(segments, HadoopFSUtil.getPassAllFilter())));
>
> and NUTCH-442_v5.patch makes the following change:
>
>   // index, dedup & merge
> - indexer.index(indexes, crawlDb, linkDb, fs.listPaths(segments,
>       HadoopFSUtil.getPassAllFilter()));
> + indexer.index(indexes, null, crawlDb, linkDb,
> +     Arrays.asList(fs.listPaths(segments, HadoopFSUtil.getPassAllFilter())));
>
> The main difference between these patches is the second parameter.
> First I tried to build nutch with the second parameter set to null -
> crawling finished successfully, but no data was added to solr.
> Then I changed the second parameter to solrUrl and rebuilt nutch. On
> indexing, the following exception was thrown and indexing failed (no data
> in solr):
>
> Indexer: starting
> Indexer: crawldb: crawl/crawldb
> Indexer: linkdb: crawl/linkdb
> Indexer: solrUrl: http://localhost:8984/solr/
> Indexer: adding segment:
> file:/home/vladimirga/Documents/dev/src/lucene-src/nutch-2008-06-21/wrk-01/crawl/segments/20080621200352
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:318)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:148)
>
> What can cause that problem, and how can I fix it to make nutch index
> into solr?
>
> Thanks.
>
> > Integrate Solr/Nutch
> > --------------------
> >
> >                 Key: NUTCH-442
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-442
> >             Project: Nutch
> >          Issue Type: New Feature
> >        Environment: Ubuntu linux
> >           Reporter: rubdabadub
> >        Attachments: Crawl.patch, Indexer.patch, NUTCH-442_v4.patch,
> >                     NUTCH-442_v5.patch, NUTCH_442_v3.patch,
> >                     RFC_multiple_search_backends.patch, schema.xml
> >
> >
> > Hi:
> > After trying out Sami's patch regarding Solr/Nutch (it can be found at
> > http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html)
> > I can confirm it worked :-) And that led me to the following request:
> > I would be very grateful if this could be included in nutch 0.9, as I am
> > trying to eliminate my python-based crawler which posts documents to
> > solr. Since I am in a corporate environment I can't install the trunk
> > version in production, so I am asking for this to be included in the 0.9
> > release. I hope my wish will be granted.
> > I look forward to getting some feedback.
> > Thank you.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--
DigitalPebble Ltd
http://www.digitalpebble.com