Vladimir,

There is duplication between the Crawl and Indexer patches on one hand and NUTCH-442_v5.patch on the other. I simply replaced in 442_v5 the sections that are also modified by the Crawl and Indexer patches, then applied the modified patch to the code. That worked fine.
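Concretely, the merge keeps the Crawl.patch version of the call, i.e. `solrUrl` stays as the second argument of `Indexer.index(...)` rather than being replaced with `null` as NUTCH-442_v5.patch does. The following is a minimal sketch with stub types (not the real Nutch API; the names `IndexerStub` and its return strings are invented here purely for illustration) of the shape of the merged call:

```java
// Sketch only: stub types standing in for the Nutch classes, to show the
// shape of the merged call. The real method is
// org.apache.nutch.indexer.Indexer.index(...), where solrUrl is the second
// parameter added by Crawl.patch; passing null there (as NUTCH-442_v5.patch
// does) builds fine but nothing reaches Solr.
import java.util.Arrays;
import java.util.List;

public class IndexerStub {

    // Mirrors the post-merge parameter order:
    // index(indexes, solrUrl, crawlDb, linkDb, segments)
    public static String index(String indexes, String solrUrl, String crawlDb,
                               String linkDb, List<String> segments) {
        // In this stub, a non-null solrUrl selects the Solr back-end,
        // mirroring the behaviour reported in the thread.
        return (solrUrl == null)
                ? "lucene-only"
                : "posting " + segments.size() + " segment(s) to " + solrUrl;
    }

    public static void main(String[] args) {
        List<String> segments = Arrays.asList("crawl/segments/20080621200352");
        // Keep solrUrl (from Crawl.patch); do not replace it with null:
        System.out.println(index("indexes", "http://localhost:8984/solr/",
                                 "crawl/crawldb", "crawl/linkdb", segments));
        // prints "posting 1 segment(s) to http://localhost:8984/solr/"
    }
}
```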
J.

2008/6/21 Vladimir Garvardt (JIRA) <[EMAIL PROTECTED]>:
>
> [ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607005#action_12607005 ]
>
> Vladimir Garvardt commented on NUTCH-442:
> -----------------------------------------
>
> Hello.
>
> I'm trying to apply this patch and have run into a problem that I cannot
> solve by myself.
>
> I checked out nutch trunk (rev 670194), downloaded the attachments from
> this issue and started patching.
> First I applied Crawl.patch, then Indexer.patch and then
> NUTCH-442_v5.patch. On applying the last patch I got a warning message.
> This happened because of a conflict between Crawl.patch and
> NUTCH-442_v5.patch.
>
> Crawl.patch makes the following change:
>
>   // index, dedup & merge
> + indexer.index(indexes, solrUrl, crawlDb, linkDb,
> +     Arrays.asList(fs.listPaths(segments, HadoopFSUtil.getPassAllFilter())));
>
> and NUTCH-442_v5.patch makes the following change:
>
>   // index, dedup & merge
> - indexer.index(indexes, crawlDb, linkDb, fs.listPaths(segments,
>       HadoopFSUtil.getPassAllFilter()));
> + indexer.index(indexes, null, crawlDb, linkDb,
> +     Arrays.asList(fs.listPaths(segments, HadoopFSUtil.getPassAllFilter())));
>
> The main difference between these patches is the second parameter.
> First I tried to build nutch with the second parameter set to null -
> crawling finished successfully, but no data was added to solr.
> Then I changed the second parameter to solrUrl and rebuilt nutch. On
> indexing, the following exception was thrown and indexing failed (no data
> in solr):
>
> Indexer: starting
> Indexer: crawldb: crawl/crawldb
> Indexer: linkdb: crawl/linkdb
> Indexer: solrUrl: http://localhost:8984/solr/
> Indexer: adding segment:
> file:/home/vladimirga/Documents/dev/src/lucene-src/nutch-2008-06-21/wrk-01/crawl/segments/20080621200352
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:318)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:148)
>
> What can cause that problem, and how can I fix it to make nutch index
> into solr?
>
> Thanks.
>
> > Integrate Solr/Nutch
> > --------------------
> >
> >                 Key: NUTCH-442
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-442
> >             Project: Nutch
> >          Issue Type: New Feature
> >        Environment: Ubuntu linux
> >           Reporter: rubdabadub
> >        Attachments: Crawl.patch, Indexer.patch, NUTCH-442_v4.patch,
> >                     NUTCH-442_v5.patch, NUTCH_442_v3.patch,
> >                     RFC_multiple_search_backends.patch, schema.xml
> >
> >
> > Hi:
> > After trying out Sami's patch regarding Solr/Nutch (it can be found at
> > http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html)
> > I can confirm it worked :-) And that led me to the following request:
> > I would be very grateful if this could be included in nutch 0.9, as I am
> > trying to eliminate my python-based crawler which posts documents to
> > solr. Since I am in a corporate environment I can't install the trunk
> > version in production, so I am asking for this to be included in the 0.9
> > release. I hope my wish will be granted.
> > I look forward to getting some feedback.
> > Thank you.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--
DigitalPebble Ltd
http://www.digitalpebble.com