[ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607005#action_12607005 ]
Vladimir Garvardt commented on NUTCH-442:
-----------------------------------------

Hello.

I'm trying to apply this patch and have run into a problem that I cannot solve by myself. I checked out Nutch trunk (rev 670194), downloaded the attachments from this issue and started patching. First I applied Crawl.patch, then Indexer.patch and then NUTCH-442_v5.patch. On applying the last patch I got a warning message. This happened because of a conflict between Crawl.patch and NUTCH-442_v5.patch.

Crawl.patch performs the following action:

      // index, dedup & merge
+     indexer.index(indexes, solrUrl, crawlDb, linkDb,
+         Arrays.asList(fs.listPaths(segments, HadoopFSUtil.getPassAllFilter())));

and NUTCH-442_v5.patch performs the following action:

      // index, dedup & merge
-     indexer.index(indexes, crawlDb, linkDb, fs.listPaths(segments, HadoopFSUtil.getPassAllFilter()));
+     indexer.index(indexes, null, crawlDb, linkDb,
+         Arrays.asList(fs.listPaths(segments, HadoopFSUtil.getPassAllFilter())));

The main difference between these patches is the second parameter. First I tried to build Nutch with the second parameter set to null: crawling finished successfully, but no data was added to Solr. Then I changed the second parameter to solrUrl and rebuilt Nutch. During indexing the following exception was thrown and indexing failed (no data in Solr):

Indexer: starting
Indexer: crawldb: crawl/crawldb
Indexer: linkdb: crawl/linkdb
Indexer: solrUrl: http://localhost:8984/solr/
Indexer: adding segment: file:/home/vladimirga/Documents/dev/src/lucene-src/nutch-2008-06-21/wrk-01/crawl/segments/20080621200352
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:318)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:148)

What could cause this problem, and how can I fix it so that Nutch indexes into Solr?

Thanks.

> Integrate Solr/Nutch
> --------------------
>
>                 Key: NUTCH-442
>                 URL: https://issues.apache.org/jira/browse/NUTCH-442
>             Project: Nutch
>          Issue Type: New Feature
>         Environment: Ubuntu linux
>            Reporter: rubdabadub
>         Attachments: Crawl.patch, Indexer.patch, NUTCH-442_v4.patch, NUTCH-442_v5.patch, NUTCH_442_v3.patch, RFC_multiple_search_backends.patch, schema.xml
>
>
> Hi:
> After trying out Sami's patch regarding Solr/Nutch, which can be found here
> (http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html),
> I can confirm it worked :-) And that led me to request the following:
> I would be very, very grateful if this could be included in Nutch 0.9, as I
> am trying to eliminate my Python-based crawler, which posts documents to Solr.
> As I am in a corporate environment, I can't install the trunk version in the
> production environment, so I am asking for this to be included in the 0.9 release. I
> hope my wish will be granted.
> I look forward to getting some feedback.
> Thank you.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.