If I am reading the message right :) then yes, that problem would have been fixed by now. I believe that problem was with an earlier version of Nutch (0.7).
Dennis Kubes

Kai_testing Middleton wrote:
> Am I correct that the 'new' mergedb and mergelinkdb commands together would
> fix this problem from April 2006
> http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04112.html
>
> Re: Query on merged indexes returned 0 hit - test case
> included (Nutch 0.8)
> Andrzej Bialecki
> Tue, 04 Apr 2006 09:29:51 -0700
>
> Olive g wrote:
> Hi Andrzej & other gurus who might be reading this message :-):
>
> I ran some tests and somehow my query returned 0 hit against merged
> indexes. Here is my test case and it's a bit long, thank you in
> advance for your patience:
>
> 1. crawled the first 100 urls
>    ~/nutch/search/bin/nutch crawl urls-001-100 -dir test1 -depth 1 >& test1.log&
>
> 2. set searcher.dir to test1
>
> 3. query for "movie"
>    ~/nutch/search/bin/nutch org.apache.nutch.searcher.NutchBean movie
>    it returned 64 hits (a web research with tomcat returned the same
>    result)
>
> 4. crawled the second 100 urls
>    ~/nutch/search/bin/nutch crawl urls-101-200 -dir test2 -depth 1 >& test2.log&
>
> 5. set searcher.dir to test2
>
> 6. query for "movie"
>    ~/nutch/search/bin/nutch org.apache.nutch.searcher.NutchBean movie
>    it returned 55 hits (a web research with tomcat returned the same
>    result)
>
> 7. attempted to merge using the following command:
>    ../search/bin/nutch merge test3 test1 test2 >& merge-test3&
>    returned error:
>    Exception in thread "main" java.rmi.RemoteException:
>    java.io.IOException: Cannot open filename /user/root/test1/crawldb/segments
>    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)
>
> 8. attempted to merge again using the following command:
>    ../search/bin/nutch merge test4 test1/indexes test2/indexes >& merge-test4&
>    merged successfully with no errors
>
> 9. set searcher.dir to test4
>
> 10. query for "movie" by:
>    ~/nutch/search/bin/nutch org.apache.nutch.searcher.NutchBean movie
>    and it returned 0 hit (a web research with tomcat returned the same
>    result)
>    060403 201545 10 opening segments in test4/segments
>    060403 201545 10 found resource common-terms.utf8 at
>    file:/root/nutch/search/conf/common-terms.utf8
>    060403 201545 10 opening linkdb in test4/linkdb
>    Total hits: 0
>
> It appeared to be looking for test4/segments and test4/linkdb which
> did not exist?
>
> Well, the short answer is that you cannot at the moment merge crawldbs
> or linkdbs. As a consequence, you cannot use multiple outputs of 'nutch
> crawl' together (because NutchBean needs to reference a single linkdb
> during searching).
> This is technically possible, but simply not implemented (yet).
>
> --
> Best regards,
> Andrzej Bialecki
>
> ----- Original Message ----
> From: Doğacan Güney <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, July 16, 2007 1:59:39 PM
> Subject: Re: four nutch merge commands: mergedb, mergesegs, mergelinkdb, merge
>
> Hi
>
> On 7/16/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
>> I've been reviewing the four different merge commands (as of nutch v0.9):
>>
>> $ nutch | grep merg
>>   mergedb       merge crawldb-s, with optional filtering
>>   mergesegs     merge several segments, with optional filtering and slicing
>>   mergelinkdb   merge linkdb-s, with optional filtering
>>   merge         merge several segment indexes
>>
>> Here are the javadocs:
>> mergedb --
>> http://lucene.apache.org/nutch/apidocs/org/apache/nutch/crawl/CrawlDbMerger.html
>> mergesegs --
>> http://lucene.apache.org/nutch/apidocs/org/apache/nutch/segment/SegmentMerger.html
>> mergelinkdb --
>> http://lucene.apache.org/nutch/apidocs/org/apache/nutch/crawl/LinkDbMerger.html
>> merge --
>> http://lucene.apache.org/nutch/apidocs/org/apache/nutch/indexer/IndexMerger.html
>>
>> Naively: why are there four merge commands? Are some subsets of the others?
>> Are they used in conjunction? What are the usage scenarios of each?
>
> Each is used in a different scenario
>
> mergedb: as its name does not imply, it is used to merge crawldb. So
> consider this mergecrawldb
>
> mergesegs: merges segments. It merges <segment>/{content,crawl_fetch,
> crawl_generate, crawl_parse, parse_data, parse_text} information from
> different segments.
>
> merge: Merges lucene indexes. After a index job, you end up with a
> indexes directory with a bunch of part-<num> directories inside.
> Command merge takes such a directory and produces a single index. A
> single index has a better performance (I think). You can say that
> merge is poorly named, it should have been called mergeindexes or
> something.
>
> mergelinkdb: Should be obvious, merges linkdb-s.
>
> So none of them is a subset of another. They all have different
> purposes. It is kind of confusing to have a "merge" command that only
> merges indexes, so perhaps we can add a mergeindexes command, keep
> merge for some time (noting that it has been deprecated) then remove
> it.
>
>> I notice that Andrzej wrote the first three, and they have wiki entries
>> (pretty much the same as the javadoc):
>> (I found these from http://www.mail-archive.com/[EMAIL PROTECTED]/msg03588.html)
>> http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_mergedb
>> http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_mergelinkdb
>> http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_mergesegs
>> It seems most of the nutch-user discussions I've seen so far relate to the
>> simple merge command. Are the first three "advanced commands"?

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
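
For anyone reading this thread later, here is a rough dry-run sketch of how the four commands described above might be lined up to combine two 'nutch crawl' outputs such as test1 and test2 from the 2006 test case. It only prints the command lines and runs nothing. Several things here are assumptions rather than anything stated in the thread: the output directory name "merged", the crawldb/linkdb/segments/indexes layout inside each crawl directory, the use of a repeated -dir argument for mergesegs, and writing the merged index to merged/index. Whether the old indexes can still be used against merged segments, or a re-index is needed, is not covered.

```shell
#!/bin/sh
# Dry-run sketch: prints one invocation for each of the four merge commands
# discussed in the thread. Nothing is executed against a real crawl.
# Assumptions (not from the thread): directory layout, the "merged" output
# name, repeated -dir for mergesegs, and NUTCH pointing at bin/nutch.

NUTCH="${NUTCH:-bin/nutch}"

# plan_merge OUT IN1 IN2: print one command per line.
plan_merge() {
    out=$1; a=$2; b=$3
    # mergedb: merge the two crawldbs into one
    echo "$NUTCH mergedb $out/crawldb $a/crawldb $b/crawldb"
    # mergelinkdb: merge the two linkdbs into one
    echo "$NUTCH mergelinkdb $out/linkdb $a/linkdb $b/linkdb"
    # mergesegs: merge all segments found under each segments directory
    echo "$NUTCH mergesegs $out/segments -dir $a/segments -dir $b/segments"
    # merge: merge the Lucene part-<num> indexes into a single index
    echo "$NUTCH merge $out/index $a/indexes $b/indexes"
}

plan_merge merged test1 test2
```

Printing the plan first makes it easy to check the paths against your own crawl directories before running anything; searcher.dir would then point at the output directory, assuming it ends up holding the crawldb, linkdb, segments and index together.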
