I wanted to do a test vertical crawl (db.ignore.external.links=true)
of several dozen sites using "nutch crawl urlDir -threads 10 -depth 6
-topN 32768 -dir /var/nutch/testindex"...
FWIW, I ran the crawl on an Athlon 1900 with 1.5GB RAM and the crawl
directory size is about 2.4GB. Maximum memory usage was about
1.6-1.7GB (went into swap).

This is what I found at the end of hadoop.log when the process finished:

2007-04-12 04:11:24,903 INFO  indexer.Indexer - Indexer: done
2007-04-12 04:11:25,138 INFO  indexer.DeleteDuplicates - Dedup: starting
2007-04-12 04:11:26,178 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: /var/nutch/testindex/indexes
2007-04-12 04:12:59,636 INFO  indexer.DeleteDuplicates - Dedup: done
2007-04-12 04:12:59,637 INFO  indexer.IndexMerger - merging indexes to: /var/nutch/testindex/index
2007-04-12 04:12:59,684 INFO  indexer.IndexMerger - Adding /var/nutch/testindex/indexes/part-00000
2007-04-12 04:16:09,532 INFO  indexer.IndexMerger - done merging
2007-04-12 04:16:09,728 INFO  crawl.Crawl - crawl finished: /var/nutch/testindex


Looks to me like everything was in perfect order, but I got the
following error when querying the index through the Nutch web UI:
"HTTP Status 404 - /var/nutch/testindex/index/segments (No such file
or directory)"

This is what I saw in the /var/nutch/testindex/index directory:
$ ls
_0.fdt  _0.fnm  _0.nrm  _0.tii  segments_2
_0.fdx  _0.frq  _0.prx  _0.tis  segments.gen

Obviously, there is no plain "segments" file, only segments_2 and segments.gen.
Any ideas why that is?
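
In case it helps anyone reproduce the check, here is a minimal Python sketch of what I looked for in the index directory. The function name classify_segments_files is made up for illustration, and the pattern for generation-numbered files (segments_N with a base-36 suffix) is my assumption about the naming, not something taken from the Nutch or Lucene sources. The demo recreates the listing above in a temporary directory rather than touching the real index.

```python
import os
import re
import tempfile

# Assumption: generation-numbered files look like "segments_2",
# with a lowercase alphanumeric (base-36) suffix.
SEGMENTS_N = re.compile(r"^segments_[0-9a-z]+$")


def classify_segments_files(index_dir):
    """Return (has_plain_segments, generation_files) for index_dir.

    has_plain_segments is True only if a file named exactly "segments"
    exists; generation_files lists names such as "segments_2".
    """
    names = sorted(os.listdir(index_dir))
    has_plain = "segments" in names
    generations = [n for n in names if SEGMENTS_N.match(n)]
    return has_plain, generations


# Demo: rebuild the directory listing from this post as empty files.
listing = [
    "_0.fdt", "_0.fnm", "_0.nrm", "_0.tii", "segments_2",
    "_0.fdx", "_0.frq", "_0.prx", "_0.tis", "segments.gen",
]
with tempfile.TemporaryDirectory() as demo_dir:
    for name in listing:
        open(os.path.join(demo_dir, name), "w").close()
    demo_result = classify_segments_files(demo_dir)

print(demo_result)  # (False, ['segments_2'])
```

Against the real crawl output this would be called as classify_segments_files("/var/nutch/testindex/index").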

TIA,
t.n.a.
