Any suggestions on debugging the generator? My log4j level is already set to DEBUG, but the generator produces no DEBUG entries at all, only the final WARN, which says:

08/02/20 15:38:09 WARN  crawl.Generator: Generator: 0 records selected for fetching, exiting ...
08/02/20 15:38:09 INFO  crawl.Crawl: Stopping at depth=0 - no more URLs to fetch.
08/02/20 15:38:09 WARN  crawl.Crawl: No URLs to fetch - check your seed list and URL filters.

I've inserted some debugging code at Generator.java:424, which reads:

    if (readers == null || readers.length == 0 || !readers[0].next(new FloatWritable())) {
      LOG.warn("Generator: 0 records selected for fetching, exiting ...");

That is essentially the decision point, so I could see which of the conditions triggered the "0 records selected" message. The "readers" object is perfectly fine, but the SequenceFileOutputFormat is reporting that there are no values at all (URL scores, I suppose) to be retrieved, which causes the generator to stop.

On Wed, Feb 20, 2008 at 5:39 PM, John Mendenhall <[EMAIL PROTECTED]> wrote:
> > $ /exp/sw/nutch-0.9/bin/nutch crawl urls -dir crawled-15 -depth 3
>
> > (also tried this with '+*', '+.', didn't work either)
>
> I don't understand how +* would ever work since * is for
> repeating the previous element.  But, +. should work.
>
> Everything else looked okay to me.  I would start looking
> at the logs closely.  I would try setting your log4j
> properties to INFO or DEBUG level for the generator
> step.
>
> The inject is obviously working since your stats shows
> the urls in the crawldb as unfetched.  So, debug the
> generator.
>
> JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services