Hi Ian, Abidari,

  We ran into a similar problem. In our case it happened when all the URLs
were from the same host. When the URLs came from different hosts, the
generator was able to build the fetchlist; otherwise it created an empty
fetchlist.

  We got around the problem by injecting some dummy URLs from a different
host into the list. Could you try the same thing and see whether the
generator works? If it does, we can check the generator code to see why
this is happening.
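
To make the workaround concrete, here is a minimal sketch (not part of Nutch) that checks whether a seed list contains only one host and, if so, appends a dummy URL from another host before the list is put into the DFS. The function names and the dummy URL `http://dummy.example.org/` are hypothetical placeholders; this only automates the workaround described above and does not address whatever the generator is doing.

```python
from urllib.parse import urlparse


def hosts_in_seed_list(urls):
    """Return the set of distinct hostnames found in the seed URLs."""
    hosts = set()
    for line in urls:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def with_dummy_host(urls, dummy_url="http://dummy.example.org/"):
    """Append dummy_url when every seed URL shares a single host.

    dummy_url is a placeholder (an assumption of this sketch);
    substitute any URL that lives on a different host.
    """
    seeds = [u.strip() for u in urls if u.strip()]
    hosts = hosts_in_seed_list(seeds)
    dummy_host = urlparse(dummy_url).hostname
    if len(hosts) == 1 and dummy_host not in hosts:
        return seeds + [dummy_url]
    return seeds


if __name__ == "__main__":
    seeds = ["http://www.example.com/a", "http://www.example.com/b"]
    print(sorted(hosts_in_seed_list(seeds)))
    for url in with_dummy_host(seeds):
        print(url)
```

If the list already spans more than one host, `with_dummy_host` returns it unchanged, so it can be run on any seed file before the inject step.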

Regards,

-vishal.

-----Original Message-----
From: Ian Holsman [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 23, 2007 11:11 AM
To: [EMAIL PROTECTED]
Subject: Re: Nutch 0.9 - Generator: 0 records selected for fetching, exiting



Abidari wrote:
> 
> Ian
>  
> Can you please help with this? I have upgraded to Nutch 0.9. I am able to
> run Nutch in standalone mode, i.e. without Hadoop. But with Hadoop I get
> the error "Generator: 0 records selected for fetching, exiting ...".
> I have performed this step: bin/hadoop dfs -put urls urls. And upon
> running bin/hadoop dfs -ls, I see that urls is there in the DFS.
>  
> Output of Crawl.
>  
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 50
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20070419134155
> Generator: filtering: false
> Generator: topN: 50
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
> 
> 


Hi Abidari,

I ran into this problem as well.

I'm not sure if it is related, but when I examine the stderr of the mapper
job I see:

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /opt/nutch/search/logs (Is a directory)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
        at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].


which points to log4j being misconfigured.
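
One possible reading of the trace, offered as a guess rather than a diagnosis: if the DRFA appender's File option is built from two variables and the second one is not set in the child JVM, the path collapses to the log directory itself, which would match the "(Is a directory)" message. The sketch below only illustrates that substitution behaviour. The appender value `${hadoop.log.dir}/${hadoop.log.file}` and the file name `hadoop.log` are assumptions (check conf/log4j.properties), and `subst_vars` is a hypothetical helper, not log4j code.

```python
import re


def subst_vars(value, props):
    """Expand ${key} references; unset keys become an empty string.

    This imitates how log4j 1.x handles an undefined variable during
    substitution. It is an illustration, not log4j's implementation.
    """
    return re.sub(r"\$\{([^}]+)\}", lambda m: props.get(m.group(1), ""), value)


# Assumed appender setting; verify against your own conf/log4j.properties.
DRFA_FILE = "${hadoop.log.dir}/${hadoop.log.file}"

# Hypothetical case: the child JVM knows the log directory but not the file.
child_props = {"hadoop.log.dir": "/opt/nutch/search/logs"}
resolved_without_file = subst_vars(DRFA_FILE, child_props)
print(resolved_without_file)  # a directory path, not a file

# With both properties defined, the path names a regular file.
full_props = {**child_props, "hadoop.log.file": "hadoop.log"}
resolved_with_file = subst_vars(DRFA_FILE, full_props)
print(resolved_with_file)
```

If that is what is happening here, the logging error may be a side issue and unrelated to the empty fetchlist.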

Abidari, did you get any further with this? Andrei, any hints?
-- 
View this message in context:
http://www.nabble.com/Nutch-0.9---Generator%3A-0-records-selected-for-fetching%2C-exiting-tf3609078.html#a10757841
Sent from the Nutch - User mailing list archive at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
