As Doğacan said, we would need to see the actual error from the log files.
But if smaller crawls worked and it only started failing once you scaled
up, it may be an OutOfMemoryError, in which case you can raise the heap
size set by mapred.child.java.opts in the hadoop-site.xml file to a
higher value, say -Xmx512M (we have ours set to -Xmx1024M).
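
For reference, the entry in hadoop-site.xml would look roughly like the
following. This is only a sketch: the -Xmx512M value is a starting point
and should be adjusted to the memory available on your machine, and the
description text is my own wording.

```xml
<configuration>
  <!-- JVM options passed to each map/reduce child task. -->
  <!-- -Xmx512M is an assumed starting value; raise it if tasks still run out of memory. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512M</value>
    <description>Java options for the task child processes.</description>
  </property>
</configuration>
```

If your hadoop-site.xml already has a configuration element, add only the
property block inside it rather than a second configuration element.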

Dennis Kubes

Jason Ma wrote:
> I'm running Nutch on RedHat Linux with Java 1.6.0_01.  I have
> successfully crawled and indexed smaller quantities of data in the
> past.  However, after I tried to scale up the crawling, Nutch would
> give an exception when indexing (the bottom of the log included
> below).  Please let me know if there's more information I should
> provide.
> 
> I'd be very grateful for any suggestions or advice you may have.
> 
> Thanks in advance,
> Jason Ma
> 
> ....
> 
> Indexing [http://64.13.133.31/pics/up-VC2GQ0CA9QSHHGHM-s] with
> analyzer [EMAIL PROTECTED]
> (null)
> Indexing [http://64.13.133.31/pics/up-VET16648L9TBU53B-s] with
> analyzer [EMAIL PROTECTED]
> (null)
> Indexing [http://64.13.133.31/pics/up-VHIUOB6N8CVESR52-s] with
> analyzer [EMAIL PROTECTED]
> (null)
> Indexing [http://64.13.133.31/pics/user_promo_mini.png] with analyzer
> [EMAIL PROTECTED] (null)
> Optimizing index.
> merging segments _73 (1 docs) _74 (1 docs) _75 (1 docs) _76 (1 docs)
> _77 (1 docs) _78 (1 docs) _79 (1 docs) _7a (1 docs) _7b (1 docs) _7c
> (1 docs) _7d (1 docs) _7e (1 docs) _7f (1 docs) _7g (1 docs) _7h (1
> docs) _7i (1 docs) _7j (1 docs) _7k (1 docs) _7l (1 docs) _7m (1 docs)
> _7n (1 docs) _7o (1 docs) _7p (1 docs) _7q (1 docs) into _7r (24 docs)
> merging segments _1e (50 docs) _2t (50 docs) _48 (50 docs) _5n (50
> docs) _72 (50 docs) _7r (24 docs) into _7s (274 docs)
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>        at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
>        at org.apache.nutch.indexer.Indexer.main(Indexer.java:313)
