I'm running Nutch on Red Hat Linux with Java 1.6.0_01. I have successfully crawled and indexed smaller amounts of data in the past. However, when I tried to scale up the crawl, Nutch threw an exception during indexing (the end of the log is included below). Please let me know if there is more information I should provide.
I'd be very grateful for any suggestions or advice you may have.

Thanks in advance,
Jason Ma

....
Indexing [http://64.13.133.31/pics/up-VC2GQ0CA9QSHHGHM-s] with analyzer [EMAIL PROTECTED] (null)
Indexing [http://64.13.133.31/pics/up-VET16648L9TBU53B-s] with analyzer [EMAIL PROTECTED] (null)
Indexing [http://64.13.133.31/pics/up-VHIUOB6N8CVESR52-s] with analyzer [EMAIL PROTECTED] (null)
Indexing [http://64.13.133.31/pics/user_promo_mini.png] with analyzer [EMAIL PROTECTED] (null)
Optimizing index.
merging segments _73 (1 docs) _74 (1 docs) _75 (1 docs) _76 (1 docs) _77 (1 docs) _78 (1 docs) _79 (1 docs) _7a (1 docs) _7b (1 docs) _7c (1 docs) _7d (1 docs) _7e (1 docs) _7f (1 docs) _7g (1 docs) _7h (1 docs) _7i (1 docs) _7j (1 docs) _7k (1 docs) _7l (1 docs) _7m (1 docs) _7n (1 docs) _7o (1 docs) _7p (1 docs) _7q (1 docs) into _7r (24 docs)
merging segments _1e (50 docs) _2t (50 docs) _48 (50 docs) _5n (50 docs) _72 (50 docs) _7r (24 docs) into _7s (274 docs)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:313)

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
