On 6/17/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> After the change to Indexer.java here is the more verbose log of the error
> that seems to be happening during the indexing phase;
>
> 2007-06-15 21:53:06,098 INFO indexer.Indexer - url=http://acadisc.com/, parseData=Version: 5
> Status: failed(2,200): sun.io.MalformedInputException: Missing byte-order mark
> Title:
> Outlinks: 0
> Content Metadata:
> Parse Metadata:
> 2007-06-15 21:53:06,101 WARN mapred.LocalJobRunner - job_73pqhd
> java.lang.NullPointerException: value cannot be null
>         at org.apache.lucene.document.Field.<init>(Field.java:188)
>         at org.apache.lucene.document.Field.<init>(Field.java:164)
>         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:200)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
> 2007-06-15 21:53:06,845 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:280)
>         at org.apache.nutch.indexer.Indexer.run(Indexer.java:302)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.indexer.Indexer.main(Indexer.java:285)
>
Can you also run

  readseg -get <segment> "http://acadisc.com/" -nogenerate -noparsetext -noparsedata

and send the result?

--
Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
