On 6/17/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> After the change to Indexer.java, here is the more verbose log of the error
> that seems to be happening during the indexing phase:
>
> 2007-06-15 21:53:06,098 INFO  indexer.Indexer - url=http://acadisc.com/, parseData=Version: 5
> Status: failed(2,200): sun.io.MalformedInputException: Missing byte-order mark
> Title:
> Outlinks: 0
> Content Metadata:
> Parse Metadata:
> 2007-06-15 21:53:06,101 WARN  mapred.LocalJobRunner - job_73pqhd
> java.lang.NullPointerException: value cannot be null
>         at org.apache.lucene.document.Field.<init>(Field.java:188)
>         at org.apache.lucene.document.Field.<init>(Field.java:164)
>         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:200)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
> 2007-06-15 21:53:06,845 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:280)
>         at org.apache.nutch.indexer.Indexer.run(Indexer.java:302)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.indexer.Indexer.main(Indexer.java:285)
>
>

Can you also do a

readseg -get <segment> "http://acadisc.com/" -nogenerate -noparsetext -noparsedata

and send the result?

-- 
Doğacan Güney
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
