After the change to Indexer.java, here is the more verbose log of the error, which
seems to be happening during the indexing phase:
 
2007-06-15 21:53:06,098 INFO  indexer.Indexer - url=http://acadisc.com/, parseData=Version: 5
Status: failed(2,200): sun.io.MalformedInputException: Missing byte-order mark
Title:
Outlinks: 0
Content Metadata:
Parse Metadata:
2007-06-15 21:53:06,101 WARN  mapred.LocalJobRunner - job_73pqhd
java.lang.NullPointerException: value cannot be null
        at org.apache.lucene.document.Field.<init>(Field.java:188)
        at org.apache.lucene.document.Field.<init>(Field.java:164)
        at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:200)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
2007-06-15 21:53:06,845 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:280)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:302)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:285)


----- Original Message ----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, June 15, 2007 4:03:53 PM
Subject: Re: Indexing problems in nutch-nightly


Sean Dean wrote:
> When I try using that it actually won't compile. I tried moving the " in front 
> of parse, but that didn't work either.
>  
> compile-core:
>     [javac] Compiling 1 source file to /usr/local/nutch/build/classes
>     [javac] /usr/local/nutch/src/java/org/apache/nutch/indexer/Indexer.java:197: cannot find symbol
>     [javac] symbol  : variable parse
>     [javac] location: class org.apache.nutch.indexer.Indexer
>     [javac]     LOG.info("url=" + key + ", parse=" + parse);
>     [javac]                                          ^
>     [javac] 1 error
> 
> Any suggestions?

Oops, sorry - this should be parseData instead of parse.
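
A rough standalone sketch of the corrected statement follows. The class name and the debugMessage helper are made up so the snippet compiles on its own, and it prints to stdout instead of going through LOG.info; it is not the actual Indexer code.

```java
public class ParseDataLogSketch {

    // Hypothetical helper: builds the same message the corrected
    // LOG.info(...) line would produce, using parseData instead of
    // the undefined variable parse.
    static String debugMessage(Object key, Object parseData) {
        return "url=" + key + ", parseData=" + parseData;
    }

    public static void main(String[] args) {
        // Sample values only, used to show the shape of the message.
        System.out.println(debugMessage("http://acadisc.com/", "Version: 5"));
    }
}
```

In Indexer.java itself, the equivalent change would presumably be just the one line: LOG.info("url=" + key + ", parseData=" + parseData);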


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
