On Jul 2, 2009, at 12:09 PM, Allan Roberto Avendano Sudario wrote:
Regards,
This is the entire exception message:
java -cp $JAVACLASSPATH org.apache.mahout.utils.vectors.Driver --dir /home/hadoop/Desktop/<urls>/index --field content --dictOut /home/hadoop/Desktop/dictionary/dict.txt --output /home/hadoop/Desktop/dictionary/out.txt --max 50 --norm 2

09/07/02 09:35:47 INFO vectors.Driver: Output File: /home/hadoop/Desktop/dictionary/out.txt
09/07/02 09:35:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
09/07/02 09:35:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
09/07/02 09:35:47 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.utils.vectors.lucene.LuceneIteratable$TDIterator.next(LuceneIteratable.java:111)
    at org.apache.mahout.utils.vectors.lucene.LuceneIteratable$TDIterator.next(LuceneIteratable.java:82)
    at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:25)
    at org.apache.mahout.utils.vectors.Driver.main(Driver.java:204)
Well, I used a Nutch crawl index — is that correct? Hmm... I changed to the content field, but nothing happened.
Possibly the Nutch crawl doesn't have Term Vectors indexed.
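One way to confirm that is to open the index and ask for a term vector directly — if it comes back null, the Driver's iterator will NPE exactly as above. A minimal sketch, assuming a Lucene 2.x-era API (the version in use around then) and the field name "content"; here an in-memory index stands in for the real Nutch one:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.RAMDirectory;

public class CheckTermVectors {
  public static void main(String[] args) throws Exception {
    // Build a tiny in-memory index WITHOUT term vectors, mimicking
    // what a stock Nutch crawl index looks like (assumption)
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    Document doc = new Document();
    doc.add(new Field("content", "crawled page text",
        Field.Store.NO, Field.Index.ANALYZED)); // note: no TermVector
    writer.addDocument(doc);
    writer.close();

    // Null here is what makes the Driver's TDIterator throw the NPE
    IndexReader reader = IndexReader.open(dir);
    TermFreqVector tv = reader.getTermFreqVector(0, "content");
    System.out.println(tv == null
        ? "no term vector stored for 'content'"
        : "term vector present: " + tv.size() + " terms");
    reader.close();
  }
}
```

Point the reader at the real index directory instead of the RAMDirectory to check your own crawl.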
This would be my guess. A small edit to the Nutch code would probably allow it: just find where it creates a new Field and add the term-vector option.
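The edit would look something like this — a hedged sketch, not the actual Nutch source: wherever the indexer constructs its content Field, use the five-argument Lucene constructor and pass Field.TermVector.YES (the field name and sample text here are placeholders):

```java
import org.apache.lucene.document.Field;

public class TermVectorField {
  public static void main(String[] args) {
    // Replace the plain Field(name, value, Store, Index) call with the
    // variant that also stores term vectors, so Mahout's Driver can read them
    Field f = new Field("content", "parsed page text here",
        Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES);
    System.out.println("term vector stored: " + f.isTermVectorStored());
  }
}
```

Field.TermVector.WITH_POSITIONS_OFFSETS is an alternative if positions/offsets are wanted too; either way the index must be rebuilt after the change.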