The first problem seems to be an index version incompatibility.

Since you created the index with Lucene 3.0, you will need the same
version to read it. It seems that the Lucene version on the classpath
when creating the vectors is older than that: the reader reports
"Incompatible format version: 2 expected 1 or lower". Can you check
whether you are using the same Lucene jar when creating the vectors?
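
One quick way to check, as a rough sketch: the small class below prints
where a given class was loaded from. WhichJar and locate are names made
up here for illustration, and it relies only on standard JDK calls. I am
assuming that you can run it with the same classpath as the Driver.

```java
import java.security.CodeSource;

class WhichJar {
    /** Returns the location a class was loaded from, or a short note if unknown. */
    static String locate(String className) {
        try {
            // Load without initializing the class; we only need its origin.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK itself report no code source.
            return src == null ? "(no code source)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults to the Lucene class that appears in the stack trace.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexReader";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed jar is not a Lucene 3.0 jar, that would explain the
exception.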

Not sure what the second problem is.

--shashi

On Fri, Jan 15, 2010 at 11:11 AM, Rob Ennals <[email protected]> wrote:
> Hi Guys,
>
> I'm totally new to Mahout so I'm running into what I expect are newbie issues.
>
> To get started with clustering, I tried importing some indexes from Lucene.
>
> Following the Lucene tutorial, I created a really simple index of the
> Lucene source code:
> http://lucene.apache.org/java/3_0_0/demo.html
>
> I then tried to convert this to a Mahout Vector, as per
> http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
>
> This gives me a CorruptIndexException:
>
> r...@rob:~/svn/mahout$ java
> org.apache.mahout.utils.vectors.lucene.Driver --dir
> /home/rob/Reference/Installers/lucene-3.0.0/index --output
> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
> contents
> Exception in thread "main" org.apache.lucene.index.CorruptIndexException: Incompatible format version: 2 expected 1 or lower
>        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:117)
>        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
>        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
>        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
>        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
>        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:140)
>
>
> I also tried running the driver on the actual Lucene index that I want
> to apply it to, and this time got a NullPointerException:
>
> r...@rob:~/svn/mahout$ java
> org.apache.mahout.utils.vectors.lucene.Driver --dir
> /home/rob/git/thinklink/scala/bin/index/ --output
> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
> contents
> Jan 14, 2010 9:40:40 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Output File: /home/rob/test/output
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
>        at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
>        at org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(Driver.java:226)
>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
>
>
> In both cases, the indexes should have the "contents" field.
>
>
> I assume I'm doing something stupid here. If someone can tell me what
> that is, then that would be great.
>
>
> Thanks
>
> -Rob
>
