Right, Mahout is currently on Lucene 2.9. We should upgrade.

On Jan 15, 2010, at 1:01 AM, Shashikant Kore wrote:

> The first problem seems to be index version incompatibility.
>
> Since you created the index with Lucene 3.0, you will need the same
> version to read the index. It seems that while creating the vectors, the
> version of Lucene is lower than that. Can you check if you are using
> the same Lucene jar while creating the vectors?
>
> Not sure what the second problem is.
>
> --shashi
>
> On Fri, Jan 15, 2010 at 11:11 AM, Rob Ennals <[email protected]> wrote:
>> Hi Guys,
>>
>> I'm totally new to Mahout so I'm running into what I expect are newbie
>> issues.
>>
>> To get started with clustering, I tried importing some indexes from Lucene.
>>
>> Following the Lucene tutorial, I created a really simple index of the
>> Lucene source code:
>> http://lucene.apache.org/java/3_0_0/demo.html
>>
>> I then tried to convert this to a Mahout Vector, as per
>> http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
>>
>> This gives me a CorruptIndexException:
>>
>> r...@rob:~/svn/mahout$ java
>> org.apache.mahout.utils.vectors.lucene.Driver --dir
>> /home/rob/Reference/Installers/lucene-3.0.0/index --output
>> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
>> contents
>> Exception in thread "main" org.apache.lucene.index.CorruptIndexException: Incompatible format version: 2 expected 1 or lower
>>        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:117)
>>        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
>>        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
>>        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
>>        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
>>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
>>        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
>>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:140)
>>
>>
>> I also tried running the driver on the actual Lucene index that I want
>> to apply it to, and this time got a NullPointerException:
>>
>> r...@rob:~/svn/mahout$ java
>> org.apache.mahout.utils.vectors.lucene.Driver --dir
>> /home/rob/git/thinklink/scala/bin/index/ --output
>> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
>> contents
>> Jan 14, 2010 9:40:40 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Output File: /home/rob/test/output
>> Exception in thread "main" java.lang.NullPointerException
>>        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>>        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
>>        at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
>>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
>>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
>>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
>>        at org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(Driver.java:226)
>>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
>>
>>
>> In both cases, the indexes should have the "contents" field.
>>
>>
>> I assume I'm doing something stupid here. If someone can tell me what
>> that is, then that would be great.
>>
>>
>> Thanks
>>
>> -Rob
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
