[jira] Commented: (MAHOUT-191) NPE while creating term vectors with an index on a field that does not exist in all the documents
[ https://issues.apache.org/jira/browse/MAHOUT-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786782#action_12786782 ]

Shashikant Kore commented on MAHOUT-191:

Yeah, this can be committed.

> NPE while creating term vectors with an index on a field that does not exist in all the documents
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-191
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-191
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.3
>         Environment: mac, snow leopard, eclipse galileo, jdk 6
>            Reporter: Sushil Bajracharya
>            Assignee: Sean Owen
>             Fix For: 0.3
>
>         Attachments: MAHOUT-191-patch.txt, MAHOUT-191.patch
>
>
> (based on the message from here: http://www.nabble.com/Creating-Vectors-from-Text-tt24298643.html#a26090263)
> I checked out mahout from trunk and tried to create term frequency vector from a lucene index and ran into this..
> 09/10/27 17:36:10 INFO lucene.Driver: Output File: /Users/shoeseal/DATA/luc2tvec.out
> 09/10/27 17:36:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 09/10/27 17:36:11 INFO compress.CodecPool: Got brand-new compressor
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:109)
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:1)
>     at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:40)
>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:200)
> I am running this from Eclipse (snow leopard with JDK 6), on an index that has field with stored term vectors..
> my input parameters for Driver are:
> --dir /smallidx/ --output /luc2tvec.out --idField id_field --field field_with_TV --dictOut /luc2tvec.dict --max 50 --weight tf
> Luke shows the following info on the fields I am using:
> id_field is indexed, stored, omit norms
> field_with_TV is indexed, tokenized, stored, term vector

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
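The failure described in the report (a field that is present in some documents but not in others) can be illustrated with a small sketch. This is not the attached patch, and it does not use the Lucene or Mahout APIs: the class `MissingFieldSketch`, the method `vectorsFor`, and the map-based "index" are hypothetical stand-ins. It assumes only that looking up the term vector of a document that lacks the field yields null, and it shows one possible guard (skipping such documents). Whether the committed fix skips them or emits an empty vector is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an index: each document maps field names to term counts. */
final class MissingFieldSketch {

    /** Returns term-frequency maps for documents that have the field, skipping the rest. */
    static List<Map<String, Integer>> vectorsFor(
            List<Map<String, Map<String, Integer>>> docs, String field) {
        List<Map<String, Integer>> out = new ArrayList<>();
        for (Map<String, Map<String, Integer>> doc : docs) {
            // Assumption: the lookup yields null when the document lacks the field.
            Map<String, Integer> termVector = doc.get(field);
            if (termVector == null) {
                continue; // without this guard, the copy below would throw an NPE
            }
            out.add(new HashMap<>(termVector));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> terms = new HashMap<>();
        terms.put("mahout", 2);

        Map<String, Map<String, Integer>> withField = new HashMap<>();
        withField.put("field_with_TV", terms);
        Map<String, Map<String, Integer>> withoutField = new HashMap<>();

        List<Map<String, Map<String, Integer>>> docs = new ArrayList<>();
        docs.add(withField);
        docs.add(withoutField);

        // Only the first document contributes a vector.
        System.out.println(vectorsFor(docs, "field_with_TV").size());
    }
}
```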
[jira] Commented: (MAHOUT-191) NPE while creating term vectors with an index on a field that does not exist in all the documents
[ https://issues.apache.org/jira/browse/MAHOUT-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771408#action_12771408 ]

Shashikant Kore commented on MAHOUT-191:

I noticed a different problem for empty fields: TFDFMapper returned the vector for the previous non-empty document. This happens because setExpectations() is not getting called on empty documents. I have attached a patch to address this. I couldn't find a better way; if you know of one, please update accordingly.
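The stale-vector problem described in the comment above (a per-document setup call that is skipped for empty documents, so the previous document's vector is returned again) can be sketched as follows. This is an illustration only, not the attached patch and not Mahout's TFDFMapper: `StaleStateSketch`, `Collector`, `startDocument`, `buggy`, and `fixed` are hypothetical names, and `startDocument` merely plays a role similar in spirit to the setExpectations() call mentioned in the comment.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleStateSketch {

    /** Hypothetical collector that accumulates term counts for the current document. */
    static final class Collector {
        private Map<String, Integer> current = new HashMap<>();

        /** Per-document setup: discards whatever the previous document left behind. */
        void startDocument() {
            current = new HashMap<>();
        }

        void addTerm(String term, int freq) {
            current.merge(term, freq, Integer::sum);
        }

        Map<String, Integer> vector() {
            return current;
        }
    }

    /** Buggy: resets only when the document has terms, so an empty document reuses old state. */
    static Map<String, Integer> buggy(Collector c, Map<String, Integer> terms) {
        if (terms != null && !terms.isEmpty()) {
            c.startDocument();
            terms.forEach(c::addTerm);
        }
        return c.vector();
    }

    /** Fixed: always resets first, so an empty document yields an empty vector. */
    static Map<String, Integer> fixed(Collector c, Map<String, Integer> terms) {
        c.startDocument();
        if (terms != null) {
            terms.forEach(c::addTerm);
        }
        return c.vector();
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = new HashMap<>();
        doc.put("lucene", 1);
        Map<String, Integer> empty = new HashMap<>();

        Collector c = new Collector();
        buggy(c, doc);
        System.out.println(buggy(c, empty).isEmpty()); // stale vector from the previous document
        System.out.println(fixed(c, empty).isEmpty()); // reset happened, vector is empty
    }
}
```

Resetting unconditionally at the start of every document keeps the reset independent of whether any terms are seen, which is the property the comment says was missing.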