
[jira] Commented: (MAHOUT-191) NPE while creating term vectors with an index on a field that does not exist in all the documents

2009-12-06 Thread Shashikant Kore (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786782#action_12786782
 ] 

Shashikant Kore commented on MAHOUT-191:


Yeah, this can be committed.

> NPE while creating term vectors with an index on a field that does not exist
> in all the documents
> ------------------------------------------------------------------------------
>
>              Key: MAHOUT-191
>              URL: https://issues.apache.org/jira/browse/MAHOUT-191
>          Project: Mahout
>       Issue Type: Bug
> Affects Versions: 0.3
>      Environment: mac, snow leopard, eclipse galileo, jdk 6
>         Reporter: Sushil Bajracharya
>         Assignee: Sean Owen
>          Fix For: 0.3
>
>      Attachments: MAHOUT-191-patch.txt, MAHOUT-191.patch
>
>
> (based on the message from here: 
> http://www.nabble.com/Creating-Vectors-from-Text-tt24298643.html#a26090263)
> I checked out Mahout from trunk, tried to create a term frequency vector
> from a Lucene index, and ran into this:
>
> 09/10/27 17:36:10 INFO lucene.Driver: Output File: /Users/shoeseal/DATA/luc2tvec.out
> 09/10/27 17:36:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 09/10/27 17:36:11 INFO compress.CodecPool: Got brand-new compressor
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:109)
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:1)
>     at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:40)
>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:200)
>
> I am running this from Eclipse (Snow Leopard with JDK 6), on an index that
> has a field with stored term vectors.
>
> My input parameters for Driver are:
>
> --dir /smallidx/ --output /luc2tvec.out --idField id_field
>  --field field_with_TV --dictOut /luc2tvec.dict --max 50  --weight tf
>
> Luke shows the following info on the fields I am using:
>
>  id_field is indexed, stored, omit norms
>  field_with_TV is indexed, tokenized, stored, term vector
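
The trace above is consistent with a per-document term vector lookup returning null for a document that lacks the field; Lucene's `IndexReader.getTermFreqVector(int, String)` is documented to return null in that case. Below is a minimal sketch of a null guard. It uses plain Java maps as hypothetical stand-ins for the index, and the class and method names are invented for illustration. It is not the actual `LuceneIterable` code or the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each document maps field name -> (term -> frequency).
final class MissingFieldSketch {

    // Mimics a per-document lookup that yields null when the field is absent.
    static Map<String, Integer> termFreqs(Map<String, Map<String, Integer>> doc, String field) {
        return doc.get(field);
    }

    // Emits one term-frequency map per document. Documents without the field
    // get an empty map instead of causing a NullPointerException downstream.
    static List<Map<String, Integer>> toVectors(
            List<Map<String, Map<String, Integer>>> docs, String field) {
        List<Map<String, Integer>> vectors = new ArrayList<>();
        for (Map<String, Map<String, Integer>> doc : docs) {
            Map<String, Integer> tf = termFreqs(doc, field);
            if (tf == null) {
                vectors.add(Collections.<String, Integer>emptyMap());
            } else {
                vectors.add(new HashMap<>(tf));
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new HashMap<>();
        tf.put("mahout", 2);
        Map<String, Map<String, Integer>> withField = new HashMap<>();
        withField.put("field_with_TV", tf);
        Map<String, Map<String, Integer>> withoutField = new HashMap<>();

        List<Map<String, Map<String, Integer>>> docs = new ArrayList<>();
        docs.add(withField);
        docs.add(withoutField);
        System.out.println(toVectors(docs, "field_with_TV"));
    }
}
```

Whether such documents should produce an empty vector or be skipped altogether is a design choice this thread does not settle.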

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAHOUT-191) NPE while creating term vectors with an index on a field that does not exist in all the documents

2009-10-29 Thread Shashikant Kore (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771408#action_12771408
 ] 

Shashikant Kore commented on MAHOUT-191:


I noticed a different problem with empty fields: TFDFMapper returns the vector
of the previous non-empty document. This happens because setExpectations() is
not called for empty documents.

I have attached a patch to address this. I couldn't find a better way; if you
know of one, please update it accordingly.
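
To illustrate the stale-state behaviour described here, below is a minimal sketch in which a mapper only initialises its vector inside `setExpectations()`. The `Mapper` class, its two-argument `setExpectations()`, the `reset()` method, and the `feed()` helper are all simplified, hypothetical stand-ins; this is not the real TFDFMapper and not the attached patch. Resetting before each document is shown only as one possible workaround.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StaleMapperSketch {

    // Hypothetical, simplified mapper: its state is only (re)created
    // when setExpectations() is invoked.
    static final class Mapper {
        private Map<String, Integer> vector;

        void setExpectations(String field, int numTerms) {
            vector = new HashMap<>();
        }

        void map(String term, int freq) {
            vector.put(term, freq);
        }

        Map<String, Integer> getVector() {
            return vector;
        }

        // Assumed workaround: clear state so an empty document cannot
        // inherit the previous document's vector.
        void reset() {
            vector = null;
        }
    }

    // Simulates a reader that never invokes the mapper for empty documents.
    static void feed(Mapper mapper, Map<String, Integer> doc) {
        if (doc == null || doc.isEmpty()) {
            return; // no callbacks fire, so the mapper keeps its old state
        }
        mapper.setExpectations("field", doc.size());
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            mapper.map(e.getKey(), e.getValue());
        }
    }

    // Collects one result per document, optionally resetting the mapper first.
    static List<Map<String, Integer>> collect(List<Map<String, Integer>> docs, boolean resetPerDoc) {
        Mapper mapper = new Mapper();
        List<Map<String, Integer>> out = new ArrayList<>();
        for (Map<String, Integer> doc : docs) {
            if (resetPerDoc) {
                mapper.reset();
            }
            feed(mapper, doc);
            out.add(mapper.getVector());
        }
        return out;
    }
}
```

With `resetPerDoc` set to false, an empty document that follows a non-empty one receives the earlier document's vector; with it set to true, the empty document yields null, which the caller then has to handle explicitly.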


