[ https://issues.apache.org/jira/browse/MAHOUT-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stuart Smith updated MAHOUT-952:
--------------------------------

    Description: 
Whatever is parsing the ARFF file for the ARFFVectorIterable (as far as I can 
tell, it's the class itself) doesn't handle '?' as a marker for an unknown value. 
See: http://www.cs.waikato.ac.nz/~ml/weka/arff.html  

I just started looking at Mahout classifiers this week, so I'm not sure how to 
handle this yet. If I figure it out, I'll post a patch, but until then, 
guidance would be helpful!
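
For illustration only, here is a minimal sketch of what treating '?' as an 
unknown value could look like for a dense, numeric-only ARFF data row. The 
class ArffMissingValueSketch and the method parseDenseRow are hypothetical 
names, not part of Mahout, and mapping '?' to Double.NaN is an assumption 
rather than what ARFFVectorIterable does or should do.

```java
import java.util.Arrays;

public class ArffMissingValueSketch {

  /**
   * Hypothetical helper: parses one dense ARFF data row whose attributes
   * are all numeric. Quoted strings, nominal and date attributes, and the
   * sparse "{index value, ...}" form are deliberately not handled.
   */
  public static double[] parseDenseRow(String row) {
    String[] tokens = row.split(",");
    double[] values = new double[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      String token = tokens[i].trim();
      // ARFF marks an unknown value with a bare '?'.
      // Assumption: represent it as NaN instead of failing in parseDouble.
      values[i] = "?".equals(token) ? Double.NaN : Double.parseDouble(token);
    }
    return values;
  }

  public static void main(String[] args) {
    // The second attribute of this row is unknown.
    double[] parsed = parseDenseRow("5.1, ?, 1.4, 0.2");
    System.out.println(Arrays.toString(parsed));
  }
}
```

Whether NaN, a skipped entry, or some other sentinel is the right choice 
depends on what the downstream classifier expects, so this only shows where 
the check would go.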

  was:
Whatever is parsing the ARFF file for the ARFFVectorIterable (as far as I can 
tell, it's the class itself) doesn't handle '?' as a marker for an unknown value. 
See: http://www.cs.waikato.ac.nz/~ml/weka/arff.html  

I just started looking at Mahout classifiers this week, so I'm not sure how to 
handle this yet. If I figure it out, I'll post a patch, but until then, 
guidance would be helpful!

Off topic, but I'm also having an issue where the labels populated in the map 
apparently aren't coming from the attribute header at the top of the file. I 
have very sparse vectors (1800+ attributes, only a few hundred set for any 
given vector), and I keep getting IndexOutOfBounds or mismatched cardinality 
errors, depending on whether I use full ARFF or sparse ARFF. Either way, when I 
dump the labels from getModel(), it doesn't have them all, even if I parse the 
ARFF myself and call setLabel() (which apparently just gets thrown away). It 
looks like the DenseVectors keep thinking the cardinality is 534, when it should 
be 1800+. When I know more, I'll create a new issue.

    
> ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other 
> ARFF issues
> -----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-952
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-952
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.6
>         Environment: Latest SVN on ubuntu
>            Reporter: Stuart Smith
>            Priority: Minor
>              Labels: ARFF
>
> Whatever is parsing the ARFF file for the ARFFVectorIterable (as far as I can 
> tell, it's the class itself) doesn't handle '?' as a marker for an unknown 
> value. See: http://www.cs.waikato.ac.nz/~ml/weka/arff.html  
> I just started looking at Mahout classifiers this week, so I'm not sure how 
> to handle this yet. If I figure it out, I'll post a patch, but until then, 
> guidance would be helpful!
