[ https://issues.apache.org/jira/browse/MAHOUT-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stuart Smith updated MAHOUT-952:
--------------------------------

Description:
Whatever is parsing the ARFF file for the ARFFVectorIterable (as far as I can tell, it's the class itself) doesn't handle '?' as a marker for an unknown value. See: http://www.cs.waikato.ac.nz/~ml/weka/arff.html

I just started looking at Mahout classifiers this week, so I'm not sure how to handle this yet. If I figure it out, I'll post a patch, but until then, guidance would be helpful!

was:
Whatever is parsing the ARFF file for the ARFFVectorIterable (as far as I can tell, it's the class itself) doesn't handle '?' as a marker for an unknown value. See: http://www.cs.waikato.ac.nz/~ml/weka/arff.html

I just started looking at Mahout classifiers this week, so I'm not sure how to handle this yet. If I figure it out, I'll post a patch, but until then, guidance would be helpful!

Off topic, but I'm also having an issue where the labels populated in the map apparently aren't coming from the attribute header at the top of the file. I have very sparse vectors (1800+ attributes, only a few hundred set for any given vector), and I keep getting IndexOutOfBounds or mismatched-cardinality errors, depending on whether I use full ARFF or sparse ARFF. Either way, when I dump the labels from getModel(), they aren't all there, even if I parse the ARFF myself and call setLabel() (which apparently just throws them away). It looks like the DenseVectors keep thinking the cardinality is 534 when it should be 1800+.
When I know more, I'll create a new issue.

> ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues
> -----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-952
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-952
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.6
>         Environment: Latest SVN on ubuntu
>            Reporter: Stuart Smith
>            Priority: Minor
>              Labels: ARFF
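For anyone hitting the same problems, here is a minimal Java sketch of how an ARFF reader could treat '?' as an unknown value and size every vector from the number of @attribute declarations in the header rather than from the data rows. It is not the ARFFVectorIterable/MapBackedArffModel code; the class ArffSketch and its methods are hypothetical. It assumes numeric attributes only, maps '?' to Double.NaN (one possible convention, not something the ARFF spec or Mahout prescribes), and ignores quoted values, nominal/string/date attributes, and instance weights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, minimal ARFF reader (not Mahout's API). Numeric attributes only;
 * quoted values, nominal/string/date attributes and instance weights are not handled.
 */
class ArffSketch {

  /** Parses ARFF text into rows; '?' (unknown value) becomes Double.NaN. */
  static List<double[]> parse(String arff) {
    int cardinality = 0;            // number of @attribute declarations in the header
    boolean inData = false;
    List<double[]> rows = new ArrayList<>();
    for (String raw : arff.split("\\R")) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("%")) {
        continue;                   // blank line or ARFF comment
      }
      if (!inData) {
        String lower = line.toLowerCase(Locale.ROOT);
        if (lower.startsWith("@attribute")) {
          cardinality++;
        } else if (lower.startsWith("@data")) {
          inData = true;
        }
        continue;
      }
      rows.add(line.startsWith("{") ? parseSparse(line, cardinality) : parseDense(line, cardinality));
    }
    return rows;
  }

  /** Dense row such as "1.5,?,3": exactly one value per declared attribute. */
  static double[] parseDense(String line, int cardinality) {
    String[] parts = line.split(",");
    if (parts.length != cardinality) {
      throw new IllegalArgumentException("expected " + cardinality + " values, got " + parts.length);
    }
    double[] v = new double[cardinality];
    for (int i = 0; i < cardinality; i++) {
      v[i] = parseValue(parts[i]);
    }
    return v;
  }

  /** Sparse row such as "{0 1.5, 7 ?}": omitted indices are 0, not unknown. */
  static double[] parseSparse(String line, int cardinality) {
    double[] v = new double[cardinality];   // sized from the header, not from the row
    String body = line.substring(1, line.lastIndexOf('}')).trim();
    if (body.isEmpty()) {
      return v;
    }
    for (String entry : body.split(",")) {
      String[] kv = entry.trim().split("\\s+");
      int index = Integer.parseInt(kv[0]);
      if (index < 0 || index >= cardinality) {
        throw new IndexOutOfBoundsException("attribute index " + index + " outside 0.." + (cardinality - 1));
      }
      v[index] = parseValue(kv[1]);
    }
    return v;
  }

  /** '?' marks an unknown value; anything else is parsed as a number. */
  static double parseValue(String token) {
    String t = token.trim();
    return "?".equals(t) ? Double.NaN : Double.parseDouble(t);
  }

  public static void main(String[] args) {
    String arff = String.join("\n",
        "@relation demo",
        "@attribute a numeric",
        "@attribute b numeric",
        "@attribute c numeric",
        "@data",
        "1.5,?,3",
        "{2 ?, 0 4}");
    for (double[] row : parse(arff)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Running main prints [1.5, NaN, 3.0] and [4.0, 0.0, NaN]. Keeping "omitted in a sparse row" (0) distinct from "explicitly '?'" (NaN) matters for sparse data like the 1800+-attribute case above, and taking the cardinality from the header avoids vectors being sized by whatever the first data rows happen to contain.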