Make splitData smart enough to not consider a CSV header to be part of the data
-------------------------------------------------------------------------------
Key: MAHOUT-997
URL: https://issues.apache.org/jira/browse/MAHOUT-997
Project: Mahout
Issue Type: Improvement
Components: Integration
Affects Versions: 0.6
Environment: OS X
Reporter: Andrew Harbick
Priority: Minor
Fix For: 0.6
If you do something like:
    MAHOUT_LOCAL=1 $MAHOUT_HOME/bin/mahout splitDataset --input all.csv --output split --trainingPercentage 0.9 --probePercentage 0.1
the header row from your CSV has a 90% chance of ending up in your training
data and a 10% chance of ending up in your evaluation (probe) data. Tools like
trainlogistic and runlogistic need that header row in both files.
Perhaps add an argument to splitDataset that copies the header line into both
outputs?
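A minimal sketch of the proposed behaviour, assuming a per-row random split driven by the two percentages from the command above. The first line is treated as a header and written to both outputs, and every other line is sampled. The function name split_with_header and its has_header flag are hypothetical: this is illustrative Python, not Mahout's splitDataset implementation, and the choice to drop rows that fall outside both percentages is also an assumption.

```python
import random
from typing import List, Optional, Tuple


def split_with_header(
    lines: List[str],
    training_percentage: float = 0.9,
    probe_percentage: float = 0.1,
    has_header: bool = True,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical header-aware split into (training, probe) line lists.

    If has_header is True, the first line is copied to the top of BOTH
    outputs instead of being sampled like a data row.
    """
    if training_percentage + probe_percentage > 1.0:
        raise ValueError("training and probe percentages must sum to <= 1.0")

    rng = random.Random(seed)
    header: List[str] = []
    rows = lines
    if has_header and lines:
        header, rows = [lines[0]], lines[1:]

    # Both outputs start with the header, so downstream tools see it in each file.
    training = list(header)
    probe = list(header)
    for row in rows:
        r = rng.random()  # uniform in [0.0, 1.0)
        if r < training_percentage:
            training.append(row)
        elif r < training_percentage + probe_percentage:
            probe.append(row)
        # Otherwise (assumed behaviour) the row goes to neither output.
    return training, probe
```

For example, split_with_header(lines, 0.9, 0.1, seed=42) returns two lists that both start with the CSV header, with each data row in exactly one of them, since 0.9 + 0.1 covers the whole [0, 1) range.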