[ 
https://issues.apache.org/jira/browse/MAHOUT-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125194#comment-13125194
 ] 

Joe Prasanna Kumar commented on MAHOUT-155:
-------------------------------------------

+*Problem:*+
Nominal attributes in ARFF files are not completely converted to vector 
format. When a nominal value is mapped to 0, it does not show up in the 
vector. For example, consider the bank.ARFF file below, taken from the WEKA 
site:
{color:blue} 
@relation bank

@attribute age numeric
@attribute sex {MALE,FEMALE}
@attribute region {INNER_CITY,RURAL,TOWN,SUBURBAN}
@attribute income numeric
@attribute married {YES,NO}
@attribute children {YES,NO}
@attribute car {YES,NO}
@attribute mortgage {YES,NO}
@attribute pep {YES,NO}

@data

40,MALE,TOWN,30085.1,YES,YES,YES,YES,NO 
{color}
The nominal mappings for the above ARFF file are 
{sex={FEMALE=1, MALE=0}, region={INNER_CITY=0, TOWN=2, RURAL=1, SUBURBAN=3} , 
children={YES=0, NO=1},  married={YES=0, NO=1},car={YES=0, NO=1}, 
mortgage={YES=0, NO=1}, pep={YES=0, NO=1}} 
When this ARFF file is converted to vector format, the output is
{color:red}{0:40.0,2:2.0,3:30085.1,8:1.0}{color}
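Because the attribute married maps YES to 0 and NO to 1, this attribute 
(attribute # 5, i.e. index 4) doesn't show up in the vector. The same happens 
to sex, children, car and mortgage (indices 1, 5, 6 and 7), which are also 
mapped to 0 in this row.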
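
To make the symptom easy to reproduce outside Mahout, here is a minimal sketch in plain Java. The class NonZeroFormat and its format method are hypothetical names of mine, not Mahout code; the sketch only assumes that the vector is rendered by listing its non-zero entries, which is what the output above suggests.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration only; this is not part of Mahout.
final class NonZeroFormat {

    // Renders index:value pairs, skipping every entry whose value is 0.
    static String format(double[] values) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) {
                out.add(i + ":" + values[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 40,MALE,TOWN,30085.1,YES,YES,YES,YES,NO encoded with the
        // 0-based nominal mappings listed above.
        double[] row = {40.0, 0.0, 2.0, 30085.1, 0.0, 0.0, 0.0, 0.0, 1.0};
        System.out.println(format(row)); // {0:40.0,2:2.0,3:30085.1,8:1.0}
    }
}
```

In this rendering a nominal value that was mapped to 0 cannot be told apart from an attribute that was never set.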

+*Issue:*+
When I try to convert a nominal attribute in ARFF format to vector format, 
ARFFIterator by default creates a dense vector. Since the nominal attribute 
(here it is married) has the value 0, the dense vector ignores this attribute.

+*Solution:*+
1. In ARFFVectorIterable, when we add the nominal attributes to the ARFFModel, 
we can start the class values from 1 instead of 0, so that no nominal value is 
ever encoded as 0. This will fix the issue. 
For the above bank.ARFF, the nominal mappings would then be 
{sex={FEMALE=2, MALE=1}, region={INNER_CITY=1, TOWN=3, RURAL=2, SUBURBAN=4}, 
children={YES=1, NO=2}, married={YES=1, NO=2},car={YES=1, NO=2}, 
mortgage={YES=1, NO=2}, pep={YES=1, NO=2} } 
and the vector output would be {color:green}
{0:40.0,1:1.0,2:3.0,3:30085.1,4:1.0,5:1.0,6:1.0,7:1.0,8:2.0}
{color}
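
As a rough illustration of the idea (not the actual patch), the sketch below builds the label mappings with a configurable start index and encodes the sample row. NominalOffsetSketch, mapLabels and encodeRow are hypothetical names, and the code does not touch ARFFModel or ARFFVectorIterable; it only assumes that labels are numbered in declaration order and that zero entries are left out of the output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch for illustration only; this is not Mahout code.
final class NominalOffsetSketch {

    // Numbers the labels startIndex, startIndex + 1, ... in declaration order.
    static Map<String, Integer> mapLabels(int startIndex, String... labels) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (String label : labels) {
            mapping.putIfAbsent(label, startIndex + mapping.size());
        }
        return mapping;
    }

    // Encodes 40,MALE,TOWN,30085.1,YES,YES,YES,YES,NO and lists non-zero entries.
    static String encodeRow(int startIndex) {
        Map<String, Integer> sex = mapLabels(startIndex, "MALE", "FEMALE");
        Map<String, Integer> region =
            mapLabels(startIndex, "INNER_CITY", "RURAL", "TOWN", "SUBURBAN");
        Map<String, Integer> yesNo = mapLabels(startIndex, "YES", "NO");

        double[] row = {
            40.0, sex.get("MALE"), region.get("TOWN"), 30085.1,
            yesNo.get("YES"), yesNo.get("YES"), yesNo.get("YES"),
            yesNo.get("YES"), yesNo.get("NO")
        };

        StringJoiner out = new StringJoiner(",", "{", "}");
        for (int i = 0; i < row.length; i++) {
            if (row[i] != 0.0) {
                out.add(i + ":" + row[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeRow(0)); // current behaviour, attributes missing
        System.out.println(encodeRow(1)); // proposed behaviour, all nine present
    }
}
```

With a start index of 1 every nominal attribute gets a non-zero value, so all nine attributes appear. Any code that maps a vector value back to its label would have to use the same 1-based mapping.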
If this issue and solution look right, I can upload a patch with the fix.
Please let me know your thoughts.
Joe.
                
> ARFF VectorIterable
> -------------------
>
>                 Key: MAHOUT-155
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-155
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> Convert ARFF to Vector.  See http://www.cs.waikato.ac.nz/~ml/weka/arff.html
> Create a VectorIterable implementation for ARFF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
