[ 
https://issues.apache.org/jira/browse/MAHOUT-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144856#comment-13144856
 ] 

Joe Prasanna Kumar commented on MAHOUT-155:
-------------------------------------------

After adding some more test data covering date formats, I ran into a couple 
of interesting issues. 

1. When the name of an attribute starts with one of the data type keywords, 
for example "dateOfFirstPurchase", the iterator treated the attribute as a 
date type and tried to create a date out of "OfFirstPurchase". I've modified 
ARFFVectorIterable and ARFFType to fix this.
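
To illustrate the problem in item 1, here is a minimal sketch. It is not the 
code from the patch: the class AttributeTypeSketch, its Type enum, and the 
methods naiveIsDate and typeOf are hypothetical names, and it assumes 
unquoted attribute names without spaces. The idea is to read the type from 
the token that follows the attribute name instead of searching the whole 
declaration for a type keyword.

```java
import java.util.Locale;

// Hypothetical sketch, not the actual ARFFType / ARFFVectorIterable code.
class AttributeTypeSketch {

  enum Type { NUMERIC, STRING, DATE, NOMINAL }

  // Naive check: finds the keyword anywhere in the declaration, so an
  // attribute *named* "dateOfFirstPurchase" is mistaken for a date.
  static boolean naiveIsDate(String declaration) {
    return declaration.toLowerCase(Locale.ROOT).contains("date");
  }

  // Token-based check: "@attribute <name> <type> [<format>]".
  // Assumes the name is unquoted and contains no whitespace.
  static Type typeOf(String declaration) {
    String[] tokens = declaration.trim().split("\\s+", 3);
    if (tokens.length < 3 || !tokens[0].equalsIgnoreCase("@attribute")) {
      throw new IllegalArgumentException("Not an attribute line: " + declaration);
    }
    String typeSpec = tokens[2].trim();
    if (typeSpec.startsWith("{")) {
      return Type.NOMINAL; // e.g. {yes,no}
    }
    // Only the first word of the type spec is the type keyword.
    String keyword = typeSpec.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
    switch (keyword) {
      case "numeric":
      case "real":
      case "integer":
        return Type.NUMERIC;
      case "string":
        return Type.STRING;
      case "date":
        return Type.DATE;
      default:
        throw new IllegalArgumentException("Unknown type: " + keyword);
    }
  }

  public static void main(String[] args) {
    String decl = "@attribute dateOfFirstPurchase numeric";
    System.out.println(naiveIsDate(decl)); // true (the bug)
    System.out.println(typeOf(decl));      // NUMERIC
  }
}
```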

2. If a date or string value contained a comma, each part was treated as a 
value of its own. For example, "0:08 PM, PDT" was split into two strings, 
"0:08 PM" and "PDT". In ARFFIterator, I've changed COMMA_PATTERN to 
",(?=([^\"]*\"[^\"]*\")*[^\"]*$)". This splits on a comma only if that comma 
has zero or an even number of quotes ahead of it. Credit for this regex 
pattern goes to an answer on Stack Overflow.
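
A small standalone sketch of how the pattern from item 2 behaves. The class 
name QuotedCommaSplit, the method splitLine, and the sample line are made up 
for illustration; only the regex itself is taken from the comment above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the patch.
class QuotedCommaSplit {

  // Matches a comma only when the rest of the line contains zero or an
  // even number of double quotes, i.e. the comma is outside a quoted value.
  private static final Pattern COMMA_PATTERN =
      Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

  static String[] splitLine(String line) {
    return COMMA_PATTERN.split(line);
  }

  public static void main(String[] args) {
    String line = "1,\"0:08 PM, PDT\",foo";
    // prints: [1, "0:08 PM, PDT", foo]
    System.out.println(Arrays.toString(splitLine(line)));
  }
}
```

Note that the split keeps the surrounding quotes on the value, and that the 
lookahead assumes quotes are balanced and not escaped within a line.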

I have extended the test case with a few more date formats, and they all seem 
to work now.
The updated patch is attached to this issue. After formatting the code with 
the template available at 
https://cwiki.apache.org/MAHOUT/how-to-contribute.data/Mahout-Eclipse-Codeformatter.xml
 , the diff is quite large. 

Please test with this patch, and if it all looks good, maybe we can close 
this issue. 

Joe.







                
> ARFF VectorIterable
> -------------------
>
>                 Key: MAHOUT-155
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-155
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>              Labels: MAHOUT_INTRO_CONTRIBUTE
>         Attachments: MAHOUT-155-DateTestAndFix.patch, MAHOUT-155.patch
>
>
> Convert ARFF to Vector.  See http://www.cs.waikato.ac.nz/~ml/weka/arff.html
> Create a VectorIterable implementation for ARFF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
