[ https://issues.apache.org/jira/browse/MAHOUT-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144856#comment-13144856 ]
Joe Prasanna Kumar commented on MAHOUT-155:
-------------------------------------------

After adding more test data for date formats, I ran into some interesting issues.

1. When the name of an attribute starts with the name of a data type, for example "dateOfFirstPurchase", the iterator treated the attribute as a date type and tried to create a date out of "OfFirstPurchase". I've modified ARFFVectorIterable and ARFFType to fix this.

2. If a date or string value contained a comma, the text after the comma was treated as a separate value. For example, "0:08 PM, PDT" was split into two strings, "0:08 PM" and "PDT". In ARFFIterator, I've changed COMMA_PATTERN to
",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
This splits on a comma only if it is followed by zero or an even number of double quotes, that is, only if the comma lies outside a quoted value. Credit for this regex goes to an answer on Stack Overflow.

I have extended the test case to cover a few more date formats, and they all seem to work now. The updated patch is attached to this issue. After formatting the code with the template available at https://cwiki.apache.org/MAHOUT/how-to-contribute.data/Mahout-Eclipse-Codeformatter.xml , the diff is quite large. Please test with this patch, and if it all looks good, maybe we can close this issue.

Joe.

> ARFF VectorIterable
> -------------------
>
>                 Key: MAHOUT-155
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-155
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>              Labels: MAHOUT_INTRO_CONTRIBUTE
>         Attachments: MAHOUT-155-DateTestAndFix.patch, MAHOUT-155.patch
>
>
> Convert ARFF to Vector. See http://www.cs.waikato.ac.nz/~ml/weka/arff.html
> Create a VectorIterable implementation for ARFF.
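
The comma-splitting change described in the comment can be illustrated with a small standalone sketch. It uses the regex quoted in the comment; the class name CommaSplitSketch, the helper splitRow, and the sample row are hypothetical and are not taken from the patch or from ARFFIterator itself.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not the actual Mahout ARFFIterator code.
class CommaSplitSketch {

    // Regex as quoted in the comment: match a comma only when the remainder
    // of the line contains zero or an even number of double quotes, which
    // means the comma sits outside any quoted value.
    static final Pattern COMMA_PATTERN =
        Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Split one data row into its raw tokens. Quoted tokens keep their
    // surrounding quotes; stripping them is left to the caller.
    static String[] splitRow(String line) {
        return COMMA_PATTERN.split(line);
    }

    public static void main(String[] args) {
        // Assumed sample row: a number, a quoted time containing a comma,
        // and a nominal value.
        String row = "42,\"0:08 PM, PDT\",yes";
        System.out.println(Arrays.toString(splitRow(row)));
        // prints [42, "0:08 PM, PDT", yes]
    }
}
```

Note that the lookahead rescans the rest of the line for every comma, so the cost grows quadratically with line length, and a value with an unbalanced quote would change how every comma before it is treated.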