[jira] [Updated] (MAHOUT-953) ArffVectorIterable does not gracefully handle duplicate attribute name

2013-06-25 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-953:
---------------------------------

Status: Open  (was: Patch Available)

 ArffVectorIterable does not gracefully handle duplicate attribute name
 ----------------------------------------------------------------------

              Key: MAHOUT-953
              URL: https://issues.apache.org/jira/browse/MAHOUT-953
          Project: Mahout
       Issue Type: Improvement
       Components: Integration
 Affects Versions: 0.6
         Reporter: Stuart Smith
         Priority: Trivial
          Fix For: Backlog


 If your ARFF file contains duplicate attribute names and uses non-sparse
 ARFF vectors, ARFFVectorIterable.computeNext() throws an
 ArrayIndexOutOfBoundsException. It allocates a DenseVector sized to the
 number of attribute labels (duplicates removed), but the ARFF vectors can
 hold more values than that (if they reference the attribute at both
 indexes). This is a somewhat pathological ARFF file.
 I'm not sure whether to report the error (throw an exception) in
 computeNext() when the index is out of bounds, or when someone tries to add
 a duplicate label to the MapBackedArffModel.
 My first impulse would be to check in computeNext(), but addLabel() in
 MapBackedArffModel does something rather pathological with duplicate
 attributes: it overwrites the entry in the label map with the new index,
 while the idxLabel map ends up mapping both indexes to the same attribute
 name, so the two maps are out of sync. It may therefore be best to disallow
 duplicate attribute names altogether by throwing an
 IllegalArgumentException.
 For example:
   @attribute my_attribute NUMERIC
   @attribute my_attribute NUMERIC
 leads to:
   addLabel()
   addLabel()
 which leaves the maps as:
   labelBindings - ('my_attribute', 1)
   idxLabel - (0, 'my_attribute'), (1, 'my_attribute')
 I'll happily submit a patch, just wondering if it should be in computeNext() 
 or addLabel()
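
The out-of-sync state described above, and the addLabel() option, can be
sketched roughly as follows. This is only an illustration: the class
LabelMapsSketch and the methods addLabelUnchecked() and addLabelChecked() are
hypothetical names, the two maps are plain HashMaps standing in for
labelBindings and idxLabel, and none of this is Mahout's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two maps named in the report; not Mahout's class.
class LabelMapsSketch {
  // attribute name -> column index ("labelBindings" in the report)
  public final Map<String, Integer> labelBindings = new HashMap<>();
  // column index -> attribute name ("idxLabel" in the report)
  public final Map<Integer, String> idxLabel = new HashMap<>();

  // Behaviour as described in the report: a duplicate name silently
  // overwrites its labelBindings entry, while idxLabel keeps both indexes.
  public void addLabelUnchecked(String label, int idx) {
    labelBindings.put(label, idx);
    idxLabel.put(idx, label);
  }

  // One possible fix (an assumption, not the committed patch): reject the
  // duplicate before either map is modified, so they cannot diverge.
  public void addLabelChecked(String label, int idx) {
    Integer existing = labelBindings.get(label);
    if (existing != null) {
      throw new IllegalArgumentException("Duplicate attribute name '" + label
          + "' at index " + idx + " (already bound to index " + existing + ")");
    }
    labelBindings.put(label, idx);
    idxLabel.put(idx, label);
  }

  public static void main(String[] args) {
    LabelMapsSketch model = new LabelMapsSketch();
    model.addLabelUnchecked("my_attribute", 0);
    model.addLabelUnchecked("my_attribute", 1);
    // The two maps now disagree about how many attributes exist.
    System.out.println("labelBindings: " + model.labelBindings);
    System.out.println("idxLabel: " + model.idxLabel);
  }
}
```

With the unchecked variant, a vector sized from labelBindings has room for one
value even though two attributes were declared, which matches the
out-of-bounds failure in the report. The checked variant fails earlier, while
the header is being parsed, and can name the offending attribute.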

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-953) ArffVectorIterable does not gracefully handle duplicate attribute name

2013-06-02 Thread Robin Anil (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil updated MAHOUT-953:
------------------------------

Fix Version/s: Backlog  (was: 0.8)

Bring it back to the 0.8 queue if anyone is willing to do the work within the
next week.



[jira] [Updated] (MAHOUT-953) ArffVectorIterable does not gracefully handle duplicate attribute name

2013-06-01 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-953:
-----------------------------------

Fix Version/s: 0.8

 ArffVectorIterable does not gracefully handle duplicate attribute name
 ----------------------------------------------------------------------

              Key: MAHOUT-953
              URL: https://issues.apache.org/jira/browse/MAHOUT-953
          Project: Mahout
       Issue Type: Improvement
       Components: Integration
 Affects Versions: 0.6
         Reporter: Stuart Smith
         Priority: Trivial
          Fix For: 0.8

