[ https://issues.apache.org/jira/browse/MAHOUT-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-320:
-----------------------------

    Attachment: IntPairWritable.patch

I see. The point is to make positive numbers negative and vice versa, so that WritableComparator's byte-level compare() can be used; that comparison treats the bytes as essentially unsigned. Storing values this way is bound to end in tears. In particular, it has already broken Frequency in the same class, which reads the values directly as unsigned ints.

Here's my complete patch covering all of the items mentioned.

> Modify IntPairWritable in LDA implementation to be binary comparable to improve performance.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-320
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-320
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Assignee: Robin Anil
>            Priority: Minor
>         Attachments: IntPairWritable.patch, MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch
>
>
> Per discussion with Robin, modifying o.a.m.clustering.lda.IntPairWritable to be binary comparable will improve the performance of comparison operations during a sort, because IntPairWritable instances will no longer need to be unmarshaled in order to be compared.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
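For illustration, here is a minimal, self-contained sketch (plain Java, no Hadoop dependency) of the sign-flipping trick discussed in the comment. It shows why the trick works and how it bites. The class and helper names (SignFlipDemo, plain, flipped, compareBytes, readInt, comparePairs, pair) are hypothetical and are not Mahout's IntPairWritable API. compareBytes is assumed to mirror the unsigned lexicographic ordering that Hadoop's WritableComparator.compareBytes applies, and comparePairs sketches the alternative of decoding plain big-endian ints straight from the serialized bytes instead of storing flipped values.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo class; not part of Mahout or Hadoop. */
public class SignFlipDemo {

  // Big-endian encoding of an int, the same byte layout DataOutput.writeInt produces.
  static byte[] plain(int v) {
    return ByteBuffer.allocate(4).putInt(v).array();
  }

  // XOR with Integer.MIN_VALUE flips the sign bit, mapping MIN_VALUE..MAX_VALUE
  // onto 0x00000000..0xFFFFFFFF so that unsigned byte order matches signed order.
  static byte[] flipped(int v) {
    return plain(v ^ Integer.MIN_VALUE);
  }

  // Unsigned lexicographic byte comparison (assumed equivalent in ordering
  // to Hadoop's WritableComparator.compareBytes).
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  // Decodes a big-endian int from raw bytes without deserializing any object.
  static int readInt(byte[] b, int off) {
    return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
        | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
  }

  // Hypothetical serialized pair: two plain (unflipped) big-endian ints.
  static byte[] pair(int first, int second) {
    return ByteBuffer.allocate(8).putInt(first).putInt(second).array();
  }

  // Alternative to flipping: keep stored values plain and compare decoded ints.
  static int comparePairs(byte[] a, byte[] b) {
    int c = Integer.compare(readInt(a, 0), readInt(b, 0));
    return c != 0 ? c : Integer.compare(readInt(a, 4), readInt(b, 4));
  }

  public static void main(String[] args) {
    // Plain encoding: -1 is FF FF FF FF, so it sorts after 1 bytewise (wrong order).
    System.out.println(compareBytes(plain(-1), plain(1)) > 0);
    // Flipped encoding: -1 becomes 7F FF FF FF and sorts before 1 (80 00 00 01).
    System.out.println(compareBytes(flipped(-1), flipped(1)) < 0);
    // Any reader that forgets to undo the flip gets a different number back.
    System.out.println(readInt(flipped(5), 0) != 5);
    // Decoding approach: plain storage, correct signed ordering.
    System.out.println(comparePairs(pair(-1, 5), pair(1, 0)) < 0);
  }
}
```

Each line of main prints true. The trade-off is the one raised in the comment: flipping makes the stored bytes directly sortable, but every other reader of the data has to know to undo the flip. Decoding the ints inside the comparison keeps the stored values ordinary, at the cost of a few shifts per comparison and still without any object deserialization.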