[ https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836207#action_12836207 ]
Drew Farris commented on MAHOUT-299:
------------------------------------

Thanks for the review, Sean. I'll get it committed sometime today if I can steal some time to do so (sometime this weekend, worst case).

Point taken about the static imports. I prefer the readability, but I probably rely on my IDE too much to track down references like that, so I'll remove them to conform with the overall style we're following. Same for RuntimeException; I will revise that as well.

Once those changes are complete, I'll commit and close the issue. All in all, it will be a great way to test my Karma.

> Collocations: improve performance by making Gram BinaryComparable
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-299
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-299
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Priority: Minor
>             Fix For: 0.3
>
>         Attachments: MAHOUT-299.patch
>
>
> Robin's profiling indicated that a large portion of a run was spent in
> readFields() in Gram, due to the deserialization occurring as part of Gram
> comparisons for sorting. He pointed me to BinaryComparable and the
> implementation in Text.
> Like Text, in this new implementation, Gram stores its string in binary form.
> When encoding the string at construction time, we allocate an extra
> character's worth of data to hold the Gram type information. When sorting
> Grams, the binary arrays are compared instead of deserializing and comparing
> fields.
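
For readers who want to see the idea in the description concretely, here is a minimal sketch in plain Java. It is not the code from MAHOUT-299.patch and does not use Hadoop's BinaryComparable. The class name GramSketch, the type codes 'h' and 't', and the choice to put the type in the first byte are all assumptions made for illustration. It only shows the general technique: encode the string once at construction time, reserve one extra byte for the type, and sort by comparing raw bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for Gram; NOT the implementation from MAHOUT-299.patch.
final class GramSketch implements Comparable<GramSketch> {
    // Assumed type codes, chosen only for this illustration.
    static final byte HEAD = 'h';
    static final byte TAIL = 't';

    // Assumed layout: bytes[0] is the type code, bytes[1..] is the UTF-8 string.
    private final byte[] bytes;

    GramSketch(String text, byte type) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        bytes = new byte[encoded.length + 1]; // one extra byte for the type
        bytes[0] = type;
        System.arraycopy(encoded, 0, bytes, 1, encoded.length);
    }

    byte type() {
        return bytes[0];
    }

    String text() {
        // Decoding happens only when the string is actually needed.
        return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
    }

    @Override
    public int compareTo(GramSketch other) {
        // Sorting compares the byte arrays directly; no fields are decoded.
        return compareBytes(bytes, other.bytes);
    }

    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    public static void main(String[] args) {
        GramSketch[] grams = {
            new GramSketch("of times", TAIL),
            new GramSketch("best of", HEAD),
            new GramSketch("best", HEAD),
        };
        Arrays.sort(grams);
        for (GramSketch g : grams) {
            System.out.println((char) g.type() + " " + g.text());
        }
    }
}
```

With this layout, grams group by type first and then by the byte order of their text, because the type byte leads the array. Putting the type byte at the end instead would sort by text first; the issue description does not say which position the patch uses.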