Collocations: improve performance by making Gram BinaryComparable
-----------------------------------------------------------------

                 Key: MAHOUT-299
                 URL: https://issues.apache.org/jira/browse/MAHOUT-299
             Project: Mahout
          Issue Type: Improvement
          Components: Utils
    Affects Versions: 0.3
            Reporter: Drew Farris
            Priority: Minor
             Fix For: 0.3


Robin's profiling indicated that a large portion of a run was spent in 
readFields() in Gram due to the deserialization occurring as part of Gram 
comparisons for sorting. He pointed me to BinaryComparable and the 
implementation in Text.

Like Text, in this new implementation, Gram stores its string in binary form. 
When encoding the string at construction time we allocate an extra character's 
worth of data to hold the Gram type information. When sorting Grams, the binary 
arrays are compared instead of deserializing and comparing fields.
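
The idea can be illustrated with a minimal, self-contained sketch. It is not
the code from the patch: the class name BinaryGramSketch, the Type codes, and
the choice to put the type in the first byte are all assumptions made for
illustration, and it uses only the JDK rather than Hadoop's BinaryComparable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for Gram; not the Mahout class. */
final class BinaryGramSketch implements Comparable<BinaryGramSketch> {

    /** Hypothetical type codes; the real Gram types may differ. */
    enum Type {
        HEAD((byte) 1), TAIL((byte) 2), UNIGRAM((byte) 3);

        final byte code;

        Type(byte code) {
            this.code = code;
        }
    }

    /** Assumed layout: one type byte followed by the UTF-8 encoded string. */
    private final byte[] bytes;

    BinaryGramSketch(String text, Type type) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        // Allocate one extra slot to hold the type information.
        bytes = new byte[encoded.length + 1];
        bytes[0] = type.code;
        System.arraycopy(encoded, 0, bytes, 1, encoded.length);
    }

    /** Decodes the string only when a caller actually asks for it. */
    String text() {
        return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
    }

    Type type() {
        for (Type t : Type.values()) {
            if (t.code == bytes[0]) {
                return t;
            }
        }
        throw new IllegalStateException("unknown type code " + bytes[0]);
    }

    /** Sorting compares the raw arrays; no fields are decoded. */
    @Override
    public int compareTo(BinaryGramSketch other) {
        return compareBytes(bytes, other.bytes);
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BinaryGramSketch
                && Arrays.equals(bytes, ((BinaryGramSketch) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

With this layout, Grams of the same type sort next to each other and then by
the byte order of their strings. A Hadoop version would presumably expose the
same array through BinaryComparable's getBytes() and getLength() so that the
framework can compare serialized keys directly.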

 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
