[ 
https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836162#action_12836162
 ] 

Sean Owen commented on MAHOUT-299:
----------------------------------

Broadly it looks fine to me, especially as proven by your unit tests. You're 
welcome to commit with your new karma as far as I'm concerned, or ping me to do 
so. Just want to get it in shortly as it's marked for 0.3

Minor style questions:

I don't like static imports but not sure if that is common practice or my own. 
(I always want to hunt for the symbol NGRAM in the class and of course it's not 
there... gets even weirder for methods). So up to you on that minor point. It 
would reduce the number of imports, actually, to not use it.

I'd not throw RuntimeException -- IllegalStateException is my favorite for a 
"this can't happen" situation. Error subclasses seem too harsh, as they mean 
the JVM is busted. While nobody's going to meaningfully handle such an 
exception, it seems to me good style to afford the possibility of 
differentiating between catching this and all other runtime exceptions.



> Collocations: improve performance by making Gram BinaryComparable
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-299
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-299
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Priority: Minor
>             Fix For: 0.3
>
>         Attachments: MAHOUT-299.patch
>
>
> Robin's profiling indicated that a large portion of a run was spent in 
> readFields() in Gram due to the deserialization occuring as a part of Gram 
> comparions for sorting. He pointed me to BinaryComparable and the 
> implementation in Text.
> Like Text, in this new implementation, Gram stores its string in binary form. 
> When encoding the string at construction time we allocate an extra 
> character's worth of data to hold the Gram type information. When sorting 
> Grams, the binary arrays are compared instead of deserializing and comparing 
> fields.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to