[ https://issues.apache.org/jira/browse/MAHOUT-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149391#comment-13149391 ]
Ted Dunning commented on MAHOUT-881:
------------------------------------

{quote}
avoid what allocating the arrays
{quote}

There is a somewhat pathological interest in the Lucene community in avoiding allocation. The standard approach is to re-use data structures. In fact, this often has a perverse effect on the GC that makes programs slower overall, and it definitely makes the code far more error prone.

The problem with performance is rarely allocation; far more commonly it is the cost of copying. If the idiom you use to avoid allocation still involves as much copying, then you are unlikely to save anything at all by avoiding the allocation. It may even cost you quite a bit, since you are making an array live longer than its natural life, which in the worst situations can trigger a full GC if the array survives too long.

For most uses of arrays, such as score accumulators, the copying is inherent in the algorithm being used and is not something to be avoided, because having the array collected as a short-lived object is usually the most efficient way to go.

Mutation and re-use also introduce complexities of storage management that are roughly equivalent to the cognitive load of malloc/free. Particularly when they are not associated with any measurable level of optimization, they should be avoided like the plague.

One idiom that used to cause performance issues was gratuitous boxing and unboxing of data in order to package it for passing between different parts of the code. This is much less of a problem than it used to be, because many of these uses are inlined and the structure creation is optimized away. You still have to watch for it with collections because of the memory pressure that it creates.
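To make the re-use hazard concrete, here is a minimal Java sketch of a score accumulator written both ways. The class and method names (AccumulatorSketch, scoreFresh, scoreReused) and the fixed buffer size are hypothetical and are not taken from Mahout or from the attached patch. It only illustrates the correctness side of the argument; it makes no claim about measured timings.

```java
import java.util.Arrays;

public class AccumulatorSketch {

    // Fresh array per call: the result is short-lived and owned by the caller.
    static double[] scoreFresh(int[] itemIds, double[] weights, int numItems) {
        double[] scores = new double[numItems];
        for (int i = 0; i < itemIds.length; i++) {
            scores[itemIds[i]] += weights[i];
        }
        return scores;
    }

    // Hypothetical shared buffer, kept alive for the life of the class.
    private static final double[] SHARED = new double[4];

    // Re-used buffer: no allocation, but the buffer still has to be cleared
    // (the same O(n) pass that zeroing a new array costs), and every caller
    // receives a reference to the same array.
    static double[] scoreReused(int[] itemIds, double[] weights) {
        Arrays.fill(SHARED, 0.0);
        for (int i = 0; i < itemIds.length; i++) {
            SHARED[itemIds[i]] += weights[i];
        }
        return SHARED;
    }

    public static void main(String[] args) {
        double[] a = scoreFresh(new int[] {0, 1}, new double[] {1.0, 2.0}, 4);
        double[] b = scoreFresh(new int[] {2}, new double[] {5.0}, 4);
        System.out.println(Arrays.toString(a)); // [1.0, 2.0, 0.0, 0.0]
        System.out.println(Arrays.toString(b)); // [0.0, 0.0, 5.0, 0.0]

        double[] c = scoreReused(new int[] {0, 1}, new double[] {1.0, 2.0});
        double[] d = scoreReused(new int[] {2}, new double[] {5.0});
        // c and d are the same array, so the first result has been overwritten.
        System.out.println(Arrays.toString(c)); // [0.0, 0.0, 5.0, 0.0]
        System.out.println(c == d);             // true
    }
}
```

The re-used version saves the allocation but not the clearing pass, and the caller now has to know that a returned result is only valid until the next call. That is the malloc/free style of bookkeeping described above.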
> Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting
> ----------------------------------------------------------------------------
>
> Key: MAHOUT-881
> URL: https://issues.apache.org/jira/browse/MAHOUT-881
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.6
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: Call_Tree.html, Call_Tree_2.html, MAHOUT-881.patch, MAHOUT-881.patch, MAHOUT-881.patch
>
>
> TopItems.getTop*() all do a fair number of excessive operations that can be
> replaced by switching to using Lucene's PriorityQueue implementation, which
> is more efficient and faster than Java's built in PQ implementation.