[ https://issues.apache.org/jira/browse/MAHOUT-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149391#comment-13149391 ]

Ted Dunning commented on MAHOUT-881:
------------------------------------

{quote}
avoid what allocating the arrays
{quote}

There is a somewhat pathological interest in the Lucene community in avoiding 
allocation.  The standard approach to doing this is to re-use data structures.

In fact, this often has a perverse effect on the GC that makes programs slower 
overall, and it definitely makes the code far more error-prone.
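A minimal Java sketch of why re-use is error-prone. The class and method names (ReusedBufferHazard, scoresReused, scoresFresh) are hypothetical and not taken from Mahout or Lucene. A method that hands out its internal buffer lets a later call silently overwrite an array the caller is still holding; a method that allocates has no such hazard.

```java
import java.util.Arrays;

// Hypothetical illustration: a scorer that "avoids allocation" by returning
// the same internal buffer on every call, next to one that allocates.
class ReusedBufferHazard {
    private final double[] buffer = new double[3]; // shared across calls

    // Re-use idiom: fills and returns the shared buffer.
    double[] scoresReused(double base) {
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = base + i;
        }
        return buffer;
    }

    // Allocation idiom: each call returns a fresh, independently owned array.
    double[] scoresFresh(double base) {
        double[] out = new double[3];
        for (int i = 0; i < out.length; i++) {
            out[i] = base + i;
        }
        return out;
    }

    public static void main(String[] args) {
        ReusedBufferHazard s = new ReusedBufferHazard();
        double[] first = s.scoresReused(1.0);
        double[] second = s.scoresReused(10.0);
        // Both references point at one array, so "first" now holds 10, 11, 12.
        System.out.println(Arrays.toString(first) + " " + (first == second));

        double[] a = s.scoresFresh(1.0);
        double[] b = s.scoresFresh(10.0);
        // Fresh arrays are independent, so "a" still holds 1, 2, 3.
        System.out.println(Arrays.toString(a) + " " + (a == b));
    }
}
```

Running main prints [10.0, 11.0, 12.0] true and then [1.0, 2.0, 3.0] false.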

The performance problem is rarely allocation; far more commonly it is the cost 
of copying.  If the idiom you are using to avoid allocation still involves as 
much copying, you are unlikely to save anything by avoiding the allocation, and 
it may cost you quite a bit: you are making an array live longer than its 
natural life, so it can survive enough young-generation collections to be 
promoted to the old generation and, in the worst case, even trigger a full GC.
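To make the copying point concrete, here is a hypothetical sketch (CopyCost, copyFresh and copyReused are made-up names). Both idioms copy all n elements, so the re-use version saves only the allocation itself, while its buffer stays reachable for as long as the owning object does.

```java
import java.util.Arrays;

// Hypothetical sketch: two ways to hand a caller a copy of n values.
// Both copy all n elements; the re-use version only skips the allocation,
// and its buffer stays reachable as long as the owning object does.
class CopyCost {
    private double[] retained = new double[0]; // long-lived buffer for the re-use idiom

    // Allocation idiom: a new array per call, n elements copied.
    static double[] copyFresh(double[] src) {
        return Arrays.copyOf(src, src.length);
    }

    // Re-use idiom: reallocate only when the size changes, but still copy n elements.
    double[] copyReused(double[] src) {
        if (retained.length != src.length) {
            retained = new double[src.length];
        }
        System.arraycopy(src, 0, retained, 0, src.length);
        return retained;
    }

    public static void main(String[] args) {
        double[] src = {3.0, 1.0, 2.0};
        CopyCost c = new CopyCost();
        System.out.println(Arrays.toString(copyFresh(src)));    // same contents...
        System.out.println(Arrays.toString(c.copyReused(src))); // ...from the same O(n) copy
    }
}
```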

For most uses of arrays, such as score accumulators, the copying is inherent in 
the algorithm, so it cannot be avoided, and the allocation is not worth avoiding 
either: letting the array be collected as a short-lived object is usually the 
most efficient way to go.
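A sketch of the short-lived accumulator pattern, assuming a made-up input format of parallel item/weight arrays (ScoreAccumulator and accumulate are illustrative names, not Mahout APIs). The array is allocated per query, filled, read once, and dropped, which is the kind of young-dying object a generational collector handles cheaply.

```java
import java.util.Arrays;

// Hypothetical sketch: items[i] receives weights[i]; numItems bounds the item ids.
class ScoreAccumulator {
    static double[] accumulate(int numItems, int[] items, double[] weights) {
        double[] scores = new double[numItems]; // fresh, zero-initialized, short-lived
        for (int i = 0; i < items.length; i++) {
            scores[items[i]] += weights[i];
        }
        return scores; // caller reads it once and drops it; no pool, no reset logic
    }

    public static void main(String[] args) {
        int[] items = {0, 2, 0, 1};
        double[] weights = {1.5, 0.5, 1.0, 2.0};
        System.out.println(Arrays.toString(accumulate(3, items, weights)));
    }
}
```

Here main prints [2.5, 2.0, 0.5]. Because every call gets its own zeroed array, there is no clearing step to forget and no risk of one query's scores leaking into the next.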

Mutation and re-use also introduce storage-management complexity roughly 
equivalent to the cognitive load of malloc/free, which, particularly when it 
buys no measurable optimization, should be avoided like the plague.
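The malloc/free comparison can be made concrete with a hypothetical array pool (ArrayPool is an illustrative name, not a Lucene or Mahout class). Every acquire() must be matched by exactly one release(), and the array must not be touched after release(); the pool cannot enforce either rule, just as free() cannot.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical pool, shown only to illustrate the bookkeeping re-use pushes onto callers.
class ArrayPool {
    private final ArrayDeque<double[]> free = new ArrayDeque<>();
    private final int size;

    ArrayPool(int size) {
        this.size = size;
    }

    double[] acquire() {
        double[] a = free.poll();
        if (a == null) {
            return new double[size];
        }
        Arrays.fill(a, 0.0); // callers expect a clean array, so the pool must clear it
        return a;
    }

    void release(double[] a) {
        free.push(a); // nothing stops a double release or a caller keeping its reference
    }

    public static void main(String[] args) {
        ArrayPool pool = new ArrayPool(2);
        double[] a = pool.acquire();
        a[0] = 42.0;
        pool.release(a);
        double[] b = pool.acquire(); // the same array object is handed out again
        a[0] = 7.0;                  // use-after-release: silently changes b
        System.out.println(a == b);
        System.out.println(b[0]);
    }
}
```

Here main prints true and then 7.0: the write through the stale reference a lands in the array that b now owns.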

One common idiom that used to cause performance issues was gratuitous boxing 
and unboxing of data in order to package it for passing between different 
parts of code.  This is much less of a problem than it used to be because many 
of these call sites are inlined and the JIT optimizes the temporary object 
creation away.  You still have to watch for it with collections, because boxed 
values stored in a collection escape and cannot be optimized away, so they add 
memory pressure.
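A small sketch of the collection case (BoxingPressure, sumBoxed and sumPrimitive are made-up names). A List<Integer> holds a reference to a separate Integer object per element, apart from the small cache behind Integer.valueOf (-128 to 127 by default), whereas an int[] stores the values inline. Since the boxed values escape into the list, the JIT generally cannot eliminate them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting a boxed collection with a primitive array.
class BoxingPressure {
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(i); // autoboxing via Integer.valueOf(i): one heap object per element
        }
        long sum = 0;
        for (int v : values) { // auto-unboxing on every read
            sum += v;
        }
        return sum;
    }

    static long sumPrimitive(int n) {
        int[] values = new int[n]; // one contiguous array, no per-element objects
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println(sumBoxed(n) == sumPrimitive(n));
    }
}
```

Both methods compute the same sum; the difference is the roughly n extra live objects the boxed version keeps on the heap while the list is reachable.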

> Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-881
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-881
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: Call_Tree.html, Call_Tree_2.html, MAHOUT-881.patch, 
> MAHOUT-881.patch, MAHOUT-881.patch
>
>
> TopItems.getTop*() all do a fair number of excessive operations that can be 
> replaced by switching to Lucene's PriorityQueue implementation, which is 
> more efficient and faster than Java's built-in PQ implementation.
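For context on the issue itself, this sketch shows the general technique of top-N selection with a bounded min-heap, which costs O(m log n) for m scores instead of O(m log m) for sorting all of them. It uses java.util.PriorityQueue only to stay self-contained; it is not Lucene's PriorityQueue and not the MAHOUT-881 patch, and TopN/topN are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keep only the n best scores seen so far.
class TopN {
    static List<Double> topN(double[] scores, int n) {
        // Min-heap: the head is the weakest of the current top n.
        PriorityQueue<Double> heap = new PriorityQueue<>(Math.max(1, n));
        for (double s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (n > 0 && s > heap.peek()) {
                heap.poll(); // evict the current minimum
                heap.add(s);
            }
        }
        // Only the n survivors are sorted, not the whole input.
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.3, 2.5, 1.1, 0.9, 4.0, 1.7};
        System.out.println(topN(scores, 3));
    }
}
```

Here main prints [4.0, 2.5, 1.7]. Note that the Double elements are boxed, which is exactly the collection overhead discussed above; a heap specialized for primitive scores would avoid it.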

