[ https://issues.apache.org/jira/browse/MAHOUT-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149270#comment-13149270 ]

Sean Owen commented on MAHOUT-881:
----------------------------------

Since it was easy, I used JProfiler to measure the exact difference; see the
attached call-graph excerpts, before and after. Out of about 23 minutes of CPU
time spent in the getTopItems() method, the cost of queue operations did in
fact drop, from 90 milliseconds to 47 milliseconds. Before the change, that was
0.0065% of the runtime, or about 1 part in 15,000, so I don't think you are
observing any real difference in runtime.

I'm not against this. But if it isn't buying us what we thought, and it
introduces slightly more complexity and churn, I'd lean slightly toward not
making the change. Alternatively, what about directly addressing some of the
allocations Yonik mentioned, if any are worth it? Those are one-liners.

In any event, I would rather not also change AbstractAverageDifferenceEvaluator.
Was that just a temporary change? Otherwise, I could work in a different way of
reporting those figures for you. To answer your TODO: returning an array would
change the API and, IIRC, cause some other difficult breakage; it's a List on
purpose. I think you'll find, however, that Arrays.asList() is the better call
there, since it probably avoids some overhead.
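To illustrate the Arrays.asList() point, here is a minimal sketch, assuming the top items already sit in a plain array. It contrasts copying that array into a new ArrayList with wrapping it via Arrays.asList(), which the JDK documents as returning a fixed-size List view backed by the array, so no elements are copied. The class Item and the methods toListByCopy() and toListByView() are hypothetical names for illustration, not Mahout's TopItems code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsListSketch {

  // Hypothetical stand-in for a recommended item: an ID plus a score.
  static final class Item {
    final long id;
    final double score;

    Item(long id, double score) {
      this.id = id;
      this.score = score;
    }
  }

  // Copies the array element by element into a newly allocated ArrayList.
  static List<Item> toListByCopy(Item[] items) {
    List<Item> result = new ArrayList<>(items.length);
    for (Item item : items) {
      result.add(item);
    }
    return result;
  }

  // Wraps the array in a fixed-size List view; no elements are copied.
  static List<Item> toListByView(Item[] items) {
    return Arrays.asList(items);
  }

  public static void main(String[] args) {
    Item[] top = { new Item(7L, 4.5), new Item(3L, 4.1), new Item(9L, 3.8) };
    List<Item> view = toListByView(top);
    List<Item> copy = toListByCopy(top);
    System.out.println(view.equals(copy)); // true: same elements, same order
    top[0] = new Item(1L, 5.0);
    System.out.println(view.get(0).id);    // 1: the view is backed by the array
    System.out.println(copy.get(0).id);    // 7: the copy is independent
  }
}
```

The trade-off is that the view is fixed-size (add() and remove() throw UnsupportedOperationException) and writes through to the array, so it suits a method that builds the array once and hands back a read-mostly List.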
                
> Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-881
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-881
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: Call_Tree.html, Call_Tree_2.html, MAHOUT-881.patch, 
> MAHOUT-881.patch, MAHOUT-881.patch
>
>
> TopItems.getTop*() all do a fair number of excessive operations that can be 
> replaced by switching to Lucene's PriorityQueue implementation, which 
> is more efficient and faster than Java's built-in PQ implementation.
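The idea behind the issue, keeping only the N best items in a bounded priority queue instead of sorting everything, can be sketched as follows. This is a hedged illustration, not Mahout's or Lucene's code: it uses java.util.PriorityQueue as a min-heap so it runs without dependencies, and topN() is a hypothetical helper over plain scores. Lucene's own PriorityQueue exposes a combined replace-the-top operation (insertWithOverflow()), which avoids the separate poll()/offer() pair used here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {

  // Returns the n largest scores in descending order.
  // Keeps a min-heap of at most n entries: the root is the weakest of the
  // current top n, so each new score is checked against it with peek() in O(1)
  // and inserted in O(log n) only if it is larger. Total cost is O(m log n)
  // for m scores, and only the n survivors are sorted at the end.
  static List<Double> topN(double[] scores, int n) {
    if (n <= 0) {
      return Collections.emptyList();
    }
    PriorityQueue<Double> heap = new PriorityQueue<>(n); // natural order: min at root
    for (double s : scores) {
      if (heap.size() < n) {
        heap.offer(s);
      } else if (s > heap.peek()) {
        heap.poll();   // evict the current minimum
        heap.offer(s); // admit the better score
      }
    }
    List<Double> result = new ArrayList<>(heap);
    result.sort(Comparator.reverseOrder()); // sort only the n retained scores
    return result;
  }

  public static void main(String[] args) {
    double[] scores = {0.3, 4.2, 1.7, 3.9, 2.5, 4.8, 0.9};
    System.out.println(topN(scores, 3)); // [4.8, 4.2, 3.9]
  }
}
```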

--
This message is automatically generated by JIRA.
