[ https://issues.apache.org/jira/browse/MAHOUT-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149106#comment-13149106 ]
Yonik Seeley commented on MAHOUT-881: ------------------------------------- Grant's patch seems to half the number of PQ operations while collecting the top N. I don't know enough about possible Mahout usage patterns, or the relative cost to this against everything else to know if it matters, but it does for Lucene. One lucene example is sorting by date descending when they have been added in roughly ascending order - the number of PQ operations is very high, so cutting it down by half is a nice win. As far as the sort, the patch essentially implements a heap sort, taking advantage of the fact that we have already done the first half of that (building the heap). Not sure how to compare that to other sort algorithms... Memory usage also seems improved. This existing code {code} List<RecommendedItem> result = Lists.newArrayListWithCapacity(size); result.addAll(topItems); Collections.sort(result, ByValueRecommendedItemComparator.getInstance()); {code} - Object[size] allocated for the ArrayList - result.addAll calls toArray which creates another Object[size] and then copies it to the first - Collections.sort creates another Object[size] while doing the mergesort, then iterates the original collection, setting every item I think Grant's might be improved by using Arrays.asList which would avoid an extra Object[size] allocation + copy. So the patch also improves on the memory use - but again, I don't know enough about Mahout to know if it matters in this context. > Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting > ---------------------------------------------------------------------------- > > Key: MAHOUT-881 > URL: https://issues.apache.org/jira/browse/MAHOUT-881 > Project: Mahout > Issue Type: Improvement > Affects Versions: 0.6 > Reporter: Grant Ingersoll > Assignee: Grant Ingersoll > Priority: Minor > Attachments: MAHOUT-881.patch > > > TopItems.getTop*() all do a fair number of excessive operations that can be > replaced by switching to using Lucene's PriorityQueue implementation, which > is more efficient and faster than Java's built in PQ implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira