[ https://issues.apache.org/jira/browse/MAHOUT-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149385#comment-13149385 ]
Ted Dunning commented on MAHOUT-881:
------------------------------------

{quote}
from 90 microseconds to 47 microseconds
{quote}

This is consistent with my experience in other efforts. The priority queue is rarely the problem if you avoid inserting most elements. Even if you do insert most elements because of a pathological ordering of the original data, it isn't a big deal, since the cost is O(n log k), where n is the number of documents and k is the size of the queue.

One big difference that we can probably make, however, is to multi-thread some of these sequential programs. This isn't very hard with the Executors in Java. This doesn't make things more efficient, but it does make them 10x faster on commonly available servers. That is an effort for a different JIRA in any case.

> Refactor TopItems to use Lucene's PriorityQueue and remove excessive sorting
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-881
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-881
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: Call_Tree.html, Call_Tree_2.html, MAHOUT-881.patch,
> MAHOUT-881.patch, MAHOUT-881.patch
>
>
> TopItems.getTop*() all do a fair number of excessive operations that can be
> replaced by switching to using Lucene's PriorityQueue implementation, which
> is more efficient and faster than Java's built-in PQ implementation.
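
For anyone reading along, here is a minimal sketch of the bounded-queue idea from the comment above: keep a min-heap of at most k scores and compare each new score against the head before inserting, so most elements cost one comparison and the worst case stays at O(n log k). It is an illustration only, not the code in the MAHOUT-881 patch. It assumes plain double scores and uses java.util.PriorityQueue rather than Lucene's PriorityQueue to stay self-contained; the class name TopK and the method topK are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; not part of Mahout or Lucene.
class TopK {

    /** Returns the k largest scores in descending order. */
    static List<Double> topK(double[] scores, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the head is the smallest score currently retained.
        PriorityQueue<Double> heap = new PriorityQueue<>(k);
        for (double s : scores) {
            if (heap.size() < k) {
                heap.add(s);          // queue not full yet: O(log k) insert
            } else if (s > heap.peek()) {
                heap.poll();          // evict the current minimum
                heap.add(s);          // O(log k) insert
            }
            // Otherwise the score cannot make the top k: one comparison, no insert.
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.3, 0.9, 0.1, 0.7, 0.5, 0.8};
        System.out.println(topK(scores, 3)); // prints [0.9, 0.8, 0.7]
    }
}
```

With input sorted in ascending order every element passes the head comparison and gets inserted, which is the pathological case mentioned above; even then the total work is bounded by n insertions into a heap of size k.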