[
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585428#comment-16585428
]
Varun Thacker commented on SOLR-12587:
--------------------------------------
When I took a deeper look at it today, there were still a few subtle things that
weren't obvious to me:
# The Solr PQ has a reset method which does a System.arraycopy and then resets
the size to maxSize. If we were to use the Lucene PQ, we don't have a way
to reset size to maxSize. Secondly, we would no longer do System.arraycopy and
would instead reset the heap in a for loop, which is probably slower, and hence it
was done like this in the first place? A 25M export on the "id" field that used to
take 7m15s now took 10.54s when I simulated this by not reusing the PQ and creating
a new PQ for every 30k docs collected in ExportWriter (which was earlier using
the reset)
{code:java}
protected void reset() {
  Object[] heap = getHeapArray();
  if (cache != null) {
    System.arraycopy(cache, 1, heap, 1, heap.length - 1);
    size = maxSize;
  } else {
    populate();
  }
}{code}
# We could perhaps do a "true" reset and even avoid doing a System.arraycopy
if we never nulled the object we popped and relied on size to do the right
thing. Then reset would simply call SortDoc#reset on each entry and change size
back to maxSize. We would save a lot of generated objects
{code:java}
public final T pop() {
  if (size > 0) {
    T result = heap[1];    // save first value
    heap[1] = heap[size];  // move last to first
    heap[size] = null;     // permit GC of objects  <---------- remove this line
    size--;
    downHeap();            // adjust heap
    return result;
  } else {
    return null;
  }
}

// pseudo code for reset
protected void reset() {
  Object[] heap = getHeapArray();
  for (int i = 1; i < heap.length; i++) {
    ((SortDoc) heap[i]).reset();
  }
  size = maxSize;
}{code}
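To make the second idea concrete, here is a minimal, self-contained sketch of such a queue. It is not Solr or Lucene code: ReusableQueueSketch and SortEntry are hypothetical names, with SortEntry standing in for SortDoc and holding a single long. One detail the sketch has to handle: simply deleting the nulling line leaves heap[size] pointing at the element that was just moved to the root, and the popped element is no longer referenced by the array at all. The sketch therefore parks the popped element in the vacated slot, so that reset can still reach all maxSize objects.

```java
/** Hypothetical fixed-size min-heap that never drops its entries (not Solr/Lucene code). */
final class ReusableQueueSketch {
    private final SortEntry[] heap; // 1-based, like Lucene's PriorityQueue
    private final int maxSize;
    private int size;

    ReusableQueueSketch(int maxSize) {
        this.maxSize = maxSize;
        this.heap = new SortEntry[maxSize + 1];
        for (int i = 1; i <= maxSize; i++) {
            heap[i] = new SortEntry(); // pre-populate with sentinels once
        }
        this.size = maxSize;
    }

    SortEntry top() {
        return size > 0 ? heap[1] : null;
    }

    /** Call after changing top() in place to restore heap order. */
    void updateTop() {
        downHeap();
    }

    SortEntry pop() {
        if (size == 0) {
            return null;
        }
        SortEntry result = heap[1];
        heap[1] = heap[size];
        heap[size] = result; // park the popped entry instead of nulling the slot
        size--;
        downHeap();
        return result;
    }

    /** "True" reset: no allocation and no array copy, every entry is reused. */
    void reset() {
        for (int i = 1; i <= maxSize; i++) {
            heap[i].reset(); // all entries equal again, so heap order holds trivially
        }
        size = maxSize;
    }

    private void downHeap() {
        int i = 1;
        SortEntry node = heap[i];
        int j = i << 1;
        int k = j + 1;
        if (k <= size && heap[k].value < heap[j].value) {
            j = k;
        }
        while (j <= size && heap[j].value < node.value) {
            heap[i] = heap[j]; // shift the smaller child up
            i = j;
            j = i << 1;
            k = j + 1;
            if (k <= size && heap[k].value < heap[j].value) {
                j = k;
            }
        }
        heap[i] = node;
    }

    public static void main(String[] args) {
        ReusableQueueSketch pq = new ReusableQueueSketch(3);
        for (long v : new long[] {5, 1, 9, 7, 3}) {
            SortEntry top = pq.top();
            if (v > top.value) { // keep the 3 largest values
                top.value = v;
                pq.updateTop();
            }
        }
        for (SortEntry e = pq.pop(); e != null; e = pq.pop()) {
            System.out.println(e.value); // prints 5, 7, 9 on separate lines
        }
        pq.reset(); // ready for the next batch, same three objects
    }
}

/** Hypothetical stand-in for SortDoc: a mutable entry with a sentinel state. */
final class SortEntry {
    static final long SENTINEL = Long.MIN_VALUE;
    long value = SENTINEL;

    void reset() {
        value = SENTINEL;
    }
}
```

The trade-off in this sketch is that entries handed out by pop() are overwritten by the next reset(), so a caller has to be done with them before resetting.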
In approach 1, we'd essentially be giving up on whatever optimizations
System.arraycopy does (being a native call) and relying on a for loop instead.
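As a rough illustration of what changes in approach 1, the sketch below shows the two ways of refilling the heap side by side. It assumes the loop variant simply copies each slot from the cache, which may not be what a Lucene-PQ-based version would end up doing. ResetStrategiesSketch and its method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical comparison of the two refill strategies (not Solr/Lucene code). */
final class ResetStrategiesSketch {

    /** Bulk restore, mirroring the System.arraycopy call in the existing reset(). */
    static void resetWithArrayCopy(Object[] cache, Object[] heap) {
        System.arraycopy(cache, 1, heap, 1, heap.length - 1);
    }

    /** Slot-by-slot restore, assuming the loop just copies from the cache. */
    static void resetWithLoop(Object[] cache, Object[] heap) {
        for (int i = 1; i < heap.length; i++) {
            heap[i] = cache[i];
        }
    }

    public static void main(String[] args) {
        Object[] cache = {null, "a", "b", "c"}; // slot 0 unused, heap is 1-based
        Object[] viaCopy = new Object[cache.length];
        Object[] viaLoop = new Object[cache.length];
        resetWithArrayCopy(cache, viaCopy);
        resetWithLoop(cache, viaLoop);
        System.out.println(Arrays.equals(viaCopy, viaLoop)); // prints true
    }
}
```

Both variants leave the heap in the same state, so the only difference between them is the cost of the copy itself.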
In approach 2, we'd basically be creating some sort of reusable PQ.
Thoughts?
> Reuse Lucene's PriorityQueue for the ExportHandler
> --------------------------------------------------
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Varun Thacker
> Assignee: Varun Thacker
> Priority: Major
> Labels: export-writer
> Attachments: SOLR-12587.patch, SOLR-12587.patch
>
>
> We have a priority queue in Lucene {{org.apache.lucene.util.PriorityQueue}}.
> The Export Handler also implements a PriorityQueue
> {{org.apache.solr.handler.export.PriorityQueue}}. Both are obviously very
> similar, with minor API differences.
>
> The aim here is to reuse Lucene's PQ and remove the Solr implementation.
>