[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585428#comment-16585428 ] Varun Thacker commented on SOLR-12587: -- When I took a deeper look at it today there are still a few subtle things that wasn't obvious to me : # The Solr PQ has a reset method which resets the size to maxSize and then does a System.arraycopy . If we were to use the Lucene PQ we don't have a way to reset size to maxSize . Secondly we would no longer do System.arraycopy and instead reset the heap in the for loop which is probably slower and hence was done like this in the first place? A 25M export on the "id" field used to take 7m15s now took 10.54s when i simulated this by not reusing the PQ and creating a new PQ for every 30k docs collected in ExportWriter (which was earlier using the reset ) {code:java} protected void reset() { Object[] heap = getHeapArray(); if(cache != null) { System.arraycopy(cache, 1, heap, 1, heap.length-1); size = maxSize; } else { populate(); } }{code} # We could perhaps do a "true" reset and even avoid doing a System.arraycopy , if we never nulled the object we popped and relied on size do do the right thing. Then reset would simply change call SortDoc#reset and change back size to maxSize. We would save a lot of objects generated {code:java} public final T pop() { if (size > 0) { T result = heap[1]; // save first value heap[1] = heap[size]; // move last to first heap[size] = null;// permit GC of objects //<-- remove this line size--; downHeap(); // adjust heap return result; } else { return null; } } // pseudo code for reset protected void reset() { Object[] heap = getHeapArray(); for (int i = 1; i < heap.length; i++) { ((SortDoc) heap[i]).reset(); } size = maxSize; }{code} In approach 1 , we'd essentially be giving up on whatever optimizations System.arraycopy does ( being a native call ) vs relying on a for loop. In approach 2 , we'd basically be creating some sort of a reusable PQ Thoughts ? > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Labels: export-writer > Attachments: SOLR-12587.patch, SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558743#comment-16558743 ] Varun Thacker commented on SOLR-12587: -- Updated patch which will apply cleanly once LUCENE-8428 has been committed > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Labels: export-writer > Attachments: SOLR-12587.patch, SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556970#comment-16556970 ] Varun Thacker commented on SOLR-12587: -- Thanks Adrien! I worked on a prototype patch and i'll post in the on the lucene Jira tomorrow > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Labels: export-writer > Attachments: SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556968#comment-16556968 ] Adrien Grand commented on SOLR-12587: - I opened LUCENE-8428 for discussion. > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Labels: export-writer > Attachments: SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556261#comment-16556261 ] Adrien Grand commented on SOLR-12587: - Oh I see, a chicken-and-egg issue! I think this getSentinelObjet is designed to be implemented in an anonymous class. Maybe we should change PriorityQueue to take eg. a {{Supplier}} in place of the {{boolean prepopulate}} and remove {{getSentinelObject}}. > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Labels: export-writer > Attachments: SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555979#comment-16555979 ] Varun Thacker commented on SOLR-12587: -- Hi Adrien, I thought about that approach initially but won't {{proto}} be null at that point? {code:java} public SortQueue(int len, SortDoc proto) { super(len, true); this.proto = proto; .. } @Override protected SortDoc getSentinelObject() { return proto.copy(); //proto has not been initialized }{code} > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Labels: export-writer > Attachments: SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555903#comment-16555903 ] Adrien Grand commented on SOLR-12587: - I think you could fix your first nocommit by passing prepopulate=true to the constructor and implementing getSentinelObject with {{return proto.copy(); }}. For your reset() method, you could also take advantage of the fact that PriorityQueue extends Iterable so you don't need casts or to know that indices start at 1 in the heap array. > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Labels: export-writer > Attachments: SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554890#comment-16554890 ] Varun Thacker commented on SOLR-12587: -- I've put out a patch which is just exploring the idea . Tests pass but wanted to see what others feel about this before spending more time > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > Attachments: SOLR-12587.patch > > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554746#comment-16554746 ] Varun Thacker commented on SOLR-12587: -- Here's a version of [SortQueue|[http://example.com|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/export/SortQueue.java]] looks functionally the same . {code:java} class SortQueue extends PriorityQueue { final private SortDoc proto; public SortQueue(int len, SortDoc proto) { super(len); this.proto = proto; } protected boolean lessThan(SortDoc t1, SortDoc t2) { return t1.lessThan(t2); } // reset get's called even before we collect the first set of docs so we can initialize // heap with proto here protected void reset() { Object[] heap = getHeapArray(); for (int i=1; i< heap.length; i++) { heap[i] = proto.copy(); } size = maxSize; } }{code} > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler
[ https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554688#comment-16554688 ] Varun Thacker commented on SOLR-12587: -- Hi [~joel.bernstein] I'm trying to understand why we need the {{cache}} array in {{SortQueue}} . In reset could we simply do this? {code:java} protected void reset() { Object[] heap = getHeapArray(); for (int i=1; i< heap.length; i++) { heap[i] = proto.copy(); } size = maxSize; }{code} > Reuse Lucene's PriorityQueue for the ExportHandler > -- > > Key: SOLR-12587 > URL: https://issues.apache.org/jira/browse/SOLR-12587 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker >Assignee: Varun Thacker >Priority: Major > > We have a priority queue in Lucene {{org.apache.lucene.utilPriorityQueue}} . > The Export Handler also implements a PriorityQueue > {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very > similar with minor API differences. > > The aim here is to reuse Lucene's PQ and remove the Solr implementation. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org