[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-08-19 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585428#comment-16585428
 ] 

Varun Thacker commented on SOLR-12587:
--

When I took a deeper look at it today there are still a few subtle things that 
wasn't obvious to me :
 # The Solr PQ has a reset method which resets the size to maxSize and then 
does a System.arraycopy . If we were to use the Lucene PQ we don't have a way 
to reset size to maxSize . Secondly we would no longer do System.arraycopy and 
instead reset the heap in the for loop which is probably slower and hence was 
done like this in the first place? A 25M export on the "id" field used to take 
7m15s now took 10.54s when i simulated this by not reusing the PQ and creating 
a new PQ for every 30k docs collected in ExportWriter (which was earlier using 
the reset )
{code:java}
protected void reset() {
  Object[] heap = getHeapArray();
  if(cache != null) {
System.arraycopy(cache, 1, heap, 1, heap.length-1);
size = maxSize;
  } else {
populate();
  }
}{code}

 # We could perhaps do a "true" reset and even avoid doing a System.arraycopy , 
if we never nulled the object we popped and relied on size do do the right 
thing. Then reset would simply change call SortDoc#reset and change back size 
to maxSize. We would save a lot of objects generated
{code:java}
public final T pop() {
  if (size > 0) {
T result = heap[1];   // save first value
heap[1] = heap[size]; // move last to first
heap[size] = null;// permit GC of objects //<-- remove this 
line
size--;
downHeap();   // adjust heap
return result;
  } else {
return null;
  }
}
// pseudo code for reset
protected void reset() {
  Object[] heap = getHeapArray();
  for (int i = 1; i < heap.length; i++) {
((SortDoc) heap[i]).reset();
  }
  size = maxSize;
}{code}
 

In approach 1 , we'd essentially be giving up on whatever optimizations 
System.arraycopy does ( being a native call ) vs relying on a for loop. 
In approach 2 , we'd basically be creating some sort of a reusable PQ 



Thoughts ?

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>  Labels: export-writer
> Attachments: SOLR-12587.patch, SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-26 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558743#comment-16558743
 ] 

Varun Thacker commented on SOLR-12587:
--

Updated patch which will apply cleanly once LUCENE-8428 has been committed

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>  Labels: export-writer
> Attachments: SOLR-12587.patch, SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-25 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556970#comment-16556970
 ] 

Varun Thacker commented on SOLR-12587:
--

Thanks Adrien! I worked on a prototype patch and i'll post in the on the lucene 
Jira tomorrow 

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>  Labels: export-writer
> Attachments: SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556968#comment-16556968
 ] 

Adrien Grand commented on SOLR-12587:
-

I opened LUCENE-8428 for discussion.

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>  Labels: export-writer
> Attachments: SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556261#comment-16556261
 ] 

Adrien Grand commented on SOLR-12587:
-

Oh I see, a chicken-and-egg issue! I think this getSentinelObjet is designed to 
be implemented in an anonymous class. Maybe we should change PriorityQueue to 
take eg. a {{Supplier}} in place of the {{boolean prepopulate}} and remove 
{{getSentinelObject}}.

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>  Labels: export-writer
> Attachments: SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-25 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555979#comment-16555979
 ] 

Varun Thacker commented on SOLR-12587:
--

Hi Adrien,

I thought about that approach initially but won't {{proto}} be null at that 
point?
{code:java}
public SortQueue(int len, SortDoc proto) {
  super(len, true);
  this.proto = proto;
..
}

@Override
protected SortDoc getSentinelObject() {
  return proto.copy(); //proto has not been initialized
}{code}

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>  Labels: export-writer
> Attachments: SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555903#comment-16555903
 ] 

Adrien Grand commented on SOLR-12587:
-

I think you could fix your first nocommit by passing prepopulate=true to the 
constructor and implementing getSentinelObject with {{return proto.copy(); }}. 
For your reset() method, you could also take advantage of the fact that 
PriorityQueue extends Iterable so you don't need casts or to know that indices 
start at 1 in the heap array.

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>  Labels: export-writer
> Attachments: SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-24 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554890#comment-16554890
 ] 

Varun Thacker commented on SOLR-12587:
--

I've put out a patch which is just exploring the idea . Tests pass but wanted 
to see what others feel about this before spending more time 

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
> Attachments: SOLR-12587.patch
>
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-24 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554746#comment-16554746
 ] 

Varun Thacker commented on SOLR-12587:
--

Here's a version of 
[SortQueue|[http://example.com|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/export/SortQueue.java]]
 looks functionally the same .
{code:java}
class SortQueue extends PriorityQueue {

  final private SortDoc proto;

  public SortQueue(int len, SortDoc proto) {
super(len);
this.proto = proto;
  }

  protected boolean lessThan(SortDoc t1, SortDoc t2) {
return t1.lessThan(t2);
  }

  // reset get's called even before we collect the first set of docs so we can 
initialize
  // heap with proto here
  protected void reset() {
Object[] heap = getHeapArray();
for (int i=1; i< heap.length; i++) {
  heap[i] = proto.copy();
}
size = maxSize;
  }

}{code}
 

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12587) Reuse Lucene's PriorityQueue for the ExportHandler

2018-07-24 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554688#comment-16554688
 ] 

Varun Thacker commented on SOLR-12587:
--

Hi [~joel.bernstein]

I'm trying to understand why we need the {{cache}} array in {{SortQueue}}  . 

In reset could we simply do this? 
{code:java}
protected void reset() {
  Object[] heap = getHeapArray();
  for (int i=1; i< heap.length; i++) {
heap[i] = proto.copy();
  }
  size = maxSize;
}{code}
 

> Reuse Lucene's PriorityQueue for the ExportHandler
> --
>
> Key: SOLR-12587
> URL: https://issues.apache.org/jira/browse/SOLR-12587
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Major
>
> We have a priority queue in Lucene  {{org.apache.lucene.utilPriorityQueue}} . 
> The Export Handler also implements a PriorityQueue 
> {{org.apache.solr.handler.export.PriorityQueue}} . Both are obviously very 
> similar with minor API differences. 
>  
> The aim here is to reuse Lucene's PQ and remove the Solr implementation. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org