[jira] [Comment Edited] (SOLR-10524) Explore in-memory partitioning for processing Overseer queue messages

2017-05-08 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001514#comment-16001514
 ] 

Joel Bernstein edited comment on SOLR-10524 at 5/8/17 8:58 PM:
---

I'm seeing the errors below when running the StreamExpressionTest. I suspect 
they're related to this ticket. I've been adding tests over the past couple of 
days but only started seeing this today:
 Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239409 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239410 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239411 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239412 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239413 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239413 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239414 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239415 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop



was (Author: joel.bernstein):
I'm seeing the errors below in the StreamingExpressionTest. I suspect it's 
related to this ticket:
 Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239409 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239410 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239411 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239412 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239413 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239413 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239414 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop
   [junit4]   2> java.lang.NullPointerException
   [junit4]   2> 239415 ERROR (OverseerStateUpdate-97928916256817164-127.0.0.1:51485_solr-n_00) [n:127.0.0.1:51485_solr] o.a.s.c.Overseer Exception in Overseer main queue loop


> Explore in-memory partitioning for processing Overseer queue messages
> -
>
> Key: SOLR-10524
> URL: https://issues.apache.org/jira/browse/SOLR-10524
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Erick Erickson
> Attachments: SOLR-10524-NPE-fix.patch, SOLR-10524.patch, 
> 

[jira] [Comment Edited] (SOLR-10524) Explore in-memory partitioning for processing Overseer queue messages

2017-05-05 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998138#comment-15998138
 ] 

Shalin Shekhar Mangar edited comment on SOLR-10524 at 5/5/17 10:56 AM:
---

Yes, I like this. Same performance, much smaller changes and no chance of 
something going wrong in the cluster because of processing re-ordered messages. 
+1 to commit.

-There are optimizations we can do on the read side using multi-get. Let's open 
another issue to explore that as well.- Oops, ZooKeeper has no multi-get.

As a side note, there is a bug in the nsToMs method in testOverseer: it 
actually treats the nanoseconds as milliseconds and then converts them to 
nanoseconds! I'll fix it separately. 
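The testOverseer source is not quoted in this thread, so the "buggy" method below is only a guess at the shape described (input treated as milliseconds, then converted to nanoseconds), written with java.util.concurrent.TimeUnit. The class name NsToMsDemo is hypothetical. It contrasts that mistake with the intended nanoseconds-to-milliseconds conversion:

```java
import java.util.concurrent.TimeUnit;

public class NsToMsDemo {

  // Assumed buggy shape: reads the argument as MILLISECONDS and converts it
  // *to* NANOSECONDS, so the result is a million times too large.
  static long nsToMsBuggy(long ns) {
    return TimeUnit.NANOSECONDS.convert(ns, TimeUnit.MILLISECONDS);
  }

  // Intended behavior: read the argument as NANOSECONDS and convert it to
  // MILLISECONDS (truncating, like all TimeUnit conversions).
  static long nsToMs(long ns) {
    return TimeUnit.MILLISECONDS.convert(ns, TimeUnit.NANOSECONDS);
  }

  public static void main(String[] args) {
    long elapsedNs = 2_500_000_000L; // 2.5 seconds
    System.out.println(nsToMs(elapsedNs));      // 2500
    System.out.println(nsToMsBuggy(elapsedNs)); // 2500000000000000
  }
}
```

TimeUnit.NANOSECONDS.toMillis(ns) is an equivalent, arguably clearer, way to write the correct version.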


was (Author: shalinmangar):
Yes, I like this. Same performance, much smaller changes and no chance of 
something going wrong in the cluster because of processing re-ordered messages. 
+1 to commit.

There are optimizations we can do on the read side using multi-get. Let's open 
another issue to explore that as well.

As a side note, there is a bug in the nsToMs method in testOverseer: it 
actually treats the nanoseconds as milliseconds and then converts them to 
nanoseconds! I'll fix it separately. 

> Explore in-memory partitioning for processing Overseer queue messages
> -
>
> Key: SOLR-10524
> URL: https://issues.apache.org/jira/browse/SOLR-10524
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Erick Erickson
> Attachments: SOLR-10524.patch, SOLR-10524.patch, SOLR-10524.patch, 
> SOLR-10524.patch
>
>
> There are several JIRAs (I'll link in a second) about trying to be more 
> efficient about processing overseer messages as the overseer can become a 
> bottleneck, especially with very large numbers of replicas in a cluster. One 
> of the approaches mentioned near the end of SOLR-5872 (15-Mar) was to "read 
> large no:of items say 1. put them into in memory buckets and feed them 
> into overseer".
> This JIRA is to break out that part of the discussion as it might be an easy 
> win whereas "eliminating the Overseer queue" would be quite an undertaking.
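A minimal sketch of the "read a large batch, put it into in-memory buckets" idea from the description, assuming the bucket key is the collection name (the description does not say which key to use). PartitionSketch and Message are hypothetical names; Message is a stand-in for an Overseer queue message, and feeding the buckets to the Overseer is not modeled:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionSketch {

  // Hypothetical stand-in for an Overseer queue message: only the fields needed here.
  static final class Message {
    final String collection;
    final String op;

    Message(String collection, String op) {
      this.collection = collection;
      this.op = op;
    }

    @Override
    public String toString() {
      return collection + ":" + op;
    }
  }

  // Splits one batch (already read from the queue, in queue order) into
  // per-collection buckets. LinkedHashMap keeps buckets in first-seen order and
  // each bucket keeps queue order for its collection, but messages for different
  // collections are no longer interleaved as they were in the queue.
  static Map<String, List<Message>> partition(List<Message> batch) {
    Map<String, List<Message>> buckets = new LinkedHashMap<>();
    for (Message m : batch) {
      buckets.computeIfAbsent(m.collection, k -> new ArrayList<>()).add(m);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<Message> batch = Arrays.asList(
        new Message("c1", "state"),
        new Message("c2", "state"),
        new Message("c1", "leader"),
        new Message("c2", "downnode"));
    // Prints {c1=[c1:state, c1:leader], c2=[c2:state, c2:downnode]}
    System.out.println(partition(batch));
  }
}
```

The design choice worth noting is the ordering trade-off: per-key order survives, cross-key order does not, which is exactly the kind of re-ordering the thread is wary of when messages for different keys are not independent.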



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10524) Explore in-memory partitioning for processing Overseer queue messages

2017-05-04 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997500#comment-15997500
 ] 

Scott Blum edited comment on SOLR-10524 at 5/4/17 9:56 PM:
---

Couple of thoughts:

1) In the places where you've changed Collection -> List, I would go one step 
further and make it a concrete ArrayList, to a) explicitly convey that the 
returned list is a mutable copy rather than a view of internal state and b) 
explicitly convey that sortAndAdd() is operating efficiently on said lists.

2) DQ.remove(id): don't you want to unconditionally knownChildren.remove(id), 
even if the ZK delete succeeds?

3) DQ.remove(id): there is no need to loop here; in fact, you'll get stuck in an 
infinite loop if someone else deletes the node you're targeting. The reason 
there's a loop in removeFirst() is that it tries a different id on each 
iteration.

Suggested remove(id) impl:

{code}
  public void remove(String id) throws KeeperException, InterruptedException {
    // Remove the ZK node *first*; ZK will resolve any races with peek()/poll().
    // This is counterintuitive, but peek()/poll() will not return an element if the underlying
    // ZK node has been deleted, so it's okay to update knownChildren afterwards.
    try {
      String path = dir + "/" + id;
      zookeeper.delete(path, -1, true);
    } catch (KeeperException.NoNodeException e) {
      // Another client deleted the node first, this is fine.
    }
    updateLock.lockInterruptibly();
    try {
      knownChildren.remove(id);
    } finally {
      updateLock.unlock();
    }
  }
{code}
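Points 1 to 3 can be exercised together in a small, self-contained sketch. FakeStore, ToyQueue, ToyQueueDemo and knownChildrenSnapshot() are hypothetical names invented for illustration: FakeStore stands in for ZooKeeper (a ConcurrentHashMap whose delete returns false instead of throwing KeeperException.NoNodeException), and ToyQueue is not Solr's distributed queue. It shows an accessor typed as a concrete ArrayList (a mutable copy, not a view), plus a single-attempt remove(id) that deletes in the store first, tolerates the node already being gone, then drops the id from knownChildren unconditionally, so it cannot spin when another client wins the race:

```java
import java.util.ArrayList;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class ToyQueueDemo {

  // Hypothetical stand-in for the ZooKeeper-backed store; not a Solr or ZooKeeper API.
  static final class FakeStore {
    private final ConcurrentHashMap<String, byte[]> nodes = new ConcurrentHashMap<>();

    void create(String id, byte[] data) { nodes.put(id, data); }

    // Returns false (instead of throwing) when the node is already gone.
    boolean delete(String id) { return nodes.remove(id) != null; }
  }

  static final class ToyQueue {
    private final FakeStore store;
    private final ReentrantLock updateLock = new ReentrantLock();
    private final TreeSet<String> knownChildren = new TreeSet<>(); // sorted ids

    ToyQueue(FakeStore store) { this.store = store; }

    void offer(String id) {
      store.create(id, new byte[0]);
      updateLock.lock();
      try { knownChildren.add(id); } finally { updateLock.unlock(); }
    }

    // Concrete ArrayList return type: a mutable snapshot, never a view of internal state.
    ArrayList<String> knownChildrenSnapshot() {
      updateLock.lock();
      try { return new ArrayList<>(knownChildren); } finally { updateLock.unlock(); }
    }

    // One attempt, no loop: delete in the store first (a concurrent delete is fine,
    // so the result is ignored), then remove the id from the cache unconditionally.
    void remove(String id) throws InterruptedException {
      store.delete(id);
      updateLock.lockInterruptibly();
      try { knownChildren.remove(id); } finally { updateLock.unlock(); }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    FakeStore store = new FakeStore();
    ToyQueue q = new ToyQueue(store);
    q.offer("qn-0000000002");
    q.offer("qn-0000000001");

    ArrayList<String> snapshot = q.knownChildrenSnapshot();
    snapshot.add("caller-owned"); // mutating the copy does not touch the queue

    store.delete("qn-0000000001"); // simulate another client deleting the node first
    q.remove("qn-0000000001");     // still returns, and still cleans knownChildren

    System.out.println(q.knownChildrenSnapshot()); // [qn-0000000002]
    System.out.println(snapshot);                  // [qn-0000000001, qn-0000000002, caller-owned]
  }
}
```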



was (Author: dragonsinth):
Couple of thoughts:

1) In the places where you've changed Collection -> List, I would go one step 
further and make it a concrete ArrayList, to a) explicitly convey that the 
returned list is a mutable copy rather than a view of internal state and b) 
explicitly convey that sortAndAdd() is operating efficiently on said lists.

2) DQ.remove(id): don't you need to unconditionally knownChildren.remove(id), 
even if the ZK delete succeeds?

3) DQ.remove(id): there is no need to loop here; in fact, you'll get stuck in an 
infinite loop if someone else deletes the node you're targeting. The reason 
there's a loop in removeFirst() is that it tries a different id on each 
iteration.

Suggested remove(id) impl:

{code}
  public void remove(String id) throws KeeperException, InterruptedException {
    // Remove the ZK node *first*; ZK will resolve any races with peek()/poll().
    // This is counterintuitive, but peek()/poll() will not return an element if the underlying
    // ZK node has been deleted, so it's okay to update knownChildren afterwards.
    try {
      String path = dir + "/" + id;
      zookeeper.delete(path, -1, true);
    } catch (KeeperException.NoNodeException e) {
      // Another client deleted the node first, this is fine.
    }
    updateLock.lockInterruptibly();
    try {
      knownChildren.remove(id);
    } finally {
      updateLock.unlock();
    }
  }
{code}


> Explore in-memory partitioning for processing Overseer queue messages
> -
>
> Key: SOLR-10524
> URL: https://issues.apache.org/jira/browse/SOLR-10524
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Erick Erickson
> Attachments: SOLR-10524.patch, SOLR-10524.patch
>
>
> There are several JIRAs (I'll link in a second) about trying to be more 
> efficient about processing overseer messages as the overseer can become a 
> bottleneck, especially with very large numbers of replicas in a cluster. One 
> of the approaches mentioned near the end of SOLR-5872 (15-Mar) was to "read 
> large no:of items say 1. put them into in memory buckets and feed them 
> into overseer".
> This JIRA is to break out that part of the discussion as it might be an easy 
> win whereas "eliminating the Overseer queue" would be quite an undertaking.


