[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-22 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253949#comment-15253949
 ] 

Shalin Shekhar Mangar commented on SOLR-9030:
-

Although it reproduces only rarely across many tests, I found it in 
AsyncCallRequestStatusResponseTest. Beasting this test triggers the failure 
fairly often.

> The 'downnode' command can trip asserts in ZkStateWriter or cause 
> BadVersionException in Overseer
> -
>
> Key: SOLR-9030
> URL: https://issues.apache.org/jira/browse/SOLR-9030
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: master, 6.1
>
>
> While working on SOLR-9014 I came across a strange test failure.
> {code}
>    [junit4] ERROR   16.9s | AsyncCallRequestStatusResponseTest.testAsyncCallStatusResponse <<<
>    [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=46, name=OverseerStateUpdate-95769832112259076-127.0.0.1:51135_z_oeg%2Ft-n_00, state=RUNNABLE, group=Overseer state updater.]
>    [junit4]    >    at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3:CBF7E84BCF328A1A]:0)
>    [junit4]    > Caused by: java.lang.AssertionError
>    [junit4]    >    at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3]:0)
>    [junit4]    >    at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:231)
>    [junit4]    >    at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:240)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:745)
> {code}
> The underlying problem can manifest by tripping the above assert or a 
> BadVersionException as well. I found that this was introduced in SOLR-7281 
> where a new 'downnode' command was added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254262#comment-15254262
 ] 

Mark Miller commented on SOLR-9030:
---

Why does that assert even exist?




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-22 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254288#comment-15254288
 ] 

Shalin Shekhar Mangar commented on SOLR-9030:
-

It exists to ensure that we do not update or overwrite a cluster state when we 
have no idea of its previous znode version. The default znode version in a 
DocCollection is -1, and ZK treats -1 as "match any version". If left 
unchecked, ZK would overwrite the state without the CAS checks that we rely on.
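
To illustrate why a version of -1 is dangerous, here is a minimal in-memory 
sketch of a ZK-style conditional write. VersionedNode and 
StaleVersionException are hypothetical stand-ins, not Solr or ZooKeeper 
classes; the only behaviour taken from this thread is that an expected version 
of -1 matches any version.

```java
// Hypothetical in-memory model of a znode; not ZooKeeper's or Solr's API.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) { super(message); }
}

class VersionedNode {
    private String data = "";
    private int version = 0;

    // The write succeeds only if expectedVersion equals the current version,
    // except that -1 is accepted unconditionally (no CAS check at all).
    synchronized int setData(String newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            throw new StaleVersionException(
                "expected " + expectedVersion + " but was " + version);
        }
        data = newData;
        version++;
        return version;
    }

    synchronized String getData() { return data; }
    synchronized int getVersion() { return version; }
}

class UnconditionalWriteDemo {
    public static void main(String[] args) {
        VersionedNode node = new VersionedNode();
        node.setData("state written by A", 0);   // version 0 -> 1

        // A writer that never learned the version passes -1 and silently
        // replaces A's update; no conflict is reported.
        node.setData("state written by B", -1);  // version 1 -> 2
        System.out.println(node.getData() + " @ v" + node.getVersion());
    }
}
```

In this sketch, a writer holding the stale version 0 would be rejected, while 
the writer holding -1 is not, which is the case the assert guards against.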

bq. And shouldn't we expect that that can happen and deal with it 
appropriately? (A retry or something?)

Yes, and it does recover automatically: a BadVersionException causes the 
complete cluster state to be re-fetched from ZK, and the operation is retried. 
In production environments the BadVersionException is not a problem, but the 
overwriting of state can be.
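
The recovery path described above can be sketched roughly as follows. This is 
a simplified illustration, not the Overseer code: Znode, ConflictException and 
RetryingWriter are hypothetical names, and the sketch re-reads only the version 
rather than the complete cluster state.

```java
// Hypothetical sketch of "retry on version conflict"; not Solr's Overseer code.
class ConflictException extends RuntimeException {}

class Znode {
    private String data = "";
    private int version = 0;

    // Strict compare-and-set: the expected version must match exactly.
    synchronized void setData(String newData, int expectedVersion) {
        if (expectedVersion != version) throw new ConflictException();
        data = newData;
        version++;
    }

    synchronized int getVersion() { return version; }
    synchronized String getData() { return data; }
}

class RetryingWriter {
    private int cachedVersion;   // local view of the znode version, may be stale

    RetryingWriter(int cachedVersion) { this.cachedVersion = cachedVersion; }

    // Returns the number of attempts that were needed.
    int write(Znode node, String newData, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                node.setData(newData, cachedVersion);
                cachedVersion++;                     // our write bumped the version
                return attempt;
            } catch (ConflictException e) {
                cachedVersion = node.getVersion();   // re-fetch, then retry
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}

class RetryDemo {
    public static void main(String[] args) {
        Znode node = new Znode();
        node.setData("written elsewhere", 0);            // node is now at version 1
        RetryingWriter writer = new RetryingWriter(0);   // still believes version 0
        int attempts = writer.write(node, "overseer update", 3);
        System.out.println(attempts + " attempts, data=" + node.getData());
    }
}
```

The first attempt fails on the stale version, the writer refreshes its view, 
and the second attempt succeeds without losing anyone else's write.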




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254280#comment-15254280
 ] 

Mark Miller commented on SOLR-9030:
---

bq. or a BadVersionException as well

And shouldn't we expect that that can happen and deal with it appropriately? (A 
retry or something?)

Not that something else might not be off, but it just seems like that assert is 
strange, and we should handle the case when the setData fails due to a version 
conflict - seems odd to specify a version to expect to update and then not deal 
with a failure.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-25 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257309#comment-15257309
 ] 

Scott Blum commented on SOLR-9030:
--

Can you try making DocCollection.znodeVersion be private final int?  It's 
currently marked as non-final, which makes no sense because it's never updated. 
 It's a long shot, but setting a final instance variable in a constructor gives 
a stronger concurrency / visibility guarantee than setting a non-final instance 
var.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-25 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257312#comment-15257312
 ] 

Scott Blum commented on SOLR-9030:
--

BTW, can you paste in the beast command you're using?  I can try it also.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258043#comment-15258043
 ] 

Mark Miller commented on SOLR-9030:
---

bq. Can you try making DocCollection.znodeVersion be private final int? It's 
currently marked as non-final, which makes no sense because it's never updated. 
It's a long shot, but setting a final instance variable in a constructor gives 
a stronger concurrency / visibility guarantee than setting a non-final instance 
var.

That should not be true, should it? An effectively immutable field should have 
the same thread safety as a final field?




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258071#comment-15258071
 ] 

Mark Miller commented on SOLR-9030:
---

Of course we should make that final anyway, I'm just trying to make sure I 
understand - we should be fine with effectively immutable because clusterState 
is published via volatile right? Final is required when you share an object 
across threads and don't safely publish it somehow.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-26 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258110#comment-15258110
 ] 

Shalin Shekhar Mangar commented on SOLR-9030:
-

You can reproduce using:
{code}
ant -Dtestcase=AsyncCallRequestStatusResponseTest -Dbeast.iters=50 beast
{code}

This particular assert is tripped when the following hold true:
# control_collection is in state format 1
# collection1 is in state format 2
# collection1 is created but not written to ZK -- znode version is -1
# downnode modifies at least two collections (in the above test, it modifies 
control_collection and collection1)
# maybeFlushBefore logic is tripped when modifying control_collection (because 
it is in a different state format) -- this flushes collection1 -- znode version 
is now 0
# now the change made by downnode to collection1 trips the assert because when 
'downnode' was executed, the returned ZkWriteCommand had collection1 with 
version -1 but now it is 0.
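
The sequence above can be replayed with a small simulation. This is only a 
sketch under simplifying assumptions: DownNodeRaceSketch is a hypothetical 
class that tracks one version number per collection, it does not mirror the 
real ZkStateWriter, and the starting version 3 for control_collection is 
arbitrary. It returns false at the point where the real code trips its assert.

```java
// Hypothetical simulation of the steps listed above; not Solr's ZkStateWriter.
class DownNodeRaceSketch {
    // Last znode version known for each collection; -1 = not yet written to ZK.
    private final java.util.Map<String, Integer> knownVersion = new java.util.HashMap<>();
    // Collections with updates waiting to be flushed.
    private final java.util.List<String> pending = new java.util.ArrayList<>();

    void register(String name, int version, boolean hasPendingWrite) {
        knownVersion.put(name, version);
        if (hasPendingWrite) pending.add(name);
    }

    int versionOf(String name) { return knownVersion.get(name); }

    // Writes every pending collection; a first write moves the version from -1 to 0.
    void flush() {
        for (String name : pending) {
            knownVersion.put(name, knownVersion.get(name) + 1);
        }
        pending.clear();
    }

    // flushFirst stands in for the maybeFlushBefore logic firing because the
    // state format changed. Returns false where the real assert would trip.
    boolean enqueue(String name, int versionSeenByCommand, boolean flushFirst) {
        if (flushFirst) flush();
        if (versionSeenByCommand != knownVersion.get(name)) return false;
        pending.add(name);
        return true;
    }

    public static void main(String[] args) {
        DownNodeRaceSketch writer = new DownNodeRaceSketch();
        writer.register("control_collection", 3, false);  // already in ZK
        writer.register("collection1", -1, true);         // created, not yet written

        // 'downnode' captures both versions before either update is enqueued.
        int controlSeen = writer.versionOf("control_collection");
        int coll1Seen = writer.versionOf("collection1");   // -1

        boolean first = writer.enqueue("control_collection", controlSeen, true);
        boolean second = writer.enqueue("collection1", coll1Seen, false);
        System.out.println("control ok=" + first + ", collection1 ok=" + second
            + ", collection1 version=" + writer.versionOf("collection1"));
    }
}
```

The second enqueue fails because the flush forced by control_collection has 
already moved collection1 from -1 to 0.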

One way to solve this problem, i.e. to avoid the overwriting, would be to use a 
sentinel value other than -1 for znodeVersion, because ZK treats -1 as matching 
any version in CAS. If we use, say, Integer.MAX_VALUE, then the logic to retry 
on version conflict will automatically take care of this problem and we can 
remove the assert too.
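
A rough sketch of the sentinel idea, assuming a simplified in-memory node: 
SentinelNode and UNKNOWN_VERSION are hypothetical names, and the real change 
would live in DocCollection and the state-writing code rather than in a class 
like this.

```java
// Hypothetical sketch of the sentinel proposal; not Solr's or ZooKeeper's API.
class SentinelNode {
    // Assumed sentinel meaning "version not known yet", as suggested above.
    static final int UNKNOWN_VERSION = Integer.MAX_VALUE;

    private String data = "";
    private int version = 0;

    // Returns true if the write was applied. As in ZK, -1 matches any version;
    // any other mismatch is reported as a conflict instead of overwriting.
    synchronized boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            return false;
        }
        data = newData;
        version++;
        return true;
    }

    synchronized int getVersion() { return version; }
    synchronized String getData() { return data; }
}

class SentinelDemo {
    public static void main(String[] args) {
        SentinelNode node = new SentinelNode();

        // The writer does not know the version, so it sends the sentinel.
        int expected = SentinelNode.UNKNOWN_VERSION;
        boolean applied = node.setData("update", expected);

        // The sentinel never matches, so the normal conflict path runs:
        // re-read the version and retry.
        if (!applied) {
            expected = node.getVersion();
            applied = node.setData("update", expected);
        }
        System.out.println("applied=" + applied + ", version=" + node.getVersion());
    }
}
```

With -1 the first write would have gone through unconditionally; with the 
sentinel it is turned into an ordinary version conflict that the existing 
retry logic can handle.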




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-26 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258560#comment-15258560
 ] 

Scott Blum commented on SOLR-9030:
--

The final keyword conveys special thread-safety semantics in the JMM.
https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.5

I don't know that this is actually the bug, but because DocCollection 
initializes znodeVersion to -1 before subsequently setting it to the 'correct' 
value, it's not out of the question that this could expose a data race that 
marking the field 'final' would have prevented.  We should make that change 
anyway.  Here's another description with a code example:

http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalRight

I don't think "effectively final" has any special meaning for instance fields; 
the compiler can infer that local variables are effectively final through 
static analysis, but the compiler can't know whether or not you might modify a 
non-final instance field via reflection.
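
For readers following along, the difference being discussed looks roughly like 
this. Both classes are hypothetical illustrations, not the actual 
DocCollection source. Under JLS 17.5, a thread that obtains a reference to a 
FinalVersion through a data race still sees the constructor's value, provided 
the reference did not escape during construction; for NonFinalVersion no such 
guarantee exists unless the object is safely published.

```java
// Hypothetical illustration; not the real DocCollection class.
final class FinalVersion {
    // Assigned exactly once in the constructor, so the final-field
    // semantics of the Java memory model apply.
    private final int znodeVersion;

    FinalVersion(int znodeVersion) { this.znodeVersion = znodeVersion; }

    int getZnodeVersion() { return znodeVersion; }
}

class NonFinalVersion {
    // The field initializer writes -1 first and the constructor overwrites it.
    // Without safe publication, another thread could in principle observe a
    // value other than the one passed to the constructor.
    private int znodeVersion = -1;

    NonFinalVersion(int znodeVersion) { this.znodeVersion = znodeVersion; }

    int getZnodeVersion() { return znodeVersion; }
}

class FinalFieldDemo {
    public static void main(String[] args) {
        // Single-threaded use behaves identically; the difference only
        // matters when instances are handed to other threads via a data race.
        System.out.println(new FinalVersion(7).getZnodeVersion());
        System.out.println(new NonFinalVersion(7).getZnodeVersion());
    }
}
```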




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-26 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258571#comment-15258571
 ] 

Scott Blum commented on SOLR-9030:
--

Nice digging!  Can I just say for the record, I'm not a huge fan of 
ZkStateWriter in its current formulation.  Would love to chat sometime about 
its design and how we could improve it.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259068#comment-15259068
 ] 

Mark Miller commented on SOLR-9030:
---

Like I said though, those thread-safety semantics apply when you share an 
object across threads without safe publication. We do safe publication in this 
case.

When DocCollection is shared across threads, it is published with a volatile 
(or an atomicreference now?), and any member fields of that volatile object 
will be fully initialized and sharable across threads as long as they are 
effectively immutable.

If we wanted to just share a DocCollection across threads and there was no 
memory barrier, then they would have to be final and take advantage of the 
special thread safety semantics in the JVM.
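
A minimal sketch of the safe-publication argument, using hypothetical classes 
(Snapshot and StateHolder are not Solr types). The write to the volatile field 
happens-before any later read that sees it, so everything the constructor 
wrote is visible to the reader even though the field is not final.

```java
// Hypothetical illustration of safe publication through a volatile field.
class Snapshot {
    // Non-final, but never modified after construction (effectively immutable).
    private int znodeVersion = -1;

    Snapshot(int znodeVersion) { this.znodeVersion = znodeVersion; }

    int getZnodeVersion() { return znodeVersion; }
}

class StateHolder {
    // The volatile write in publish() and the volatile read in get()
    // form the happens-before edge between the two threads.
    private volatile Snapshot current;

    void publish(Snapshot s) { current = s; }
    Snapshot get() { return current; }
}

class SafePublicationDemo {
    public static void main(String[] args) throws InterruptedException {
        StateHolder holder = new StateHolder();
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            Snapshot s;
            // Spin until the writer has published a snapshot.
            while ((s = holder.get()) == null) {
                Thread.yield();
            }
            seen[0] = s.getZnodeVersion();
        });
        reader.start();

        holder.publish(new Snapshot(7));   // construct fully, then publish
        reader.join();                     // join makes seen[0] visible here
        System.out.println("reader saw version " + seen[0]);
    }
}
```

If the holder field were a plain, non-volatile reference, the reader would 
have no such guarantee for the non-final field, which is the case where final 
becomes necessary.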




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-26 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259219#comment-15259219
 ] 

Scott Blum commented on SOLR-9030:
--

I did say it was a long shot. :)
Still, any reason for me not to do the 1-liner and make it final?




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262376#comment-15262376
 ] 

Mark Miller commented on SOLR-9030:
---

bq. Of course we should make that final anyway

I just wanted to make sure my understanding was correct. I was less sure when 
I first commented, going by memory, and had done a lot of reading by the time 
of my second comment. I am pretty confident in my assessment now. It's 
important because, if I was wrong, we should try to fix a bunch of faulty 
code, as I'm sure there is more code like this. As it is, final is better, but 
not required. You can count on an object published through a volatile field 
having its constructor run and all of its fields initialized, as if it all 
happened in a single thread.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-04-28 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263124#comment-15263124
 ] 

Scott Blum commented on SOLR-9030:
--

Glad you wrote back to clarify. This got me to do some reading as well, and I 
agree with your conclusions.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-05-04 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270465#comment-15270465
 ] 

Shalin Shekhar Mangar commented on SOLR-9030:
-

Yeah, you have said that before. I am all for simplifying and improving it, but 
I don't have an idea right now. I'd be happy to hear your thoughts.




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-05-04 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270875#comment-15270875
 ] 

Scott Blum commented on SOLR-9030:
--

ZkStateWriter is basically a write cache.  It should be much simpler than it 
is.  A few things that bug me in no particular order:

1) Tracking lastStateFormat / lastCollectionName and in general having a 
maybeFlushBefore / maybeFlushAfter makes no real sense to me.  If ZkStateWriter 
were capable of operating as a perfect write cache, the *content* of what's 
being written should never force a flush.  It should be able to just always 
keep queuing operations until the desired time delay is hit, or it's flushed 
from the outside.

2) ZkStateWriter's ClusterState liveNodes should probably be a view on 
ZkStateReader's ClusterState liveNodes.

3) ZkWriteCallback - the one place this is used is the Overseer 
stateUpdateQueue handling.  I think the way that loop works with ZkStateWriter 
could be done a little better.  Ideally, I would want to peek up to N children 
at a time from that queue, send them all through ZkStateWriter in succession, 
flush, then remove those N items from the stateUpdateQueue.  If the flush 
failed for some reason, it could return a count of items committed so we could 
remove that many items from the stateUpdateQueue.  It seems a little nuts to 
have a second workQueue in operation the way it is today.  I get that in some 
situations we'd end up doing more net cluster state writes, but I think we'd 
still do fewer net writes to ZK since we do so much queue management.
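
The loop proposed in point 3 can be sketched roughly as follows. This is only 
an illustration under stated assumptions, not the Overseer implementation: 
BatchDrainSketch, WriteCache, InMemoryCache, and drainOnce are hypothetical 
names, an in-memory Deque stands in for the stateUpdateQueue, and the cache 
simply records what it "wrote" instead of talking to ZooKeeper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class BatchDrainSketch {
    /** Hypothetical stand-in for a write cache such as ZkStateWriter. */
    interface WriteCache {
        void enqueue(String op);

        /** Writes the pending ops and returns how many were committed. */
        int flush();
    }

    /** In-memory cache that can be told to stop committing after failAfter ops. */
    static class InMemoryCache implements WriteCache {
        final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        int failAfter = Integer.MAX_VALUE;

        @Override
        public void enqueue(String op) {
            pending.add(op);
        }

        @Override
        public int flush() {
            int n = Math.min(pending.size(), failAfter);
            committed.addAll(pending.subList(0, n));
            pending.clear(); // uncommitted ops stay in the queue and are re-read later
            return n;
        }
    }

    /**
     * Peeks at up to maxBatch items, sends them through the cache, flushes,
     * and removes only as many items as the flush reported as committed.
     */
    static int drainOnce(Deque<String> queue, WriteCache cache, int maxBatch) {
        Iterator<String> it = queue.iterator();
        for (int i = 0; i < maxBatch && it.hasNext(); i++) {
            cache.enqueue(it.next()); // peek only: the item stays in the queue
        }
        int committedCount = cache.flush();
        for (int i = 0; i < committedCount; i++) {
            queue.removeFirst();
        }
        return committedCount;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(Arrays.asList("a", "b", "c", "d", "e"));
        InMemoryCache cache = new InMemoryCache();
        int done = drainOnce(queue, cache, 3);
        System.out.println(done + " committed, " + queue.size() + " left");
        // prints "3 committed, 2 left"
    }
}
```

Because items are removed only after the flush reports them committed, a 
partial failure leaves the unprocessed items at the head of the queue, which is 
the property that would make a separate work queue unnecessary in this sketch.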




[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-05-04 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271607#comment-15271607
 ] 

Shalin Shekhar Mangar commented on SOLR-9030:
-

Regarding #1, somebody still has to determine when to flush; maybe we just 
always do it time-based. In that case the logic can move (back) to Overseer. 
The flushing logic originally lived in Overseer, but it made things 
complicated, so I moved it out to ZkStateWriter to simplify the overseer loop.
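
A purely time-based flush decision could look something like the sketch below. 
It is an assumption-laden illustration rather than the ZkStateWriter or 
Overseer code: TimedFlushSketch and its methods are hypothetical names, the 
clock is injected so the behaviour is deterministic, and "flushing" just clears 
an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

public class TimedFlushSketch {
    private final long flushIntervalNanos;
    private final LongSupplier clock; // injected so the example is deterministic
    private final List<String> pending = new ArrayList<>();
    private long lastFlushNanos;
    private int flushCount;

    TimedFlushSketch(long flushIntervalNanos, LongSupplier clock) {
        this.flushIntervalNanos = flushIntervalNanos;
        this.clock = clock;
        this.lastFlushNanos = clock.getAsLong();
    }

    /** Queues an update; the content of the update never forces a flush. */
    void enqueue(String update) {
        pending.add(update);
        if (clock.getAsLong() - lastFlushNanos >= flushIntervalNanos) {
            flush();
        }
    }

    /** Can also be called from the outside to force a flush. */
    void flush() {
        if (!pending.isEmpty()) {
            pending.clear(); // stand-in for writing the batched state
            flushCount++;
        }
        lastFlushNanos = clock.getAsLong();
    }

    int pendingSize() {
        return pending.size();
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        TimedFlushSketch writer = new TimedFlushSketch(100L, () -> now[0]);
        writer.enqueue("a");
        now[0] = 50L;
        writer.enqueue("b");  // interval not reached yet: both updates stay queued
        now[0] = 100L;
        writer.enqueue("c");  // interval reached: all three are flushed together
        System.out.println(writer.flushCount() + " flush, " + writer.pendingSize() + " pending");
        // prints "1 flush, 0 pending"
    }
}
```

Whether this check lives in the writer or in the caller's loop is exactly the 
open question above; the sketch only shows that the decision needs nothing 
beyond the elapsed time.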

For #2 -- the ZkStateWriter's live nodes aren't used anywhere. It is only for 
correctness that I always copy over live nodes from the ZkStateReader. I don't 
mind doing this though.

For #3 -- I never liked ZkWriteCallback even though I wrote that bit. It was 
always a hack. I like your idea of peeking N items at a time.

Let's create separate issues to track these improvements.

> The 'downnode' command can trip asserts in ZkStateWriter or cause 
> BadVersionException in Overseer
> -
>
> Key: SOLR-9030
> URL: https://issues.apache.org/jira/browse/SOLR-9030
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: 6.1, master
>
> Attachments: SOLR-9030.patch
>
>



[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-05-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271650#comment-15271650
 ] 

ASF subversion and git services commented on SOLR-9030:
---

Commit c2662f24ac171c38aa17c0b7bbae0fd6b43652b5 in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c2662f2 ]

SOLR-9030: The 'downnode' overseer command can trip asserts in ZkStateWriter


> The 'downnode' command can trip asserts in ZkStateWriter or cause 
> BadVersionException in Overseer
> -
>
> Key: SOLR-9030
> URL: https://issues.apache.org/jira/browse/SOLR-9030
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: 6.1, master
>
> Attachments: SOLR-9030.patch, SOLR-9030.patch
>
>



[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-05-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271654#comment-15271654
 ] 

ASF subversion and git services commented on SOLR-9030:
---

Commit 827573b1a7bda2ae853f03c518f313e5992c1a7c in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=827573b ]

SOLR-9030: Added a code comment as to why we use Integer.MAX_VALUE instead of -1





[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-05-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272166#comment-15272166
 ] 

ASF subversion and git services commented on SOLR-9030:
---

Commit 29f69975022e6937e75091237e884fead444d07b in lucene-solr's branch 
refs/heads/branch_6x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=29f6997 ]

SOLR-9030: The 'downnode' overseer command can trip asserts in ZkStateWriter
(cherry picked from commit c2662f2)





[jira] [Commented] (SOLR-9030) The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer

2016-05-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272170#comment-15272170
 ] 

ASF subversion and git services commented on SOLR-9030:
---

Commit ee45e83439a69e67037413304c32e3caf0bfb1d2 in lucene-solr's branch 
refs/heads/branch_6x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ee45e83 ]

SOLR-9030: Added a code comment as to why we use Integer.MAX_VALUE instead of -1

(cherry picked from commit 827573b1a7bda2ae853f03c518f313e5992c1a7c)

