[jira] [Commented] (CASSANDRA-15448) Throttle the speed of merkletree row hash

2019-12-12 Thread maxwellguo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995282#comment-16995282
 ] 

maxwellguo commented on CASSANDRA-15448:


I don't mean that repair is inefficient. On some low-profile servers, running 
repair can affect the node, and the issues are all still unresolved. Changing 
the hash is one way to alleviate this problem, but I think it may be dangerous; 
SHA-256 has a very low probability of collision. 

> Throttle the speed of merkletree row hash 
> --
>
> Key: CASSANDRA-15448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15448
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair, Tool/nodetool
>Reporter: maxwellguo
>Assignee: maxwellguo
>Priority: Normal
>
> In our environment we have some low-profile servers (e.g. 4 cores, 8 GB of 
> memory), so when repair calculates the Merkle tree, the CPU cost can be very 
> high. We think repair can afford to take a long time, so throttling the 
> hashing speed may increase repair time but can make the server more stable.
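
One way to picture the throttle requested above is a byte budget applied while rows are hashed. The sketch below is only an illustration under stated assumptions: `ThrottledRowHasher`, its `bytesPerSecond` knob, and the choice of SHA-256 are hypothetical and are not taken from Cassandra's repair code.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical helper, not a Cassandra class: paces row hashing to a byte budget. */
final class ThrottledRowHasher {
    private final long bytesPerSecond;            // assumed tuning knob
    private final long startNanos = System.nanoTime();
    private long bytesHashed = 0;

    ThrottledRowHasher(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    /** Nanoseconds to pause so that totalBytes are hashed no faster than the budget. */
    static long requiredDelayNanos(long totalBytes, long bytesPerSecond, long elapsedNanos) {
        long earliestAllowedNanos = (long) ((double) totalBytes / bytesPerSecond * 1e9);
        return Math.max(0L, earliestAllowedNanos - elapsedNanos);
    }

    /** Feeds every row into one digest, pausing whenever hashing runs ahead of the budget. */
    byte[] hashRows(List<byte[]> rows) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
        for (byte[] row : rows) {
            digest.update(row);
            bytesHashed += row.length;
            long delay = requiredDelayNanos(bytesHashed, bytesPerSecond,
                                            System.nanoTime() - startNanos);
            if (delay > 0) {
                // parkNanos may wake early; the next row recomputes the delay
                // from cumulative totals, so pacing corrects itself.
                LockSupport.parkNanos(delay);
            }
        }
        return digest.digest();
    }
}
```

A lower `bytesPerSecond` trades longer hashing time for less CPU pressure, which is the trade-off the description accepts.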



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15429) Support NodeTool for in-jvm dtest

2019-12-12 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995177#comment-16995177
 ] 

Dinesh Joshi commented on CASSANDRA-15429:
--

Hi [~yifanc], thanks for the PR. I have left a few review comments.

> Support NodeTool for in-jvm dtest
> -
>
> Key: CASSANDRA-15429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15429
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Test/dtest
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> In-JVM dtest framework does not support nodetool as of now. This 
> functionality is wanted in some tests, e.g. constructing an end-to-end test 
> scenario that uses nodetool.






[jira] [Updated] (CASSANDRA-15429) Support NodeTool for in-jvm dtest

2019-12-12 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15429:
-
Status: Changes Suggested  (was: Review In Progress)

> Support NodeTool for in-jvm dtest
> -
>
> Key: CASSANDRA-15429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15429
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Test/dtest
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> In-JVM dtest framework does not support nodetool as of now. This 
> functionality is wanted in some tests, e.g. constructing an end-to-end test 
> scenario that uses nodetool.






[jira] [Updated] (CASSANDRA-15450) in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the wrong instance, it uses cluster generation when it should be using the instance id

2019-12-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15450:
--
Description: In AbstractCluster.uncaughtExceptions, we attempt to get the 
instance from the class loader and used the “generation”. This value is 
actually the cluster id, so causes tests to fail when multiple tests share the 
same JVM; it should be using the “id” field which represents the instance id 
relative to the cluster.  (was: In 
\{code}AbstractCluster.uncaughtExceptions\{code}, we attempt to get the 
instance from the class loader and used the “generation”. This value is 
actually the cluster id, so causes tests to fail when multiple tests share the 
same JVM; it should be using the “id” field which represents the instance id 
relative to the cluster.)

> in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the 
> wrong instance, it uses cluster generation when it should be using the 
> instance id
> ---
>
> Key: CASSANDRA-15450
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15450
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In AbstractCluster.uncaughtExceptions, we attempt to get the instance from 
> the class loader and used the “generation”. This value is actually the 
> cluster id, so causes tests to fail when multiple tests share the same JVM; 
> it should be using the “id” field which represents the instance id relative 
> to the cluster.
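
The mix-up described above can be illustrated with a small sketch. Everything in it is hypothetical: `LoaderInfo`, the lookup methods, the node names, and the 1-based numbering are assumptions made for illustration and are not the actual `AbstractCluster` code.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only; these types are hypothetical, not the in-jvm dtest classes. */
final class InstanceLookupSketch {
    /** Stand-in for what an instance's class loader knows about itself. */
    static final class LoaderInfo {
        final int generation; // identifies the cluster, per the issue text
        final int id;         // identifies the instance within its cluster (assumed 1-based)

        LoaderInfo(int generation, int id) {
            this.generation = generation;
            this.id = id;
        }
    }

    /** Buggy lookup: treats the cluster generation as if it were an instance id. */
    static String lookupByGeneration(List<String> instances, LoaderInfo loader) {
        return instances.get(loader.generation - 1);
    }

    /** Intended lookup: uses the instance id relative to the cluster. */
    static String lookupById(List<String> instances, LoaderInfo loader) {
        return instances.get(loader.id - 1);
    }

    /** Instances of the second cluster created in a shared JVM (assumed names). */
    static List<String> secondClusterInstances() {
        return Arrays.asList("node1", "node2", "node3");
    }
}
```

With `new LoaderInfo(2, 3)`, meaning the third instance of the second cluster in the JVM, `lookupByGeneration` picks "node2" while `lookupById` picks "node3". When only one cluster has been created, the two values can coincide for the first instance, which may be why the problem only shows up when multiple tests share a JVM.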






[jira] [Updated] (CASSANDRA-15450) in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the wrong instance, it uses cluster generation when it should be using the instance id

2019-12-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15450:
--
Description: In \{code}AbstractCluster.uncaughtExceptions\{code}, we 
attempt to get the instance from the class loader and used the “generation”. 
This value is actually the cluster id, so causes tests to fail when multiple 
tests share the same JVM; it should be using the “id” field which represents 
the instance id relative to the cluster.  (was: In 
AbstractCluster.uncaughtExceptions, we attempt to get the instance from the 
class loader and used the “generation”.  This value is actually the cluster id, 
so causes tests to fail when multiple tests share the same JVM; it should be 
using the “id” field which represents the instance id relative to the cluster.)

> in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the 
> wrong instance, it uses cluster generation when it should be using the 
> instance id
> ---
>
> Key: CASSANDRA-15450
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15450
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In \{code}AbstractCluster.uncaughtExceptions\{code}, we attempt to get the 
> instance from the class loader and used the “generation”. This value is 
> actually the cluster id, so causes tests to fail when multiple tests share 
> the same JVM; it should be using the “id” field which represents the instance 
> id relative to the cluster.






[jira] [Commented] (CASSANDRA-15450) in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the wrong instance, it uses cluster generation when it should be using the instance id

2019-12-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995117#comment-16995117
 ] 

David Capwell commented on CASSANDRA-15450:
---

[~drohrer] could you review as you were the one who reported the issue?
[~ifesdjeen] could you also review?

> in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the 
> wrong instance, it uses cluster generation when it should be using the 
> instance id
> ---
>
> Key: CASSANDRA-15450
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15450
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In AbstractCluster.uncaughtExceptions, we attempt to get the instance from 
> the class loader and used the “generation”.  This value is actually the 
> cluster id, so causes tests to fail when multiple tests share the same JVM; 
> it should be using the “id” field which represents the instance id relative 
> to the cluster.






[jira] [Updated] (CASSANDRA-15450) in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the wrong instance, it uses cluster generation when it should be using the instance id

2019-12-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15450:
--
Test and Documentation Plan: 
Patch located here: https://github.com/apache/cassandra/pull/397

Replicated the issue in IntelliJ by selecting {code}GossipSettlesTest{code} and 
{code}FailingRepairTest{code} and calling run. Before this patch, 
FailingRepairTest hangs until it times out because it never sees the JVM kill 
attempt (which went to the wrong host); after this patch, the correct host gets 
killed.
 Status: Patch Available  (was: Open)

> in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the 
> wrong instance, it uses cluster generation when it should be using the 
> instance id
> ---
>
> Key: CASSANDRA-15450
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15450
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In AbstractCluster.uncaughtExceptions, we attempt to get the instance from 
> the class loader and used the “generation”.  This value is actually the 
> cluster id, so causes tests to fail when multiple tests share the same JVM; 
> it should be using the “id” field which represents the instance id relative 
> to the cluster.






[jira] [Updated] (CASSANDRA-15450) in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the wrong instance, it uses cluster generation when it should be using the instance id

2019-12-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15450:
---
Labels: pull-request-available  (was: )

> in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the 
> wrong instance, it uses cluster generation when it should be using the 
> instance id
> ---
>
> Key: CASSANDRA-15450
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15450
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>
> In AbstractCluster.uncaughtExceptions, we attempt to get the instance from 
> the class loader and used the “generation”.  This value is actually the 
> cluster id, so causes tests to fail when multiple tests share the same JVM; 
> it should be using the “id” field which represents the instance id relative 
> to the cluster.






[jira] [Updated] (CASSANDRA-15450) in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the wrong instance, it uses cluster generation when it should be using the instance id

2019-12-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15450:
--
 Bug Category: Parent values: Code(13163)Level 1 values: Bug - Unclear 
Impact(13164)
   Complexity: Low Hanging Fruit
Discovered By: Unit Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the 
> wrong instance, it uses cluster generation when it should be using the 
> instance id
> ---
>
> Key: CASSANDRA-15450
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15450
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>
> In AbstractCluster.uncaughtExceptions, we attempt to get the instance from 
> the class loader and used the “generation”.  This value is actually the 
> cluster id, so causes tests to fail when multiple tests share the same JVM; 
> it should be using the “id” field which represents the instance id relative 
> to the cluster.






[jira] [Created] (CASSANDRA-15450) in-jvm dtest cluster uncaughtExceptions propagation of exception goes to the wrong instance, it uses cluster generation when it should be using the instance id

2019-12-12 Thread David Capwell (Jira)
David Capwell created CASSANDRA-15450:
-

 Summary: in-jvm dtest cluster uncaughtExceptions propagation of 
exception goes to the wrong instance, it uses cluster generation when it should 
be using the instance id
 Key: CASSANDRA-15450
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15450
 Project: Cassandra
  Issue Type: Bug
  Components: Test/dtest
Reporter: David Capwell
Assignee: David Capwell


In AbstractCluster.uncaughtExceptions, we attempt to get the instance from the 
class loader and used the “generation”.  This value is actually the cluster id, 
so causes tests to fail when multiple tests share the same JVM; it should be 
using the “id” field which represents the instance id relative to the cluster.






[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes

2019-12-12 Thread Jai Bheemsen Rao Dhanwada (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449:
--
Impacts: Clients  (was: None)

> Credentials out of sync after replacing the nodes
> -
>
> Key: CASSANDRA-15449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15449
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
> Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png
>
>
> Hello,
> We are seeing a strange issue: after replacing multiple C* nodes in our 
> clusters, we intermittently find that a few nodes don't have any credentials 
> and client queries fail.
> Here is the sequence of steps:
> 1. On a multi-DC C* cluster (12 nodes in each DC), we replaced all the nodes 
> in one DC.
> 2. To replace each node, we killed it and launched a new node with 
> {{-Dcassandra.replace_address=}}, then proceeded to the next node once the 
> new node was bootstrapped and CQL was enabled.
> 3. This process worked fine until, all of a sudden, our application started 
> failing with the errors below in the logs:
> {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc 
> has no SELECT permission on  or any of its parents at 
> com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
>  at 
> com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
>  at
> {quote}
> 4. At this stage we saw that 3 nodes in the cluster were taking zero traffic, 
> while the rest of the nodes were serving ~100 requests (metrics attached).
> 5. We suspected a credentials sync issue, so we manually synced the 
> credentials and restarted the nodes with 0 requests, which fixed the problem.
> Also, on a few C* nodes we see the exception below immediately after 
> bootstrap completes, and the process dies. Is this contributing to the 
> credentials issue?
> NOTE: The C* nodes with zero traffic and the nodes with the exception below 
> are not the same.
> {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - 
> Exception encountered during startup
>  java.lang.AssertionError: 
> org.apache.cassandra.exceptions.InvalidRequestException: Undefined name 
> salted_hash in selection clause
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at org.apache.cassandra.auth.Auth.setup(Auth.java:144) 
> ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:617)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) 
> [apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566)
>  [apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
> [apache-cassandra-2.1.16.jar:2.1.16]
>  Caused by: org.apache.cassandra.exceptions.InvalidRequestException: 
> Undefined name salted_hash in selection clause
>  at 
> org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  ... 7 common frames omitted
> {quote}
> We are not sure why this is happening. Is this a potential bug, or are there 
> any pointers for fixing the problem?
> C* Version: 2.1.16
>  Client: Datastax Java Driver.
>  system_auth RF: 3, dc-1:3 and dc-2:3






[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes

2019-12-12 Thread Jai Bheemsen Rao Dhanwada (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449:
--
Description: 
Hello,

We are seeing a strange issue: after replacing multiple C* nodes in our 
clusters, we intermittently find that a few nodes don't have any credentials 
and client queries fail.

Here is the sequence of steps:

1. On a multi-DC C* cluster (12 nodes in each DC), we replaced all the nodes in 
one DC.
2. To replace each node, we killed it and launched a new node with 
{{-Dcassandra.replace_address=}}, then proceeded to the next node once the new 
node was bootstrapped and CQL was enabled.
3. This process worked fine until, all of a sudden, our application started 
failing with the errors below in the logs:
{quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has 
no SELECT permission on  or any of its parents at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
 at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
 at
{quote}
4. At this stage we saw that 3 nodes in the cluster were taking zero traffic, 
while the rest of the nodes were serving ~100 requests (metrics attached).

5. We suspected a credentials sync issue, so we manually synced the credentials 
and restarted the nodes with 0 requests, which fixed the problem.

Also, on a few C* nodes we see the exception below immediately after bootstrap 
completes, and the process dies. Is this contributing to the credentials 
issue?
NOTE: The C* nodes with zero traffic and the nodes with the exception below 
are not the same.

{quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - 
Exception encountered during startup
 java.lang.AssertionError: 
org.apache.cassandra.exceptions.InvalidRequestException: Undefined name 
salted_hash in selection clause
 at 
org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at org.apache.cassandra.auth.Auth.setup(Auth.java:144) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) 
[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) 
[apache-cassandra-2.1.16.jar:2.1.16]
 at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
[apache-cassandra-2.1.16.jar:2.1.16]
 Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined 
name salted_hash in selection clause
 at 
org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 ... 7 common frames omitted
{quote}
We are not sure why this is happening. Is this a potential bug, or are there 
any pointers for fixing the problem?

C* Version: 2.1.16
 Client: Datastax Java Driver.
 system_auth RF: 3, dc-1:3 and dc-2:3

  was:
Hello,

We are seeing a strange issue: after replacing multiple C* nodes in our 
clusters, we intermittently find that a few nodes don't have any credentials 
and client queries fail.

Here is the sequence of steps:

1. On a multi-DC C* cluster (12 nodes in each DC), we replaced all the nodes in 
one DC.
2. To replace each node, we killed it and launched a new node with 
{{-Dcassandra.replace_address=}}, then proceeded to the next node once the new 
node was bootstrapped and CQL was enabled.
3. This process worked fine until, all of a sudden, our application started 
failing with the errors below in the logs:
{quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has 
no SELECT permission on  or any of its parents at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
 at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
 at
{quote}
4. At this stage we saw that 3 nodes in the cluster were taking zero traffic, 
while the rest of the nodes were serving ~100 requests (metrics attached).

 !Screen Shot 2019-12-12 at 11.13.52 AM.png! 
 5. We suspect some credentials sync issue and 

[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes

2019-12-12 Thread Jai Bheemsen Rao Dhanwada (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449:
--
Attachment: Screen Shot 2019-12-12 at 11.13.52 AM.png

> Credentials out of sync after replacing the nodes
> -
>
> Key: CASSANDRA-15449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15449
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
> Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png
>
>
> Hello,
> We are seeing a strange issue: after replacing multiple C* nodes in our 
> clusters, we intermittently find that a few nodes don't have any credentials 
> and client queries fail.
> Here is the sequence of steps:
> 1. On a multi-DC C* cluster (12 nodes in each DC), we replaced all the nodes 
> in one DC.
> 2. To replace each node, we killed it and launched a new node with 
> {{-Dcassandra.replace_address=}}, then proceeded to the next node once the 
> new node was bootstrapped and CQL was enabled.
> 3. This process worked fine until, all of a sudden, our application started 
> failing with the errors below in the logs:
> {quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc 
> has no SELECT permission on  or any of its parents at 
> com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
>  at 
> com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
>  at
> {quote}
> 4. At this stage we saw that 3 nodes in the cluster were taking zero traffic, 
> while the rest of the nodes were serving ~100 requests (metrics attached).
>  !Screen Shot 2019-12-12 at 11.13.52 AM.png! 
> 5. We suspected a credentials sync issue, so we manually synced the 
> credentials and restarted the nodes with 0 requests, which fixed the problem.
> Also, on a few C* nodes we see the exception below immediately after 
> bootstrap completes, and the process dies. Is this contributing to the 
> credentials issue?
> NOTE: The C* nodes with zero traffic and the nodes with the exception below 
> are not the same.
> {quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - 
> Exception encountered during startup
>  java.lang.AssertionError: 
> org.apache.cassandra.exceptions.InvalidRequestException: Undefined name 
> salted_hash in selection clause
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at org.apache.cassandra.auth.Auth.setup(Auth.java:144) 
> ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:617)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) 
> [apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566)
>  [apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
> [apache-cassandra-2.1.16.jar:2.1.16]
>  Caused by: org.apache.cassandra.exceptions.InvalidRequestException: 
> Undefined name salted_hash in selection clause
>  at 
> org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at 
> org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  ... 7 common frames omitted
> {quote}
> We are not sure why this is happening. Is this a potential bug, or are there 
> any pointers for fixing the problem?
> C* Version: 2.1.16
>  Client: Datastax Java Driver.
>  system_auth RF: 3, dc-1:3 and dc-2:3






[jira] [Updated] (CASSANDRA-15449) Credentials out of sync after replacing the nodes

2019-12-12 Thread Jai Bheemsen Rao Dhanwada (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jai Bheemsen Rao Dhanwada updated CASSANDRA-15449:
--
Description: 
Hello,

We are seeing a strange issue: after replacing multiple C* nodes in our 
clusters, we intermittently find that a few nodes don't have any credentials 
and client queries fail.

Here is the sequence of steps:

1. On a multi-DC C* cluster (12 nodes in each DC), we replaced all the nodes in 
one DC.
2. To replace each node, we killed it and launched a new node with 
{{-Dcassandra.replace_address=}}, then proceeded to the next node once the new 
node was bootstrapped and CQL was enabled.
3. This process worked fine until, all of a sudden, our application started 
failing with the errors below in the logs:
{quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has 
no SELECT permission on  or any of its parents at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
 at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
 at
{quote}
4. At this stage we saw that 3 nodes in the cluster were taking zero traffic, 
while the rest of the nodes were serving ~100 requests (metrics attached).

 !Screen Shot 2019-12-12 at 11.13.52 AM.png! 
5. We suspected a credentials sync issue, so we manually synced the credentials 
and restarted the nodes with 0 requests, which fixed the problem.

Also, on a few C* nodes we see the exception below immediately after bootstrap 
completes, and the process dies. Is this contributing to the credentials 
issue?
NOTE: The C* nodes with zero traffic and the nodes with the exception below 
are not the same.

{quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - 
Exception encountered during startup
 java.lang.AssertionError: 
org.apache.cassandra.exceptions.InvalidRequestException: Undefined name 
salted_hash in selection clause
 at 
org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at org.apache.cassandra.auth.Auth.setup(Auth.java:144) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) 
[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) 
[apache-cassandra-2.1.16.jar:2.1.16]
 at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
[apache-cassandra-2.1.16.jar:2.1.16]
 Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined 
name salted_hash in selection clause
 at 
org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 ... 7 common frames omitted
{quote}
We are not sure why this is happening. Is this a potential bug, or are there 
any pointers for fixing the problem?

C* Version: 2.1.16
 Client: Datastax Java Driver.
 system_auth RF: 3, dc-1:3 and dc-2:3

  was:
Hello,

We are seeing a strange issue where, after replacing multiple C* nodes from the 
clusters intermittently we see an issue where few nodes doesn't have any 
credentials and the client queries fail.

Here are the sequence of steps

1. on a Multi DC C* cluster(12 nodes in each DC), we replaced all the nodes in 
one DC. 
2. The approach we took to replace the nodes is kill one node and launch a new 
node with {{-Dcassandra.replace_address=}} and proceed with next node once the 
node is bootstrapped and CQL is enabled.
 3. This process works fine and all of a sudden, we started seeing our 
application started failing with the below errors in the logs
{quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has 
no SELECT permission on  or any of its parents at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
 at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
 at
{quote}
4. At this stage we see that 3 nodes in the cluster takes zero traffic, while 
rest of the nodes are serving ~100 requests. (attached the metrics)

!Screen Shot 2019-12-12 at 11.13.52 AM.png!
 5. 

[jira] [Created] (CASSANDRA-15449) Credentials out of sync after replacing the nodes

2019-12-12 Thread Jai Bheemsen Rao Dhanwada (Jira)
Jai Bheemsen Rao Dhanwada created CASSANDRA-15449:
-

 Summary: Credentials out of sync after replacing the nodes
 Key: CASSANDRA-15449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15449
 Project: Cassandra
  Issue Type: Bug
Reporter: Jai Bheemsen Rao Dhanwada
 Attachments: Screen Shot 2019-12-12 at 11.13.52 AM.png

Hello,

We are seeing a strange issue: after replacing multiple C* nodes in a cluster, 
a few nodes intermittently end up without any credentials and client queries 
fail.

Here is the sequence of steps:

1. On a multi-DC C* cluster (12 nodes in each DC), we replaced all the nodes in 
one DC. 
2. To replace each node, we killed it, launched a new node with 
{{-Dcassandra.replace_address=}}, and moved on to the next node once the new 
node had bootstrapped and CQL was enabled.
 3. This process worked fine until, all of a sudden, our application started 
failing with the errors below in the logs:
{quote}com.datastax.driver.core.exceptions.UnauthorizedException: User abc has 
no SELECT permission on  or any of its parents at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
 at 
com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
 at
{quote}
4. At this stage, 3 nodes in the cluster take zero traffic, while the rest of 
the nodes are serving ~100 requests (metrics attached).

!Screen Shot 2019-12-12 at 11.13.52 AM.png!
 5. We suspected a credentials sync issue, so we manually synced the credentials 
and restarted the nodes that had 0 requests, which fixed the problem.

Also, on a few C* nodes we see the exception below immediately after bootstrap 
completes, and the process dies. Is this contributing to the credentials 
issue?
NOTE: The C* nodes with zero traffic and the nodes with the exception below 
are not the same.

{quote}ERROR [main] 2019-12-12 05:34:40,412 CassandraDaemon.java:583 - 
Exception encountered during startup
 java.lang.AssertionError: 
org.apache.cassandra.exceptions.InvalidRequestException: Undefined name 
salted_hash in selection clause
 at 
org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:202)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at org.apache.cassandra.auth.Auth.setup(Auth.java:144) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:996)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) 
~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) 
[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) 
[apache-cassandra-2.1.16.jar:2.1.16]
 at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
[apache-cassandra-2.1.16.jar:2.1.16]
 Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Undefined 
name salted_hash in selection clause
 at 
org.apache.cassandra.cql3.statements.Selection.fromSelectors(Selection.java:292)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:1592)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at 
org.apache.cassandra.auth.PasswordAuthenticator.setup(PasswordAuthenticator.java:198)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 ... 7 common frames omitted
{quote}
We are not sure why this is happening. Is this a potential bug, or are there 
any pointers for fixing the problem?

C* Version: 2.1.16
 Client: Datastax Java Driver.
 system_auth RF: 3, dc-1:3 and dc-2:3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15210) Streaming with CDC does not honor cdc_enabled

2019-12-12 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994963#comment-16994963
 ] 

Jeremiah Jordan commented on CASSANDRA-15210:
-

[~aprudhomme] if your patch is ready for review, then hit the "submit patch" 
button above to let people know that.

> Streaming with CDC does not honor cdc_enabled
> -
>
> Key: CASSANDRA-15210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Feature/Change Data Capture
>Reporter: Andrew Prudhomme
>Assignee: Andrew Prudhomme
>Priority: Normal
>
> When SSTables are streamed for a CDC enabled table, the updates are processed 
> through the write path to ensure they are made available through the commit 
> log. However, currently only the CDC state of the table is checked. Since CDC 
> is enabled at both the node and table level, a node with CDC disabled (with 
> cdc_enabled: false) will unnecessarily send updates through the write path if 
> CDC is enabled on the table. This seems like an oversight.
> I'd imagine the fix would be something like
>  
> {code:java}
> -   hasCDC = cfs.metadata.params.cdc;
> +   hasCDC = cfs.metadata.params.cdc && 
> DatabaseDescriptor.isCDCEnabled();{code}
> in
> org.apache.cassandra.db.streaming.CassandraStreamReceiver (4)
> org.apache.cassandra.streaming.StreamReceiveTask (3.11)
>  
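The condition proposed in the description above can be sketched in isolation. This is only an illustration: `CdcStreamingCheck` and `needsWritePathForCdc` are hypothetical names, and the two booleans stand in for the table-level flag (`cfs.metadata.params.cdc`) and the node-level flag (`DatabaseDescriptor.isCDCEnabled()`) mentioned in the report.

```java
public class CdcStreamingCheck {
    /**
     * Hypothetical helper: streamed updates need to go through the write
     * path for CDC only when CDC is enabled on BOTH the node and the table.
     */
    static boolean needsWritePathForCdc(boolean nodeCdcEnabled, boolean tableCdcEnabled) {
        return nodeCdcEnabled && tableCdcEnabled;
    }

    public static void main(String[] args) {
        // Node has cdc_enabled: false, table has CDC on: the case from the report.
        System.out.println(needsWritePathForCdc(false, true));
        // Both levels enabled: the write path is required.
        System.out.println(needsWritePathForCdc(true, true));
    }
}
```

Checking only the table flag corresponds to ignoring the first argument, which is the behavior the report calls an oversight.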






[jira] [Updated] (CASSANDRA-15210) Streaming with CDC does not honor cdc_enabled

2019-12-12 Thread Jeremiah Jordan (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-15210:

 Bug Category: Parent values: Correctness(12982)Level 1 values: API / 
Semantic Implementation(12988)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Streaming with CDC does not honor cdc_enabled
> -
>
> Key: CASSANDRA-15210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Streaming, Feature/Change Data Capture
>Reporter: Andrew Prudhomme
>Assignee: Andrew Prudhomme
>Priority: Normal
>
> When SSTables are streamed for a CDC enabled table, the updates are processed 
> through the write path to ensure they are made available through the commit 
> log. However, currently only the CDC state of the table is checked. Since CDC 
> is enabled at both the node and table level, a node with CDC disabled (with 
> cdc_enabled: false) will unnecessarily send updates through the write path if 
> CDC is enabled on the table. This seems like an oversight.
> I'd imagine the fix would be something like
>  
> {code:java}
> -   hasCDC = cfs.metadata.params.cdc;
> +   hasCDC = cfs.metadata.params.cdc && 
> DatabaseDescriptor.isCDCEnabled();{code}
> in
> org.apache.cassandra.db.streaming.CassandraStreamReceiver (4)
> org.apache.cassandra.streaming.StreamReceiveTask (3.11)
>  






[jira] [Updated] (CASSANDRA-15295) Running into deadlock when do CommitLog initialization

2019-12-12 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15295:
-
Authors: Dinesh Joshi, Zephyr Guo  (was: Zephyr Guo)
  Fix Version/s: 4.0
  Since Version: 4.0
Source Control Link: 
https://github.com/apache/cassandra/commit/3a8300e0b86c4acfb7b7702197d36cc39ebe94bc
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Thanks for the patch [~gzh1992n] & review [~jrwest]!

> Running into deadlock when do CommitLog initialization
> --
>
> Key: CASSANDRA-15295
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Zephyr Guo
>Assignee: Zephyr Guo
>Priority: Normal
> Fix For: 4.0
>
> Attachments: image.png, jstack.log, pstack.log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a 
> long time.
>  I used jstack to see what happened. The main thread was stuck in 
> *AbstractCommitLogSegmentManager.awaitAvailableSegment*
>  !screenshot-1.png! 
> The strange thing is that the COMMIT-LOG-ALLOCATOR thread state was runnable, 
> but it was not actually running.  
>  !screenshot-2.png! 
> I then used pstack to troubleshoot and found COMMIT-LOG-ALLOCATOR blocked on 
> Java class initialization.
>   !screenshot-3.png! 
> This is clearly a deadlock. CommitLog waits for a CommitLogSegment while it is 
> initializing. At that moment, the CommitLog class is not yet initialized and 
> the main thread holds the class initialization lock. Meanwhile, 
> COMMIT-LOG-ALLOCATOR fails to create a CommitLogSegment with an exception and 
> calls *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR 
> blocks on this call because the CommitLog class is still initializing.
>  
>  
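The mechanism described above (a second thread calling a static method of a class whose static initializer is still running) can be reproduced in a small standalone program. This is a sketch, not Cassandra code: `ClassInitBlockingDemo`, `Log`, and `handleError` are made-up names, and the initializer uses a bounded `join` so the demo terminates instead of deadlocking.

```java
public class ClassInitBlockingDemo {
    static class Log {
        // True only if the helper thread finished while Log was still initializing.
        static final boolean helperFinishedDuringInit;

        static {
            // The helper calls a static method of Log. That call requires Log
            // to be fully initialized, so the helper blocks until this static
            // initializer returns.
            Thread helper = new Thread(new Runnable() {
                @Override
                public void run() {
                    Log.handleError();
                }
            }, "helper");
            helper.setDaemon(true);
            helper.start();

            boolean finished;
            try {
                // Bounded wait keeps the demo from hanging. Waiting here
                // without a timeout would be the deadlock from the report.
                helper.join(500);
                finished = !helper.isAlive();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                finished = false;
            }
            helperFinishedDuringInit = finished;
        }

        static void handleError() {
            // Intentionally empty: only the call itself matters.
        }
    }

    public static void main(String[] args) {
        System.out.println("helper finished during init: " + Log.helperFinishedDuringInit);
    }
}
```

The helper cannot complete before the initializer returns, so the flag stays `false`. Replacing the bounded `join` with an unbounded wait turns this into the hang seen on the stuck node.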






[jira] [Commented] (CASSANDRA-15447) in-jvm dtest support for subnets doesn't change seed provider subnet

2019-12-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994933#comment-16994933
 ] 

David Capwell commented on CASSANDRA-15447:
---

I modified 
org.apache.cassandra.distributed.test.DistributedReadWritePathTest#pagingTests 
to use a subnet. Without your patch it failed as described in the Jira; with 
your patch partially applied it passed.

PR on trunk LGTM; +1

> in-jvm dtest support for subnets doesn't change seed provider subnet
> 
>
> Key: CASSANDRA-15447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15447
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When using the `withSubnet` function on AbstractCluster.Builder, the 
> newly-selected subnet is never used when setting up the SeedProvider in the 
> constructor of InstanceConfig, which is hard-coded to 127.0.0.1. Because of 
> this, clusters with any subnet other than 0, and gossip enabled, cannot start 
> up as they have no seed provider in their subnet and what should be the seed 
> (instance 1) doesn't think it is the seed.
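A rough sketch of the idea behind the fix: derive the seed address from the selected subnet instead of hard-coding 127.0.0.1. The `127.0.<subnet>.<instance>` layout and the names `SubnetSeed`, `address`, and `seedFor` are assumptions made for illustration, not the actual in-jvm dtest API.

```java
public class SubnetSeed {
    /** Assumed address layout for illustration: 127.0.<subnet>.<instance>. */
    static String address(int subnet, int instance) {
        return "127.0." + subnet + "." + instance;
    }

    /** Instance 1 acts as the seed, so the seed must live in the same subnet. */
    static String seedFor(int subnet) {
        return address(subnet, 1);
    }

    public static void main(String[] args) {
        // Subnet 0 matches the previously hard-coded seed.
        System.out.println(seedFor(0));
        // Any other subnet needs a different seed address.
        System.out.println(seedFor(1));
    }
}
```

Under this assumed layout, a hard-coded seed only coincides with instance 1 when the subnet is 0, which matches the failure described above.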






[cassandra] branch trunk updated: Avoid deadlock during CommitLog initialization

2019-12-12 Thread djoshi
This is an automated email from the ASF dual-hosted git repository.

djoshi pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 3a8300e  Avoid deadlock during CommitLog initialization
3a8300e is described below

commit 3a8300e0b86c4acfb7b7702197d36cc39ebe94bc
Author: Zephyr Guo 
AuthorDate: Fri Oct 18 17:15:20 2019 -0700

Avoid deadlock during CommitLog initialization

patch by Zephyr Guo, Dinesh Joshi; reviewed by Jordan West and Dinesh Joshi 
for CASSANDRA-15295

Co-Authored-By: Zephyr Guo 
Co-Authored-By: Dinesh Joshi 
---
 .../cassandra/config/DatabaseDescriptor.java   |  18 
 .../commitlog/AbstractCommitLogSegmentManager.java |  10 +-
 .../db/commitlog/AbstractCommitLogService.java |   7 +-
 .../apache/cassandra/db/commitlog/CommitLog.java   |  56 ---
 .../apache/cassandra/service/CassandraDaemon.java  |   2 +
 .../cassandra/utils/JVMStabilityInspector.java |  20 +++-
 .../cassandra/distributed/impl/Instance.java   |   1 +
 .../CassandraIsolatedJunit4ClassRunner.java| 107 
 .../config/DatabaseDescriptorRefTest.java  |   7 ++
 test/unit/org/apache/cassandra/cql3/CQLTester.java |   2 +
 test/unit/org/apache/cassandra/db/ColumnsTest.java |   2 +
 .../apache/cassandra/db/SystemKeyspaceTest.java|   2 +
 .../commitlog/CommitLogInitWithExceptionTest.java  | 110 +
 .../cassandra/db/context/CounterContextTest.java   |   2 +
 .../apache/cassandra/db/lifecycle/HelpersTest.java |   2 +
 .../apache/cassandra/db/lifecycle/TrackerTest.java |   1 +
 .../apache/cassandra/db/lifecycle/ViewTest.java|   2 +
 .../apache/cassandra/dht/PartitionerTestCase.java  |   2 +
 .../apache/cassandra/dht/StreamStateStoreTest.java |   2 +
 .../apache/cassandra/gms/FailureDetectorTest.java  |   2 +
 .../org/apache/cassandra/gms/GossiperTest.java |   2 +
 .../org/apache/cassandra/gms/ShadowRoundTest.java  |   2 +
 .../sstable/format/SSTableFlushObserverTest.java   |   2 +
 .../cassandra/locator/AlibabaCloudSnitchTest.java  |   2 +
 .../cassandra/locator/CloudstackSnitchTest.java|   2 +
 .../apache/cassandra/locator/EC2SnitchTest.java|   2 +
 .../cassandra/locator/GoogleCloudSnitchTest.java   |   2 +
 .../metrics/HintedHandOffMetricsTest.java  |   2 +
 .../org/apache/cassandra/net/ConnectionTest.java   |   2 +
 .../org/apache/cassandra/net/HandshakeTest.java|   2 +
 .../apache/cassandra/net/MessagingServiceTest.java |   2 +
 .../net/OutboundConnectionSettingsTest.java|   2 +
 .../cassandra/net/OutboundConnectionsTest.java |   2 +
 .../org/apache/cassandra/service/RemoveTest.java   |   2 +
 .../service/StorageServiceServerTest.java  |   2 +
 .../cassandra/transport/IdleDisconnectTest.java|   4 +-
 .../concurrent/AbstractTransactionalTest.java  |   2 +
 .../apache/cassandra/stress/CompactionStress.java  |   2 +
 38 files changed, 372 insertions(+), 23 deletions(-)

diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java 
b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 02f5a70..3c184bd 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -51,6 +51,10 @@ import org.apache.cassandra.auth.IRoleManager;
 import org.apache.cassandra.config.Config.CommitLogSync;
 import 
org.apache.cassandra.config.EncryptionOptions.ServerEncryptionOptions.InternodeEncryption;
 import org.apache.cassandra.db.ConsistencyLevel;
+import org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager;
+import org.apache.cassandra.db.commitlog.CommitLog;
+import org.apache.cassandra.db.commitlog.CommitLogSegmentManagerCDC;
+import org.apache.cassandra.db.commitlog.CommitLogSegmentManagerStandard;
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.FSWriteError;
@@ -147,6 +151,10 @@ public class DatabaseDescriptor
 // turns some warnings into exceptions for testing
 private static final boolean strictRuntimeChecks = 
Boolean.getBoolean("cassandra.strict.runtime.checks");
 
+private static Function 
commitLogSegmentMgrProvider = c -> DatabaseDescriptor.isCDCEnabled()
+   ? new CommitLogSegmentManagerCDC(c, 
DatabaseDescriptor.getCommitLogLocation())
+   : new 
CommitLogSegmentManagerStandard(c, DatabaseDescriptor.getCommitLogLocation());
+
 public static void daemonInitialization() throws ConfigurationException
 {
 daemonInitialization(DatabaseDescriptor::loadConfig);
@@ -2968,4 +2976,14 @@ public class DatabaseDescriptor
 logger.info("Setting use_offheap_merkle_trees to {}", value);
 conf.use_offheap_merkle_trees = value;
 }
+
+public static 

[jira] [Updated] (CASSANDRA-15447) in-jvm dtest support for subnets doesn't change seed provider subnet

2019-12-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15447:
--
Reviewers: David Capwell, Yifan Cai  (was: Yifan Cai)

> in-jvm dtest support for subnets doesn't change seed provider subnet
> 
>
> Key: CASSANDRA-15447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15447
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When using the `withSubnet` function on AbstractCluster.Builder, the 
> newly-selected subnet is never used when setting up the SeedProvider in the 
> constructor of InstanceConfig, which is hard-coded to 127.0.0.1. Because of 
> this, clusters with any subnet other than 0, and gossip enabled, cannot start 
> up as they have no seed provider in their subnet and what should be the seed 
> (instance 1) doesn't think it is the seed.






[jira] [Updated] (CASSANDRA-15447) in-jvm dtest support for subnets doesn't change seed provider subnet

2019-12-12 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-15447:
--
Reviewers: Yifan Cai, Yifan Cai  (was: Yifan Cai)
   Yifan Cai, Yifan Cai
   Status: Review In Progress  (was: Patch Available)

> in-jvm dtest support for subnets doesn't change seed provider subnet
> 
>
> Key: CASSANDRA-15447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15447
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When using the `withSubnet` function on AbstractCluster.Builder, the 
> newly-selected subnet is never used when setting up the SeedProvider in the 
> constructor of InstanceConfig, which is hard-coded to 127.0.0.1. Because of 
> this, clusters with any subnet other than 0, and gossip enabled, cannot start 
> up as they have no seed provider in their subnet and what should be the seed 
> (instance 1) doesn't think it is the seed.






[jira] [Commented] (CASSANDRA-15447) in-jvm dtest support for subnets doesn't change seed provider subnet

2019-12-12 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994914#comment-16994914
 ] 

Yifan Cai commented on CASSANDRA-15447:
---

Thanks [~drohrer]. The PRs LGTM. 

One nit: the PR to trunk puts 2 parameters on the same line in a multi-line 
statement. According to the [code 
style|http://cassandra.apache.org/doc/latest/development/code_style.html], it 
should be 1 per line.

> in-jvm dtest support for subnets doesn't change seed provider subnet
> 
>
> Key: CASSANDRA-15447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15447
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When using the `withSubnet` function on AbstractCluster.Builder, the 
> newly-selected subnet is never used when setting up the SeedProvider in the 
> constructor of InstanceConfig, which is hard-coded to 127.0.0.1. Because of 
> this, clusters with any subnet other than 0, and gossip enabled, cannot start 
> up as they have no seed provider in their subnet and what should be the seed 
> (instance 1) doesn't think it is the seed.






[jira] [Assigned] (CASSANDRA-15447) in-jvm dtest support for subnets doesn't change seed provider subnet

2019-12-12 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai reassigned CASSANDRA-15447:
-

Assignee: Doug Rohrer

> in-jvm dtest support for subnets doesn't change seed provider subnet
> 
>
> Key: CASSANDRA-15447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15447
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When using the `withSubnet` function on AbstractCluster.Builder, the 
> newly-selected subnet is never used when setting up the SeedProvider in the 
> constructor of InstanceConfig, which is hard-coded to 127.0.0.1. Because of 
> this, clusters with any subnet other than 0, and gossip enabled, cannot start 
> up as they have no seed provider in their subnet and what should be the seed 
> (instance 1) doesn't think it is the seed.






[jira] [Updated] (CASSANDRA-15446) Per-thread stack size is too small on aarch64 CentOS

2019-12-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15446:
--
Platform: Java8,Java11,OpenJDK,Linux,ARM  (was: Java8,Java11,OpenJDK,Linux)

> Per-thread stack size is too small on aarch64 CentOS
> 
>
> Key: CASSANDRA-15446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15446
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config, Local/Startup and Shutdown
>Reporter: Heming Fu
>Assignee: Heming Fu
>Priority: Normal
> Fix For: 3.11.5, 2.1.x, 2.2.x, 3.0.x
>
>
> Hi all,
> I found an issue when starting Cassandra on aarch64 CentOS 7.6; there are no 
> errors on Ubuntu. I could increase -Xss in jvm.options to work around it, but 
> the same issue also prevents Cassandra's Docker images from Docker Hub from 
> running containers on this OS.
> My current environment and the root cause of this issue are shown below.
> *Error*
> The stack size specified is too small, Specify at least 328k
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> *Version*
> Cassandra 2.1.21 2.2.15 3.0.19 3.11.5 
> *Environment*
> $ lscpu
> Architecture: aarch64
> Byte Order: Little Endian
> $ uname -m
> aarch64
> $ java -version
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (build 1.8.0_181-b13)
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> [root@localhost apache-cassandra-3.11.5]# cat /etc/os-release
> $ cat /etc/os-release
> NAME="CentOS Linux"
> VERSION="7 (AltArch)"
> ID="centos"
> ID_LIKE="rhel fedora"
> VERSION_ID="7"
> PRETTY_NAME="CentOS Linux 7 (AltArch)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:centos:centos:7"
> HOME_URL="https://www.centos.org/"
> BUG_REPORT_URL="https://bugs.centos.org/"
> *Root Cause*
> According to the openjdk-1.8.0 source code, the minimum stack size is 
> calculated from StackYellowPages, StackRedPages, StackShadowPages, and the OS 
> page size. Among those parameters, *the default OS page size of aarch64 
> CentOS 7.6 is 64K, whereas aarch64 Ubuntu 18.04 and x86 CentOS both use 4K.*
> Because of this difference, the JVM needs a 164K per-thread stack on aarch64 
> Ubuntu 18.04 but 328K on aarch64 CentOS 7.6.
> The formula is 
> os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
>  (size_t)(StackYellowPages+StackRedPages+StackShadowPages) * 
> Linux::page_size() +
>  (2*BytesPerWord COMPILER2_PRESENT(+1)) * Linux::vm_default_page_size());
> *Parameters on aarch64 CentOS7.6*
> intx StackRedPages = 1 
>  intx StackShadowPages = 1 
>  intx StackYellowPages = 1 
> pageSize 64K
> BytesPerWord 8
> vm_default_page_size 8K
> As a result, we have min_stack_allowed = (1 + 1 + 1) * 64K + (2 * 8 + 1) * 8K 
> = 328K
>  
> I have seen similar issues reported for specific architectures, but without a 
> root cause analysis. I hope this helps you decide on a proper stack size for 
> all common OSes.
> If you have any suggestions, please let me know.
>  
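The arithmetic in the description above can be checked with a few lines of Java. This sketch only evaluates the second argument of the quoted `MAX2(...)` expression with the parameter values listed for aarch64 CentOS 7.6; the class and method names are made up, all sizes are in KiB, and it assumes the `COMPILER2_PRESENT(+1)` term applies, as the report's own calculation does.

```java
public class MinStackSize {
    /** Evaluates the quoted formula; every size is expressed in KiB. */
    static long minStackAllowedKb(int yellowPages, int redPages, int shadowPages,
                                  long pageSizeKb, int bytesPerWord,
                                  long vmDefaultPageSizeKb) {
        // Guard and shadow pages scale with the OS page size.
        long guardKb = (long) (yellowPages + redPages + shadowPages) * pageSizeKb;
        // (2 * BytesPerWord + 1) pages of the VM default page size,
        // assuming the C2 compiler term (+1) is present.
        long extraKb = (2L * bytesPerWord + 1) * vmDefaultPageSizeKb;
        return guardKb + extraKb;
    }

    public static void main(String[] args) {
        // Values reported for aarch64 CentOS 7.6: 1 + 1 + 1 pages of 64K,
        // BytesPerWord = 8, vm_default_page_size = 8K.
        System.out.println(minStackAllowedKb(1, 1, 1, 64, 8, 8) + "K");
    }
}
```

With these inputs the guard pages contribute 192K and the remaining term 136K, which gives the 328K from the error message.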






[jira] [Commented] (CASSANDRA-15448) Throttle the speed of merkletree row hash

2019-12-12 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994898#comment-16994898
 ] 

David Capwell commented on CASSANDRA-15448:
---

Do you have any profiles or GC logs that show where most of the time is spent? 
There is some work going on to make repair more efficient: work to remove 
allocations (less GC), work to speed up interval trees (see CASSANDRA-15397), 
a change to the hash implementation (see CASSANDRA-15294), etc. With all of 
those applied, I wonder how your systems would be affected.

> Throttle the speed of merkletree row hash 
> --
>
> Key: CASSANDRA-15448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15448
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair, Tool/nodetool
>Reporter: maxwellguo
>Assignee: maxwellguo
>Priority: Normal
>
> In our environment we have some low-profile servers (for example, 4 cores and 
> 8 GB of memory), so calculating the Merkle tree during repair can consume a 
> lot of CPU. Since repair is allowed to take a long time, throttling the 
> hashing speed may increase repair time but can make the server more stable.
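One simple way to picture such a throttle is to compute, after each batch of hashed bytes, how long the hashing thread should pause to stay under a configured rate. The sketch below is an assumption-laden illustration, not the proposed patch: `HashThrottle`, `pauseNanos`, and the bytes-per-second cap are all hypothetical.

```java
public class HashThrottle {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /**
     * Returns how many nanoseconds to pause so that hashing bytesHashed bytes
     * takes at least as long as it would at maxBytesPerSecond.
     * Assumes per-batch accounting, so bytesHashed stays small enough that
     * the multiplication below does not overflow a long.
     */
    static long pauseNanos(long bytesHashed, long elapsedNanos, long maxBytesPerSecond) {
        // Time the batch should have taken at the capped rate.
        long targetNanos = bytesHashed * NANOS_PER_SECOND / maxBytesPerSecond;
        // Pause only if the batch finished faster than the cap allows.
        return Math.max(0L, targetNanos - elapsedNanos);
    }

    public static void main(String[] args) {
        // 10 bytes hashed in 0.1 s with a 50 B/s cap: pause for another 0.1 s.
        System.out.println(pauseNanos(10, 100_000_000L, 50));
        // The same batch taking 0.3 s is already under the cap: no pause.
        System.out.println(pauseNanos(10, 300_000_000L, 50));
    }
}
```

The pause is where the trade-off from the description shows up: CPU is left idle for other work, and the repair takes correspondingly longer.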






[jira] [Updated] (CASSANDRA-15446) Per-thread stack size is too small on aarch64 CentOS

2019-12-12 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15446:
--
 Bug Category: Parent values: Code(13163)Level 1 values: Bug - Unclear 
Impact(13164)
   Complexity: Low Hanging Fruit
  Component/s: Local/Startup and Shutdown
Discovered By: User Report
 Platform: Java8,Java11,OpenJDK,Linux  (was: OpenJDK,Linux)
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Per-thread stack size is too small on aarch64 CentOS
> 
>
> Key: CASSANDRA-15446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15446
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config, Local/Startup and Shutdown
>Reporter: Heming Fu
>Assignee: Heming Fu
>Priority: Normal
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.5
>
>
> Hi all,
> I found an issue when starting Cassandra on aarch64 CentOS 7.6; there are no 
> errors on Ubuntu. I could increase -Xss in jvm.options to work around it, but 
> the same issue also prevents Cassandra's Docker images from Docker Hub from 
> running containers on this OS.
> My current environment and the root cause of this issue are shown below.
> *Error*
> The stack size specified is too small, Specify at least 328k
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> *Version*
> Cassandra 2.1.21 2.2.15 3.0.19 3.11.5 
> *Environment*
> $ lscpu
> Architecture: aarch64
> Byte Order: Little Endian
> $ uname -m
> aarch64
> $ java -version
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (build 1.8.0_181-b13)
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> [root@localhost apache-cassandra-3.11.5]# cat /etc/os-release
> $ cat /etc/os-release
> NAME="CentOS Linux"
> VERSION="7 (AltArch)"
> ID="centos"
> ID_LIKE="rhel fedora"
> VERSION_ID="7"
> PRETTY_NAME="CentOS Linux 7 (AltArch)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:centos:centos:7"
> HOME_URL="https://www.centos.org/"
> BUG_REPORT_URL="https://bugs.centos.org/"
> *Root Cause*
> According to the openjdk-1.8.0 source code, the minimum stack size is 
> calculated from StackYellowPages, StackRedPages, StackShadowPages, and the OS 
> page size. Among those parameters, *the default OS page size of aarch64 
> CentOS 7.6 is 64K, whereas aarch64 Ubuntu 18.04 and x86 CentOS both use 4K.*
> Because of this difference, the JVM needs a 164K per-thread stack on aarch64 
> Ubuntu 18.04 but 328K on aarch64 CentOS 7.6.
> The formula is:
> os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
>  (size_t)(StackYellowPages+StackRedPages+StackShadowPages) * Linux::page_size() +
>  (2*BytesPerWord COMPILER2_PRESENT(+1)) * Linux::vm_default_page_size());
> *Parameters on aarch64 CentOS7.6*
> intx StackRedPages = 1
> intx StackShadowPages = 1
> intx StackYellowPages = 1
> pageSize = 64K
> BytesPerWord = 8
> vm_default_page_size = 8K
> As a result, we have min_stack_allowed = (1 + 1 + 1) * 64K + (2 * 8 + 1) * 8K 
> = 328K
>  
> I have seen similar issues reported for specific architectures, but without 
> a root cause analysis. I hope this helps you choose a proper stack size for 
> all common OSes.
> If you have any suggestions, please let me know.
>  
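
The arithmetic in the report above can be reproduced with a short sketch. It is
an illustration only: the function name `min_stack_allowed_kb` is hypothetical,
the parameter values are the ones quoted in the report for aarch64 CentOS 7.6,
and the `+1` term assumes a JVM build where `COMPILER2_PRESENT(+1)` applies. The
outer MAX2 against the previous minimum is left out.

```python
def min_stack_allowed_kb(yellow_pages, red_pages, shadow_pages,
                         page_size_kb, bytes_per_word=8,
                         vm_default_page_size_kb=8, compiler2_present=True):
    """Second argument of MAX2 in the quoted formula, in kilobytes.

    Hypothetical helper; defaults mirror the values listed in the report.
    """
    # Guard pages scale with the OS page size (64K on aarch64 CentOS 7.6).
    guard_kb = (yellow_pages + red_pages + shadow_pages) * page_size_kb
    # (2 * BytesPerWord COMPILER2_PRESENT(+1)) pages of vm_default_page_size.
    extra_pages = 2 * bytes_per_word + (1 if compiler2_present else 0)
    return guard_kb + extra_pages * vm_default_page_size_kb


# Values quoted in the report for aarch64 CentOS 7.6.
centos_kb = min_stack_allowed_kb(1, 1, 1, page_size_kb=64)
print(f"aarch64 CentOS 7.6: {centos_kb}K")  # prints "aarch64 CentOS 7.6: 328K"
```

The result matches the "Specify at least 328k" message in the error output, so
an -Xss value at or above this figure should pass the check on that platform.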






[jira] [Commented] (CASSANDRA-15447) in-jvm dtest support for subnets doesn't change seed provider subnet

2019-12-12 Thread Doug Rohrer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994788#comment-16994788
 ] 

Doug Rohrer commented on CASSANDRA-15447:
-

PRs for all four active branches are now available. The patch is essentially 
identical for all four, and requires very few changes.
Separately, it appears something is causing FailingRepairTest to fail when 
run with other dtests. This happens on trunk without my changes, but I'll take 
a look and see if I can figure out what's bleeding over from a previous test 
and causing it. 

> in-jvm dtest support for subnets doesn't change seed provider subnet
> 
>
> Key: CASSANDRA-15447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15447
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Doug Rohrer
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When using the `withSubnet` function on AbstractCluster.Builder, the 
> newly-selected subnet is never used when setting up the SeedProvider in the 
> constructor of InstanceConfig, which is hard-coded to 127.0.0.1. Because of 
> this, clusters with any subnet other than 0, and gossip enabled, cannot start 
> up as they have no seed provider in their subnet and what should be the seed 
> (instance 1) doesn't think it is the seed.
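
The mismatch described above can be illustrated with a minimal sketch. It is
not the in-jvm dtest code: the helper names are hypothetical, and the
`127.0.<subnet>.<node>` addressing scheme is an assumption inferred from the
description (subnet 0 with instance 1 gives the hard-coded 127.0.0.1).

```python
# Seed address the description says is hard-coded in InstanceConfig.
HARD_CODED_SEED = "127.0.0.1"


def instance_address(subnet, node_num):
    """Assumed addressing scheme: 127.0.<subnet>.<node_num>."""
    return f"127.0.{subnet}.{node_num}"


def seed_address(subnet):
    """Instance 1 is expected to act as the seed of its own subnet."""
    return instance_address(subnet, 1)


def seed_matches_subnet(subnet, seed):
    """True when the configured seed is instance 1 of the given subnet."""
    return seed == seed_address(subnet)


for subnet in (0, 1):
    ok = seed_matches_subnet(subnet, HARD_CODED_SEED)
    print(f"subnet {subnet}: expected seed {seed_address(subnet)}, "
          f"hard-coded seed matches: {ok}")
```

Under these assumptions only subnet 0 lines up with the hard-coded value;
deriving the seed from the selected subnet removes the mismatch.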






[jira] [Assigned] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-12-12 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko reassigned CASSANDRA-13938:
-

Assignee: Aleksey Yeschenko  (was: Alex Petrov)

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Aleksey Yeschenko
>Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java 
> (0x10274d4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 
> (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with 
> repair options (parallelism: parallel, primary range: false, incremental: 
> true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: 
> [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: 
> false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 
> for range [(3074457345618258602,-9223372036854775808], 
> (-9223372036854775808,-3074457345618258603], 
> (-3074457345618258603,3074457345618258602]] failed with error Stream failed
> [2017-10-05 14:32:07,048] null
> [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds
> error: Repair job has failed with the error message: [2017-10-05 
> 14:32:07,048] null
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error message: 
> [2017-10-05