[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory
[ https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629803#comment-16629803 ] Stephen Connolly commented on CASSANDRA-12704: -- IIRC it was hard enough getting interest to publish releases... but that was back in the mists of time > snapshot build never be able to publish to mvn artifactory > -- > > Key: CASSANDRA-12704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12704 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Attachments: 12704-trunk.txt > > > {code} > $ ant publish > {code} > works fine when property "release" is set, which publishes the binaries to > release Artifactory. > But for daily snapshot build, if "release" is set, it won't be snapshot build: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74 > if "release" is not set, it doesn't publish to snapshot Artifactory: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888 > I would suggest just removing the "if check" for target "publish". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629689#comment-16629689 ] sankalp kohli commented on CASSANDRA-12126: --- Why is the end result not correct? Did the second and third operations not succeed because the first one did not finish? Can you combine the example with the earlier comment, please? > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: sankalp kohli >Priority: Major > Labels: LWT > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629685#comment-16629685 ] Jeffrey F. Lukman commented on CASSANDRA-12126: --- To complete our scenario, here is the setup for our Cassandra: We run the scenario with Cassandra-v2.0.15. Here is the schema that we use: * CREATE KEYSPACE test WITH REPLICATION = \{'class': 'SimpleStrategy', 'replication_factor': 3}; * CREATE TABLE tests ( name text PRIMARY KEY, owner text, value_1 text, value_2 text, value_3 text); Here are the queries that we submit: * client request to node X (1st): UPDATE test.tests SET value_1 = 'A' WHERE name = 'testing' IF owner = 'user_1'; * client request to node Y (2nd): UPDATE test.tests SET value_2 = 'B' WHERE name = 'testing' IF value_1 = 'A'; * client request to node Z (3rd): UPDATE test.tests SET value_3 = 'C' WHERE name = 'testing' IF value_1 = 'A'; To confirm, when the bug is manifested, the end result will be: value_1='A', value_2=null, value_3=null [~jjirsa], regarding our tool, at this point, it is not open to the public. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: sankalp kohli >Priority: Major > Labels: LWT > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
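For concreteness, the three conditional updates above can be driven from the DataStax Java driver (3.x, matching the driver classes quoted elsewhere in this thread) roughly as sketched below. The keyspace, table, and statement text are taken from the comment; the contact point, the initial row owned by 'user_1', and the class name are illustrative assumptions, and in the actual reproduction each request is sent through a different coordinator (node X, Y, Z).

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class Cassandra12126Repro {
    public static void main(String[] args) {
        // Contact point is an assumption; in the reproduction each request goes to a
        // different coordinator (node X, Y, Z) rather than through one connection.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // Assumed setup step: the scenario presumes a row named 'testing' owned by 'user_1'.
            session.execute("INSERT INTO test.tests (name, owner) VALUES ('testing', 'user_1')");

            // The three LWT updates quoted in the comment.
            String[] updates = {
                "UPDATE test.tests SET value_1 = 'A' WHERE name = 'testing' IF owner = 'user_1'",
                "UPDATE test.tests SET value_2 = 'B' WHERE name = 'testing' IF value_1 = 'A'",
                "UPDATE test.tests SET value_3 = 'C' WHERE name = 'testing' IF value_1 = 'A'"
            };
            for (String cql : updates) {
                SimpleStatement stmt = new SimpleStatement(cql);
                stmt.setSerialConsistencyLevel(ConsistencyLevel.SERIAL); // Paxos-phase consistency
                ResultSet rs = session.execute(stmt);
                // wasApplied() reports whether the conditional update took effect.
                System.out.println(cql + " -> applied=" + rs.wasApplied());
            }

            // A serial (CAS) read of the partition shows which values are actually visible;
            // in the reported interleaving this returns value_1='A', value_2=null, value_3=null.
            SimpleStatement read = new SimpleStatement("SELECT * FROM test.tests WHERE name = 'testing'");
            read.setConsistencyLevel(ConsistencyLevel.SERIAL);
            System.out.println(session.execute(read).one());
        }
    }
}
{code}

A single-client run like this will not hit the race by itself; the value of the model checker described in the comments is in forcing the prepare/propose message orderings that make the interleaving happen.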
[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory
[ https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629667#comment-16629667 ] mck commented on CASSANDRA-12704: - +1 to the patch. Seemed odd when I saw it last week considering https://github.com/apache/cassandra/blame/trunk/build.xml#L100-L101. But maybe [~urandom] or [~stephenc] have input? > snapshot build never be able to publish to mvn artifactory > -- > > Key: CASSANDRA-12704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12704 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Attachments: 12704-trunk.txt > > > {code} > $ ant publish > {code} > works fine when property "release" is set, which publishes the binaries to > release Artifactory. > But for daily snapshot build, if "release" is set, it won't be snapshot build: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74 > if "release" is not set, it doesn't publish to snapshot Artifactory: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888 > I would suggest just removing the "if check" for target "publish". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node
[ https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629632#comment-16629632 ] Jeffrey F. Lukman commented on CASSANDRA-12438: --- Hi [~benedict], Following the bug description, we integrated our model checker with Cassandra-v3.7. We grabbed the code from the github repository. Regarding the schema, here is the initial schema that we prepared before injecting any queries in the model checker path execution: * CREATE KEYSPACE test WITH REPLICATION = \{'class': 'SimpleStrategy', 'replication_factor': 3}; * CREATE TABLE tests (name text PRIMARY KEY, owner text, value_1 text, value_2 text, value_3 text, value_4 text, value_5 text, value_6 text, value_7 text); Regarding the operations/queries, here are the details: * Client Request 1: INSERT INTO test.tests (name, owner, value_1, value_2, value_3, value_4, value_5, value_6, value_7) VALUES ('cass-12438', 'user_1', 'A1', 'B1', 'C1', 'D1', 'E1', 'F1', 'G1') IF NOT EXISTS; * Client Request 2: UPDATE test.tests SET value_1 = 'A2', owner = 'user_2' WHERE name = 'cass-12438' IF owner = 'user_1'; * Client Request 3: UPDATE test.tests SET value_1 = 'A3', owner = 'user_3' WHERE name = 'cass-12438' IF owner = 'user_2'; The messages generated by these queries are the ones that the model checker controls and reorders, so that we end up reproducing this bug. > Data inconsistencies with lightweight transactions, serial reads, and > rejoining node > > > Key: CASSANDRA-12438 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12438 > Project: Cassandra > Issue Type: Bug >Reporter: Steven Schaefer >Priority: Major > > I've run into some issues with data inconsistency in a situation where a > single node is rejoining a 3-node cluster with RF=3. I'm running 3.7. > I have a client system which inserts data into a table with around 7 columns, > named let's say A-F,id, and version. LWTs are used to make the inserts and > updates. > Typically what happens is there's an insert of values id, V_a1, V_b1, ... , > version=1, then another process will pick up rows with for example A=V_a1 and > subsequently update A to V_a2 and version=2. Yet another process will watch > for A=V_a2 to then make a second update to the same column, and set version > to 3, with end result being There's a > secondary index on this A column (there's only a few possible values for A so > not worried about the cardinality issue), though I've reproed with the new > SASI index too. > If one of the nodes is down, there's still 2 alive for quorum so inserts can > still happen. When I bring up the downed node, sometimes I get really weird > state back which ultimately crashes the client system that's talking to > Cassandra. > When reading I always select all the columns, but there is a conditional > where clause that A=V_a2 (e.g. SELECT * FROM table WHERE A=V_a2). This read > is for processing any rows with V_a2, and ultimately updating to V_a3 when > complete. While periodically polling for A=V_a2 it is of course possible for > the poller to to observe the old V_a2 value while the other parts of the > client system process and make the update to V_a3, and that's generally ok > because of the LWTs used for updates, an occassionaly wasted reprocessing run > ins't a big deal, but when reading at serial I always expect to get the > original values for columns that were never updated too. If a paxos update is > in progress then I expect that completed before its value(s) returned. 
But > instead, the read seems to be seeing the partial commit of the LWT, returning > the old V_2a value for the changed column, but no values whatsoever for the > other columns. From the example above, instead of getting , version=3>, or even the older (either of > which I expect and are ok), I get only , so the rest of > the columns end up null, which I never expect. However this isn't persistent, > Cassandra does end up consistent, which I see via sstabledump and cqlsh after > the fact. > In my client system logs I record the insert / updates, and this > inconsistency happens around the same time as the update from V_a2 to V_a3, > hence my comment about Cassandra seeing a partial commit. So that leads me to > suspect that perhaps due to the where clause in my read query for A=V_a2, > perhaps one of the original good nodes already has the new V_a3 value, so it > doesn't return this row for the select query, but the other good node and the > one that was down still have the old value V_a2, so those 2 nodes return what > they have. The one that was down doesn't yet have the original insert, just > the update from V_a1 -> V_a2 (again I suspect, it's not been easy to verify), >
[jira] [Comment Edited] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node
[ https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629434#comment-16629434 ] Jeffrey F. Lukman edited comment on CASSANDRA-12438 at 9/27/18 1:38 AM: Hi all, Our team from UCARE University of Chicago, have been able to reproduce this bug consistently with our model checker. Here are the workload and scenario of the bug: Workload: 3 node-cluster (let's call them node X, Y, and Z), 1 Crash, 1 Reboot events, with 3 client requests (where node X will be the coordinator node for all client requests. Scenario: # Start the 3 nodes and setup the CONSISTENCY = ONE. # Inject client request 1 as described in this bug description: Insert (along with many others) # But before any PREPARE messages have been sent by the node X, node Z has crashed. # Client request 1 is successfully committed in node X and Y. # Reboot node Z. # Inject client request 2 & 3 as described in this bug description: Update (along with others for which A=V_a1) Update (along with many others for which A=V_a2) (**Although Update 3 can also be ignored if we want to simplify the bug scenario) # If only client request-2 that finished, then we expect to see: If the client request-2 and then client request-3 are committed, then we expect to see: The very least possibility is if both client request-2 & -3 failed and they reached timeout, then we expect to see: # But our model checker shows that, if we do a read request to node Z, then we will see: // some fields are null But if we do a read request to node X or Y, then we will get a complete result. (or as expected in step 7) Which means we end up in an inconsistency view among the nodes (X and Y are different from Z). If we run this scenario with CONSISTENCY.ALL we will not see this bug to happen. We are happy to assist you guys to debug this issue. was (Author: jeffreyflukman): Hi all, Our team from UCARE University of Chicago, have been able to reproduce this bug consistently with our model checker. Here are the workload and scenario of the bug: Workload: 3 node-cluster (let's call them node X, Y, and Z), 1 Crash, 1 Reboot events, with 3 client requests (where node X will be the coordinator node for all client requests. Scenario: # Start the 3 nodes and setup the CONSISTENCY = ONE. # Inject client request 1 as described in this bug description: Insert (along with many others) # But before any PREPARE messages have been sent by the node X, node Z has crashed. # Client request 1 is successfully committed in node X and Y. # Reboot node Z. # Inject client request 2 & 3 as described in this bug description: Update (along with others for which A=V_a1) Update (along with many others for which A=V_a2) (**Although Update 3 can also be ignored if we want to simplify the bug scenario) # If client request-2 finished first without being interfered by client request-3, then we expect to see: If the client request-3 interfere client request-2 or is executed before client request-2 for any reason, then we expect to see: # But our model checker shows that, if we do a read request to node Z, then we will see: // some fields are null But if we do a read request to node X or Y, then we will get a complete result. Which means we end up in an inconsistency view among the nodes. If we run this scenario with CONSISTENCY.ALL we will not see this bug to happen. We are happy to assist you guys to debug this issue. 
> Data inconsistencies with lightweight transactions, serial reads, and > rejoining node > > > Key: CASSANDRA-12438 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12438 > Project: Cassandra > Issue Type: Bug >Reporter: Steven Schaefer >Priority: Major > > I've run into some issues with data inconsistency in a situation where a > single node is rejoining a 3-node cluster with RF=3. I'm running 3.7. > I have a client system which inserts data into a table with around 7 columns, > named let's say A-F,id, and version. LWTs are used to make the inserts and > updates. > Typically what happens is there's an insert of values id, V_a1, V_b1, ... , > version=1, then another process will pick up rows with for example A=V_a1 and > subsequently update A to V_a2 and version=2. Yet another process will watch > for A=V_a2 to then make a second update to the same column, and set version > to 3, with end result being There's a > secondary index on this A column (there's only a few possible values for A so > not worried about the cardinality issue), though I've reproed with the new > SASI index too. > If one of the nodes is down, there's still 2 alive for quorum so inserts can > still happen. When I bring up the downed
[jira] [Commented] (CASSANDRA-14702) Cassandra Write failed even when the required nodes to Ack(consistency) are up.
[ https://issues.apache.org/jira/browse/CASSANDRA-14702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629626#comment-16629626 ] Rohit Singh commented on CASSANDRA-14702: - No, only an upgrade was being performed, on one node after the other. > Cassandra Write failed even when the required nodes to Ack(consistency) are > up. > --- > > Key: CASSANDRA-14702 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14702 > Project: Cassandra > Issue Type: Bug >Reporter: Rohit Singh >Priority: Blocker > > Hi, > We have following configuration in our project for cassandra. > Total nodes in Cluster-5 > Replication Factor- 3 > Consistency- LOCAL_QUORUM > We get the writetimeout exception from cassandra even when 2 nodes are up and > why does stack trace says that 3 replica were required when consistency is 2? > Below is the exception we got:- > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout > during write query at consistency LOCAL_QUORUM (3 replica were required but > only 2 acknowledged the write) > at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:59) > at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) > at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:289) > at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:269) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
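The exception in the description surfaces at the client as a WriteTimeoutException, which carries the required/received replica counts quoted in the report. Below is a minimal sketch of issuing a LOCAL_QUORUM write and inspecting those counts with the DataStax Java driver 3.x; the contact point, keyspace, table, and statement are assumptions, since the ticket does not include the actual statements. (One situation in which the required count exceeds the plain quorum of 2 is a pending replica, e.g. during a bootstrap or host replacement, hence the question elsewhere in this thread about adding/removing/replacing hosts.)

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class LocalQuorumWriteExample {
    public static void main(String[] args) {
        // Contact point and schema are assumptions; with RF=3, LOCAL_QUORUM normally needs 2 acks.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            SimpleStatement write = new SimpleStatement("INSERT INTO ks.tbl (id, val) VALUES (1, 'x')");
            write.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            try {
                session.execute(write);
            } catch (WriteTimeoutException e) {
                // These getters expose the "3 replica were required but only 2 acknowledged"
                // numbers seen in the stack trace above.
                System.err.printf("required=%d received=%d type=%s%n",
                        e.getRequiredAcknowledgements(),
                        e.getReceivedAcknowledgements(),
                        e.getWriteType());
            }
        }
    }
}
{code}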
[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629542#comment-16629542 ] Jeff Jirsa edited comment on CASSANDRA-12126 at 9/26/18 11:47 PM: -- [~jeffreyflukman] thanks for this report. Suspect that most of the folks who are interested in this are already cc'd and received an email notification of your response, but explicitly tagging [~benedict] [~iamaleksey] and [~bdeggleston] as people who aren't yet watching it but may be interested. Also, very much interested in the model you mentioned - is that available publicly at this point? was (Author: jjirsa): [~jeffreyflukman] thanks for this report. Suspect that most of the folks who are interested in this are already cc'd and received an email notification of your response, but explicitly tagging [~benedict] [~iamaleksey] and [~bdeggleston] as people who aren't yet watching it but may be interested. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: sankalp kohli >Priority: Major > Labels: LWT > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629542#comment-16629542 ] Jeff Jirsa commented on CASSANDRA-12126: [~jeffreyflukman] thanks for this report. Suspect that most of the folks who are interested in this are already cc'd and received an email notification of your response, but explicitly tagging [~benedict] [~iamaleksey] and [~bdeggleston] as people who aren't yet watching it but may be interested. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: sankalp kohli >Priority: Major > Labels: LWT > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14792) skip TestRepair.test_dead_coordinator dtest in 4.0
[ https://issues.apache.org/jira/browse/CASSANDRA-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-14792: Resolution: Fixed Status: Resolved (was: Patch Available) committed as f4888c8976c2012e9de3b92dedb0ae1a3c984a4b, thanks > skip TestRepair.test_dead_coordinator dtest in 4.0 > -- > > Key: CASSANDRA-14792 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14792 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 4.0 > > > CASSANDRA-14763 changed the coordinator behavior to not cleanup old repair > sessions, so this test doesn't really make sense anymore. We should just skip > it in 4.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra-dtest git commit: skip TestRepair.test_dead_coordinator dtest in 4.0
Repository: cassandra-dtest
Updated Branches:
  refs/heads/master 96f90eee2 -> f4888c897

skip TestRepair.test_dead_coordinator dtest in 4.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/f4888c89
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/f4888c89
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/f4888c89

Branch: refs/heads/master
Commit: f4888c8976c2012e9de3b92dedb0ae1a3c984a4b
Parents: 96f90ee
Author: Blake Eggleston
Authored: Tue Sep 25 15:58:14 2018 -0700
Committer: Blake Eggleston
Committed: Wed Sep 26 16:45:39 2018 -0700

--
 repair_tests/repair_test.py | 33 +
 1 file changed, 1 insertion(+), 32 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/f4888c89/repair_tests/repair_test.py
--
diff --git a/repair_tests/repair_test.py b/repair_tests/repair_test.py
index d4a59a8..3264671 100644
--- a/repair_tests/repair_test.py
+++ b/repair_tests/repair_test.py
@@ -1087,35 +1087,7 @@ class TestRepair(BaseRepairTest):
         node2.start(wait_for_binary_proto=True)
         node2.repair()
 
-    def _cancel_open_ir_sessions(self, nodes):
-        cancelled = 0;
-        for node in nodes:
-            stdout = node.nodetool('repair_admin').stdout.strip()
-            if stdout.strip() == 'no sessions':
-                continue
-
-            for line in stdout.split('\n')[1:]:
-                columns = [c.strip() for c in line.split('|')]
-                session = columns[0]
-                coordinator = columns[3]
-                if coordinator == node.address_and_port():
-                    node.nodetool('repair_admin --cancel {}'.format(session))
-                    cancelled += 1
-
-        time.sleep(1)
-
-        # force all of the sstables out of pending repair
-        for node in nodes:
-            node.nodetool('compact')
-
-        for node in nodes:
-            stdout = node.nodetool('repair_admin').stdout.strip()
-            assert stdout.strip() == 'no sessions'
-
-        return cancelled
-
-
-
+    @since('2.1', max_version='4')
     def test_dead_coordinator(self):
         """
         @jira_ticket CASSANDRA-11824
@@ -1151,9 +1123,6 @@ class TestRepair(BaseRepairTest):
             node1.start(wait_for_binary_proto=True, wait_other_notice=True)
             logger.debug("running second repair")
             if cluster.version() >= "2.2":
-                # 4.0+ actually requires manual intervention here (CASSANDRA-14763)
-                if cluster.version() >= '4.0':
-                    assert self._cancel_open_ir_sessions(cluster.nodelist()) == 1
                 node1.repair()
             else:
                 node1.nodetool('repair keyspace1 standard1 -inc -par')

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node
[ https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629454#comment-16629454 ] Benedict edited comment on CASSANDRA-12438 at 9/26/18 9:56 PM: --- There's a lot of information here, that I haven't fully parsed, partially because of the pseudo-code (it's helpful to post actual schemas and operations/queries). However, if you are performing a QUORUM read of *just* {{V_a2/3}}, by itself (to any node; X, Y or Z), before querying node Z directly at ONE then it's probable you are encountering CASSANDRA-14593. The best workaround for this would be to always query all of the columns/rows you want to see updated atomically. Never select a subset. You could also patch your Cassandra instance to not persist the results of read-repair. The upcoming 4.0 release will have the ability to disable it for exactly this reason, but this probably won't be released for several months. was (Author: benedict): There's a lot of information here, that I haven't fully parsed, partially because of the pseudo-code (it's helpful to post actual schemas and operations/queries). However, if you are performing a QUORUM read of *just* {{V_a2/3}}, by itself (to any node; X, Y or Z), before querying node Z directly at ONE then it's probable you are encountering CASSANDRA-14593. The best workaround for this would be to always query all of the columns/rows you want to see updated atomically. Never select a subset. You could also patch your Cassandra instance to not persist the results of read-repair. The upcoming 4.0 release will have the ability to disable it for exactly this reason, but this probably won't be released for several months. > Data inconsistencies with lightweight transactions, serial reads, and > rejoining node > > > Key: CASSANDRA-12438 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12438 > Project: Cassandra > Issue Type: Bug >Reporter: Steven Schaefer >Priority: Major > > I've run into some issues with data inconsistency in a situation where a > single node is rejoining a 3-node cluster with RF=3. I'm running 3.7. > I have a client system which inserts data into a table with around 7 columns, > named let's say A-F,id, and version. LWTs are used to make the inserts and > updates. > Typically what happens is there's an insert of values id, V_a1, V_b1, ... , > version=1, then another process will pick up rows with for example A=V_a1 and > subsequently update A to V_a2 and version=2. Yet another process will watch > for A=V_a2 to then make a second update to the same column, and set version > to 3, with end result being There's a > secondary index on this A column (there's only a few possible values for A so > not worried about the cardinality issue), though I've reproed with the new > SASI index too. > If one of the nodes is down, there's still 2 alive for quorum so inserts can > still happen. When I bring up the downed node, sometimes I get really weird > state back which ultimately crashes the client system that's talking to > Cassandra. > When reading I always select all the columns, but there is a conditional > where clause that A=V_a2 (e.g. SELECT * FROM table WHERE A=V_a2). This read > is for processing any rows with V_a2, and ultimately updating to V_a3 when > complete. 
While periodically polling for A=V_a2 it is of course possible for > the poller to to observe the old V_a2 value while the other parts of the > client system process and make the update to V_a3, and that's generally ok > because of the LWTs used for updates, an occassionaly wasted reprocessing run > ins't a big deal, but when reading at serial I always expect to get the > original values for columns that were never updated too. If a paxos update is > in progress then I expect that completed before its value(s) returned. But > instead, the read seems to be seeing the partial commit of the LWT, returning > the old V_2a value for the changed column, but no values whatsoever for the > other columns. From the example above, instead of getting , version=3>, or even the older (either of > which I expect and are ok), I get only , so the rest of > the columns end up null, which I never expect. However this isn't persistent, > Cassandra does end up consistent, which I see via sstabledump and cqlsh after > the fact. > In my client system logs I record the insert / updates, and this > inconsistency happens around the same time as the update from V_a2 to V_a3, > hence my comment about Cassandra seeing a partial commit. So that leads me to > suspect that perhaps due to the where clause in my read query for A=V_a2, > perhaps one of the original good nodes already has the new V
[jira] [Commented] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node
[ https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629454#comment-16629454 ] Benedict commented on CASSANDRA-12438: -- There's a lot of information here, that I haven't fully parsed, partially because of the pseudo-code (it's helpful to post actual schemas and operations/queries). However, if you are performing a QUORUM read of *just* {{V_a2/3}}, by itself (to any node; X, Y or Z), before querying node Z directly at ONE then it's probable you are encountering CASSANDRA-14593. The best workaround for this would be to always query all of the columns/rows you want to see updated atomically. Never select a subset. You could also patch your Cassandra instance to not persist the results of read-repair. The upcoming 4.0 release will have the ability to disable it for exactly this reason, but this probably won't be released for several months. > Data inconsistencies with lightweight transactions, serial reads, and > rejoining node > > > Key: CASSANDRA-12438 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12438 > Project: Cassandra > Issue Type: Bug >Reporter: Steven Schaefer >Priority: Major > > I've run into some issues with data inconsistency in a situation where a > single node is rejoining a 3-node cluster with RF=3. I'm running 3.7. > I have a client system which inserts data into a table with around 7 columns, > named let's say A-F,id, and version. LWTs are used to make the inserts and > updates. > Typically what happens is there's an insert of values id, V_a1, V_b1, ... , > version=1, then another process will pick up rows with for example A=V_a1 and > subsequently update A to V_a2 and version=2. Yet another process will watch > for A=V_a2 to then make a second update to the same column, and set version > to 3, with end result being There's a > secondary index on this A column (there's only a few possible values for A so > not worried about the cardinality issue), though I've reproed with the new > SASI index too. > If one of the nodes is down, there's still 2 alive for quorum so inserts can > still happen. When I bring up the downed node, sometimes I get really weird > state back which ultimately crashes the client system that's talking to > Cassandra. > When reading I always select all the columns, but there is a conditional > where clause that A=V_a2 (e.g. SELECT * FROM table WHERE A=V_a2). This read > is for processing any rows with V_a2, and ultimately updating to V_a3 when > complete. While periodically polling for A=V_a2 it is of course possible for > the poller to to observe the old V_a2 value while the other parts of the > client system process and make the update to V_a3, and that's generally ok > because of the LWTs used for updates, an occassionaly wasted reprocessing run > ins't a big deal, but when reading at serial I always expect to get the > original values for columns that were never updated too. If a paxos update is > in progress then I expect that completed before its value(s) returned. But > instead, the read seems to be seeing the partial commit of the LWT, returning > the old V_2a value for the changed column, but no values whatsoever for the > other columns. From the example above, instead of getting , version=3>, or even the older (either of > which I expect and are ok), I get only , so the rest of > the columns end up null, which I never expect. However this isn't persistent, > Cassandra does end up consistent, which I see via sstabledump and cqlsh after > the fact. 
> In my client system logs I record the insert / updates, and this > inconsistency happens around the same time as the update from V_a2 to V_a3, > hence my comment about Cassandra seeing a partial commit. So that leads me to > suspect that perhaps due to the where clause in my read query for A=V_a2, > perhaps one of the original good nodes already has the new V_a3 value, so it > doesn't return this row for the select query, but the other good node and the > one that was down still have the old value V_a2, so those 2 nodes return what > they have. The one that was down doesn't yet have the original insert, just > the update from V_a1 -> V_a2 (again I suspect, it's not been easy to verify), > which would explain where comes from, that's all it > knows about. However since it's a serial quorum read, I'd expect some sort of > exception as neither of the remaining 2 nodes with A=V_a2 would be able to > come to a quorum on the values for all the columns, as I'd expect the other > good node to return > I know at some point nodetool repair should be run on this node, but I'm > concerned about a window of time between when the node comes back up and > repair
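In code, the suggested workaround amounts to doing single-partition serial reads that select every column the LWTs update together, rather than a column subset behind a secondary-index predicate. A sketch with the DataStax Java driver 3.x follows; the contact point, keyspace/table name, column names (following the A-F/id/version naming of the report), and the key value are assumptions.

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class SerialFullRowRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Select all columns that the LWTs update atomically, never a subset,
            // so the client cannot observe (and read-repair cannot persist) a sliced row.
            SimpleStatement read = new SimpleStatement(
                    "SELECT id, a, b, c, d, e, f, version FROM ks.tbl WHERE id = ?", 42L);
            // SERIAL is a single-partition read that completes any in-flight Paxos round first.
            read.setConsistencyLevel(ConsistencyLevel.SERIAL);
            Row row = session.execute(read).one();
            System.out.println(row);
        }
    }
}
{code}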
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629455#comment-16629455 ] Jeffrey F. Lukman commented on CASSANDRA-12126: --- Hi all, Our team from UCARE, University of Chicago, has been able to reproduce a similar manifestation of this bug consistently with our model checker. (Our scenario is different from what [~kohlisankalp] proposed.) Here are the workload and scenario of the bug: Workload: 3-node cluster, 3 client requests (but no crash event) Scenario: # Start the 3-node cluster and inject the 3 client requests to 3 different nodes (nodes X, Y, Z). # Node X sends its prepare messages (ballot number=1) to all nodes and all nodes accept it. # Node X sends its propose message to itself, causing its inProgress value to be "X". # Node Y sends its prepare messages (ballot number=2) to all nodes. This also invalidates the rest of node X's propose messages, because their ballot number is smaller than that of node Y's prepare messages. # In our scenario, the prepare response messages from nodes Y and Z arrive before the prepare response message from node X, so node Y does not learn that node X has already accepted value "X" (step 3). # But since the query of client request 2 has a condition, IF value_1='X', node Y does not continue on to sending propose messages to all nodes. Up to this point, none of the queries have been committed to the server. # Node Z now sends its prepare messages to all nodes and all nodes accept it. # In our scenario, node X now returns its response first, which also lets node Z know about its inProgress value "X". From here, node Z will propose and commit client request-1 (with value "X") instead of client request-3. # Therefore, we end up with client request-1 stored on the server, although client request-3 was the one reported as successful. We are ready to assist if any further information is needed. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: sankalp kohli >Priority: Major > Labels: LWT > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14702) Cassandra Write failed even when the required nodes to Ack(consistency) are up.
[ https://issues.apache.org/jira/browse/CASSANDRA-14702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629438#comment-16629438 ] Jeff Jirsa commented on CASSANDRA-14702: Were you adding/removing/replacing hosts at this time? > Cassandra Write failed even when the required nodes to Ack(consistency) are > up. > --- > > Key: CASSANDRA-14702 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14702 > Project: Cassandra > Issue Type: Bug >Reporter: Rohit Singh >Priority: Blocker > > Hi, > We have following configuration in our project for cassandra. > Total nodes in Cluster-5 > Replication Factor- 3 > Consistency- LOCAL_QUORUM > We get the writetimeout exception from cassandra even when 2 nodes are up and > why does stack trace says that 3 replica were required when consistency is 2? > Below is the exception we got:- > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout > during write query at consistency LOCAL_QUORUM (3 replica were required but > only 2 acknowledged the write) > at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:59) > at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) > at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:289) > at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:269) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14444) Got NPE when querying Cassandra 3.11.2
[ https://issues.apache.org/jira/browse/CASSANDRA-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-1: --- Fix Version/s: 3.11.x > Got NPE when querying Cassandra 3.11.2 > -- > > Key: CASSANDRA-1 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Ubuntu 14.04, JDK 1.8.0_171. > Cassandra 3.11.2 >Reporter: Xiaodong Xie >Assignee: Xiaodong Xie >Priority: Blocker > Labels: pull-request-available > Fix For: 3.11.x > > Time Spent: 10m > Remaining Estimate: 0h > > We just upgraded our Cassandra cluster from 2.2.6 to 3.11.2 > After upgrading, we immediately got exceptions in Cassandra like this one: > > {code} > ERROR [Native-Transport-Requests-1] 2018-05-11 17:10:21,994 > QueryMessage.java:129 - Unexpected error during query > java.lang.NullPointerException: null > at > org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:248) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:92) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.config.CFMetaData.decorateKey(CFMetaData.java:666) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.pager.PartitionRangeQueryPager.(PartitionRangeQueryPager.java:44) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.db.PartitionRangeReadCommand.getPager(PartitionRangeReadCommand.java:268) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.statements.SelectStatement.getPager(SelectStatement.java:475) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:288) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:118) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:255) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:240) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517) > [apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410) > [apache-cassandra-3.11.2.jar:3.11.2] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_171] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > 
[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.11.2.jar:3.11.2] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171] > {code} > > The table schema is like: > {code} > CREATE TABLE example.example_table ( > id bigint, > hash text, > json text, > PRIMARY KEY (id, hash) > ) WITH COMPACT STORAGE > {code} > > The query is something like: > {code} > "select * from example.example_table;" // (We do know this is bad practise, > and we are trying to fix that right now) > {code} > with fetch-size as 200, using DataStax Java driver. > This table contains about 20k rows. > > Actually, the fix is quite simple, > > {code} > --- a/src/java/org/apache/cassandra/service/pager/PagingState.java > +++ b/src/java/org/apache/cassandra/service/pager/PagingState.java > @@ -46,7 +46,7 @@ public class PagingState > public PagingState(ByteBuffer partitionKey, RowMark rowMark, int remaining, > int remainingInPartition) > { > - this.partitionKey = partitionKey; > + this.partitionKey = partitionKey == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER > : partitionKey; > this.rowMark = rowMark; > this.r
[jira] [Assigned] (CASSANDRA-14444) Got NPE when querying Cassandra 3.11.2
[ https://issues.apache.org/jira/browse/CASSANDRA-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa reassigned CASSANDRA-1: -- Assignee: Xiaodong Xie > Got NPE when querying Cassandra 3.11.2 > -- > > Key: CASSANDRA-1 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Ubuntu 14.04, JDK 1.8.0_171. > Cassandra 3.11.2 >Reporter: Xiaodong Xie >Assignee: Xiaodong Xie >Priority: Blocker > Labels: pull-request-available > Fix For: 3.11.x > > Time Spent: 10m > Remaining Estimate: 0h > > We just upgraded our Cassandra cluster from 2.2.6 to 3.11.2 > After upgrading, we immediately got exceptions in Cassandra like this one: > > {code} > ERROR [Native-Transport-Requests-1] 2018-05-11 17:10:21,994 > QueryMessage.java:129 - Unexpected error during query > java.lang.NullPointerException: null > at > org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:248) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:92) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.config.CFMetaData.decorateKey(CFMetaData.java:666) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.pager.PartitionRangeQueryPager.(PartitionRangeQueryPager.java:44) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.db.PartitionRangeReadCommand.getPager(PartitionRangeReadCommand.java:268) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.statements.SelectStatement.getPager(SelectStatement.java:475) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:288) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:118) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:255) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:240) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517) > [apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410) > [apache-cassandra-3.11.2.jar:3.11.2] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at > io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348) > [netty-all-4.0.44.Final.jar:4.0.44.Final] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_171] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > 
[apache-cassandra-3.11.2.jar:3.11.2] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.11.2.jar:3.11.2] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171] > {code} > > The table schema is like: > {code} > CREATE TABLE example.example_table ( > id bigint, > hash text, > json text, > PRIMARY KEY (id, hash) > ) WITH COMPACT STORAGE > {code} > > The query is something like: > {code} > "select * from example.example_table;" // (We do know this is bad practise, > and we are trying to fix that right now) > {code} > with fetch-size as 200, using DataStax Java driver. > This table contains about 20k rows. > > Actually, the fix is quite simple, > > {code} > --- a/src/java/org/apache/cassandra/service/pager/PagingState.java > +++ b/src/java/org/apache/cassandra/service/pager/PagingState.java > @@ -46,7 +46,7 @@ public class PagingState > public PagingState(ByteBuffer partitionKey, RowMark rowMark, int remaining, > int remainingInPartition) > { > - this.partitionKey = partitionKey; > + this.partitionKey = partitionKey == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER > : partitionKey; > this.rowMark = rowMark; >
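The failing query in the description is a paged full-table scan issued through the DataStax Java driver with a fetch size of 200; it is the paging-state round trip on the server side (the PagingState and PartitionRangeQueryPager constructors in the stack trace) that hits the NPE. A minimal sketch of such a scan is below; the contact point is an assumption, while the table name and fetch size come from the report.

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PagedFullScan {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            SimpleStatement scan = new SimpleStatement("SELECT * FROM example.example_table");
            // The driver pages through the ~20k rows 200 at a time, sending the
            // server-provided paging state back with each subsequent page request.
            scan.setFetchSize(200);
            for (Row row : session.execute(scan)) {
                // Iterating the ResultSet fetches further pages transparently.
                System.out.println(row.getLong("id") + " " + row.getString("hash"));
            }
        }
    }
}
{code}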
[jira] [Commented] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node
[ https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629434#comment-16629434 ] Jeffrey F. Lukman commented on CASSANDRA-12438: --- Hi all, Our team from UCARE, University of Chicago, has been able to reproduce this bug consistently with our model checker. Here are the workload and scenario of the bug: Workload: 3-node cluster (let's call the nodes X, Y, and Z), 1 crash and 1 reboot event, with 3 client requests (node X is the coordinator for all client requests). Scenario: # Start the 3 nodes and set CONSISTENCY = ONE. # Inject client request 1 as described in this bug description: Insert (along with many others) # But before node X has sent any PREPARE messages, node Z crashes. # Client request 1 is successfully committed on nodes X and Y. # Reboot node Z. # Inject client requests 2 & 3 as described in this bug description: Update (along with others for which A=V_a1) Update (along with many others for which A=V_a2) (Update 3 can also be ignored if we want to simplify the bug scenario.) # If client request-2 finishes first without interference from client request-3, then we expect to see: If client request-3 interferes with client request-2 or is executed before client request-2 for any reason, then we expect to see: # But our model checker shows that, if we do a read request to node Z, we will see: // some fields are null But if we do a read request to node X or Y, we will get a complete result. This means we end up with an inconsistent view among the nodes. If we run this scenario with CONSISTENCY.ALL we do not see this bug happen. We are happy to assist you in debugging this issue. > Data inconsistencies with lightweight transactions, serial reads, and > rejoining node > > > Key: CASSANDRA-12438 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12438 > Project: Cassandra > Issue Type: Bug >Reporter: Steven Schaefer >Priority: Major > > I've run into some issues with data inconsistency in a situation where a > single node is rejoining a 3-node cluster with RF=3. I'm running 3.7. > I have a client system which inserts data into a table with around 7 columns, > named let's say A-F, id, and version. LWTs are used to make the inserts and > updates. > Typically what happens is there's an insert of values id, V_a1, V_b1, ... , > version=1, then another process will pick up rows with for example A=V_a1 and > subsequently update A to V_a2 and version=2. Yet another process will watch > for A=V_a2 to then make a second update to the same column, and set version > to 3, with end result being There's a > secondary index on this A column (there are only a few possible values for A so > I'm not worried about the cardinality issue), though I've reproed with the new > SASI index too. > If one of the nodes is down, there's still 2 alive for quorum so inserts can > still happen. When I bring up the downed node, sometimes I get really weird > state back which ultimately crashes the client system that's talking to > Cassandra. > When reading I always select all the columns, but there is a conditional > where clause that A=V_a2 (e.g. SELECT * FROM table WHERE A=V_a2). This read > is for processing any rows with V_a2, and ultimately updating to V_a3 when > complete. 
While periodically polling for A=V_a2 it is of course possible for > the poller to observe the old V_a2 value while the other parts of the > client system process and make the update to V_a3, and that's generally ok > because of the LWTs used for updates; an occasionally wasted reprocessing run > isn't a big deal, but when reading at serial I always expect to get the > original values for columns that were never updated too. If a paxos update is > in progress then I expect it to complete before its value(s) are returned. But > instead, the read seems to be seeing the partial commit of the LWT, returning > the old V_a2 value for the changed column, but no values whatsoever for the > other columns. From the example above, instead of getting , version=3>, or even the older (either of > which I expect and are ok), I get only , so the rest of > the columns end up null, which I never expect. However this isn't persistent; > Cassandra does end up consistent, which I see via sstabledump and cqlsh after > the fact. > In my client system logs I record the inserts / updates, and this > inconsistency happens around the same time as the update from V_a2 to V_a3, > hence my comment about Cassandra seeing a partial commit. So that leads me to > suspect that perhaps due to the where clause in my read query for A=V_a2, > perhaps one of the original good
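For reference, here is a rough sketch of the insert/update/poll cycle described in this report, written against the DataStax Java driver the reporter mentions; the keyspace, table and column names are invented for illustration, and consistency/serial-consistency settings are omitted:

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Hypothetical reconstruction of the reported client workflow.
public class LwtWorkflowSketch
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks"))
        {
            // initial insert, version=1
            session.execute("INSERT INTO t (id, a, version) VALUES (1, 'V_a1', 1) IF NOT EXISTS");
            // a poller advances matching rows with an LWT, version=2
            session.execute("UPDATE t SET a = 'V_a2', version = 2 WHERE id = 1 IF a = 'V_a1'");
            // the read at issue: poll for rows to process next (uses the secondary index on a)
            session.execute("SELECT * FROM t WHERE a = 'V_a2'");
        }
    }
}
{code}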
[jira] [Commented] (CASSANDRA-14727) Transient Replication: EACH_QUORUM not implemented
[ https://issues.apache.org/jira/browse/CASSANDRA-14727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629192#comment-16629192 ] Alex Petrov commented on CASSANDRA-14727: - +1, thank you for expanding comments, too! I've left a couple of minor notes here: https://github.com/apache/cassandra/pull/275/files > Transient Replication: EACH_QUORUM not implemented > -- > > Key: CASSANDRA-14727 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14727 > Project: Cassandra > Issue Type: Improvement >Reporter: Benedict >Assignee: Benedict >Priority: Major > Fix For: 4.0 > > > Transient replication cannot presently handle EACH_QUORUM consistency; reads > and writes should currently fail, though without good error messages. Not > clear if this is acceptable for GA, since we cannot impose this limitation at > Keyspace declaration time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14792) skip TestRepair.test_dead_coordinator dtest in 4.0
[ https://issues.apache.org/jira/browse/CASSANDRA-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629176#comment-16629176 ] Alex Petrov edited comment on CASSANDRA-14792 at 9/26/18 5:34 PM: -- +1, thank you for fixing this one! For completeness, the test was failing with {code} java.lang.RuntimeException: java.lang.IllegalArgumentException: Invalid state transition FINALIZED -> FAILED at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:214) at org.apache.cassandra.net.MessageDeliveryTask.process(MessageDeliveryTask.java:92) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:54) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) {code} This happens only once in a while, since a completed task races with the cancel. We could improve the failure message on cancellation if we had a distinction between the failed and cancelled states, but that might not be worth it. was (Author: ifesdjeen): +1, thank you for fixing this one! > skip TestRepair.test_dead_coordinator dtest in 4.0 > -- > > Key: CASSANDRA-14792 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14792 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 4.0 > > > CASSANDRA-14763 changed the coordinator behavior to not cleanup old repair > sessions, so this test doesn't really make sense anymore. We should just skip > it in 4.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14792) skip TestRepair.test_dead_coordinator dtest in 4.0
[ https://issues.apache.org/jira/browse/CASSANDRA-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629176#comment-16629176 ] Alex Petrov commented on CASSANDRA-14792: - +1, thank you for fixing this one! > skip TestRepair.test_dead_coordinator dtest in 4.0 > -- > > Key: CASSANDRA-14792 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14792 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 4.0 > > > CASSANDRA-14763 changed the coordinator behavior to not cleanup old repair > sessions, so this test doesn't really make sense anymore. We should just skip > it in 4.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14767) Embedded cassandra not working after jdk10 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14767: --- Priority: Minor (was: Blocker) > Embedded cassandra not working after jdk10 upgrade > -- > > Key: CASSANDRA-14767 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14767 > Project: Cassandra > Issue Type: Bug >Reporter: parthiban >Priority: Minor > > Embedded cassandra not working after jdk10 upgrade. Could some one help me on > this. > Cassandra config: > {{try \{ EmbeddedCassandraServerHelper.startEmbeddedCassandra(); }catch > (Exception e) \{ LOGGER.error(" CommonConfig ", " cluster()::Exception while > creating cluster ", e); System.setProperty("cassandra.config", > "cassandra.yaml"); DatabaseDescriptor.daemonInitialization(); > EmbeddedCassandraServerHelper.startEmbeddedCassandra(); } Cluster cluster = > Cluster.builder() > .addContactPoints(environment.getProperty(TextToClipConstants.CASSANDRA_CONTACT_POINTS)).withPort(Integer.parseInt(environment.getProperty(TextToClipConstants.CASSANDRA_PORT))).build(); > Session session = cluster.connect(); > session.execute(KEYSPACE_CREATION_QUERY); > session.execute(KEYSPACE_ACTIVATE_QUERY); }} > > {{build.gradle}} > {{buildscript \{ ext { springBootVersion = '2.0.1.RELEASE' } repositories \{ > mavenCentral() mavenLocal() } dependencies \{ > classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}") > classpath ("com.bmuschko:gradle-docker-plugin:3.2.1") classpath > ("org.sonarsource.scanner.gradle:sonarqube-gradle-plugin:2.5") > classpath("au.com.dius:pact-jvm-provider-gradle_2.12:3.5.13") classpath > ("com.moowork.gradle:gradle-node-plugin:1.2.0") } } plugins \{ //id > "au.com.dius.pact" version "3.5.7" id "com.gorylenko.gradle-git-properties" > version "1.4.17" id "de.undercouch.download" version "3.4.2" } apply plugin: > 'java' apply plugin: 'eclipse' apply plugin: 'org.springframework.boot' apply > plugin: 'io.spring.dependency-management' apply plugin: > 'com.bmuschko.docker-remote-api' apply plugin: 'jacoco' apply plugin: > 'maven-publish' apply plugin: 'org.sonarqube' apply plugin: > 'au.com.dius.pact' apply plugin: 'scala' sourceCompatibility = 1.8 > repositories \{ mavenCentral() maven { url "https://repo.spring.io/milestone"; > } mavenLocal() } ext \{ springCloudVersion = 'Finchley.RELEASE' } pact \{ > serviceProviders { rxorder { publish { pactDirectory = > '/Users/sv/Documents/wag-doc-text2clip/target/pacts' // defaults to > $buildDir/pacts pactBrokerUrl = 'http://localhost:80' version=2.0 } } } } > //start of integration tests changes sourceSets \{ integrationTest { java { > compileClasspath += main.output + test.output runtimeClasspath += main.output > + test.output srcDir file('test/functional-api/java') } resources.srcDir > file('test/functional-api/resources') } } configurations \{ > integrationTestCompile.extendsFrom testCompile > integrationTestRuntime.extendsFrom testRuntime } //end of integration tests > changes dependencies \{ //web (Tomcat, Logging, Rest) compile group: > 'org.springframework.boot', name: 'spring-boot-starter-web' // Redis > //compile group: 'org.springframework.boot', name: > 'spring-boot-starter-data-redis' //Mongo Starter compile group: > 'org.springframework.boot', name:'spring-boot-starter-data-mongodb' // > Configuration processor - To Generate MetaData Files. 
The files are designed > to let developers offer “code completion� as users are working with > application.properties compile group: 'org.springframework.boot', name: > 'spring-boot-configuration-processor' // Actuator - Monitoring compile group: > 'org.springframework.boot', name: 'spring-boot-starter-actuator' //Sleuth - > Tracing compile group: 'org.springframework.cloud', name: > 'spring-cloud-starter-sleuth' //Hystrix - Circuit Breaker compile group: > 'org.springframework.cloud', name: 'spring-cloud-starter-netflix-hystrix' // > Hystrix - Dashboard compile group: 'org.springframework.cloud', name: > 'spring-cloud-starter-netflix-hystrix-dashboard' // Thymeleaf compile group: > 'org.springframework.boot', name: 'spring-boot-starter-thymeleaf' //Voltage > // Device Detection compile group: 'org.springframework.boot', name: > 'spring-boot-starter-data-cassandra', version:'2.0.4.RELEASE' compile group: > 'com.google.guava', name: 'guava', version: '23.2-jre' > compile('com.google.code.gson:gson:2.8.0') compile('org.json:json:20170516') > //Swagger compile group: 'io.springfox', name: 'springfox-swagger2', > version:'2.8.0' compile group: 'io.springfox', name: 'springfox-swagger-ui', > version:'2.8.0' //jkd10 fixes compile group: 'javax.xml.bind',name: > 'jaxb-api', version:'2.3.0' compile group: 'javax.xml.soap', name: > 'javax.xml.s
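Since the escaped JIRA markup above is hard to read, the startup sequence the reporter describes is roughly the following (a sketch only: the cassandra-unit helper package, contact point and port are assumptions, and the keyspace statements are elided as in the report):

{code:java}
import org.apache.cassandra.config.DatabaseDescriptor;
import org.cassandraunit.utils.EmbeddedCassandraServerHelper;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Cleaned-up restatement of the reporter's embedded-Cassandra bootstrap.
public class EmbeddedCassandraConfig
{
    public void start() throws Exception
    {
        try
        {
            EmbeddedCassandraServerHelper.startEmbeddedCassandra();
        }
        catch (Exception e)
        {
            // fall back to explicit daemon initialization, as in the report
            System.setProperty("cassandra.config", "cassandra.yaml");
            DatabaseDescriptor.daemonInitialization();
            EmbeddedCassandraServerHelper.startEmbeddedCassandra();
        }

        try (Cluster cluster = Cluster.builder()
                                      .addContactPoint("127.0.0.1") // placeholder for the configured contact points
                                      .withPort(9142)               // placeholder for the configured port
                                      .build();
             Session session = cluster.connect())
        {
            // session.execute(KEYSPACE_CREATION_QUERY);
            // session.execute(KEYSPACE_ACTIVATE_QUERY);
        }
    }
}
{code}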
[jira] [Updated] (CASSANDRA-14786) Attempted to delete non-existing file CommitLog
[ https://issues.apache.org/jira/browse/CASSANDRA-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14786: --- Priority: Critical (was: Blocker) > Attempted to delete non-existing file CommitLog > --- > > Key: CASSANDRA-14786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14786 > Project: Cassandra > Issue Type: Bug > Environment: RedHat 6.8 x64 > Apache Cassandra 2.2.9 >Reporter: Riccardo Paoli >Priority: Critical > Attachments: Cassandra_2209.zip > > > Hi all, > we are writing here for the first time, so forgive us if we forget something. > At one of our customers we have installed some Genesys applications that use > a Cassandra cluster with 3 nodes. > For several weeks, at regular intervals, the Cassandra processes have been running > into problems and shutting down. In particular, the error reported in the > logs is as follows: > NODE 2 > > {code:java} > ERROR [COMMIT-LOG-ALLOCATOR] 2018-09-21 23:05:48,718 CommitLog.java:488 - > Failed managing commit log segments. Commit disk failure policy is stop; > terminating thread > java.lang.AssertionError: attempted to delete non-existing file > CommitLog-5-1537387998650.log > {code} > > NODE 1 > > {code:java} > ERROR [COMMIT-LOG-ALLOCATOR] 2018-09-22 01:04:53,488 CommitLog.java:488 - > Failed managing commit log segments. Commit disk failure policy is stop; > terminating thread > java.lang.AssertionError: attempted to delete non-existing file > CommitLog-5-1537387930979.log > {code} > > NODE 3 > > {code:java} > ERROR [COMMIT-LOG-ALLOCATOR] 2018-09-22 04:31:56,176 CommitLog.java:488 - > Failed managing commit log segments. Commit disk failure policy is stop; > terminating thread > java.lang.AssertionError: attempted to delete non-existing file > CommitLog-5-1537388059095.log > {code} > > Is it possible to understand the cause? > Attached are the logs of the 22/09 error and the cluster configuration. > Thanks for your availability. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14297) Optional startup delay for peers should wait for count rather than percentage
[ https://issues.apache.org/jira/browse/CASSANDRA-14297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629160#comment-16629160 ] Joseph Lynch commented on CASSANDRA-14297: -- I'm changing this to a bug since I don't think the current interface can be configured correctly by users, and I hope we don't ship 4.0 with the percentage option instead of a count. If someone thinks there are plausible settings of the existing configuration options that users can use, we can change this back to an improvement. > Optional startup delay for peers should wait for count rather than percentage > - > > Key: CASSANDRA-14297 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14297 > Project: Cassandra > Issue Type: Bug > Components: Lifecycle >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Labels: 4.0-feature-freeze-review-requested, PatchAvailable > > As I commented in CASSANDRA-13993, the current wait for functionality is a > great step in the right direction, but I don't think that the current setting > (70% of nodes in the cluster) is the right configuration option. First I > think this because 70% will not protect against errors as if you wait for 70% > of the cluster you could still very easily have {{UnavailableException}} or > {{ReadTimeoutException}} exceptions. This is because if you have even two > nodes down in different racks in a Cassandra cluster these exceptions are > possible (or with the default {{num_tokens}} setting of 256 it is basically > guaranteed). Second I think this option is not easy for operators to set, the > only setting I could think of that would "just work" is 100%. > I proposed in that ticket instead of having `block_for_peers_percentage` > defaulting to 70%, we instead have `block_for_peers` as a count of nodes that > are allowed to be down before the starting node makes itself available as a > coordinator. Of course, we would still have the timeout to limit startup time > and deal with really extreme situations (whole datacenters down etc). > I started working on a patch for this change [on > github|https://github.com/jasobrown/cassandra/compare/13993...jolynch:13993], > and am happy to finish it up with unit tests and such if someone can > review/commit it (maybe [~aweisberg]?). > I think the short version of my proposal is we replace: > {noformat} > block_for_peers_percentage: > {noformat} > with either > {noformat} > block_for_peers: > {noformat} > or, if we want to do even better imo and enable advanced operators to finely > tune this behavior (while still having good defaults that work for almost > everyone): > {noformat} > block_for_peers_local_dc: > block_for_peers_each_dc: > block_for_peers_all_dcs: > {noformat} > For example if an operator knows that they must be available at > {{LOCAL_QUORUM}} they would set {{block_for_peers_local_dc=1}}, if they use > {{EACH_QUOURM}} they would set {{block_for_peers_local_dc=1}}, if they use > {{QUORUM}} (RF=3, dcs=2) they would set {{block_for_peers_all_dcs=2}}. > Naturally everything would of course have a timeout to prevent startup taking > too long. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
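To make the proposal concrete, the example in the last paragraph of the description would translate into a cassandra.yaml fragment along these lines (the option names are the ones proposed in this ticket, not options that exist today; values assume RF=3):

{noformat}
# stay available at LOCAL_QUORUM: tolerate one down peer in the local DC before serving traffic
block_for_peers_local_dc: 1
# QUORUM across two DCs with RF=3: tolerate two down peers in total
block_for_peers_all_dcs: 2
{noformat}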
[jira] [Updated] (CASSANDRA-14297) Optional startup delay for peers should wait for count rather than percentage
[ https://issues.apache.org/jira/browse/CASSANDRA-14297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14297: - Issue Type: Bug (was: Improvement) > Optional startup delay for peers should wait for count rather than percentage > - > > Key: CASSANDRA-14297 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14297 > Project: Cassandra > Issue Type: Bug > Components: Lifecycle >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Labels: 4.0-feature-freeze-review-requested, PatchAvailable > > As I commented in CASSANDRA-13993, the current wait for functionality is a > great step in the right direction, but I don't think that the current setting > (70% of nodes in the cluster) is the right configuration option. First I > think this because 70% will not protect against errors as if you wait for 70% > of the cluster you could still very easily have {{UnavailableException}} or > {{ReadTimeoutException}} exceptions. This is because if you have even two > nodes down in different racks in a Cassandra cluster these exceptions are > possible (or with the default {{num_tokens}} setting of 256 it is basically > guaranteed). Second I think this option is not easy for operators to set, the > only setting I could think of that would "just work" is 100%. > I proposed in that ticket instead of having `block_for_peers_percentage` > defaulting to 70%, we instead have `block_for_peers` as a count of nodes that > are allowed to be down before the starting node makes itself available as a > coordinator. Of course, we would still have the timeout to limit startup time > and deal with really extreme situations (whole datacenters down etc). > I started working on a patch for this change [on > github|https://github.com/jasobrown/cassandra/compare/13993...jolynch:13993], > and am happy to finish it up with unit tests and such if someone can > review/commit it (maybe [~aweisberg]?). > I think the short version of my proposal is we replace: > {noformat} > block_for_peers_percentage: > {noformat} > with either > {noformat} > block_for_peers: > {noformat} > or, if we want to do even better imo and enable advanced operators to finely > tune this behavior (while still having good defaults that work for almost > everyone): > {noformat} > block_for_peers_local_dc: > block_for_peers_each_dc: > block_for_peers_all_dcs: > {noformat} > For example if an operator knows that they must be available at > {{LOCAL_QUORUM}} they would set {{block_for_peers_local_dc=1}}, if they use > {{EACH_QUOURM}} they would set {{block_for_peers_local_dc=1}}, if they use > {{QUORUM}} (RF=3, dcs=2) they would set {{block_for_peers_all_dcs=2}}. > Naturally everything would of course have a timeout to prevent startup taking > too long. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14762) Transient node receives full data requests in dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629102#comment-16629102 ] Benedict commented on CASSANDRA-14762: -- bq. I'm not sure it is as valuable just because the race is much smaller since it's not O(gossip) it's O(time to switch threads). I was thinking of programmer error more than the race condition, but I agree it's much less impactful. I might rustle it up anyway, while we're here. > Transient node receives full data requests in dtests > > > Key: CASSANDRA-14762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14762 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Ariel Weisberg >Assignee: Benedict >Priority: Major > Fix For: 4.0 > > > I saw this running them on my laptop with rapid write protection disabled. > Attached is a patch for disabling rapid write protection in the transient > dtests. > {noformat} > .Exception in thread Thread-19: > Traceback (most recent call last): > File > "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", > line 916, in _bootstrap_inner > self.run() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 180, in run > self.scan_and_report() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 173, in scan_and_report > on_error_call(errordata) > File "/Users/aweisberg/repos/cassandra-dtest/dtest_setup.py", line 137, in > _log_error_handler > pytest.fail("Error details: \n{message}".format(message=message)) > File > "/Users/aweisberg/repos/cassandra-dtest/venv/lib/python3.6/site-packages/_pytest/outcomes.py", > line 96, in fail > raise Failed(msg=msg, pytrace=pytrace) > Failed: Error details: > Errors seen in logs for: node3 > node3: ERROR [ReadStage-1] 2018-09-18 12:28:48,344 > AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread > Thread[ReadStage-1,5,main] > org.apache.cassandra.exceptions.InvalidRequestException: Attempted to serve > transient data request from full node in > org.apache.cassandra.db.ReadCommandVerbHandler@3c55e0ff > at > org.apache.cassandra.db.ReadCommandVerbHandler.validateTransientStatus(ReadCommandVerbHandler.java:104) > at > org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:53) > at > org.apache.cassandra.net.MessageDeliveryTask.process(MessageDeliveryTask.java:92) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:54) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:110) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629010#comment-16629010 ] Jay Zhuang commented on CASSANDRA-14791: Hi [~mshuler], [~spo...@gmail.com], any idea if there's a permission setting we could set for the Jenkins Job/Slave? > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Task > Components: Testing >Reporter: Jay Zhuang >Priority: Minor > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
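A quick way to check whether a particular build slave can actually create the file this test writes is a standalone probe mirroring the {{FileChannel.open}} call in the stack trace (a hypothetical helper, not part of the test suite):

{code:java}
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.WRITE;

// Attempts the same create+write open that SequentialWriter.openChannel performs.
public class TmpWriteCheck
{
    public static void main(String[] args) throws Exception
    {
        Path p = Paths.get("/tmp", "na-1-big-Data.db");
        try (FileChannel ch = FileChannel.open(p, CREATE, WRITE))
        {
            System.out.println("writable: " + p);
        }
        finally
        {
            Files.deleteIfExists(p);
        }
    }
}
{code}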
[jira] [Created] (CASSANDRA-14793) Improve system table handling when losing a disk when using JBOD
Marcus Eriksson created CASSANDRA-14793: --- Summary: Improve system table handling when losing a disk when using JBOD Key: CASSANDRA-14793 URL: https://issues.apache.org/jira/browse/CASSANDRA-14793 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Fix For: 4.0 We should improve the way we handle disk failures when losing a disk in a JBOD setup. One way could be to pin the system tables to a special data directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14762) Transient node receives full data requests in dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628901#comment-16628901 ] Ariel Weisberg edited comment on CASSANDRA-14762 at 9/26/18 3:08 PM: - bq. Is this simply you reasoning out the rationale for issuing the requests, or have you spotted an issue with the patch that means we are not doing so? Sorry it's just socratic code review. I don't see any problems. +1 Checking locally is nice, but I'm not sure it is as valuable just because the race is much smaller since it's not O(gossip) it's O(time to switch threads). If you want to do it here or in another ticket it's still good to have. was (Author: aweisberg): bq. Is this simply you reasoning out the rationale for issuing the requests, or have you spotted an issue with the patch that means we are not doing so? Sorry it's just socratic code review. I don't see any problems. +1 Checking locally is nice, but I'm not sure it is as valuable just because the race is much smaller since it's not O(gossip) it's O(time to switch threads). > Transient node receives full data requests in dtests > > > Key: CASSANDRA-14762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14762 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Ariel Weisberg >Assignee: Benedict >Priority: Major > Fix For: 4.0 > > > I saw this running them on my laptop with rapid write protection disabled. > Attached is a patch for disabling rapid write protection in the transient > dtests. > {noformat} > .Exception in thread Thread-19: > Traceback (most recent call last): > File > "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", > line 916, in _bootstrap_inner > self.run() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 180, in run > self.scan_and_report() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 173, in scan_and_report > on_error_call(errordata) > File "/Users/aweisberg/repos/cassandra-dtest/dtest_setup.py", line 137, in > _log_error_handler > pytest.fail("Error details: \n{message}".format(message=message)) > File > "/Users/aweisberg/repos/cassandra-dtest/venv/lib/python3.6/site-packages/_pytest/outcomes.py", > line 96, in fail > raise Failed(msg=msg, pytrace=pytrace) > Failed: Error details: > Errors seen in logs for: node3 > node3: ERROR [ReadStage-1] 2018-09-18 12:28:48,344 > AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread > Thread[ReadStage-1,5,main] > org.apache.cassandra.exceptions.InvalidRequestException: Attempted to serve > transient data request from full node in > org.apache.cassandra.db.ReadCommandVerbHandler@3c55e0ff > at > org.apache.cassandra.db.ReadCommandVerbHandler.validateTransientStatus(ReadCommandVerbHandler.java:104) > at > org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:53) > at > org.apache.cassandra.net.MessageDeliveryTask.process(MessageDeliveryTask.java:92) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:54) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) > at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:110) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14762) Transient node receives full data requests in dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628901#comment-16628901 ] Ariel Weisberg commented on CASSANDRA-14762: bq. Is this simply you reasoning out the rationale for issuing the requests, or have you spotted an issue with the patch that means we are not doing so? Sorry it's just socratic code review. I don't see any problems. +1 Checking locally is nice, but I'm not sure it is as valuable just because the race is much smaller since it's not O(gossip) it's O(time to switch threads). > Transient node receives full data requests in dtests > > > Key: CASSANDRA-14762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14762 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Ariel Weisberg >Assignee: Benedict >Priority: Major > Fix For: 4.0 > > > I saw this running them on my laptop with rapid write protection disabled. > Attached is a patch for disabling rapid write protection in the transient > dtests. > {noformat} > .Exception in thread Thread-19: > Traceback (most recent call last): > File > "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", > line 916, in _bootstrap_inner > self.run() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 180, in run > self.scan_and_report() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 173, in scan_and_report > on_error_call(errordata) > File "/Users/aweisberg/repos/cassandra-dtest/dtest_setup.py", line 137, in > _log_error_handler > pytest.fail("Error details: \n{message}".format(message=message)) > File > "/Users/aweisberg/repos/cassandra-dtest/venv/lib/python3.6/site-packages/_pytest/outcomes.py", > line 96, in fail > raise Failed(msg=msg, pytrace=pytrace) > Failed: Error details: > Errors seen in logs for: node3 > node3: ERROR [ReadStage-1] 2018-09-18 12:28:48,344 > AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread > Thread[ReadStage-1,5,main] > org.apache.cassandra.exceptions.InvalidRequestException: Attempted to serve > transient data request from full node in > org.apache.cassandra.db.ReadCommandVerbHandler@3c55e0ff > at > org.apache.cassandra.db.ReadCommandVerbHandler.validateTransientStatus(ReadCommandVerbHandler.java:104) > at > org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:53) > at > org.apache.cassandra.net.MessageDeliveryTask.process(MessageDeliveryTask.java:92) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:54) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:110) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14742) Race Condition in batchlog replica collection
[ https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628602#comment-16628602 ] Benedict edited comment on CASSANDRA-14742 at 9/26/18 11:20 AM: Patch looks good overall, just a few nits: # Right now, {{ReplicaPlans}} is organised into counter writes, regular writes, regular write utilities, reads, reads utilities; I think it would be cleanest to keep the batch write utilities similarly proximal to the batch writes themselves, for consistency # {{syncWriteBatchedMutations}} and {{forBatchlogWrite}} each accept a {{localDc}} parameter - this seems a bit weird, since it's a global variable, and only ever invoked with this (but also, we obtain it inconsistently, by asking the snitch instead of the cached {{localDc}}. Perhaps they should each just use the latter, without requiring it as a parameter? (I realise this is pre-existing) # Unused imports in {{ReplicaPlans}} was (Author: benedict): Patch looks good overall, just a few nits: # Right now, {{ReplicaPlans}} is organised into counter writes, regular writes, regular write utilities, reads, reads utilities; I think it would be cleanest to keep the batch write utilities similarly proximal to the batch writes themselves, for consistency # {{syncWriteBatchedMutations}} and {{forBatchlogWrite}} each accept a {{localDc}} parameter - this seems a bit weird, since it's a global variable, and only ever invoked with this (but also, we obtain it inconsistently, by asking the snitch instead of the cached {{localDc}}. Perhaps they should each just use the latter, without requiring it as a parameter? # Unused imports in {{ReplicaPlans}} > Race Condition in batchlog replica collection > - > > Key: CASSANDRA-14742 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14742 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we > already filter out down replicas; subsequently they get picked up and taken > for liveAndDown. > There's a possible race condition due to picking tokens from token metadata > twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in > {{ReplicaPlan#forBatchlogWrite}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14742) Race Condition in batchlog replica collection
[ https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628602#comment-16628602 ] Benedict commented on CASSANDRA-14742: -- Patch looks good overall, just a few nits: # Right now, {{ReplicaPlans}} is organised into counter writes, regular writes, regular write utilities, reads, reads utilities; I think it would be cleanest to keep the batch write utilities similarly proximal to the batch writes themselves, for consistency # {{syncWriteBatchedMutations}} and {{forBatchlogWrite}} each accept a {{localDc}} parameter - this seems a bit weird, since it's a global variable, and only ever invoked with this (but also, we obtain it inconsistently, by asking the snitch instead of the cached {{localDc}}. Perhaps they should each just use the latter, without requiring it as a parameter? # Unused imports in {{ReplicaPlans}} > Race Condition in batchlog replica collection > - > > Key: CASSANDRA-14742 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14742 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we > already filter out down replicas; subsequently they get picked up and taken > for liveAndDown. > There's a possible race condition due to picking tokens from token metadata > twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in > {{ReplicaPlan#forBatchlogWrite}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14770) Introduce RangesAtEndpoint.unwrap to simplify StreamSession.addTransferRanges
[ https://issues.apache.org/jira/browse/CASSANDRA-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628519#comment-16628519 ] Benedict commented on CASSANDRA-14770: -- Thanks. Committed as [914c66685c5bebe1624d827a9b4562b73a08c297|https://github.com/apache/cassandra/commit/914c66685c5bebe1624d827a9b4562b73a08c297] > Introduce RangesAtEndpoint.unwrap to simplify StreamSession.addTransferRanges > - > > Key: CASSANDRA-14770 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14770 > Project: Cassandra > Issue Type: Improvement > Components: Streaming and Messaging >Reporter: Benedict >Assignee: Benedict >Priority: Trivial > Fix For: 4.0 > > > Arguably, since this is only performed in one place, we could leave it in > {{addTransferRanges}}, but it should be a helper method anyway, and given > {{unwrap()}} is a feature of {{Range}}, we should implement that in > {{RangesAtEndpoint}} IMO. I have introduced this method, which avoids > allocating a new collection unnecessarily, corroborates we have at most one > wrap-around range, and introduced unit tests for the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
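For anyone reading the ticket without the diff at hand, the new helper boils down to the fragment below (simplified; per the description, the committed version in the commit further down also avoids rebuilding the collection when no range actually wraps around):

{code:java}
// Simplified fragment of RangesAtEndpoint.unwrap(): split every wrap-around
// range into its unwrapped sub-ranges, keep each sub-range bound to the same
// replica, and rebuild the collection.
public RangesAtEndpoint unwrap()
{
    RangesAtEndpoint.Builder builder = RangesAtEndpoint.builder(endpoint(), size());
    for (Replica replica : this)
        for (Range<Token> range : replica.range().unwrap())
            builder.add(replica.decorateSubrange(range));
    return builder.build();
}
{code}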
[jira] [Updated] (CASSANDRA-14770) Introduce RangesAtEndpoint.unwrap to simplify StreamSession.addTransferRanges
[ https://issues.apache.org/jira/browse/CASSANDRA-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14770: - Resolution: Fixed Status: Resolved (was: Patch Available) > Introduce RangesAtEndpoint.unwrap to simplify StreamSession.addTransferRanges > - > > Key: CASSANDRA-14770 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14770 > Project: Cassandra > Issue Type: Improvement > Components: Streaming and Messaging >Reporter: Benedict >Assignee: Benedict >Priority: Trivial > Fix For: 4.0 > > > Arguably, since this is only performed in one place, we could leave it in > {{addTransferRanges}}, but it should be a helper method anyway, and given > {{unwrap()}} is a feature of {{Range}}, we should implement that in > {{RangesAtEndpoint}} IMO. I have introduced this method, which avoids > allocating a new collection unnecessarily, corroborates we have at most one > wrap-around range, and introduced unit tests for the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra git commit: Introduce RangesAtEndpoint.unwrap; simplify StreamSession.addTransferRanges
Repository: cassandra Updated Branches: refs/heads/trunk 8554d6b35 -> 914c66685 Introduce RangesAtEndpoint.unwrap; simplify StreamSession.addTransferRanges Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/914c6668 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/914c6668 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/914c6668 Branch: refs/heads/trunk Commit: 914c66685c5bebe1624d827a9b4562b73a08c297 Parents: 8554d6b Author: Benedict Elliott Smith Authored: Tue Sep 18 13:17:15 2018 +0100 Committer: Benedict Elliott Smith Committed: Wed Sep 26 11:12:12 2018 +0100 -- CHANGES.txt | 1 + .../cassandra/locator/RangesAtEndpoint.java | 31 + .../cassandra/streaming/StreamSession.java | 11 + .../locator/ReplicaCollectionTest.java | 46 +++- 4 files changed, 69 insertions(+), 20 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/914c6668/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 9139822..e227c40 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Introduce RangesAtEndpoint.unwrap to simplify StreamSession.addTransferRanges (CASSANDRA-14770) * LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead of Unavailable (CASSANDRA-14735) * Avoid creating empty compaction tasks after truncate (CASSANDRA-14780) * Fail incremental repair prepare phase if it encounters sstables from un-finalized sessions (CASSANDRA-14763) http://git-wip-us.apache.org/repos/asf/cassandra/blob/914c6668/src/java/org/apache/cassandra/locator/RangesAtEndpoint.java -- diff --git a/src/java/org/apache/cassandra/locator/RangesAtEndpoint.java b/src/java/org/apache/cassandra/locator/RangesAtEndpoint.java index f57c28e..8319d92 100644 --- a/src/java/org/apache/cassandra/locator/RangesAtEndpoint.java +++ b/src/java/org/apache/cassandra/locator/RangesAtEndpoint.java @@ -165,6 +165,37 @@ public class RangesAtEndpoint extends AbstractReplicaCollection range : replica.range().unwrap()) +builder.add(replica.decorateSubrange(range)); +} +return builder.build(); +} + public static Collector collector(InetAddressAndPort endpoint) { return collector(ImmutableSet.of(), () -> new Builder(endpoint)); http://git-wip-us.apache.org/repos/asf/cassandra/blob/914c6668/src/java/org/apache/cassandra/streaming/StreamSession.java -- diff --git a/src/java/org/apache/cassandra/streaming/StreamSession.java b/src/java/org/apache/cassandra/streaming/StreamSession.java index d7d0836..80fcebb 100644 --- a/src/java/org/apache/cassandra/streaming/StreamSession.java +++ b/src/java/org/apache/cassandra/streaming/StreamSession.java @@ -335,15 +335,8 @@ public class StreamSession implements IEndpointStateChangeSubscriber //Was it safe to remove this normalize, sorting seems not to matter, merging? Maybe we should have? //Do we need to unwrap here also or is that just making it worse? 
//Range and if it's transient -RangesAtEndpoint.Builder unwrappedRanges = RangesAtEndpoint.builder(replicas.endpoint(), replicas.size()); -for (Replica replica : replicas) -{ -for (Range unwrapped : replica.range().unwrap()) -{ -unwrappedRanges.add(new Replica(replica.endpoint(), unwrapped, replica.isFull())); -} -} -List streams = getOutgoingStreamsForRanges(unwrappedRanges.build(), stores, pendingRepair, previewKind); +RangesAtEndpoint unwrappedRanges = replicas.unwrap(); +List streams = getOutgoingStreamsForRanges(unwrappedRanges, stores, pendingRepair, previewKind); addTransferStreams(streams); Set> toBeUpdated = transferredRangesPerKeyspace.get(keyspace); if (toBeUpdated == null) http://git-wip-us.apache.org/repos/asf/cassandra/blob/914c6668/test/unit/org/apache/cassandra/locator/ReplicaCollectionTest.java -- diff --git a/test/unit/org/apache/cassandra/locator/ReplicaCollectionTest.java b/test/unit/org/apache/cassandra/locator/ReplicaCollectionTest.java index f937f96..c289d50 100644 --- a/test/unit/org/apache/cassandra/locator/ReplicaCollectionTest.java +++ b/test/unit/org/apache/cassandra/locator/ReplicaCollectionTest.java @@ -33,6 +33,7 @@ import org.junit.Assert; import org.junit.Test; import java.net.UnknownHostException; +import java.util.ArrayList; import java.util.Comparator; import ja
[jira] [Comment Edited] (CASSANDRA-14735) LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead of Unavailable
[ https://issues.apache.org/jira/browse/CASSANDRA-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628496#comment-16628496 ] Benedict edited comment on CASSANDRA-14735 at 9/26/18 9:57 AM: --- Thanks, committed as [8554d6b35dcc5eec46ed7edc809a36c1f7fa588f|https://github.com/apache/cassandra/commit/8554d6b35dcc5eec46ed7edc809a36c1f7fa588f] was (Author: benedict): Thanks, committed > LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead > of Unavailable > -- > > Key: CASSANDRA-14735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14735 > Project: Cassandra > Issue Type: Bug >Reporter: Benedict >Assignee: Benedict >Priority: Minor > Fix For: 4.0 > > > This issue applies to all of: rapid read protection, read repair's rapid read > protection and read repair's rapid write protection. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14735) LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead of Unavailable
[ https://issues.apache.org/jira/browse/CASSANDRA-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14735: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks, committed > LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead > of Unavailable > -- > > Key: CASSANDRA-14735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14735 > Project: Cassandra > Issue Type: Bug >Reporter: Benedict >Assignee: Benedict >Priority: Minor > Fix For: 4.0 > > > This issue applies to all of: rapid read protection, read repair's rapid read > protection and read repair's rapid write protection. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra git commit: LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead of Unavailable
Repository: cassandra Updated Branches: refs/heads/trunk 0379201c7 -> 8554d6b35 LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead of Unavailable patch by Benedict; reviewed by Ariel Weisberg for CASSANDRA-14735 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/8554d6b3 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/8554d6b3 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/8554d6b3 Branch: refs/heads/trunk Commit: 8554d6b35dcc5eec46ed7edc809a36c1f7fa588f Parents: 0379201 Author: Benedict Elliott Smith Authored: Thu Sep 20 08:54:55 2018 +0100 Committer: Benedict Elliott Smith Committed: Wed Sep 26 10:55:11 2018 +0100 -- CHANGES.txt | 1 + .../apache/cassandra/db/ConsistencyLevel.java | 233 ++- .../apache/cassandra/locator/InOurDcTester.java | 93 .../apache/cassandra/locator/ReplicaPlan.java | 3 - .../apache/cassandra/locator/ReplicaPlans.java | 193 +-- .../org/apache/cassandra/locator/Replicas.java | 65 +- .../service/DatacenterWriteResponseHandler.java | 7 +- .../apache/cassandra/service/StorageProxy.java | 6 +- .../reads/repair/BlockingPartitionRepair.java | 27 ++- .../reads/repair/BlockingReadRepairTest.java| 10 +- .../DiagEventsBlockingReadRepairTest.java | 23 +- .../service/reads/repair/ReadRepairTest.java| 9 +- 12 files changed, 373 insertions(+), 297 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/8554d6b3/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 9f7958c..9139822 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead of Unavailable (CASSANDRA-14735) * Avoid creating empty compaction tasks after truncate (CASSANDRA-14780) * Fail incremental repair prepare phase if it encounters sstables from un-finalized sessions (CASSANDRA-14763) * Add a check for receiving digest response from transient node (CASSANDRA-14750) http://git-wip-us.apache.org/repos/asf/cassandra/blob/8554d6b3/src/java/org/apache/cassandra/db/ConsistencyLevel.java -- diff --git a/src/java/org/apache/cassandra/db/ConsistencyLevel.java b/src/java/org/apache/cassandra/db/ConsistencyLevel.java index 5a4baf7..9e884a7 100644 --- a/src/java/org/apache/cassandra/db/ConsistencyLevel.java +++ b/src/java/org/apache/cassandra/db/ConsistencyLevel.java @@ -17,26 +17,18 @@ */ package org.apache.cassandra.db; -import java.util.HashMap; -import java.util.Map; -import com.google.common.collect.Iterables; +import com.carrotsearch.hppc.ObjectIntOpenHashMap; import org.apache.cassandra.locator.Endpoints; -import org.apache.cassandra.locator.ReplicaCollection; -import org.apache.cassandra.locator.Replicas; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import org.apache.cassandra.locator.InetAddressAndPort; -import org.apache.cassandra.locator.Replica; import org.apache.cassandra.schema.TableMetadata; import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.exceptions.InvalidRequestException; -import org.apache.cassandra.exceptions.UnavailableException; import org.apache.cassandra.locator.AbstractReplicationStrategy; import org.apache.cassandra.locator.NetworkTopologyStrategy; import org.apache.cassandra.transport.ProtocolException; +import static org.apache.cassandra.locator.Replicas.countInOurDc; + public enum ConsistencyLevel { ANY (0), @@ -52,8 +44,6 @@ public enum ConsistencyLevel LOCAL_ONE (10, true), NODE_LOCAL (11, true); -private static final 
Logger logger = LoggerFactory.getLogger(ConsistencyLevel.class); - // Used by the binary protocol public final int code; private final boolean isDCLocal; @@ -90,18 +80,27 @@ public enum ConsistencyLevel return codeIdx[code]; } -private int quorumFor(Keyspace keyspace) +public static int quorumFor(Keyspace keyspace) { return (keyspace.getReplicationStrategy().getReplicationFactor().allReplicas / 2) + 1; } -private int localQuorumFor(Keyspace keyspace, String dc) +public static int localQuorumFor(Keyspace keyspace, String dc) { return (keyspace.getReplicationStrategy() instanceof NetworkTopologyStrategy) ? (((NetworkTopologyStrategy) keyspace.getReplicationStrategy()).getReplicationFactor(dc).allReplicas / 2) + 1 : quorumFor(keyspace); } +public static ObjectIntOpenHashMap eachQuorumFor(Keyspace keyspace) +{ +NetworkTopology
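As a quick sanity check on the quorum helpers in the diff above, the arithmetic for a hypothetical keyspace using NetworkTopologyStrategy with three replicas in each of two datacenters works out as follows:

{code:java}
// quorumFor:      total replicas / 2 + 1  ->  6 / 2 + 1 = 4   (QUORUM)
// localQuorumFor: per-DC replicas / 2 + 1 ->  3 / 2 + 1 = 2   (LOCAL_QUORUM, and per-DC for EACH_QUORUM)
static int quorum(int replicas)
{
    return replicas / 2 + 1;
}
{code}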
[jira] [Updated] (CASSANDRA-14756) Transient Replication - range movement improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-14756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-14756: Resolution: Fixed Status: Resolved (was: Patch Available) > Transient Replication - range movement improvements > --- > > Key: CASSANDRA-14756 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14756 > Project: Cassandra > Issue Type: Improvement >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > * Simplify iteration in calculateRangesToFetchWithPreferredEndpoints > * Minor changes to calculateRangesToFetchWithPreferredEndpoints to improve > readability: > * Simplify RangeRelocator code > * Fix range relocation > * Simplify calculateStreamAndFetchRanges > * Unify request/transfer ranges interface (Added benefit of this change is > that we have a check for non-intersecting ranges) > * Simplify iteration in calculateRangesToFetchWithPreferredEndpoints > * Improve error messages -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[1/2] cassandra git commit: Transient replication: range movement improvements
Repository: cassandra Updated Branches: refs/heads/trunk 210da3dc0 -> 0379201c7 http://git-wip-us.apache.org/repos/asf/cassandra/blob/0379201c/test/unit/org/apache/cassandra/dht/BootStrapperTest.java -- diff --git a/test/unit/org/apache/cassandra/dht/BootStrapperTest.java b/test/unit/org/apache/cassandra/dht/BootStrapperTest.java index 8ae6853..2f412ad 100644 --- a/test/unit/org/apache/cassandra/dht/BootStrapperTest.java +++ b/test/unit/org/apache/cassandra/dht/BootStrapperTest.java @@ -105,7 +105,6 @@ public class BootStrapperTest InetAddressAndPort myEndpoint = InetAddressAndPort.getByName("127.0.0.1"); assertEquals(numOldNodes, tmd.sortedTokens().size()); -RangeStreamer s = new RangeStreamer(tmd, null, myEndpoint, StreamOperation.BOOTSTRAP, true, DatabaseDescriptor.getEndpointSnitch(), new StreamStateStore(), false, 1); IFailureDetector mockFailureDetector = new IFailureDetector() { public boolean isAlive(InetAddressAndPort ep) @@ -120,26 +119,20 @@ public class BootStrapperTest public void remove(InetAddressAndPort ep) { throw new UnsupportedOperationException(); } public void forceConviction(InetAddressAndPort ep) { throw new UnsupportedOperationException(); } }; -s.addSourceFilter(new RangeStreamer.FailureDetectorSourceFilter(mockFailureDetector)); +RangeStreamer s = new RangeStreamer(tmd, null, myEndpoint, StreamOperation.BOOTSTRAP, true, DatabaseDescriptor.getEndpointSnitch(), new StreamStateStore(), mockFailureDetector, false, 1); assertNotNull(Keyspace.open(keyspaceName)); s.addRanges(keyspaceName, Keyspace.open(keyspaceName).getReplicationStrategy().getPendingAddressRanges(tmd, myToken, myEndpoint)); -Collection> toFetch = s.toFetch().get(keyspaceName); +Multimap toFetch = s.toFetch().get(keyspaceName); // Check we get get RF new ranges in total -long rangesCount = toFetch.stream() - .map(Multimap::values) - .flatMap(Collection::stream) - .map(f -> f.remote) - .map(Replica::range) - .count(); -assertEquals(replicationFactor, rangesCount); +assertEquals(replicationFactor, toFetch.size()); // there isn't any point in testing the size of these collections for any specific size. When a random partitioner // is used, they will vary. 
-assert toFetch.stream().map(Multimap::values).flatMap(Collection::stream).count() > 0; -assert toFetch.stream().map(Multimap::keySet).map(Collection::stream).noneMatch(myEndpoint::equals); +assert toFetch.values().size() > 0; +assert toFetch.keys().stream().noneMatch(myEndpoint::equals); return s; } http://git-wip-us.apache.org/repos/asf/cassandra/blob/0379201c/test/unit/org/apache/cassandra/dht/RangeFetchMapCalculatorTest.java -- diff --git a/test/unit/org/apache/cassandra/dht/RangeFetchMapCalculatorTest.java b/test/unit/org/apache/cassandra/dht/RangeFetchMapCalculatorTest.java index 07d6377..cee4bb9 100644 --- a/test/unit/org/apache/cassandra/dht/RangeFetchMapCalculatorTest.java +++ b/test/unit/org/apache/cassandra/dht/RangeFetchMapCalculatorTest.java @@ -195,18 +195,26 @@ public class RangeFetchMapCalculatorTest addNonTrivialRangeAndSources(rangesWithSources, 21, 30, "127.0.0.3"); //Return false for all except 127.0.0.5 -final Predicate filter = replica -> +final RangeStreamer.SourceFilter filter = new RangeStreamer.SourceFilter() { -try +public boolean apply(Replica replica) { -if (replica.endpoint().equals(InetAddressAndPort.getByName("127.0.0.5"))) -return false; -else +try +{ +if (replica.endpoint().equals(InetAddressAndPort.getByName("127.0.0.5"))) +return false; +else +return true; +} +catch (UnknownHostException e) +{ return true; +} } -catch (UnknownHostException e) + +public String message(Replica replica) { -return true; +return "Doesn't match 127.0.0.5"; } }; @@ -230,7 +238,18 @@ public class RangeFetchMapCalculatorTest addNonTrivialRangeAndSources(rangesWithSources, 11, 20, "127.0.0.2"); addNonTrivialRangeAndSources(rangesWithSources, 21, 30, "127.0.0.3"); -final Predicate allDeadFilter = replica -> false; +final RangeStreamer.SourceFilter allDeadFilter = new RangeStreamer.SourceFilter() +{ +
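The test changes above track an API change in {{RangeStreamer}}: plain {{Predicate}}-style source filters were replaced by a {{SourceFilter}} that can also explain why a source was rejected, which is what the improved error messages in this ticket rely on. The following is a condensed sketch of that shape; the interface and the use of plain endpoint strings are simplified stand-ins for the real {{RangeStreamer.SourceFilter}} and {{Replica}} types.

{code}
// Simplified stand-in for RangeStreamer.SourceFilter, showing the
// apply()/message() pairing used in the test diff above.
interface SourceFilter
{
    boolean apply(String endpoint);   // keep this endpoint as a streaming source?
    String message(String endpoint);  // human-readable reason when it is rejected
}

public class ExcludeEndpointFilter implements SourceFilter
{
    private final String excluded;

    public ExcludeEndpointFilter(String excluded)
    {
        this.excluded = excluded;
    }

    @Override
    public boolean apply(String endpoint)
    {
        return !endpoint.equals(excluded);
    }

    @Override
    public String message(String endpoint)
    {
        return "Rejected as source: matches excluded endpoint " + excluded;
    }

    public static void main(String[] args)
    {
        SourceFilter filter = new ExcludeEndpointFilter("127.0.0.5");
        System.out.println(filter.apply("127.0.0.1"));   // true
        System.out.println(filter.apply("127.0.0.5"));   // false
        System.out.println(filter.message("127.0.0.5"));
    }
}
{code}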
[jira] [Commented] (CASSANDRA-14756) Transient Replication - range movement improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-14756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628486#comment-16628486 ] Alex Petrov commented on CASSANDRA-14756: - Thank you for the review, committed to trunk as [0379201c7057f6bac4abf1e0f3d81a12d90abd08|https://github.com/apache/cassandra/commit/0379201c7057f6bac4abf1e0f3d81a12d90abd08] > Transient Replication - range movement improvements > --- > > Key: CASSANDRA-14756 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14756 > Project: Cassandra > Issue Type: Improvement >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > * Simplify iteration in calculateRangesToFetchWithPreferredEndpoints > * Minor changes to calculateRangesToFetchWithPreferredEndpoints to improve > readability: > * Simplify RangeRelocator code > * Fix range relocation > * Simplify calculateStreamAndFetchRanges > * Unify request/transfer ranges interface (Added benefit of this change is > that we have a check for non-intersecting ranges) > * Simplify iteration in calculateRangesToFetchWithPreferredEndpoints > * Improve error messages -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[2/2] cassandra git commit: Transient replication: range movement improvements
Transient replication: range movement improvements Patch by Alex Petrov; reviewed by Ariel Weisberg and Benedict Elliott Smith for CASSANDRA-14756 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0379201c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0379201c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0379201c Branch: refs/heads/trunk Commit: 0379201c7057f6bac4abf1e0f3d81a12d90abd08 Parents: 210da3d Author: Alex Petrov Authored: Mon Sep 17 11:51:56 2018 +0200 Committer: Alex Petrov Committed: Wed Sep 26 11:42:46 2018 +0200 -- .../org/apache/cassandra/db/SystemKeyspace.java | 31 +- .../org/apache/cassandra/dht/BootStrapper.java | 3 - .../cassandra/dht/RangeFetchMapCalculator.java | 2 +- .../org/apache/cassandra/dht/RangeStreamer.java | 448 ++- .../apache/cassandra/dht/StreamStateStore.java | 12 +- .../cassandra/locator/RangesAtEndpoint.java | 6 + .../cassandra/service/RangeRelocator.java | 324 ++ .../cassandra/service/StorageService.java | 314 + .../apache/cassandra/streaming/StreamPlan.java | 17 +- .../cassandra/streaming/StreamSession.java | 8 +- .../apache/cassandra/dht/BootStrapperTest.java | 17 +- .../dht/RangeFetchMapCalculatorTest.java| 79 +++- .../locator/OldNetworkTopologyStrategyTest.java | 3 +- .../service/BootstrapTransientTest.java | 113 +++-- .../cassandra/service/MoveTransientTest.java| 321 +++-- .../cassandra/service/StorageServiceTest.java | 18 +- 16 files changed, 981 insertions(+), 735 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/0379201c/src/java/org/apache/cassandra/db/SystemKeyspace.java -- diff --git a/src/java/org/apache/cassandra/db/SystemKeyspace.java b/src/java/org/apache/cassandra/db/SystemKeyspace.java index ff070a3..0f904ce 100644 --- a/src/java/org/apache/cassandra/db/SystemKeyspace.java +++ b/src/java/org/apache/cassandra/db/SystemKeyspace.java @@ -32,12 +32,11 @@ import javax.management.openmbean.TabularData; import com.google.common.annotations.VisibleForTesting; import com.google.common.collect.HashMultimap; import com.google.common.collect.ImmutableMap; +import com.google.common.collect.ImmutableSet; import com.google.common.collect.SetMultimap; import com.google.common.collect.Sets; import com.google.common.io.ByteStreams; import com.google.common.util.concurrent.ListenableFuture; -import org.apache.cassandra.locator.RangesAtEndpoint; -import org.apache.cassandra.locator.Replica; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -1285,24 +1284,40 @@ public final class SystemKeyspace keyspace); } -public static synchronized RangesAtEndpoint getAvailableRanges(String keyspace, IPartitioner partitioner) +/** + * List of the streamed ranges, where transientness is encoded based on the source, where range was streamed from. 
+ */ +public static synchronized AvailableRanges getAvailableRanges(String keyspace, IPartitioner partitioner) { String query = "SELECT * FROM system.%s WHERE keyspace_name=?"; UntypedResultSet rs = executeInternal(format(query, AVAILABLE_RANGES_V2), keyspace); -InetAddressAndPort endpoint = InetAddressAndPort.getLocalHost(); -RangesAtEndpoint.Builder builder = RangesAtEndpoint.builder(endpoint); + +ImmutableSet.Builder> full = new ImmutableSet.Builder<>(); +ImmutableSet.Builder> trans = new ImmutableSet.Builder<>(); for (UntypedResultSet.Row row : rs) { Optional.ofNullable(row.getSet("full_ranges", BytesType.instance)) .ifPresent(full_ranges -> full_ranges.stream() .map(buf -> byteBufferToRange(buf, partitioner)) -.forEach(range -> builder.add(fullReplica(endpoint, range; +.forEach(full::add)); Optional.ofNullable(row.getSet("transient_ranges", BytesType.instance)) .ifPresent(transient_ranges -> transient_ranges.stream() .map(buf -> byteBufferToRange(buf, partitioner)) -.forEach(range -> builder.add(transientReplica(endpoint, range; +.forEach(trans::add)); +} +return new AvailableRanges(full.build(), trans.build()); +} + +public static class AvailableRanges +{ +public Set> full; +public Set> trans; + +private AvailableRanges(Set> full, Set> trans) +{ +this.full = full; +this.trans = trans; } -re
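The {{SystemKeyspace}} change above replaces the {{RangesAtEndpoint}} return value with a small holder that keeps fully-streamed and transiently-streamed ranges apart. Below is a stripped-down sketch of that holder and how a caller might branch on it; ranges are reduced to plain strings here, whereas the real code builds {{Set}}s of token ranges from the system table.

{code}
import java.util.Set;

// Stripped-down stand-in for SystemKeyspace.AvailableRanges: streamed ranges
// are split by whether they were received from full or transient sources.
public class AvailableRangesSketch
{
    final Set<String> full;   // ranges streamed as full data
    final Set<String> trans;  // ranges streamed as transient data

    AvailableRangesSketch(Set<String> full, Set<String> trans)
    {
        this.full = full;
        this.trans = trans;
    }

    public static void main(String[] args)
    {
        AvailableRangesSketch ranges =
            new AvailableRangesSketch(Set.of("(0,100]"), Set.of("(100,200]"));

        // A caller deciding whether a range still needs a full stream would
        // only count ranges in 'full' as fully available.
        String needed = "(100,200]";
        System.out.println(needed + " fully available: " + ranges.full.contains(needed));       // false
        System.out.println(needed + " transiently available: " + ranges.trans.contains(needed)); // true
    }
}
{code}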
[jira] [Updated] (CASSANDRA-14467) Add option to sanity check tombstones on reads/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-14467: Resolution: Fixed Fix Version/s: (was: 4.x) 4.0 Status: Resolved (was: Patch Available) committed as \{{96f90eee28247cf9a8520e6962b0388f193c7ca8}}, thanks! new test result: https://circleci.com/workflow-run/ffc46ccd-42d8-41ce-b319-7d29db444e20 > Add option to sanity check tombstones on reads/compaction > - > > Key: CASSANDRA-14467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14467 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Minor > Labels: pull-request-available > Fix For: 4.0 > > > We should add an option to do a quick sanity check of tombstones on reads + > compaction. It should either log the error or throw an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14467) Add option to sanity check tombstones on reads/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-14467: --- Labels: pull-request-available (was: ) > Add option to sanity check tombstones on reads/compaction > - > > Key: CASSANDRA-14467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14467 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Minor > Labels: pull-request-available > Fix For: 4.x > > > We should add an option to do a quick sanity check of tombstones on reads + > compaction. It should either log the error or throw an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14467) Add option to sanity check tombstones on reads/compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628462#comment-16628462 ] ASF GitHub Bot commented on CASSANDRA-14467: Github user asfgit closed the pull request at: https://github.com/apache/cassandra-dtest/pull/30 > Add option to sanity check tombstones on reads/compaction > - > > Key: CASSANDRA-14467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14467 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Minor > Labels: pull-request-available > Fix For: 4.x > > > We should add an option to do a quick sanity check of tombstones on reads + > compaction. It should either log the error or throw an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra-dtest git commit: always enable tombstone validation exceptions during tests
Repository: cassandra-dtest Updated Branches: refs/heads/master 02c1cd774 -> 96f90eee2 always enable tombstone validation exceptions during tests Patch by marcuse; reviewed by Ariel Weisberg for CASSANDRA-14467 Closes #30 Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/96f90eee Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/96f90eee Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/96f90eee Branch: refs/heads/master Commit: 96f90eee28247cf9a8520e6962b0388f193c7ca8 Parents: 02c1cd7 Author: Marcus Eriksson Authored: Thu May 31 08:41:11 2018 +0200 Committer: Marcus Eriksson Committed: Wed Sep 26 11:13:16 2018 +0200 -- dtest_setup.py | 2 ++ ttl_test.py| 2 ++ 2 files changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/96f90eee/dtest_setup.py -- diff --git a/dtest_setup.py b/dtest_setup.py index 295fa3f..9e3f330 100644 --- a/dtest_setup.py +++ b/dtest_setup.py @@ -417,6 +417,8 @@ class DTestSetup: # No more thrift in 4.0, and start_rpc doesn't exists anymore if self.cluster.version() >= '4' and 'start_rpc' in values: del values['start_rpc'] +if self.cluster.version() >= '4': +values['corrupted_tombstone_strategy'] = 'exception' self.cluster.set_configuration_options(values) logger.debug("Done setting configuration options:\n" + pprint.pformat(self.cluster._config_options, indent=4)) http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/96f90eee/ttl_test.py -- diff --git a/ttl_test.py b/ttl_test.py index 4a7ad06..d89ca6a 100644 --- a/ttl_test.py +++ b/ttl_test.py @@ -567,6 +567,8 @@ class TestRecoverNegativeExpirationDate(TestHelper): Check that row with negative overflowed ttl is recovered by offline scrub """ cluster = self.cluster +if self.cluster.version() >= '4': + cluster.set_configuration_options(values={'corrupted_tombstone_strategy': 'disabled'}) cluster.populate(1).start(wait_for_binary_proto=True) [node] = cluster.nodelist() - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14733) AbstractReadRepair sends unnecessary data read(s)
[ https://issues.apache.org/jira/browse/CASSANDRA-14733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628435#comment-16628435 ] Benedict commented on CASSANDRA-14733: -- Ah, thanks for that insight. Perhaps it's better to wait until we fix monotonic reads with transient replication, then, as at that point we may be requesting a separate repaired/unrepaired digest as a matter of course. At the moment, at least transient requests already implicitly 'track' this (or, cannot track it, however you want to view it), so we could at least not re-issue these requests. It looks like we're already special casing the receipt of transient responses because of this, so it would have no effect to reuse the existing responses. > AbstractReadRepair sends unnecessary data read(s) > - > > Key: CASSANDRA-14733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14733 > Project: Cassandra > Issue Type: Bug >Reporter: Benedict >Priority: Minor > Labels: Availability, performance > > We already have one or more data responses (two in case of 'always' > speculation, and potentially more if transient replication is enabled, though > the two do not presently interact). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14733) AbstractReadRepair sends unnecessary data read(s)
[ https://issues.apache.org/jira/browse/CASSANDRA-14733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628422#comment-16628422 ] Sam Tunnicliffe commented on CASSANDRA-14733: - If repaired data tracking (CASSANDRA-14145) is enabled, we'll need to re-issue the original requests as those responses won't include the tracking info > AbstractReadRepair sends unnecessary data read(s) > - > > Key: CASSANDRA-14733 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14733 > Project: Cassandra > Issue Type: Bug >Reporter: Benedict >Priority: Minor > Labels: Availability, performance > > We already have one or more data responses (two in case of 'always' > speculation, and potentially more if transient replication is enabled, though > the two do not presently interact). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14762) Transient node receives full data requests in dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628397#comment-16628397 ] Benedict commented on CASSANDRA-14762: -- Thanks for the review. bq. Is this just a clarification? Yes; it is a no-effect change. bq. It seems to me we use read repair to assemble the read after digest mismatch between full replicas and that is why we it might send messages to transient replicas? That is what it seems like to me. Even though we don't send repair mutations after read-repair from transient replicas, on digest mismatch we still perform reads to other replicas - including those that we may not have contacted initially, which might include new transient nodes. bq. A minor out of scope improvement would be to use the existing response and not repeat the read? Agreed, see CASSANDRA-14733. I considered doing that optimisation for this ticket, but given the above fact that we might issue new transient reads it seemed to unnecessarily complicate fixing this bug. bq. It seems to me also that we would read from transients as part of short read protection (they are just another member of the group), and they aren't special so we should issue them the query. Is this simply you reasoning out the rationale for issuing the requests, or have you spotted an issue with the patch that means we are not doing so? It does look to me like we should be perhaps switching to {{acceptsTransient}} for the local query also, and validating transient status in {{LocalReadRunnable}} - but, for now, we don't do this. Perhaps we could fix this also in this patch, or otherwise file a follow up. > Transient node receives full data requests in dtests > > > Key: CASSANDRA-14762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14762 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Ariel Weisberg >Assignee: Benedict >Priority: Major > Fix For: 4.0 > > > I saw this running them on my laptop with rapid write protection disabled. > Attached is a patch for disabling rapid write protection in the transient > dtests. 
> {noformat} > .Exception in thread Thread-19: > Traceback (most recent call last): > File > "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", > line 916, in _bootstrap_inner > self.run() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 180, in run > self.scan_and_report() > File > "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line > 173, in scan_and_report > on_error_call(errordata) > File "/Users/aweisberg/repos/cassandra-dtest/dtest_setup.py", line 137, in > _log_error_handler > pytest.fail("Error details: \n{message}".format(message=message)) > File > "/Users/aweisberg/repos/cassandra-dtest/venv/lib/python3.6/site-packages/_pytest/outcomes.py", > line 96, in fail > raise Failed(msg=msg, pytrace=pytrace) > Failed: Error details: > Errors seen in logs for: node3 > node3: ERROR [ReadStage-1] 2018-09-18 12:28:48,344 > AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread > Thread[ReadStage-1,5,main] > org.apache.cassandra.exceptions.InvalidRequestException: Attempted to serve > transient data request from full node in > org.apache.cassandra.db.ReadCommandVerbHandler@3c55e0ff > at > org.apache.cassandra.db.ReadCommandVerbHandler.validateTransientStatus(ReadCommandVerbHandler.java:104) > at > org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:53) > at > org.apache.cassandra.net.MessageDeliveryTask.process(MessageDeliveryTask.java:92) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:54) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:110) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
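For context on the failure quoted above, the check that fires is a guard in the read verb handler: the replica compares whether the incoming command expects transient data against whether it actually holds the range transiently, and rejects the mismatch. The following is a rough, self-contained sketch of that guard; the boolean parameters and exception type are simplified stand-ins, as the real logic in {{ReadCommandVerbHandler.validateTransientStatus}} consults the {{ReadCommand}} and the local {{Replica}}.

{code}
// Rough sketch of the transient-status guard described in the stack trace
// above. 'commandAcceptsTransient' and 'replicaIsTransient' stand in for the
// real lookups on the read command and the local replica.
public class TransientStatusGuard
{
    static void validateTransientStatus(boolean commandAcceptsTransient, boolean replicaIsTransient)
    {
        if (commandAcceptsTransient && !replicaIsTransient)
            throw new IllegalArgumentException("Attempted to serve transient data request from full node");
        if (!commandAcceptsTransient && replicaIsTransient)
            throw new IllegalArgumentException("Attempted to serve full data request from transient node");
    }

    public static void main(String[] args)
    {
        validateTransientStatus(true, true);       // ok: transient request on a transient replica
        try
        {
            validateTransientStatus(true, false);  // the mismatch reported in this ticket
        }
        catch (IllegalArgumentException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
{code}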