[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-07-18 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15861:
-
Test and Documentation Plan: 
https://circleci.com/workflow-run/fde45c54-e845-4040-b59e-abcdabda2b29  (was: 
https://circleci.com/workflow-run/3a2fed2c-c469-4f3f-a620-07079f0dc0db)

> Mutating sstable component may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> --
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 CassandraEntireSSTableStreamReader.java:145 - [Stream 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> The above test executes "nodetool repair" on node1 and kills node2 
> during the repair. At the end, node3 reports a checksum validation failure on 
> the sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which sets the 
> sstable's repairedAt to 0 and its pending repair id to the session id.
> 2. Node1 then creates a {{ComponentManifest}} containing the lengths of the 
> files to be transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 
> starts to broadcast a repair-failure message to all participants in 
> {{CoordinatorSession#fail}}.
> 4. Node1 receives its own repair-failure message and fails its local repair 
> sessions in {{LocalSessions#failSession}}, which triggers an async background 
> compaction.
> 5. Node1's background compaction mutates the sstable's repairedAt to 0 and 
> its pending repair id to null via 
> {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no longer an 
> in-progress repair.
> 6. Node1 actually sends the sstable to node3, where the size of the sstable's 
> STATS component differs from the original size recorded in the manifest.
> 7. At the end, node3 reports a checksum validation failure when it tries to 
> mutate the sstable level and the "isTransient" attribute in 
> {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> Currently, entire-sstable-streaming requires sstable components to be 
> immutable, because the {{ComponentManifest}} 
> with the component sizes is sent before the actual files. This isn't a 
> problem in legacy streaming as STATS 
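
The race described in steps 2, 5 and 6 above can be illustrated with a minimal, self-contained sketch. It is not Cassandra code: the class and method names, the file contents, and the temp-file name are assumptions made for illustration, and it only models "record a length first, rewrite the file afterwards, then compare".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; none of these names exist in Cassandra.
final class ManifestRaceSketch {

    /** Stand-in for building a manifest entry: remember the component's length now. */
    static long recordLength(Path component) throws IOException {
        return Files.size(component);
    }

    /** True if the component on disk still has the length recorded earlier. */
    static boolean matchesManifest(Path component, long recordedLength) throws IOException {
        return Files.size(component) == recordedLength;
    }

    /** Replays steps 2, 5 and 6 and reports whether the manifest is still accurate. */
    static boolean simulate() {
        try {
            Path stats = Files.createTempFile("stats-component", ".db");
            try {
                // Step 2: the manifest is built while a pending repair id is set.
                Files.write(stats, "repairedAt=0;pendingRepair=6f1c3360".getBytes(StandardCharsets.UTF_8));
                long recorded = recordLength(stats);

                // Step 5: a background task rewrites the metadata in place (made-up payload).
                Files.write(stats, "repairedAt=0;pendingRepair=null".getBytes(StandardCharsets.UTF_8));

                // Step 6: the file is sent only now, so the recorded length is stale.
                return matchesManifest(stats, recorded);
            } finally {
                Files.deleteIfExists(stats);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("matches manifest: " + simulate()); // prints "matches manifest: false"
    }
}
```

Any fix has to make the recorded sizes and the bytes actually sent come from the same unchanged file, for example by preventing mutation while a component is being streamed; the sketch does not show how the ticket resolves this.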

[jira] [Commented] (CASSANDRA-15925) Jenkins pipeline can copy wrong test report artefacts from stage builds

2020-07-18 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160428#comment-17160428
 ] 

Michael Semb Wever commented on CASSANDRA-15925:



bq. https://ci-cassandra.apache.org/job/Cassandra-devbranch/211/ reports no 
overall failures but stages have failures #collaborating

Cheers.
Test results were either not copied or copied from the wrong build.
From the pipeline's console:
{noformat}
…
11:17:50  Starting building: Cassandra-devbranch-test-compression #140
…
12:35:53  [Pipeline] copyArtifacts
12:35:53  Unable to find a build for artifact copy from: Cassandra-devbranch-test
…
12:37:45  [Pipeline] copyArtifacts
12:41:03  Copied 587 artifacts from "Cassandra-devbranch-test-compression" build number 139
…
23:32:01  [Pipeline] copyArtifacts
23:32:01  Unable to find a build for artifact copy from: Cassandra-devbranch-dtest
{noformat}

I've put a 
[fix|https://github.com/apache/cassandra-builds/commit/a8683629d4a5d66c280443c27a1c26217928b531]
 in for using separate build step wrapper variables for each build. Let's see 
if that helps. 
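
Why one shared wrapper variable can point a copy step at the wrong build can be sketched in plain Java. This is only a model, not the pipeline itself: it assumes the stages interleave so that a later stage assigns the shared variable before an earlier stage's post step reads it, and the class name and build numbers (140 and 7) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the variable scoping problem; not Jenkins API.
final class BuildWrapperScopeSketch {

    // One holder reused by every stage, like a single shared `built` variable.
    static int built;

    /** Both stages write the same variable before either post step reads it. */
    static Map<String, Integer> copyWithSharedVariable() {
        Map<String, Integer> copiedFrom = new LinkedHashMap<>();
        built = 140; // stress-test stage triggers its build (hypothetical number)
        built = 7;   // fqltool-test stage triggers its build and overwrites the holder
        copiedFrom.put("stress-test", built);  // post step now sees 7, not 140
        copiedFrom.put("fqltool-test", built);
        return copiedFrom;
    }

    /** Each stage keeps its own variable, so post steps read the right number. */
    static Map<String, Integer> copyWithSeparateVariables() {
        Map<String, Integer> copiedFrom = new LinkedHashMap<>();
        int stress = 140;
        int fqltool = 7;
        copiedFrom.put("stress-test", stress);
        copiedFrom.put("fqltool-test", fqltool);
        return copiedFrom;
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + copyWithSharedVariable());
        System.out.println("separate: " + copyWithSeparateVariables());
    }
}
```

With the shared holder, the stress-test results would be looked up under the fqltool build number, which either finds no build or finds an unrelated one.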

> Jenkins pipeline can copy wrong test report artefacts from stage builds
> ---
>
> Key: CASSANDRA-15925
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15925
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Spotted in 
> https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/196/console
> Looks like copyArtifact will need to be specific to a build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-builds] branch master updated: In Jenkins devbranch scope the build step wrappers correctly (name them separately) (CASSANDRA-15925)

2020-07-18 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git


The following commit(s) were added to refs/heads/master by this push:
 new a868362  In Jenkins devbranch scope the build step wrappers correctly (name them separately) (CASSANDRA-15925)
a868362 is described below

commit a8683629d4a5d66c280443c27a1c26217928b531
Author: mck 
AuthorDate: Sat Jul 18 11:16:04 2020 +0200

In Jenkins devbranch scope the build step wrappers correctly (name them separately) (CASSANDRA-15925)
---
 jenkins-dsl/cassandra_pipeline.groovy | 60 +--
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/jenkins-dsl/cassandra_pipeline.groovy b/jenkins-dsl/cassandra_pipeline.groovy
index 1b09c4b..b13b818 100644
--- a/jenkins-dsl/cassandra_pipeline.groovy
+++ b/jenkins-dsl/cassandra_pipeline.groovy
@@ -27,7 +27,7 @@ pipeline {
   steps {
   warnError('Tests unstable') {
   script {
-built = build job: "${env.JOB_NAME}-stress-test", parameters: [string(name: 'REPO', value: params.REPO), string(name: 'BRANCH', value: params.BRANCH)]
+stress = build job: "${env.JOB_NAME}-stress-test", parameters: [string(name: 'REPO', value: params.REPO), string(name: 'BRANCH', value: params.BRANCH)]
   }
   }
   }
@@ -35,14 +35,14 @@ pipeline {
 success {
 warnError('missing test xml files') {
 script {
-copyTestResults('stress-test', built.getNumber())
+copyTestResults('stress-test', stress.getNumber())
 }
 }
 }
 unstable {
 warnError('missing test xml files') {
 script {
-copyTestResults('stress-test', built.getNumber())
+copyTestResults('stress-test', stress.getNumber())
 }
 }
 }
@@ -52,7 +52,7 @@ pipeline {
   steps {
   warnError('Tests unstable') {
   script {
-built = build job: "${env.JOB_NAME}-fqltool-test", parameters: [string(name: 'REPO', value: params.REPO), string(name: 'BRANCH', value: params.BRANCH)]
+fqltool = build job: "${env.JOB_NAME}-fqltool-test", parameters: [string(name: 'REPO', value: params.REPO), string(name: 'BRANCH', value: params.BRANCH)]
   }
   }
   }
@@ -60,14 +60,14 @@ pipeline {
 success {
 warnError('missing test xml files') {
 script {
-copyTestResults('fqltool-test', built.getNumber())
+copyTestResults('fqltool-test', fqltool.getNumber())
 }
 }
 }
 unstable {
 warnError('missing test xml files') {
 script {
-copyTestResults('fqltool-test', built.getNumber())
+copyTestResults('fqltool-test', fqltool.getNumber())
 }
 }
 }
@@ -77,7 +77,7 @@ pipeline {
   steps {
   warnError('Tests unstable') {
   script {
-built = build job: "${env.JOB_NAME}-jvm-dtest", parameters: [string(name: 'REPO', value: params.REPO), string(name: 'BRANCH', value: params.BRANCH)]
+jvm_dtest = build job: "${env.JOB_NAME}-jvm-dtest", parameters: [string(name: 'REPO', value: params.REPO), string(name: 'BRANCH', value: params.BRANCH)]
   }
   }
   }
@@ -85,14 +85,14 @@ pipeline {
 success {
 warnError('missing test xml files') {
 script {
-copyTestResults('jvm-dtest', built.getNumber())
+copyTestResults('jvm-dtest', jvm_dtest.getNumber())
 }
 }
 }
 unstable {
 warnError('missing test xml files') {
 script {
-copyTestResults('jvm-dtest', built.getNumber())
+copyTestResults('jvm-dtest', jvm_dtest.getNumber())
 }
 }
 }
@@ -102,7 +102,7 @@ pipeline {
 steps {
   warnError('Tests unstable') {
   script {
-built = build job: 

[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160407#comment-17160407
 ] 

Stefan Miklosovic commented on CASSANDRA-15191:
---

[~dcapwell] please review again. I have added a test (hopefully it is 
what you expect; otherwise I am out of ideas here) and I have moved the 
logging from ALAES to the inspector.

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.7, 4.0-beta1
>
> Attachments: log.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When disk_failure_policy is set to stop_paranoid and a 
> CorruptSSTableException is thrown after the server is up, the setting is 
> ignored. The node should stop gossip and transport, but instead it 
> continues to serve requests and the exception is only logged.
>  
> This patch unifies the exception handling in JVMStabilityInspector and 
> reworks the code so that this inspector acts as the central place where 
> such exceptions are inspected. 
>  
> The core reason the exception is ignored is that the exception thrown in 
> AbstractLocalAwareExecutorService is not a CorruptSSTableException but a 
> RuntimeException that has the CorruptSSTableException as its cause. Hence it 
> is better to handle this in JVMStabilityInspector, which can examine the 
> cause chain recursively and act accordingly.
> Behaviour before:
> stop_paranoid of disk_failure_policy is ignored when a CorruptSSTableException 
> is thrown, e.g. on a regular select statement.
> Behaviour after:
> Gossip and transport (CQL) are turned off; the JVM stays up for further 
> investigation, e.g. via JMX.
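
The cause-chain inspection described above can be sketched as follows. This is a minimal illustration, not the patch: the class name, the nested exception type (a stand-in for Cassandra's CorruptSSTableException), and the findCause helper are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical illustration; not the JVMStabilityInspector implementation.
final class CauseChainSketch {

    /** Stand-in for Cassandra's CorruptSSTableException. */
    static final class CorruptSSTableException extends RuntimeException {
        CorruptSSTableException(String message) {
            super(message);
        }
    }

    /**
     * Walks the cause chain and returns the first throwable of the given type,
     * or null if there is none. The identity set guards against cyclic chains.
     */
    static <T extends Throwable> T findCause(Throwable error, Class<T> type) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable current = error; current != null && seen.add(current); current = current.getCause()) {
            if (type.isInstance(current)) {
                return type.cast(current);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The executor surfaces a RuntimeException wrapping the real problem.
        RuntimeException wrapped =
            new RuntimeException(new CorruptSSTableException("corrupted sstable"));

        // A check on the top-level type alone misses the corruption.
        System.out.println(wrapped instanceof CorruptSSTableException); // prints "false"

        // Walking the causes finds it, so the failure policy can be applied.
        System.out.println(findCause(wrapped, CorruptSSTableException.class) != null); // prints "true"
    }
}
```

A handler that only tests the top-level exception type behaves like the first check, which matches the "behaviour before" described in the ticket.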






[jira] [Updated] (CASSANDRA-15944) ASF CI unit tests on JDK11

2020-07-18 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15944:
---
Test and Documentation Plan: 
Jenkins should display 2 JDK matrix configurations: 
jdk=JDK 1.8 (latest) & jdk=JDK 11 (latest) 
for cassandra-trunk-* & cassandra-devbranch-* test jobs.

 

  was:
With this patch, Jenkins will display 2 JDK configurations 

jdk=JDK 1.8 (latest) & jdk=JDK 11 (latest)

for cassandra-trunk-* & cassandra-devbranch-* test jobs.

 


> ASF CI unit tests on JDK11
> --
>
> Key: CASSANDRA-15944
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15944
> Project: Cassandra
>  Issue Type: Task
>  Components: Build, CI
>Reporter: Michael Semb Wever
>Priority: Normal
> Fix For: 4.0-beta
>
>
> ASF CI tests today only run on JDK1.8
> On the Jenkins cluster JDKs from 1.4 through to 15 are available. See 
> screenshot for naming specifics, attached in  CASSANDRA-15809
>  
> This ticket is to add JDK11 test targets on Cassandra-trunk and 
> Cassandra-devbranch, for parity to CircleCI's workflows.
>  
> The JDK is specified in the groovy DSL:
>  
> [https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L11]
>  
>  This is a continuation from CASSANDRA-15809 






[jira] [Updated] (CASSANDRA-15944) ASF CI unit tests on JDK11

2020-07-18 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15944:
---
Status: Open  (was: Triage Needed)







[jira] [Updated] (CASSANDRA-15944) ASF CI unit tests on JDK11

2020-07-18 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15944:
---
Description: 
ASF CI tests today only run on JDK1.8

On the Jenkins cluster JDKs from 1.4 through to 15 are available. See 
screenshot for naming specifics, attached in  CASSANDRA-15809

 

This ticket is to add JDK11 test targets on Cassandra-trunk and 
Cassandra-devbranch, for parity to CircleCI's workflows.

 

The JDK is specified in the groovy DSL:
 
[https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L11]
 

 This is a continuation from CASSANDRA-15809 

  was:
ASF CI tests today only run on JDK1.8

On the Jenkins cluster JDKs from 1.4 through to 15 are available. See 
screenshot for naming specifics, attached in 
https://issues.apache.org/jira/secure/attachment/13002796/Screenshot%202020-05-13%20at%2009.39.56.png

 

This ticket is to add JDK11 test targets on Cassandra-trunk and 
Cassandra-devbranch, for parity to CircleCI's workflows.

 

The JDK is specified in the groovy DSL:
 
[https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L11]
 

 This is a continuation from CASSANDRA-15809 








[jira] [Updated] (CASSANDRA-15944) ASF CI unit tests on JDK11

2020-07-18 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15944:
---
Description: 
ASF CI tests today only run on JDK1.8

On the Jenkins cluster JDKs from 1.4 through to 15 are available. See 
screenshot for naming specifics, attached in 
https://issues.apache.org/jira/secure/attachment/13002796/Screenshot%202020-05-13%20at%2009.39.56.png

 

This ticket is to add JDK11 test targets on Cassandra-trunk and 
Cassandra-devbranch, for parity to CircleCI's workflows.

 

The JDK is specified in the groovy DSL:
 
[https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L11]
 

 This is a continuation from CASSANDRA-15809 

  was:
ASF CI tests today only run on JDK1.8

On the Jenkins cluster JDKs from 1.4 through to 15 are available. See attached 
screenshot for naming specifics.

 

This ticket is to add JDK11 test targets on Cassandra-trunk and 
Cassandra-devbranch, for parity to CircleCI's workflows.

 

The JDK is specified in the groovy DSL:
 
[https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L11]
 

 This is a continuation from CASSANDRA-15809 


> ASF CI unit tests on JDK11
> --
>
> Key: CASSANDRA-15944
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15944
> Project: Cassandra
>  Issue Type: Task
>  Components: Build, CI
>Reporter: Michael Semb Wever
>Priority: Normal
> Fix For: 4.0-beta
>
>
> ASF CI tests today only run on JDK1.8
> On the Jenkins cluster JDKs from 1.4 through to 15 are available. See 
> screenshot for naming specifics, attached in 
> https://issues.apache.org/jira/secure/attachment/13002796/Screenshot%202020-05-13%20at%2009.39.56.png
>  
> This ticket is to add JDK11 test targets on Cassandra-trunk and 
> Cassandra-devbranch, for parity to CircleCI's workflows.
>  
> The JDK is specified in the groovy DSL:
>  
> [https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L11]
>  
>  This is a continuation from CASSANDRA-15809 


