[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-15 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215186#comment-17215186
 ] 

Berenguer Blasi commented on CASSANDRA-15996:
-

That could be it indeed, imo. Given that NoSpamLogger is meant to be used in 
hot paths, and given the 'currentTimeMillis' resolution issues, I'd go for the 
'Long.MIN_VALUE' route. That also keeps things within the 'nanoTime()' world, 
so to speak, so we don't inadvertently introduce some perf profile change. Also 
[https://stackoverflow.com/a/54566928/3432945|http://example.com] was an 
interesting read.

I have +1'ed the 'Long.MIN_VALUE' PR, pending somebody who knows about the 
upgrade test failures confirming they are indeed unrelated. Nice catch, whether 
or not it turns out to be it! :-)
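
For readers following along, here is a minimal sketch of why the initial value 
can matter for a limiter driven by System.nanoTime(), whose values may be 
negative. The Limiter class and its fields are hypothetical and are not 
NoSpamLogger's actual code; the assumption that a zero initial value can 
suppress the first log line is only an illustration of the idea discussed 
above.

```java
import java.util.concurrent.atomic.AtomicLong;

public class NanoTimeLimiterSketch {
    /** Hypothetical rate limiter; NOT Cassandra's NoSpamLogger. */
    static final class Limiter {
        private final long minIntervalNanos;
        private final AtomicLong nextAllowedNanos;

        Limiter(long minIntervalNanos, long initialNextAllowedNanos) {
            this.minIntervalNanos = minIntervalNanos;
            this.nextAllowedNanos = new AtomicLong(initialNextAllowedNanos);
        }

        /** Returns true at most once per interval. Plain comparison, so not wrap-around safe. */
        boolean shouldLog(long nowNanos) {
            long expected = nextAllowedNanos.get();
            return nowNanos >= expected
                   && nextAllowedNanos.compareAndSet(expected, nowNanos + minIntervalNanos);
        }
    }

    public static void main(String[] args) {
        // System.nanoTime() has an arbitrary origin, so a negative reading is legal.
        long now = -5_000L;
        Limiter zeroInit = new Limiter(1_000L, 0L);
        Limiter minInit = new Limiter(1_000L, Long.MIN_VALUE);

        System.out.println(zeroInit.shouldLog(now));         // false: first message is swallowed
        System.out.println(minInit.shouldLog(now));          // true: first message always passes
        System.out.println(minInit.shouldLog(now + 500L));   // false: still inside the interval
        System.out.println(minInit.shouldLog(now + 1_000L)); // true: interval has elapsed
    }
}
```

Starting from Long.MIN_VALUE keeps the check on the nanoTime clock without a 
second time source, which is the trade-off weighed in the comment above.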

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and CAP_NOWARN policy
> E   assert []
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16210 at 10/16/20, 3:57 AM:


The issue is that it was possible to open a new Keyspace instance in the middle 
of Schema.dropKeyspace(). To see the problem the drop has to progress to the 
following 
[state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]:
 1) Keyspace instance doesn't exist - it has already been removed. 
 2) KeyspaceMetadata still exists
 Keyspace.open in this state creates a new Keyspace instance (with 
ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is 
an object leak.
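
A minimal sketch of the idea behind synchronizing store/clear, assuming a 
single lock that guards both maps. Registry, create, open and drop are 
hypothetical stand-ins written for illustration; this is not the code from the 
patches linked below.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyspaceRegistrySketch {
    /** Hypothetical registry; the two maps stand in for the metadata and Schema.keyspaceInstances. */
    static final class Registry {
        private final Map<String, String> metadata = new HashMap<>();
        private final Map<String, Object> instances = new HashMap<>();

        synchronized void create(String name) {
            metadata.put(name, "metadata:" + name);
        }

        /** Opens (and stores) an instance only while the metadata still exists. */
        synchronized Object open(String name) {
            if (!metadata.containsKey(name))
                throw new IllegalStateException("Unknown keyspace " + name);
            return instances.computeIfAbsent(name, n -> new Object());
        }

        /** Clears instance and metadata under the same lock, so open() never sees the half-dropped state. */
        synchronized void drop(String name) {
            instances.remove(name);
            metadata.remove(name);
        }

        synchronized int instanceCount() {
            return instances.size();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.create("ks1");
        registry.open("ks1");
        registry.drop("ks1");
        try {
            registry.open("ks1");
        } catch (IllegalStateException e) {
            System.out.println("open after drop rejected: " + e.getMessage());
        }
        System.out.println("instances left: " + registry.instanceCount());
    }
}
```

Because open and drop share one monitor, an open cannot land between the two 
removals and leave a fresh instance behind.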

[3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run: [Java8 | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 

[4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62]
CI run: [Java 8 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef]
 | [Java 11 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a]


was (Author: e.dimitrova):
The issue is that it was possible to open a new Keyspace instance in the middle 
of Schema.dropKeyspace(). To see the problem the drop has to progress to the 
following 
[state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]:
 1) Keyspace instance doesn't exist - it has been already removed. 
 2) KeyspaceMetadata still exists
 Keyspace.open in this state creates a new Keyspace instance (with 
ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is 
an object leak.

[3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 

[4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62]
CI run:
[Java 8 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef]
 | [Java 11 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a]

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta3
>
>
> DTest failure: 
> dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table
>  (vnodes) - one random failure was reported which pointed to a race condition 
> to be spotted. 






[jira] [Comment Edited] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16210 at 10/16/20, 3:56 AM:


The issue is that it was possible to open a new Keyspace instance in the middle 
of Schema.dropKeyspace(). To see the problem the drop has to progress to the 
following 
[state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]:
 1) Keyspace instance doesn't exist - it has already been removed. 
 2) KeyspaceMetadata still exists
 Keyspace.open in this state creates a new Keyspace instance (with 
ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is 
an object leak.

[3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 

[4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62]
CI run:
[Java 8 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef]
 | [Java 11 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a]


was (Author: e.dimitrova):
The issue is, it was possible to open a new Keyspace instance in the middle of 
Schema.dropKeyspace(). To see the problem the drop has to progress to the 
following 
[state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]:
 1) Keyspace instance doesn't exist - it has been already removed. 
 2) KeyspaceMetadata still exists
 Keyspace.open in this state creates a new Keyspace instance (with 
ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is 
an object leak.

[3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 

[4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62]
CI run:
[Java 8 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef]
 | [Java 11 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a]

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta3
>
>
> DTest failure: 
> dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table
>  (vnodes) - one random failure was reported which pointed to a race condition 
> to be spotted. 






[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Test and Documentation Plan:   (was: It turned out the issue is already 
solved for 4.0 with CASSANDRA-9425

Posting patch for [3.11 
|https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 
|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] )

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta3
>
>
> DTest failure: 
> dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table
>  (vnodes) - one random failure was reported which pointed to a race condition 
> to be spotted. 






[jira] [Comment Edited] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-16210 at 10/16/20, 3:55 AM:


The issue is, it was possible to open a new Keyspace instance in the middle of 
Schema.dropKeyspace(). To see the problem the drop has to progress to the 
following 
[state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]:
 1) Keyspace instance doesn't exist - it has already been removed. 
 2) KeyspaceMetadata still exists
 Keyspace.open in this state creates a new Keyspace instance (with 
ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is 
an object leak.

[3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 

[4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62]
CI run:
[Java 8 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef]
 | [Java 11 | 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a]


was (Author: e.dimitrova):
It turned out the issue is already solved for 4.0 with CASSANDRA-9425

Posting patch for [3.11 | 
https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> DTest failure: 
> dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table
>  (vnodes) - one random failure was reported which pointed to a race condition 
> to be spotted. 






[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Fix Version/s: 4.0-beta3

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta3
>
>
> DTest failure: 
> dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table
>  (vnodes) - one random failure was reported which pointed to a race condition 
> to be spotted. 






[jira] [Comment Edited] (CASSANDRA-16048) Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and Value Columns

2020-10-15 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215155#comment-17215155
 ] 

Jordan West edited comment on CASSANDRA-16048 at 10/16/20, 3:55 AM:


Updated the branch to address [~marcuse]'s comment re: updating the flags in 
{{system_schema.tables}}. Had to move things around to account for the changes 
in CASSANDRA-16063. Updated the test as well. 

I skipped adding a flag since we can't detect and "undo" updating the tables 
that were updated. If folks feel strongly about the flag I can add it. 

Branch: [https://github.com/apache/cassandra/compare/trunk...jrwest:jwest/16048]

Tests: 
[https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F16048]


was (Author: jrwest):
Updated the branch to address [~marcuse]'s comment re: updated the flags in 
{{system_schema.tables}}. Had to move things around to account for the changes 
in CASSANDRA-16063. Updated the test as well. 

I skipped adding a flag since we can't detect and "undo" updating the tables 
that were updated. If folks feel strongly about the flag I can add it. 

Branch: https://github.com/apache/cassandra/compare/trunk...jrwest:jwest/16048

Tests: 
https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F16048

> Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and 
> Value Columns
> --
>
> Key: CASSANDRA-16048
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16048
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Some compact storage tables, specifically those where the user has defined 
> both at least one clustering and the value column, can be safely handled in 
> 4.0 because besides the DENSE flag they are not materially different post 3.0 
> and there is no visible change to the user facing schema after dropping 
> compact storage. We can detect this case and allow these tables to silently 
> drop the DENSE flag while still throwing a start-up error for COMPACT STORAGE 
> tables that don’t meet the criteria. 
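
The rule in the description above can be sketched roughly as follows. 
TableFlags and normalize are hypothetical names made up for this 
illustration, and clearing both flags on success is an assumption; the sketch 
only encodes the criteria as stated in the description and is not the code on 
the branch.

```java
public class CompactStorageCheckSketch {
    /** Hypothetical, simplified view of a table definition. */
    static final class TableFlags {
        final boolean compactStorage;
        final boolean dense;
        final int clusteringColumns;
        final boolean userDefinedValueColumn;

        TableFlags(boolean compactStorage, boolean dense, int clusteringColumns, boolean userDefinedValueColumn) {
            this.compactStorage = compactStorage;
            this.dense = dense;
            this.clusteringColumns = clusteringColumns;
            this.userDefinedValueColumn = userDefinedValueColumn;
        }
    }

    /** Drops the DENSE flag for the safe case; fails for any other COMPACT STORAGE table. */
    static TableFlags normalize(TableFlags table) {
        if (!table.compactStorage)
            return table; // regular tables pass through untouched
        if (table.clusteringColumns >= 1 && table.userDefinedValueColumn)
            return new TableFlags(false, false, table.clusteringColumns, true);
        throw new IllegalStateException("COMPACT STORAGE table does not meet the criteria");
    }

    public static void main(String[] args) {
        TableFlags safe = normalize(new TableFlags(true, true, 1, true));
        System.out.println("dense after normalize: " + safe.dense);
        try {
            normalize(new TableFlags(true, true, 0, false));
        } catch (IllegalStateException e) {
            System.out.println("start-up error: " + e.getMessage());
        }
    }
}
```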






[jira] [Commented] (CASSANDRA-16048) Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and Value Columns

2020-10-15 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215155#comment-17215155
 ] 

Jordan West commented on CASSANDRA-16048:
-

Updated the branch to address [~marcuse]'s comment re: updated the flags in 
{{system_schema.tables}}. Had to move things around to account for the changes 
in CASSANDRA-16063. Updated the test as well. 

I skipped adding a flag since we can't detect and "undo" updating the tables 
that were updated. If folks feel strongly about the flag I can add it. 

Branch: https://github.com/apache/cassandra/compare/trunk...jrwest:jwest/16048

Tests: 
https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F16048

> Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and 
> Value Columns
> --
>
> Key: CASSANDRA-16048
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16048
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Some compact storage tables, specifically those where the user has defined 
> both at least one clustering and the value column, can be safely handled in 
> 4.0 because besides the DENSE flag they are not materially different post 3.0 
> and there is no visible change to the user facing schema after dropping 
> compact storage. We can detect this case and allow these tables to silently 
> drop the DENSE flag while still throwing a start-up error for COMPACT STORAGE 
> tables that don’t meet the criteria. 






[jira] [Commented] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar

2020-10-15 Thread maxwellguo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215124#comment-17215124
 ] 

maxwellguo commented on CASSANDRASC-27:
---

Thank you [~tharanga] . 

> CDC reader in Apache Cassandra Sidecar
> --
>
> Key: CASSANDRASC-27
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-27
> Project: Sidecar for Apache Cassandra
>  Issue Type: New Feature
>Reporter: Vinay Chella
>Assignee: Tharanga Sampath Gamaethige
>Priority: Normal
>
> Apache Cassandra has had the CDC (Change Data Capture) feature since 3.8. This 
> is further enhanced with (CASS-12148) in Cassandra 4.0.
> However, there’s no generally available mechanism to stream changes out of a 
> Cassandra database; hence the utility of this feature is limited if not 
> absent.
> Many applications use Cassandra as their primary data store. For various 
> reasons (caching, analyzing, indexing, etc.), this data needs to be 
> synchronized with derived/secondary data stores.  We would like to emit 
> change streams in real-time to consumers so that changes to Cassandra can be 
> used for various purposes.
> *Goals*
>  * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit 
> changes in real-time. Priority for the initial implementation is safety and 
> correctness; performance enhancements will follow in subsequent iterations
> *Nongoals*
>  * Modify Cassandra storage engine to emit changes
>  
> *Proposal*
> [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing]
>  
> *PR*
> https://github.com/apache/cassandra-sidecar/pull/16
>  






[jira] [Assigned] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar

2020-10-15 Thread maxwellguo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maxwellguo reassigned CASSANDRASC-27:
-

Assignee: Tharanga Sampath Gamaethige  (was: maxwellguo)

> CDC reader in Apache Cassandra Sidecar
> --
>
> Key: CASSANDRASC-27
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-27
> Project: Sidecar for Apache Cassandra
>  Issue Type: New Feature
>Reporter: Vinay Chella
>Assignee: Tharanga Sampath Gamaethige
>Priority: Normal
>
> Apache Cassandra has had the CDC (Change Data Capture) feature since 3.8. This 
> is further enhanced with (CASS-12148) in Cassandra 4.0.
> However, there’s no generally available mechanism to stream changes out of a 
> Cassandra database; hence the utility of this feature is limited if not 
> absent.
> Many applications use Cassandra as their primary data store. For various 
> reasons (caching, analyzing, indexing, etc.), this data needs to be 
> synchronized with derived/secondary data stores.  We would like to emit 
> change streams in real-time to consumers so that changes to Cassandra can be 
> used for various purposes.
> *Goals*
>  * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit 
> changes in real-time. Priority for the initial implementation is safety and 
> correctness; performance enhancements will follow in subsequent iterations
> *Nongoals*
>  * Modify Cassandra storage engine to emit changes
>  
> *Proposal*
> [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing]
>  
> *PR*
> https://github.com/apache/cassandra-sidecar/pull/16
>  






[jira] [Comment Edited] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-15 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215014#comment-17215014
 ] 

Adam Holmberg edited comment on CASSANDRA-15996 at 10/15/20, 9:24 PM:
--

Created two patches for consideration
|[currentTimeMillis|https://github.com/aholmberg/cassandra/pull/13/files#diff-e2c5319b6d6b31133eb6f8daf05716ee2358471ae66ac8dedb1df5fd669e088b]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996]|
|[Long.MIN_VALUE|https://github.com/aholmberg/cassandra/pull/14]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996-alt]|

(also fixing what I believe to be incorrect behavior shown in one of the unit 
tests)


was (Author: aholmber):
Created two patches for consideration

|[currentTimeMillis|https://github.com/aholmberg/cassandra/pull/13/files#diff-e2c5319b6d6b31133eb6f8daf05716ee2358471ae66ac8dedb1df5fd669e088b]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996]|
|[Long.MIN_VALUE|https://github.com/aholmberg/cassandra/pull/14]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996-alt]|

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and CAP_NOWARN policy
> E   assert []
> {code}






[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-15 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215014#comment-17215014
 ] 

Adam Holmberg commented on CASSANDRA-15996:
---

Created two patches for consideration

|[currentTimeMillis|https://github.com/aholmberg/cassandra/pull/13/files#diff-e2c5319b6d6b31133eb6f8daf05716ee2358471ae66ac8dedb1df5fd669e088b]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996]|
|[Long.MIN_VALUE|https://github.com/aholmberg/cassandra/pull/14]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996-alt]|

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and CAP_NOWARN policy
> E   assert []
> {code}






[jira] [Updated] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread Paulo Motta (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-16213:

Reviewers: Brandon Williams, Paulo Motta

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214972#comment-17214972
 ] 

David Capwell commented on CASSANDRA-16213:
---

[~paulo], Brandon told me in Slack that you would be a good person to review 
as well; would you be able to?

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214966#comment-17214966
 ] 

David Capwell commented on CASSANDRA-16213:
---

Sorry, I misspoke, on startup we do add it back into the ring, see 
https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/service/StorageService.java#L604-L617.

So currently, each node will add it back into the ring, and will add it back 
into gossip.

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214963#comment-17214963
 ] 

David Capwell commented on CASSANDRA-16213:
---

Thanks for the reply [~brandon.williams]!

bq. If you shutdown the entire ring in non-rolling fashion then it is no surprise

We see this in rolling fashion as well; the full-cluster case was just easier 
to reproduce, so the issue isn't isolated to a full-cluster outage.

bq. You can no longer replace as a consequence

What is the recommendation in these cases?

bq. A node injecting states that don't belong to itself is generally forbidden as it is dangerous

In the case I call out we don't add the node to the ring, but we do add it to 
gossip, see 
https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/gms/Gossiper.java#L1754-L1780.

We will try to evict it from gossip (see 
https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/gms/Gossiper.java#L960-L969),
 but we also see in the wild that this eviction doesn't happen and it stays 
there forever; here is a sample from gossipinfo on a real cluster

{code}
/
  generation:0
  heartbeat:0
  TOKENS: not present
{code}



> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Description: DTest failure: 
dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table
 (vnodes) - one random failure was reported which pointed to a race condition 
to be spotted.   (was: DTest failure: 
TestRepairDataSystemTable.repair_table_test (vnodes) - one random failure was 
reported which pointed to a race condition to be spotted. )

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> DTest failure: 
> dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table
>  (vnodes) - one random failure was reported which pointed to a race condition 
> to be spotted. 






[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Description: DTest failure: TestRepairDataSystemTable.repair_table_test 
(vnodes) - one random failure was reported which pointed to a race condition to 
be spotted.   (was: Unit Test failure: 
TestRepairDataSystemTable.repair_table_test (vnodes) - one random failure was 
reported which pointed to a race condition to be spotted. )

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> DTest failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> random failure was reported which pointed to a race condition to be spotted. 






[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Discovered By: User Report  (was: Unit Test)

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> random failure was reported which pointed to a race condition to be spotted. 






[jira] [Commented] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar

2020-10-15 Thread Tharanga Sampath Gamaethige (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214938#comment-17214938
 ] 

Tharanga Sampath Gamaethige commented on CASSANDRASC-27:


A WIP version of the PR is out: 
https://github.com/apache/cassandra-sidecar/pull/16

> CDC reader in Apache Cassandra Sidecar
> --
>
> Key: CASSANDRASC-27
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-27
> Project: Sidecar for Apache Cassandra
>  Issue Type: New Feature
>Reporter: Vinay Chella
>Assignee: maxwellguo
>Priority: Normal
>
> Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This 
> is further enhanced with (CASS-12148) in Cassandra 4.0.
> However, there’s no generally available mechanism to stream changes out of a 
> Cassandra database; hence the utility of this feature is limited if not 
> absent.
> Many applications use Cassandra as their primary data store. For various 
> reasons(Caching, analyzing, indexing, etc), this data needs to be 
> synchronized with derived/secondary data stores.  We would like to emit 
> change streams in real-time to consumers so that changes to Cassandra can be 
> used for various purposes.
> *Goals*
>  * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit 
> changes in real-time. Priority for the initial implementation is safety and 
> correctness, performance enhancements will follow in subsequent iterations
> *Nongoals*
>  * Modify Cassandra storage engine to emit changes
>  
> *Proposal*
> [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing]
>  
> *PR*
> https://github.com/apache/cassandra-sidecar/pull/16
>  






[jira] [Updated] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar

2020-10-15 Thread Tharanga Sampath Gamaethige (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tharanga Sampath Gamaethige updated CASSANDRASC-27:
---
Description: 
Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This is 
further enhanced with (CASS-12148) in Cassandra 4.0.

However, there’s no generally available mechanism to stream changes out of a 
Cassandra database; hence the utility of this feature is limited if not absent.

Many applications use Cassandra as their primary data store. For various 
reasons(Caching, analyzing, indexing, etc), this data needs to be synchronized 
with derived/secondary data stores.  We would like to emit change streams in 
real-time to consumers so that changes to Cassandra can be used for various 
purposes.

*Goals*
 * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit 
changes in real-time. Priority for the initial implementation is safety and 
correctness, performance enhancements will follow in subsequent iterations

*Nongoals*
 * Modify Cassandra storage engine to emit changes

 

*Proposal*

[https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing]

 

*PR*

https://github.com/apache/cassandra-sidecar/pull/16

 

  was:
Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This is 
further enhanced with (CASS-12148) in Cassandra 4.0.

However, there’s no generally available mechanism to stream changes out of a 
Cassandra database; hence the utility of this feature is limited if not absent.

Many applications use Cassandra as their primary data store. For various 
reasons(Caching, analyzing, indexing, etc), this data needs to be synchronized 
with derived/secondary data stores.  We would like to emit change streams in 
real-time to consumers so that changes to Cassandra can be used for various 
purposes.

*Goals*
 * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit 
changes in real-time. Priority for the initial implementation is safety and 
correctness, performance enhancements will follow in subsequent iterations

*Nongoals*
 * Modify Cassandra storage engine to emit changes

 

*Proposal*

https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing

 


> CDC reader in Apache Cassandra Sidecar
> --
>
> Key: CASSANDRASC-27
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-27
> Project: Sidecar for Apache Cassandra
>  Issue Type: New Feature
>Reporter: Vinay Chella
>Assignee: maxwellguo
>Priority: Normal
>
> Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This 
> is further enhanced with (CASS-12148) in Cassandra 4.0.
> However, there’s no generally available mechanism to stream changes out of a 
> Cassandra database; hence the utility of this feature is limited if not 
> absent.
> Many applications use Cassandra as their primary data store. For various 
> reasons(Caching, analyzing, indexing, etc), this data needs to be 
> synchronized with derived/secondary data stores.  We would like to emit 
> change streams in real-time to consumers so that changes to Cassandra can be 
> used for various purposes.
> *Goals*
>  * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit 
> changes in real-time. Priority for the initial implementation is safety and 
> correctness, performance enhancements will follow in subsequent iterations
> *Nongoals*
>  * Modify Cassandra storage engine to emit changes
>  
> *Proposal*
> [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing]
>  
> *PR*
> https://github.com/apache/cassandra-sidecar/pull/16
>  






[jira] [Updated] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool

2020-10-15 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16057:
--
Status: Ready to Commit  (was: Review In Progress)

> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using 
> in-jvm dtest should expose that to tests.
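
As a rough illustration of what "exposing stdout and stderr to tests" can mean in plain Java, the sketch below redirects {{System.out}} and {{System.err}} around a command and returns what was printed. The names {{CaptureSketch}}, {{Captured}} and {{capture}} are hypothetical and are not part of the in-jvm dtest API; the patch itself may take a different approach.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class CaptureSketch {
    /** Hypothetical holder for what a command printed; not part of any dtest API. */
    static final class Captured {
        final String stdout;
        final String stderr;

        Captured(String stdout, String stderr) {
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Runs the command with System.out/System.err redirected, then restores them. */
    static Captured capture(Runnable command) {
        PrintStream originalOut = System.out;
        PrintStream originalErr = System.err;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        try (PrintStream capturedOut = new PrintStream(out, true);
             PrintStream capturedErr = new PrintStream(err, true)) {
            System.setOut(capturedOut);
            System.setErr(capturedErr);
            command.run();
        } finally {
            // Always restore the real streams, even if the command throws.
            System.setOut(originalOut);
            System.setErr(originalErr);
        }
        return new Captured(out.toString(), err.toString());
    }

    public static void main(String[] args) {
        Captured result = capture(() -> {
            System.out.println("hypothetical command output");
            System.err.println("hypothetical warning");
        });
        System.out.print("stdout: " + result.stdout);
        System.out.print("stderr: " + result.stderr);
    }
}
```

Redirecting the process-wide streams is only safe when nothing else prints concurrently, so a real test harness would more likely hand the command its own output streams.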






[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-15 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214937#comment-17214937
 ] 

Adam Holmberg commented on CASSANDRA-15996:
---

bq. I think we should switch NoSpamLogger to use currentTimeMillis.

This, or we could initialize the {{NoSpamLogStatement}} to {{Long.MIN_VALUE}} 
instead of zero. I have the changes for either.
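
A minimal sketch of the {{Long.MIN_VALUE}} idea, assuming a gate that stores the timestamp of the last emitted message. {{SentinelGateSketch}}, {{Gate}} and {{NEVER}} are hypothetical names and this is not the actual {{NoSpamLogger}} code. The sentinel is compared explicitly because subtracting {{Long.MIN_VALUE}} from a positive {{nanoTime}} reading would overflow.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SentinelGateSketch {
    /** Hypothetical rate-limit gate; holds the nanoTime of the last emitted message. */
    static final class Gate extends AtomicLong {
        // Sentinel meaning "nothing has been logged yet".
        static final long NEVER = Long.MIN_VALUE;
        private final long minIntervalNanos;

        Gate(long minIntervalNanos) {
            super(NEVER);
            this.minIntervalNanos = minIntervalNanos;
        }

        boolean shouldLog(long nowNanos) {
            long last = get();
            // Check the sentinel first: nowNanos - Long.MIN_VALUE overflows for positive nowNanos.
            boolean due = last == NEVER || nowNanos - last >= minIntervalNanos;
            return due && compareAndSet(last, nowNanos);
        }
    }

    public static void main(String[] args) {
        Gate gate = new Gate(100L);
        System.out.println(gate.shouldLog(-50L)); // first call passes even with a negative reading
        System.out.println(gate.shouldLog(0L));   // only 50ns elapsed, suppressed
        System.out.println(gate.shouldLog(50L));  // 100ns elapsed, passes
    }
}
```

With this shape the first call always passes regardless of the sign of {{System.nanoTime()}}, and later calls are still compared by elapsed time.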

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and 
> > CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and 
> CAP_NOWARN policy
> E   assert []
> {code}






[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool

2020-10-15 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214936#comment-17214936
 ] 

David Capwell commented on CASSANDRA-16057:
---

+1

> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using 
> in-jvm dtest should expose that to tests.






[jira] [Updated] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool

2020-10-15 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16057:
--
Reviewers: Alex Petrov, David Capwell, David Capwell  (was: Alex Petrov, 
David Capwell)
   Alex Petrov, David Capwell, David Capwell  (was: Alex Petrov, 
David Capwell)
   Status: Review In Progress  (was: Patch Available)

> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using 
> in-jvm dtest should expose that to tests.






[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214935#comment-17214935
 ] 

Brandon Williams commented on CASSANDRA-16213:
--

This affects all versions since the inception of replacement.  If you shut down 
the entire ring in non-rolling fashion then it is no surprise that any gossip 
state not persisted (and specific to an existing live node, which will 
repopulate it) will be lost.  You can no longer replace as a consequence. A 
node injecting states that don't belong to itself is generally forbidden as it 
is dangerous, with the exception that proves the rule being assassinate (which 
also sleeps to be careful).  No node should need to know about any dead states 
upon a full ring restart, with the exception of replacement.



> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16210:
---
Reviewers: Michael Semb Wever, Michael Semb Wever  (was: Michael Semb Wever)
   Michael Semb Wever, Michael Semb Wever
   Status: Review In Progress  (was: Patch Available)

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> random failure was reported which pointed to a race condition to be spotted. 






[jira] [Updated] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16213:
--
Test and Documentation Plan: tests added
 Status: Patch Available  (was: Open)

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Updated] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16213:
--
 Bug Category: Parent values: Availability(12983)Level 1 values: 
Unavailable(12994)
   Complexity: Challenging
Discovered By: User Report
Fix Version/s: 4.0-beta
 Severity: Critical
   Status: Open  (was: Triage Needed)

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Created] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-10-15 Thread David Capwell (Jira)
David Capwell created CASSANDRA-16213:
-

 Summary: Cannot replace_address /X because it doesn't exist in 
gossip
 Key: CASSANDRA-16213
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip, Cluster/Membership
Reporter: David Capwell
Assignee: David Capwell


We see this exception when nodes crash and a host replacement is attempted; 
the error appears to be correlated with multiple node failures.

A simplified case to trigger this is the following:

*) Have an N-node cluster
*) Shut down all N nodes
*) Bring up N-1 nodes (at least 1 seed, else replace seed)
*) Host replace the N-1th node -> this will fail with the above

The reason this happens is that the N-1th node isn’t gossiping anymore, and the 
existing nodes do not have its details in gossip (but have the details in the 
peers table), so the host replacement fails as the node isn’t known in gossip.

This affects all versions (tested 3.0 and trunk, assume 2.2 as well)
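
The failure mode described above can be illustrated with a toy model, assuming only that the peers table is persisted while gossip state lives in memory and is repopulated by live nodes. {{ReplaceCheckSketch}}, {{Node}} and {{canReplace}} are hypothetical names, and the addresses are made up; this is not Cassandra's {{Gossiper}} logic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReplaceCheckSketch {
    /** Toy node: the peers table survives a restart, in-memory gossip state does not. */
    static final class Node {
        final String address;
        final Set<String> peersTable = new HashSet<>();
        final Map<String, String> gossipState = new HashMap<>();

        Node(String address) {
            this.address = address;
        }

        /** A restart drops all gossip state; the node only re-announces itself. */
        void restart() {
            gossipState.clear();
            gossipState.put(address, "NORMAL");
        }

        /** Exchange everything each side currently knows. */
        void gossipWith(Node other) {
            gossipState.putAll(other.gossipState);
            other.gossipState.putAll(gossipState);
        }

        /** Replacement is only allowed for an address that is known in gossip. */
        boolean canReplace(String target) {
            return gossipState.containsKey(target);
        }
    }

    public static void main(String[] args) {
        String dead = "10.0.0.3"; // hypothetical address of the node that never comes back
        Node a = new Node("10.0.0.1");
        Node b = new Node("10.0.0.2");
        a.peersTable.add(dead);
        a.gossipState.put(dead, "NORMAL");
        System.out.println("before shutdown: " + a.canReplace(dead));

        // Full shutdown, then only a and b come back and gossip with each other.
        a.restart();
        b.restart();
        a.gossipWith(b);
        System.out.println("in peers table:  " + a.peersTable.contains(dead));
        System.out.println("after restart:   " + a.canReplace(dead));
    }
}
```

In this model the dead node is still listed in the persisted peers table after the restart, yet no live node can reintroduce it into gossip, so the replacement check fails.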






[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-15 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214926#comment-17214926
 ] 

Adam Holmberg commented on CASSANDRA-15996:
---

bq. instead of relying on the patient cql connection, lets add flags to wait 
for the binary protocol and other startup stuff to complete
The test already waits for binary protocol log-wise, and the connection is 
established after that. I'm not sure what else we would add. 

bq. NoSpamLogger has some shuffling of instances around that maybe have a 
concurrency hole, maybe I am just imagining things. 
I've stared at this quite a bit and I am reasonably confident there is not an 
issue with those mappings. Part of the reasoning is that, as we have mentioned, 
there is only a single request in flight. The other part is that no matter what 
kind of race we could come up with, the worst-case scenario is that we create 
new wrappers -- there are no runtime errors and it's still using the same logger 
internally (if it was even the same key). Incidentally, I have also never seen 
another {{NoSpamLogger}} message across thousands of runs of this test.

With that in mind I stared a bit more at the [other 
thing|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/utils/NoSpamLogger.java#L78-L82]
 that could cause this not to be logged. {{minIntervalNanos}} is coming from a 
[static 
field|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/db/ExpirationDateOverflowHandling.java#L39]
 and guaranteed to be set to a known value. {{expected}} is the default 
zero-initialized value of an AtomicLong. {{nowNanos}}, on the other hand, is 
coming from 
[{{System.nanoTime}}|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/utils/NoSpamLogger.java#L59-L62],
 which (TIL) can be 
[negative|https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime--]:

bq. This method can only be used to measure elapsed time and is not related to 
any other notion of system or wall-clock time. The value returned represents 
nanoseconds since some fixed but arbitrary origin time (perhaps in the future, 
so values may be negative). 

I haven't found a way to prove it, but presently this is my only plausible 
theory. I think we should switch NoSpamLogger to use {{currentTimeMillis}}. We 
know it's non-monotonic and may be less precise, but I think it fits the bill 
for the spirit of this class, where callers are specifying intervals on the 
order of whole seconds and minutes.

Please let me know if anyone has thoughts on that.
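
To make the theory concrete, here is a small sketch of a gate with the same shape as the check discussed above (elapsed time since a zero-initialized last timestamp compared against a minimum interval). {{NanoTimeGateSketch}} and {{Gate}} are hypothetical stand-ins rather than the real {{NoSpamLogger}} classes, and the five-minute interval is an assumed value chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

public class NanoTimeGateSketch {
    /** Hypothetical stand-in for a rate-limited log statement; not Cassandra's class. */
    static final class Gate extends AtomicLong {
        private final long minIntervalNanos;

        Gate(long initial, long minIntervalNanos) {
            super(initial);
            this.minIntervalNanos = minIntervalNanos;
        }

        /** Logs only if at least minIntervalNanos have elapsed since the stored timestamp. */
        boolean shouldLog(long nowNanos) {
            long expected = get();
            return nowNanos - expected >= minIntervalNanos && compareAndSet(expected, nowNanos);
        }
    }

    public static void main(String[] args) {
        long fiveMinutes = 5L * 60 * 1_000_000_000L; // assumed interval

        // nanoTime origin far enough in the past: the first call passes.
        System.out.println(new Gate(0L, fiveMinutes).shouldLog(fiveMinutes + 1));

        // nanoTime origin in the future: the reading is negative and the first call is suppressed.
        System.out.println(new Gate(0L, fiveMinutes).shouldLog(-1_000_000_000L));
    }
}
```

The second case matches the symptom in the test: the very first message is swallowed even though nothing was logged before it.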

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and 
> > CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and 
> CAP_NOWARN policy
> E   assert []
> {code}






[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool

2020-10-15 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214884#comment-17214884
 ] 

Yifan Cai commented on CASSANDRA-16057:
---

CI result from the latest in each branch.

3.11: 
[https://app.circleci.com/pipelines/github/yifan-c/cassandra/131/workflows/0fb514dd-3bed-4c07-a87f-981996b6fcfe]
 (unrelated failures)

3.0: 
[https://app.circleci.com/pipelines/github/yifan-c/cassandra/132/workflows/83facbf4-3b82-468c-aa7d-78f90b01cc09]
 (unrelated failures)

2.2: 
[https://app.circleci.com/pipelines/github/yifan-c/cassandra/121/workflows/d5d71199-342b-45f8-a1d1-3d57af414142]
 (unrelated failures)

In both the 3.11 and 3.0 dtest runs, the test "test_closing_connections - 
thrift_hsha_test.TestThriftHSHA" failed.

cc: [~dcapwell]

> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using 
> in-jvm dtest should expose that to tests.






[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-15 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-16157:
--
Reviewers: David Capwell, Yifan Cai  (was: David Capwell)

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-beta3
>
>
> When trying to upgrade from 3.0 to 4.0, we often run into a problem if an 
> older node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff

2020-10-15 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214877#comment-17214877
 ] 

Yifan Cai commented on CASSANDRA-16211:
---

cc: [~marcuse]

> Improve job metadata queries exception handling in cassandra-diff
> -
>
> Key: CASSANDRA-16211
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16211
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/diff
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> The job metadata tracks the progress of the diff job. Sometimes a job 
> fails because a progress update query fails. 
> The progress update queries fall into 2 groups: critical and trivial. 
> When a query fails to update a trivial status (e.g. ProgressTracker), we 
> would mostly prefer to continue the job and just log the failure. 
> When a query fails to update a critical status (e.g. JobLifeCycle), we can 
> apply a client-side retry strategy (e.g. exponential backoff) in addition 
> to the retry policy.






[jira] [Updated] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff

2020-10-15 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-16211:
--
Test and Documentation Plan: unit test
 Status: Patch Available  (was: Open)

PR: [https://github.com/apache/cassandra-diff/pull/13]

The patch does what is described in the issue: 
 * Ignore query exceptions from queries in ProgressTracker
 * Retry (when a retry strategy is specified) queries in JobLifeCycle
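
A minimal sketch of the two behaviours listed above, in Python rather than the Java of cassandra-diff. The names `run_trivial` and `run_critical` and the parameters `max_attempts`, `base_delay` and `sleep` are hypothetical and are not taken from the patch; the sketch only illustrates "log and continue" versus "retry with exponential backoff".

```python
import logging
import time

logger = logging.getLogger("job_metadata_sketch")


def run_trivial(query, *args):
    """Run a trivial status update; log any failure and carry on."""
    try:
        return query(*args)
    except Exception as exc:
        logger.warning("Ignoring failed progress update: %s", exc)
        return None


def run_critical(query, *args, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run a critical status update, retrying with exponential backoff.

    The delay doubles after every failed attempt; the last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return query(*args)
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter is only a convenience so that the backoff delays can be observed in a test without actually waiting.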

> Improve job metadata queries exception handling in cassandra-diff
> -
>
> Key: CASSANDRA-16211
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16211
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/diff
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> The job metadata tracks the progress of the diff job. Sometimes a job 
> fails because a progress update query fails. 
> The progress update queries fall into 2 groups: critical and trivial. 
> When a query fails to update a trivial status (e.g. ProgressTracker), we 
> would mostly prefer to continue the job and just log the failure. 
> When a query fails to update a critical status (e.g. JobLifeCycle), we can 
> apply a client-side retry strategy (e.g. exponential backoff) in addition 
> to the retry policy.






[jira] [Updated] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff

2020-10-15 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-16211:
--
Change Category: Operability
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Improve job metadata queries exception handling in cassandra-diff
> -
>
> Key: CASSANDRA-16211
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16211
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/diff
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> The job metadata tracks the progress of the diff job. Sometimes a job 
> fails because a progress update query fails. 
> The progress update queries fall into 2 groups: critical and trivial. 
> When a query fails to update a trivial status (e.g. ProgressTracker), we 
> would mostly prefer to continue the job and just log the failure. 
> When a query fails to update a critical status (e.g. JobLifeCycle), we can 
> apply a client-side retry strategy (e.g. exponential backoff) in addition 
> to the retry policy.






[cassandra-dtest] branch master updated: Revert "Add test_truncate_failure"

2020-10-15 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new 016a0eb  Revert "Add test_truncate_failure"
016a0eb is described below

commit 016a0eb38db25ab36e1adabbc0bfe9575212b2ec
Author: Brandon Williams 
AuthorDate: Thu Oct 15 11:46:48 2020 -0500

Revert "Add test_truncate_failure"

This reverts commit 8cb6bd23e62c4d3b4e208d3909361d6812182bc6.
---
 byteman/truncate_fail.btm |  8 
 cql_test.py   | 33 -
 2 files changed, 41 deletions(-)

diff --git a/byteman/truncate_fail.btm b/byteman/truncate_fail.btm
deleted file mode 100644
index fa9caba..000
--- a/byteman/truncate_fail.btm
+++ /dev/null
@@ -1,8 +0,0 @@
-RULE Throw during truncate operation
-CLASS org.apache.cassandra.db.ColumnFamilyStore
-METHOD truncateBlocking()
-AT ENTRY
-IF TRUE
-DO
-   throw new RuntimeException("Dummy failure");
-ENDRULE
\ No newline at end of file
diff --git a/cql_test.py b/cql_test.py
index dde7b7d..eced21d 100644
--- a/cql_test.py
+++ b/cql_test.py
@@ -1,5 +1,4 @@
 import itertools
-import re
 import struct
 import time
 import pytest
@@ -765,38 +764,6 @@ class TestMiscellaneousCQL(CQLTester):
 [2, None, 2, None],
 [3, None, 3, None]])
 
-    @since("4.0")
-    def test_truncate_failure(self):
-        """
-        @jira_ticket CASSANDRA-16208
-        Tests that if a TRUNCATE query fails on some replica, the coordinator will immediately return an error to the
-        client instead of waiting to time out because it couldn't get the necessary number of success acks.
-        """
-        cluster = self.cluster
-        cluster.populate(3, install_byteman=True).start()
-        node1, _, node3 = cluster.nodelist()
-        node3.byteman_submit(['./byteman/truncate_fail.btm'])
-
-        session = self.patient_exclusive_cql_connection(node1)
-        create_ks(session, 'ks', 3)
-
-        logger.debug("Creating data table")
-        session.execute("CREATE TABLE data (id int PRIMARY KEY, data text)")
-        session.execute("UPDATE data SET data = 'Awesome' WHERE id = 1")
-
-        self.fixture_dtest_setup.ignore_log_patterns = ['Dummy failure']
-        logger.debug("Truncating data table (error expected)")
-
-        thrown = False
-        exception = None
-        try:
-            session.execute("TRUNCATE data")
-        except Exception as e:
-            exception = e
-            thrown = True
-
-        assert thrown, "No exception has been thrown"
-        assert re.search("Truncate failed on replica /127.0.0.3", str(exception)) is not None
 
 @since('3.2')
 class AbortedQueryTester(CQLTester):





[jira] [Comment Edited] (CASSANDRA-15977) 4.0 quality testing: Read Repair

2020-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214813#comment-17214813
 ] 

Andres de la Peña edited comment on CASSANDRA-15977 at 10/15/20, 4:21 PM:
--

Here are the CI results:

3.11
 
[https://app.circleci.com/pipelines/github/adelapena/cassandra/116/workflows/4007a648-0a65-45a9-bcf6-4ef83017fbba]
 [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/78/] 
 
[https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/284/]
 
 
[https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/76/]
 
 [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/114/]

trunk
 
[https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/7efe9b9a-e2c6-40d4-a183-86ddd9e599f3]
 
[https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/18fe4de3-faaf-4bf9-a74c-f4fe04bf844f]
 [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/79/] 
 
[https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/]
 
 
[https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/77/]
 
 [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/115/]

I'm seeing some timeout errors in {{ReadRepairQueryTypesTest}} (like [this 
one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testPointQueryOnWideTable_13__strategy_BLOCKING_coordinator_2_flush_false_paging_true_/]
 or [this other 
one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/282/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testRangeQueryWithFilterOnSelectedColumnOnSkinnyTable_14__strategy_BLOCKING_coordinator_2_flush_true_paging_false_/])
 that I can't reproduce locally. I'm not sure whether they are caused by the 
CI environment or there's a real problem. Increasing the request timeout 
doesn't help, so we could try reusing the cluster less aggressively. Right 
now 544 tests use the same cluster; working with a cluster per test, like most 
dtests do, might improve this, although that would make the test significantly 
slower. CC [~maedhroz]


was (Author: adelapena):
Here are the CI results:

3.11
https://app.circleci.com/pipelines/github/adelapena/cassandra/116/workflows/4007a648-0a65-45a9-bcf6-4ef83017fbba
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/78/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/284/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/76/
 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/114/

trunk
https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/7efe9b9a-e2c6-40d4-a183-86ddd9e599f3
https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/18fe4de3-faaf-4bf9-a74c-f4fe04bf844f
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/79/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/77/
 
[https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/115/]

I'm seeing some timeout errors in {{ReadRepairQueryTypesTest}} (like [this 
one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testPointQueryOnWideTable_13__strategy_BLOCKING_coordinator_2_flush_false_paging_true_/]
 or [this other 
one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/282/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testRangeQueryWithFilterOnSelectedColumnOnSkinnyTable_14__strategy_BLOCKING_coordinator_2_flush_true_paging_false_/])
 that I can't reproduce locally. Not sure whether they might be caused by the 
CI environment or there's a real problem. Increasing the request timeout 
doesn't help, so we could try to not so aggressively reuse the cluster. Right 
now 224 tests use the same cluster, working with a cluster per test like most 
dtests do might improve this, although that would make the test significantly 
slower. CC [~maedhroz]

> 4.0 quality testing: Read Repair
> 
>
> Key: CASSANDRA-15977
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15977
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 13h 20m
>  Remaining 

[jira] [Updated] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16177:

Test and Documentation Plan: 
[CI|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/411/workflows/ba075847-0c8b-4838-ad47-0e0c4324dc0a/jobs/2377]

[Patch|https://github.com/ekaterinadimitrova2/cassandra/pull/61]
 Status: Patch Available  (was: In Progress)

> jvm_upgrade_dtests job issue in CircleCI MIDRES
> ---
>
> Key: CASSANDRA-16177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16177
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with 
> MIDRES:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps






[jira] [Commented] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES

2020-10-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214817#comment-17214817
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16177:
-

[~dcapwell] can you review and commit this patch, please? I believe it is the 
solution we discussed, and the CI run confirms it. Thanks!

[CI|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/411/workflows/ba075847-0c8b-4838-ad47-0e0c4324dc0a/jobs/2377]

[Patch|https://github.com/ekaterinadimitrova2/cassandra/pull/61]

 

> jvm_upgrade_dtests job issue in CircleCI MIDRES
> ---
>
> Key: CASSANDRA-16177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16177
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with 
> MIDRES:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps






[jira] [Commented] (CASSANDRA-15977) 4.0 quality testing: Read Repair

2020-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214813#comment-17214813
 ] 

Andres de la Peña commented on CASSANDRA-15977:
---

Here are the CI results:

3.11
https://app.circleci.com/pipelines/github/adelapena/cassandra/116/workflows/4007a648-0a65-45a9-bcf6-4ef83017fbba
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/78/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/284/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/76/
 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/114/

trunk
https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/7efe9b9a-e2c6-40d4-a183-86ddd9e599f3
https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/18fe4de3-faaf-4bf9-a74c-f4fe04bf844f
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/79/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/ 
https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/77/
 
[https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/115/]

I'm seeing some timeout errors in {{ReadRepairQueryTypesTest}} (like [this 
one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testPointQueryOnWideTable_13__strategy_BLOCKING_coordinator_2_flush_false_paging_true_/]
 or [this other 
one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/282/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testRangeQueryWithFilterOnSelectedColumnOnSkinnyTable_14__strategy_BLOCKING_coordinator_2_flush_true_paging_false_/])
 that I can't reproduce locally. I'm not sure whether they are caused by the 
CI environment or there's a real problem. Increasing the request timeout 
doesn't help, so we could try reusing the cluster less aggressively. Right 
now 224 tests use the same cluster; working with a cluster per test, like most 
dtests do, might improve this, although that would make the test significantly 
slower. CC [~maedhroz]

> 4.0 quality testing: Read Repair
> 
>
> Key: CASSANDRA-15977
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15977
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> This is a subtask of CASSANDRA-15579 focusing on read repair.
> [This 
> document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing]
>  lists and describes the existing functional tests for read repair, so we can 
> have a broad view of what is currently covered. We can comment on this 
> document and add ideas for new cases/tests, so it can gradually evolve to a 
> more or less detailed test plan.






[jira] [Comment Edited] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-15 Thread Zhao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214770#comment-17214770
 ] 

Zhao Yang edited comment on CASSANDRA-15229 at 10/15/20, 3:20 PM:
--

thanks for the review and feedback, merged to 
[trunk|https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4]


was (Author: jasonstack):
thanks for the review and feedback, merged to 
[trunk](https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4)

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed 
> Chunks
> 
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
> -
> Since CASSANDRA-5863, the chunk cache is implemented on top of the buffer 
> pool. When a local pool is full, one of its chunks is evicted and only put 
> back into the global pool once all buffers in the evicted chunk are released. 
> But because of the chunk cache, buffers can be held for a long period of 
> time, preventing the evicted chunk from being recycled even though most of 
> its space is free.
> Two things need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool, 
> even if it's not fully free. This is doable in 4.0.
> 2. Reduce fragmentation caused by different buffer sizes. With #1, partially 
> freed chunks become available for allocation, but the "holes" in a partially 
> freed chunk have different sizes. We should consider allocating fixed-size 
> buffers, which is unlikely to fit in 4.0.
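
A toy model of point 1 above, written in Python purely for illustration. `Chunk`, `GlobalPool`, `recirculate_partial` and `on_evicted` are hypothetical names and do not correspond to the real `BufferPool` classes; the model tracks only byte counts, so it deliberately ignores the fragmentation problem described in point 2.

```python
class Chunk:
    """A fixed-capacity chunk that only tracks how many bytes are in use."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def free_space(self):
        return self.capacity - self.used

    def allocate(self, size):
        """Reserve `size` bytes; return False if the chunk cannot fit them."""
        if size > self.free_space():
            return False
        self.used += size
        return True

    def release(self, size):
        self.used -= size


class GlobalPool:
    """Accepts chunks evicted from a local pool.

    With recirculate_partial=False a chunk is only taken back once it is
    completely free. With recirculate_partial=True any chunk that still has
    free space is made available for allocation again.
    """

    def __init__(self, recirculate_partial):
        self.recirculate_partial = recirculate_partial
        self.available = []

    def on_evicted(self, chunk):
        fully_free = chunk.used == 0
        partially_free = chunk.free_space() > 0
        if fully_free or (self.recirculate_partial and partially_free):
            self.available.append(chunk)
            return True
        return False
```

For example, a 128-byte chunk with one 16-byte buffer still held by the chunk cache is rejected by `GlobalPool(recirculate_partial=False)` but accepted by `GlobalPool(recirculate_partial=True)`, leaving its remaining 112 bytes usable.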






[jira] [Updated] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-15 Thread Zhao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yang updated CASSANDRA-15229:
--
Source Control Link: 
https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4
  (was: https://github.com/apache/cassandra/pull/535)

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed 
> Chunks
> 
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
> -
> Since CASSANDRA-5863, the chunk cache is implemented on top of the buffer 
> pool. When a local pool is full, one of its chunks is evicted and only put 
> back into the global pool once all buffers in the evicted chunk are released. 
> But because of the chunk cache, buffers can be held for a long period of 
> time, preventing the evicted chunk from being recycled even though most of 
> its space is free.
> Two things need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool, 
> even if it's not fully free. This is doable in 4.0.
> 2. Reduce fragmentation caused by different buffer sizes. With #1, partially 
> freed chunks become available for allocation, but the "holes" in a partially 
> freed chunk have different sizes. We should consider allocating fixed-size 
> buffers, which is unlikely to fit in 4.0.






[jira] [Updated] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-15 Thread Zhao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yang updated CASSANDRA-15229:
--
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

thanks for the review and feedback, merged to 
[trunk](https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4)

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed 
> Chunks
> 
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
> -
> Since CASSANDRA-5863, the chunk cache is implemented on top of the buffer 
> pool. When a local pool is full, one of its chunks is evicted and only put 
> back into the global pool once all buffers in the evicted chunk are released. 
> But because of the chunk cache, buffers can be held for a long period of 
> time, preventing the evicted chunk from being recycled even though most of 
> its space is free.
> Two things need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool, 
> even if it's not fully free. This is doable in 4.0.
> 2. Reduce fragmentation caused by different buffer sizes. With #1, partially 
> freed chunks become available for allocation, but the "holes" in a partially 
> freed chunk have different sizes. We should consider allocating fixed-size 
> buffers, which is unlikely to fit in 4.0.






[cassandra] branch trunk updated: CASSANDRA-15229: Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-15 Thread jasonstack
This is an automated email from the ASF dual-hosted git repository.

jasonstack pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 699a1f7  CASSANDRA-15229: Segregate Network and Chunk Cache 
BufferPools and Recirculate Partially Freed Chunks
699a1f7 is described below

commit 699a1f74fcc1da1952da6b2b0309c9e2474c67f4
Author: Zhao Yang 
AuthorDate: Thu Oct 15 22:53:44 2020 +0800

CASSANDRA-15229: Segregate Network and Chunk Cache BufferPools and 
Recirculate Partially Freed Chunks

* initiate multiple buffer pool for different lifespan and usages
  - Chunk Cache Buffer Pool - conf.file_cache_size_in_mb=512mb
  - Networking Buffer Pool - conf.networking_cache_size_in_mb=128mb

* Add overflowSize and usedSize to buffer pool metrics

* re-circulate buffer pool Chunk for ChunkCache whenever it has free space, 
even though it may not be able to allocate due to fragmentation

patch by Zhao Yang; reviewed by Caleb Rackliffe and Aleksey Yeschenko for 
CASSANDRA-15229
---
 CHANGES.txt|   1 +
 conf/cassandra.yaml|  13 +-
 .../org/apache/cassandra/cache/ChunkCache.java |  14 +-
 src/java/org/apache/cassandra/config/Config.java   |   2 +
 .../cassandra/config/DatabaseDescriptor.java   |  14 +
 .../db/streaming/CassandraStreamWriter.java|   6 +-
 .../cassandra/hints/ChecksummedDataInput.java  |   6 +-
 .../hints/CompressedChecksummedDataInput.java  |  13 +-
 .../io/util/BufferManagingRebufferer.java  |   6 +-
 .../cassandra/metrics/BufferPoolMetrics.java   |  45 +-
 .../cassandra/net/AsyncStreamingOutputPlus.java|  13 +-
 .../apache/cassandra/net/BufferPoolAllocator.java  |  13 +-
 .../cassandra/net/FrameDecoderLegacyLZ4.java   |  11 +-
 .../org/apache/cassandra/net/FrameEncoder.java |   9 +-
 .../org/apache/cassandra/net/FrameEncoderCrc.java  |   2 +-
 .../org/apache/cassandra/net/FrameEncoderLZ4.java  |   9 +-
 .../cassandra/net/FrameEncoderLegacyLZ4.java   |   8 +-
 .../cassandra/net/FrameEncoderUnprotected.java |   2 +-
 .../apache/cassandra/net/HandshakeProtocol.java|   6 +-
 .../cassandra/net/InboundConnectionInitiator.java  |   6 +-
 .../cassandra/net/LocalBufferPoolAllocator.java|   3 +-
 .../cassandra/net/OutboundConnectionInitiator.java |   4 +-
 .../org/apache/cassandra/net/ShareableBytes.java   |   6 +-
 .../apache/cassandra/utils/memory/BufferPool.java  | 466 -
 .../apache/cassandra/utils/memory/BufferPools.java |  79 
 .../apache/cassandra/net/ConnectionBurnTest.java   |   4 +-
 .../cassandra/utils/memory/LongBufferPoolTest.java | 111 ++---
 test/data/jmxdump/cassandra-4.0-jmx.yaml   |  75 +++-
 .../cassandra/distributed/impl/Instance.java   |   4 +-
 .../cassandra/metrics/BufferPoolMetricsTest.java   | 125 --
 .../unit/org/apache/cassandra/net/FramingTest.java |   6 +-
 .../cassandra/utils/memory/BufferPoolTest.java | 361 +++-
 32 files changed, 1067 insertions(+), 376 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index fe3fef8..543a1cf 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-beta3
+ * Segregate Network and Chunk Cache BufferPools and Recirculate Partially 
Freed Chunks (CASSANDRA-15229)
  * Fail truncation requests when they fail on a replica (CASSANDRA-16208)
  * Move compact storage validation earlier in startup process (CASSANDRA-16063)
  * Fix ByteBufferAccessor cast exceptions are thrown when trying to query a 
virtual table (CASSANDRA-16155)
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index ff414ed..37b18f9 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -469,13 +469,22 @@ concurrent_counter_writes: 32
 # be limited by the less of concurrent reads or concurrent writes.
 concurrent_materialized_view_writes: 32
 
+# Maximum memory to use for inter-node and client-server networking buffers.
+#
+# Defaults to the smaller of 1/16 of heap or 128MB. This pool is allocated 
off-heap,
+# so is in addition to the memory allocated for heap. The cache also has 
on-heap
+# overhead which is roughly 128 bytes per chunk (i.e. 0.2% of the reserved size
+# if the default 64k chunk size is used).
+# Memory is only allocated when needed.
+# networking_cache_size_in_mb: 128
+
 # Enable the sstable chunk cache.  The chunk cache will store recently accessed
 # sections of the sstable in-memory as uncompressed buffers.
 # file_cache_enabled: false
 
 # Maximum memory to use for sstable chunk cache and buffer pooling.
-# 32MB of this are reserved for pooling buffers, the rest is used as an
-# cache that holds uncompressed sstable chunks.
+# 32MB of this are reserved for pooling buffers, the rest is used for chunk 
cache
+# that holds uncompressed sstable chunks.
 # Defaults to the smaller of 1/4 of heap or 
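
The sizing rule in the new cassandra.yaml comment above can be checked with a little arithmetic. The following is only a sketch of that arithmetic, not Cassandra code: the function names are hypothetical, and the constants (1/16 of heap, 128MB cap, 64k chunks, roughly 128 bytes of on-heap overhead per chunk) are taken from the comment itself.

```python
def networking_cache_size_mb(heap_mb, configured_mb=None):
    """Pool size in MB: the explicit setting if given, else min(heap/16, 128)."""
    if configured_mb is not None:
        return configured_mb
    return min(heap_mb // 16, 128)


def on_heap_overhead_bytes(pool_mb, chunk_kb=64, per_chunk_bytes=128):
    """Approximate on-heap bookkeeping cost for an off-heap pool of pool_mb."""
    chunks = (pool_mb * 1024) // chunk_kb  # number of chunks in the pool
    return chunks * per_chunk_bytes


pool_mb = networking_cache_size_mb(heap_mb=8192)   # capped at 128 MB
overhead = on_heap_overhead_bytes(pool_mb)         # 2048 chunks * 128 bytes
ratio_percent = 100 * overhead / (pool_mb * 1024 * 1024)
```

With an 8 GB heap the pool is capped at 128 MB, which is 2048 chunks and about 256 KiB of on-heap overhead, i.e. roughly the 0.2% quoted in the comment.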

[jira] [Assigned] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova reassigned CASSANDRA-16177:
---

Assignee: Ekaterina Dimitrova

> jvm_upgrade_dtests job issue in CircleCI MIDRES
> ---
>
> Key: CASSANDRA-16177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16177
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with 
> MIDRES:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16177:

Complexity: Low Hanging Fruit  (was: Normal)

> jvm_upgrade_dtests job issue in CircleCI MIDRES
> ---
>
> Key: CASSANDRA-16177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16177
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with 
> MIDRES:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps






[jira] [Commented] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES

2020-10-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214745#comment-17214745
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16177:
-

The issue is that the number of workers shouldn't be more than the number of tests.
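
As an illustration of that constraint only (this is not the CircleCI configuration, and split_tests is a hypothetical helper), a round-robin split that clamps the worker count to the number of tests might look like this:

```python
def split_tests(tests, requested_workers):
    """Distribute tests round-robin, never using more workers than tests."""
    # Clamp so that no worker ends up with an empty bucket; keep at least one
    # worker so the function still returns a bucket when there are no tests.
    workers = max(1, min(requested_workers, len(tests)))
    buckets = [[] for _ in range(workers)]
    for index, test in enumerate(tests):
        buckets[index % workers].append(test)
    return buckets


buckets = split_tests(["UpgradeTestA", "UpgradeTestB"], requested_workers=4)
```

Here four workers are requested but only two buckets are produced, so no worker starts with nothing to run.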

> jvm_upgrade_dtests job issue in CircleCI MIDRES
> ---
>
> Key: CASSANDRA-16177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16177
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with 
> MIDRES:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps






[jira] [Commented] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16210:
-

It turned out the issue is already solved for 4.0 with CASSANDRA-9425

Posting patch for [3.11 | 
https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 | 
https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 
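
For readers unfamiliar with the kind of race the ticket title refers to, here is a toy analogue in Python. It is not the 3.11 patch (Cassandra is Java, and the patch is in the linked PR); KeyspaceRegistry and its methods are hypothetical names. The point is only that storing and clearing an instance must happen under the same lock.

```python
import threading


class KeyspaceRegistry:
    """Toy instance cache whose store and clear paths share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}

    def open(self, name, factory):
        # Check-then-store happens atomically, so two callers racing on the
        # same name cannot both create and publish an instance.
        with self._lock:
            instance = self._instances.get(name)
            if instance is None:
                instance = factory(name)
                self._instances[name] = instance
            return instance

    def clear(self, name):
        # Removal takes the same lock, so it cannot interleave with open().
        with self._lock:
            return self._instances.pop(name, None)


registry = KeyspaceRegistry()
first = registry.open("system", lambda name: object())
second = registry.open("system", lambda name: object())
```

Both calls return the same object, and a concurrent clear() either sees that object or nothing, never a half-published one.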

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> random failure was reported which pointed to a race condition to be spotted. 






[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Test and Documentation Plan: 
It turned out the issue is already solved for 4.0 with CASSANDRA-9425

Posting patch for [3.11 
|https://github.com/ekaterinadimitrova2/cassandra/pull/59]

CI run:

[Java8 
|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] 
 Status: Patch Available  (was: In Progress)

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> random failure was reported which pointed to a race condition to be spotted. 






[jira] [Updated] (CASSANDRA-16200) Nodetool ring unit testing

2020-10-15 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16200:
-
Reviewers: Brandon Williams, Brandon Williams  (was: Brandon Williams)
   Brandon Williams, Brandon Williams
   Status: Review In Progress  (was: Patch Available)

> Nodetool ring unit testing
> --
>
> Key: CASSANDRA-16200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16200
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add nodetool ring testing






[jira] [Commented] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow

2020-10-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214669#comment-17214669
 ] 

Brandon Williams commented on CASSANDRA-15865:
--

Go for it. :)

bq. So the HH setting, along the rest of operations, are being sent to node2 
imo.

Sam's comment was similar, and I posted the trace above with a patch like the 
PR applied, since that was clearly wrong.  When I dug in it looked like node2 
flapped once after shutdown which was causing this.  I can usually repro in a 
few hundred runs on j11, I'll see what happens.

> Flaky dtest 
> hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
> ---
>
> Key: CASSANDRA-15865
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15865
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Sam Tunnicliffe
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I've seen this fail a couple of times under JDK11, when it doesn't appear to 
> be related to the changes under test.
>  
>  






[jira] [Commented] (CASSANDRA-16212) Cassandra version above 3.11.0 failing for ARM64

2020-10-15 Thread odidev (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214668#comment-17214668
 ] 

odidev commented on CASSANDRA-16212:


Hi Team

I am working on adding ARM64 support to ‘zipkin’. Zipkin uses the 
‘zipkin-cassandra’ docker image in its build, but the image is available only 
for the AMD64 platform. 

‘Zipkin-cassandra’ is built from a Dockerfile which downloads and uses 
Cassandra. I have checked that Cassandra has included AArch64 support from 
version 3.11.X onward, and cassandra docker images are also available for the 
ARM64 platform <[https://hub.docker.com/_/cassandra?tab=tags]>.
 To generate zipkin-cassandra docker images for ARM64, I used cassandra 
version 3.11.0, and the image built fine. But if I use a cassandra version 
above 3.11.0, say 3.11.8, the docker build fails with the error below after 
starting the server with the “bin/cassandra -f” command:

 
{code:java}
ERROR [main] 2020-10-15 09:04:39,771 NativeLibraryLinux.java:64 - Failed to link the C library against JNA. Native methods will be unavailable.
java.lang.UnsatisfiedLinkError: /tmp/jna-3506402/jna3214742498288082263.tmp: Error loading shared library ld-linux-aarch64.so.1: No such file or directory (needed by /tmp/jna-3506402/jna3214742498288082263.tmp)
    at java.lang.ClassLoader$NativeLibrary.load(Native Method) ~[na:1.8.0_252]
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1946) ~[na:1.8.0_252]
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1828) ~[na:1.8.0_252]
    at java.lang.Runtime.load0(Runtime.java:809) ~[na:1.8.0_252]
    at java.lang.System.load(System.java:1088) ~[na:1.8.0_252]
    at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:851) ~[jna-4.2.2.jar:4.2.2 (b0)]
    at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:826) ~[jna-4.2.2.jar:4.2.2 (b0)]
    at com.sun.jna.Native.(Native.java:140) ~[jna-4.2.2.jar:4.2.2 (b0)]
    at com.sun.jna.NativeLibrary.(NativeLibrary.java:84) ~[jna-4.2.2.jar:4.2.2 (b0)]
    at org.apache.cassandra.utils.NativeLibraryLinux.(NativeLibraryLinux.java:55) ~[apache-cassandra-3.11.8.jar:3.11.8]
    at org.apache.cassandra.utils.NativeLibrary.(NativeLibrary.java:95) [apache-cassandra-3.11.8.jar:3.11.8]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:203) [apache-cassandra-3.11.8.jar:3.11.8]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:628) [apache-cassandra-3.11.8.jar:3.11.8]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) [apache-cassandra-3.11.8.jar:3.11.8]

{code}

The build environment is based on the docker image *‘alpine:3.12’*, with 
*C.UTF-8* locale and *openjdk-8*. Here is the Dockerfile for building 
‘zipkin-cassandra’ docker image 
<[https://github.com/openzipkin/zipkin/blob/2.21.5/docker/storage/cassandra/Dockerfile]>
 and please find the Dockerfile here for the base image used in 
zipkin-cassandra dockerfile 
<[https://github.com/openzipkin/docker-java/blob/1.8.0_252-b09/Dockerfile]>.

The ‘zipkin’ project has a limitation: its source code supports only Cassandra 
version 3.11.3 and above. But if I use any version other than 3.11.0, I get the 
above error saying that the *ld-linux-aarch64.so.1* file is missing. Another 
constraint is that only an ‘alpine’ environment can be used, as building the 
‘zipkin-cassandra’ docker image involves an installation script written for 
Alpine.

To resolve this, I followed the JIRAs below, raised for similar issues in the 
3.11.X series:
 # https://issues.apache.org/jira/browse/CASSANDRA-13072
 # https://issues.apache.org/jira/browse/CASSANDRA-13791
 Accordingly, I tried replacing the jna-4.2.2 jar file in /lib with the 
jna-4.4.0 jar, but this has not solved the problem.
 Also, I have downloaded ‘*ld-linux-aarch64.so.1*’ from here 
<[https://ughe.github.io/data/2018/ld-linux-aarch64.so.1]> and placed it at 
/lib/, but I am still facing the same issue.

Zipkin requires a cassandra version greater than v3.11.3, but it seems cassandra 
versions greater than v3.11.0 do not support the ARM64 platform. It would be 
helpful to have ARM64 support in current versions; otherwise, please provide 
some pointers on the above issue so that I can add it.
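
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: ld-linux-aarch64.so.1 is the glibc dynamic loader for AArch64, whereas Alpine images ship musl libc, whose loader is ld-musl-aarch64.so.1. A native library built against glibc would then fail to load in exactly this way. The sketch below is a hypothetical diagnostic helper (not part of Cassandra, JNA, or Zipkin) that reports which loader is present in the image; the two paths are the conventional locations and may differ on some distributions.

```python
import os

# Conventional loader locations on AArch64 (assumed; distributions may vary).
GLIBC_LOADER = "/lib/ld-linux-aarch64.so.1"
MUSL_LOADER = "/lib/ld-musl-aarch64.so.1"


def detect_libc(exists=os.path.exists):
    """Return 'glibc', 'musl' or 'unknown' based on which loader file exists.

    The exists parameter is injectable so the check can be exercised without
    touching the real filesystem.
    """
    if exists(GLIBC_LOADER):
        return "glibc"
    if exists(MUSL_LOADER):
        return "musl"
    return "unknown"


if __name__ == "__main__":
    print("libc flavour:", detect_libc())
```

If this reports musl inside the build container, the missing-loader message points at a glibc-linked native library rather than at Cassandra's ARM64 support as such.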

 

> Cassandra version above 3.11.0 failing for ARM64 
> -
>
> Key: CASSANDRA-16212
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16212
> Project: Cassandra
>  Issue Type: Task
>Reporter: odidev
>Priority: Normal
>
> Cassandra versions above 3.11.0 are failing on ARM64 platform with below 
> issue:
> java.lang.UnsatisfiedLinkError: /tmp/jna-3506402/jna3214742498288082263.tmp: 
> *Error loading 

[jira] [Created] (CASSANDRA-16212) Cassandra version above 3.11.0 failing for ARM64

2020-10-15 Thread odidev (Jira)
odidev created CASSANDRA-16212:
--

 Summary: Cassandra version above 3.11.0 failing for ARM64 
 Key: CASSANDRA-16212
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16212
 Project: Cassandra
  Issue Type: Task
Reporter: odidev


Cassandra versions above 3.11.0 are failing on ARM64 platform with below issue:

java.lang.UnsatisfiedLinkError: /tmp/jna-3506402/jna3214742498288082263.tmp: 
*Error loading shared library ld-linux-aarch64.so.1: No such file or directory 
(needed by /tmp/jna-3506402/jna3214742498288082263.tmp)*






[jira] [Comment Edited] (CASSANDRA-15977) 4.0 quality testing: Read Repair

2020-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214624#comment-17214624
 ] 

Andres de la Peña edited comment on CASSANDRA-15977 at 10/15/20, 11:36 AM:
---

[~jmckenzie] Indeed, I've rebased the branches and I'm running CI, I'll post 
the results once it's finished.

Also, as [discussed in 
Slack|https://the-asf.slack.com/archives/CK23JSY2K/p1602182196170300?thread_ts=1602111667.128800=CK23JSY2K],
 I'm adding a new {{@ported_to_in_jvm}} Python marker for dtests, so they are 
only skipped without vnodes, and they are easy to identify when/if we add 
support for virtual nodes in-JVM.
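
A rough sketch of how such a marker could behave, using only the standard library: the real dtests use pytest markers, so everything here except the name ported_to_in_jvm is a hypothetical stand-in, including the should_skip helper and the example test names.

```python
def ported_to_in_jvm(test_fn):
    """Tag a dtest as already covered by an equivalent in-JVM test."""
    test_fn.ported_to_in_jvm = True
    return test_fn


def should_skip(test_fn, use_vnodes):
    """Skip ported tests only when running without vnodes.

    In-JVM dtests do not cover virtual nodes, so with vnodes enabled the
    Python version of the test still has to run.
    """
    return getattr(test_fn, "ported_to_in_jvm", False) and not use_vnodes


@ported_to_in_jvm
def test_read_repair_example():
    pass


def test_not_ported_example():
    pass
```

Searching for the marker then lists every test that could be dropped if in-JVM dtests gain vnode support.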


was (Author: adelapena):
[~jmckenzie] Indeed, I've rebased the branches and running CI, I'll post the 
results once it's finished.

Also, as [discussed in 
Slack|https://the-asf.slack.com/archives/CK23JSY2K/p1602182196170300?thread_ts=1602111667.128800=CK23JSY2K],
 I'm adding a new {{@ported_to_in_jvm}} Python marker for dtests, so they are 
only skipped without vnodes, and they are easy to identify when/if we add 
support for virtual nodes in-JVM.

> 4.0 quality testing: Read Repair
> 
>
> Key: CASSANDRA-15977
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15977
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> This is a subtask of CASSANDRA-15579 focusing on read repair.
> [This 
> document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing]
>  lists and describes the existing functional tests for read repair, so we can 
> have a broad view of what is currently covered. We can comment on this 
> document and add ideas for new cases/tests, so it can gradually evolve to a 
> more or less detailed test plan.






[jira] [Commented] (CASSANDRA-15977) 4.0 quality testing: Read Repair

2020-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214624#comment-17214624
 ] 

Andres de la Peña commented on CASSANDRA-15977:
---

[~jmckenzie] Indeed, I've rebased the branches and running CI, I'll post the 
results once it's finished.

Also, as [discussed in 
Slack|https://the-asf.slack.com/archives/CK23JSY2K/p1602182196170300?thread_ts=1602111667.128800=CK23JSY2K],
 I'm adding a new {{@ported_to_in_jvm}} Python marker for dtests, so they are 
only skipped without vnodes, and they are easy to identify when/if we add 
support for virtual nodes in-JVM.

> 4.0 quality testing: Read Repair
> 
>
> Key: CASSANDRA-15977
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15977
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> This is a subtask of CASSANDRA-15579 focusing on read repair.
> [This 
> document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing]
>  lists and describes the existing functional tests for read repair, so we can 
> have a broad view of what is currently covered. We can comment on this 
> document and add ideas for new cases/tests, so it can gradually evolve to a 
> more or less detailed test plan.






[jira] [Commented] (CASSANDRA-15977) 4.0 quality testing: Read Repair

2020-10-15 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214622#comment-17214622
 ] 

Josh McKenzie commented on CASSANDRA-15977:
---

ping [~adelapena] - think both those reqs are good now

> 4.0 quality testing: Read Repair
> 
>
> Key: CASSANDRA-15977
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15977
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> This is a subtask of CASSANDRA-15579 focusing on read repair.
> [This 
> document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing]
>  lists and describes the existing functional tests for read repair, so we can 
> have a broad view of what is currently covered. We can comment on this 
> document and add ideas for new cases/tests, so it can gradually evolve to a 
> more or less detailed test plan.






[jira] [Commented] (CASSANDRA-15585) 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation

2020-10-15 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214620#comment-17214620
 ] 

Josh McKenzie commented on CASSANDRA-15585:
---

{quote}Harry running regularly seems like a good "done" condition 4.0.0
{quote}
Is this you volunteering to do this work? :)

> 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation
> -
>
> Key: CASSANDRA-15585
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15585
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Jordan West*
> This area refers to contributions to test frameworks/tooling (e.g., dtests, 
> QuickTheories, CASSANDRA-14821), and automation enabling those tools to be 
> applied at scale (e.g., replay testing via Spark-based replay of captured FQL 
> logs).






[jira] [Commented] (CASSANDRA-15580) 4.0 quality testing: Repair

2020-10-15 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214619#comment-17214619
 ] 

Josh McKenzie commented on CASSANDRA-15580:
---

Great points about the infra mismatch (k8s vs. everything else) that Fallout 
brings to the table. Agreed we shouldn't delay 4.0 on bringing that infra up to 
speed. There are a few other new testing frameworks that fall into the "do 
one-off or point testing instead of coupling 4.0 GA to wiring these up in 
CI/CD" bucket as well.

Would have to dig into the code to answer the question about the specificity on 
targeting latest on mixed version cluster repair. trunk and previous major is 
probably quite fine.

IMO, what you've enumerated above is a respectable set of new coverage for us 
to wire up that should hit the big targets we have and reduce our uncertainty 
in the process. 

> 4.0 quality testing: Repair
> ---
>
> Key: CASSANDRA-15580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15580
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Assignee: Alexander Dejanovski
>Priority: Normal
> Fix For: 4.0-rc
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Alexander Dejanovski*
> We aim for 4.0 to have the first fully functioning incremental repair 
> solution (CASSANDRA-9143)! Furthermore we aim to verify that all types of 
> repair: (full range, sub range, incremental) function as expected as well as 
> ensuring community tools such as Reaper work. CASSANDRA-3200 adds an 
> experimental option to reduce the amount of data streamed during repair, we 
> should write more tests and see how it works with big nodes.






[jira] [Comment Edited] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-10-15 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208057#comment-17208057
 ] 

Josh McKenzie edited comment on CASSANDRA-15579 at 10/15/20, 11:05 AM:
---

[~bdeggleston] - confirming - you still have cycles to shepherd this?

Update: removing due to inactivity.


was (Author: jmckenzie):
[~bdeggleston] - confirming - you still have cycles to shepherd this?

> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: {color:#de350b}None{color}*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.






[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair

2020-10-15 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15579:
--
Description: 
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: {color:#de350b}None{color}*

Testing in this area focuses on non-node-local aspects of the read-write path: 
coordination, replication, read repair, etc.

  was:
Reference [doc from 
NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
 for context.

*Shepherd: Blake Eggleston*

Testing in this area focuses on non-node-local aspects of the read-write path: 
coordination, replication, read repair, etc.


> 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, 
> and Read Repair
> 
>
> Key: CASSANDRA-15579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15579
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Josh McKenzie
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: {color:#de350b}None{color}*
> Testing in this area focuses on non-node-local aspects of the read-write 
> path: coordination, replication, read repair, etc.






[jira] [Commented] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas

2020-10-15 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214614#comment-17214614
 ] 

Josh McKenzie commented on CASSANDRA-15538:
---

So to summarize a checkpoint of where we are:
 # The scope on this ticket as described is quite large and untargeted
 # The historical pain in the StorageEngine seems to center around LegacyLayout 
stuff w/CASSANDRA-8099
 # We believe randomized schema testing and query combinations are one of our 
best ways to increase confidence in this space
 # Harry is well suited to this work
 # Harry isn't quite ready for this in terms of us coupling the 4.0 GA with this
 # Orthogonally, reverse queries and range tombstones have historically been a 
little sketchy

1-5 imply to me a subsequent iterative approach to improving our coverage there 
as we don't necessarily have reason to believe there's major regression between 
3.0 and 4.0, or 3.11 and 4.0, in this area of the codebase. There's also been a 
significant raft of both real workload and real schema testing done against 
mixed version clusters straddling 2.1 and 3.0 so there's an argument we should 
be reasonably confident in the post CASSANDRA-8099 mixed version state.

So an option here would be to do the following (trying to keep things moving 
along; not married to this):
 # Pre 4.0: Flesh out more testing for reverse queries and range tombstones
 # Pre 4.0: Selectively take a look at code coverage analysis for unit testing 
in this domain and look for obvious gaps and beef up unit testing there
 # Post 4.0 (4.0.x): incrementally work to wire up Harry, Fallout, generative 
cassandra-diff framework testing w/user schemas (coming soon)

[~ifesdjeen] / [~aleksey]: Thoughts on ^?
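
To make the "reverse queries and range tombstones" item concrete, here is a toy randomized check of one invariant: a reverse query with a limit should return the tail of the forward result, reversed. This is a sketch over an in-memory model, not Harry, not a dtest, and not any Cassandra API; the partition model (integer clustering keys, inclusive tombstone ranges) and all names are assumptions made for illustration.

```python
import random


def query(rows, tombstones, reverse=False, limit=None):
    """Scan a toy partition in clustering order, skipping deleted rows."""
    out = []
    for key in sorted(rows, reverse=reverse):
        # A row is shadowed if any inclusive tombstone range covers its key.
        if any(lo <= key <= hi for lo, hi in tombstones):
            continue
        out.append((key, rows[key]))
        if limit is not None and len(out) == limit:
            break
    return out


def check_reverse_invariant(seed, iterations=200):
    """Generate random partitions and compare reverse and forward reads."""
    rng = random.Random(seed)
    for _ in range(iterations):
        rows = {rng.randrange(50): rng.randrange(1000)
                for _ in range(rng.randrange(20))}
        tombstones = []
        for _ in range(rng.randrange(4)):
            lo = rng.randrange(50)
            tombstones.append((lo, lo + rng.randrange(10)))
        limit = rng.randrange(1, 10)
        forward = query(rows, tombstones)
        backward = query(rows, tombstones, reverse=True, limit=limit)
        # Report the generated inputs on failure so the case can be replayed.
        assert backward == forward[::-1][:limit], (seed, rows, tombstones, limit)
    return True
```

In a real test the two reads would go through the storage engine; the toy model only shows the shape of the property and why a fixed seed makes failures reproducible.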

 

> 4.0 quality testing: Local Read/Write Path: Other Areas
> ---
>
> Key: CASSANDRA-15538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15538
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Aleksey Yeschenko*
> Testing in this area refers to the local read/write path (StorageProxy, 
> ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
> finding numerous bugs and issues with the 3.0 storage engine rewrite 
> (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the 
> local read/write path with techniques such as property-based testing, fuzzing 
> ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
>  and a source audit.






[jira] [Updated] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow

2020-10-15 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-15865:

Test and Documentation Plan: CI in PR. The only hints failure is unrelated 
and fails also on ci-cass.
 Status: Patch Available  (was: In Progress)

> Flaky dtest 
> hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
> ---
>
> Key: CASSANDRA-15865
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15865
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Sam Tunnicliffe
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I've seen this fail a couple of times under JDK11, when it doesn't appear to 
> be related to the changes under test.
>  
>  






[jira] [Commented] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid

2020-10-15 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214606#comment-17214606
 ] 

Josh McKenzie commented on CASSANDRA-14746:
---

{quote}Our hope is that we can invest the time and money ahead of time instead 
of after the release for 4.0.
{quote}
There's that saying that "an ounce of prevention is worth a pound of cure" for 
a reason. :)

I'll dig around and see if I can surface any other large-scale performance 
testing. While I'd like us to move the needle on this one (as the work by you, 
Vinay, et al. is doing), the ruthless pragmatist in me advocates for 
establishing confidence that performance is >= that of previous C* versions, 
getting the GA out to users, and iterating from there.

As extensive as the changes in the MS are, the testing you all have 
enumerated here, on top of all the other smaller-cluster perf testing devs have 
done, seems like a reasonable suite to give adequate confidence in the "no 
regression" stake in the ground.

> Ensure Netty Internode Messaging Refactor is Solid
> --
>
> Key: CASSANDRA-14746
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14746
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Streaming and Messaging
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Normal
>  Labels: 4.0-QA
> Fix For: 4.0-beta
>
>
> Before we release 4.0 let's ensure that the internode messaging refactor is 
> 100% solid. As internode messaging is naturally used in many code paths and 
> widely configurable we have a large number of cluster configurations and test 
> configurations that must be vetted.
> We plan to vary the following:
>  * Version of Cassandra 3.0.17 vs 4.0-alpha
>  * Cluster sizes with *multi-dc* deployments ranging from 6 - 100 nodes
>  * Client request rates varying between 1k QPS and 100k QPS of varying sizes 
> and shapes (BATCH, INSERT, SELECT point, SELECT range, etc ...)
>  * Internode compression
>  * Internode SSL (as well as openssl vs jdk)
>  * Internode Coalescing options
> We are looking to measure the following as appropriate:
>  * Latency distributions of reads and writes (lower is better)
>  * Scaling limit, aka maximum throughput before violating p99 latency 
> deadline of 10ms @ LOCAL_QUORUM, on a fixed hardware deployment for 100% 
> writes, 100% reads and 50-50 writes+reads (higher is better)
>  * Thread counts (lower is better)
>  * Context switches (lower is better)
>  * On-CPU time of tasks (longer periods without a context switch are 
> better)
>  * GC allocation rates / throughput for a fixed size heap (lower allocation 
> better)
>  * Streaming recovery time for a single node failure, i.e. can Cassandra 
> saturate the NIC
>  
> The goal is that 4.0 should have better latency, more throughput, fewer 
> threads, fewer context switches, less GC allocation, and faster recovery 
> time. I'm putting Jason Brown as the reviewer since he implemented most of 
> the internode refactor.
> Current collaborators driving this QA task: Dinesh Joshi, Jordan West, Joey 
> Lynch (Netflix), Vinay Chella (Netflix)
> Owning committer(s): Jason Brown
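
The "scaling limit" metric in the description above can be sketched roughly as 
follows. This is only an illustration under stated assumptions: the function 
names, the nearest-rank percentile method and the sample numbers are all 
hypothetical, and none of it comes from the actual test harness. It walks the 
request rates in ascending order and reports the last one whose p99 latency 
still meets the 10ms deadline.

```python
import math


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def scaling_limit(runs, deadline_ms=10.0):
    """Highest request rate reached before the p99 deadline is first violated.

    `runs` maps a request rate (QPS) to the latencies observed at that rate.
    Returns None if even the lowest rate misses the deadline.
    """
    limit = None
    for qps in sorted(runs):
        if p99(runs[qps]) > deadline_ms:
            break  # first violation: stop, higher rates do not count
        limit = qps
    return limit


# Made-up sample data, purely to exercise the functions.
sample_runs = {
    1_000: [2.0] * 100,
    10_000: [3.0] * 98 + [9.0, 50.0],
    100_000: [5.0] * 90 + [20.0] * 10,
}
print(scaling_limit(sample_runs))
```

Stopping at the first violation, rather than taking the maximum passing rate, 
matches the "before violating" wording: a higher rate that happens to pass 
after a lower one failed would not be counted.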






[jira] [Commented] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow

2020-10-15 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214495#comment-17214495
 ] 

Berenguer Blasi commented on CASSANDRA-15865:
-

Stealing this one from [~brandon.williams]; I hope you don't mind :-)

I would argue that {{statushandoff}}'s output is not being used in the test's 
code, unless I am missing something. In fact, looking at the loop, {{node}} 
stays set to {{node2}}, as this is the last value the initial loop leaves it 
at. So the HH setting, along with the rest of the operations, is being sent to 
{{node2}} imo.

I couldn't repro however much I tried, but I am putting a PR up. It only makes 
sure the HH of 1m is effective on all nodes before proceeding.
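
For anyone not familiar with the pattern, here is a minimal sketch of the kind 
of loop-variable leak described above. It is not the dtest's code: 
{{FakeNode}}, both helper functions and the command string are hypothetical 
placeholders (the string is deliberately not a real nodetool command). In 
Python the loop variable stays bound to the last element after a {{for}} loop 
ends, so a call placed after the loop only reaches the last node.

```python
class FakeNode:
    """Hypothetical stand-in for a cluster node; records commands sent to it."""

    def __init__(self, name):
        self.name = name
        self.commands = []

    def nodetool(self, command):
        self.commands.append(command)


def configure_after_loop(nodes, command):
    """Suspected pattern: the call sits outside the loop body."""
    for node in nodes:
        pass  # per-node checks would go here
    # `node` is still bound to the last element, so only that node is configured.
    node.nodetool(command)


def configure_every_node(nodes, command):
    """Sketch of the intent: apply the setting on each node before proceeding."""
    for node in nodes:
        node.nodetool(command)


leaky = [FakeNode("node1"), FakeNode("node2")]
configure_after_loop(leaky, "placeholder-hint-window-command")

fixed = [FakeNode("node1"), FakeNode("node2")]
configure_every_node(fixed, "placeholder-hint-window-command")

for n in leaky + fixed:
    print(n.name, n.commands)
```

With two nodes, the first variant leaves node1 untouched while the second 
reaches both. That would be consistent with the setting not being effective 
everywhere, although whether this is the actual cause of the flakiness is still 
unconfirmed.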

> Flaky dtest 
> hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
> ---
>
> Key: CASSANDRA-15865
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15865
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Sam Tunnicliffe
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've seen this fail a couple of times under JDK11, when it doesn't appear to 
> be related to the changes under test.
>  
>  






[jira] [Assigned] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow

2020-10-15 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi reassigned CASSANDRA-15865:
---

Assignee: Berenguer Blasi  (was: Brandon Williams)

> Flaky dtest 
> hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
> ---
>
> Key: CASSANDRA-15865
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15865
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Sam Tunnicliffe
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've seen this fail a couple of times under JDK11, when it doesn't appear to 
> be related to the changes under test.
>  
>  


