[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-17 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138929#comment-17138929
 ] 

David Capwell commented on CASSANDRA-15234:
---

This isn’t a review and I’m aware I said I didn’t have time to review it a 
month ago, but I glanced at the patch and it seems quite different from the 
approach discussed in 
https://the-asf.slack.com/archives/DSP76MVPC/p1589983705004900 and 
https://the-asf.slack.com/archives/DSP76MVPC/p1590006263021200. 

Can you say a bit about why you changed your approach?

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yamls and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {code}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {code}
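The proposed suffix grammar is easy to prototype. The sketch below is illustrative only (Python rather than the Java an actual patch would use; `parse_duration_millis` and the 30-day month approximation are assumptions of this sketch, not part of the proposal):

```python
import re

# Proposed suffix families, tried in the order listed in the ticket.
# Factors convert to milliseconds; a month is approximated as 30 days
# purely for illustration.
_SUFFIXES = [
    (r"u|micros(econds?)?", 0.001),
    (r"ms|millis(econds?)?", 1),
    (r"s(econds?)?", 1_000),
    (r"m(inutes?)?", 60_000),
    (r"h(ours?)?", 3_600_000),
    (r"d(ays?)?", 86_400_000),
    (r"mo(nths?)?", 2_592_000_000),
]

def parse_duration_millis(value):
    """Parse strings like '10s', '500ms' or '2hours' into milliseconds."""
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value)
    if not m:
        raise ValueError(f"unparseable duration: {value!r}")
    amount, suffix = int(m.group(1)), m.group(2)
    for pattern, millis in _SUFFIXES:
        if re.fullmatch(pattern, suffix):
            return amount * millis
    raise ValueError(f"unknown duration suffix: {suffix!r}")
```

Using `fullmatch` keeps the families unambiguous, e.g. `ms` cannot accidentally match `m(inutes?)?`.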
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps, to avoid ambiguity, we could decline to accept bauds ({{bs, Mbps}}) 
> or powers of 1000 such as {{KB/s}}, given these are regularly used with 
> either their old or new definition (e.g. {{KiB/s}}); alternatively, we could 
> support them and simply log the value in bytes/s.
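A matching sketch for the rate grammar, accepting only the power-of-1024 units and rejecting the ambiguous ones (again a hypothetical Python illustration; the strict-rejection policy is just the first option the ticket floats):

```python
import re

# Only the unambiguous binary units are accepted; 'KB/s' (power of 1000)
# and bit rates like 'Mbps' are rejected rather than guessed at.
_RATE_FACTORS = {"B": 1, "KiB": 1024, "MiB": 1024**2,
                 "GiB": 1024**3, "TiB": 1024**4}

def parse_rate_bytes_per_sec(value):
    """Parse strings like '5MiB/s' into bytes per second."""
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)/s\s*", value)
    if not m:
        raise ValueError(f"unparseable rate: {value!r}")
    amount, unit = int(m.group(1)), m.group(2)
    if unit not in _RATE_FACTORS:
        raise ValueError(f"ambiguous or unsupported rate unit: {unit!r}")
    return amount * _RATE_FACTORS[unit]
```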



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138924#comment-17138924
 ] 

Dinesh Joshi commented on CASSANDRA-15879:
--

LGTM. [~marcuse] any thoughts?

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15879:
-
Reviewers: Dinesh Joshi, Marcus Eriksson, Dinesh Joshi  (was: Dinesh Joshi, Marcus Eriksson)
   Status: Review In Progress  (was: Patch Available)







[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table

2020-06-17 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-14888:
-
Status: Ready to Commit  (was: Review In Progress)

+1

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Ariel Weisberg
>Assignee: Alex Deparvu
>Priority: Urgent
>  Labels: patch-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-14888.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, 
> ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, 
> PartitionsValidated, RepairPrepareTime, RepairSyncTime, 
> RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, 
> WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the 
> entire thing is kind of obscure. Fix it and also add a dtest that detects if 
> any mbeans are left behind after dropping a table and keyspace.






[cassandra] branch circle deleted (was 6534330)

2020-06-17 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch circle
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


 was 6534330  Update config.yml

This change permanently discards the following revisions:

 discard 6534330  Update config.yml





[jira] [Commented] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138694#comment-17138694
 ] 

Caleb Rackliffe commented on CASSANDRA-15879:
-

{{SASIIndexTest.testInsertingIncorrectValuesIntoAgeIndex}} is already failing 
in {{cassandra-3.11}} #justfyi







[jira] [Comment Edited] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138670#comment-17138670
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15879 at 6/17/20, 5:45 PM:
---

So {{CorruptedSSTablesCompactionsTest}} was inserting rows with invalid 
{{LongType}} partition keys (i.e. keys with length 1, rather than 0 or 8). That 
wasn't really a problem by itself, but the seed {{9823169134884L}} somehow 
generated some duplicate clusterings, and when {{DuplicateRowChecker}} is 
notified of the partition closing in {{onPartitionClose()}}, it [tries to 
print|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/transform/DuplicateRowChecker.java#L93]
 a human-readable partition key, which of course fails validation. Simply 
fixing {{CorruptedSSTablesCompactionsTest}} to generate valid partition keys 
seems to solve the problem.


||3.0||3.11||trunk||
|[patch|https://github.com/apache/cassandra/pull/638/commits/17a7842dd6f71935f8e5c5b8f7c0a1dd7e85c566]|[patch|https://github.com/apache/cassandra/pull/639]|[patch|https://github.com/apache/cassandra/pull/640]|
|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra/8/workflows/c4704991-07c8-473b-b99e-97d93b4c5f90]|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra/9/workflows/14c42f46-bb04-488f-a2d8-2718d600b848]|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-15879-trunk]|
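The failure mode reduces to a toy example (hypothetical Python sketch; the real check lives in {{LongType}} in the Java codebase): a {{LongType}} key must serialize to 0 or 8 bytes, so a 1-byte key only blows up once something tries to render it human-readably.

```python
import struct

def validate_long_key(key: bytes) -> None:
    # A LongType value is either empty or exactly 8 bytes (a big-endian
    # long); anything else fails validation when printed human-readably.
    if len(key) not in (0, 8):
        raise ValueError(f"expected 0 or 8 bytes, got {len(key)}")

def long_key(value: int) -> bytes:
    # A valid 8-byte key, as the fixed test now generates.
    return struct.pack(">q", value)

validate_long_key(long_key(42))  # valid: exactly 8 bytes
try:
    validate_long_key(b"\x01")   # invalid: length 1, as the old test produced
except ValueError:
    pass                         # the failure DuplicateRowChecker hit
```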


was (Author: maedhroz):
So {{CorruptedSSTablesCompactionsTest}} was inserting rows with invalid 
{{LongType}} partition keys (i.e. keys with length 1, rather than 0 or 8). That 
wasn't really a problem by itself, but the seed {{9823169134884L}} somehow 
generated some duplicate clusterings, and when {{DuplicateRowChecker}} is 
notified of the partition closing in {{onPartitionClose()}}, it [tries to 
print|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/transform/DuplicateRowChecker.java#L93]
 a human-readable partition key, which of course fails validation. Simply 
fixing {{CorruptedSSTablesCompactionsTest}} to generate valid partition keys 
seems to solve the problem.

||3.0||3.11||trunk||
|[patch|https://github.com/apache/cassandra/pull/638/commits/17a7842dd6f71935f8e5c5b8f7c0a1dd7e85c566]|[patch|https://github.com/apache/cassandra/pull/639]|[patch|https://github.com/apache/cassandra/pull/640]|
|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra/8/workflows/c4704991-07c8-473b-b99e-97d93b4c5f90]|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra/9/workflows/14c42f46-bb04-488f-a2d8-2718d600b848]|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-15879-trunk]|







[jira] [Comment Edited] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138670#comment-17138670
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15879 at 6/17/20, 5:45 PM:
---

So {{CorruptedSSTablesCompactionsTest}} was inserting rows with invalid 
{{LongType}} partition keys (i.e. keys with length 1, rather than 0 or 8). That 
wasn't really a problem by itself, but the seed {{9823169134884L}} somehow 
generated some duplicate clusterings, and when {{DuplicateRowChecker}} is 
notified of the partition closing in {{onPartitionClose()}}, it [tries to 
print|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/transform/DuplicateRowChecker.java#L93]
 a human-readable partition key, which of course fails validation. Simply 
fixing {{CorruptedSSTablesCompactionsTest}} to generate valid partition keys 
seems to solve the problem.

||3.0||3.11||trunk||
|[patch|https://github.com/apache/cassandra/pull/638/commits/17a7842dd6f71935f8e5c5b8f7c0a1dd7e85c566]|[patch|https://github.com/apache/cassandra/pull/639]|[patch|https://github.com/apache/cassandra/pull/640]|
|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra/8/workflows/c4704991-07c8-473b-b99e-97d93b4c5f90]|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra/9/workflows/14c42f46-bb04-488f-a2d8-2718d600b848]|[utest|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-15879-trunk]|


was (Author: maedhroz):
So {{CorruptedSSTablesCompactionsTest}} was inserting rows with invalid 
{{LongType}} partition keys (i.e. keys with length 1, rather than 0 or 8). That 
wasn't really a problem by itself, but the seed {{9823169134884L}} somehow 
generated some duplicate clusterings, and when {{DuplicateRowChecker}} is 
notified of the partition closing in {{onPartitionClose()}}, it [tries to 
print|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/transform/DuplicateRowChecker.java#L93]
 a human-readable partition key, which of course fails validation. Simply 
fixing {{CorruptedSSTablesCompactionsTest}} to generate valid partition keys 
seems to solve the problem.

||3.0||3.11||trunk||
|[patch|https://github.com/apache/cassandra/pull/638/commits/17a7842dd6f71935f8e5c5b8f7c0a1dd7e85c566]|[patch|https://github.com/apache/cassandra/pull/639]|[patch|https://github.com/apache/cassandra/pull/640]|








[jira] [Comment Edited] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138670#comment-17138670
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15879 at 6/17/20, 5:40 PM:
---

So {{CorruptedSSTablesCompactionsTest}} was inserting rows with invalid 
{{LongType}} partition keys (i.e. keys with length 1, rather than 0 or 8). That 
wasn't really a problem by itself, but the seed {{9823169134884L}} somehow 
generated some duplicate clusterings, and when {{DuplicateRowChecker}} is 
notified of the partition closing in {{onPartitionClose()}}, it [tries to 
print|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/transform/DuplicateRowChecker.java#L93]
 a human-readable partition key, which of course fails validation. Simply 
fixing {{CorruptedSSTablesCompactionsTest}} to generate valid partition keys 
seems to solve the problem.

||3.0||3.11||trunk||
|[patch|https://github.com/apache/cassandra/pull/638/commits/17a7842dd6f71935f8e5c5b8f7c0a1dd7e85c566]|[patch|https://github.com/apache/cassandra/pull/639]|[patch|https://github.com/apache/cassandra/pull/640]|



was (Author: maedhroz):
So {{CorruptedSSTablesCompactionsTest}} was inserting rows with invalid 
{{LongType}} partition keys (i.e. keys with length 1, rather than 0 or 8). That 
wasn't really a problem by itself, but the seed {{9823169134884L}} somehow 
generated some duplicate clusterings, and when {{DuplicateRowChecker}} is 
notified of the partition closing in {{onPartitionClose()}}, it [tries to 
print|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/transform/DuplicateRowChecker.java#L93]
 a human-readable partition key, which of course fails validation. Simply 
fixing {{CorruptedSSTablesCompactionsTest}} to generate valid partition keys 
seems to solve the problem.

||3.0||3.11||
|[patch|https://github.com/apache/cassandra/pull/638/commits/17a7842dd6f71935f8e5c5b8f7c0a1dd7e85c566]|[patch|https://github.com/apache/cassandra/pull/639]|








[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15879:

Fix Version/s: 4.0-beta
   3.11.x
   3.0.x

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15879:

Status: Patch Available  (was: In Progress)

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Commented] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138670#comment-17138670
 ] 

Caleb Rackliffe commented on CASSANDRA-15879:
-

So {{CorruptedSSTablesCompactionsTest}} was inserting rows with invalid 
{{LongType}} partition keys (i.e. keys with length 1, rather than 0 or 8). That 
wasn't really a problem by itself, but the seed {{9823169134884L}} somehow 
generated some duplicate clusterings, and when {{DuplicateRowChecker}} is 
notified of the partition closing in {{onPartitionClose()}}, it [tries to 
print|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/transform/DuplicateRowChecker.java#L93]
 a human-readable partition key, which of course fails validation. Simply 
fixing {{CorruptedSSTablesCompactionsTest}} to generate valid partition keys 
seems to solve the problem.

||3.0||3.11||
|[patch|https://github.com/apache/cassandra/pull/638/commits/17a7842dd6f71935f8e5c5b8f7c0a1dd7e85c566]|[patch|https://github.com/apache/cassandra/pull/639]|


> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15879:

Test and Documentation Plan: CircleCI unit tests, as this is a test-only 
change  (was: CircleCI: TODO)

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15879:

Source Control Link:   (was: https://github.com/apache/cassandra/pull/638 
(3.0))

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Updated] (CASSANDRA-15752) Range read concurrency factor didn't consider range merger

2020-06-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-15752:
--
Status: Ready to Commit  (was: Review In Progress)

> Range read concurrency factor didn't consider range merger
> --
>
> Key: CASSANDRA-15752
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15752
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>
> During a range read, the coordinator computes a concurrency factor: the 
> number of vnode ranges to contact in parallel for the next batch.
> But in {{RangeCommandIterator}}, vnode ranges are merged by {{RangeMerger}} 
> when they share enough replicas to satisfy the consistency level, e.g. vnode 
> range [a,b) has replicas n1,n2,n3 and vnode range [b,c) has replicas 
> n2,n3,n4, so they can be merged into range [a,c) with replicas n2,n3 for 
> quorum.
> Currently the number of merged ranges is counted towards the concurrency 
> factor, so the coordinator may fetch more vnode ranges than needed.
> 
> Another issue is that when executing a range read on a table with a very 
> small amount of data, the concurrency factor can be bumped to the {{size of 
> total vnode ranges}}, e.g. 10k, depending on the number of vnodes and the 
> cluster size. As a result, the coordinator will send a large number of 
> concurrent range requests, potentially slowing down the cluster. We should 
> cap the max concurrency factor.
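The merge-versus-count mismatch can be sketched with a toy merger (hypothetical Python; the function name and the hard-coded 2-of-3 quorum threshold are illustrative, the real logic is Java's {{RangeMerger}}):

```python
def merge_ranges(ranges, required_replicas=2):
    """Merge adjacent vnode ranges that share enough replicas for the
    consistency level. Each input is (start, end, replicas); each output
    is (start, end, replicas, vnode_ranges_consumed). Counting outputs
    instead of the consumed count is the undercount described above."""
    merged = []
    for start, end, replicas in ranges:
        if merged:
            m_start, m_end, m_replicas, consumed = merged[-1]
            shared = m_replicas & replicas
            if m_end == start and len(shared) >= required_replicas:
                # Extend the previous merged range; one more vnode
                # range was consumed even though the output count is flat.
                merged[-1] = (m_start, end, shared, consumed + 1)
                continue
        merged.append((start, end, frozenset(replicas), 1))
    return merged

# The ticket's example: [a,b) on n1,n2,n3 and [b,c) on n2,n3,n4 merge
# into [a,c) on n2,n3 for quorum: one merged range, two vnode ranges.
out = merge_ranges([("a", "b", {"n1", "n2", "n3"}),
                    ("b", "c", {"n2", "n3", "n4"})])
```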






[jira] [Commented] (CASSANDRA-15867) Update Jackson version to 2.9.10.1 because there are security issues in 2.9.5

2020-06-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138620#comment-17138620
 ] 

Brandon Williams commented on CASSANDRA-15867:
--

Committed, thanks!

> Update Jackson version to 2.9.10.1 because there are security issues in 2.9.5
> -
>
> Key: CASSANDRA-15867
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15867
> Project: Cassandra
>  Issue Type: Task
>  Components: Dependencies
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0-alpha5
>
> Attachments: dependency-check-report.html
>
>
> Please see attached HTML report from OWASP dependency check for current 
> 4.0-alpha5 trunk branch.
>  
>  






[jira] [Updated] (CASSANDRA-15867) Update Jackson version to 2.9.10.1 because there are security issues in 2.9.5

2020-06-17 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15867:
-
Fix Version/s: 3.11.7

> Update Jackson version to 2.9.10.1 because there are security issues in 2.9.5
> -
>
> Key: CASSANDRA-15867
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15867
> Project: Cassandra
>  Issue Type: Task
>  Components: Dependencies
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.7, 4.0-alpha5
>
> Attachments: dependency-check-report.html
>
>
> Please see attached HTML report from OWASP dependency check for current 
> 4.0-alpha5 trunk branch.
>  
>  






[cassandra] branch cassandra-3.11 updated: update Jackson to 2.9.10

2020-06-17 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-3.11 by this push:
 new 576cb2b  update Jackson to 2.9.10
576cb2b is described below

commit 576cb2b8a4267467b507cb88e841462314d2aaee
Author: Stefan Miklosovic 
AuthorDate: Sat Jun 13 17:09:00 2020 +0200

update Jackson to 2.9.10

Patch by Stefan Miklosovic, reviewed by brandonwilliams for
CASSANDRA-15867
---
 CHANGES.txt   |   1 +
 build.xml |  10 ++
 lib/jackson-annotations-2.9.10.jar| Bin 0 -> 66894 bytes
 lib/jackson-core-2.9.10.jar   | Bin 0 -> 325636 bytes
 lib/jackson-core-asl-1.9.13.jar   | Bin 232248 -> 0 bytes
 lib/jackson-databind-2.9.10.4.jar | Bin 0 -> 1349395 bytes
 lib/jackson-mapper-asl-1.9.13.jar | Bin 780664 -> 0 bytes
 src/java/org/apache/cassandra/cql3/Json.java  |   4 ++--
 .../org/apache/cassandra/tools/JsonTransformer.java   |  12 +---
 src/java/org/apache/cassandra/utils/FBUtilities.java  |   5 ++---
 .../apache/cassandra/index/sasi/SASIIndexTest.java|  14 +++---
 11 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 0f730d4..114ce07 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.11.7
+ * Upgrade Jackson to 2.9.10 (CASSANDRA-15867)
  * Fix CQL formatting of read command restrictions for slow query log 
(CASSANDRA-15503)
  * Allow sstableloader to use SSL on the native port (CASSANDRA-14904)
 Merged from 3.0:
diff --git a/build.xml b/build.xml
index 0724dbb..25a4733 100644
--- a/build.xml
+++ b/build.xml
@@ -406,8 +406,9 @@
   
   
   
-  
-  
+  
+  
+  
   
   
   
@@ -627,8 +628,9 @@
 
 
 
-
-
+
+
+
 
 
 
diff --git a/lib/jackson-annotations-2.9.10.jar 
b/lib/jackson-annotations-2.9.10.jar
new file mode 100644
index 000..de054c6
Binary files /dev/null and b/lib/jackson-annotations-2.9.10.jar differ
diff --git a/lib/jackson-core-2.9.10.jar b/lib/jackson-core-2.9.10.jar
new file mode 100644
index 000..1b5e87c
Binary files /dev/null and b/lib/jackson-core-2.9.10.jar differ
diff --git a/lib/jackson-core-asl-1.9.13.jar b/lib/jackson-core-asl-1.9.13.jar
deleted file mode 100644
index bb4fe1d..000
Binary files a/lib/jackson-core-asl-1.9.13.jar and /dev/null differ
diff --git a/lib/jackson-databind-2.9.10.4.jar 
b/lib/jackson-databind-2.9.10.4.jar
new file mode 100644
index 000..9045f2f
Binary files /dev/null and b/lib/jackson-databind-2.9.10.4.jar differ
diff --git a/lib/jackson-mapper-asl-1.9.13.jar 
b/lib/jackson-mapper-asl-1.9.13.jar
deleted file mode 100644
index 0f2073f..000
Binary files a/lib/jackson-mapper-asl-1.9.13.jar and /dev/null differ
diff --git a/src/java/org/apache/cassandra/cql3/Json.java 
b/src/java/org/apache/cassandra/cql3/Json.java
index 2e67a1e..af004a8 100644
--- a/src/java/org/apache/cassandra/cql3/Json.java
+++ b/src/java/org/apache/cassandra/cql3/Json.java
@@ -20,6 +20,8 @@ package org.apache.cassandra.cql3;
 import java.io.IOException;
 import java.util.*;
 
+import com.fasterxml.jackson.core.io.JsonStringEncoder;
+import com.fasterxml.jackson.databind.ObjectMapper;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.functions.Function;
@@ -27,8 +29,6 @@ import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.UTF8Type;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.serializers.MarshalException;
-import org.codehaus.jackson.io.JsonStringEncoder;
-import org.codehaus.jackson.map.ObjectMapper;
 
 /** Term-related classes for INSERT JSON support. */
 public class Json
diff --git a/src/java/org/apache/cassandra/tools/JsonTransformer.java 
b/src/java/org/apache/cassandra/tools/JsonTransformer.java
index 22b8fc7..315bb7f 100644
--- a/src/java/org/apache/cassandra/tools/JsonTransformer.java
+++ b/src/java/org/apache/cassandra/tools/JsonTransformer.java
@@ -30,6 +30,9 @@ import java.util.List;
 import java.util.concurrent.TimeUnit;
 import java.util.stream.Stream;
 
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.db.*;
@@ -49,11 +52,6 @@ import org.apache.cassandra.db.rows.UnfilteredRowIterator;
 import 

[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk

2020-06-17 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 4d1463f601439279bd632beca2c0b2edec9c9347
Merge: eacdfc4 576cb2b
Author: Brandon Williams 
AuthorDate: Wed Jun 17 11:21:54 2020 -0500

Merge branch 'cassandra-3.11' into trunk

 CHANGES.txt | 1 +
 1 file changed, 1 insertion(+)

diff --cc CHANGES.txt
index e6ecb42,114ce07..c51a870
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,47 -1,7 +1,48 @@@
 -3.11.7
 +4.0-alpha5
 + * Fix missing topology events when running multiple nodes on the same 
network interface (CASSANDRA-15677)
 + * Create config.yml.MIDRES (CASSANDRA-15712)
 + * Fix handling of fully purged static rows in repaired data tracking 
(CASSANDRA-15848)
 + * Prevent validation request submission from blocking ANTI_ENTROPY stage 
(CASSANDRA-15812)
 + * Add fqltool and auditlogviewer to rpm and deb packages (CASSANDRA-14712)
 + * Include DROPPED_COLUMNS in schema digest computation (CASSANDRA-15843)
 + * Fix Cassandra restart from rpm install (CASSANDRA-15830)
 + * Improve handling of 2i initialization failures (CASSANDRA-13606)
 + * Add completion_ratio column to sstable_tasks virtual table (CASSANDRA-15759)
 + * Add support for adding custom Verbs (CASSANDRA-15725)
 + * Speed up entire-file-streaming file containment check and allow 
entire-file-streaming for all compaction strategies 
(CASSANDRA-15657,CASSANDRA-15783)
 + * Provide ability to configure IAuditLogger (CASSANDRA-15748)
 + * Fix nodetool enablefullquerylog blocking param parsing (CASSANDRA-15819)
 + * Add isTransient to SSTableMetadataView (CASSANDRA-15806)
 + * Fix tools/bin/fqltool for all shells (CASSANDRA-15820)
 + * Fix clearing of legacy size_estimates (CASSANDRA-15776)
 + * Update port when reconnecting to pre-4.0 SSL storage (CASSANDRA-15727)
 + * Only calculate dynamicBadnessThreshold once per loop in 
DynamicEndpointSnitch (CASSANDRA-15798)
 + * Cleanup redundant nodetool commands added in 4.0 (CASSANDRA-15256)
 + * Update to Python driver 3.23 for cqlsh (CASSANDRA-15793)
 + * Add tunable initial size and growth factor to RangeTombstoneList 
(CASSANDRA-15763)
 + * Improve debug logging in SSTableReader for index summary (CASSANDRA-15755)
 + * bin/sstableverify should support user provided token ranges 
(CASSANDRA-15753)
 + * Improve logging when mutation passed to commit log is too large 
(CASSANDRA-14781)
 + * replace LZ4FastDecompressor with LZ4SafeDecompressor (CASSANDRA-15560)
 + * Fix buffer pool NPE with concurrent release due to in-progress tiny pool 
eviction (CASSANDRA-15726)
 + * Avoid race condition when completing stream sessions (CASSANDRA-15666)
 + * Flush with fast compressors by default (CASSANDRA-15379)
 + * Fix CqlInputFormat regression from the switch to system.size_estimates 
(CASSANDRA-15637)
 + * Allow sending Entire SSTables over SSL (CASSANDRA-15740)
 + * Fix CQLSH UTF-8 encoding issue for Python 2/3 compatibility 
(CASSANDRA-15739)
 + * Fix batch statement preparation when multiple tables and parameters are 
used (CASSANDRA-15730)
 + * Fix regression with traceOutgoingMessage printing message size 
(CASSANDRA-15687)
 + * Ensure repaired data tracking reads a consistent amount of data across 
replicas (CASSANDRA-15601)
 + * Fix CQLSH to avoid arguments being evaluated (CASSANDRA-15660)
 + * Correct Visibility and Improve Safety of Methods in LatencyMetrics 
(CASSANDRA-15597)
 + * Allow cqlsh to run with Python2.7/Python3.6+ 
(CASSANDRA-15659,CASSANDRA-15573)
 + * Improve logging around incremental repair (CASSANDRA-15599)
 + * Do not check cdc_raw_directory filesystem space if CDC disabled 
(CASSANDRA-15688)
 + * Replace array iterators with get by index (CASSANDRA-15394)
 + * Minimize BTree iterator allocations (CASSANDRA-15389)
 +Merged from 3.11:
+  * Upgrade Jackson to 2.9.10 (CASSANDRA-15867)
   * Fix CQL formatting of read command restrictions for slow query log 
(CASSANDRA-15503)
 - * Allow sstableloader to use SSL on the native port (CASSANDRA-14904)
  Merged from 3.0:
   * Catch exception on bootstrap resume and init native transport 
(CASSANDRA-15863)
   * Fix replica-side filtering returning stale data with CL > ONE 
(CASSANDRA-8272, CASSANDRA-8273)





[cassandra] branch trunk updated (eacdfc4 -> 4d1463f)

2020-06-17 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from eacdfc4  Merge branch 'cassandra-3.11' into trunk
 new 576cb2b  update Jackson to 2.9.10
 new 4d1463f  Merge branch 'cassandra-3.11' into trunk

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt | 1 +
 1 file changed, 1 insertion(+)





[jira] [Comment Edited] (CASSANDRA-15863) Bootstrap resume and TestReplaceAddress fixes

2020-06-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138606#comment-17138606
 ] 

Brandon Williams edited comment on CASSANDRA-15863 at 6/17/20, 4:05 PM:


And indeed, I grabbed the patch.  Also, I don't see it here: 
https://github.com/apache/cassandra/pull/622/commits/c6b87ee779e3e1e6d74b80eafe48b99175a7af1a#diff-b76a607445d53f18a98c9df14323c7ddR1331

Scary and strange, indeed.


was (Author: brandon.williams):
And indeed, I grabbed, the patch.  Also, I don't see it here: 
https://github.com/apache/cassandra/pull/622/commits/c6b87ee779e3e1e6d74b80eafe48b99175a7af1a#diff-b76a607445d53f18a98c9df14323c7ddR1331

Scary and strange, indeed.

> Bootstrap resume and TestReplaceAddress fixes
> -
>
> Key: CASSANDRA-15863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15863
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission, Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been 
> [broken|https://ci-cassandra.apache.org/job/Cassandra-trunk/159/testReport/dtest-large.replace_address_test/TestReplaceAddress/test_restart_failed_replace/history/]
>  for ages






[jira] [Commented] (CASSANDRA-15863) Bootstrap resume and TestReplaceAddress fixes

2020-06-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138606#comment-17138606
 ] 

Brandon Williams commented on CASSANDRA-15863:
--

And indeed, I grabbed, the patch.  Also, I don't see it here: 
https://github.com/apache/cassandra/pull/622/commits/c6b87ee779e3e1e6d74b80eafe48b99175a7af1a#diff-b76a607445d53f18a98c9df14323c7ddR1331

Scary and strange, indeed.

> Bootstrap resume and TestReplaceAddress fixes
> -
>
> Key: CASSANDRA-15863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15863
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission, Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been 
> [broken|https://ci-cassandra.apache.org/job/Cassandra-trunk/159/testReport/dtest-large.replace_address_test/TestReplaceAddress/test_restart_failed_replace/history/]
>  for ages






[jira] [Updated] (CASSANDRA-14825) Expose table schema for drivers

2020-06-17 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-14825:
---
Status: In Progress  (was: Patch Available)

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Robert Stupp
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.






[jira] [Updated] (CASSANDRA-14825) Expose table schema for drivers

2020-06-17 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-14825:
---
Status: Patch Available  (was: Review In Progress)

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Robert Stupp
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.






[jira] [Updated] (CASSANDRA-14825) Expose table schema for drivers

2020-06-17 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-14825:
---
Status: Review In Progress  (was: Changes Suggested)

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Robert Stupp
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.






[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138578#comment-17138578
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

Ok, I understand what you are suggesting now, and I agree this should work as 
well. It is indeed more optimal.

I like to think of our algorithm as "pure Paxos instances" separated by the MRC 
to tell us when we can forget the previous instance and start a new one.  
Committing empty updates like any other update still fits that mental model, 
while your suggestion adds a bit of a special case in that it bends the Paxos 
rules slightly, allowing it to sometimes ignore a previously accepted value in 
a promise (when it's empty). Which is not a criticism, just thinking out loud.  
It's more performant, and the slight special casing is likely worth it since 
it's not too hard to reason about its correctness.

I'll sleep on it and switch to your suggestion tomorrow (which is trivial; I 
just need to massage an appropriate comment to explain it).


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in its accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in its paxos table as 
> accepted but not committed, while B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> something is in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write that involves only B and C. This will succeed and 
> commit a different value than step 1. The value from step 1 will never be 
> seen again and was never seen before. 
> If you read Lamport's “Paxos Made Simple” paper, section 2.3 discusses this 
> issue: how learners can find out whether a majority of the acceptors have 
> accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and more than one acceptor but not a majority has something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority confirms that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will cause the write from step 1 to never be 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read, or we will never see it, which is what we want. 
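The anomaly described in the ticket can be sketched with a toy model of three acceptors. This is a hedged illustration only: it is not Cassandra's actual Paxos implementation, and all class and method names here are invented for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the RF=3 scenario: each acceptor may hold an in-flight
// (accepted but uncommitted) value; a quorum is any 2 of the 3 acceptors.
public class CasReadAnomaly {
    static final Map<String, String> accepted = new HashMap<>();
    static final Map<String, String> committed = new HashMap<>();

    // A serial read contacts a quorum and can only see in-flight proposals
    // held by the acceptors it happens to contact.
    static String quorumRead(String... quorum) {
        for (String node : quorum) {
            String inFlight = accepted.get(node);
            if (inFlight != null) {
                // Discovered an in-flight proposal: re-propose and commit it.
                for (String n : new String[]{"A", "B", "C"}) {
                    committed.put(n, inFlight);
                    accepted.remove(n);
                }
                return inFlight;
            }
        }
        return committed.get(quorum[0]); // nothing in flight on this quorum
    }

    public static void main(String[] args) {
        // Step 1: a CAS write dies mid-propose; only acceptor A saved "v1".
        accepted.put("A", "v1");

        // Step 2: a read reaching only {B, C} behaves as if nothing is in flight.
        if (quorumRead("B", "C") != null)
            throw new IllegalStateException("{B,C} should not see the in-flight value");

        // Step 3: a later read reaching {A, B} discovers and commits "v1",
        // so the step-1 write becomes visible after all.
        if (!"v1".equals(quorumRead("A", "B")))
            throw new IllegalStateException("{A,B} should discover and commit v1");

        System.out.println("ok: visibility of the step-1 write depends on which quorum reads");
    }
}
```

Running the two reads in the other order (step 4 before step 3) would instead commit a different value on {B, C} and erase "v1" forever, which is exactly the non-linearizable behavior the ticket describes.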






[jira] [Updated] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15879:

Status: Open  (was: Patch Available)

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Commented] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138569#comment-17138569
 ] 

Caleb Rackliffe commented on CASSANDRA-15879:
-

After a bit more digging, it seems that the seed used in the failing case above 
(9823169134884L) only triggers the failure on {{cassandra-3.11}} and {{trunk}}. 
For some reason, it doesn't affect {{cassandra-3.0}}.
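Reproducing a seed-dependent failure relies on {{java.util.Random}} being fully deterministic for a given seed. A minimal sketch of pinning the reported seed (illustrative only; the test's actual seeding hook may differ):

```java
import java.util.Random;

// Illustrative: pin the RNG to the seed from the failing run. Two Randoms
// constructed with the same seed draw identical sequences, so any bytes a
// seeded test corrupts are reproducible across JVMs and runs.
public class SeedCheck {
    public static void main(String[] args) {
        long seed = 9823169134884L; // seed reported in the failing case
        Random r1 = new Random(seed);
        Random r2 = new Random(seed);
        for (int i = 0; i < 5; i++) {
            if (r1.nextLong() != r2.nextLong())
                throw new IllegalStateException("same seed must yield same sequence");
        }
        System.out.println("seed " + seed + " yields a deterministic sequence");
    }
}
```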

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138536#comment-17138536
 ] 

Benedict Elliott Smith commented on CASSANDRA-12126:


To say it another way: the only purpose of an empty proposal is to poison 
earlier proposals, and this can be done just as well without moving the 
proposal to the MRC column in the table.  If we treat it like any other "in 
progress" proposal for invalidating earlier proposals, then once it reaches a 
quorum it must in future be witnessed alongside any earlier proposals and will 
invalidate them.  If it didn't reach a quorum, then it doesn't matter whether 
it is witnessed, or whether any earlier proposals are invalidated.



> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in its accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in its paxos table as 
> accepted but not committed, while B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> something is in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write that involves only B and C. This will succeed and 
> commit a different value than step 1. The value from step 1 will never be 
> seen again and was never seen before. 
> If you read Lamport's “Paxos Made Simple” paper, section 2.3 discusses this 
> issue: how learners can find out whether a majority of the acceptors have 
> accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and more than one acceptor but not a majority has something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority confirms that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will cause the write from step 1 to never be 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read, or we will never see it, which is what we want. 






[jira] [Comment Edited] (CASSANDRA-15879) Flaky unit test: BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy

2020-06-17 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137977#comment-17137977
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15879 at 6/17/20, 2:52 PM:
---

It seems the test has been renamed (just a few days ago) to 
{{CorruptedSSTablesCompactionsTest}}.


was (Author: maedhroz):
It seems the test has been renamed to {{CorruptedSSTablesCompactionsTest}}.

> Flaky unit test: 
> BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy
> -
>
> Key: CASSANDRA-15879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15879
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-14238 addressed the failure in 
> {{BlacklistingCompactionsTest.testBlacklistingWithSizeTieredCompactionStrategy}},
>  but only on 2.2. While working on CASSANDRA-14888, we’ve reproduced [the 
> failure|https://app.circleci.com/pipelines/github/dineshjoshi/cassandra/47/workflows/de5f7cdb-06b6-4869-9d19-81a145e79f3f/jobs/2516/tests]
>  on trunk.
> It looks like this should be a clean merge forward.






[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138503#comment-17138503
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 6/17/20, 2:42 PM:
--

I'm not proposing we do anything different for your patch, just clarifying that 
this isn't strictly necessary - it is quite possible to modify the algorithm to 
never commit empty proposals.  The problem today is that we:

# "Refresh" a quorum with the MRC if not witnessed by all promisers 
# Filter out empty proposals when deciding if we have an in progress proposal 
({{mostRecentInProgressCommit}} vs {{mostRecentInProgressCommitWithUpdate}})

If instead we did not refresh empty commits, and we did not filter out empty 
proposals when _updating_ {{mostRecentInProgressCommitWithUpdate}} but did not 
_complete_ any empty proposals we found, then everything would be fine.

{{mostRecentInProgressCommitWithUpdate}} confuses matters because it is poorly 
named, and is updated by its naming rather than intent - I think it is _meant_ 
to be {{mostRecentInProgressProposal}} whereas {{mostRecentInProgressCommit}} 
should be e.g. {{mostRecentInProgressPromiseOrProposal}}, and 
{{mostRecentInProgressProposal}} would gain empty proposals as well as 
non-empty ones, and correctly discount the older in progress proposal that was 
invalidated by the newer read that did not witness it.

To be clear, I'm mostly participating in this discussion for my own benefit and 
for the benefit of future work, not trying to solicit changes to your work.


was (Author: benedict):
I'm not proposing we do anything different for your patch, just clarifying that 
this isn't strictly necessary - it is quite possible to modify the algorithm to 
never commit empty proposals.  The problem today is that we:

# "Refresh" a quorum with the MRC if not witnessed by all promisers 
# Filter out empty proposals when deciding if we have an in progress proposal 
({{mostRecentInProgressCommit}} vs {{mostRecentInProgressCommitWithUpdate}})

If instead we did not refresh empty commits, and we did not filter out empty 
proposals when _updating_ {{mostRecentInProgressCommitWithUpdate}} but did not 
_complete_ any empty proposals we found, then everything would be fine.

{{mostRecentInProgressCommitWithUpdate}} confuses matters because it is poorly 
named, and is updated by its naming rather than intent - I think it is _meant_ 
to be {{mostRecentInProgressProposal}} whereas {{mostRecentInProgressCommit}} 
should be e.g. {{mostRecentInProgressPromiseOrProposal}}, and 
{{mostRecentInProgressProposal}} would gain empty proposals as well as 
non-empty ones, and correctly discount the older in progress proposal that was 
invalidated by the newer read that did not witness it.



[jira] [Commented] (CASSANDRA-15752) Range read concurrency factor didn't consider range merger

2020-06-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138501#comment-17138501
 ] 

Andres de la Peña commented on CASSANDRA-15752:
---

+1

> Range read concurrency factor didn't consider range merger
> --
>
> Key: CASSANDRA-15752
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15752
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>
> During a range read, the coordinator computes a concurrency factor, which 
> is the number of vnode ranges to contact in parallel for the next batch.
> But in {{RangeCommandIterator}}, vnode ranges are merged by {{RangeMerger}} 
> if they share enough replicas to satisfy the consistency level. E.g. vnode 
> range [a,b) has replicas n1,n2,n3 and vnode range [b,c) has replicas 
> n2,n3,n4, so they can be merged as range [a,c) with replicas n2,n3 for 
> QUORUM.
> Currently the number of merged ranges is counted towards the concurrency 
> factor, so the coordinator may fetch more ranges than needed.
> 
> Another issue is that when executing a range read on a table with a very 
> small amount of data, the concurrency factor can be bumped to the size of 
> the total vnode ranges, e.g. 10k, depending on the number of vnodes and the 
> cluster size. As a result, the coordinator will send a large number of 
> concurrent range requests, potentially slowing down the cluster. We should 
> cap the max concurrency factor.
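To make the counting issue concrete, here is a toy sketch; the {{merge_ranges}} helper and its inputs are invented stand-ins for {{RangeMerger}}, not Cassandra's actual implementation:

```python
# Toy stand-in for RangeMerger (illustrative only): adjacent vnode ranges
# are merged when they share enough replicas to satisfy the consistency
# level, and each merged range remembers how many vnode ranges it covers.

def merge_ranges(vnode_ranges, required_replicas=2):
    merged = []  # list of (range, shared_replicas, vnode_range_count)
    for rng, replicas in vnode_ranges:
        if merged:
            prev_rng, prev_replicas, count = merged[-1]
            shared = prev_replicas & replicas
            if len(shared) >= required_replicas:
                # Extend the previous merged range instead of starting anew.
                merged[-1] = ((prev_rng[0], rng[1]), shared, count + 1)
                continue
        merged.append((rng, replicas, 1))
    return merged

# The example from the description: [a,b) on n1,n2,n3 and [b,c) on n2,n3,n4
# merge into [a,c) on n2,n3 for QUORUM.
ranges = [(("a", "b"), {"n1", "n2", "n3"}),
          (("b", "c"), {"n2", "n3", "n4"})]
merged = merge_ranges(ranges)
assert len(merged) == 1                 # one merged range [a,c)
assert sum(m[2] for m in merged) == 2   # but covering two vnode ranges

# Charging 1 (merged ranges) instead of 2 (vnode ranges) against the
# concurrency factor makes the coordinator think it can fetch another batch.
```

Counting the vnode ranges covered (2 here) rather than the merged ranges (1) is what keeps the concurrency factor honest.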



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138503#comment-17138503
 ] 

Benedict Elliott Smith commented on CASSANDRA-12126:


I'm not proposing we do anything different for your patch, just clarifying that 
this isn't strictly necessary - it is quite possible to modify the algorithm to 
never commit empty proposals.  The problem today is that we:

# "Refresh" a quorum with the MRC if not witnessed by all promisers 
# Filter out empty proposals when deciding if we have an in progress proposal 
({{mostRecentInProgressCommit}} vs {{mostRecentInProgressCommitWithUpdate}})

If instead we did not refresh empty commits, and we did not filter out empty 
proposals when _updating_ {{mostRecentInProgressCommitWithUpdate}}, but did 
not _complete_ any empty proposals we found, then everything would be fine.

{{mostRecentInProgressCommitWithUpdate}} confuses matters because it is poorly 
named, and is updated according to its name rather than its intent - I think 
it is _meant_ to be {{mostRecentInProgressProposal}}, whereas 
{{mostRecentInProgressCommit}} should be e.g. 
{{mostRecentInProgressPromiseOrProposal}}; {{mostRecentInProgressProposal}} 
would then gain empty proposals as well as non-empty ones, and correctly 
discount the older in-progress proposal that was invalidated by the newer 
read that did not witness it.
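A minimal Python sketch of the merging described above, under the assumption that accepted proposals (empty or not) are merged when deciding what to replay; the {{Proposal}} tuple and {{proposal_to_replay}} helper are hypothetical stand-ins, not Cassandra's types:

```python
# Hypothetical sketch: merge ALL accepted proposals (empty included) so a
# newer empty proposal invalidates an older non-empty one, while never
# actually completing (replaying) an empty proposal.

from collections import namedtuple

Proposal = namedtuple("Proposal", "ballot update")  # update=None => empty

def proposal_to_replay(responses):
    most_recent = None
    for p in responses:  # responses may contain None (nothing accepted)
        if p is not None and (most_recent is None
                              or p.ballot > most_recent.ballot):
            most_recent = p
    # An empty winner means any older in-progress proposal was already
    # invalidated by a newer read: there is nothing to complete.
    if most_recent is not None and most_recent.update is not None:
        return most_recent
    return None

stale_write = Proposal(ballot=1, update="w1")  # old in-progress proposal
newer_empty = Proposal(ballot=2, update=None)  # empty proposal from a read

assert proposal_to_replay([stale_write, newer_empty]) is None  # discounted
assert proposal_to_replay([stale_write, None]) == stale_write  # still replayed
```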


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A 
> replies true to a propose and saves the commit in the accepted field. The 
> other two machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A, and will propose and commit it with 
> the current ballot. Now we can read the value written in step 1 as part of 
> this CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again, and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know if it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors, and more than one acceptor, but not a majority, has something 
> in flight, we have no way of knowing if it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be 
> linearizable with respect to writes and other reads. In this case, we know 
> that a majority of acceptors have no in-flight commit, which means a 
> majority confirms that nothing was accepted by a majority. I think we 
> should run a propose step here with an empty commit, and that will cause 
> the write from step 1 to never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read, or we will never see it, which is what we want. 
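The steps above can be played out with a toy model; the {{Acceptor}} class and {{cas_read}} helper below are invented for illustration and are much simpler than Cassandra's actual Paxos code:

```python
# Toy model of the RF=3 scenario above (illustrative only; not Cassandra's
# actual implementation, which lives in StorageProxy/PaxosState).

class Acceptor:
    def __init__(self):
        self.accepted = None   # (ballot, value) accepted but not committed
        self.committed = None  # (ballot, value) known to be committed

def cas_read(quorum):
    """Serial read via a quorum: replay any in-flight accepted proposal."""
    inflight = [a.accepted for a in quorum if a.accepted is not None]
    if inflight:
        ballot, value = max(inflight)
        for a in quorum:                 # commit the replayed proposal
            a.accepted = None
            a.committed = (ballot, value)
    committed = [a.committed for a in quorum if a.committed is not None]
    return max(committed)[1] if committed else None

A, B, C = Acceptor(), Acceptor(), Acceptor()

# Step 1: a CAS write fails mid-propose; only A accepts the proposal.
A.accepted = (1, "v1")

# Step 2: a read via quorum {B, C} sees nothing in flight.
assert cas_read([B, C]) is None

# Step 3: a read via quorum {A, B} replays and commits the value, so a later
# read returns "v1" even though the earlier read said no value existed.
assert cas_read([A, B]) == "v1"
```

Had a CAS write via {B, C} (step 4) run instead of the final read, it would have committed a different value and "v1" would never have been observed by any quorum.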






[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138473#comment-17138473
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

I'll have to apologize, but I don't understand what you are suggesting.







[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138464#comment-17138464
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 6/17/20, 2:12 PM:
--

The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which (seems to) assume that an empty 
update is for a higher promise (since the meaning is overloaded in the response 
message) rather than an "incomplete" proposal. If the empty proposal were to be 
correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it would 
override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying that I am fairly confident there's no need 
to update the paxos state table with the "committed" status of this empty 
proposal so long as it remains in the table _as an accepted proposal_, and so 
long as this accepted proposal continues to override earlier in progress 
proposals.



was (Author: benedict):
The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which (seems to) assume that an empty 
update is for a higher promise (since the meaning is overloaded in the response 
message) rather than an "incomplete" proposal. If the empty proposal were to be 
correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it would 
override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying that there's no need to update the paxos 
state table with the "committed" status of this empty proposal, so long as it 
remains in the table _as an accepted proposal_, and so long as this accepted 
proposal continues to override earlier in-progress proposals.








[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138464#comment-17138464
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 6/17/20, 2:12 PM:
--

The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which (seems to) assume that an empty 
update is for a higher promise (since the meaning is overloaded in the response 
message) rather than an "incomplete" proposal. If the empty proposal were to be 
correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it would 
override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying that there's no need to update the paxos 
state table with the "committed" status of this empty proposal, so long as it 
remains in the table _as an accepted proposal_, and so long as this accepted 
proposal continues to override earlier in-progress proposals.



was (Author: benedict):
The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which assumes an empty update is for 
a higher promise rather than an "incomplete" proposal. If the empty proposal 
were to be correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it 
would override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying that there's no need to update the paxos 
state table with the "committed" status of this empty proposal, so long as it 
remains in the table _as an accepted proposal_, and so long as this accepted 
proposal continues to override earlier in-progress proposals.







[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138464#comment-17138464
 ] 

Benedict Elliott Smith commented on CASSANDRA-12126:


The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which assumes an empty update is for 
a higher promise rather than an "incomplete" proposal. If the empty proposal 
were to be correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it 
would override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying that there's no need to update the paxos 
state table with the "committed" status of this empty proposal, so long as it 
remains in the table _as an accepted proposal_, and so long as this accepted 
proposal continues to override earlier in-progress proposals.







[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138455#comment-17138455
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq. I'm reasonably sure it cannot be necessary for us to commit an empty 
proposal, because we do not ever need to witness it.

We may have to be precise. We do not need to "apply" an empty commit, since 
it's a no-op, and the patch actually ensures we don't bother. But 
"committing" does something else: it updates the "mrc" value, and _that_ 
needs to be done. Otherwise, if we _accept_ an empty proposal yet do not 
update the "mrc" value, we will not make progress anymore (well, not without 
additional modifications to the algorithm).

But I could be misunderstanding what you are suggesting here. I'll note, 
though, just in case it helps, that the logic I'm calling faulty is not the 
_commit_ of empty updates (though, as said above, I think that is necessary 
for the sake of the mrc value); it's the fact that we don't replay the 
_proposal_ of empty updates. 
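The progress argument can be sketched as follows; the dict-based acceptor state and {{in_progress}} check are invented for illustration, with ballots as plain integers:

```python
# Illustrative only: an accepted proposal newer than the most-recent-commit
# ("mrc") is treated as in-progress work and gets replayed by later rounds.

def in_progress(state):
    accepted, mrc = state["accepted"], state["mrc"]
    return accepted is not None and (mrc is None or accepted > mrc)

state = {"accepted": 2, "mrc": 1}  # ballot 2 accepted, with an empty update

# If we accept the empty proposal but never advance mrc, every subsequent
# prepare round still sees unfinished work and replays it: no progress.
assert in_progress(state)

# "Committing" the empty proposal applies nothing to the data, but advancing
# mrc is exactly what lets later rounds move on.
state["mrc"] = 2
assert not in_progress(state)
```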







[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138439#comment-17138439
 ] 

Benedict Elliott Smith commented on CASSANDRA-12126:


So, I'm reasonably sure it cannot be necessary for us to commit an empty 
proposal, because we do not ever need to witness it. Either the proposal was 
agreed by a quorum (and the proposer can report this), but it has no visible 
effect on future proposals and does not need to be witnessed by anybody else; 
or it was not agreed, and it does not need to be proposed again, committed, 
or witnessed by anybody else.

However, we have to be consistent about it: we either need to _never_ commit 
them, or _always_ commit them.







[jira] [Updated] (CASSANDRA-15459) Short read protection doesn't work on group-by queries

2020-06-17 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15459:
---
Reviewers: Benjamin Lerer

> Short read protection doesn't work on group-by queries
> --
>
> Key: CASSANDRA-15459
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15459
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: ZhaoYang
>Assignee: Andres de la Peña
>Priority: Normal
>  Labels: correctness
> Fix For: 3.11.7, 4.0-beta
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [DTest to 
> reproduce|https://github.com/apache/cassandra-dtest/compare/master...jasonstack:srp_group_by_trunk?expand=1]:
>  it affects all versions.
> {code}
> In a two-node cluster with RF = 2
> Execute only on Node1:
> * Insert pk=1 and ck=1 with timestamp 9
> * Delete pk=0 and ck=0 with timestamp 10
> * Insert pk=2 and ck=2 with timestamp 9
> Execute only on Node2:
> * Delete pk=1 and ck=1 with timestamp 10
> * Insert pk=0 and ck=0 with timestamp 9
> * Delete pk=2 and ck=2 with timestamp 10
> Query: "SELECT pk, c FROM %s GROUP BY pk LIMIT 1"
> * Expect no live data, but got [0, 0]
> {code}
> Note: for group-by queries, SRP should use "group counted" to calculate the 
> limits used for the SRP query, rather than "row counted".






[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138370#comment-17138370
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

I noticed that the previous version of the patches wasn't working in all cases 
due to an existing quirk of the CAS implementation.

Namely, accepted updates that were empty were not replayed by 
{{beginAndRepairPaxos}}, which is a problem for the new empty commits made 
during serial reads and non-applying CAS. I added tests showing that if the 
commit messages for those empty commits were lost or delayed, we could still 
have linearizability violations.

Now, the logic of not replaying empty updates looks wrong to me. There 
shouldn't be anything special about an empty update, and if one is explicitly 
accepted by a quorum of nodes, we shouldn't ignore it; doing so breaks the 
Paxos algorithm (as the tests I added more or less demonstrate).

To be clear, that logic was added *by me* in CASSANDRA-6012, and it was the 
sole purpose of that ticket. Except that I can't make sense of my reasoning 
back then, and since I didn't include a test demonstrating the problem I was 
solving (which was wrong, mea culpa), I have to assume I was just confused 
(maybe I mixed up promised ballots and accepted ones?). Anyway, I think the fix 
here is simply to remove that bad logic, which fixes the issue, and I included 
an additional commit for that.


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1's value will never be seen 
> again, and was never seen before. 
> If you read section 2.3 of Lamport's "Paxos Made Simple" paper, it discusses 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors (but not a majority) have something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority confirms that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will cause the write from step 1 to never be 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 
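The quoted scenario and fix can be sketched with a toy single-decree Paxos model (the {{ToyAcceptor}} class below is hypothetical, illustrative only, and not Cassandra code): one acceptor holds an accepted-but-uncommitted value, and a quorum read that finds nothing in flight proposes an empty commit at a higher ballot, so the orphaned value can never be resurrected.

```java
// Toy single-decree Paxos acceptor (illustrative only, not Cassandra code).
class ToyAcceptor
{
    int promisedBallot = -1;
    int acceptedBallot = -1;
    String acceptedValue = null;

    boolean promise(int ballot)
    {
        if (ballot <= promisedBallot)
            return false;
        promisedBallot = ballot;
        return true;
    }

    boolean accept(int ballot, String value)
    {
        if (ballot < promisedBallot)
            return false;
        promisedBallot = ballot;
        acceptedBallot = ballot;
        acceptedValue = value;
        return true;
    }
}

public class CasReadFix
{
    static String run()
    {
        ToyAcceptor a = new ToyAcceptor(), b = new ToyAcceptor(), c = new ToyAcceptor();

        // Step 1: the CAS write's propose phase reaches only A.
        a.promise(1);
        a.accept(1, "v1");

        // Step 2 (with the fix): a serial read contacts the quorum {B, C};
        // neither has anything in flight, so the reader proposes an empty
        // commit at a higher ballot instead of treating the read as a no-op.
        b.promise(2); c.promise(2);
        b.accept(2, "EMPTY"); c.accept(2, "EMPTY");

        // Step 4: a later CAS write prepares at ballot 3 against {B, C}; the
        // highest-ballot accepted value it learns is the empty commit, so A's
        // orphaned "v1" can never win a future quorum.
        b.promise(3); c.promise(3);
        return b.acceptedBallot >= c.acceptedBallot ? b.acceptedValue : c.acceptedValue;
    }

    public static void main(String[] args)
    {
        System.out.println(run()); // the empty commit wins; step 1's value is gone for good
    }
}
```

Without the empty commit in step 2, the later prepare would learn nothing from {B, C}, yet a read through {A, B} could still surface "v1", which is the non-linearizable behaviour described above.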






[jira] [Updated] (CASSANDRA-15459) Short read protection doesn't work on group-by queries

2020-06-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-15459:
--
Test and Documentation Plan: 
https://github.com/apache/cassandra-dtest/pull/77
 Status: Patch Available  (was: In Progress)







[jira] [Commented] (CASSANDRA-15459) Short read protection doesn't work on group-by queries

2020-06-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138356#comment-17138356
 ] 

Andres de la Peña commented on CASSANDRA-15459:
---

I think that the way "row counted" is used to calculate limits is not the cause 
of the problem here. The comments about this usage in 
[{{ShortReadPartitionsProtection#counted}}|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/service/DataResolver.java#L762-L768]
 and 
[{{ShortReadPartitionsProtection.ShortReadRowsProtection#countedInCurrentPartition}}|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/service/DataResolver.java#L907-L913]
 do not seem to describe what the methods actually do. These comments 
were added during CASSANDRA-10707, replacing the original comments added 
during CASSANDRA-13794 
([here|https://github.com/apache/cassandra/commit/2bae4ca907ac4d2ab53c899e5cf5c9e4de631f52]).
 I'm restoring the original description of those methods.

It seems to me that the cause of the error is 
[here|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/db/filter/DataLimits.java#L966].
 Implementations of {{DataLimit.Counter}} are meant both to count results and 
to limit them, being a {{StoppingTransformation}}. The method 
{{DataLimit.Counter#onlyCount}} disables their result-limiting behaviour, so 
they only count results without transforming them. The counter 
[{{singleResultCounter}}|https://github.com/apache/cassandra/blob/4f50a6712ada5c4298ec860836015ea15049cbda/src/java/org/apache/cassandra/service/DataResolver.java#L630-L631]
 in short read protection uses this count-only behaviour: it counts the 
replica results without truncating them, in case more replica results are 
needed after reconciliation. However, the method 
{{GroupByAwareCounter#applyToRow}} unconditionally returns a {{null}} row when 
the read partition has more rows than specified by the limit, which can 
violate the count-only behaviour. Something similar happens in 
{{GroupByAwareCounter#applyToStatic}}. The proposed PR simply takes 
{{Counter.enforceLimits}} into account to prevent this filtering.
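As a rough sketch of the count-only contract described above (the {{ToyCounter}} class below is hypothetical and does not reflect the real {{DataLimits.Counter}} API), the bug amounts to filtering rows past the limit even when limit enforcement is disabled:

```java
// Toy model of a counter that can either enforce a limit (dropping rows past
// it) or only count (hypothetical class, not Cassandra's DataLimits.Counter).
class ToyCounter
{
    final int limit;
    final boolean enforceLimits;
    int counted = 0;

    ToyCounter(int limit, boolean enforceLimits)
    {
        this.limit = limit;
        this.enforceLimits = enforceLimits;
    }

    // Mirrors the shape of the proposed fix: a row past the limit is dropped
    // only when the counter is in limiting mode; a count-only counter must
    // never filter, it only tallies.
    String applyToRow(String row)
    {
        counted++;
        if (counted > limit && enforceLimits)
            return null; // limiting mode: truncate past the limit
        return row;      // count-only mode: pass everything through
    }
}

public class CountOnlyDemo
{
    static int run()
    {
        // Short-read-protection style counter: count replica results without
        // truncating them, in case more results are needed after reconciliation.
        ToyCounter srp = new ToyCounter(1, false);
        if (srp.applyToRow("row1") == null || srp.applyToRow("row2") == null)
            throw new AssertionError("count-only counter filtered a row");
        return srp.counted;
    }

    public static void main(String[] args)
    {
        System.out.println(run()); // 2: both rows counted, neither dropped
    }
}
```

The reported bug corresponds to dropping the {{enforceLimits}} check in {{applyToRow}}: the second row would be replaced by {{null}} even though the counter was asked only to count.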

The dtest PR just adds the excellent test provided by [~jasonstack] in the 
description, with a minimal change to disable read repair in 3.11 with Byteman, 
because the {{NONE}} read-repair strategy is not available in that version. 
I'm also excluding 3.0 because {{GROUP BY}} was added in 3.10.

CI results:
||branch||ci-cassandra utest||ci-cassandra dtest||CircleCI j8||CircleCI j11||
|[3.11|https://github.com/apache/cassandra/pull/635]|[126|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/126/]|[163|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/163/]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/53/workflows/ce4d2cad-a811-43af-a215-b4ea71260d0e]||
|[trunk|https://github.com/apache/cassandra/pull/636]|[127|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/127/]|[164|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/163/]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/55/workflows/9f73dadc-a963-43b6-8792-ea5e0c0e17c8]|[link|https://app.circleci.com/pipelines/github/adelapena/cassandra/55/workflows/56a025b0-4c18-4eae-9f77-2c261b1d2cc5]|

CC [~blerer]







[jira] [Commented] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-17 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138303#comment-17138303
 ] 

ZhaoYang commented on CASSANDRA-15861:
--

bq. One way could be to mark the sstable compacting while we stream the index 
summary and sstable metadata components

If the sstables are already in a compacting state, does that mean 
entire-sstable-streaming will be blocked until the compaction is finished?

It'd be nice to minimize the lock scope, so the "critical section" only 
includes the "metadata mutation" (rewriting the index summary and stats 
metadata), which should be fast.
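One possible shape for that narrow critical section is sketched below (all names are hypothetical, and this is not Cassandra's actual streaming or compaction code; it illustrates only the locking shape, not a complete fix): the streaming path snapshots the component length under the same short-lived lock the metadata mutation takes, so the manifest is built from a consistent view while the bulk of streaming stays outside the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a narrow lock around stats-metadata mutation (hypothetical API).
public class StatsMutationLock
{
    private final ReentrantLock statsLock = new ReentrantLock();
    private long statsComponentLength = 100; // stand-in for Statistics.db length

    // Compaction path: only the (fast) metadata rewrite holds the lock, so
    // streaming is never blocked behind a long-running compaction.
    void mutateStatsMetadata(long newLength)
    {
        statsLock.lock();
        try
        {
            statsComponentLength = newLength;
        }
        finally
        {
            statsLock.unlock();
        }
    }

    // Streaming path: take a consistent snapshot under the lock, then build
    // the manifest and stream the bytes outside it.
    long snapshotComponentLength()
    {
        statsLock.lock();
        try
        {
            return statsComponentLength;
        }
        finally
        {
            statsLock.unlock();
        }
    }

    static long run()
    {
        StatsMutationLock s = new StatsMutationLock();
        long manifestLength = s.snapshotComponentLength(); // manifest built from snapshot
        s.mutateStatsMetadata(120);                        // concurrent mutation afterwards
        return manifestLength;                             // snapshot is unaffected
    }

    public static void main(String[] args)
    {
        System.out.println(run()); // 100
    }
}
```

Note that a real fix would also need the streamed bytes themselves to come from the snapshotted version (or the mutation to be held off until streaming of those components completes), which is where the trade-off with the broader "mark compacting" approach lies.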

> Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> ---
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, "nodetool repair" is executed on node1 and node2 is killed 
> during the repair. At the end, node3 reports a checksum validation failure on 
> the sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair started on node1, it performs anti-compaction which modifies 
> sstable's repairAt to 0 and pending repair id to session-id.
> 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be 
> transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 
> starts to broadcast repair-failure-message to all participants in 
> {{CoordinatorSession#fail}}
> 4. Node1 receives its own repair-failure-message and fails its local repair 
> sessions at {{LocalSessions#failSession}} which triggers async background 
> compaction.
> 5. Node1's background compaction will mutate sstable's repairAt to 0 and 
> pending repair id to null via  
> {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more 
> in-progress repair.
> 6. Node1 actually sends the sstable to node3 where the sstable's STATS 
> component size is different from the original size recorded in the manifest.
> 7. At the end, node3 reports checksum validation failure when it tries to 
> mutate sstable level and "isTransient" attribute in 
> 

[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138278#comment-17138278
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

I have made a first pass of review and offered a few remarks above.

But I think this ticket is hung up on us deciding whether removing the KEYS 
2ndary index code is ok or not. And this yields, to me, the question of what 
the upgrade path to 4.0 is for users that still have KEYS indexes (which, 
reminder, could only be created with Thrift, but could be _used_ with CQL and 
thus may still be around).

Because, while I haven't tested this myself, I suspect we have a hole here.

Namely, KEYS indexes were compact tables, and 4.0 does not *start* if there 
are still compact tables. And while for user tables, users are asked to use 
{{DROP COMPACT STORAGE}} before upgrading, this cannot be done on KEYS indexes 
(there is just no syntax to do it), so unless there is code I'm not aware of 
(and please, someone correct me if I'm wrong), I don't think users can upgrade 
to 4.0 at all if they still have KEYS indexes. They'd have to drop those 
indexes first.

So if I'm right here, this technically means removing the KEYS index code in 
4.0 is fine, since you cannot upgrade in the first place if you have a KEYS 
index. But the more important question for 4.0, imo, is what the upgrade path 
is for users who have a KEYS index in 3.X.

Currently (without code changes), the only available option I can think of is 
that before upgrading to 4.0, users would have to 1) drop their KEYS index and 
then 2) re-create a "normal" (non-KEYS) equivalent index.

Are we comfortable with that being the upgrade path for KEYS indexes?

Personally, I'm not sure I am, because this is not a seamless upgrade: between 
1) and 2) above, there is a window where there is no accessible index, so if 
the user application relies on it, it means a period of downtime for the 
application to perform the upgrade. However, if we want a more seamless 
upgrade, we need to figure something out, and that probably involves 
non-trivial amounts of code and testing. And, playing devil's advocate, KEYS 
indexes being so old, maybe nobody who plans to upgrade to 4.0 has them 
anymore, and maybe it's not worth bothering?

So I could use others' opinions here.

Tl;dr, this ticket raises the point that "Oops, I'm not sure we have thought 
through the question of upgrading to 4.0 for KEYS indexes". And tbc, it's not 
directly related to this ticket, only indirectly, but it is still something we 
need to figure out. And I'd say, before 4.0-alpha. But I'm happy to create a 
separate ticket specific to that question if that helps.

> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.






[jira] [Commented] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-17 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138270#comment-17138270
 ] 

Marcus Eriksson commented on CASSANDRA-15861:
-

One way could be to mark the sstable compacting while we stream the index 
summary and sstable metadata components.

> Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> ---
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, "nodetool repair" is executed on node1 and node2 is killed 
> during the repair. At the end, node3 reports a checksum validation failure on 
> the sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair started on node1, it performs anti-compaction which modifies 
> sstable's repairAt to 0 and pending repair id to session-id.
> 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be 
> transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 
> starts to broadcast repair-failure-message to all participants in 
> {{CoordinatorSession#fail}}
> 4. Node1 receives its own repair-failure-message and fails its local repair 
> sessions at {{LocalSessions#failSession}} which triggers async background 
> compaction.
> 5. Node1's background compaction will mutate sstable's repairAt to 0 and 
> pending repair id to null via  
> {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more 
> in-progress repair.
> 6. Node1 actually sends the sstable to node3 where the sstable's STATS 
> component size is different from the original size recorded in the manifest.
> 7. At the end, node3 reports checksum validation failure when it tries to 
> mutate sstable level and "isTransient" attribute in 
> {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> This isn't a problem in legacy streaming as STATS file length didn't matter.
> Ideally it will be great to make sstable STATS metadata immutable, just like 
> other sstable components, so we don't have to worry this special case.
> I can 

[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-17 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138135#comment-17138135
 ] 

ZhaoYang edited comment on CASSANDRA-15861 at 6/17/20, 8:57 AM:


[~dcapwell] you are right. {{IndexSummary}} can definitely cause trouble for 
entire-sstable-streaming. Then the only option we have is to apply the first 
approach to {{IndexSummary}}, because we can't give {{IndexSummary}} a 
fixed-length encoding.

Or we can consider a lock approach.


was (Author: jasonstack):
[~dcapwell] you are right. {{IndexSummary}} can definitely cause trouble for 
entire-sstable-streaming.. Then the only option we have is to apply first 
approach to {{IndexSummary}} because we can't make {{IndexSummary}} 
fixed-length encoding..


[jira] [Commented] (CASSANDRA-15782) Compression test failure

2020-06-17 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138225#comment-17138225
 ] 

Berenguer Blasi commented on CASSANDRA-15782:
-

Hey [~maedhroz] ;-)

[~jolynch] seems to be able to fix these at the drop of a hat :-), so I'll 
defer to him for the time being. Other than that, we'll have to bisect, etc.

> Compression test failure
> 
>
> Key: CASSANDRA-15782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15782
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Joey Lynch
>Priority: Normal
> Fix For: 4.0, 4.0-alpha5
>
>
> On CASSANDRA-15560 compression test failed. This was bisected to 
> [9c1bbf3|https://github.com/apache/cassandra/commit/9c1bbf3ac913f9bdf7a0e0922106804af42d2c1e]
>  from CASSANDRA-15379.
> Full details here
> CC/ [~jolynch] in case he can spot it quick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15863) Bootstrap resume and TestReplaceAddress fixes

2020-06-17 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138215#comment-17138215
 ] 

Berenguer Blasi commented on CASSANDRA-15863:
-

Whaaat? Impossible :) I had to compile all the branches locally many times to 
run the dtests. I just ran {{git clean -xfd}} and it compiles perfectly for 
me. Also, e.g., the 4.0 PR has the [right 
boolean|https://github.com/apache/cassandra/pull/622/files#diff-b76a607445d53f18a98c9df14323c7ddR1620]
 if I am following you correctly.

Might this be a GH bug? The boolean is present in the 
[commit|https://github.com/apache/cassandra/pull/622/commits/2d451c68e280ea8ba2e5580ce780d1079e19dfd0#diff-b76a607445d53f18a98c9df14323c7ddR1620]
 but not in that commit's 
[patch|https://github.com/apache/cassandra/commit/2d451c68e280ea8ba2e5580ce780d1079e19dfd0.patch]!
 :mindblown: and scary!

> Bootstrap resume and TestReplaceAddress fixes
> -
>
> Key: CASSANDRA-15863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15863
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission, Test/dtest
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been 
> [broken|https://ci-cassandra.apache.org/job/Cassandra-trunk/159/testReport/dtest-large.replace_address_test/TestReplaceAddress/test_restart_failed_replace/history/]
>  for ages



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15861) Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-06-17 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138135#comment-17138135
 ] 

ZhaoYang commented on CASSANDRA-15861:
--

[~dcapwell] you are right. {{IndexSummary}} can definitely cause trouble for 
entire-sstable-streaming. Then the only option we have is to apply the first 
approach to {{IndexSummary}}, because we can't make {{IndexSummary}} use a 
fixed-length encoding.
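To illustrate why fixed-length encoding matters here, a toy sketch in plain Java (these helpers are illustrative only, not Cassandra's actual serializers): a fixed-length field can be mutated in place without moving the file length, while a variable-length (varint-style) field shrinks or grows with its value, shifting every byte after it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class EncodingSizeSketch {
    // Size in bytes of a 7-bits-per-byte unsigned varint encoding of v.
    static int vintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // Size in bytes of a fixed-length long: always 8, whatever the value.
    static int fixedSize(long v) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            new DataOutputStream(out).writeLong(v);
            return out.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        long sessionId = 0x1122334455667788L; // hypothetical pending-repair id bits
        long cleared = 0L;                    // value after the repair is failed

        // Fixed-length: mutating the value keeps the on-disk size stable.
        System.out.println("fixed:  " + fixedSize(sessionId) + " -> " + fixedSize(cleared));
        // Variable-length: clearing the value shrinks the encoding, which
        // would change the component length under a concurrent transfer.
        System.out.println("varint: " + vintSize(sessionId) + " -> " + vintSize(cleared));
    }
}
```

This is why a component whose fields are all fixed-length can be safely mutated while a manifest-based transfer is in flight, and one with variable-length parts cannot.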

> Mutating sstable STATS metadata may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> ---
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, "nodetool repair" is executed on node1 and node2 is 
> killed during repair. At the end, node3 reports a checksum validation 
> failure on the sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which sets the 
> sstable's repairedAt to 0 and its pending repair id to the session id.
> 2. Node1 then creates a {{ComponentManifest}} containing the file lengths 
> to be transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 
> starts to broadcast a repair-failure message to all participants in 
> {{CoordinatorSession#fail}}.
> 4. Node1 receives its own repair-failure message and fails its local repair 
> sessions in {{LocalSessions#failSession}}, which triggers async background 
> compaction.
> 5. Node1's background compaction mutates the sstable's repairedAt to 0 and 
> its pending repair id to null via 
> {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no longer an 
> in-progress repair.
> 6. Node1 actually sends the sstable to node3, where the sstable's STATS 
> component size is different from the original size recorded in the manifest.
> 7. At the end, node3 reports a checksum validation failure when it tries to 
> mutate the sstable level and "isTransient" attribute in 
> {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> This isn't a problem in legacy streaming, as the STATS file length didn't 
> matter. Ideally it would be great to make sstable
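The race in the quoted steps can be sketched with a toy model (the class, field names, and string contents below are illustrative, not the actual Cassandra types): the manifest snapshots a component length, a later metadata mutation rewrites the component with a different serialized length, and the receiver's validation sees the mismatch.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ManifestRaceSketch {
    // Byte length of a component's (toy) serialized contents.
    static int length(String contents) {
        return contents.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Step 2: snapshot the component lengths for the transfer.
        String stats = "repairedAt=0 pendingRepair=<session-id>";
        Map<String, Integer> manifest = new HashMap<>();
        manifest.put("Statistics.db", length(stats));

        // Steps 4-5: the repair fails and background compaction mutates the
        // metadata in place, changing the serialized length.
        stats = "repairedAt=0 pendingRepair=null";

        // Steps 6-7: the receiver expects manifest.get(...) bytes, but the
        // stream now carries a different number, so validation fails.
        boolean matches = manifest.get("Statistics.db") == length(stats);
        System.out.println("component length matches manifest: " + matches); // false
    }
}
```

The mismatch surfaces as a checksum/length validation failure on the receiving node, matching the stack trace quoted above.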