[jira] [Commented] (CASSANDRA-15833) Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657

2020-08-11 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175955#comment-17175955
 ] 

Jordan West commented on CASSANDRA-15833:
-

Thanks [~jlewandowski]. I've taken bits of both your patches and mine and pushed 
them [here for 3.11|https://github.com/jrwest/cassandra/tree/jwest/15833-3.11] 
and [here for 
trunk|https://github.com/jrwest/cassandra/tree/jwest/15833-trunk]. I've 
included your test fixes from CASSANDRA-15946 since it's not yet merged. The 
differences between this patch and the original patch are:
* The patch included here addresses the case where the 3.0 node is the 
coordinator, which is why there is an additional change in 
{{ColumnFilter.Serializer#deserialize}}.
* No change to {{ColumnFilter#selection(CFMetadata, PartitionColumns)}} in 
3.11. As far as I could tell, this method was only used in testing, and that 
testing broke when fixing deserialization from 3.0 nodes. 
* It does not include the {{ColumnFilter#selection(TableMetadata, 
RegularAndStaticColumns)}} change either. This method does seem to be used in 
CAS, but that doesn't appear to be related to the failure here -- there might 
be a separate issue with CAS, however. I was curious whether you hit this 
specifically, or what otherwise motivated that change in your original patch? 
* To fix {{Gossiper#haveAnyMajorVersion3Nodes}}, I modified the check to 
abort if it detects the race condition with the updated gossip state. This 
fixes the issue where the method returns true when there are older nodes in the 
cluster. I did not modify the 3.11 version, {{Gossiper#isAnyNode30}}, because 
the window where it is wrong is very small and shouldn't be material in 
practice (testing shows that it settles before the node takes traffic).

Test runs are here: 
[3.11|https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F15833-3.11]
[trunk|https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F15833-trunk]



> Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657
> 
>
> Key: CASSANDRA-15833
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15833
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: CASSANDRA-15833-3.11.patch, CASSANDRA-15833-4.0.patch
>
>
> CASSANDRA-10657 introduced changes in how the ColumnFilter is interpreted. 
> This results in a digest mismatch when querying an incomplete set of columns 
> from a table, at a consistency level that requires reaching instances running 
> a pre-CASSANDRA-10657 version from nodes that include CASSANDRA-10657 (it was 
> introduced in Cassandra 3.4). 
> The fix is to bring back the previous behaviour until no instances run a 
> pre-CASSANDRA-10657 version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16045) dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl fails

2020-08-11 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175894#comment-17175894
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16045:
-

dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl also started 
failing, apparently for similar reasons; it should be checked too.

If it turns out to be unrelated, please open a separate ticket.

> dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl  fails
> --
>
> Key: CASSANDRA-16045
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16045
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Ekaterina Dimitrova
>Priority: Normal
>
> dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl fails on 
> trunk:
>  
> https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/10/label=cassandra,split=25/testReport/junit/dtest.cql_tracing_test/TestCqlTracing/test_tracing_default_impl/






[jira] [Commented] (CASSANDRA-15990) Running CQL command with non-ASCII values raises UnicodeDecodeError

2020-08-11 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175895#comment-17175895
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15990:
-

Thanks [~dcapwell].

I just went through the CI results.

In summary:
 * trunk:  
 ** dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl fails - 
unrelated; ticket created - CASSANDRA-16045
 ** dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl - I would 
check this with the previous one as they seem related to me; added a comment to 
CASSANDRA-16045
 ** dtest.cql_tracing_test.TestCqlTracing.test_tracing_simple - the report 
actually says this is fixed
 * 3.11 - 
dtest.materialized_views_test.TestMaterializedViews.test_view_metadata_cleanup 
- not related
 * 3.0 - no failures
 * 2.2 - there are cqlsh tests failing; the reports say it is a regression, but 
according to the latest Jenkins reports this is not the case. The tests fail 
consistently because of a config issue (_Invalid yaml. Please remove properties 
[enable_scripted_user_defined_functions] from your cassandra.yaml_) - 
CASSANDRA-15985

> Running CQL command with non-ASCII values raises UnicodeDecodeError
> ---
>
> Key: CASSANDRA-15990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15990
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: Joseph Chu
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> There are INSERT statements containing non-ASCII values that ran fine in 
> Cassandra 3.11 but now raise a UnicodeDecodeError when I try executing them 
> in 4.0-alpha4 and 4.0-beta1. 
> Example input and output:
> {code:java}
> echo $LANG
> en_US.UTF-8
> $ cqlsh --debug
> Using CQL driver: <module 'cassandra' from '/usr/share/cassandra/bin/../lib/cassandra-driver-internal-only-3.23.0.post0-1a184b99.zip/cassandra-driver-3.23.0.post0-1a184b99/cassandra/__init__.py'>
> Using connect timeout: 5 seconds
> Using 'utf-8' encoding
> Using ssl: False
> Connected to Cassandra Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 4.0-beta1 | CQL spec 3.4.5 | Native protocol v4]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE killr_video WITH replication = {'class': 
> 'NetworkTopologyStrategy', 'DC-Houston': 1};
> cqlsh> USE killr_video;
> cqlsh:killr_video> CREATE TABLE movies_by_genre ( genre TEXT, title TEXT, 
> year INT, duration INT, avg_rating FLOAT, country TEXT, PRIMARY KEY ((genre), 
> title, year));
> cqlsh:killr_video> INSERT INTO movies_by_genre (genre, title, year, duration, 
> avg_rating, country)
>  ... VALUES ('Action', 'The Extraordinary Adventures of Adèle Blanc-Sec', 
> 2010, 107, 6.30, 'France');
> Traceback (most recent call last):
>  File "/usr/share/cassandra/bin/cqlsh.py", line 937, in onecmd
>  self.handle_statement(st, statementtext)
>  File "/usr/share/cassandra/bin/cqlsh.py", line 962, in handle_statement
>  readline.add_history(new_hist)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 
> 134: ordinal not in range(128){code}






[jira] [Updated] (CASSANDRA-16045) dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl fails

2020-08-11 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16045:

 Bug Category: Parent values: Degradation(12984)
   Complexity: Normal
  Component/s: Test/dtest
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)







[jira] [Created] (CASSANDRA-16045) dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl fails

2020-08-11 Thread Ekaterina Dimitrova (Jira)
Ekaterina Dimitrova created CASSANDRA-16045:
---

 Summary: 
dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl  fails
 Key: CASSANDRA-16045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16045
 Project: Cassandra
  Issue Type: Bug
Reporter: Ekaterina Dimitrova


dtest.cql_tracing_test.TestCqlTracing.test_tracing_default_impl fails on trunk:

 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/10/label=cassandra,split=25/testReport/junit/dtest.cql_tracing_test/TestCqlTracing/test_tracing_default_impl/






[jira] [Comment Edited] (CASSANDRA-15993) Fix flaky python dtest test_view_metadata_cleanup - materialized_views_test.TestMaterializedViews

2020-08-11 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175890#comment-17175890
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-15993 at 8/11/20, 11:33 PM:


Fails also in 3.11 - 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/11/label=cassandra,split=16/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/test_view_metadata_cleanup/

 


was (Author: e.dimitrova):
Fails also in [3.11 | 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/11/label=cassandra,split=16/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/test_view_metadata_cleanup/]

 

> Fix flaky python dtest test_view_metadata_cleanup - 
> materialized_views_test.TestMaterializedViews
> -
>
> Key: CASSANDRA-15993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> E   cassandra.OperationTimedOut: errors={'127.0.0.2': 'Client request 
> timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.2
> cassandra/cluster.py:4026: OperationTimedOut
> {code}






[jira] [Commented] (CASSANDRA-15993) Fix flaky python dtest test_view_metadata_cleanup - materialized_views_test.TestMaterializedViews

2020-08-11 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175890#comment-17175890
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15993:
-

Fails also in [3.11 | 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/11/label=cassandra,split=16/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/test_view_metadata_cleanup/]

 







[jira] [Comment Edited] (CASSANDRA-15976) Incorrect parsing of the timestamp with less than 3 digits in the milliseconds

2020-08-11 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175849#comment-17175849
 ] 

Adam Holmberg edited comment on CASSANDRA-15976 at 8/11/20, 9:32 PM:
-

[potential 
solution|https://github.com/apache/cassandra/compare/trunk...aholmberg:CASSANDRA-15976]

I first expanded unit tests to characterize existing parsing, and also to 
actually validate parsed values.
The patch introduces DateTimeFormatter for parsing internally. I verified with 
some basic local microbenchmarks that this parsing is also much faster – up to 
10x depending on the inputs.


was (Author: aholmber):
[potential solution
|https://github.com/apache/cassandra/compare/trunk...aholmberg:CASSANDRA-15976]
 I first expanded unit tests to characterize existing parsing, and also to 
actually validate parsed values.
 The patch introduces DateTimeFormatter for parsing internally. I verified with 
some basic local microbenchmarks that this parsing is also much faster – up to 
10x depending on the inputs.
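For illustration, a {{java.time.DateTimeFormatterBuilder}} configured with a variable-width fraction handles 1-, 2-, and 3-digit fractional seconds uniformly, which is the property the DateTimeFormatter approach relies on. This sketch is not the actual patch, just a minimal standalone demonstration:

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class FracParse {
    // Accepts 0..9 fractional-second digits after an optional decimal point,
    // unlike a fixed "SSS" SimpleDateFormat pattern.
    static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd'T'HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .appendPattern("X")
            .toFormatter();

    static Instant parse(String s) {
        return OffsetDateTime.parse(s, FMT).toInstant();
    }

    public static void main(String[] args) {
        // ".2" is two tenths of a second, not 2 ms:
        System.out.println(parse("2020-07-24T10:00:01.2Z"));  // 2020-07-24T10:00:01.200Z
        System.out.println(parse("2020-07-24T10:00:01.12Z")); // 2020-07-24T10:00:01.120Z
    }
}
```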

> Incorrect parsing of the timestamp with less than 3 digits in the milliseconds
> --
>
> Key: CASSANDRA-15976
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15976
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: Alex Ott
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta2
>
>
> (tested on 4.0-beta1, but should be in all versions)
> Right now, Cassandra incorrectly handles timestamps with fewer than 3 digits 
> in the milliseconds part. Timestamps that are valid from the Java point of 
> view (see the Scala output below) are either rejected (if only 1 digit is 
> given) or incorrectly parsed when 2 digits are specified:
> {noformat}
> cqlsh> create table test.tm (id int primary key, tm timestamp);
> cqlsh> insert into test.tm(id, tm) values (2, '2020-07-24T10:00:01.2Z');
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Unable 
> to coerce '2020-07-24T10:00:01.2Z' to a formatted date (long)"
> cqlsh> insert into test.tm(id, tm) values (1, '2020-07-24T10:00:01.12Z');
> cqlsh> select * from test.tm;
>  id | tm
> +-
>   1 | 2020-07-24 10:00:01.012000+
> (1 rows)
> {noformat}
> Checking with Instant:
> {noformat}
> scala> java.time.Instant.parse("2020-07-24T10:00:01.12Z")
> res0: java.time.Instant = 2020-07-24T10:00:01.120Z
> scala> java.time.Instant.parse("2020-07-24T10:00:01.2Z")
> res1: java.time.Instant = 2020-07-24T10:00:01.200Z
> {noformat}
> Imho it should be fixed (Cc: [~aholmber])






[jira] [Comment Edited] (CASSANDRA-15976) Incorrect parsing of the timestamp with less than 3 digits in the milliseconds

2020-08-11 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175849#comment-17175849
 ] 

Adam Holmberg edited comment on CASSANDRA-15976 at 8/11/20, 9:31 PM:
-

[potential solution
|https://github.com/apache/cassandra/compare/trunk...aholmberg:CASSANDRA-15976]
 I first expanded unit tests to characterize existing parsing, and also to 
actually validate parsed values.
 The patch introduces DateTimeFormatter for parsing internally. I verified with 
some basic local microbenchmarks that this parsing is also much faster – up to 
10x depending on the inputs.


was (Author: aholmber):
[potential 
solution|https://github.com/apache/cassandra/compare/trunk...aholmberg:CASSANDRA-15976]

[link title|http://example.com]
 I first expanded unit tests to characterize existing parsing, and also to 
actually validate parsed values.
 The patch introduces DateTimeFormatter for parsing internally. I verified with 
some basic local microbenchmarks that this parsing is also much faster – up to 
10x depending on the inputs.







[jira] [Comment Edited] (CASSANDRA-15976) Incorrect parsing of the timestamp with less than 3 digits in the milliseconds

2020-08-11 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175849#comment-17175849
 ] 

Adam Holmberg edited comment on CASSANDRA-15976 at 8/11/20, 9:30 PM:
-

[potential 
solution|https://github.com/apache/cassandra/compare/trunk...aholmberg:CASSANDRA-15976]

[link title|http://example.com]
 I first expanded unit tests to characterize existing parsing, and also to 
actually validate parsed values.
 The patch introduces DateTimeFormatter for parsing internally. I verified with 
some basic local microbenchmarks that this parsing is also much faster – up to 
10x depending on the inputs.


was (Author: aholmber):
[potential 
solution|[https://github.com/apache/cassandra/compare/trunk...aholmberg:CASSANDRA-15976]]
I first expanded unit tests to characterize existing parsing, and also to 
actually validate parsed values.
The patch introduces DateTimeFormatter for parsing internally. I verified with 
some basic local microbenchmarks that this parsing is also much faster – up to 
10x depending on the inputs.







[jira] [Updated] (CASSANDRA-15976) Incorrect parsing of the timestamp with less than 3 digits in the milliseconds

2020-08-11 Thread Adam Holmberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holmberg updated CASSANDRA-15976:
--
Test and Documentation Plan: 
Unit tests were expanded and made more rigorous.

 

Doc: may need to visit CQL timestamp discussion and see if there is any mention 
of this limitation.
 Status: Patch Available  (was: In Progress)

[potential 
solution|https://github.com/apache/cassandra/compare/trunk...aholmberg:CASSANDRA-15976]
I first expanded unit tests to characterize existing parsing, and also to 
actually validate parsed values.
The patch introduces DateTimeFormatter for parsing internally. I verified with 
some basic local microbenchmarks that this parsing is also much faster – up to 
10x depending on the inputs.







[jira] [Commented] (CASSANDRA-15976) Incorrect parsing of the timestamp with less than 3 digits in the milliseconds

2020-08-11 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175843#comment-17175843
 ] 

Adam Holmberg commented on CASSANDRA-15976:
---

The bug seems to be with DateUtils.parseStrictly 
[here|https://github.com/apache/cassandra/blob/54ebb19720225e176bc93e6dbc9e8943fa5e3bfc/src/java/org/apache/cassandra/serializers/TimestampSerializer.java#L159].

There is no format pattern for single-digit precision, which is why parsing 
fails. Confusingly, there is also no pattern for two digits, yet parsing 
succeeds with the [three-digit 
pattern|https://github.com/apache/cassandra/blob/54ebb19720225e176bc93e6dbc9e8943fa5e3bfc/src/java/org/apache/cassandra/serializers/TimestampSerializer.java#L79].

Even with one- and two-digit patterns added, the parsing "succeeds" but comes 
back with an incorrect value, as shown in the description.
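The incorrect value matches {{SimpleDateFormat}}'s numeric field handling: a fixed "SSS" pattern reads the two available digits as a literal millisecond count, so ".12" becomes 12 ms instead of 120 ms. A minimal standalone sketch of this behavior (not Cassandra code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class LenientMillis {
    // Parse with a fixed three-digit millisecond pattern and return the
    // millisecond component of the resulting epoch time.
    static long millisPart(String s) throws ParseException {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.parse(s).getTime() % 1000;
    }

    public static void main(String[] args) throws ParseException {
        // "SSS" treats ".12" as 12 milliseconds, not 120 -- matching the
        // 10:00:01.012000 result in the bug report.
        System.out.println(millisPart("2020-07-24T10:00:01.12Z"));  // 12
        System.out.println(millisPart("2020-07-24T10:00:01.120Z")); // 120
    }
}
```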







[jira] [Updated] (CASSANDRA-15976) Incorrect parsing of the timestamp with less than 3 digits in the milliseconds

2020-08-11 Thread Adam Holmberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holmberg updated CASSANDRA-15976:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: API / 
Semantic Implementation(12988)
   Complexity: Normal
  Component/s: CQL/Interpreter
Discovered By: User Report
Fix Version/s: 4.0-beta2
 Severity: Normal
   Status: Open  (was: Triage Needed)







[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-08-11 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15861:

Status: Review In Progress  (was: Changes Suggested)

> Mutating sstable component may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> --
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, "nodetool repair" is executed on node1 and node2 is killed 
> during repair. At the end, node3 reports a checksum validation failure on the 
> sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which sets the 
> sstable's repairedAt to 0 and its pending repair id to the session id.
> 2. Node1 then creates a {{ComponentManifest}} containing the file lengths to be 
> transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 
> starts broadcasting a repair-failure message to all participants in 
> {{CoordinatorSession#fail}}
> 4. Node1 receives its own repair-failure message and fails its local repair 
> sessions in {{LocalSessions#failSession}}, which triggers async background 
> compaction.
> 5. Node1's background compaction mutates the sstable's repairedAt to 0 and its 
> pending repair id to null via 
> {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more 
> in-progress repair.
> 6. Node1 actually sends the sstable to node3 where the sstable's STATS 
> component size is different from the original size recorded in the manifest.
> 7. At the end, node3 reports checksum validation failure when it tries to 
> mutate sstable level and "isTransient" attribute in 
> {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> Currently, entire-sstable-streaming requires sstable components to be 
> immutable, because the {{ComponentManifest}} with component sizes is sent 
> before the actual files. This isn't a problem in legacy streaming, as the 
> STATS file length didn't matter there.
>  
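To make the failure mode concrete, here is a minimal, self-contained sketch (illustrative Python, not Cassandra code; {{build_manifest}} and {{validate}} are hypothetical stand-ins for {{ComponentManifest}} creation and the receiver-side checksum validation) showing how a component mutated after the manifest is built fails validation on the receiver:

```python
import hashlib

def build_manifest(components):
    # Record each component's length (and a digest) before streaming,
    # as the ComponentManifest conceptually does with file sizes.
    return {name: (len(data), hashlib.md5(data).hexdigest())
            for name, data in components.items()}

def validate(manifest, components):
    # Receiver-side check: the bytes read must match what the manifest promised.
    for name, (length, digest) in manifest.items():
        data = components[name]
        if len(data) != length or hashlib.md5(data).hexdigest() != digest:
            raise IOError("Checksums do not match for " + name)

components = {"Statistics.db": b"repairedAt=0;pendingRepair=session-1"}
manifest = build_manifest(components)

# Concurrent metadata mutation between manifest creation and transfer
# (step 5 above): the pending repair id is cleared, changing the file bytes.
components["Statistics.db"] = b"repairedAt=0;pendingRepair=null"

try:
    validate(manifest, components)
except IOError as e:
    print(e)  # the receiver reports a checksum validation failure
```

The race window is exactly the gap between {{build_manifest}} and {{validate}}: any in-place mutation of a component in that window invalidates the manifest.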
> Ideally it will be great to make sstable STATS metadata immutable, just like 
> other

[jira] [Commented] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default

2020-08-11 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175742#comment-17175742
 ] 

David Capwell commented on CASSANDRA-16036:
---

In summary, what I see is:

1) if the cluster has no writes and only does reads, there is a small advantage 
2) if the cluster is doing compactions, there is no advantage

> Add flag to disable chunk cache and disable by default
> --
>
> Key: CASSANDRA-16036
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16036
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Histogram-11.png, Histogram-15.png, latency_selects.png
>
>
> Chunk cache is enabled by default and doesn’t have a flag to disable without 
> impacting networking.  In performance testing 4.0 against 3.0 I found that 
> reads were slower in 4.0 and after profiling found that the ChunkCache was 
> partially to blame; after disabling the chunk cache, read performance had 
> improved.
> {code}
> 40_w_cc-selects.hdr
> #[Mean= 11.50063, StdDeviation   = 13.44014]
> #[Max =482.41254, Total count=   316477]
> #[Buckets =   25, SubBuckets =   262144]
> 40_wo_cc-selects.hdr
> #[Mean=  9.82115, StdDeviation   = 10.14270]
> #[Max =522.36493, Total count=   317444]
> #[Buckets =   25, SubBuckets =   262144]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default

2020-08-11 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175741#comment-17175741
 ] 

David Capwell commented on CASSANDRA-16036:
---

Added a read-only workload; {{selects}} is with chunk cache, {{selects-2}} is 
without chunk cache. What I see is that up to the 99th percentile they are 
equal; past that, with chunk cache has an advantage, and at the 99.99th 
percentile without chunk cache has the advantage.

{code}
selects.hdr
#[Mean= 49.88099, StdDeviation   = 37.74398]
#[Max =   1826.77504, Total count=  2880010]
#[Buckets =   25, SubBuckets =   262144]
selects-2.hdr
#[Mean= 51.66994, StdDeviation   = 44.94572]
#[Max =   1810.85798, Total count=  2880036]
#[Buckets =   25, SubBuckets =   262144]
{code}

The means differ by about 1.8 ms.
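For reference, the gap between the two means can be computed directly from the HdrHistogram summaries above:

```python
# Mean latencies reported by the histogram summaries above (milliseconds).
with_chunk_cache = 49.88099     # selects.hdr
without_chunk_cache = 51.66994  # selects-2.hdr

diff = without_chunk_cache - with_chunk_cache
print(round(diff, 2))  # → 1.79
```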

> Add flag to disable chunk cache and disable by default
> --
>
> Key: CASSANDRA-16036
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16036
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Histogram-11.png, Histogram-15.png, latency_selects.png
>
>
> Chunk cache is enabled by default and doesn’t have a flag to disable without 
> impacting networking.  In performance testing 4.0 against 3.0 I found that 
> reads were slower in 4.0 and after profiling found that the ChunkCache was 
> partially to blame; after disabling the chunk cache, read performance had 
> improved.
> {code}
> 40_w_cc-selects.hdr
> #[Mean= 11.50063, StdDeviation   = 13.44014]
> #[Max =482.41254, Total count=   316477]
> #[Buckets =   25, SubBuckets =   262144]
> 40_wo_cc-selects.hdr
> #[Mean=  9.82115, StdDeviation   = 10.14270]
> #[Max =522.36493, Total count=   317444]
> #[Buckets =   25, SubBuckets =   262144]
> {code}






[jira] [Updated] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default

2020-08-11 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16036:
--
Attachment: Histogram-15.png

> Add flag to disable chunk cache and disable by default
> --
>
> Key: CASSANDRA-16036
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16036
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Histogram-11.png, Histogram-15.png, latency_selects.png
>
>
> Chunk cache is enabled by default and doesn’t have a flag to disable without 
> impacting networking.  In performance testing 4.0 against 3.0 I found that 
> reads were slower in 4.0 and after profiling found that the ChunkCache was 
> partially to blame; after disabling the chunk cache, read performance had 
> improved.
> {code}
> 40_w_cc-selects.hdr
> #[Mean= 11.50063, StdDeviation   = 13.44014]
> #[Max =482.41254, Total count=   316477]
> #[Buckets =   25, SubBuckets =   262144]
> 40_wo_cc-selects.hdr
> #[Mean=  9.82115, StdDeviation   = 10.14270]
> #[Max =522.36493, Total count=   317444]
> #[Buckets =   25, SubBuckets =   262144]
> {code}






[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-11 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175656#comment-17175656
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

bq. I'm sorry, if my objections sound harsh. But my point is that it's better 
to fix the whole heap-pressure-nightmare with (de)serialization 
(reads/writes/re-serializations/compaction/etc) in the next major release.

It’s fine, we should be talking about these things. I disagree with the idea 
that we should delay improving compaction allocations because we could 
implement a better solution at some point in the future. It's a textbook 
example of “letting perfect be the enemy of good enough”. The C* project has a 
problem with favoring rewrites over incremental improvements. Compaction 
heap pressure is a real operational problem that causes a lot of headaches, and 
there is real value in mitigating it as part of 4.0, even if it can be further 
improved in the future. 

There's certainly risk here, but I think it's being a bit overstated. The 
changes here are wide, but not particularly deep. The most complex parts are 
probably the collection serializers and other places where we're now having to 
do offset bookkeeping. These should be carefully reviewed, but they're hardly 
unverifiable.

bq. Unrelatedly,  [Blake 
Eggleston](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bdeggleston)
 , is there a reason you didn't go the whole hog and just get rid of the 
ByteBuffer versions of everything?

1) That would have been a much larger change, and I wanted to limit scope. 
IIRC, replacing partition keys would have been a lot of work for a 
comparatively small gc win.
2) ByteBuffers are still useful in some places, specifically where we're using 
allocators.
3) I've been working on a flyweight reader in my free time that reduces another 
95% of garbage and uses bytebuffers. This will be ready for 4.next, but using 
it should be optional.
4) There's value in decoupling data format from data logic. For instance, this 
would allow us to compare native and bytebuffer values without requiring the 
allocation of an intermediate buffer.
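Point 4 is the accessor pattern: comparison logic is written once against a small accessor interface, so values with different backing formats compare without copying either into an intermediate buffer. The real interface is Cassandra's {{ValueAccessor}} in Java; the following is a simplified, hypothetical Python analogue (all names here are illustrative, not the actual API):

```python
# Two backing formats, one accessor interface: size and byte-at-index.
class ArrayAccessor:
    def size(self, v): return len(v)
    def byte_at(self, v, i): return v[i]

class MemoryviewAccessor:
    def size(self, v): return v.nbytes
    def byte_at(self, v, i): return v[i]

def compare(a, acc_a, b, acc_b):
    # Lexicographic compare across two different backing formats,
    # with no intermediate allocation.
    for i in range(min(acc_a.size(a), acc_b.size(b))):
        d = acc_a.byte_at(a, i) - acc_b.byte_at(b, i)
        if d != 0:
            return d
    return acc_a.size(a) - acc_b.size(b)

raw = bytes([1, 2, 3])
view = memoryview(bytes([1, 2, 4]))
print(compare(raw, ArrayAccessor(), view, MemoryviewAccessor()))  # → -1
```

The design choice being argued for: the format-specific code is confined to the accessors, while all the value logic (comparison, digesting, serialization) stays format-agnostic.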

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.






[jira] [Assigned] (CASSANDRA-16033) test_resume_stopped_build - materialized_views_test.TestMaterializedViews

2020-08-11 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova reassigned CASSANDRA-16033:
---

Assignee: Ekaterina Dimitrova

> test_resume_stopped_build - materialized_views_test.TestMaterializedViews
> -
>
> Key: CASSANDRA-16033
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16033
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Failing in CircleCI [here | 
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/295/workflows/168d88ab-f55f-4560-a23e-8243aff7b1bd/jobs/1774]






[jira] [Commented] (CASSANDRA-16030) TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address failing on trunk

2020-08-11 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175619#comment-17175619
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16030:
-

Thank you both!

> TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address
>  failing on trunk
> -
>
> Key: CASSANDRA-16030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Caleb Rackliffe
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta2
>
>
> {{TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address}}
>  seems to think that the new replacement node is running when it shouldn’t 
> be. We’ve got two confirmed failures, one on trunk and one on a recent 
> trunk-based patch:
> https://ci-cassandra.apache.org/job/Cassandra-trunk/259/testReport/dtest-novnode.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/
> https://app.circleci.com/pipelines/github/adelapena/cassandra/79/workflows/1fa1cbcf-820d-438f-97bd-91e20a50baba/jobs/604






[jira] [Updated] (CASSANDRA-16030) TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address failing on trunk

2020-08-11 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16030:
-
Status: Ready to Commit  (was: Review In Progress)

> TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address
>  failing on trunk
> -
>
> Key: CASSANDRA-16030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Caleb Rackliffe
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> {{TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address}}
>  seems to think that the new replacement node is running when it shouldn’t 
> be. We’ve got two confirmed failures, one on trunk and one on a recent 
> trunk-based patch:
> https://ci-cassandra.apache.org/job/Cassandra-trunk/259/testReport/dtest-novnode.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/
> https://app.circleci.com/pipelines/github/adelapena/cassandra/79/workflows/1fa1cbcf-820d-438f-97bd-91e20a50baba/jobs/604






[jira] [Updated] (CASSANDRA-16030) TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address failing on trunk

2020-08-11 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16030:
-
Reviewers: Brandon Williams  (was: Brandon Williams)
   Status: Review In Progress  (was: Patch Available)

> TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address
>  failing on trunk
> -
>
> Key: CASSANDRA-16030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Caleb Rackliffe
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> {{TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address}}
>  seems to think that the new replacement node is running when it shouldn’t 
> be. We’ve got two confirmed failures, one on trunk and one on a recent 
> trunk-based patch:
> https://ci-cassandra.apache.org/job/Cassandra-trunk/259/testReport/dtest-novnode.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/
> https://app.circleci.com/pipelines/github/adelapena/cassandra/79/workflows/1fa1cbcf-820d-438f-97bd-91e20a50baba/jobs/604






[jira] [Updated] (CASSANDRA-16030) TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address failing on trunk

2020-08-11 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16030:
-
  Fix Version/s: (was: 4.0-beta)
 4.0-beta2
  Since Version: 4.0-beta1
Source Control Link: 
https://github.com/apache/cassandra-dtest/commit/59ca5090b028956ba609fbd7e37e638dfc40a451
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed.

> TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address
>  failing on trunk
> -
>
> Key: CASSANDRA-16030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Caleb Rackliffe
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta2
>
>
> {{TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address}}
>  seems to think that the new replacement node is running when it shouldn’t 
> be. We’ve got two confirmed failures, one on trunk and one on a recent 
> trunk-based patch:
> https://ci-cassandra.apache.org/job/Cassandra-trunk/259/testReport/dtest-novnode.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/
> https://app.circleci.com/pipelines/github/adelapena/cassandra/79/workflows/1fa1cbcf-820d-438f-97bd-91e20a50baba/jobs/604






[jira] [Updated] (CASSANDRA-16030) TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address failing on trunk

2020-08-11 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16030:
-
Authors: Stefan Miklosovic  (was: Ekaterina Dimitrova)
Test and Documentation Plan: it is a test.
 Status: Patch Available  (was: In Progress)

> TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address
>  failing on trunk
> -
>
> Key: CASSANDRA-16030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16030
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Caleb Rackliffe
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> {{TestBootstrap::test_node_cannot_join_as_hibernating_node_without_replace_address}}
>  seems to think that the new replacement node is running when it shouldn’t 
> be. We’ve got two confirmed failures, one on trunk and one on a recent 
> trunk-based patch:
> https://ci-cassandra.apache.org/job/Cassandra-trunk/259/testReport/dtest-novnode.bootstrap_test/TestBootstrap/test_node_cannot_join_as_hibernating_node_without_replace_address/
> https://app.circleci.com/pipelines/github/adelapena/cassandra/79/workflows/1fa1cbcf-820d-438f-97bd-91e20a50baba/jobs/604






[cassandra-dtest] branch master updated: increase timeout for CASSANDRA-14559 to get rid of flakiness

2020-08-11 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new 59ca509  increase timeout for CASSANDRA-14559 to get rid of flakiness
59ca509 is described below

commit 59ca5090b028956ba609fbd7e37e638dfc40a451
Author: Stefan Miklosovic 
AuthorDate: Thu Aug 6 12:02:25 2020 +0200

increase timeout for CASSANDRA-14559 to get rid of flakiness

Patch by Stefan Miklosovic, reviewed by brandonwilliams for
CASSANDRA-16030
---
 bootstrap_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bootstrap_test.py b/bootstrap_test.py
index e0decb6..e4cefba 100644
--- a/bootstrap_test.py
+++ b/bootstrap_test.py
@@ -734,7 +734,7 @@ class TestBootstrap(Tester):
 self.assert_log_had_msg(blind_replacement_node, "A node with the same 
IP in hibernate status was detected", timeout=60)
 # Waiting two seconds to give node a chance to stop in case above 
assertion is True.
 # When this happens cassandra may not shut down fast enough and the 
below assertion fails.
-time.sleep(2)
+time.sleep(15)
 # Asserting that then new node is not running.
 # This tests the actual expected state as opposed to just checking for 
the existance of the above error message.
 assert not blind_replacement_node.is_running()
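The committed fix bumps the fixed sleep from 2s to 15s. A fixed sleep can still race on a slow machine; a hedged alternative sketch (illustrative only, not the committed change — {{wait_until_stopped}} is a hypothetical helper, and {{node}} is assumed to expose the dtest/ccm {{is_running()}} method) polls for shutdown against a deadline instead:

```python
import time

def wait_until_stopped(node, timeout=15.0, interval=0.5):
    # Poll is_running() rather than sleeping a fixed interval: this
    # returns as soon as the node stops, and only waits the full
    # timeout in the worst case.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not node.is_running():
            return True
        time.sleep(interval)
    return not node.is_running()
```

With such a helper, the test could replace the sleep with {{assert wait_until_stopped(blind_replacement_node)}}, failing fast when the node stops early and still tolerating a slow shutdown.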





[jira] [Commented] (CASSANDRA-16003) ToolRunner added in CASSANDRA-15942 stdErr checks improvements

2020-08-11 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175615#comment-17175615
 ] 

Berenguer Blasi commented on CASSANDRA-16003:
-

#justfyi we didn't think of adding this test :shrug:  
[https://github.com/apache/cassandra/pull/704/files#diff-ff6938d8ab4ea304113d733d53f84e10R41]
 At least we'll add it there :)

> ToolRunner added in CASSANDRA-15942 stdErr checks improvements
> --
>
> Key: CASSANDRA-16003
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16003
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The JVM will output to stderr on some environments to show what flags that 
> were picked up, when this happens all tests which validate stderr start to 
> fail.  This was found in the org.apache.cassandra.tools.ClearSnapshotTest as 
> it switched to use the ToolRunner; below is a sample failure on my laptop (I 
> had to modify the asserts since they don’t include the input)
> {code}
> java.lang.AssertionError: 
> Expecting empty but was:<"Picked up _JAVA_OPTIONS: 
> -Djava.net.preferIPv4Stack=true
> ">
>   at 
> org.apache.cassandra.tools.ToolRunner.assertEmptyStdErr(ToolRunner.java:339)
>   at 
> org.apache.cassandra.tools.ToolRunner.waitAndAssertOnCleanExit(ToolRunner.java:334)
>   at 
> org.apache.cassandra.tools.ClearSnapshotTest.testClearSnapshot_RemoveMultiple(ClearSnapshotTest.java:91)
> {code}
> Here _JAVA_OPTIONS is set globally on my system, so the test fails; there is 
> also JAVA_TOOL_OPTIONS, which is used the same way.
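One way to make such assertions robust is to filter the JVM's own preamble out of stderr before asserting emptiness. A minimal sketch (not the actual ToolRunner code; {{clean_stderr}} is a hypothetical helper) — the JVM prints "Picked up _JAVA_OPTIONS: ..." / "Picked up JAVA_TOOL_OPTIONS: ..." to stderr whenever those environment variables are set:

```python
import re

# Benign JVM preamble lines emitted when _JAVA_OPTIONS or
# JAVA_TOOL_OPTIONS is set in the environment.
JVM_PREAMBLE = re.compile(r"^Picked up (_JAVA_OPTIONS|JAVA_TOOL_OPTIONS):")

def clean_stderr(stderr: str) -> str:
    # Drop empty lines and JVM preamble; whatever remains is a real error.
    kept = [line for line in stderr.splitlines()
            if line and not JVM_PREAMBLE.match(line)]
    return "\n".join(kept)

stderr = "Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true\n"
assert clean_stderr(stderr) == ""  # preamble alone counts as empty stderr
```

An "assert empty stderr" helper built on this passes in environments that set those variables globally, while still catching genuine error output.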






[jira] [Updated] (CASSANDRA-16012) sstablesplit unit test hardening

2020-08-11 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-16012:

 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

> sstablesplit unit test hardening
> 
>
> Key: CASSANDRA-16012
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16012
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/sstable
>Reporter: Berenguer Blasi
>Priority: Normal
>  Labels: low-hanging-fruit
>
>  
> During CASSANDRA-15883 / CASSANDRA-15991 it was detected that unit test 
> coverage for this tool is minimal. There is a unit test to build upon under 
> {{test/unit/org/apache/cassandra/tools}}.






[jira] [Assigned] (CASSANDRA-16012) sstablesplit unit test hardening

2020-08-11 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi reassigned CASSANDRA-16012:
---

Assignee: Berenguer Blasi

> sstablesplit unit test hardening
> 
>
> Key: CASSANDRA-16012
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16012
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/sstable
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
>  Labels: low-hanging-fruit
> Fix For: 4.0-beta
>
>
>  
> During CASSANDRA-15883 / CASSANDRA-15991 it was detected that unit test 
> coverage for this tool is minimal. There is a unit test to build upon under 
> {{test/unit/org/apache/cassandra/tools}}.






[jira] [Updated] (CASSANDRA-16012) sstablesplit unit test hardening

2020-08-11 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-16012:

Fix Version/s: 4.0-beta

> sstablesplit unit test hardening
> 
>
> Key: CASSANDRA-16012
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16012
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/sstable
>Reporter: Berenguer Blasi
>Priority: Normal
>  Labels: low-hanging-fruit
> Fix For: 4.0-beta
>
>
>  
> During CASSANDRA-15883 / CASSANDRA-15991 it was detected that unit test 
> coverage for this tool is minimal. There is a unit test to build upon under 
> {{test/unit/org/apache/cassandra/tools}}.






[jira] [Comment Edited] (CASSANDRA-15393) Add byte array backed cells

2020-08-11 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175217#comment-17175217
 ] 

Caleb Rackliffe edited comment on CASSANDRA-15393 at 8/11/20, 2:12 PM:
---

While we continue to hash out the more strategic discussion, here are the notes 
from my first pass at review:

(Note: These comments are based on things as they stand in [my 
rebase|https://github.com/apache/cassandra/pull/712].)

*Naming Ideas*

- {{ClusteringPrefix#getBuffer(int)}} -> {{bufferAt(int)}}
- {{ClusteringPrefix#getString(int)}} -> {{stringAt(int)}}
- {{ValueAccessor#toSafeBuffer()}} -> {{toMutableBuffer()}}

*Typing*

- It looks safe to throw {{@SafeVarargs}} on {{CompositeType#build()}}.
- {{TypeSerializer#validate()}} and {{AbstractType#writeValue()}} hide the 
class-level type parameter.
- Looks like {{AbstractType#decompose()}} is never used without 
{{ByteBufferAccessor}}? We could probably remove the type parameter and just 
make the {{ByteBuffer}} binding explicit.
- {{TypeSerializer#toCQLLiteral()}} is only used w/ {{ByteBuffer}}, so it 
doesn't look like it needs to be parameterized.
- {{ModificationStatement}} looks like it's dealing exclusively with 
{{ByteBuffer}}. Should the type parameters reflect that?
- Trying to propagate more typing information from 
{{ClusteringBoundOrBoundary.Serializer}} upward to its users for {{Slices}} and 
{{UnfilteredSerializer}} might help clarify some things. (i.e. Literally return 
{{ClusteringBoundOrBoundary}} from {{deserializeValues()}}. We could 
create an array-based version of TOP and BOTTOM.)
- The {{MultiCBuilder}} sub-classes seem to be used only w/ {{ByteBuffer}}. 
Make sure its types are explicit about that?
- Assuming we don't find some creative ways to reduce the number of places 
where we need type 
  parameters, the following classes also need to be rounded out to avoid 
compile-time warnings:
-- {{BufferClusteringBoundOrBoundary}}, {{ClusteringPrefix}}, 
{{ClusteringBound}}, {{ClusteringBoundOrBoundary}}
-- {{Row}} (on Cell return types), {{Cell}}, {{Array}} / {{BufferCell}}, 
{{ColumnMetadata}}, {{BTreeRow}}, {{ComplexColumnData}}, {{Slice}}
-- {{CounterContext}} (ex. {{total()}})
-- {{RegularColumnIndex}}, {{IndexEntry}}, {{CollectionKeyIndex}}, 
{{CollectionKeyIndexBase}}, {{CassandraIndex}}
-- {{AbstractAllocator}}
-- {{CBuilder}} (and {{ArrayBackedBuilder}}...also {{ArrayBackedBuilder}} only 
builds {{BufferClustering}}? If so, that should flow up in 
   the type information to {{RegularColumnIndex}}, et al.)

*Safety Concerns*

- {{AbstractArrayClusteringPrefix}} references its own sub-class in a static 
context...we might need to move the empty size to {{ArrayClustering}}?

*Code Organization*

- Would be nice to avoid the switch in {{ClusteringBoundary}} and 
{{ClusteringBound}}. If we know this only works with the {{ByteBuffer}} 
version, can we just cleanly separate the static factories from the interfaces?
- We might be able to factor our some common elements of {{ArrayCell}} and 
{{BufferCell}}.
- We might benefit in terms of usability/developer ergonomics if we push some 
capabilities of the accessor into {{Cell}}. (ex. methods like {{getLong()}}). 
Similar thing going on with {{ClusteringPrefix}}, where we could perhaps stop 
making calls like {{builder.add(prefix.get(i), prefix.accessor())}} and use 
{{bufferAt(i)}} in CP itself. I suppose {{ArrayBackedBuilder}} might need to 
support something like {{add(ClusteringPrefix, int)}} if we still want to do 
that lazily, after the {{isDone()}} check.
- {{AbstractType#writeValue()}} could be implemented in {{Cell}}, given the 
latter knows both its value and accessor already, and {{ColumnSpecification}} 
already knows the column type?

*TODOs and FIXMEs*

- Would {{NativeCell}} not having a specialized accessor be any worse than the 
existing codebase? (If not, I'd be okay with deferring it to another Jira...?)
- It looks like there are a few places where you wanted to avoid 
{{ClusteringPrefix#getBufferArray()}}. (ex. {{RangeTombstoneMarker.Merger}}). 
I'm not sure how many of those usages actually suffer from the wrapping 
overhead though. 
- {{validateCollectionMember}}, the logger, and {{componentsCount}} are unused 
in {{AbstractType}}.
- {{deserializeValuesWithoutSize()}} unused in {{ClusteringPrefix}} (and 
{{version}} is unused in the other overload?)
- Most of the static factories in {{ArrayClusteringBoundOrBoundary}} and 
{{ArrayClusteringBound}} are unused.
- {{UNSET_BYTE_ARRAY}}, {{getChar()}}, {{putBoolean()}}, {{putChar()}}, 
{{writeWithLength()}} are unused in {{ByteArrayUtil}}. What if we just fold 
these methods into {{ByteArrayAccessor}}?
- FIXME in {{AbstractClusteringPrefix}} ...modify {{Digest}} to take a 
value/accessor pair?
- Some items commented out near the top of {{BufferClusteringBound}}.
- TypeSerializer#serialize() and se

[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-08-11 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15861:
-
Test and Documentation Plan: 
https://circleci.com/workflow-run/610e8169-e60c-420b-a556-4120967db6cb  (was: 
https://circleci.com/workflow-run/9e2af3a1-7b63-423d-8cde-d2cd178c81d6)

> Mutating sstable component may race with entire-sstable-streaming(ZCS) 
> causing checksum validation failure
> --
>
> Key: CASSANDRA-15861
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming, 
> Local/Compaction
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Flaky dtest: [test_dead_sync_initiator - 
> repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 
> CassandraEntireSSTableStreamReader.java:145 - [Stream 
> 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream 
> for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
>   at 
> org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
>   at 
> org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
>   at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
>   at 
> org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for 
> /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>  
> In the above test, it executes "nodetool repair" on node1 and kills node2 
> during repair. At the end, node3 reports checksum validation failure on 
> sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which modifies 
> sstable's repairAt to 0 and pending repair id to session-id.
> 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be 
> transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 
> starts to broadcast repair-failure-message to all participants in 
> {{CoordinatorSession#fail}}
> 4. Node1 receives its own repair-failure-message and fails its local repair 
> sessions at {{LocalSessions#failSession}} which triggers async background 
> compaction.
> 5. Node1's background compaction will mutate sstable's repairAt to 0 and 
> pending repair id to null via  
> {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more 
> in-progress repair.
> 6. Node1 actually sends the sstable to node3 where the sstable's STATS 
> component size is different from the original size recorded in the manifest.
> 7. At the end, node3 reports checksum validation failure when it tries to 
> mutate sstable level and "isTransient" attribute in 
> {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> Currently, entire-sstable-streaming requires sstable components to be 
> immutable, because the {{ComponentManifest}}
> with component sizes is sent before the actual files. This isn't a 
> problem in legacy streaming as STATS file
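The failure mode in steps 1-7 above can be reproduced in miniature. This is a standalone sketch, not Cassandra code; the byte array standing in for the STATS component and the CRC32 check are illustrative only:

```java
import java.util.zip.CRC32;

// Miniature of the race: a checksum captured when the manifest is built no
// longer matches once a background task mutates the component in between,
// so the receiver's validation fails.
public class ChecksumRace {
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] stats = {1, 2, 3, 4};          // STATS component when manifest is built
        long manifestCrc = crc(stats);        // checksum/length captured for streaming

        stats[0] = 0;                         // background compaction mutates metadata

        // receiver-side validation: checksums no longer match
        System.out.println(crc(stats) == manifestCrc);   // false
    }
}
```

Any mutation that changes the component's bytes (or, as in step 6, its size relative to the manifest) after the manifest is captured produces exactly this mismatch on the receiving node.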

[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-08-11 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175412#comment-17175412
 ] 

ZhaoYang edited comment on CASSANDRA-15861 at 8/11/20, 10:48 AM:
-

bq. 1) Orphaned hard links need to be cleaned up on startup.

If the hard links end with `.tmp`, they will be cleaned up on startup by 
{{StartupChecks#checkSystemKeyspaceState}}

bq. 2) Using the streaming session id for the hard link name, instead of a time 
uuid, would make debugging some issues easier.

I think the same streaming plan id is shared by different peers, so using it 
for the hard-link name may fail when streaming the same sstables to different 
peers in the same stream plan. 
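That collision can be sketched in isolation (file and link names below are hypothetical, not Cassandra's actual naming scheme):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: a hard-link name derived only from the stream *plan* id collides
// when the same sstable is streamed to two peers in the same plan, because
// the second createLink finds the name already taken.
public class HardLinkNames {
    // returns true if the link was created, false on a name collision
    static boolean tryLink(Path link, Path target) {
        try {
            Files.createLink(link, target);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;                     // the second peer would fail here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // simulate two peers linking the same component under one plan-scoped name
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("zcs");
            Path sstable = Files.writeString(dir.resolve("na-1-big-Data.db"), "data");
            Path planScoped = dir.resolve("plan-1234-Data.db.tmp");  // plan id only
            return new boolean[] { tryLink(planScoped, sstable),     // peer 1
                                   tryLink(planScoped, sstable) };   // peer 2
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]);   // true false
    }
}
```

The second {{Files.createLink}} call hits {{FileAlreadyExistsException}}, which is why a per-session or time-based component in the link name is needed.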

bq. We could leave ComponentManifest the way it was before this patch and have 
a separate class, let's call it ComponentContext, that embeds it.

+1

bq. In this case, if you could guarantee that no more than 1 index resample can 
happen at once for a given sstable, the only thing you'd need to synchronize in 
`cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could 
just synchronize hard link creation on `tidy.global`, instead of introducing a 
new lock.

Agreed with Caleb: no more than one index resample can happen concurrently for a 
given sstable, as the sstable is marked as compacting before resampling.

bq. That leaves indexSummary, which perhaps we could make volatile, and all the 
state used in cloneAndReplace()...but we could just extend the synchronized 
(tidy.global) block to include the latter. Nothing expensive happens inside 
cloneAndReplace(), AFAICT.

good idea

bq. synchronized (tidy.global)

The old approach was to synchronize the entire streaming phase, so I didn't use 
"synchronized (tidy.global)", which may block concurrent compactions. 

Now that only hard-link creation is synchronized, using "synchronized 
(tidy.global)" is better than introducing a new lock.
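The locking shape being agreed on here can be sketched as follows. This is a simplified stand-in, not the actual SSTableReader code; {{global}} stands in for {{tidy.global}} and the byte array stands in for a mutable component:

```java
// Sketch: the metadata mutation and the hard-link/length snapshot synchronize
// on one shared monitor, so a snapshot can never observe a half-applied
// mutation; no second lock is introduced.
public class SharedMonitor {
    private final Object global = new Object();   // stand-in for tidy.global
    private byte[] stats = {1, 2, 3, 4};          // stand-in for a component

    void mutateRepairedAt() {
        synchronized (global) {                   // compaction-side mutation
            stats = new byte[]{0, 2, 3, 4, 5};    // content and size may change
        }
    }

    int snapshotLength() {
        synchronized (global) {                   // link creation + length capture
            return stats.length;                  // consistent with linked content
        }
    }

    public static void main(String[] args) {
        SharedMonitor m = new SharedMonitor();
        System.out.println(m.snapshotLength());   // 4
        m.mutateRepairedAt();
        System.out.println(m.snapshotLength());   // 5
    }
}
```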



was (Author: jasonstack):
bq. 1) Orphaned hard links need to be cleaned up on startup.

If the hard links end with `.tmp`, they will be cleaned up on startup by 
{{StartupChecks#checkSystemKeyspaceState}}

bq. 2) Using the streaming session id for the hard link name, instead of a time 
uuid, would make debugging some issues easier.

+1

bq. We could leave ComponentManifest the way it was before this patch and have 
a separate class, let's call it ComponentContext, that embeds it.

+1

bq. In this case, if you could guarantee that no more than 1 index resample can 
happen at once for a given sstable, the only thing you'd need to synchronize in 
`cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could 
just synchronize hard link creation on `tidy.global`, instead of introducing a 
new lock.

Agreed with Caleb: no more than one index resample can happen concurrently for a 
given sstable, as the sstable is marked as compacting before resampling.

bq. That leaves indexSummary, which perhaps we could make volatile, and all the 
state used in cloneAndReplace()...but we could just extend the synchronized 
(tidy.global) block to include the latter. Nothing expensive happens inside 
cloneAndReplace(), AFAICT.

good idea

bq. synchronized (tidy.global)

The old approach was to synchronize the entire streaming phase, so I didn't use 
"synchronized (tidy.global)", which may block concurrent compactions. 

Now that only hard-link creation is synchronized, using "synchronized 
(tidy.global)" is better than introducing a new lock.



[jira] [Comment Edited] (CASSANDRA-15393) Add byte array backed cells

2020-08-11 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175390#comment-17175390
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15393 at 8/11/20, 10:36 AM:
---

{quote}This one compiles without errors just by omitting the generic type 
information
{quote}
This seems like a problem that can be resolved - I agree that some of the type 
information isn't properly propagated in some places, but I suspect that might 
be because the patch was awaiting feedback on its general structure. (I don't 
personally think there is an issue with being able to invoke it in a contrived 
example where we deliberately elide the type information, but we should be sure 
the code doesn't do that today)
{quote}Another source of errors is that positions & offsets are manually 
handled now.
{quote}
I am also not a _huge_ fan of this, but we have compromises to make. I don't 
think this is particularly error-prone, however, since we already do this a 
great deal - it's pretty integral to how our serialization/deserialization 
works, so this is not a new category of problem. That's not to say there isn't 
room for suggestions that might lead to improvement here, I don't know, but it 
doesn't seem at all disqualifying to me.

 

Unrelatedly, [~bdeggleston], is there a reason you didn't go the whole hog and 
just get rid of the {{ByteBuffer}} versions of everything?


was (Author: benedict):
{quote}This one compiles without errors just by omitting the generic type 
information
{quote}
This seems like a problem that can be resolved - I agree that some of the type 
information isn't properly propagated in some places, but I suspect that might 
be because the patch was awaiting feedback on its general structure.
{quote}Another source of errors is that positions & offsets are manually 
handled now.
{quote}
I am also not a _huge_ fan of this, but we have compromises to make. I don't 
think this is particularly error-prone, however, since we already do this a 
great deal - it's pretty integral to how our serialization/deserialization 
works, so this is not a new category of problem. That's not to say there isn't 
room for suggestions that might lead to improvement here, I don't know, but it 
doesn't seem at all disqualifying to me.

 

Unrelatedly, [~bdeggleston], is there a reason you didn't go the whole hog and 
just get rid of the {{ByteBuffer}} versions of everything?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on-heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage by up to 22% in many cases.
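The general idea can be sketched in a standalone form. This uses a hypothetical, heavily simplified accessor interface for illustration, not the patch's actual API:

```java
import java.nio.ByteBuffer;

// Sketch: a generic accessor lets the same cell code run over byte[] on the
// compaction/local-read path and over ByteBuffer on the coordinator, avoiding
// per-value ByteBuffer wrappers where they aren't needed.
public class Accessors {
    interface ValueAccessor<V> {
        int size(V value);
        long getLong(V value, int offset);
    }

    static final ValueAccessor<byte[]> ARRAY = new ValueAccessor<byte[]>() {
        public int size(byte[] v) { return v.length; }
        public long getLong(byte[] v, int offset) {
            long r = 0;
            for (int i = 0; i < 8; i++)
                r = (r << 8) | (v[offset + i] & 0xffL);  // big-endian, like ByteBuffer
            return r;
        }
    };

    static final ValueAccessor<ByteBuffer> BUFFER = new ValueAccessor<ByteBuffer>() {
        public int size(ByteBuffer v) { return v.remaining(); }
        public long getLong(ByteBuffer v, int offset) {
            return v.getLong(v.position() + offset);
        }
    };

    // generic "cell" code, oblivious to the backing type
    static <V> long firstLong(V value, ValueAccessor<V> accessor) {
        return accessor.getLong(value, 0);
    }

    public static void main(String[] args) {
        byte[] raw = ByteBuffer.allocate(8).putLong(42L).array();
        System.out.println(firstLong(raw, ARRAY));                   // 42
        System.out.println(firstLong(ByteBuffer.wrap(raw), BUFFER)); // 42
    }
}
```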



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15432) The "read defragmentation" optimization does not work

2020-08-11 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-15432:


Assignee: Sylvain Lebresne

> The "read defragmentation" optimization does not work
> -
>
> Key: CASSANDRA-15432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> The so-called "read defragmentation" that has been added way back with 
> CASSANDRA-2503 actually does not work, and never has. That is, the 
> defragmentation writes do happen, but they only add load on the nodes 
> without helping anything, and are thus a clear negative.
> The "read defragmentation" (which only impacts so-called "names queries") 
> kicks in when a read hits "too many" sstables (> 4 by default), and when it 
> does, it writes down the result of that read. The assumption being that the 
> next read for that data would only read the newly written data, which if not 
> still in memtable would at least be in a single sstable, thus speeding that 
> next read.
> Unfortunately, this is not how this works. When we defrag and write the result 
> of our original read, we do so with the timestamp of the data read (as we 
> should, changing the timestamp would be plain wrong). And as a result, 
> following reads will read that data first, but will have no way to tell that 
> no more sstables should be read. Technically, the 
> [{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
>  call will not return {{null}} because the {{currentMaxTs}} will be higher 
> than at least some of the data in the result, and this until we've read from 
> as many sstables as in the original read.
> I see no easy way to fix this. It might be possible to make it work with 
> additional per-sstable metadata, but nothing sufficiently simple and cheap to 
> be worth it comes to mind. And I thus suggest simply removing that code.
> For the record, I'll note that there is actually a 2nd problem with that 
> code: currently, we "defrag" a read even if we didn't get data for everything 
> that the query requests. This also is "wrong" even if we ignore the first 
> issue: a following read that would read the defragmented data would also have 
> no way to know to not read more sstables to try to get the missing parts. 
> This problem would be fixable, but is obviously overshadowed by the previous 
> one anyway.
> Anyway, as mentioned, I suggest just removing the "optimization" (which, 
> again, never optimized anything) altogether, and I'm happy to provide the 
> simple patch.
> The only question might be which versions? This impacts all versions, but 
> this isn't a correctness bug, "just" a performance one. So do we want 
> 4.0 only or is there appetite for earlier?
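The timestamp argument above can be modeled in a few lines. This is a deliberately simplified model of the {{reduceFilter}} check, not the actual {{SinglePartitionReadCommand}} logic; names and values are illustrative:

```java
// Simplified model: a read may stop consulting further sstables only if every
// column it has already found carries a timestamp newer than the max timestamp
// of the next sstable. Defragmentation keeps the original timestamps, so this
// condition fails exactly as often as before the defrag write.
public class DefragModel {
    // timestamps of the columns found so far vs. max timestamp of next sstable
    static boolean canStop(long[] foundTimestamps, long nextSSTableMaxTs) {
        for (long ts : foundTimestamps)
            if (ts <= nextSSTableMaxTs)
                return false;                 // that column might still be stale
        return true;
    }

    public static void main(String[] args) {
        // Defragged sstable holds the full result, but with original timestamps
        long[] defraggedResult = {10, 3};
        // An older sstable that contributed the ts=3 column still advertises maxTs=8,
        // so the read cannot prove the defragged data is complete and must continue
        System.out.println(canStop(defraggedResult, 8));    // false
        // Only genuinely fresh timestamps (e.g. a real write at ts=20) allow stopping
        System.out.println(canStop(new long[]{20, 20}, 8)); // true
    }
}
```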





[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-08-11 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175412#comment-17175412
 ] 

ZhaoYang edited comment on CASSANDRA-15861 at 8/11/20, 9:47 AM:


bq. 1) Orphaned hard links need to be cleaned up on startup.

If the hard links end with `.tmp`, they will be cleaned up on startup by 
{{StartupChecks#checkSystemKeyspaceState}}

bq. 2) Using the streaming session id for the hard link name, instead of a time 
uuid, would make debugging some issues easier.

+1

bq. We could leave ComponentManifest the way it was before this patch and have 
a separate class, let's call it ComponentContext, that embeds it.

+1

bq. In this case, if you could guarantee that no more than 1 index resample can 
happen at once for a given sstable, the only thing you'd need to synchronize in 
`cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could 
just synchronize hard link creation on `tidy.global`, instead of introducing a 
new lock.

Agreed with Caleb: no more than one index resample can happen concurrently for a 
given sstable, as the sstable is marked as compacting before resampling.

bq. That leaves indexSummary, which perhaps we could make volatile, and all the 
state used in cloneAndReplace()...but we could just extend the synchronized 
(tidy.global) block to include the latter. Nothing expensive happens inside 
cloneAndReplace(), AFAICT.

good idea

bq. synchronized (tidy.global)

The old approach was to synchronize the entire streaming phase, so I didn't use 
"synchronized (tidy.global)", which may block concurrent compactions. 

Now that only hard-link creation is synchronized, using "synchronized 
(tidy.global)" is better than introducing a new lock.



was (Author: jasonstack):
bq. 1) Orphaned hard links need to be cleaned up on startup.

If the hard links end with `.tmp`, they will be cleaned up on startup by 
{{StartupChecks#checkSystemKeyspaceState}}

bq. 2) Using the streaming session id for the hard link name, instead of a time 
uuid, would make debugging some issues easier.

+1

bq. We could leave ComponentManifest the way it was before this patch and have 
a separate class, let's call it ComponentContext, that embeds it.

+1

bq. In this case, if you could guarantee that no more than 1 index resample can 
happen at once for a given sstable, the only thing you'd need to synchronize in 
`cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could 
just synchronize hard link creation on `tidy.global`, instead of introducing a 
new lock.

Agreed with Caleb: no more than one index resample can happen concurrently for a 
given sstable, as the sstable is marked as compacting before resampling.

bq. That leaves indexSummary, which perhaps we could make volatile, and all the 
state used in cloneAndReplace()...but we could just extend the synchronized 
(tidy.global) block to include the latter. Nothing expensive happens inside 
cloneAndReplace(), AFAICT.

good idea

bq. synchronized (tidy.global)





[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-08-11 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175412#comment-17175412
 ] 

ZhaoYang edited comment on CASSANDRA-15861 at 8/11/20, 9:47 AM:


bq. 1) Orphaned hard links need to be cleaned up on startup.

If the hard links end with `.tmp`, they will be cleaned up on startup by 
{{StartupChecks#checkSystemKeyspaceState}}

bq. 2) Using the streaming session id for the hard link name, instead of a time 
uuid, would make debugging some issues easier.

+1

bq. We could leave ComponentManifest the way it was before this patch and have 
a separate class, let's call it ComponentContext, that embeds it.

+1

bq. In this case, if you could guarantee that no more than 1 index resample can 
happen at once for a given sstable, the only thing you'd need to synchronize in 
`cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could 
just synchronize hard link creation on `tidy.global`, instead of introducing a 
new lock.

Agreed with Caleb: no more than one index resample can happen concurrently for a 
given sstable, as the sstable is marked as compacting before resampling.

bq. That leaves indexSummary, which perhaps we could make volatile, and all the 
state used in cloneAndReplace()...but we could just extend the synchronized 
(tidy.global) block to include the latter. Nothing expensive happens inside 
cloneAndReplace(), AFAICT.

good idea

bq. synchronized (tidy.global)

The old approach was to synchronize the entire streaming phase, so I didn't use 
"synchronized (tidy.global)", which may block concurrent compactions. 

Now that only hard-link creation is synchronized, using "synchronized 
(tidy.global)" is better than introducing a new lock.



was (Author: jasonstack):
bq. 1) Orphaned hard links need to be cleaned up on startup.

If the hard links end with `.tmp`, they will be cleaned up on startup by 
{{StartupChecks#checkSystemKeyspaceState}}

bq. 2) Using the streaming session id for the hard link name, instead of a time 
uuid, would make debugging some issues easier.

+1

bq. We could leave ComponentManifest the way it was before this patch and have 
a separate class, let's call it ComponentContext, that embeds it.

+1

bq. In this case, if you could guarantee that no more than 1 index resample can 
happen at once for a given sstable, the only thing you'd need to synchronize in 
`cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could 
just synchronize hard link creation on `tidy.global`, instead of introducing a 
new lock.

Agreed with Caleb: no more than one index resample can happen concurrently for a 
given sstable, as the sstable is marked as compacting before resampling.

bq. That leaves indexSummary, which perhaps we could make volatile, and all the 
state used in cloneAndReplace()...but we could just extend the synchronized 
(tidy.global) block to include the latter. Nothing expensive happens inside 
cloneAndReplace(), AFAICT.

good idea

bq. synchronized (tidy.global)

The old approach was to synchronize the entire streaming phase, so I didn't use 
"synchronized (tidy.global)", which may block concurrent compactions. 

Now that only hard-link creation is synchronized, using "synchronized 
(tidy.global)" is better than introducing a new lock.



[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure

2020-08-11 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175412#comment-17175412
 ] 

ZhaoYang commented on CASSANDRA-15861:
--

bq. 1) Orphaned hard links need to be cleaned up on startup.

If the hard links end with `.tmp`, they will be cleaned up on startup by 
{{StartupChecks#checkSystemKeyspaceState}}

bq. 2) Using the streaming session id for the hard link name, instead of a time 
uuid, would make debugging some issues easier.

+1

bq. We could leave ComponentManifest the way it was before this patch and have 
a separate class, let's call it ComponentContext, that embeds it.

+1

bq. In this case, if you could guarantee that no more than 1 index resample can 
happen at once for a given sstable, the only thing you'd need to synchronize in 
`cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could 
just synchronize hard link creation on `tidy.global`, instead of introducing a 
new lock.

Agreed with Caleb: no more than one index resample can happen concurrently for a 
given sstable, as the sstable is marked as compacting before resampling.

bq. That leaves indexSummary, which perhaps we could make volatile, and all the 
state used in cloneAndReplace()...but we could just extend the synchronized 
(tidy.global) block to include the latter. Nothing expensive happens inside 
cloneAndReplace(), AFAICT.

good idea

bq. synchronized (tidy.global)





[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2020-08-11 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175390#comment-17175390
 ] 

Benedict Elliott Smith commented on CASSANDRA-15393:


{quote}This one compiles without errors just by omitting the generic type 
information
{quote}
This seems like a problem that can be resolved - I agree that some of the type 
information isn't properly propagated in some places, but I suspect that might 
be because the patch was awaiting feedback on its general structure.
{quote}Another source of errors is that positions & offsets are manually 
handled now.
{quote}
I am also not a _huge_ fan of this, but we have compromises to make. I don't 
think this is particularly error-prone, however, since we already do this a 
great deal; it's pretty integral to how our serialization/deserialization 
works, so this is not a new category of problem. That's not to say there isn't 
room for suggestions that might lead to improvement here (I don't know), but it 
doesn't seem at all disqualifying to me.

 

Unrelatedly, [~bdeggleston], is there a reason you didn't go the whole hog and 
just get rid of the {{ByteBuffer}} versions of everything?

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on-heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read paths we don’t do anything that needs them; byte 
> buffer methods are only used on the coordinator. Backing cells with byte 
> arrays instead in these situations reduces compaction and read garbage by up 
> to 22% in many cases.
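As a rough illustration of the idea (hypothetical names; this is not the actual patch's API), cell logic can be written once against a value-accessor abstraction so that byte[] and ByteBuffer backings share the same code:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: the same cell logic parameterized over its backing
// value type, so compaction/local reads can use byte[] while the
// coordinator path keeps using ByteBuffer.
interface ValueAccessor<V> {
    int size(V value);
    int getByte(V value, int offset);  // unsigned byte at offset

    // Comparison written once against the abstraction, usable across backings.
    static <L, R> int compare(ValueAccessor<L> la, L left,
                              ValueAccessor<R> ra, R right) {
        int len = Math.min(la.size(left), ra.size(right));
        for (int i = 0; i < len; i++) {
            int c = Integer.compare(la.getByte(left, i), ra.getByte(right, i));
            if (c != 0) return c;
        }
        return Integer.compare(la.size(left), ra.size(right));
    }
}

final class ArrayAccessor implements ValueAccessor<byte[]> {
    public int size(byte[] v) { return v.length; }        // no wrapper object
    public int getByte(byte[] v, int i) { return v[i] & 0xff; }
}

final class BufferAccessor implements ValueAccessor<ByteBuffer> {
    public int size(ByteBuffer v) { return v.remaining(); }
    public int getByte(ByteBuffer v, int i) { return v.get(v.position() + i) & 0xff; }
}
```

The garbage savings come from the hot paths allocating plain arrays instead of ByteBuffer wrappers, while coordinator code is untouched.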



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16044) Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or RangeAwaredCompaction

2020-08-11 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-16044:
-
Fix Version/s: 4.x

> Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or 
> RangeAwaredCompaction
> 
>
> Key: CASSANDRA-16044
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16044
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SASI
>Reporter: ZhaoYang
>Priority: Normal
> Fix For: 4.x
>
>
> Currently SASI searches every SSTable index that may include the query 
> partition key and indexed term, but this causes large IO overhead for range 
> index queries (e.g. age > 18) when the sstable count is huge.
> Proposed improvement: query sstable indexes lazily, in token-sorted runs. 
> When the data in the first few token ranges is sufficient to satisfy the 
> limit, SASI can skip searching the sstable indexes for the remaining ranges.
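A minimal sketch of the proposed lazy evaluation (illustrative names only, not SASI's actual API): walk the sstables in token-sorted runs and stop querying indexes as soon as the limit is satisfied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the proposal: sstables grouped into runs ordered
// by token range; later runs are never consulted once LIMIT rows are found.
final class LazyIndexSearch {
    static <S> List<String> search(List<List<S>> tokenSortedRuns,
                                   Function<S, List<String>> queryIndex,
                                   int limit) {
        List<String> rows = new ArrayList<>();
        for (List<S> run : tokenSortedRuns) {
            for (S sstable : run)
                rows.addAll(queryIndex.apply(sstable));  // per-sstable index lookup
            if (rows.size() >= limit)
                return rows.subList(0, limit);           // remaining runs skipped
        }
        return rows;
    }
}
```

The IO saving is proportional to the number of runs skipped, which is why this helps most for strategies like LCS/TWCS whose sstables are naturally partitionable into token ranges.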






[jira] [Updated] (CASSANDRA-16044) Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or RangeAwaredCompaction

2020-08-11 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-16044:
-
Summary: Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or 
RangeAwaredCompaction  (was: Query SSTable Indexes in token sorted runs for LCS 
and TWCS)

> Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or 
> RangeAwaredCompaction
> 
>
> Key: CASSANDRA-16044
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16044
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SASI
>Reporter: ZhaoYang
>Priority: Normal
>
> Currently SASI searches every SSTable index that may include the query 
> partition key and indexed term, but this causes large IO overhead for range 
> index queries (e.g. age > 18) when the sstable count is huge.
> Proposed improvement: query sstable indexes lazily, in token-sorted runs. 
> When the data in the first few token ranges is sufficient to satisfy the 
> limit, SASI can skip searching the sstable indexes for the remaining ranges.






[jira] [Created] (CASSANDRA-16044) Query SSTable Indexes in token sorted runs for LCS and TWCS

2020-08-11 Thread ZhaoYang (Jira)
ZhaoYang created CASSANDRA-16044:


 Summary: Query SSTable Indexes in token sorted runs for LCS and 
TWCS
 Key: CASSANDRA-16044
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16044
 Project: Cassandra
  Issue Type: Improvement
  Components: Feature/SASI
Reporter: ZhaoYang


Currently SASI searches every SSTable index that may include the query 
partition key and indexed term, but this causes large IO overhead for range 
index queries (e.g. age > 18) when the sstable count is huge.

Proposed improvement: query sstable indexes lazily, in token-sorted runs. When 
the data in the first few token ranges is sufficient to satisfy the limit, SASI 
can skip searching the sstable indexes for the remaining ranges.






[jira] [Updated] (CASSANDRA-16043) Perform garbage collection on specific partitions or range of partitions

2020-08-11 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-16043:

Change Category: Operability
 Complexity: Normal
Component/s: Local/Compaction
  Fix Version/s: 4.x
 Status: Open  (was: Triage Needed)

> Perform garbage collection on specific partitions or range of partitions
> 
>
> Key: CASSANDRA-16043
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16043
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Local/Compaction
>Reporter: Johnny Miller
>Priority: Normal
> Fix For: 4.x
>
>
> Some of our data is quite seasonal and varies by partition, and certain 
> partitions tend to contain significantly more tombstones than others. We 
> currently run nodetool garbagecollect on whole tables when this becomes an 
> issue.
> However, garbage collecting only particular partitions of specific tables 
> would reduce the time and resources this activity requires, so it would be 
> useful to be able to target just the partitions we need.


