[jira] [Commented] (CASSANDRA-15500) only partition key push down when multiple cluster keys restricted in where clause

2020-01-12 Thread Vincent White (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014072#comment-17014072
 ] 

Vincent White commented on CASSANDRA-15500:
---

That's correct. It does appear to just be an issue with how the CQL string is 
output in 
{{org.apache.cassandra.db.filter.ClusteringIndexNamesFilter#toCQLString}}, where 
we run afoul of
{code:java}
clusterings.size() <= 1
{code}
since {{clusterings.size()}} for this type of query will be 1.
I haven't had a chance to look any further, but there is a short note about the 
original addition of this check here:
 
https://issues.apache.org/jira/browse/CASSANDRA-7392?focusedCommentId=14877285&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14877285
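For illustration only, a minimal self-contained sketch of the logging-only pitfall described above. The stand-in type and its {{toCQLString}}-style method are hypothetical simplifications, not the actual filter code; they just show how a filter naming exactly one clustering can take the "print nothing" branch even though the restriction is still applied when the query executes:
{code:java}
import java.util.List;
import java.util.stream.Collectors;

public class ToCqlStringSketch
{
    // Hypothetical stand-in for a names filter holding the restricted clusterings.
    static class NamesFilter
    {
        final List<List<String>> clusterings; // each entry = one full clustering tuple

        NamesFilter(List<List<String>> clusterings)
        {
            this.clusterings = clusterings;
        }

        // Mirrors the shape of the check discussed above: a single clustering is
        // treated like "no restriction" and nothing is appended to the CQL string.
        String toCQLString()
        {
            if (clusterings.size() <= 1)
                return ""; // the restriction silently disappears from the logged query
            return clusterings.stream()
                              .map(c -> "(" + String.join(", ", c) + ")")
                              .collect(Collectors.joining(", ", "clustering IN (", ")"));
        }
    }

    public static void main(String[] args)
    {
        // WHERE partition='a' AND clustering1=1 AND clustering2='b' resolves to
        // exactly one clustering, so the logged string loses the restriction.
        NamesFilter single = new NamesFilter(List.of(List.of("1", "'b'")));
        System.out.println("logged restriction: '" + single.toCQLString() + "'");
    }
}
{code}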

> only partition key push down when multiple cluster keys restricted in where 
> clause
> --
>
> Key: CASSANDRA-15500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15500
> Project: Cassandra
>  Issue Type: Bug
>Reporter: onmstester
>Priority: Normal
>
> Using Apache Cassandra 3.11.2, defined a table like this:
>  
> create table my_table(
>     partition text,
>     clustering1 int,
>     clustering2 text,
>     data set,
>     primary key (partition, clustering1, clustering2))
>  
> and configured the slow query threshold to 1ms in the yaml to see how queries 
> are passed to Cassandra. The query below:
>  
> _select * from my_table where partition='a' and clustering1= 1 and 
> clustering2='b'_
>  
> would be like this in debug.log of cassandra:
>  
> _select * from my_table where partition='a' LIMIT 100>  (meaning the two 
> clustering key restrictions were not pushed down to the storage engine and the 
> whole partition was retrieved)_
>  
> but this query:
>  
> _select * from my_table where partition='a' and clustering1= 1_
>  
> _would be_
>  
> _select * from my_table where partition='a' and_ _clustering1= 1_ _LIMIT 100> 
> (the single clustering key restriction was pushed down to the storage engine)_
>  
>  
> _So it seems to me that we cannot restrict multiple clustering keys in a 
> select because it would retrieve the whole partition?!_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15293) Static columns not include in mutation size calculation

2020-01-09 Thread Vincent White (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012404#comment-17012404
 ] 

Vincent White commented on CASSANDRA-15293:
---

||Branch||
|[Trunk unit test|https://github.com/vincewhite/cassandra/commits/15293_trunk]|
|[3.11 unit 
test|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|

> Static columns not include in mutation size calculation
> ---
>
> Key: CASSANDRA-15293
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15293
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging, Observability/Metrics
>Reporter: Vincent White
>Priority: Low
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Patch to include any updates to static columns in the data size calculation 
> of PartitionUpdate.
> ||Patch||
> |[Trunk/3.11|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|
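A rough, self-contained illustration of the idea behind the patch, using simplified stand-in types rather than the real {{PartitionUpdate}} API: the data size of an update should count the static row as well as the regular rows, otherwise a static-only update reports a size of zero.
{code:java}
import java.util.List;

public class UpdateSizeSketch
{
    // Hypothetical stand-in for a row carried by a partition update.
    record Row(int dataSize) {}

    static int dataSize(Row staticRow, List<Row> regularRows)
    {
        // Counting only regular rows (the behaviour being fixed) misses
        // static-only updates entirely; include the static row in the total.
        int size = staticRow == null ? 0 : staticRow.dataSize();
        for (Row r : regularRows)
            size += r.dataSize();
        return size;
    }

    public static void main(String[] args)
    {
        // An update that touches only a static column still has a non-zero size.
        System.out.println(dataSize(new Row(42), List.of()));
    }
}
{code}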



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14099) LCS ordering of sstables by timestamp is inverted

2019-10-14 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14099:
--
Fix Version/s: (was: 3.11.x)
   (was: 4.x)
   (was: 3.0.x)
   3.11.4
   4.0

> LCS ordering of sstables by timestamp is inverted
> -
>
> Key: CASSANDRA-14099
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14099
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Jeff Jirsa
>Assignee: Vincent White
>Priority: Low
> Fix For: 3.11.4, 4.0
>
>
> In CASSANDRA-14010 we discovered that CASSANDRA-13776 broke sstable ordering 
> by timestamp (inverted it accidentally). Investigating that revealed that the 
> comparator was expecting newest-to-oldest for read command, but LCS expects 
> oldest-to-newest.
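A small self-contained sketch of the ordering mix-up described above, with a local stand-in type rather than the actual sstable comparator: the two consumers need explicitly opposite orderings, so flipping a single shared min-timestamp comparator for one of them breaks the other.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SSTableOrderingSketch
{
    // Hypothetical stand-in for an sstable with a minimum write timestamp.
    record SSTable(String name, long minTimestamp) {}

    // Oldest data first: what LCS expects when picking compaction candidates.
    static final Comparator<SSTable> OLDEST_FIRST =
        Comparator.comparingLong(SSTable::minTimestamp);

    // Newest data first: what the read command expects when ordering sstables.
    static final Comparator<SSTable> NEWEST_FIRST = OLDEST_FIRST.reversed();

    public static void main(String[] args)
    {
        List<SSTable> tables = new ArrayList<>(List.of(
            new SSTable("b", 200), new SSTable("a", 100), new SSTable("c", 300)));

        tables.sort(OLDEST_FIRST);
        System.out.println("LCS order:  " + tables); // a, b, c
        tables.sort(NEWEST_FIRST);
        System.out.println("read order: " + tables); // c, b, a
    }
}
{code}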



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14099) LCS ordering of sstables by timestamp is inverted

2019-10-14 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14099:
--
Resolution: Fixed
Status: Resolved  (was: Open)

> LCS ordering of sstables by timestamp is inverted
> -
>
> Key: CASSANDRA-14099
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14099
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Jeff Jirsa
>Assignee: Vincent White
>Priority: Low
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In CASSANDRA-14010 we discovered that CASSANDRA-13776 broke sstable ordering 
> by timestamp (inverted it accidentally). Investigating that revealed that the 
> comparator was expecting newest-to-oldest for read command, but LCS expects 
> oldest-to-newest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14099) LCS ordering of sstables by timestamp is inverted

2019-10-14 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14099:
--
Status: Open  (was: Patch Available)

> LCS ordering of sstables by timestamp is inverted
> -
>
> Key: CASSANDRA-14099
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14099
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Jeff Jirsa
>Assignee: Vincent White
>Priority: Low
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In CASSANDRA-14010 we discovered that CASSANDRA-13776 broke sstable ordering 
> by timestamp (inverted it accidentally). Investigating that revealed that the 
> comparator was expecting newest-to-oldest for read command, but LCS expects 
> oldest-to-newest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15353) Documentation Preview - Cassandra 4.0

2019-10-09 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White reassigned CASSANDRA-15353:
-

Assignee: Vincent White

> Documentation Preview - Cassandra 4.0
> -
>
> Key: CASSANDRA-15353
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15353
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Website
>Reporter: DeepakVohra
>Assignee: Vincent White
>Priority: Normal
>
> Please review, and add comments to, some of the documentation preview for 
> Apache Cassandra 4.0.
> *TODOs*
> _Read Repair_
> https://docs.google.com/document/d/1qkyXPAYjkb2fFAP5WAOJ9KGTrRQ79Cl4nhuydf0m03U/edit?usp=sharing
> _Full Repair Example_
> https://docs.google.com/document/d/1Hxncmze_KNhpDQjrvePMfhGnspZEuwW8tqgqrMAMp4Q/edit?usp=sharing
> *New Features*
> _Support for Java 11_
> https://docs.google.com/document/d/1v7ffccqk_5Son4iwfuwZae8YUam41quewKi_6Z_PQGw/edit?usp=sharing
> *Improvements*
> _Improved Streaming_
> https://docs.google.com/document/d/1zd6AuHBgC82v598cuDtVIkxJVDV9c0A2rrMEVFHyB4Y/edit?usp=sharing
> _Improved Internode Messaging_
> https://docs.google.com/document/d/1ub2DBHE7hNEKe4tuCOpdbCZ7qGqCdMAYvSTC-YamuMA/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15353) Documentation Preview - Cassandra 4.0

2019-10-09 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White reassigned CASSANDRA-15353:
-

Assignee: (was: Vincent White)

> Documentation Preview - Cassandra 4.0
> -
>
> Key: CASSANDRA-15353
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15353
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Website
>Reporter: DeepakVohra
>Priority: Normal
>
> Please review, and add comments to, some of the documentation preview for 
> Apache Cassandra 4.0.
> *TODOs*
> _Read Repair_
> https://docs.google.com/document/d/1qkyXPAYjkb2fFAP5WAOJ9KGTrRQ79Cl4nhuydf0m03U/edit?usp=sharing
> _Full Repair Example_
> https://docs.google.com/document/d/1Hxncmze_KNhpDQjrvePMfhGnspZEuwW8tqgqrMAMp4Q/edit?usp=sharing
> *New Features*
> _Support for Java 11_
> https://docs.google.com/document/d/1v7ffccqk_5Son4iwfuwZae8YUam41quewKi_6Z_PQGw/edit?usp=sharing
> *Improvements*
> _Improved Streaming_
> https://docs.google.com/document/d/1zd6AuHBgC82v598cuDtVIkxJVDV9c0A2rrMEVFHyB4Y/edit?usp=sharing
> _Improved Internode Messaging_
> https://docs.google.com/document/d/1ub2DBHE7hNEKe4tuCOpdbCZ7qGqCdMAYvSTC-YamuMA/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15293) Static columns not include in mutation size calculation

2019-08-25 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15293:
--
 Bug Category: Parent values: Correctness(12982)
   Complexity: Low Hanging Fruit
Discovered By: Code Inspection
 Severity: Low
Since Version: 3.0 alpha 1

> Static columns not include in mutation size calculation
> ---
>
> Key: CASSANDRA-15293
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15293
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging, Observability/Metrics
>Reporter: Vincent White
>Priority: Low
>
> Patch to include any updates to static columns in the data size calculation 
> of PartitionUpdate.
> ||Patch||
> |[Trunk/3.11|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15293) Static columns not include in mutation size calculation

2019-08-25 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15293:
--
Description: 
Patch to include any updates to static columns in the data size calculation of 
PartitionUpdate.


||Patch||
|[trunk/3.11|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|



  was:
Patch to include any updates to static columns in the data size calculation of 
PartitionUpdate.




> Static columns not include in mutation size calculation
> ---
>
> Key: CASSANDRA-15293
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15293
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging, Observability/Metrics
>Reporter: Vincent White
>Priority: Normal
>
> Patch to include any updates to static columns in the data size calculation 
> of PartitionUpdate.
> ||Patch||
> |[trunk/3.11|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15293) Static columns not include in mutation size calculation

2019-08-25 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15293:
--
Description: 
Patch to include any updates to static columns in the data size calculation of 
PartitionUpdate.


||Patch||
|[Trunk/3.11|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|



  was:
Patch to include any updates to static columns in the data size calculation of 
PartitionUpdate.


||Patch||
|[trunk/3.11|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|




> Static columns not include in mutation size calculation
> ---
>
> Key: CASSANDRA-15293
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15293
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging, Observability/Metrics
>Reporter: Vincent White
>Priority: Normal
>
> Patch to include any updates to static columns in the data size calculation 
> of PartitionUpdate.
> ||Patch||
> |[Trunk/3.11|https://github.com/vincewhite/cassandra/commits/include_update_to_static_columns_in_update_size]|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15293) Static columns not include in mutation size calculation

2019-08-25 Thread Vincent White (Jira)
Vincent White created CASSANDRA-15293:
-

 Summary: Static columns not include in mutation size calculation
 Key: CASSANDRA-15293
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15293
 Project: Cassandra
  Issue Type: Bug
  Components: Observability/Logging, Observability/Metrics
Reporter: Vincent White


Patch to include any updates to static columns in the data size calculation of 
PartitionUpdate.





--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15292) Point-in-time recovery ignoring timestamp of static column updates

2019-08-25 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15292:
--
Description: 
During point-in-time recovery 
org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
see if any write timestamps in the update exceed the recovery point. If any of 
the timestamps exceed this point, commit log replay is stopped.

Currently maxTimestamp only iterates over the regular rows in the update and 
doesn't check for any included updates to static columns. If a PartitionUpdate 
only contains updates to static columns then maxTimestamp will return 
Long.MIN_VALUE and the update will always be replayed. 

This generally isn't much of an issue, except for non-dense compact storage 
tables which are implemented in the 3.x storage engine in large part with 
static columns. In this case the commit log will always continue applying 
updates to them past the recovery point until it hits an update to a different 
table with regular columns or reaches the end of the commit logs.

 
||Patch||
|[3.11|https://github.com/vincewhite/cassandra/commits/3_11_check_static_column_timestamps_commit_log_archive]|
|[Trunk|https://github.com/vincewhite/cassandra/commits/trunk_check_static_column_timestamps]|
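A minimal self-contained sketch of the fix described above, assuming simplified stand-in types (not the real {{PartitionUpdate}} API): {{maxTimestamp}} must consider the static row, otherwise a static-only update reports {{Long.MIN_VALUE}} and commit log replay never stops on it.
{code:java}
import java.util.List;

public class MaxTimestampSketch
{
    // Hypothetical stand-in for a row with a single write timestamp.
    record Row(long timestamp) {}

    static long maxTimestamp(Row staticRow, List<Row> regularRows)
    {
        long max = Long.MIN_VALUE;
        // Include the static row; iterating only regular rows is the bug described above.
        if (staticRow != null)
            max = Math.max(max, staticRow.timestamp());
        for (Row r : regularRows)
            max = Math.max(max, r.timestamp());
        return max;
    }

    public static void main(String[] args)
    {
        // A static-only update now reports its real timestamp instead of Long.MIN_VALUE,
        // so point-in-time recovery can stop replaying once the recovery point is passed.
        System.out.println(maxTimestamp(new Row(1_566_700_000_000_000L), List.of()));
    }
}
{code}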

  was:
During point-in-time recovery 
org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
see if any write timestamps in the update exceed the recovery point. If any of 
the timestamps exceed this point, commit log replay is stopped.

Currently maxTimestamp only iterates over the regular rows in the update and 
doesn't check for any included updates to static columns. If a PartitionUpdate 
only contains updates to static columns then maxTimestamp will return 
Long.MIN_VALUE and the update will always be replayed. 

This generally isn't much of an issue, except for non-dense compact storage 
tables which are implemented in the 3.x storage engine in large part with 
static columns. In this case the commit log will always continue applying 
updates to them past the recovery point until it hits an update to a different 
table with regular columns or reaches the end of the commit logs.

 
||Patch||
|[3.11|https://github.com/vincewhite/cassandra/commits/3_11_check_static_column_timestamps_commit_log_archive]|
|Trunk|


> Point-in-time recovery ignoring timestamp of static column updates
> --
>
> Key: CASSANDRA-15292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15292
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Vincent White
>Priority: Normal
>
> During point-in-time recovery 
> org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
> see if any write timestamps in the update exceed the recovery point. If any 
> of the timestamps exceed this point, commit log replay is stopped.
> Currently maxTimestamp only iterates over the regular rows in the update and 
> doesn't check for any included updates to static columns. If a PartitionUpdate 
> only contains updates to static columns then maxTimestamp will return 
> Long.MIN_VALUE and the update will always be replayed. 
> This generally isn't much of an issue, except for non-dense compact storage 
> tables which are implemented in the 3.x storage engine in large part with 
> static columns. In this case the commit log will always continue applying 
> updates to them past the recovery point until it hits an update to a 
> different table with regular columns or reaches the end of the commit logs.
>  
> ||Patch||
> |[3.11|https://github.com/vincewhite/cassandra/commits/3_11_check_static_column_timestamps_commit_log_archive]|
> |[Trunk|https://github.com/vincewhite/cassandra/commits/trunk_check_static_column_timestamps]|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15292) Point-in-time recovery ignoring timestamp of static column updates

2019-08-25 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15292:
--
 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
Since Version: 3.0 alpha 1

> Point-in-time recovery ignoring timestamp of static column updates
> --
>
> Key: CASSANDRA-15292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15292
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Vincent White
>Priority: Normal
>
> During point-in-time recovery 
> org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
> see if any write timestamps in the update exceed the recovery point. If any 
> of the timestamps exceed this point, commit log replay is stopped.
> Currently maxTimestamp only iterates over the regular rows in the update and 
> doesn't check for any included updates to static columns. If a PartitionUpdate 
> only contains updates to static columns then maxTimestamp will return 
> Long.MIN_VALUE and the update will always be replayed. 
> This generally isn't much of an issue, except for non-dense compact storage 
> tables which are implemented in the 3.x storage engine in large part with 
> static columns. In this case the commit log will always continue applying 
> updates to them past the recovery point until it hits an update to a 
> different table with regular columns or reaches the end of the commit logs.
>  
> ||Patch||
> |[3.11|https://github.com/vincewhite/cassandra/commits/3_11_check_static_column_timestamps_commit_log_archive]|
> |Trunk|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15292) Point-in-time recovery ignoring timestamp of static column updates

2019-08-25 Thread Vincent White (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15292:
--
Description: 
During point-in-time recovery 
org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
see if any write timestamps in the update exceed the recovery point. If any of 
the timestamps exceed this point, commit log replay is stopped.

Currently maxTimestamp only iterates over the regular rows in the update and 
doesn't check for any included updates to static columns. If a PartitionUpdate 
only contains updates to static columns then maxTimestamp will return 
Long.MIN_VALUE and the update will always be replayed. 

This generally isn't much of an issue, except for non-dense compact storage 
tables which are implemented in the 3.x storage engine in large part with 
static columns. In this case the commit log will always continue applying 
updates to them past the recovery point until it hits an update to a different 
table with regular columns or reaches the end of the commit logs.

 
||Patch||
|[3.11|https://github.com/vincewhite/cassandra/commits/3_11_check_static_column_timestamps_commit_log_archive]|
|Trunk|

  was:
During point-in-time recovery 
org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
see if any write timestamps in the update exceed the recovery point. If any of 
the timestamps exceed this point, commit log replay is stopped.

Currently maxTimestamp only iterates over the regular rows in the update and 
doesn't check for any included updates to static columns. If a PartitionUpdate 
only contains updates to static columns then maxTimestamp will return 
Long.MIN_VALUE and the update will always be replayed. 

This generally isn't much of an issue, except for non-dense compact storage 
tables which are implemented in the 3.x storage engine in large part with 
static columns. In this case the commit log will always continue applying 
updates to them past the recovery point until it hits an update to a different 
table with regular columns or reaches the end of the commit logs.

 
||Patch||
|[3.11|http://example.com/]|
|Trunk|


> Point-in-time recovery ignoring timestamp of static column updates
> --
>
> Key: CASSANDRA-15292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15292
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Vincent White
>Priority: Normal
>
> During point-in-time recovery 
> org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
> see if any write timestamps in the update exceed the recovery point. If any 
> of the timestamps exceed this point, commit log replay is stopped.
> Currently maxTimestamp only iterates over the regular rows in the update and 
> doesn't check for any included updates to static columns. If a PartitionUpdate 
> only contains updates to static columns then maxTimestamp will return 
> Long.MIN_VALUE and the update will always be replayed. 
> This generally isn't much of an issue, except for non-dense compact storage 
> tables which are implemented in the 3.x storage engine in large part with 
> static columns. In this case the commit log will always continue applying 
> updates to them past the recovery point until it hits an update to a 
> different table with regular columns or reaches the end of the commit logs.
>  
> ||Patch||
> |[3.11|https://github.com/vincewhite/cassandra/commits/3_11_check_static_column_timestamps_commit_log_archive]|
> |Trunk|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15292) Point-in-time recovery ignoring timestamp of static column updates

2019-08-25 Thread Vincent White (Jira)
Vincent White created CASSANDRA-15292:
-

 Summary: Point-in-time recovery ignoring timestamp of static 
column updates
 Key: CASSANDRA-15292
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15292
 Project: Cassandra
  Issue Type: Bug
  Components: Local/Commit Log
Reporter: Vincent White


During point-in-time recovery 
org.apache.cassandra.db.partitions.PartitionUpdate#maxTimestamp is checked to 
see if any write timestamps in the update exceed the recovery point. If any of 
the timestamps exceed this point, commit log replay is stopped.

Currently maxTimestamp only iterates over the regular rows in the update and 
doesn't check for any included updates to static columns. If a PartitionUpdate 
only contains updates to static columns then maxTimestamp will return 
Long.MIN_VALUE and the update will always be replayed. 

This generally isn't much of an issue, except for non-dense compact storage 
tables which are implemented in the 3.x storage engine in large part with 
static columns. In this case the commit log will always continue applying 
updates to them past the recovery point until it hits an update to a different 
table with regular columns or reaches the end of the commit logs.

 
||Patch||
|[3.11|http://example.com/]|
|Trunk|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15191) stop_paranoid disk failure policy race condition on startup

2019-06-30 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15191:
--
Discovered By: Adhoc Test
Since Version: 3.0 alpha 1

> stop_paranoid disk failure policy race condition on startup
> ---
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Priority: Normal
> Attachments: log.txt
>
>
> If the {{stop_paranoid}} disk failure policy is triggered during startup (for 
> example by a compaction) before the node reaches CassandraDaemon.start(), 
> CassandraDaemon.start() will start the client transport services and start 
> listening for client connections regardless of the previous corrupt sstable 
> exception. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15191) stop_paranoid disk failure policy race condition on startup

2019-06-30 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-15191:
-

 Summary: stop_paranoid disk failure policy race condition on 
startup
 Key: CASSANDRA-15191
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
 Project: Cassandra
  Issue Type: Bug
Reporter: Vincent White
 Attachments: log.txt

If the {{stop_paranoid}} disk failure policy is triggered during startup (for 
example by a compaction) before the node reaches CassandraDaemon.start(), 
CassandraDaemon.start() will start the client transport services and start 
listening for client connections regardless of the previous corrupt sstable 
exception. 
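A self-contained sketch of the guard that is missing, using hypothetical names rather than the actual {{CassandraDaemon}} code: if the disk failure policy has already fired during startup, the daemon should skip starting client transports instead of accepting connections on top of a corrupt-sstable failure.
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class StartupGuardSketch
{
    // Hypothetical flag set by the disk failure policy handler when stop_paranoid fires.
    static final AtomicBoolean DISK_FAILURE_TRIGGERED = new AtomicBoolean(false);

    // Stand-in for CassandraDaemon.start(): only bring up client transports if no
    // disk failure was reported earlier in startup (e.g. by a compaction).
    static void start()
    {
        if (DISK_FAILURE_TRIGGERED.get())
        {
            System.out.println("disk failure policy triggered during startup; not starting client transports");
            return;
        }
        System.out.println("starting native transport");
    }

    public static void main(String[] args)
    {
        DISK_FAILURE_TRIGGERED.set(true); // simulate a corrupt sstable hit before start()
        start();
    }
}
{code}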

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping

2019-06-18 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15158:
--
Description: 
Currently when a node is bootstrapping we use a set of latches 
(org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
in-flight schema pull requests, and we don't proceed with bootstrapping/stream 
until all the latches are released (or we timeout waiting for each one). One 
issue with this is that if we have a large schema, or the retrieval of the 
schema from the other nodes was unexpectedly slow then we have no explicit 
check in place to ensure we have actually received a schema before we proceed.

While it's possible to increase "migration_task_wait_in_seconds" to force the 
node to wait on each latch longer, there are cases where this doesn't help 
because the callbacks for the schema pull requests have expired off the 
messaging service's callback map 
(org.apache.cassandra.net.MessagingService#callbacks) after 
request_timeout_in_ms (default 10 seconds) before the other nodes were able to 
respond to the new node.

This patch checks for schema agreement between the bootstrapping node and the 
rest of the live nodes before proceeding with bootstrapping. It also adds a 
check to prevent the new node from flooding existing nodes with simultaneous 
schema pull requests as can happen in large clusters.

Removing the latch system should also prevent new nodes in large clusters 
getting stuck for extended amounts of time as they wait 
`migration_task_wait_in_seconds` on each of the latches left orphaned by the 
timed out callbacks.

 
||3.11||
|[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
|[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
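A self-contained sketch of the agreement check the patch adds, under simplified assumptions (plain maps stand in for gossip state; this is not the real {{MigrationManager}} API): instead of waiting on per-request latches, compare the local schema version against the versions advertised by the live nodes and only proceed once they match.
{code:java}
import java.util.Map;
import java.util.UUID;

public class SchemaAgreementSketch
{
    // Returns true only when every live node advertises the same schema version we hold.
    static boolean inAgreement(UUID localVersion, Map<String, UUID> liveNodeVersions)
    {
        return liveNodeVersions.values().stream().allMatch(localVersion::equals);
    }

    public static void main(String[] args) throws InterruptedException
    {
        UUID local = UUID.randomUUID();
        Map<String, UUID> peers = Map.of("10.0.0.1", local, "10.0.0.2", UUID.randomUUID());

        // Poll (with a timeout in real code) rather than trusting in-flight request latches.
        for (int attempt = 0; attempt < 3 && !inAgreement(local, peers); attempt++)
        {
            System.out.println("schemas differ, waiting before bootstrap/stream...");
            Thread.sleep(100);
        }
        System.out.println("agreement: " + inAgreement(local, peers));
    }
}
{code}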

 

  was:
Currently when a node is bootstrapping we use a set of latches 
(org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
in-flight schema pull requests, and we don't proceed with bootstrapping/stream 
until all the latches are released (or we timeout waiting for each one). One 
issue with this is that if we have a large schema, or the retrieval of the 
schema from the other nodes was unexpectedly slow then we have no explicit 
check in place to ensure we have actually received a schema before we proceed.

While it's possible to increase "migration_task_wait_in_seconds" to force the 
node to wait on each latch longer, there are cases where this doesn't help 
because the callbacks for the schema pull requests have expired off the 
messaging service's callback map 
(org.apache.cassandra.net.MessagingService#callbacks) after -getMinRpcTimeout() 
(2 seconds by default)-  before the other nodes were able to respond to the new 
node.

This patch checks for schema agreement between the bootstrapping node and the 
rest of the live nodes before proceeding with bootstrapping. It also adds a 
check to prevent the new node from flooding existing nodes with simultaneous 
schema pull requests as can happen in large clusters.
||3.11||
|[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
|[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|

 


> Wait for schema agreement rather then in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Vincent White
>Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/stream until all the latches are released (or we timeout 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. I

[jira] [Updated] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping

2019-06-18 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15158:
--
Description: 
Currently when a node is bootstrapping we use a set of latches 
(org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
in-flight schema pull requests, and we don't proceed with bootstrapping/stream 
until all the latches are released (or we timeout waiting for each one). One 
issue with this is that if we have a large schema, or the retrieval of the 
schema from the other nodes was unexpectedly slow then we have no explicit 
check in place to ensure we have actually received a schema before we proceed.

While it's possible to increase "migration_task_wait_in_seconds" to force the 
node to wait on each latch longer, there are cases where this doesn't help 
because the callbacks for the schema pull requests have expired off the 
messaging service's callback map 
(org.apache.cassandra.net.MessagingService#callbacks) after -getMinRpcTimeout() 
(2 seconds by default)-  before the other nodes were able to respond to the new 
node.

This patch checks for schema agreement between the bootstrapping node and the 
rest of the live nodes before proceeding with bootstrapping. It also adds a 
check to prevent the new node from flooding existing nodes with simultaneous 
schema pull requests as can happen in large clusters.
||3.11||
|[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
|[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|

 

  was:
Currently when a node is bootstrapping we use a set of latches 
(org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
in-flight schema pull requests, and we don't proceed with bootstrapping/stream 
until all the latches are released (or we timeout waiting for each one). One 
issue with this is that if we have a large schema, or the retrieval of the 
schema from the other nodes was unexpectedly slow then we have no explicit 
check in place to ensure we have actually received a schema before we proceed.

While it's possible to increase "migration_task_wait_in_seconds" to force the 
node to wait on each latch longer, there are cases where this doesn't help 
because the callbacks for the schema pull requests have expired off the 
messaging service's callback map 
(org.apache.cassandra.net.MessagingService#callbacks) after getMinRpcTimeout() 
(2 seconds by default) before the other nodes were able to respond to the new 
node.

This patch checks for schema agreement between the bootstrapping node and the 
rest of the live nodes before proceeding with bootstrapping. It also adds a 
check to prevent the new node from flooding existing nodes with simultaneous 
schema pull requests as can happen in large clusters.

|||3.11||
|[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|




> Wait for schema agreement rather then in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Vincent White
>Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/stream until all the latches are released (or we timeout 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> -getMinRpcTimeout() (2 seconds by default)-  before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewh

[jira] [Updated] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping

2019-06-13 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15158:
--
 Severity: Normal
Discovered By: User Report
 Bug Category: Parent values: Correctness(12982) Level 1 values: Consistency 
Failure(12989)
Since Version: 3.1

> Wait for schema agreement rather then in flight schema requests when 
> bootstrapping
> --
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Vincent White
>Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/stream until all the latches are released (or we timeout 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> getMinRpcTimeout() (2 seconds by default) before the other nodes were able to 
> respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> |||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping

2019-06-13 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-15158:
-

 Summary: Wait for schema agreement rather then in flight schema 
requests when bootstrapping
 Key: CASSANDRA-15158
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip
Reporter: Vincent White


Currently when a node is bootstrapping we use a set of latches 
(org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
in-flight schema pull requests, and we don't proceed with bootstrapping/stream 
until all the latches are released (or we timeout waiting for each one). One 
issue with this is that if we have a large schema, or the retrieval of the 
schema from the other nodes was unexpectedly slow then we have no explicit 
check in place to ensure we have actually received a schema before we proceed.

While it's possible to increase "migration_task_wait_in_seconds" to force the 
node to wait on each latch longer, there are cases where this doesn't help 
because the callbacks for the schema pull requests have expired off the 
messaging service's callback map 
(org.apache.cassandra.net.MessagingService#callbacks) after getMinRpcTimeout() 
(2 seconds by default) before the other nodes were able to respond to the new 
node.

This patch checks for schema agreement between the bootstrapping node and the 
rest of the live nodes before proceeding with bootstrapping. It also adds a 
check to prevent the new node from flooding existing nodes with simultaneous 
schema pull requests as can happen in large clusters.

|||3.11||
|[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader

2019-05-22 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15136:
--
Description: 
Just fixes the order in the exception message.



||3.0.x||3.11.x||
|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|

  was:
Just fixes the order in the exception message.



||3.0.x||3.11||
|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|


> Incorrect error message in legacy reader
> 
>
> Key: CASSANDRA-15136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> Just fixes the order in the exception message.
> ||3.0.x||3.11.x||
> |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader

2019-05-22 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15136:
--
Description: 
Just fixes the order in the exception message.



||3.0.x||3.11||
|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|

  was:
Just fixes the order in the exception message.

 
||3.0.x|3.11||
|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|


> Incorrect error message in legacy reader
> 
>
> Key: CASSANDRA-15136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> Just fixes the order in the exception message.
> ||3.0.x||3.11||
> |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader

2019-05-22 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15136:
--
Description: 
Just fixes the order in the exception message.

 
||3.0.x|3.11||
|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|

  was:
Just fixes the order in the exception message.

 
||3.11||
|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|


> Incorrect error message in legacy reader
> 
>
> Key: CASSANDRA-15136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> Just fixes the order in the exception message.
>  
> ||3.0.x|3.11||
> |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom30]|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15136) Incorrect error message in legacy reader

2019-05-22 Thread Vincent White (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846376#comment-16846376
 ] 

Vincent White commented on CASSANDRA-15136:
---

Thanks, should be sorted now.

> Incorrect error message in legacy reader
> 
>
> Key: CASSANDRA-15136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> Just fixes the order in the exception message.
>  
> ||3.11||
> |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader

2019-05-22 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15136:
--
Description: 
Just fixes the order in the exception message.

 
||3.11||
|[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|

  was:
Just fixes the order in the exception message.

 
||3.11||
|[Patch|https://github.com/vincewhite/cassandra/commit/5a62fdd7aa7463a10a1a0bb546c1322ab15eb9cf]|


> Incorrect error message in legacy reader
> 
>
> Key: CASSANDRA-15136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> Just fixes the order in the exception message.
>  
> ||3.11||
> |[Patch|https://github.com/vincewhite/cassandra/commits/readLegacyAtom]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15136) Incorrect error message in legacy reader

2019-05-21 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15136:
--
   Complexity: Low Hanging Fruit
Discovered By: User Report
Since Version: 3.0 alpha 1

> Incorrect error message in legacy reader
> 
>
> Key: CASSANDRA-15136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Logging
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> Just fixes the order in the exception message.
>  
> ||3.11||
> |[Patch|https://github.com/vincewhite/cassandra/commit/5a62fdd7aa7463a10a1a0bb546c1322ab15eb9cf]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15136) Incorrect error message in legacy reader

2019-05-21 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-15136:
-

 Summary: Incorrect error message in legacy reader
 Key: CASSANDRA-15136
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15136
 Project: Cassandra
  Issue Type: Bug
  Components: Observability/Logging
Reporter: Vincent White
Assignee: Vincent White


Just fixes the order in the exception message.

 
||3.11||
|[Patch|https://github.com/vincewhite/cassandra/commit/5a62fdd7aa7463a10a1a0bb546c1322ab15eb9cf]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15135) SASI tokenizer options not validated before being added to schema

2019-05-20 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15135:
--
Discovered By: Adhoc Test
Since Version: 3.4

> SASI tokenizer options not validated before being added to schema
> -
>
> Key: CASSANDRA-15135
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15135
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SASI
>Reporter: Vincent White
>Priority: Normal
>
> If you attempt to create a SASI index with an illegal argument combination 
> the index will be added to the schema tables before trying to instantiate the 
> tokenizer, which causes a RuntimeException. Since the index was written to the 
> schema tables, Cassandra will hit the same exception and fail to start when 
> it tries to load the schema on boot.
>  The branch below includes a unit test to reproduce the issue.
> ||3.11||
> |[PoC|https://github.com/vincewhite/cassandra/commit/089547946d284ae3feb0d5620067b85b8fd66ebc]|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15134) SASI index files not included in snapshots

2019-05-20 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15134:
--
Severity: Normal  (was: Low)

> SASI index files not included in snapshots
> --
>
> Key: CASSANDRA-15134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15134
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SASI
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> Newly written SASI index files are not being included in snapshots. This is 
> because the SASI index files are not added to the components 
> ({{org.apache.cassandra.io.sstable.SSTable#components}}) list of newly 
> written sstables. 
> Although I don't believe anything except snapshots ever tries to reference 
> the SASI index files from this location, on startup Cassandra does add the 
> SASI index files (if they are found on disk) of existing sstables to their 
> components list. In that case sstables that existed on startup with SASI 
> index files will have their SASI index files included in any snapshots.
>  
> This patch updates the components list of newly written sstables once the 
> index is built.
> ||3.11||Trunk||
> |[PoC|https://github.com/vincewhite/cassandra/commit/a641298ad03250d3e4c195e05a93aad56dff8ca7]|[PoC|https://github.com/vincewhite/cassandra/commit/1cfe46688380838e7106f14446658988cfe68137]|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15135) SASI tokenizer options not validated before being added to schema

2019-05-20 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-15135:
-

 Summary: SASI tokenizer options not validated before being added 
to schema
 Key: CASSANDRA-15135
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15135
 Project: Cassandra
  Issue Type: Bug
  Components: Feature/SASI
Reporter: Vincent White


If you attempt to create a SASI index with an illegal argument combination the 
index will be added to the schema tables before trying to instantiate the 
tokenizer, which causes a RuntimeException. Since the index was written to the 
schema tables, Cassandra will hit the same exception and fail to start when it 
tries to load the schema on boot.

 The branch below includes a unit test to reproduce the issue.
||3.11||
|[PoC|https://github.com/vincewhite/cassandra/commit/089547946d284ae3feb0d5620067b85b8fd66ebc]|
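A self-contained sketch of the ordering problem, with hypothetical names and an invented option rule purely for illustration (not the real SASI classes or options): validation should happen by instantiating the analyzer before the index definition is persisted, so an illegal combination fails the DDL statement instead of poisoning the schema tables.
{code:java}
import java.util.Map;

public class IndexOptionValidationSketch
{
    // Hypothetical analyzer that rejects an illegal option combination on construction.
    static class Tokenizer
    {
        Tokenizer(Map<String, String> options)
        {
            if ("false".equals(options.get("analyzed")) && options.containsKey("analyzer_class"))
                throw new RuntimeException("analyzer_class requires analyzed = true");
        }
    }

    static void createIndex(Map<String, String> options)
    {
        // Validate by constructing the tokenizer *before* persisting the definition;
        // doing it afterwards leaves a schema entry that fails again on every restart.
        new Tokenizer(options);
        System.out.println("options valid, persisting index definition");
    }

    public static void main(String[] args)
    {
        try
        {
            createIndex(Map.of("analyzed", "false", "analyzer_class", "SomeAnalyzer"));
        }
        catch (RuntimeException e)
        {
            System.out.println("rejected before persisting: " + e.getMessage());
        }
    }
}
{code}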

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15134) SASI index files not included in snapshots

2019-05-20 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15134:
--
 Severity: Low
Discovered By: Adhoc Test
Since Version: 3.4

> SASI index files not included in snapshots
> --
>
> Key: CASSANDRA-15134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15134
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SASI
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Low
>
> Newly written SASI index files are not being included in snapshots. This is 
> because the SASI index files are not added to the components 
> ({{org.apache.cassandra.io.sstable.SSTable#components}}) list of newly 
> written sstables. 
> Although I don't believe anything except snapshots ever tries to reference 
> the SASI index files from this location, on startup Cassandra does add the 
> SASI index files (if they are found on disk) of existing sstables to their 
> components list. In that case sstables that existed on startup with SASI 
> index files will have their SASI index files included in any snapshots.
>  
> This patch updates the components list of newly written sstables once the 
> index is built.
> ||3.11||Trunk||
> |[PoC|https://github.com/vincewhite/cassandra/commit/a641298ad03250d3e4c195e05a93aad56dff8ca7]|[PoC|https://github.com/vincewhite/cassandra/commit/1cfe46688380838e7106f14446658988cfe68137]|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15134) SASI index files not included in snapshots

2019-05-20 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-15134:
--
Description: 
Newly written SASI index files are not being included in snapshots. This is 
because the SASI index files are not added to the components 
({{org.apache.cassandra.io.sstable.SSTable#components}}) list of newly written 
sstables. 

Although I don't believe anything except snapshots ever tries to reference the 
SASI index files from this location, on startup Cassandra does add the SASI 
index files (if they are found on disk) of existing sstables in their 
components list. In that case sstables that existed on startup with SASI index 
files will have their SASI index files included in any snapshots.

 

This patch updates the components list of newly written sstables once the index 
is built.
||3.11||Trunk||
|[PoC|https://github.com/vincewhite/cassandra/commit/a641298ad03250d3e4c195e05a93aad56dff8ca7]|[PoC|https://github.com/vincewhite/cassandra/commit/1cfe46688380838e7106f14446658988cfe68137]|

 

  was:
Newly written SASI index files are not being included in snapshots. This is 
because the SASI index files are not added to the components 
({{org.apache.cassandra.io.sstable.SSTable#components}}) list of newly written 
sstables. 

Although I don't believe anything except snapshots ever tries to reference the 
SASI index files from this location, on startup Cassandra does add the SASI 
index files (if they are found on disk) of existing sstables in their 
components list. In that case sstables that existed on startup with SASI index 
files will have their SASI index files included in any snapshots.

 

This patch updates the components list of newly written sstables once the index 
is built.
||3.11||Trunk||
|[PoC|https://github.com/vincewhite/cassandra/commit/a641298ad03250d3e4c195e05a93aad56dff8ca7]|[PoC|https://github.com/vincewhite/cassandra/commit/1cfe46688380838e7106f14446658988cfe68137]|

 


> SASI index files not included in snapshots
> --
>
> Key: CASSANDRA-15134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15134
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SASI
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Low
>
> Newly written SASI index files are not being included in snapshots. This is 
> because the SASI index files are not added to the components 
> ({{org.apache.cassandra.io.sstable.SSTable#components}}) list of newly 
> written sstables. 
> Although I don't believe anything except snapshots ever tries to reference 
> the SASI index files from this location, on startup Cassandra does add the 
> SASI index files (if they are found on disk) of existing sstables in their 
> components list. In that case sstables that existed on startup with SASI 
> index files will have their SASI index files included in any snapshots.
>  
> This patch updates the components list of newly written sstables once the 
> index is built.
> ||3.11||Trunk||
> |[PoC|https://github.com/vincewhite/cassandra/commit/a641298ad03250d3e4c195e05a93aad56dff8ca7]|[PoC|https://github.com/vincewhite/cassandra/commit/1cfe46688380838e7106f14446658988cfe68137]|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15134) SASI index files not included in snapshots

2019-05-20 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-15134:
-

 Summary: SASI index files not included in snapshots
 Key: CASSANDRA-15134
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15134
 Project: Cassandra
  Issue Type: Bug
  Components: Feature/SASI
Reporter: Vincent White
Assignee: Vincent White


Newly written SASI index files are not being included in snapshots. This is 
because the SASI index files are not added to the components 
({{org.apache.cassandra.io.sstable.SSTable#components}}) list of newly written 
sstables. 

Although I don't believe anything except snapshots ever tries to reference the 
SASI index files from this location, on startup Cassandra does add the SASI 
index files (if they are found on disk) of existing sstables in their 
components list. In that case sstables that existed on startup with SASI index 
files will have their SASI index files included in any snapshots.

 

This patch updates the components list of newly written sstables once the index 
is built.
||3.11||Trunk||
|[PoC|https://github.com/vincewhite/cassandra/commit/a641298ad03250d3e4c195e05a93aad56dff8ca7]|[PoC|https://github.com/vincewhite/cassandra/commit/1cfe46688380838e7106f14446658988cfe68137]|
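
As a rough illustration of why the fix works (a toy model, not the real SSTable/snapshot code; the component file names are made up): snapshots only hard-link the components an sstable has registered, so registering the SASI file once the index build finishes is enough for it to appear in later snapshots.
{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

public class SnapshotComponentsSketch
{
    // Toy model of an sstable: a snapshot hard-links exactly the components
    // recorded in this set, so any file missing from it is silently skipped.
    final Set<String> components = new LinkedHashSet<>();

    void snapshot(String tag)
    {
        for (String component : components)
            System.out.println("hard-linking " + component + " into snapshots/" + tag);
    }

    public static void main(String[] args)
    {
        SnapshotComponentsSketch sstable = new SnapshotComponentsSketch();
        sstable.components.add("md-1-big-Data.db");
        sstable.components.add("md-1-big-Index.db");

        // Without the patch the freshly built SASI file was never registered,
        // so a snapshot taken now would not contain it.
        sstable.snapshot("before-fix");

        // The patch's effect, in spirit: once the index build for the newly
        // written sstable completes, its SASI component is added to the list.
        sstable.components.add("md-1-big-SI_my_index.db");
        sstable.snapshot("after-fix");
    }
}
{code}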

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14957) Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision

2019-01-16 Thread Vincent White (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741791#comment-16741791
 ] 

Vincent White edited comment on CASSANDRA-14957 at 1/16/19 10:38 AM:
-

I believe this can happen due to a race condition when issuing create statements. 
If a CREATE TABLE statement for the same name is issued -on two different 
nodes- at the same time, the series of events will look like the following:

Node1 will create and propagate the table as normal, with column family ID 
cf_id1.

If Node2 gets past the below check before receiving the schema change from 
Node1, then Node2 will continue executing its CREATE TABLE statement as 
normal, except with its own column family ID cf_id2.
{code:java|title=org/apache/cassandra/service/MigrationManager.java:378}
   // If we have a table or a view which has the same name, we can't add a 
new one
else if (throwOnDuplicate && ksm.getTableOrViewNullable(cfm.cfName) != 
null)
throw new AlreadyExistsException(cfm.ksName, cfm.cfName);

logger.info("Create new table: {}", cfm);
announce(SchemaKeyspace.makeCreateTableMutation(ksm, cfm, timestamp), 
announceLocally);
{code}
Node2 will send out its own set of schema mutations as normal via announce(). 
On all nodes that receive this change, and locally on Node2, they will write 
the schema changes to disk 
(*org/apache/cassandra/schema/SchemaKeyspace.java:1390)* before attempting to 
merge them with their live schema. When attempting to merge the changes with 
their live schema 
*org.apache.cassandra.config.CFMetaData#validateCompatibility* will throw a 
configuration exception and stop the new change being merged. The changes 
written to disk are not rolled back.

All nodes will continue to use the table definition in their live schema 
(cf_id1) and everything will continue to work as expected as if the second 
CREATE TABLE statement was ignored. The issue is that all nodes now have the 
wrong column family ID recorded in their *system_schema.tables* system tables. 
When the nodes restart they will read their schema from disk and start using the 
wrong column family ID, at which point they will make a new empty folder on 
disk for it and you will start seeing the types of errors you've mentioned.

This of course isn't just limited to corrupting the column family ID but I 
believe this can apply to any part of the column family definition.

I believe this is solved in trunk with the changes introduced as part of 
CASSANDRA-10699

EDIT: This doesn't require the statements going to two different nodes, it can 
happen with both statements going to the same node.
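
To make that check-then-act window concrete, here is a small standalone sketch (deliberately simplified; hypothetical names, not Cassandra's code): both creators pass the existence check before either one has announced, so each proceeds with its own id, and only the later merge decides which one "wins" in the live schema while both ids have already been written out.
{code:java}
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class CreateTableRaceSketch
{
    // Simplified "live schema": table name -> column family id.
    static final ConcurrentHashMap<String, UUID> liveSchema = new ConcurrentHashMap<>();

    // Mirrors the shape of the throwOnDuplicate check above: an unsynchronised
    // existence check followed, some time later, by the announce/merge, which
    // leaves a window in which two creators can both pass the check.
    static void createTable(String name, CountDownLatch bothPastCheck) throws InterruptedException
    {
        if (liveSchema.containsKey(name))
            throw new IllegalStateException("AlreadyExists: " + name);

        bothPastCheck.countDown();
        bothPastCheck.await();                 // hold both creators inside the race window

        UUID cfId = UUID.randomUUID();         // each creator generates its own id
        liveSchema.merge(name, cfId, (live, incoming) -> live); // the merge keeps the first id...
        System.out.println(Thread.currentThread().getName() + " announced cf_id " + cfId);
        // ...but in the real bug the losing id has already been written to the
        // system_schema tables on disk and is what gets read back after a restart.
    }

    public static void main(String[] args) throws Exception
    {
        CountDownLatch latch = new CountDownLatch(2);
        Runnable create = () -> {
            try { createTable("ks.tbl", latch); }
            catch (Exception e) { System.out.println(e.getMessage()); }
        };
        Thread t1 = new Thread(create, "creator-1");
        Thread t2 = new Thread(create, "creator-2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("live schema id: " + liveSchema.get("ks.tbl"));
    }
}
{code}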


was (Author: vincentwhite):
I believe this can happen due to a race condition when issuing create statements. 
If a CREATE TABLE statement for the same name is issued -on two different 
nodes- at the same time, the series of events will look like the following:

Node1 will create and propagate the table as normal, with column family ID 
cf_id1.

If the Node2 gets past the below check before receiving the schema change from 
the Node1, then Node2 will continue executing its CREATE TABLE statement as 
normal except with its own column family ID cf_id2.
{code:java|title=org/apache/cassandra/service/MigrationManager.java:378}
   // If we have a table or a view which has the same name, we can't add a 
new one
else if (throwOnDuplicate && ksm.getTableOrViewNullable(cfm.cfName) != 
null)
throw new AlreadyExistsException(cfm.ksName, cfm.cfName);

logger.info("Create new table: {}", cfm);
announce(SchemaKeyspace.makeCreateTableMutation(ksm, cfm, timestamp), 
announceLocally);
{code}
Node2 will send out its own set of schema mutations as normal via announce(). 
On all nodes that receive this change, and locally on Node2, they will write 
the schema changes to disk 
(*org/apache/cassandra/schema/SchemaKeyspace.java:1390)* before attempting to 
merge them with their live schema. When attempting to merge the changes with 
their live schema 
*org.apache.cassandra.config.CFMetaData#validateCompatibility* will throw a 
configuration exception and stop the new change being merged. The changes 
written to disk are not rolled back.

All nodes will continue to use the table definition in their live schema 
(cf_id1) and everything will continue to work as expected as if the second 
CREATE TABLE statement was ignored. The issue is that all nodes now have the 
wrong column family ID recorded in their *system_schema.tables* system tables. 
When the nodes restart they will read their schema from disk and start using the 
wrong column family ID, at which point they will make a new empty folder on 
disk for it and you will start seeing the types of errors you've mentioned.

This of course isn't just limited to corrupting the column family ID but I 
believe th

[jira] [Commented] (CASSANDRA-14957) Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision

2019-01-15 Thread Vincent White (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743600#comment-16743600
 ] 

Vincent White commented on CASSANDRA-14957:
---

 

This doesn't require a new CREATE TABLE statement to be issued while restarting 
the nodes etc., only that at the time of the table's creation two 
CREATE TABLE statements were somehow issued for the same table.

Slightly unrelated: I do wonder if there is a driver somewhere out there that 
was accidentally applying something like speculative execution to DDL 
statements because I do see this issue in situations where it would be unlikely 
that the operator was manually sending off two CREATE TABLE statements so close 
together.

Where the node restarts come into play is that this condition will go unnoticed 
until the nodes are restarted and they read the "wrong"/new table ID from the 
system_schema tables. Since table IDs are timeuuids, we can see that both the old 
and new IDs of your tasks table were originally generated on February 19, 2018 
11:26:33 AM GMT. This would indicate that these nodes haven't been restarted 
since then? (assuming no drastic change in your computer clock)
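
For reference, the creation instant can be read straight out of a version 1 (time-based) table id with a few lines of standalone Java, e.g.:
{code:java}
import java.time.Instant;
import java.util.UUID;

public class TableIdTimestamp
{
    // A version-1 (time-based) UUID's timestamp field counts 100ns intervals
    // since 1582-10-15; this offset converts it to the unix epoch in millis.
    private static final long UUID_EPOCH_OFFSET_MILLIS = 12219292800000L;

    public static void main(String[] args)
    {
        UUID cfId = UUID.fromString("bd7200a0-1567-11e8-8974-855d74ee356f");
        long epochMillis = cfId.timestamp() / 10000 - UUID_EPOCH_OFFSET_MILLIS;
        System.out.println(cfId + " was generated at " + Instant.ofEpochMilli(epochMillis));
    }
}
{code}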

 

> Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision
> 
>
> Key: CASSANDRA-14957
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14957
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Avraham Kalvo
>Priority: Major
>
> We were issuing a rolling restart on a mission-critical five node C* cluster.
> The first node which was restarted got the following messages in its 
> system.log:
> ```
> January 2nd 2019, 12:06:37.310 - INFO 12:06:35 Initializing 
> tasks_scheduler_external.tasks
> ```
> ```
> WARN 12:06:39 UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId bd7200a0-1567-11e8-8974-855d74ee356f. If a table was just created, this 
> is likely due to the schema not being fully propagated. Please wait for 
> schema agreement on table creation.
> at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:349)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:286)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> ```
> The latter was then repeated several times across the cluster.
> It was then found out that the table in question 
> `tasks_scheduler_external.tasks` was created with a new schema version 
> sometime along the entire cluster consecutive restart and became available 
> once the schema agreement settled, which started taking requests leaving the 
> previous version of the schema unavailable for any request, thus generating a 
> data loss to our online system.
> Data loss was recovered by manually copying SSTables from the previous 
> version directory of the schema to the new one followed by `nodetool refresh` 
> to the relevant table.
> The above has repeated itself for several tables across various keyspaces.
> One other thing to mention is that a repair was in place for the first node 
> to be restarted, which was obviously stopped as the daemon was shut down, but 
> this doesn't seem to do with the above at first glance.
> Seems somewhat related to:
> https://issues.apache.org/jira/browse/CASSANDRA-13559



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-14957) Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision

2019-01-15 Thread Vincent White (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741791#comment-16741791
 ] 

Vincent White edited comment on CASSANDRA-14957 at 1/16/19 4:14 AM:


I believe this can happen due to a race condition when issuing create statements. 
If a CREATE TABLE statement for the same name is issued -on two different 
nodes- at the same time, the series of events will look like the following:

Node1 will create and propagate the table as normal, with column family ID 
cf_id1.

If the Node2 gets past the below check before receiving the schema change from 
the Node1, then Node2 will continue executing its CREATE TABLE statement as 
normal except with its own column family ID cf_id2.
{code:java|title=org/apache/cassandra/service/MigrationManager.java:378}
   // If we have a table or a view which has the same name, we can't add a 
new one
else if (throwOnDuplicate && ksm.getTableOrViewNullable(cfm.cfName) != 
null)
throw new AlreadyExistsException(cfm.ksName, cfm.cfName);

logger.info("Create new table: {}", cfm);
announce(SchemaKeyspace.makeCreateTableMutation(ksm, cfm, timestamp), 
announceLocally);
{code}
Node2 will send out its own set of schema mutations as normal via announce(). 
On all nodes that receive this change, and locally on Node2, they will write 
the schema changes to disk 
(*org/apache/cassandra/schema/SchemaKeyspace.java:1390)* before attempting to 
merge them with their live schema. When attempting to merge the changes with 
their live schema 
*org.apache.cassandra.config.CFMetaData#validateCompatibility* will throw a 
configuration exception and stop the new change being merged. The changes 
written to disk are not rolled back.

All nodes will continue to use the table definition in their live schema 
(cf_id1) and everything will continue to work as expected as if the second 
CREATE TABLE statement was ignored. The issue is that all nodes now have the 
wrong column family ID recorded in their *system_schema.tables* system tables. 
When the nodes restart they will read their schema from disk and start using the 
wrong column family ID, at which point they will make a new empty folder on 
disk for it and you will start seeing the types of errors you've mentioned.

This of course isn't just limited to corrupting the column family ID but I 
believe this can apply to any part of the column family definition.

I believe this is solved in trunk with the changes introduced as part of 
CASSANDRA-10699

EDIT: This doesn't require the statements going to two different nodes, it can 
happen with both statements going to the same node.


was (Author: vincentwhite):
I believe this can happen due to a race condition when issuing create statements. 
If a CREATE TABLE statement for the same name is issued on two different 
nodes at the same time, the series of events will look like the following:

Node1 will create and propagate the table as normal, with column family ID 
cf_id1.

If the Node2 gets past the below check before receiving the schema change from 
the Node1, then Node2 will continue executing its CREATE TABLE statement as 
normal except with its own column family ID cf_id2.
{code:java|title=org/apache/cassandra/service/MigrationManager.java:378}
   // If we have a table or a view which has the same name, we can't add a 
new one
else if (throwOnDuplicate && ksm.getTableOrViewNullable(cfm.cfName) != 
null)
throw new AlreadyExistsException(cfm.ksName, cfm.cfName);

logger.info("Create new table: {}", cfm);
announce(SchemaKeyspace.makeCreateTableMutation(ksm, cfm, timestamp), 
announceLocally);
{code}
Node2 will send out its own set of schema mutations as normal via announce(). 
On all nodes that receive this change, and locally on Node2, they will write 
the schema changes to disk 
(*org/apache/cassandra/schema/SchemaKeyspace.java:1390)* before attempting to 
merge them with their live schema. When attempting to merge the changes with 
their live schema 
*org.apache.cassandra.config.CFMetaData#validateCompatibility* will throw a 
configuration exception and stop the new change being merged. The changes 
written to disk are not rolled back.

All nodes will continue to use the table definition in their live schema 
(cf_id1) and everything will continue to work as expected as if the second 
CREATE TABLE statement was ignored. The issue is that all nodes now have the 
wrong column family ID recorded in their *system_schema.tables* system tables. 
When the nodes restart they will read their schema from disk and start using the 
wrong column family ID, at which point they will make a new empty folder on 
disk for it and you will start seeing the types of errors you've mentioned.

This of course isn't just limited to corrupting the column family ID but I 
believe this 

[jira] [Updated] (CASSANDRA-14957) Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision

2019-01-13 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14957:
--
Reproduced In: 3.0.10, 3.11.x  (was: 3.0.10)

> Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision
> 
>
> Key: CASSANDRA-14957
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14957
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Avraham Kalvo
>Priority: Major
>
> We were issuing a rolling restart on a mission-critical five node C* cluster.
> The first node which was restarted got the following messages in its 
> system.log:
> ```
> January 2nd 2019, 12:06:37.310 - INFO 12:06:35 Initializing 
> tasks_scheduler_external.tasks
> ```
> ```
> WARN 12:06:39 UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId bd7200a0-1567-11e8-8974-855d74ee356f. If a table was just created, this 
> is likely due to the schema not being fully propagated. Please wait for 
> schema agreement on table creation.
> at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:349)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:286)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> ```
> The latter was then repeated several times across the cluster.
> It was then found out that the table in question 
> `tasks_scheduler_external.tasks` was created with a new schema version 
> sometime along the entire cluster consecutive restart and became available 
> once the schema agreement settled, which started taking requests leaving the 
> previous version of the schema unavailable for any request, thus generating a 
> data loss to our online system.
> Data loss was recovered by manually copying SSTables from the previous 
> version directory of the schema to the new one followed by `nodetool refresh` 
> to the relevant table.
> The above has repeated itself for several tables across various keyspaces.
> One other thing to mention is that a repair was in place for the first node 
> to be restarted, which was obviously stopped as the daemon was shut down, but 
> this doesn't seem to do with the above at first glance.
> Seems somewhat related to:
> https://issues.apache.org/jira/browse/CASSANDRA-13559



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14957) Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision

2019-01-13 Thread Vincent White (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741791#comment-16741791
 ] 

Vincent White commented on CASSANDRA-14957:
---

I believe this can happen due to a race condition when issuing create statements. 
If a CREATE TABLE statement for the same name is issued on two different 
nodes at the same time, the series of events will look like the following:

Node1 will create and propagate the table as normal, with column family ID 
cf_id1.

If Node2 gets past the below check before receiving the schema change from 
Node1, then Node2 will continue executing its CREATE TABLE statement as 
normal, except with its own column family ID cf_id2.
{code:java|title=org/apache/cassandra/service/MigrationManager.java:378}
   // If we have a table or a view which has the same name, we can't add a 
new one
else if (throwOnDuplicate && ksm.getTableOrViewNullable(cfm.cfName) != 
null)
throw new AlreadyExistsException(cfm.ksName, cfm.cfName);

logger.info("Create new table: {}", cfm);
announce(SchemaKeyspace.makeCreateTableMutation(ksm, cfm, timestamp), 
announceLocally);
{code}
Node2 will send out its own set of schema mutations as normal via announce(). 
On all nodes that receive this change, and locally on Node2, they will write 
the schema changes to disk 
(*org/apache/cassandra/schema/SchemaKeyspace.java:1390)* before attempting to 
merge them with their live schema. When attempting to merge the changes with 
their live schema 
*org.apache.cassandra.config.CFMetaData#validateCompatibility* will throw a 
configuration exception and stop the new change being merged. The changes 
written to disk are not rolled back.

All nodes will continue to use the table definition in their live schema 
(cf_id1) and everything will continue to work as expected as if the second 
CREATE TABLE statement was ignored. The issue is that all nodes now have the 
wrong column family ID recorded in their *system_schema.tables* system tables. 
When the nodes restart they will read their schema from disk and start using the 
wrong column family ID, at which point they will make a new empty folder on 
disk for it and you will start seeing the types of errors you've mentioned.

This of course isn't just limited to corrupting the column family ID but I 
believe this can apply to any part of the column family definition.

I believe this is solved in trunk with the changes introduced as part of 
CASSANDRA-10699

> Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision
> 
>
> Key: CASSANDRA-14957
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14957
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Avraham Kalvo
>Priority: Major
>
> We were issuing a rolling restart on a mission-critical five node C* cluster.
> The first node which was restarted got the following messages in its 
> system.log:
> ```
> January 2nd 2019, 12:06:37.310 - INFO 12:06:35 Initializing 
> tasks_scheduler_external.tasks
> ```
> ```
> WARN 12:06:39 UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId bd7200a0-1567-11e8-8974-855d74ee356f. If a table was just created, this 
> is likely due to the schema not being fully propagated. Please wait for 
> schema agreement on table creation.
> at 
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:349)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:286)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
> ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
>  ~[apache-cassandra-3.0.10.jar:3.0.10]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
>  ~[apache

[jira] [Updated] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys

2019-01-08 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14365:
--
Description: 
In the old storage engine, static cells with a collection as part of the 
clustering key fail to validate because a 0 byte collection (like in the cell 
name of a static cell) isn't valid.

To reproduce:

1.
{code:java}
CREATE TABLE test.x (
id int,
id2 frozen>,
st int static,
PRIMARY KEY (id, id2)
);

INSERT INTO test.x (id, st) VALUES (1, 2);
{code}
2.
 Kill the cassandra process

3.
 Restart cassandra to replay the commitlog

Outcome:
{noformat}
ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting 
due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation3825739904516830950dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read a set
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:565)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:517)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:397)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:181) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:161) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:284) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:642) 
[main/:na]


{noformat}
I haven't investigated whether these cells failing to validate causes other, more 
subtle issues elsewhere in the code, but I believe the fix for this is to check 
for zero-length collections and accept them as valid, as we do with other types.

I haven't had a chance for any extensive testing, but this naive patch seems to 
have the desired effect. 


||Patch||
|[2.2 PoC 
Patch|https://github.com/vincewhite/cassandra/commits/zero_length_collection]|
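
A minimal standalone sketch of the kind of check the patch adds (illustrative only; the element-count width and method names are simplified, and this is not the actual serializer code):
{code:java}
import java.nio.ByteBuffer;

public class EmptyCollectionValidationSketch
{
    // Simplified stand-in for the set serializer's validation: a serialized set
    // starts with an element count, so a zero-length value used to fail with
    // "Not enough bytes to read a set" during commit log replay.
    static void validateSet(ByteBuffer bytes)
    {
        if (bytes.remaining() == 0)
            return; // the fix, in spirit: treat a zero-length value (such as the
                    // collection slot in a static cell name) as a valid empty collection
        if (bytes.remaining() < 4)
            throw new IllegalArgumentException("Not enough bytes to read a set");
        // ... element-by-element validation would follow here ...
    }

    public static void main(String[] args)
    {
        validateSet(ByteBuffer.allocate(0)); // zero-length static cell value: accepted after the fix

        ByteBuffer emptySet = ByteBuffer.allocate(4);
        emptySet.putInt(0);
        emptySet.flip();
        validateSet(emptySet);               // an explicitly serialized empty set: always accepted

        System.out.println("both values validated");
    }
}
{code}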


  was:
In the old storage engine, static cells with a collection as part of the 
clustering key fail to validate because a 0 byte collection (like in the cell 
name of a static cell) isn't valid.

To reproduce:

1.
{code:java}
CREATE TABLE test.x (
id int,
id2 frozen>,
st int static,
PRIMARY KEY (id, id2)
);

INSERT INTO test.x (id, st) VALUES (1, 2);
{code}
2.
 Kill the cassandra process

3.
 Restart cassandra to replay the commitlog

Outcome:
{noformat}
ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting 
due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation3825739904516830950dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read a set
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:565)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:517)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:397)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:181) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:161) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:284) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:642) 
[main/:na]


{noformat}
I haven't investigated if there are other more subtle issues caused by these 
cells failing to validate other places in the co

[jira] [Updated] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys

2019-01-08 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14365:
--
Description: 
In the old storage engine, static cells with a collection as part of the 
clustering key fail to validate because a 0 byte collection (like in the cell 
name of a static cell) isn't valid.

To reproduce:

1.
{code:java}
CREATE TABLE test.x (
id int,
id2 frozen>,
st int static,
PRIMARY KEY (id, id2)
);

INSERT INTO test.x (id, st) VALUES (1, 2);
{code}
2.
 Kill the cassandra process

3.
 Restart cassandra to replay the commitlog

Outcome:
{noformat}
ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting 
due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation3825739904516830950dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read a set
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:565)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:517)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:397)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:181) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:161) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:284) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:642) 
[main/:na]


{noformat}
I haven't investigated whether these cells failing to validate causes other, more 
subtle issues elsewhere in the code, but I believe the fix for this is to check 
for zero-length collections and accept them as valid, as we do with other types.

I haven't had a chance for any extensive testing, but this naive patch seems to 
have the desired effect. 


||Patch||
|[2.2 
PoC|https://github.com/vincewhite/cassandra/commits/zero_length_collection]|


  was:
In the old storage engine, static cells with a collection as part of the 
clustering key fail to validate because a 0 byte collection (like in the cell 
name of a static cell) isn't valid.

To reproduce:

1.
{code:java}
CREATE TABLE test.x (
id int,
id2 frozen>,
st int static,
PRIMARY KEY (id, id2)
);

INSERT INTO test.x (id, st) VALUES (1, 2);
{code}
2.
 Kill the cassandra process

3.
 Restart cassandra to replay the commitlog

Outcome:
{noformat}
ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting 
due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation3825739904516830950dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read a set
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:565)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:517)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:397)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:181) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:161) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:284) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:642) 
[main/:na]


{noformat}
I haven't investigated if there are other more subtle issues caused by these 
cells failing to validate other places in the code, bu

[jira] [Updated] (CASSANDRA-14948) Backport dropped column checks to 3.11

2019-01-02 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14948:
--
Fix Version/s: 3.11.x
   Status: Patch Available  (was: Open)

> Backport dropped column checks to 3.11
> --
>
> Key: CASSANDRA-14948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14948
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
> Fix For: 3.11.x
>
>
> This is a follow on from CASSANDRA-14913 and CASSANDRA-14843 that introduced 
> some fixes to prevent and mitigate data corruption caused by dropping a 
> column then re-adding it with the same name but an incompatible type (e.g. 
> simple int to a complex map<>) or different kind (regular/static). 
> This patch backports the checks that now exist in trunk. This does include 
> adding a column to the dropped_columns table to keep track of static columns 
> like trunk does; I'm not sure if we are able to make that change in 3.11.x. 
> Also not sure what our stance on backporting just the isValueCompatibleWith 
> check to 3.0 is. I'd be for it since it prevents recreating a simple column 
> as a map (or vice-versa) which will basically always lead to corruption.
> ||C* 3.11.x||
> |[Patch|https://github.com/vincewhite/cassandra/commit/3986b53b8acaf1d3691f9b35fd098a40667c520f]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14948) Backport dropped column checks to 3.11

2019-01-02 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14948:
-

 Summary: Backport dropped column checks to 3.11
 Key: CASSANDRA-14948
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14948
 Project: Cassandra
  Issue Type: Bug
Reporter: Vincent White
Assignee: Vincent White


This is a follow on from CASSANDRA-14913 and CASSANDRA-14843 that introduced 
some fixes to prevent and mitigate data corruption caused by dropping a column 
then re-adding it with the same name but an incompatible type (e.g. simple int 
to a complex map<>) or different kind (regular/static). 

This patch backports the checks that now exist in trunk. This does include 
adding a column to the dropped_columns table to keep track of static columns 
like trunk does; I'm not sure if we are able to make that change in 3.11.x. 

Also not sure what our stance on backporting just the isValueCompatibleWith 
check to 3.0 is. I'd be for it since it prevents recreating a simple column as 
a map (or vice-versa) which will basically always lead to corruption.

||C* 3.11.x||
|[Patch|https://github.com/vincewhite/cassandra/commit/3986b53b8acaf1d3691f9b35fd098a40667c520f]|
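
A toy sketch of the shape of the guard (illustrative only; hypothetical names, and far simpler than the real type-compatibility rules): re-adding a name that appears in dropped_columns is checked against what was dropped before the ALTER is accepted.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class ReAddDroppedColumnCheckSketch
{
    // Simplified record of a dropped column: its old CQL type and its kind.
    static final class Dropped
    {
        final String type;
        final boolean isStatic;
        Dropped(String type, boolean isStatic) { this.type = type; this.isStatic = isStatic; }
    }

    static final Map<String, Dropped> droppedColumns = new HashMap<>();

    // Toy compatibility rule: identical types only. The real check
    // (isValueCompatibleWith) is more permissive, but the shape of the guard
    // being backported is the same: look the name up in dropped_columns and
    // refuse incompatible re-additions.
    static void addColumn(String name, String type, boolean isStatic)
    {
        Dropped previous = droppedColumns.get(name);
        if (previous != null)
        {
            if (previous.isStatic != isStatic)
                throw new IllegalStateException("cannot re-add " + name + " as a different kind of column");
            if (!previous.type.equals(type))
                throw new IllegalStateException("cannot re-add " + name + " with incompatible type "
                                                + type + " (was " + previous.type + ")");
        }
        System.out.println("added column " + name + " " + type);
    }

    public static void main(String[] args)
    {
        droppedColumns.put("c", new Dropped("int", false));
        addColumn("d", "text", false);            // unrelated column: fine
        addColumn("c", "map<int, text>", false);  // rejected: int re-added as a map
    }
}
{code}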




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14564) issue while adding a column in a table with compact storage

2019-01-02 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14564:
--
 Assignee: Vincent White
Fix Version/s: 3.11.x
   Status: Patch Available  (was: Open)

> issue while adding a column in a table with compact storage   
> --
>
> Key: CASSANDRA-14564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14564
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, CQL
>Reporter: Laxmikant Upadhyay
>Assignee: Vincent White
>Priority: Major
> Fix For: 3.11.x
>
>
> I have upgraded my system from Cassandra 2.1.16 to 3.11.2. We had some tables 
> with COMPACT STORAGE enabled. We see some weird behaviour of Cassandra 
> while adding a column to them.
> Cassandra does not give any error while altering; however, the added column is 
> invisible. 
> The behaviour is the same when we create a new table with compact storage and 
> try to alter it. Below are the commands run in sequence: 
>  
> {code:java}
> x@cqlsh:xuser> CREATE TABLE xuser.employee(emp_id int PRIMARY KEY,emp_name 
> text, emp_city text, emp_sal varint, emp_phone varint ) WITH  COMPACT STORAGE;
> x@cqlsh:xuser> desc table xuser.employee ;
> CREATE TABLE xuser.employee (
> emp_id int PRIMARY KEY,
> emp_city text,
> emp_name text,
> emp_phone varint,
> emp_sal varint
> ) WITH COMPACT STORAGE
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';{code}
> Now altering the table by adding a new column:
>   
> {code:java}
> x@cqlsh:xuser>  alter table employee add profile text;
> x@cqlsh:xuser> desc table xuser.employee ;
> CREATE TABLE xuser.employee (
> emp_id int PRIMARY KEY,
> emp_city text,
> emp_name text,
> emp_phone varint,
> emp_sal varint
> ) WITH COMPACT STORAGE
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> notice that above desc table result does not have newly added column profile. 
> However when i try to add it again it gives column already exist;
> {code:java}
> x@cqlsh:xuser>  alter table employee add profile text;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> column name profile because it conflicts with an existing column"
> x@cqlsh:xuser> select emp_name,profile from employee;
>  emp_name | profile
> --+-
> (0 rows)
> x@cqlsh:xuser>
> {code}
> Inserting also behaves strangely:
> {code:java}
> x@cqlsh:xuser> INSERT INTO employee (emp_id , emp_city , emp_name , emp_phone 
> , emp_sal ,profile) VALUES ( 1, 'ggn', 'john', 123456, 5, 'SE');
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
> clustering keys are missing: column1"
> x@cqlsh:xuser> INSERT INTO employee (emp_id , emp_city , emp_name , emp_phone 
> , emp_sal ,profile,column1) VALUES ( 1, 'ggn', 'john', 123456, 5, 
> 'SE',null);
> x@cqlsh:xuser> select * from employee;
>  emp_id | emp_city | emp_name | emp_phone | emp_sal
> +--+--+---+-
> (0 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14564) issue while adding a column in a table with compact storage

2019-01-02 Thread Vincent White (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732694#comment-16732694
 ] 

Vincent White commented on CASSANDRA-14564:
---

Looks like the behaviour you're seeing is because Cassandra is running with 
assertions turned off (it's recommended they are left on for C*, even in production 
environments). Because non-dense compact storage columns are implemented 
as static columns in the new engine, with assertions turned on you would have 
seen an AssertionError from {{org/apache/cassandra/db/CompactTables.java:67}} 
when trying to add a non-static column. I've had a quick look and haven't 
seen any reason why we couldn't add new static columns to such a table. I haven't 
had a chance to run the full test suite, but here's a patch to transparently 
make the new column static. I figure it's OK to keep that transparent to the 
user rather than requiring them to add the {{static}} keyword, since these 
columns aren't listed as static anywhere else.  
||C* 3.11||
|[PoC 
Patch|https://github.com/vincewhite/cassandra/commit/88d432f349fdd49517352987b587dbf1354fcdd8]|
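
The gist of the patch, as a hedged standalone sketch (hypothetical names, not the actual AlterTableStatement code): for a non-dense COMPACT STORAGE table, a newly added column is silently given the STATIC kind, matching how the table's existing columns are represented internally.
{code:java}
public class CompactStorageAddColumnSketch
{
    enum Kind { REGULAR, STATIC }

    // Hypothetical version of the adjustment in the patch: in a non-dense
    // COMPACT STORAGE table every "regular" CQL column is really stored as a
    // static column, so a column added via ALTER TABLE must be made static too.
    static Kind kindForNewColumn(boolean isCompactTable, boolean isDense, Kind requested)
    {
        if (isCompactTable && !isDense && requested == Kind.REGULAR)
            return Kind.STATIC; // transparently promote; no STATIC keyword required from the user
        return requested;
    }

    public static void main(String[] args)
    {
        System.out.println(kindForNewColumn(true, false, Kind.REGULAR));  // STATIC
        System.out.println(kindForNewColumn(false, false, Kind.REGULAR)); // REGULAR
    }
}
{code}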

> issue while adding a column in a table with compact storage   
> --
>
> Key: CASSANDRA-14564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14564
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, CQL
>Reporter: Laxmikant Upadhyay
>Priority: Major
>
> I have upgraded my system from Cassandra 2.1.16 to 3.11.2. We had some tables 
> with COMPACT STORAGE enabled. We see some weird behaviour of Cassandra 
> while adding a column to them.
> Cassandra does not give any error while altering; however, the added column is 
> invisible. 
> The behaviour is the same when we create a new table with compact storage and 
> try to alter it. Below are the commands run in sequence: 
>  
> {code:java}
> x@cqlsh:xuser> CREATE TABLE xuser.employee(emp_id int PRIMARY KEY,emp_name 
> text, emp_city text, emp_sal varint, emp_phone varint ) WITH  COMPACT STORAGE;
> x@cqlsh:xuser> desc table xuser.employee ;
> CREATE TABLE xuser.employee (
> emp_id int PRIMARY KEY,
> emp_city text,
> emp_name text,
> emp_phone varint,
> emp_sal varint
> ) WITH COMPACT STORAGE
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';{code}
> Now altering the table by adding a new column:
>   
> {code:java}
> x@cqlsh:xuser>  alter table employee add profile text;
> x@cqlsh:xuser> desc table xuser.employee ;
> CREATE TABLE xuser.employee (
> emp_id int PRIMARY KEY,
> emp_city text,
> emp_name text,
> emp_phone varint,
> emp_sal varint
> ) WITH COMPACT STORAGE
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> notice that above desc table result does not have newly added column profile. 
> However when i try to add it again it gives column already exist;
> {code:java}
> x@cqlsh:xuser>  alter table employee add profile text;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> column name profile because it conflicts with an existing column"
> x@cqlsh:xuser> select emp_name,profile from employee;
>  emp_name | profile
> --+-
> (0 rows)
> x@cqlsh:xuser>
> {code}
> Inserting also behaves strangely:
> {code:java}
> x@cqlsh:xuser> INSERT INTO employee (emp_id , emp_city , emp_name , emp_phone 
> , emp_sal ,profile) VALUES ( 1, 'ggn', 'john', 123456, 5, 'SE');
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
> clustering keys are 

[jira] [Comment Edited] (CASSANDRA-14192) netstats information mismatch between senders and receivers

2018-12-02 Thread Vincent White (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344515#comment-16344515
 ] 

Vincent White edited comment on CASSANDRA-14192 at 12/3/18 4:58 AM:


This is because we now use RangeAwareSSTableWriter to write out the incoming 
streams to disk. Its getFilename method returns just the keyspace/table rather 
than a complete filename (since it can write out more than one file during its 
existence). This confuses the map of receivingFiles/sendingFiles in SessionInfo, 
which is keyed on the output filename. 

I have been planning an update to netstats to correctly output this information 
again. I'll update this ticket when I have something useful.
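
A simplified illustration of the undercounting (toy code, not the actual SessionInfo implementation): when every incoming file reports the same keyspace/table string as its "filename", a map keyed on that string collapses them all into a single entry.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class NetstatsProgressSketch
{
    public static void main(String[] args)
    {
        // Progress per incoming file, keyed by the name the writer reports.
        Map<String, Long> receivingFiles = new HashMap<>();

        // RangeAwareSSTableWriter reports keyspace/table rather than a concrete
        // sstable filename, so every incoming file maps to the same key...
        String reported = "keyspace1/standard1";
        for (int file = 1; file <= 405; file++)
            receivingFiles.put(reported, 42L * file);

        // ...and the progress output believes only one file is being received.
        System.out.println("files tracked: " + receivingFiles.size()); // 1, not 405
    }
}
{code}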


was (Author: vincentwhite):
This is because we now use RangeAwareSSTableWriter to write out the incoming 
streams to disk. Its getFilename method returns just the keyspace/table rather 
than a complete filename (since it can write out more than one file during it's 
existence). This confuses the map of receivingFiles/sendingFiles in SessionInfo 
which is keyed on the output filename. 

I have been planning an update to netstats to correctly output this information 
again. I'll update this ticket when I have someone useful.

> netstats information mismatch between senders and receivers
> ---
>
> Key: CASSANDRA-14192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14192
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability
>Reporter: Jonathan Ballet
>Assignee: Vincent White
>Priority: Minor
>
> When adding a new node to an existing cluster, the {{netstats}} command 
> called while the node is joining show different statistic values between the 
> node receiving the data and the nodes sending the data.
> Receiving node:
> {code}
> Mode: JOINING
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.13.184
> /172.20.30.7
> Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 
> GiB total
> [...]
> /172.20.40.128
> /172.20.16.45
> Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 
> GiB total
> [...]
> /172.20.9.63
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool NameActive   Pending  Completed   Dropped
> Large messages  n/a 0  0 0
> Small messages  n/a 0  11121 0
> Gossip messages n/a 0  32690 0
> {code}
> Sending node 1:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.21.19
> Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB 
> total
> [...]
> Read Repair Statistics:
> Attempted: 680832
> Mismatch (Blocking): 716
> Mismatch (Background): 279
> Pool NameActive   Pending  Completed   Dropped
> Large messages  n/a 2 123307 4
> Small messages  n/a 2  637010302   509
> Gossip messages n/a23 798851 11535
> {code}
> Sending node 2:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.21.19
> Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB 
> total
> [...]
> Read Repair Statistics:
> Attempted: 84967
> Mismatch (Blocking): 17568
> Mismatch (Background): 3078
> Pool NameActive   Pending  Completed   Dropped
> Large messages  n/a 2  17818 2
> Small messages  n/a 2  126082304   507
> Gossip messages n/a34 202810 11725
> {code}
> In this case, the join process is running since a while and the sending nodes 
> seem to say they sent everything already. This output stays the same for a 
> while though (maybe ~15% of the total joining time).
> However, the receiving node values stay like this once the sending nodes have 
> sent everything, until it goes from this state to the {{NORMAL}} state (so 
> there's visually no catching up from ~86 files to ~405 files for example, it 
> goes directly from the state showed above to {{NORMAL}})
> This makes tracking the progress of the join process a bit more difficult 
> than needed, because we need to compare and deduce the actual state from both 
> the receiving node values and the sending nodes values, which are both "not 
> correct" (sending nodes say everything has been sent but stays in this state 
> for a long time, receiving node says it still needs to download lot of 
> files/data before finishi

[jira] [Created] (CASSANDRA-14559) Check for endpoint collision with hibernating nodes

2018-07-05 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14559:
-

 Summary: Check for endpoint collision with hibernating nodes 
 Key: CASSANDRA-14559
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14559
 Project: Cassandra
  Issue Type: Bug
Reporter: Vincent White


I ran across an edge case when replacing a node with the same address. This 
issue results in the node (and its tokens) being unsafely removed from gossip.

Steps to replicate:

1. Create 3 node cluster.
2. Stop a node
3. Replace the stopped node with a node using the same address using the 
replace_address flag
4. Stop the node before it finishes bootstrapping
5. Remove the replace_address flag and restart the node to resume bootstrapping 
(if the data dir is also cleared at this point the node will also generate new 
tokens when it starts)
6. Stop the node before it finishes bootstrapping again
7. 30 seconds later the node will be removed from gossip because it now matches 
the fat client check

I think this is only an issue when replacing a node with the same address 
because other replacements now use STATUS_BOOTSTRAPPING_REPLACE and leave the 
dead node unchanged.

I believe the simplest fix for this is to add a check that prevents a 
non-bootstrapped node (without the replace_address flag) from starting if there is 
a gossip entry for the same address in the hibernate state.

[3.11 PoC 
|https://github.com/apache/cassandra/compare/trunk...vincewhite:check_for_hibernate_on_start]
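
For illustration, a rough sketch of the kind of check I mean (class/method placement 
and names here are hypothetical; the PoC branch above is the real thing):

{code:java}
// Sketch only: refuse to start an unbootstrapped, non-replacing node if gossip
// already holds a hibernating (dead-state) entry for our own address, instead of
// letting the FatClient check quietly remove that node and its tokens later.
InetAddress self = FBUtilities.getBroadcastAddress();
EndpointState epState = Gossiper.instance.getEndpointStateForEndpoint(self);
if (epState != null
    && !SystemKeyspace.bootstrapComplete()
    && !DatabaseDescriptor.isReplacing()
    && Gossiper.instance.isDeadState(epState))
{
    throw new ConfigurationException("Found a hibernating gossip entry for " + self +
                                     "; use replace_address to resume the replacement instead");
}
{code}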


 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-9652) Nodetool cleanup does not work for nodes taken out of replication

2018-07-03 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White resolved CASSANDRA-9652.
--
Resolution: Fixed

> Nodetool cleanup does not work for nodes taken out of replication
> -
>
> Key: CASSANDRA-9652
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9652
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Erick Ramirez
>Priority: Minor
> Fix For: 3.11.x
>
>
> After taking a node (DC) out of replication, running a cleanup does not get 
> rid of the data on the node. The SSTables remain on disk and no data is 
> cleared out.
> The following entry is recorded in {{system.log}}:
> {noformat}
>  INFO [CompactionExecutor:8] 2015-06-25 12:33:01,417 CompactionManager.java 
> (line 527) Cleanup cannot run before a node has joined the ring
> {noformat}
> *STEPS TO REPRODUCE*
> # Build a (C* 2.0.10) cluster with multiple DCs.
> # Run {{cassandra-stress -n1}}  to create schema.
> # Alter schema to replicate to all DCs.
> {noformat}
> cqlsh> ALTER KEYSPACE "Keyspace1" WITH replication = { 'class' : 
> 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 2, 'DC3' : 1 } ;
> {noformat}
> # Run {{cassandra-stress -n10}} to generate data.
> # Alter schema to stop replication to {{DC3}}.
> # On node in {{DC3}}, run {{nodetool cleanup}}.
> *WORKAROUND*
> # Stop Cassandra.
> # Manually delete the SSTables on disk.
> # Start Cassandra.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14477) The check of num_tokens against the length of inital_token in the yaml triggers unexpectedly

2018-05-30 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14477:
--
Status: Patch Available  (was: Open)

> The check of num_tokens against the length of inital_token in the yaml 
> triggers unexpectedly
> 
>
> Key: CASSANDRA-14477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14477
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Priority: Minor
>
> In CASSANDRA-10120 we added a check that compares num_tokens against the 
> number of tokens supplied in the yaml via initial_token. From my reading of 
> CASSANDRA-10120, it was intended to prevent Cassandra from starting if the yaml 
> contained contradictory values for num_tokens and initial_token, which should 
> help prevent misconfiguration via human error. The current behaviour appears to 
> differ slightly in that it performs this comparison regardless of whether 
> num_tokens is included in the yaml or not. Below are proposed patches to only 
> perform the check if both options are present in the yaml.
> ||Branch||
> |[3.0.x|https://github.com/apache/cassandra/compare/cassandra-3.0...vincewhite:num_tokens_30]|
> |[3.x|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:num_tokens_test_1_311]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14477) The check of num_tokens against the length of inital_token in the yaml triggers unexpectedly

2018-05-30 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14477:
-

 Summary: The check of num_tokens against the length of 
inital_token in the yaml triggers unexpectedly
 Key: CASSANDRA-14477
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14477
 Project: Cassandra
  Issue Type: Bug
Reporter: Vincent White


In CASSANDRA-10120 we added a check that compares num_tokens against the number 
of tokens supplied in the yaml via initial_token. From my reading of 
CASSANDRA-10120, it was intended to prevent Cassandra from starting if the yaml 
contained contradictory values for num_tokens and initial_token, which should help 
prevent misconfiguration via human error. The current behaviour appears to 
differ slightly in that it performs this comparison regardless of whether 
num_tokens is included in the yaml or not. Below are proposed patches to only 
perform the check if both options are present in the yaml.
||Branch||
|[3.0.x|https://github.com/apache/cassandra/compare/cassandra-3.0...vincewhite:num_tokens_30]|
|[3.x|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:num_tokens_test_1_311]|
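
To illustrate the intended behaviour (token values below are only examples): the 
check still fires when both options are present and contradict each other, but a 
yaml that only sets initial_token no longer trips it.

{code}
# still rejected: both options present and contradictory (1 token vs num_tokens: 256)
num_tokens: 256
initial_token: -9223372036854775808

# no longer rejected by this check: num_tokens is absent, only initial_token is set
initial_token: -9223372036854775808, -3074457345618258603, 3074457345618258602
{code}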



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14477) The check of num_tokens against the length of inital_token in the yaml triggers unexpectedly

2018-05-30 Thread Vincent White (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White reassigned CASSANDRA-14477:
-

Assignee: Vincent White

> The check of num_tokens against the length of inital_token in the yaml 
> triggers unexpectedly
> 
>
> Key: CASSANDRA-14477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14477
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
>
> In CASSANDRA-10120 we added a check that compares num_tokens against the 
> number of tokens supplied in the yaml via initial_token. From my reading of 
> CASSANDRA-10120, it was intended to prevent Cassandra from starting if the yaml 
> contained contradictory values for num_tokens and initial_token, which should 
> help prevent misconfiguration via human error. The current behaviour appears to 
> differ slightly in that it performs this comparison regardless of whether 
> num_tokens is included in the yaml or not. Below are proposed patches to only 
> perform the check if both options are present in the yaml.
> ||Branch||
> |[3.0.x|https://github.com/apache/cassandra/compare/cassandra-3.0...vincewhite:num_tokens_30]|
> |[3.x|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:num_tokens_test_1_311]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14073) Prevent replacement nodes from skipping bootstrapping without allow_unsafe_replace:true

2018-05-22 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486779#comment-16486779
 ] 

Vincent White edited comment on CASSANDRA-14073 at 5/23/18 6:31 AM:


I've realised I overlooked an issue with the purpose of this ticket. Closing 
and creating a follow up for a more correct/complete fix. CASSANDRA-14463



was (Author: vincentwhite):
I've realised I overlooked an issue with the purpose of this ticket. Since 
Closing and creating a follow up for a more correct/complete fix. 
CASSANDRA-14463


> Prevent replacement nodes from skipping bootstrapping without 
> allow_unsafe_replace:true
> ---
>
> Key: CASSANDRA-14073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14073
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
>  Labels: bootstrap, patch
>
> I've noticed that when replacing a node with replace_address it can skip 
> bootstrapping if it is listed in its own seed list. This probably shouldn't 
> be allowed without the allow_unsafe_replace option set to true as is required 
> when using auto_bootstrap: false in combination with replace_address. Patch 
> [here|https://github.com/vincewhite/cassandra/commits/replace_address_seed_list]
>  and an attempt at a dtest 
> [here|https://github.com/vincewhite/cassandra-dtest/commits/unsafe_replace]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14073) Prevent replacement nodes from skipping bootstrapping without allow_unsafe_replace:true

2018-05-22 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486779#comment-16486779
 ] 

Vincent White edited comment on CASSANDRA-14073 at 5/23/18 6:29 AM:


I've realised I overlooked an issue with the purpose of this ticket. Since 
Closing and creating a follow up for a more correct/complete fix. 
CASSANDRA-14463



was (Author: vincentwhite):
I've realised I overlooked an issue with the purpose of this ticket. Since 
Closing and creating a follow up for a more correct/complete fix. 
CASSANDRA-14073.

> Prevent replacement nodes from skipping bootstrapping without 
> allow_unsafe_replace:true
> ---
>
> Key: CASSANDRA-14073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14073
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
>  Labels: bootstrap, patch
>
> I've noticed that when replacing a node with replace_address it can skip 
> bootstrapping if it is listed in its own seed list. This probably shouldn't 
> be allowed without the allow_unsafe_replace option set to true as is required 
> when using auto_bootstrap: false in combination with replace_address. Patch 
> [here|https://github.com/vincewhite/cassandra/commits/replace_address_seed_list]
>  and an attempt at a dtest 
> [here|https://github.com/vincewhite/cassandra-dtest/commits/unsafe_replace]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14073) Prevent replacement nodes from skipping bootstrapping without allow_unsafe_replace:true

2018-05-22 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14073:
--
   Resolution: Incomplete
Fix Version/s: (was: 3.11.x)
   Status: Resolved  (was: Patch Available)

> Prevent replacement nodes from skipping bootstrapping without 
> allow_unsafe_replace:true
> ---
>
> Key: CASSANDRA-14073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14073
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
>  Labels: bootstrap, patch
>
> I've noticed that when replacing a node with replace_address it can skip 
> bootstrapping if it is listed in its own seed list. This probably shouldn't 
> be allowed without the allow_unsafe_replace option set to true as is required 
> when using auto_bootstrap: false in combination with replace_address. Patch 
> [here|https://github.com/vincewhite/cassandra/commits/replace_address_seed_list]
>  and an attempt at a dtest 
> [here|https://github.com/vincewhite/cassandra-dtest/commits/unsafe_replace]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14463) Prevent the generation of new tokens when using replace_address flag

2018-05-22 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14463:
--
Description: 
This is a follow up to/replacement of CASSANDRA-14073.

The behaviour that I want to avoid is someone trying to replace a node with the 
replace_address flag and mistakenly having that node listed in its own seed 
list which causes the node to generate a new set of random tokens before 
joining the ring. 

Currently anytime an unbootstrapped node is listed in its own seed list and 
initial_token isn't set in the yaml, Cassandra will generate a new set of 
random tokens and join the ring regardless of whether it was replacing a 
previous node or not. 

We could simply check for this configuration and refuse to start, but I think it's 
probably better (particularly for 3.0.X) if it's handled in the same manner as 
skipping streaming with the allow_unsafe_replace flag that was introduced in 
3.X. This would still allow 3.0.X users the ability to re-bootstrap nodes 
without needing to re-stream all the data to the node again, which can be 
useful. 

We currently handle replacing without streaming differently between 3.0.X and 
3.X. In 3.X we have the allow_unsafe_replace JVM flag to allow the use of 
auto_bootstrap: false in combination with the replace_address option. But in 
3.0.X to perform the replacement of a node with the same IP address without 
streaming I believe you need to:
 * Set replace_address (because the address is already in gossip)
 * Include the node in its own seed list (to skip bootstrapping/streaming)
 * Set the initial_token to the token/s owned by the previous node (to prevent 
it generating new tokens).

I believe if 3.0.X simply refused to start when a node has itself in its seed 
list and replace_address set, this would completely block this operation. 

Example patches to fix this edge case using allow_unsafe_replace:

 
||Branch||
|[3.0.x|https://github.com/apache/cassandra/compare/trunk...vincewhite:30-no_clobber]|
|[3.x|https://github.com/apache/cassandra/compare/trunk...vincewhite:311-no_clobber]|

  was:
This is a follow up to/replacement of CASSANDRA-14073.

The behaviour that I want to avoid is someone trying to replace a node with the 
replace_address flag and mistakenly having that node listed in its own seed 
list which causes the node to generate a new set of random tokens before 
joining the ring. 

Currently anytime an unbootstrapped node is listed in its own seed list and 
initial_token isn't set in the yaml, Cassandra will generate a new set of 
random tokens and join the ring regardless of whether it was replacing a 
previous node or not. 

We could simply check for this configuration and refuse to start but I it's 
probably better (particularly for 3.0.X) if it's handled in the same manner as 
skipping streaming with the allow_unsafe_replace flag that was introduced in 
3.X . This would still allow 3.0.X users the ability to re-bootstrap nodes 
without needing to re-stream all the data to the node again, which can be 
useful. 

We currently handle replacing without streaming different;y between 3.0.X and 
3.X. In 3.X we have the allow_unsafe_replace JVM flag to allow the use of 
auto_bootstrap: false in combination with the replace_address option.  But in 
3.0.X to perform the replacement of a node with the same IP address without 
streaming I believe you need to:
 * Set replace_address (because the address is already in gossip)
 * Include the node in its own seed list (to skip bootstrapping/streaming)
 * Set the initial_token to the token/s owned by the previous node (to prevent 
it generating new tokens.

I believe if 3.0.X simply refused to start when a node has itself in its seed 
list and replace_address set this will completely block this operation. 

Example patches to fix this edge case using allow_unsafe_replace:

 
||Branch||
|[3.0.x\|https://github.com/apache/cassandra/compare/trunk...vincewhite:30-no_clobber]|
|[3.x\|https://github.com/apache/cassandra/compare/trunk...vincewhite:311-no_clobber]|


> Prevent the generation of new tokens when using replace_address flag
> 
>
> Key: CASSANDRA-14463
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14463
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Priority: Minor
>
> This is a follow up to/replacement of CASSANDRA-14073.
> The behaviour that I want to avoid is someone trying to replace a node with 
> the replace_address flag and mistakenly having that node listed in its own 
> seed list which causes the node to generate a new set of random tokens before 
> joining the ring. 
> Currently anytime an unbootstrapped node is listed in its own seed list and 
> initial_token isn't set in the yaml, Cassandra will generate a new set 

[jira] [Commented] (CASSANDRA-14073) Prevent replacement nodes from skipping bootstrapping without allow_unsafe_replace:true

2018-05-22 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486779#comment-16486779
 ] 

Vincent White commented on CASSANDRA-14073:
---

I've realised I overlooked an issue with the purpose of this ticket. Closing 
and creating a follow up for a more correct/complete fix: CASSANDRA-14463.

> Prevent replacement nodes from skipping bootstrapping without 
> allow_unsafe_replace:true
> ---
>
> Key: CASSANDRA-14073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14073
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Minor
>  Labels: bootstrap, patch
> Fix For: 3.11.x
>
>
> I've noticed that when replacing a node with replace_address it can skip 
> bootstrapping if it is listed in its own seed list. This probably shouldn't 
> be allowed without the allow_unsafe_replace option set to true as is required 
> when using auto_bootstrap: false in combination with replace_address. Patch 
> [here|https://github.com/vincewhite/cassandra/commits/replace_address_seed_list]
>  and an attempt at a dtest 
> [here|https://github.com/vincewhite/cassandra-dtest/commits/unsafe_replace]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14463) Prevent the generation of new tokens when using replace_address flag

2018-05-22 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14463:
-

 Summary: Prevent the generation of new tokens when using 
replace_address flag
 Key: CASSANDRA-14463
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14463
 Project: Cassandra
  Issue Type: Bug
Reporter: Vincent White


This is a follow up to/replacement of CASSANDRA-14073.

The behaviour that I want to avoid is someone trying to replace a node with the 
replace_address flag and mistakenly having that node listed in its own seed 
list which causes the node to generate a new set of random tokens before 
joining the ring. 

Currently anytime an unbootstrapped node is listed in its own seed list and 
initial_token isn't set in the yaml, Cassandra will generate a new set of 
random tokens and join the ring regardless of whether it was replacing a 
previous node or not. 

We could simply check for this configuration and refuse to start, but I think it's 
probably better (particularly for 3.0.X) if it's handled in the same manner as 
skipping streaming with the allow_unsafe_replace flag that was introduced in 
3.X. This would still allow 3.0.X users the ability to re-bootstrap nodes 
without needing to re-stream all the data to the node again, which can be 
useful. 

We currently handle replacing without streaming differently between 3.0.X and 
3.X. In 3.X we have the allow_unsafe_replace JVM flag to allow the use of 
auto_bootstrap: false in combination with the replace_address option. But in 
3.0.X to perform the replacement of a node with the same IP address without 
streaming I believe you need to:
 * Set replace_address (because the address is already in gossip)
 * Include the node in its own seed list (to skip bootstrapping/streaming)
 * Set the initial_token to the token/s owned by the previous node (to prevent 
it generating new tokens).

I believe if 3.0.X simply refused to start when a node has itself in its seed 
list and replace_address set, this would completely block this operation. 

Example patches to fix this edge case using allow_unsafe_replace:

 
||Branch||
|[3.0.x\|https://github.com/apache/cassandra/compare/trunk...vincewhite:30-no_clobber]|
|[3.x\|https://github.com/apache/cassandra/compare/trunk...vincewhite:311-no_clobber]|
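
For reference, roughly what the two configurations look like (addresses and tokens 
below are placeholders):

{code}
# 3.X: replace a node at the same address without streaming, explicitly and unsafely
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.1 -Dcassandra.allow_unsafe_replace=true"
# plus, in cassandra.yaml:
auto_bootstrap: false

# 3.0.X workaround (the configuration this ticket wants to stop clobbering tokens):
# replace_address set, the node listed in its own seed list, and
initial_token: <the token/s owned by the previous node>
{code}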



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys

2018-04-04 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14365:
-

 Summary: Commit log replay failure for static columns with 
collections in clustering keys
 Key: CASSANDRA-14365
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14365
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vincent White


In the old storage engine, static cells with a collection as part of the 
clustering key fail to validate because a 0 byte collection (like in the cell 
name of a static cell) isn't valid.

To reproduce:

1.
{code:java}
CREATE TABLE test.x (
id int,
id2 frozen>,
st int static,
PRIMARY KEY (id, id2)
);

INSERT INTO test.x (id, st) VALUES (1, 2);
{code}
2.
 Kill the cassandra process

3.
 Restart cassandra to replay the commitlog

Outcome:
{noformat}
ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting 
due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation3825739904516830950dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read a set
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:565)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:517)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:397)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:181) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:161) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:284) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:642) 
[main/:na]


{noformat}
I haven't investigated whether there are other, more subtle issues caused by these 
cells failing to validate in other places in the code, but I believe the fix for 
this is to check for 0-byte-length collections and accept them as valid, as we 
do with other types.

I haven't had a chance for any extensive testing, but this naive patch seems to 
have the desired effect. [2.2 PoC 
Patch|https://github.com/vincewhite/cassandra/commits/zero_length_collection]
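
As a rough sketch of the idea (simplified; not the PoC patch linked above):

{code:java}
// Sketch only: accept zero-length values up front, the way other types already
// tolerate empty buffers, instead of failing with "Not enough bytes to read a set".
public void validate(ByteBuffer bytes) throws MarshalException
{
    if (bytes == null || bytes.remaining() == 0)
        return; // e.g. the empty collection component in a static cell's name
    // existing size/element validation for non-empty collections stays as it is
    deserialize(bytes);
}
{code}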



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13797) RepairJob blocks on syncTasks

2018-03-01 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383119#comment-16383119
 ] 

Vincent White commented on CASSANDRA-13797:
---

After upgrading an ~18 node, vnode, multi-DC cluster from 3.11.0 to 3.11.1 we 
started seeing some nodes running hundreds of concurrent validation 
compactions; after rolling back it went back to 1 concurrent validation per CF. I 
haven't had a chance to reproduce it at that scale, but my local testing shows 
that if I have enough data, or just add a sleep(999) to validation 
compactions to simulate long validations, they continue to accumulate over a 
few seconds until the repair session has looped through all the common ranges. 



> RepairJob blocks on syncTasks
> -
>
> Key: CASSANDRA-13797
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13797
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 3.0.15, 3.11.1, 4.0
>
>
> The thread running {{RepairJob}} blocks while it waits for the validations it 
> starts to complete ([see 
> here|https://github.com/bdeggleston/cassandra/blob/9fdec0a82851f5c35cd21d02e8c4da8fc685edb2/src/java/org/apache/cassandra/repair/RepairJob.java#L185]).
>  However, the downstream callbacks (ie: the post-repair cleanup stuff) aren't 
> waiting for {{RepairJob#run}} to return, they're waiting for a result to be 
> set on RepairJob the future, which happens after the sync tasks have 
> completed. This post repair cleanup stuff also immediately shuts down the 
> executor {{RepairJob#run}} is running in. So in noop repair sessions, where 
> there's nothing to stream, I'm seeing the callbacks sometimes fire before 
> {{RepairJob#run}} wakes up, and causing an {{InterruptedException}} is thrown.
> I'm pretty sure this can just be removed, but I'd like a second opinion. This 
> appears to just be a holdover from before repair coordination became async. I 
> thought it might be doing some throttling by blocking, but each repair 
> session gets it's own executor, and validation is  throttled by the fixed 
> size executors doing the actual work of validation, so I don't think we need 
> to keep this around.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13797) RepairJob blocks on syncTasks

2018-03-01 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382839#comment-16382839
 ] 

Vincent White commented on CASSANDRA-13797:
---

Now that we don't wait for the validations of each repair job to finish before 
moving onto the next one, I don't see anything to stop the repair coordinator 
from spinning through all the token ranges and effectively triggering all the 
validation tasks at once, which could be a significant number of validation 
compactions on each node depending on your topology and common ranges for that 
keyspace. I'm also not sure of the overhead of creating all the 
futures/listeners on the coordinator at once in this case. 

In 3.x the validation executor thread pool has no size limit, so a new validation 
is started as soon as a validation request is received. I admit I haven't 
caught up on the changes to repair in trunk, and while the validation executor 
pool size is configurable in trunk, its default is still Integer.MAX_VALUE.

I understand this same effect (hundreds of concurrent validations) can still 
happen if you trigger a repair across a keyspace with a large number of column 
families, but with this change there is no way of avoiding it without using 
subrange repairs on a single column family (if you have a topology/replication 
that can't be merged into a small number of common ranges).
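
To be concrete about the kind of bounding I mean (purely illustrative, not a 
proposed patch):

{code:java}
// With an effectively unbounded executor every incoming validation request starts
// compacting immediately; with a bounded pool, extra requests simply queue.
ExecutorService unbounded = Executors.newCachedThreadPool(); // roughly today's "no size limit" behaviour
ExecutorService bounded   = Executors.newFixedThreadPool(4); // at most 4 validations compacting at once
{code}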


> RepairJob blocks on syncTasks
> -
>
> Key: CASSANDRA-13797
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13797
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 3.0.15, 3.11.1, 4.0
>
>
> The thread running {{RepairJob}} blocks while it waits for the validations it 
> starts to complete ([see 
> here|https://github.com/bdeggleston/cassandra/blob/9fdec0a82851f5c35cd21d02e8c4da8fc685edb2/src/java/org/apache/cassandra/repair/RepairJob.java#L185]).
>  However, the downstream callbacks (ie: the post-repair cleanup stuff) aren't 
> waiting for {{RepairJob#run}} to return, they're waiting for a result to be 
> set on RepairJob the future, which happens after the sync tasks have 
> completed. This post repair cleanup stuff also immediately shuts down the 
> executor {{RepairJob#run}} is running in. So in noop repair sessions, where 
> there's nothing to stream, I'm seeing the callbacks sometimes fire before 
> {{RepairJob#run}} wakes up, and causing an {{InterruptedException}} is thrown.
> I'm pretty sure this can just be removed, but I'd like a second opinion. This 
> appears to just be a holdover from before repair coordination became async. I 
> thought it might be doing some throttling by blocking, but each repair 
> session gets it's own executor, and validation is  throttled by the fixed 
> size executors doing the actual work of validation, so I don't think we need 
> to keep this around.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14204) Nodetool garbagecollect AssertionError

2018-01-30 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14204:
--
Status: Patch Available  (was: Open)

> Nodetool garbagecollect AssertionError
> --
>
> Key: CASSANDRA-14204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14204
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Priority: Minor
> Fix For: 3.11.x, 4.x
>
>
> When manually running a garbage collection compaction across a table with 
> unrepaired sstables and only_purge_repaired_tombstones set to true an 
> assertion error is thrown. This is because the unrepaired sstables aren't 
> being removed from the transaction as they are filtered out in 
> filterSSTables().
> ||3.11||trunk||
> |[branch|https://github.com/vincewhite/cassandra/commit/e13c822736edd3df3403c02e8ef90816f158cde2]|[branch|https://github.com/vincewhite/cassandra/commit/cc8828576404e72504d9b334be85f84c90e77aa7]|
> The stacktrace:
> {noformat}
> -- StackTrace --
> java.lang.AssertionError
>   at 
> org.apache.cassandra.db.compaction.CompactionManager.parallelAllSSTableOperation(CompactionManager.java:339)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager.performGarbageCollection(CompactionManager.java:476)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.garbageCollect(ColumnFamilyStore.java:1579)
>   at 
> org.apache.cassandra.service.StorageService.garbageCollect(StorageService.java:3069)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
>   at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>   at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>   at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>   at sun.rmi.transport.Transport$1.run(Transport.java:200)
>   at sun.rmi.transport.Transport$1.run(Transport.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>   at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}

[jira] [Created] (CASSANDRA-14204) Nodetool garbagecollect AssertionError

2018-01-30 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14204:
-

 Summary: Nodetool garbagecollect AssertionError
 Key: CASSANDRA-14204
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14204
 Project: Cassandra
  Issue Type: Bug
Reporter: Vincent White
 Fix For: 3.11.x, 4.x


When manually running a garbage collection compaction across a table with 
unrepaired sstables and only_purge_repaired_tombstones set to true, an assertion 
error is thrown. This is because the unrepaired sstables aren't being removed 
from the transaction as they are filtered out in filterSSTables().
||3.11||trunk||
|[branch|https://github.com/vincewhite/cassandra/commit/e13c822736edd3df3403c02e8ef90816f158cde2]|[branch|https://github.com/vincewhite/cassandra/commit/cc8828576404e72504d9b334be85f84c90e77aa7]|
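
A sketch of the direction of the fix (illustrative; variable names are placeholders 
and the real change is in the branches above): whatever filterSSTables() drops also 
has to be released from the lifecycle transaction.

{code:java}
// Sketch only: sstables filtered out of the operation must not stay in the
// transaction, otherwise the final accounting assertion in
// parallelAllSSTableOperation() trips.
Set<SSTableReader> notOperatedOn = Sets.difference(ImmutableSet.copyOf(transaction.originals()),
                                                   ImmutableSet.copyOf(filteredSSTables));
transaction.cancel(notOperatedOn);
{code}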

The stacktrace:
{noformat}
-- StackTrace --
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.CompactionManager.parallelAllSSTableOperation(CompactionManager.java:339)
at 
org.apache.cassandra.db.compaction.CompactionManager.performGarbageCollection(CompactionManager.java:476)
at 
org.apache.cassandra.db.ColumnFamilyStore.garbageCollect(ColumnFamilyStore.java:1579)
at 
org.apache.cassandra.service.StorageService.garbageCollect(StorageService.java:3069)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14192) netstats information mismatch between senders and receivers

2018-01-29 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344515#comment-16344515
 ] 

Vincent White commented on CASSANDRA-14192:
---

This is because we now use RangeAwareSSTableWriter to write out the incoming 
streams to disk. Its getFilename method returns just the keyspace/table rather 
than a complete filename (since it can write out more than one file during its 
existence). This confuses the receivingFiles/sendingFiles maps in SessionInfo, 
which are keyed on the output filename. 

I have been planning an update to netstats to correctly output this information 
again. I'll update this ticket when I have something useful.
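
Roughly what goes wrong, shown with a plain map standing in for SessionInfo's 
per-file progress tracking (the paths are made up):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class ProgressKeying
{
    public static void main(String[] args)
    {
        // Stand-in for SessionInfo's receivingFiles map, keyed by file name.
        Map<String, Long> receivingFiles = new HashMap<>();

        // Distinct file names accumulate, which is what netstats expects:
        receivingFiles.put("/data/ks/tbl/md-1-big-Data.db", 1024L);
        receivingFiles.put("/data/ks/tbl/md-2-big-Data.db", 2048L);
        System.out.println(receivingFiles.size()); // 2

        // But when every update is keyed by "ks/tbl" (what RangeAwareSSTableWriter's
        // getFilename returns), later files overwrite earlier ones and the count stalls:
        receivingFiles.clear();
        receivingFiles.put("ks/tbl", 1024L);
        receivingFiles.put("ks/tbl", 2048L);
        System.out.println(receivingFiles.size()); // 1
    }
}
{code}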

> netstats information mismatch between senders and receivers
> ---
>
> Key: CASSANDRA-14192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14192
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability
>Reporter: Jonathan Ballet
>Priority: Minor
>
> When adding a new node to an existing cluster, the {{netstats}} command 
> called while the node is joining show different statistic values between the 
> node receiving the data and the nodes sending the data.
> Receiving node:
> {code}
> Mode: JOINING
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.13.184
> /172.20.30.7
> Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 
> GiB total
> [...]
> /172.20.40.128
> /172.20.16.45
> Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 
> GiB total
> [...]
> /172.20.9.63
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool NameActive   Pending  Completed   Dropped
> Large messages  n/a 0  0 0
> Small messages  n/a 0  11121 0
> Gossip messages n/a 0  32690 0
> {code}
> Sending node 1:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.21.19
> Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB 
> total
> [...]
> Read Repair Statistics:
> Attempted: 680832
> Mismatch (Blocking): 716
> Mismatch (Background): 279
> Pool NameActive   Pending  Completed   Dropped
> Large messages  n/a 2 123307 4
> Small messages  n/a 2  637010302   509
> Gossip messages n/a23 798851 11535
> {code}
> Sending node 2:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.21.19
> Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB 
> total
> [...]
> Read Repair Statistics:
> Attempted: 84967
> Mismatch (Blocking): 17568
> Mismatch (Background): 3078
> Pool NameActive   Pending  Completed   Dropped
> Large messages  n/a 2  17818 2
> Small messages  n/a 2  126082304   507
> Gossip messages n/a34 202810 11725
> {code}
> In this case, the join process is running since a while and the sending nodes 
> seem to say they sent everything already. This output stays the same for a 
> while though (maybe ~15% of the total joining time).
> However, the receiving node values stay like this once the sending nodes have 
> sent everything, until it goes from this state to the {{NORMAL}} state (so 
> there's visually no catching up from ~86 files to ~405 files for example, it 
> goes directly from the state showed above to {{NORMAL}})
> This makes tracking the progress of the join process a bit more difficult 
> than needed, because we need to compare and deduce the actual state from both 
> the receiving node values and the sending nodes values, which are both "not 
> correct" (sending nodes say everything has been sent but stays in this state 
> for a long time, receiving node says it still needs to download lot of 
> files/data before finishing.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14092) Max ttl of 20 years will overflow localDeletionTime

2018-01-29 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343619#comment-16343619
 ] 

Vincent White commented on CASSANDRA-14092:
---

I haven't had a chance to completely catch up on the ML discussion, but I 
thought I would post a proof-of-concept branch I had that promotes 
localDeletionTime to long on trunk. The code definitely isn't meant to be a 
final product, but it might be a useful starting point regardless of whether we 
actually decide to go this route or not. It's fairly straightforward, and 
unfortunately most of the code went towards undoing optimizations introduced to 
tombstone histograms in CASSANDRA-13444. I put together the majority of this 
last year, so I may be forgetting some glaring issues, but I believe it was 
basically complete minus a few unit tests to be cleaned up and tools updated.

One outstanding issue I do recall is related to EXPIRED_LIVENESS_TTL, which is 
currently Integer.MAX_VALUE but sounds like it should be resolved/removed at 
some point by CASSANDRA-13826.
||trunk||
|[branch|https://github.com/vincewhite/cassandra/commit/364f9ac848ae54eae9a1360d72aad4ba0a2b63a8]|
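
For anyone skimming, the arithmetic behind the overflow (using the 20-year cap from 
CASSANDRA-4771):

{code:java}
int  nowInSeconds = (int) (System.currentTimeMillis() / 1000); // ~1,516,000,000 in January 2018
int  maxTtl       = 20 * 365 * 24 * 60 * 60;                   // 630,720,000 seconds
long expiresAt    = (long) nowInSeconds + maxTtl;
// localDeletionTime is an int, so it overflows once nowInSeconds passes
// Integer.MAX_VALUE - maxTtl = 1,516,763,647, i.e. late January 2018.
{code}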

> Max ttl of 20 years will overflow localDeletionTime
> ---
>
> Key: CASSANDRA-14092
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14092
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Blocker
> Fix For: 2.1.20, 2.2.12, 3.0.16, 3.11.2
>
>
> CASSANDRA-4771 added a max value of 20 years for ttl to protect against [year 
> 2038 overflow bug|https://en.wikipedia.org/wiki/Year_2038_problem] for 
> {{localDeletionTime}}.
> It turns out that next year the {{localDeletionTime}} will start overflowing 
> with the maximum ttl of 20 years ({{System.currentTimeMillis() + ttl(20 
> years) > Integer.MAX_VALUE}}), so we should remove this limitation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14169) Trivial intellij junit run fix

2018-01-15 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326733#comment-16326733
 ] 

Vincent White commented on CASSANDRA-14169:
---

I wonder if we should include the rest of the parameters that are normally 
included by build.xml e.g. 

{code:java}
 
 
 
 {code}


I don't know if it should be its own ticket, but I also noticed that the 
exception message isn't particularly helpful since it outputs the wrong variable:

{code: title=org.apache.cassandra.io.sstable.LegacySSTableTest#defineSchema | 
java}
String scp = System.getProperty(LEGACY_SSTABLE_PROP);
Assert.assertNotNull("System property " + LEGACY_SSTABLE_ROOT + " not 
set", scp);
{code}

I believe it is meant to be:

{code: title=org.apache.cassandra.io.sstable.LegacySSTableTest#defineSchema | 
java}
String scp = System.getProperty(LEGACY_SSTABLE_PROP);
Assert.assertNotNull("System property " + LEGACY_SSTABLE_PROP + " not 
set", scp);
{code}

> Trivial intellij junit run fix
> --
>
> Key: CASSANDRA-14169
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14169
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Trivial
>
> Unable to run 
> {{[LegacySSTableTest|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java#L63]}}
>  in the Intellij, because the 
> {{[legacy-sstable-root|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java#L96]}}
>  is not defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14132) sstablemetadata incorrect date string for "EncodingStats minLocalDeletionTime:"

2017-12-20 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14132:
--
Fix Version/s: 4.x
   Status: Patch Available  (was: Open)

> sstablemetadata incorrect date string for "EncodingStats 
> minLocalDeletionTime:"
> ---
>
> Key: CASSANDRA-14132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14132
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Priority: Trivial
> Fix For: 4.x
>
>
> There is a unit mismatch in the output of EncodingStats 
> minLocalDeletionTime. EncodingStats.minLocalDeletionTime is stored in seconds 
> but is being interpreted as milliseconds when converted to a date string. 
> Patch: 
> [Trunk|https://github.com/vincewhite/cassandra/commit/fa9ef1dede3067dffb65042ed4bdca08de042a0e]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14132) sstablemetadata incorrect date string for "EncodingStats minLocalDeletionTime:"

2017-12-20 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14132:
-

 Summary: sstablemetadata incorrect date string for "EncodingStats 
minLocalDeletionTime:"
 Key: CASSANDRA-14132
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14132
 Project: Cassandra
  Issue Type: Bug
Reporter: Vincent White
Priority: Trivial


There is a unit mismatch in the output of EncodingStats 
minLocalDeletionTime. EncodingStats.minLocalDeletionTime is stored in seconds 
but is being interpreted as milliseconds when converted to a date string. 

Patch: 
[Trunk|https://github.com/vincewhite/cassandra/commit/fa9ef1dede3067dffb65042ed4bdca08de042a0e]
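
A minimal, self-contained illustration of the mismatch (the timestamp value is 
arbitrary):

{code:java}
import java.util.Date;

public class DeletionTimeUnits
{
    public static void main(String[] args)
    {
        int minLocalDeletionTime = 1513728000;                     // stored in *seconds* since the epoch
        System.out.println(new Date(minLocalDeletionTime));         // wrong: seconds fed in as millis, prints a date in January 1970
        System.out.println(new Date(minLocalDeletionTime * 1000L)); // right: scale to milliseconds first
    }
}
{code}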



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14099) LCS ordering of sstables by timestamp is inverted

2017-12-12 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14099:
--
Status: Patch Available  (was: Open)

I've created a patch that splits this comparator in two as a way to maybe 
help avoid this confusion in the future. Now that this is split into two, I'm 
not sure if a unit test for ageSortedSSTables (or the comparators themselves) 
would be required? I have included a unit test for ageSortedSSTables on my 
3.0.x branch; I'm not sure if it's worth making ageSortedSSTables() public just for 
this, but I didn't see anywhere else where its behaviour was visible.

[3.0 patch | 
https://github.com/vincewhite/cassandra/commits/14099_timestamp_comparators_30]
[3.0 utest | 
https://github.com/vincewhite/cassandra/commit/5ab1ff36a28b41039bd93de7d47b4131e1c2dfaa]
[3.x patch | 
https://github.com/vincewhite/cassandra/commits/14099_timestamp_comparators_311]
[trunk patch | 
https://github.com/vincewhite/cassandra/commits/14099_timestamp_comparators_trunk]
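
Roughly, the split looks like this (a sketch; names on the actual branches may differ):

{code:java}
// One comparator per ordering, so callers can't accidentally grab the wrong one.
public static final Comparator<SSTableReader> maxTimestampDescending =
    (o1, o2) -> Long.compare(o2.getMaxTimestamp(), o1.getMaxTimestamp()); // newest first, for reads
public static final Comparator<SSTableReader> maxTimestampAscending =
    (o1, o2) -> Long.compare(o1.getMaxTimestamp(), o2.getMaxTimestamp()); // oldest first, for LCS ageSortedSSTables()
{code}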



> LCS ordering of sstables by timestamp is inverted
> -
>
> Key: CASSANDRA-14099
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14099
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Jeff Jirsa
>Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In CASSANDRA-14010 we discovered that CASSANDRA-13776 broke sstable ordering 
> by timestamp (inverted it accidentally). Investigating that revealed that the 
> comparator was expecting newest-to-oldest for read command, but LCS expects 
> oldest-to-newest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14027) consistent_range_movement_false_with_rf1_should_succeed_test - bootstrap_test.TestBootstrap

2017-12-05 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279587#comment-16279587
 ] 

Vincent White commented on CASSANDRA-14027:
---

I had this test running against trunk for a few hours and didn't see any 
failures.

> consistent_range_movement_false_with_rf1_should_succeed_test - 
> bootstrap_test.TestBootstrap
> ---
>
> Key: CASSANDRA-14027
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14027
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>
> 15 Nov 2017 11:22:19 [node2] Timed out waiting for 
> /tmp/dtest-3SeOAb/test/node2/logs/system.log to be created.
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-3SeOAb
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/bootstrap_test.py", line 239, in 
> consistent_range_movement_false_with_rf1_should_succeed_test
> self._bootstrap_test_with_replica_down(False, rf=1)
>   File "/home/cassandra/cassandra-dtest/bootstrap_test.py", line 261, in 
> _bootstrap_test_with_replica_down
> cluster.start()
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", 
> line 415, in start
> node.watch_log_for(start_message, timeout=kwargs.get('timeout',60), 
> process=p, verbose=verbose, from_mark=mark)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 449, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Timed out waiting for {} to be created.".format(log_file))
> "15 Nov 2017 11:22:19 [node2] Timed out waiting for 
> /tmp/dtest-3SeOAb/test/node2/logs/system.log to be 
> created.\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-3SeOAb\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2017-11-30 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273709#comment-16273709
 ] 

Vincent White commented on CASSANDRA-14013:
---

I've created a patch for 3.0.x and trunk using the same method. I guess it 
should be safe to work with just absolute paths rather than canonical paths 
here; I haven't made that change on the 3.x.x patches yet. I also had to fiddle 
with the unit tests since there is now a dependency on DatabaseDescriptor and 
on passing in file paths that exist in the configured data directory.

[3.0.x|https://github.com/vincewhite/cassandra/commits/14013-30]
[3.11.x|https://github.com/vincewhite/cassandra/commits/14013-test]
[trunk|https://github.com/vincewhite/cassandra/commits/14013-trunk]
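
For clarity, the absolute-vs-canonical distinction mentioned above (the path is just 
an example):

{code:java}
import java.io.File;
import java.io.IOException;

public class PathKinds
{
    public static void main(String[] args) throws IOException
    {
        File f = new File("/var/lib/cassandra/data/../data/snapshots/test_idx");
        System.out.println(f.getAbsoluteFile());  // keeps the ".." - a purely lexical operation
        System.out.println(f.getCanonicalFile()); // resolves ".." and symlinks against the real filesystem, can throw IOException
    }
}
{code}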



> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Gregor Uhlenheuer
>Assignee: Vincent White
>
> I am posting this bug in hope to discover the stupid mistake I am doing 
> because I can't imagine a reasonable answer for the behavior I see right now 
> :-)
> In short words, I do observe data loss in a keyspace called *snapshots* after 
> restarting the Cassandra service. Say I do have 1000 records in a table 
> called *snapshots.test_idx* then after restart the table has less entries or 
> is even empty.
> My kind of "mysterious" observation is that it happens only in a keyspace 
> called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not 
> every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill 
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am doing :-)
> This happened to me using both Cassandra 3.9 and 3.11.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14047) test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: Missing: ['127.0.0.3.* now UP']:

2017-11-29 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14047:
--
Attachment: trunk-patch_passes_test-debug.log
3_11-debug.log
trunk-debug.log
trunk-debug.log-2

On 3.11 I still see the {{UnknownColumnFamilyException}}, which does cause 
the test to fail because it triggers the "Unexpected error in log" assertion 
when tearing down the test. Strangely, on trunk with my patch the test passes and 
doesn't hit this even though the logs still contain the 
UnknownColumnFamilyException (not sure if that's related to C* version-specific 
config in dtests or something). 

So the netty issue is unrelated to the flakiness of this test; not sure if it 
should have its own ticket? I've attached a few sets of debug logs that 
demonstrate the various behaviours with/without netty and with/without my patch 
from the previous comment. 

In regard to the test itself, it appears that the reads triggering the 
{{UnknownColumnFamilyException}} actually come from the initialisation of 
CassandraRoleManager, since they are for {{system_auth.roles}} (I believe 
{{hasExistingRoles()}} in {{setupDefaultRole()}}). I'm not exactly sure what 
the best way to resolve this is. This error isn't an issue for the role manager 
itself, as it will simply retry later, and it doesn't affect the tests apart from 
triggering the "unexpected error in log" check. For the tests I guess we could 
leave a gap between starting nodes, but it's probably more correct to just ignore 
these errors. I've tested that 
[this change|https://github.com/vincewhite/cassandra-dtest/commit/7e48704713123a253a914802975f7163474ede9b]
 resolves the failures, and I assume it's probably safe to ignore this 
error for all of the tests in consistency_test, but I haven't looked into that 
at this stage. 

Also, these tests don't do anything fancy in regard to how they start the 
cluster; they just use the normal {{cluster.start(wait_for_binary_proto=True, 
wait_other_notice=True)}} call, so I guess this could cause random failures in 
a lot of tests.

> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: 
> Missing: ['127.0.0.3.* now UP']:
> --
>
> Key: CASSANDRA-14047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14047
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Vincent White
> Attachments: 3_11-debug.log, trunk-debug.log, trunk-debug.log-2, 
> trunk-patch_passes_test-debug.log
>
>
> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: 
> Missing: ['127.0.0.3.* now UP']:
> 15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:
> INFO  [main] 2017-11-15 11:21:32,452 YamlConfigura.
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-v3VgyS
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing single dc, users, each quorum reads
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/tools/decorators.py", line 48, in 
> wrapped
> f(obj)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 621, in 
> test_simple_strategy_each_quorum_users
> 
> self._run_test_function_in_parallel(TestAccuracy.Validation.validate_users, 
> [self.nodes], [self.rf], combinations)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in 
> _run_test_function_in_parallel
> self._start_cluster(save_sessions=True, 
> requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 141, in 
> _start_cluster
> cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", 
> line 428, in start
> node.watch_log_for_alive(other_node, from_mark=mark)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 520, in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", 

[jira] [Updated] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2017-11-28 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14013:
--
Status: Awaiting Feedback  (was: In Progress)

> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Gregor Uhlenheuer
>Assignee: Vincent White
>
> I am posting this bug in hope to discover the stupid mistake I am doing 
> because I can't imagine a reasonable answer for the behavior I see right now 
> :-)
> In short words, I do observe data loss in a keyspace called *snapshots* after 
> restarting the Cassandra service. Say I do have 1000 records in a table 
> called *snapshots.test_idx* then after restart the table has less entries or 
> is even empty.
> My kind of "mysterious" observation is that it happens only in a keyspace 
> called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not 
> every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill 
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am doing :-)
> This happened to me using both Cassandra 3.9 and 3.11.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14047) test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: Missing: ['127.0.0.3.* now UP']:

2017-11-28 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268714#comment-16268714
 ] 

Vincent White commented on CASSANDRA-14047:
---

While looking into this, the only error I came across was related to joining 
nodes being sent read messages for column families they didn't know about yet, 
which looks like ...


{noformat}
WARN  [MessagingService-NettyInbound-Thread-3-1] 2017-11-28 12:40:41,123 
MessageInHandler.java:273 - Got message from unknown table while reading from 
socket; closing
org.apache.cassandra.exceptions.UnknownTableException: Couldn't find table with 
id 5bc52802-de25-35ed-aeab-188eecebb090. If a table was just created, this is 
likely due to the schema not being fully propagated. Please wait for schema 
agreement on table creation.
at 
org.apache.cassandra.schema.Schema.getExistingTableMetadata(Schema.java:474) 
~[main/:na]
at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:670)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:606)
 ~[main/:na]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:139) 
~[main/:na]
at 
org.apache.cassandra.net.async.MessageInHandler.decode(MessageInHandler.java:178)
 ~[main/:na]
at 
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]

{noformat}

...followed by 
{noformat}
WARN  [MessagingService-NettyInbound-Thread-3-1] 2017-11-28 12:40:41,193 
MessageInHandler.java:277 - Unexpected exception caught in inbound channel 
pipeline from /127.0.0.1:41522
java.lang.ArrayIndexOutOfBoundsException: 90
at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:657)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:606)
 ~[main/:na]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:139) 
~[main/:na]
at 
org.apache.cassandra.net.async.MessageInHandler.decode(MessageInHandler.java:178)
 ~[main/:na]
at 
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
 [netty-all-4.1.14.Final.jar:4.1.14.Final]
{noformat}

...as C* appears to get stuck (at least for a good amount of time) in a loop: 
after the initial exception, decode() keeps being called, the ReadCommand 
deserialization throws every time (because the initial exception left the read 
index into the buffer in an incorrect state), ctx.close() is called on the 
channel again, and the loop continues. 
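
As a rough illustration of that failure mode, here is a toy analogy in plain java.nio (definitely not the actual MessageInHandler/netty code): once the first error leaves the read position mid-frame, every retry starts at a garbage offset and fails again.
{code:java}
import java.nio.ByteBuffer;

public class DecodeLoopSketch
{
    // Toy wire format: [int length][length bytes of payload].
    static void decodeOne(ByteBuffer in)
    {
        int length = in.getInt();                 // advances the read position by 4
        if (length > in.remaining())              // stand-in for the initial deserialization error
            throw new IllegalStateException("bad frame length " + length);
        in.position(in.position() + length);      // consume the payload on success
    }

    public static void main(String[] args)
    {
        ByteBuffer in = ByteBuffer.allocate(64);
        in.putInt(1000);                          // declared length far larger than the buffer
        while (in.hasRemaining())
            in.put((byte) 0x7F);                  // garbage bytes standing in for the rest of the frame
        in.flip();

        // Nothing below resets or resynchronises the buffer after a failure, so each
        // retry reads another bogus "length" from the middle of the old frame and fails too.
        for (int attempt = 1; attempt <= 3 && in.remaining() >= 4; attempt++)
        {
            try
            {
                decodeOne(in);
            }
            catch (IllegalStateException e)
            {
                System.out.println("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
    }
}
{code}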

I probably don't have enough netty knowledge to propose a good fix for this but 
[this|https://github.com/apache/cassandra/compare/trunk...vincewhite:debug-1] 
appeared to help, though I didn't get time to look too deeply into the new 
behaviour. Maybe someone like [~jasobrown] would have a more correct fix. 

I suspect this probably affects most of these consistency tests. 

> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: 
> Missing: ['127.0.0.3.* now UP']:
> --
>
> Key: CASSANDRA-14047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14047
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Vincent White
>
> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails: 
> Missing: ['127.0.0.3.* now UP']:
> 15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:
> INFO  [main] 2017-11-15 11:21:32,452 YamlConfigura.
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-v3VgyS
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing single dc, users, each quorum reads
> - >> end captured logging << -
>   F

[jira] [Updated] (CASSANDRA-14073) Prevent replacement nodes from skipping bootstrapping without allow_unsafe_replace:true

2017-11-27 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14073:
--
Fix Version/s: 3.11.2
   Status: Patch Available  (was: Open)

> Prevent replacement nodes from skipping bootstrapping without 
> allow_unsafe_replace:true
> ---
>
> Key: CASSANDRA-14073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14073
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Vincent White
>Priority: Minor
>  Labels: patch
> Fix For: 3.11.2
>
>
> I've noticed that when replacing a node with replace_address it can skip 
> bootstrapping if it is listed in its own seed list. This probably shouldn't 
> be allowed without the allow_unsafe_replace option set to true as is required 
> when using auto_bootstrap: false in combination with replace_address. Patch 
> [here|https://github.com/vincewhite/cassandra/commits/replace_address_seed_list]
>  and an attempt at a dtest 
> [here|https://github.com/vincewhite/cassandra-dtest/commits/unsafe_replace]. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14073) Prevent replacement nodes from skipping bootstrapping without allow_unsafe_replace:true

2017-11-27 Thread Vincent White (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent White updated CASSANDRA-14073:
--
Since Version:   (was: 2.1.0)

> Prevent replacement nodes from skipping bootstrapping without 
> allow_unsafe_replace:true
> ---
>
> Key: CASSANDRA-14073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14073
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Vincent White
>Priority: Minor
>  Labels: patch
>
> I've noticed that when replacing a node with replace_address it can skip 
> bootstrapping if it is listed in its own seed list. This probably shouldn't 
> be allowed without the allow_unsafe_replace option set to true as is required 
> when using auto_bootstrap: false in combination with replace_address. Patch 
> [here|https://github.com/vincewhite/cassandra/commits/replace_address_seed_list]
>  and an attempt at a dtest 
> [here|https://github.com/vincewhite/cassandra-dtest/commits/unsafe_replace]. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14073) Prevent replacement nodes from skipping bootstrapping without allow_unsafe_replace:true

2017-11-27 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14073:
-

 Summary: Prevent replacement nodes from skipping bootstrapping 
without allow_unsafe_replace:true
 Key: CASSANDRA-14073
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14073
 Project: Cassandra
  Issue Type: Bug
  Components: Configuration
Reporter: Vincent White
Priority: Minor


I've noticed that when replacing a node with replace_address it can skip 
bootstrapping if it is listed in its own seed list. This probably shouldn't be 
allowed without the allow_unsafe_replace option set to true as is required when 
using auto_bootstrap: false in combination with replace_address. Patch 
[here|https://github.com/vincewhite/cassandra/commits/replace_address_seed_list]
 and an attempt at a dtest 
[here|https://github.com/vincewhite/cassandra-dtest/commits/unsafe_replace]. 
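For illustration, the guard I have in mind looks roughly like the sketch below (the method name, signature and where the check would live are assumptions for the sake of the example; this is not the linked patch itself):
{code:java}
import java.net.InetAddress;
import java.util.Set;

public class ReplaceGuardSketch
{
    // A replacement node that is in its own seed list currently skips bootstrap; require
    // the same explicit opt-in already used for auto_bootstrap:false + replace_address.
    static void checkReplacementIsSafe(boolean replacing, InetAddress broadcastAddress, Set<InetAddress> seeds)
    {
        boolean allowUnsafeReplace = Boolean.getBoolean("cassandra.allow_unsafe_replace");
        if (replacing && seeds.contains(broadcastAddress) && !allowUnsafeReplace)
            throw new RuntimeException("Replacing a node that is in its own seed list skips bootstrapping; " +
                                       "set -Dcassandra.allow_unsafe_replace=true to allow this");
    }
}
{code}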



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14063) Cassandra will start listening for clients without initialising system_auth after a failed bootstrap

2017-11-20 Thread Vincent White (JIRA)
Vincent White created CASSANDRA-14063:
-

 Summary: Cassandra will start listening for clients without 
initialising system_auth after a failed bootstrap
 Key: CASSANDRA-14063
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14063
 Project: Cassandra
  Issue Type: Bug
  Components: Auth
Reporter: Vincent White
Priority: Minor


This issue is closely related to CASSANDRA-11381. In this case, when a node 
joining the ring fails to complete the bootstrapping process due to a streaming 
failure, it will still always call 
org.apache.cassandra.service.CassandraDaemon#start and begin listening for 
client connections. If no authentication is configured, clients are able to 
connect to the node and query the cluster, much like write survey mode. But if 
authentication is enabled, this causes an NPE, because 
org.apache.cassandra.service.StorageService#doAuthSetup is only called after 
successfully completing the bootstrapping process. With the changes made in 
CASSANDRA-11381 we could now simply call doAuthSetup earlier, since we don't 
have to worry about calling it multiple times. But given the concerns raised 
about third-party authentication classes, and since the "Incremental 
Bootstrapping" described in CASSANDRA-8494 and CASSANDRA-8943 doesn't appear 
to be nearing implementation any time soon, I would prefer that bootstrapping 
nodes simply didn't start listening for clients following a failed 
bootstrapping attempt. 

I've attached a quick and naive patch that demonstrates my desired behaviour. I 
ended up creating a new variable for this for clarity, but I also had a bit of 
trouble finding existing state that wasn't tied up in more complicated or 
transient processes which I could use to determine this condition. I believe 
org.apache.cassandra.service.StorageService#isAuthSetupComplete would also work 
in this case, so we could tie it to that instead. If someone has something 
simpler, or knows the correct place I should be querying for this state, I 
welcome all feedback. 

This [patch|https://github.com/vincewhite/cassandra/commits/system_auth_npe] 
also doesn't really address enabling/disabling thrift/binary via nodetool once 
the node is running. I wasn't sure if we should disallow it completely or 
include a force flag.
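
A rough sketch of the behaviour I'm describing (hypothetical names such as {{bootstrapSucceeded}} and {{startClientTransports()}}; this only shows the control flow, it is not the attached patch):
{code:java}
public class FailedBootstrapSketch
{
    private boolean bootstrapSucceeded = false;

    void joinRing()
    {
        bootstrapSucceeded = bootstrap(); // streaming can fail here
        if (bootstrapSucceeded)
            doAuthSetup();                // today system_auth is only initialised on this path
    }

    void maybeStartClientTransports()
    {
        // Only accept client connections once bootstrap (and therefore auth setup) has
        // completed; after a failed bootstrap the node stays up for the operator but
        // does not listen for clients.
        if (bootstrapSucceeded)
            startClientTransports();
    }

    // Stubs standing in for the real StorageService/CassandraDaemon calls.
    private boolean bootstrap()           { return false; }
    private void doAuthSetup()            {}
    private void startClientTransports()  {}
}
{code}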


Cheers
-Vince



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14010) NullPointerException when creating keyspace

2017-11-16 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256229#comment-16256229
 ] 

Vincent White commented on CASSANDRA-14010:
---

I had a look at this and assumed that 
org.apache.cassandra.schema.SchemaKeyspace#fetchKeyspaceParams() was simply not 
getting any rows back when it queried the system_schema.keyspaces table. In 
fact, in my testing it was getting a row returned, but the row only contained the 
primary key and nulls for both other columns. I didn't dig too deep into this, 
but it doesn't seem like it should happen; it's probably worth someone with a 
more intimate knowledge of the read path taking a look. Also, on my machine I 
could only trigger this exception with multiple clients looping CREATE/DROP 
commands, and it was still relatively rare.  
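
For reference, the sort of client loop I mean looks roughly like this (written against the DataStax Java driver 3.x; the keyspace name, thread count and iteration count are arbitrary, and this is not a faithful copy of what I ran):
{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CreateDropLoop
{
    public static void main(String[] args) throws InterruptedException
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build())
        {
            // Each thread repeatedly creates and drops the same keyspace.
            Runnable loop = () -> {
                try (Session session = cluster.connect())
                {
                    for (int i = 0; i < 1000; i++)
                    {
                        session.execute("CREATE KEYSPACE IF NOT EXISTS ks_14010 WITH replication = " +
                                        "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                        session.execute("DROP KEYSPACE IF EXISTS ks_14010");
                    }
                }
            };

            Thread t1 = new Thread(loop), t2 = new Thread(loop), t3 = new Thread(loop);
            t1.start(); t2.start(); t3.start();
            t1.join();  t2.join();  t3.join();
        }
    }
}
{code}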


> NullPointerException when creating keyspace
> ---
>
> Key: CASSANDRA-14010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14010
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jonathan Pellby
>
> We have a test environment were we drop and create keyspaces and tables 
> several times within a short time frame. Since upgrading from 3.11.0 to 
> 3.11.1, we are seeing a lot of create statements failing. See the logs below:
> {code:java}
> 2017-11-13T14:29:20.037986449Z WARN Directory /tmp/ramdisk/commitlog doesn't 
> exist
> 2017-11-13T14:29:20.038009590Z WARN Directory /tmp/ramdisk/saved_caches 
> doesn't exist
> 2017-11-13T14:29:20.094337265Z INFO Initialized prepared statement caches 
> with 10 MB (native) and 10 MB (Thrift)
> 2017-11-13T14:29:20.805946340Z INFO Initializing system.IndexInfo
> 2017-11-13T14:29:21.934686905Z INFO Initializing system.batches
> 2017-11-13T14:29:21.973914733Z INFO Initializing system.paxos
> 2017-11-13T14:29:21.994550268Z INFO Initializing system.local
> 2017-11-13T14:29:22.014097194Z INFO Initializing system.peers
> 2017-11-13T14:29:22.124211254Z INFO Initializing system.peer_events
> 2017-11-13T14:29:22.153966833Z INFO Initializing system.range_xfers
> 2017-11-13T14:29:22.174097334Z INFO Initializing system.compaction_history
> 2017-11-13T14:29:22.194259920Z INFO Initializing system.sstable_activity
> 2017-11-13T14:29:22.210178271Z INFO Initializing system.size_estimates
> 2017-11-13T14:29:22.223836992Z INFO Initializing system.available_ranges
> 2017-11-13T14:29:22.237854207Z INFO Initializing system.transferred_ranges
> 2017-11-13T14:29:22.253995621Z INFO Initializing 
> system.views_builds_in_progress
> 2017-11-13T14:29:22.264052481Z INFO Initializing system.built_views
> 2017-11-13T14:29:22.283334779Z INFO Initializing system.hints
> 2017-11-13T14:29:22.304110311Z INFO Initializing system.batchlog
> 2017-11-13T14:29:22.318031950Z INFO Initializing system.prepared_statements
> 2017-11-13T14:29:22.326547917Z INFO Initializing system.schema_keyspaces
> 2017-11-13T14:29:22.337097407Z INFO Initializing system.schema_columnfamilies
> 2017-11-13T14:29:22.354082675Z INFO Initializing system.schema_columns
> 2017-11-13T14:29:22.384179063Z INFO Initializing system.schema_triggers
> 2017-11-13T14:29:22.394222027Z INFO Initializing system.schema_usertypes
> 2017-11-13T14:29:22.414199833Z INFO Initializing system.schema_functions
> 2017-11-13T14:29:22.427205182Z INFO Initializing system.schema_aggregates
> 2017-11-13T14:29:22.427228345Z INFO Not submitting build tasks for views in 
> keyspace system as storage service is not initialized
> 2017-11-13T14:29:22.652838866Z INFO Scheduling approximate time-check task 
> with a precision of 10 milliseconds
> 2017-11-13T14:29:22.732862906Z INFO Initializing system_schema.keyspaces
> 2017-11-13T14:29:22.746598744Z INFO Initializing system_schema.tables
> 2017-11-13T14:29:22.759649011Z INFO Initializing system_schema.columns
> 2017-11-13T14:29:22.766245435Z INFO Initializing system_schema.triggers
> 2017-11-13T14:29:22.778716809Z INFO Initializing system_schema.dropped_columns
> 2017-11-13T14:29:22.791369819Z INFO Initializing system_schema.views
> 2017-11-13T14:29:22.839141724Z INFO Initializing system_schema.types
> 2017-11-13T14:29:22.852911976Z INFO Initializing system_schema.functions
> 2017-11-13T14:29:22.852938112Z INFO Initializing system_schema.aggregates
> 2017-11-13T14:29:22.869348526Z INFO Initializing system_schema.indexes
> 2017-11-13T14:29:22.874178682Z INFO Not submitting build tasks for views in 
> keyspace system_schema as storage service is not initialized
> 2017-11-13T14:29:23.700250435Z INFO Initializing key cache with capacity of 
> 25 MBs.
> 2017-11-13T14:29:23.724357053Z INFO Initializing row cache with capacity of 0 
> MBs
> 2017-11-13T14:29:23.724383599Z INFO Initializing counter cache with capacity 
> of 12 MBs
> 2017-11-13T14:29:23.724386906Z INFO Scheduling counter cache save to every 
> 7200 seconds (going to s

[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2017-11-14 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251160#comment-16251160
 ] 

Vincent White commented on CASSANDRA-14013:
---

I took a look at this today and found the cause of this issue. It is indeed 
that the name "snapshots" is causing confusion as C* tries to retrieve the 
keyspace and column family names from the file paths here: 


{code:java|title=org.apache.cassandra.io.sstable.Descriptor#fromFilename()}
else if (cfDirectory.getParentFile().getName().equals(Directories.SNAPSHOT_SUBDIR))
{
    cfDirectory = cfDirectory.getParentFile().getParentFile();
}
cfname = cfDirectory.getName().split("-")[0] + indexName;
ksname = cfDirectory.getParentFile().getName();
{code}
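
To make it concrete, here is a standalone sketch (plain java.io, illustrative path only, not the actual Descriptor code) of how a live sstable in a keyspace literally named "snapshots" trips that branch:
{code:java}
import java.io.File;

public class SnapshotsKeyspaceExample
{
    // Mirrors the "snapshots" literal behind Directories.SNAPSHOT_SUBDIR.
    static final String SNAPSHOT_SUBDIR = "snapshots";

    public static void main(String[] args)
    {
        // A live sstable for keyspace "snapshots", table "test_idx" lives under
        // <data_dir>/snapshots/test_idx-<id>/, so the table directory's *parent* is
        // literally named "snapshots".
        File cfDirectory = new File("/var/lib/cassandra/data/snapshots/test_idx-abc123");

        // The branch above only checks the parent directory's name, so the keyspace
        // directory is mistaken for a snapshot subdirectory and we walk two levels up.
        if (cfDirectory.getParentFile().getName().equals(SNAPSHOT_SUBDIR))
            cfDirectory = cfDirectory.getParentFile().getParentFile();

        String cfname = cfDirectory.getName().split("-")[0];
        String ksname = cfDirectory.getParentFile().getName();
        System.out.println(ksname + "." + cfname); // "cassandra.data" instead of "snapshots.test_idx"
    }
}
{code}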

I wrote a quick patch 
[here|https://github.com/vincewhite/cassandra/commits/14013-test] and would 
really appreciate some suggestions on improving it (or a different approach). I 
didn't have a chance to test 3.0.x but the code at least in this area appears 
to be the same. I should have time to tidy this up and add a test this week.

Cheers,
-vince

> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Gregor Uhlenheuer
>
> I am posting this bug in hope to discover the stupid mistake I am doing 
> because I can't imagine a reasonable answer for the behavior I see right now 
> :-)
> In short words, I do observe data loss in a keyspace called *snapshots* after 
> restarting the Cassandra service. Say I do have 1000 records in a table 
> called *snapshots.test_idx* then after restart the table has less entries or 
> is even empty.
> My kind of "mysterious" observation is that it happens only in a keyspace 
> called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not 
> every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill 
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am doing :-)
> This happened to me using both Cassandra 3.9 and 3.11.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13127) Materialized Views: View row expires too soon

2017-01-24 Thread Vincent White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837348#comment-15837348
 ] 

Vincent White commented on CASSANDRA-13127:
---

I think I pretty much understand what's happening here. It all stems from the 
base upsert behaviour (creating a row via {{UPDATE}}, so the primary key columns 
don't exist on their own, vs {{INSERT}}). I'm still not sure it matches the MV 
docs though, and the comments in the code say things like:
{code}1) either the columns for the base and view PKs are exactly the same: in 
that case, the view entry should live as long as the base row lives. This means 
the view entry should only expire once *everything* in the base row has 
expired. Which means the row TTL should be the max of any other TTL.{code}
I think the logic in {{computeLivenessInfoForEntry}} doesn't make sense for 
updates because it only ever expected inserts. It leads to some funky behaviour 
if you're mixing updates, inserts and TTLs. I didn't test with deletes but I 
guess they could cause similar results.
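
As a worked example of why comparing TTLs rather than expiration times goes wrong, using the numbers from the repro in the description below (insert with TTL 10 at t=0, then an update with TTL 8 at t=6):
{code:java}
// Plain-Java arithmetic only, not Cassandra code; times are in seconds.
public class MvTtlExample
{
    public static void main(String[] args)
    {
        int rowLivenessTtl = 10, rowLivenessWrittenAt = 0;
        int cellTtl = 8, cellWrittenAt = 6;

        int rowExpiration = rowLivenessWrittenAt + rowLivenessTtl; // base row liveness expires at t=10
        int cellExpiration = cellWrittenAt + cellTtl;              // updated cell expires at t=14

        // Comparing TTLs (current behaviour): 8 > 10 is false, so the view entry keeps
        // the original liveness and expires at t=10 ...
        System.out.println("ttl comparison keeps original liveness: " + !(cellTtl > rowLivenessTtl));

        // ... even though the base row is still live until t=14, which comparing the
        // local expiration times would have caught.
        System.out.println("expiration comparison picks the cell:  " + (cellExpiration > rowExpiration));
    }
}
{code}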

Simply patching {{computeLivenessInfoForEntry}} like this:
{code:title=ViewUpdateGenerator.java#computeLivenessInfoForEntry}
 int expirationTime = baseLiveness.localExpirationTime();
 for (Cell cell : baseRow.cells())
 {
-    if (cell.ttl() > ttl)
+    if (cell.localDeletionTime() > expirationTime)
     {
         ttl = cell.ttl();
         expirationTime = cell.localDeletionTime();
     }
 }
-return ttl == baseLiveness.ttl()
+return expirationTime == baseLiveness.localExpirationTime()
     ? baseLiveness
     : LivenessInfo.withExpirationTime(baseLiveness.timestamp(), ttl, expirationTime);
 }
{code}
This isn't enough on its own, because it leads to further unexpected behaviour where 
update statements will resurrect previously TTL'd MV entries in some cases. If 
an update statement sets a column that could cause an update of _any_ view in 
that keyspace, it will resurrect entries in views whose PKs are made up only of 
columns from the base PK, regardless of whether the statement updates non-PK 
columns in that view. If the update statement only sets values of columns that 
don't appear in the keyspace's MVs, then no TTL'd MV entries for that PK will 
be resurrected. If there was never an entry in the MV for that MV PK then it 
won't create a new one. This is because upserts don't create new MV entries 
unless they set the value of a non-PK column in that view (with or without this 
patch).

I don't think I've seen it referenced anywhere, but is that intended behaviour 
when using upserts and materialized views? That is, an {{UPDATE}} to a column not 
in a view will not create an entry in an MV whose PK is made up only of 
columns from the base table's PK, but the matching {{INSERT}} statement will?

> Materialized Views: View row expires too soon
> -
>
> Key: CASSANDRA-13127
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13127
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Duarte Nunes
>
> Consider the following commands, ran against trunk:
> {code}
> echo "DROP MATERIALIZED VIEW ks.mv; DROP TABLE ks.base;" | bin/cqlsh
> echo "CREATE TABLE ks.base (p int, c int, v int, PRIMARY KEY (p, c));" | 
> bin/cqlsh
> echo "CREATE MATERIALIZED VIEW ks.mv AS SELECT p, c FROM base WHERE p IS NOT 
> NULL AND c IS NOT NULL PRIMARY KEY (c, p);" | bin/cqlsh
> echo "INSERT INTO ks.base (p, c) VALUES (0, 0) USING TTL 10;" | bin/cqlsh
> # wait for row liveness to get closer to expiration
> sleep 6;
> echo "UPDATE ks.base USING TTL 8 SET v = 0 WHERE p = 0 and c = 0;" | bin/cqlsh
> echo "SELECT p, c, ttl(v) FROM ks.base; SELECT * FROM ks.mv;" | bin/cqlsh
>  p | c | ttl(v)
> ---+---+
>  0 | 0 |  7
> (1 rows)
>  c | p
> ---+---
>  0 | 0
> (1 rows)
> # wait for row liveness to expire
> sleep 4;
> echo "SELECT p, c, ttl(v) FROM ks.base; SELECT * FROM ks.mv;" | bin/cqlsh
>  p | c | ttl(v)
> ---+---+
>  0 | 0 |  3
> (1 rows)
>  c | p
> ---+---
> (0 rows)
> {code}
> Notice how the view row is removed even though the base row is still live. I 
> would say this is because in ViewUpdateGenerator#computeLivenessInfoForEntry 
> the TTLs are compared instead of the expiration times, but I'm not sure I'm 
> getting that far ahead in the code when updating a column that's not in the 
> view.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)