[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
search interval. In Cassandra, IntervalTrees are not mutated; they are recreated 
each time a mutation is required. This can be an issue during repairs, and in 
fact we noticed such issues during repair. 

Since lists are more cache-friendly than linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (the idea is to eliminate intervals using the 
start and end points of the search interval); see the sketch below. 

Based on the tests I ran, I noticed that Binary Search based elimination almost 
always performs similarly to, or outperforms, the IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and a randomly 
generated search interval, with 5000 iterations. I'm attaching all the relevant 
graphs. 


PS: 
# For the purpose of the test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval. 
# I used the code from Cassandra version _3.11_.
# Times in the graphs are in nanoseconds. 
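A minimal sketch of the elimination idea, for illustration only (it is not the 
attached IntervalListWithElimination.java): intervals are kept in an array 
sorted by start point, a binary search discards every interval that starts 
after the search interval ends, and a linear (cache-friendly) walk over the 
surviving prefix checks the end points.

{noformat}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class Interval
{
    final long min, max;   // inclusive endpoints, e.g. an SSTable's token range
    Interval(long min, long max) { this.min = min; this.max = max; }
}

final class IntervalListWithBinaryElimination
{
    private final Interval[] byMin;   // intervals sorted by their start point

    IntervalListWithBinaryElimination(List<Interval> intervals)
    {
        byMin = intervals.stream()
                         .sorted(Comparator.comparingLong((Interval i) -> i.min))
                         .toArray(Interval[]::new);
    }

    /** Returns the intervals that overlap [searchMin, searchMax]. */
    List<Interval> search(long searchMin, long searchMax)
    {
        // Binary search eliminates every interval whose start point lies beyond
        // the end of the search interval; those cannot overlap.
        int limit = firstStartAfter(searchMax);
        List<Interval> result = new ArrayList<>();
        for (int i = 0; i < limit; i++)
            if (byMin[i].max >= searchMin)   // end-point check on the survivors
                result.add(byMin[i]);
        return result;
    }

    // Index of the first interval whose start point is greater than 'point'.
    private int firstStartAfter(long point)
    {
        int lo = 0, hi = byMin.length;
        while (lo < hi)
        {
            int mid = (lo + hi) >>> 1;
            if (byMin[mid].min <= point) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
{noformat}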

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
search interval. In Cassandra, IntervalTrees are not mutated; they are recreated 
each time a mutation is required. This can be an issue during repairs, and in 
fact we noticed such issues during repair. 

Since lists are more cache-friendly than linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (the idea is to eliminate intervals using the 
start and end points of the search interval). 

Based on the tests I ran, I noticed that Binary Search based elimination almost 
always performs similarly to, or outperforms, the IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and a randomly 
generated search interval, with 5000 iterations. I'm attaching all the relevant 
graphs. 

PS: 
# For the purpose of the test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval. 
# I used the code from Cassandra version _3.11_.
# Times in the graphs are in nanoseconds. 


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs, and in fact we noticed such issues during repair. 
> Since lists are more cache-friendly than linked lists and trees, I decided to 
> compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to, or outperforms, the IntervalTree based 
> search. 
> I ran the tests using random intervals to build the tree/lists and a randomly 
> generated search interval, with 5000 iterations. I'm attaching all the 
> relevant graphs. 
> PS: 
> # For the purpose of the test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval. 
> # I used the code from Cassandra version _3.11_.
> # Times in the graphs are in nanoseconds. 






[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
search interval. In Cassandra, IntervalTrees are not mutated; they are recreated 
each time a mutation is required. This can be an issue during repairs, and in 
fact we noticed such issues during repair. 

Since lists are more cache-friendly than linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (the idea is to eliminate intervals using the 
start and end points of the search interval). 

Based on the tests I ran, I noticed that Binary Search based elimination almost 
always performs similarly to, or outperforms, the IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and a randomly 
generated search interval, with 5000 iterations. I'm attaching all the relevant 
graphs. 

PS: 
# For the purpose of the test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval. 
# I used the code from Cassandra version _3.11_.
# Times in the graphs are in nanoseconds. 

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
search interval. In Cassandra, IntervalTrees are not mutated; they are recreated 
each time a mutation is required. This can be an issue during repairs, and in 
fact we noticed such issues during repair. 

Since lists are more cache-friendly than linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (the idea is to eliminate intervals using the 
start and end points of the search interval). 

Based on the tests I ran, I noticed that Binary Search based elimination almost 
always performs similarly to, or outperforms, the IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and a randomly 
generated search interval, with 5000 iterations. I'm attaching all the relevant 
graphs. 

PS: For the purpose of the test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval. I used the code from 
version _3.11_.


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs, and in fact we noticed such issues during repair. 
> Since lists are more cache-friendly than linked lists and trees, I decided to 
> compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to, or outperforms, the IntervalTree based 
> search. 
> I ran the tests using random intervals to build the tree/lists and a randomly 
> generated search interval, with 5000 iterations. I'm attaching all the 
> relevant graphs. 
> PS: 
> # For the purpose of the test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval. 
> # I used the code from Cassandra version _3.11_.
> # Times in the graphs are in nanoseconds. 






[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: IntervalTreeSimplified.java

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs, and in fact we noticed such issues during repair. 
> Since lists are more cache-friendly than linked lists and trees, I decided to 
> compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to, or outperforms, the IntervalTree based 
> search. 
> I ran the tests using random intervals to build the tree/lists and a randomly 
> generated search interval, with 5000 iterations. I'm attaching all the 
> relevant graphs. 
> PS: For the purpose of the test, I simplified the IntervalTree code by making 
> it non-generic and removing the data portion of the interval. I used the code 
> from version _3.11_.






[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: IntervalList.java

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs, and in fact we noticed such issues during repair. 
> Since lists are more cache-friendly than linked lists and trees, I decided to 
> compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to, or outperforms, the IntervalTree based 
> search. 
> I ran the tests using random intervals to build the tree/lists and a randomly 
> generated search interval, with 5000 iterations. I'm attaching all the 
> relevant graphs. 
> PS: For the purpose of the test, I simplified the IntervalTree code by making 
> it non-generic and removing the data portion of the interval. I used the code 
> from version _3.11_.






[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: IntervalListWithElimination.java

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs, and in fact we noticed such issues during repair. 
> Since lists are more cache-friendly than linked lists and trees, I decided to 
> compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to, or outperforms, the IntervalTree based 
> search. 
> I ran the tests using random intervals to build the tree/lists and a randomly 
> generated search interval, with 5000 iterations. I'm attaching all the 
> relevant graphs. 
> PS: For the purpose of the test, I simplified the IntervalTree code by making 
> it non-generic and removing the data portion of the interval. I used the code 
> from version _3.11_.






[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
search interval. In Cassandra, IntervalTrees are not mutated; they are recreated 
each time a mutation is required. This can be an issue during repairs, and in 
fact we noticed such issues during repair. 

Since lists are more cache-friendly than linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (the idea is to eliminate intervals using the 
start and end points of the search interval). 

Based on the tests I ran, I noticed that Binary Search based elimination almost 
always performs similarly to, or outperforms, the IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and a randomly 
generated search interval, with 5000 iterations. I'm attaching all the relevant 
graphs. 

PS: For the purpose of the test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval. I used the code from 
version _3.11_.

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
search interval. In Cassandra, IntervalTrees are not mutated; they are recreated 
each time a mutation is required. This can be an issue during repairs, and in 
fact we noticed such issues during repair. 

Since lists are more cache-friendly than linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (the idea is to eliminate intervals using the 
start and end points of the search interval). 

Based on the tests I ran, I noticed that Binary Search based elimination almost 
always performs similarly to, or outperforms, the IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and a randomly 
generated search interval, with 5000 iterations. I'm attaching all the relevant 
graphs. 

PS: For the purpose of the test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval. 


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs, and in fact we noticed such issues during repair. 
> Since lists are more cache-friendly than linked lists and trees, I decided to 
> compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to, or outperforms, the IntervalTree based 
> search. 
> I ran the tests using random intervals to build the tree/lists and a randomly 
> generated search interval, with 5000 iterations. I'm attaching all the 
> relevant graphs. 
> PS: For the purpose of the test, I simplified the IntervalTree code by making 
> it non-generic and removing the data portion of the interval. I used the code 
> from version _3.11_.






[jira] [Created] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)
Chandrasekhar Thumuluru created CASSANDRA-15397:
---

 Summary: IntervalTree performance comparison with Linear Walk and 
Binary Search based Elimination. 
 Key: CASSANDRA-15397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chandrasekhar Thumuluru
 Attachments: 99p_1_SSTable_with_5000_Searches.png, 
99p_15000_SSTable_with_5000_Searches.png, 
99p_2_SSTable_with_5000_Searches.png, 
99p_25000_SSTable_with_5000_Searches.png, 
99p_3_SSTable_with_5000_Searches.png, 
99p_5000_SSTable_with_5000_Searches.png

Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
search interval. In Cassandra, IntervalTrees are not mutated; they are recreated 
each time a mutation is required. This can be an issue during repairs, and in 
fact we noticed such issues during repair. 

Since lists are more cache-friendly than linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (the idea is to eliminate intervals using the 
start and end points of the search interval). 

Based on the tests I ran, I noticed that Binary Search based elimination almost 
always performs similarly to, or outperforms, the IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and a randomly 
generated search interval, with 5000 iterations. I'm attaching all the relevant 
graphs. 

PS: For the purpose of the test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval. 






[jira] [Commented] (CASSANDRA-14731) Transient Write Metrics

2019-11-04 Thread Abdul Aziz Ali (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967223#comment-16967223
 ] 

Abdul Aziz Ali commented on CASSANDRA-14731:


Hi [~benedict], can I pick this up? I'm guessing we need to define some new 
metrics in `TableMetrics` and `KeyspaceMetrics` and then modify 
`AbstractWriteResponseHandler`, right? 
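For context, a minimal sketch of the kind of meters I have in mind, assuming the 
Dropwizard-style {{MetricRegistry}} that the existing metrics classes wrap (the 
names below are hypothetical, not real TableMetrics fields):

{noformat}
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

// Hypothetical sketch: meters a write response handler could mark when a
// transient write is attempted and when it is acknowledged.
final class TransientWriteMetrics
{
    final Meter attempted;
    final Meter successful;

    TransientWriteMetrics(MetricRegistry registry, String scope)
    {
        attempted  = registry.meter(MetricRegistry.name("TransientWrites", scope, "Attempted"));
        successful = registry.meter(MetricRegistry.name("TransientWrites", scope, "Successful"));
    }
}
{noformat}

The response handler would then call {{attempted.mark()}} when dispatching to a 
transient replica and {{successful.mark()}} on acknowledgement.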

> Transient Write Metrics
> ---
>
> Key: CASSANDRA-14731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14731
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core, Observability/Metrics
>Reporter: Benedict Elliott Smith
>Priority: Low
>  Labels: metrics, transient-replication
> Fix For: 4.x
>
>
> While we record the number of attempted transient writes, we do not record 
> how successful these were.
> Also, we do not count transient writes that happen due to the failure 
> detector.  While these are distinct from those writes that happen 
> ‘speculatively’ due to slow responses, there’s a strong chance they will be 
> the most common form of transient write.  It might be worth having separate 






[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

2019-11-04 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967116#comment-16967116
 ] 

Blake Eggleston commented on CASSANDRA-15393:
-

Here's a rough initial pass at converting cells from buffers to arrays: 
[https://github.com/bdeggleston/cassandra/tree/15387-arrays]

There are {{FIXME}}s everywhere and I still need to clean up some ideas that 
didn't pan out and take another look at cell cloning, but +902/-402 is smaller 
than I was expecting.

I've also started playing around with converting from buffers to a {{Value}} 
type that can be backed by ByteBuffer, byte[], or native memory. It's nicer to 
look at, but is also a much deeper rabbit hole. There is also code in a lot of 
places that relies on ByteBuffers having a mutable position, which would need to 
be reworked and verified.
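To make the shape concrete, a {{Value}} abstraction along these lines (purely 
illustrative, not the code in the branch above) would look roughly like:

{noformat}
import java.nio.ByteBuffer;

// Illustrative sketch only: one read-only view over either a byte[] or a
// ByteBuffer backing, so callers stop depending on ByteBuffer position state.
interface Value
{
    int size();
    byte get(int index);

    static Value of(byte[] bytes)
    {
        return new Value()
        {
            public int size() { return bytes.length; }
            public byte get(int i) { return bytes[i]; }
        };
    }

    static Value of(ByteBuffer buffer)
    {
        ByteBuffer view = buffer.duplicate();   // never mutate the caller's position
        return new Value()
        {
            public int size() { return view.remaining(); }
            public byte get(int i) { return view.get(view.position() + i); }
        };
    }
}
{noformat}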

> Add byte array backed cells
> ---
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.






[jira] [Comment Edited] (CASSANDRA-15350) Add CAS “uncertainty” and “contention" messages that are currently propagated as a WriteTimeoutException.

2019-11-04 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962289#comment-16962289
 ] 

Yifan Cai edited comment on CASSANDRA-15350 at 11/4/19 10:29 PM:
-

 
||Branch||Diff||Tests||PR||
|[cas-exception-changes|https://github.com/yifan-c/cassandra/tree/cas-exception-changes]|[diff|https://github.com/apache/cassandra/compare/trunk...yifan-c:cas-exception-changes]|[tests|https://circleci.com/workflow-run/67f7d98f-5c2b-4b2c-9fd7-120862e554e7]|[PR|https://github.com/apache/cassandra/pull/379]|

 Changes:
 # Added {{CasWriteTimeoutException}} and {{CasWriteUncertaintyException}} (a 
rough shape is sketched after this list)
 # Added {{encode}}/{{decode}}/{{encodeSize}} for the new exceptions. Test 
cases added in ErrorMessageTest
 # Modified {{MessageFilters}} in dtest to support constructing the CAS scenario.
 # Added CasWriteTest.
 # Minor changes
 ** Calculate the exact UTF-8 string byte size in CBUtil to reduce unnecessary 
memory allocation (instead of over-estimating with utf8MaxBytes)
 ** Corrected the {{assertRows}} parameter sequence
 ** Moved {{DatabaseDescriptor}} initialization for dtest to test base class.
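A rough sketch of the shape such an exception could take (the field names are 
illustrative only, not taken from the PR above):

{noformat}
// Hypothetical sketch; the real exception would extend the existing
// timeout-exception hierarchy and carry its consistency level as well.
final class CasWriteTimeoutException extends RuntimeException
{
    final int contentions;   // contended Paxos rounds observed
    final int received;      // acks received before timing out
    final int blockFor;      // acks required by the consistency level

    CasWriteTimeoutException(int contentions, int received, int blockFor)
    {
        super(String.format("CAS operation timed out: received %d of %d required responses after %d contention(s)",
                            received, blockFor, contentions));
        this.contentions = contentions;
        this.received = received;
        this.blockFor = blockFor;
    }
}
{noformat}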


was (Author: yifanc):
 
||Branch||Diff||Tests||
|[cas-exception-changes|https://github.com/yifan-c/cassandra/tree/cas-exception-changes]|[diff|https://github.com/apache/cassandra/compare/trunk...yifan-c:cas-exception-changes]|[tests|https://circleci.com/workflow-run/67f7d98f-5c2b-4b2c-9fd7-120862e554e7]|

 Changes:
 # Added {{CasWriteTimeoutException}} and {{CasWriteUncertaintyException}}
 # Added {{encode}}/{{decode}}/{{encodeSize}} for the new exceptions. Test 
cases added in ErrorMessageTest
 # Modified {{MessageFilters}} in dtest to support constructing the CAS scenario.
 # Added CasWriteTest.
 # Minor changes
 ** Calculate the exact UTF-8 string byte size in CBUtil to reduce unnecessary 
memory allocation (instead of over-estimating with utf8MaxBytes)
 ** Corrected the {{assertRows}} parameter sequence
 ** Moved {{DatabaseDescriptor}} initialization for dtest to test base class.

> Add CAS “uncertainty” and “contention" messages that are currently propagated 
> as a WriteTimeoutException.
> -
>
> Key: CASSANDRA-15350
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Lightweight Transactions
>Reporter: Alex Petrov
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: client-impacting, protocolv5, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, the CAS uncertainty introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagated as a 
> WriteTimeout. One of the conditions in which it manifests is when there’s at 
> least one acceptor that has accepted the value, which means that this value 
> _may_ still get accepted during a later round, despite the proposer failure. 
> A similar problem happens with CAS contention, which is also indistinguishable 
> from the “regular” timeout, even though it is visible in metrics correctly.






[jira] [Updated] (CASSANDRA-15350) Add CAS “uncertainty” and “contention" messages that are currently propagated as a WriteTimeoutException.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15350:
---
Labels: client-impacting protocolv5 pull-request-available  (was: 
client-impacting protocolv5)

> Add CAS “uncertainty” and “contention" messages that are currently propagated 
> as a WriteTimeoutException.
> -
>
> Key: CASSANDRA-15350
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Lightweight Transactions
>Reporter: Alex Petrov
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: client-impacting, protocolv5, pull-request-available
>
> Right now, the CAS uncertainty introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagated as a 
> WriteTimeout. One of the conditions in which it manifests is when there’s at 
> least one acceptor that has accepted the value, which means that this value 
> _may_ still get accepted during a later round, despite the proposer failure. 
> A similar problem happens with CAS contention, which is also indistinguishable 
> from the “regular” timeout, even though it is visible in metrics correctly.






[jira] [Updated] (CASSANDRA-15309) Make the upgrade tests run on trunk

2019-11-04 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15309:
-
Status: Ready to Commit  (was: Review In Progress)

> Make the upgrade tests run on trunk
> ---
>
> Key: CASSANDRA-15309
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15309
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Joey Lynch
>Assignee: Vinay Chella
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> It appears that the upgrade tests (the j8_upgradetests-no-vnodes CircleCI 
> target) don't really work on trunk right now, potentially due to a Java home 
> issue. Example run: https://circleci.com/gh/jolynch/cassandra/553
> {noformat}
>  Your job ran 4412 tests with 3923 failures
> - test_IN_clause_on_last_key - 
> upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_xupgrade_tests/cql_tests.pymajor_version_int
>  = 8
> def switch_jdks(major_version_int):
> """
> Changes the jdk version globally, by setting JAVA_HOME = JAVA[N]_HOME.
> This means the environment must have JAVA[N]_HOME set to switch to 
> jdk version N.
> """
> new_java_home = 'JAVA{}_HOME'.format(major_version_int)
> 
> try:
> >   os.environ[new_java_home]
> upgrade_tests/upgrade_base.py:25: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = environ({'PYTHONUNBUFFERED': 'true', 'DEFAULT_DIR': 
> '/home/cassandra/cassandra-dtest', 'CIRCLE_NODE_INDEX': '47', 
> 'CIR...ade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_x::()::test_IN_clause_on_last_key
>  (call)'})
> key = 'JAVA8_HOME'
> def __getitem__(self, key):
> try:
> value = self._data[self.encodekey(key)]
> except KeyError:
> # raise KeyError with the original key value
> >   raise KeyError(key) from None
> E   KeyError: 'JAVA8_HOME'{noformat}






[jira] [Commented] (CASSANDRA-15309) Make the upgrade tests run on trunk

2019-11-04 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967007#comment-16967007
 ] 

Dinesh Joshi commented on CASSANDRA-15309:
--

+1. Please revert the CircleCI changes before committing.

> Make the upgrade tests run on trunk
> ---
>
> Key: CASSANDRA-15309
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15309
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Joey Lynch
>Assignee: Vinay Chella
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> It appears that the upgrade tests (the j8_upgradetests-no-vnodes CircleCI 
> target) don't really work on trunk right now, potentially due to a Java home 
> issue. Example run: https://circleci.com/gh/jolynch/cassandra/553
> {noformat}
>  Your job ran 4412 tests with 3923 failures
> - test_IN_clause_on_last_key - 
> upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_xupgrade_tests/cql_tests.pymajor_version_int
>  = 8
> def switch_jdks(major_version_int):
> """
> Changes the jdk version globally, by setting JAVA_HOME = JAVA[N]_HOME.
> This means the environment must have JAVA[N]_HOME set to switch to 
> jdk version N.
> """
> new_java_home = 'JAVA{}_HOME'.format(major_version_int)
> 
> try:
> >   os.environ[new_java_home]
> upgrade_tests/upgrade_base.py:25: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = environ({'PYTHONUNBUFFERED': 'true', 'DEFAULT_DIR': 
> '/home/cassandra/cassandra-dtest', 'CIRCLE_NODE_INDEX': '47', 
> 'CIR...ade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_x::()::test_IN_clause_on_last_key
>  (call)'})
> key = 'JAVA8_HOME'
> def __getitem__(self, key):
> try:
> value = self._data[self.encodekey(key)]
> except KeyError:
> # raise KeyError with the original key value
> >   raise KeyError(key) from None
> E   KeyError: 'JAVA8_HOME'{noformat}






[jira] [Updated] (CASSANDRA-15309) Make the upgrade tests run on trunk

2019-11-04 Thread Joey Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joey Lynch updated CASSANDRA-15309:
---
Reviewers: Joey Lynch, Joey Lynch  (was: Joey Lynch)
   Joey Lynch, Joey Lynch
   Status: Review In Progress  (was: Patch Available)

> Make the upgrade tests run on trunk
> ---
>
> Key: CASSANDRA-15309
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15309
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Joey Lynch
>Assignee: Vinay Chella
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> It appears that the upgrade tests (the j8_upgradetests-no-vnodes CircleCI 
> target) don't really work on trunk right now, potentially due to a Java home 
> issue. Example run: https://circleci.com/gh/jolynch/cassandra/553
> {noformat}
>  Your job ran 4412 tests with 3923 failures
> - test_IN_clause_on_last_key - 
> upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_xupgrade_tests/cql_tests.pymajor_version_int
>  = 8
> def switch_jdks(major_version_int):
> """
> Changes the jdk version globally, by setting JAVA_HOME = JAVA[N]_HOME.
> This means the environment must have JAVA[N]_HOME set to switch to 
> jdk version N.
> """
> new_java_home = 'JAVA{}_HOME'.format(major_version_int)
> 
> try:
> >   os.environ[new_java_home]
> upgrade_tests/upgrade_base.py:25: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = environ({'PYTHONUNBUFFERED': 'true', 'DEFAULT_DIR': 
> '/home/cassandra/cassandra-dtest', 'CIRCLE_NODE_INDEX': '47', 
> 'CIR...ade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_x::()::test_IN_clause_on_last_key
>  (call)'})
> key = 'JAVA8_HOME'
> def __getitem__(self, key):
> try:
> value = self._data[self.encodekey(key)]
> except KeyError:
> # raise KeyError with the original key value
> >   raise KeyError(key) from None
> E   KeyError: 'JAVA8_HOME'{noformat}






[jira] [Commented] (CASSANDRA-15309) Make the upgrade tests run on trunk

2019-11-04 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966987#comment-16966987
 ] 

Joey Lynch commented on CASSANDRA-15309:


+1 on commit 
[f1aae0e6|https://github.com/apache/cassandra/commit/f1aae0e6524cd1c345a95000b9c28774b2f66418]

We can follow up on fixing the failing upgrade tests separately.

> Make the upgrade tests run on trunk
> ---
>
> Key: CASSANDRA-15309
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15309
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Joey Lynch
>Assignee: Vinay Chella
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> It appears that the upgrade tests (the j8_upgradetests-no-vnodes CircleCI 
> target) don't really work on trunk right now, potentially due to a Java home 
> issue. Example run: https://circleci.com/gh/jolynch/cassandra/553
> {noformat}
>  Your job ran 4412 tests with 3923 failures
> - test_IN_clause_on_last_key - 
> upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_xupgrade_tests/cql_tests.pymajor_version_int
>  = 8
> def switch_jdks(major_version_int):
> """
> Changes the jdk version globally, by setting JAVA_HOME = JAVA[N]_HOME.
> This means the environment must have JAVA[N]_HOME set to switch to 
> jdk version N.
> """
> new_java_home = 'JAVA{}_HOME'.format(major_version_int)
> 
> try:
> >   os.environ[new_java_home]
> upgrade_tests/upgrade_base.py:25: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = environ({'PYTHONUNBUFFERED': 'true', 'DEFAULT_DIR': 
> '/home/cassandra/cassandra-dtest', 'CIRCLE_NODE_INDEX': '47', 
> 'CIR...ade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_2_1_x::()::test_IN_clause_on_last_key
>  (call)'})
> key = 'JAVA8_HOME'
> def __getitem__(self, key):
> try:
> value = self._data[self.encodekey(key)]
> except KeyError:
> # raise KeyError with the original key value
> >   raise KeyError(key) from None
> E   KeyError: 'JAVA8_HOME'{noformat}






[jira] [Updated] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-04 Thread Joey Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joey Lynch updated CASSANDRA-15379:
---
Change Category: Performance
 Complexity: Low Hanging Fruit
  Fix Version/s: 4.0-alpha
  Reviewers: Dinesh Joshi
 Status: Open  (was: Triage Needed)

> Make it possible to flush with a different compression strategy than we 
> compact with
> 
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/Config, Local/Memtable
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on 
> some of our most dense clusters and have been observing close to 50% 
> reduction in footprint with Zstd on some of our workloads! Unfortunately, we 
> have been running into an issue where the flush might take so long (Zstd is 
> slower to compress than LZ4) that we can actually block the next flush and 
> cause instability.
> Internally we are working around this with a very simple patch which flushes 
> SSTables with the default compression strategy (LZ4) regardless of the table 
> params. This is a simple solution, but I think the ideal one might be for the 
> flush compression strategy to be configurable separately from the table 
> compression strategy (while defaulting to the same thing). Instead of 
> adding yet another compression option to the yaml (like hints and commitlog) 
> I was thinking of just adding it to the table parameters and then adding a 
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently 
> supported defaults are:
> # * compression   : How are SSTables compressed in general (flush, 
> compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> # supported
> default_table_parameters:
>   compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 16
>   flush_compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 4
> {noformat}
> This would have the nice effect as well of giving our configuration a path 
> forward to providing user specified defaults for table creation (so e.g. if a 
> particular user wanted to use a different default chunk_length_in_kb they can 
> do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.






[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking

2019-11-04 Thread Federico (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966946#comment-16966946
 ] 

Federico commented on CASSANDRA-10726:
--

[~isaacreath] would you? I've encountered this problem in my application and 
Cassandra 4 is so far away that I think many teams would benefit from this.

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination
>Reporter: Richard Low
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.






[jira] [Comment Edited] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-04 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966938#comment-16966938
 ] 

Joey Lynch edited comment on CASSANDRA-15379 at 11/4/19 7:30 PM:
-

My rationale for the {{EnumSet}} over a boolean member function is:
 # Versus the boolean function idea, it doesn't break the ICompressor 
abstraction by letting compressors know that flushes exist. As in, it is very 
easy for an ICompressor author to claim to be good at {{FAST_COMPRESSION}}, but 
they probably can't make the call on whether it should be used in flushes or 
other situations. I could add an {{isFastCompressor}} boolean function, but given 
that {{ICompressor}} is a public API interface I think sets of capabilities will 
be more maintainable than a collection of boolean functions going forward, 
especially if we start adding more capabilities (see #2).
 # If we go down the path of _not_ adding more knobs and just try to have the 
database figure out the best way to compress data for users, this is easier to 
maintain long term since compressors can offer multiple types of hints to the 
database. For example, the database might refuse to use slow compressors for 
flushes, commitlogs, etc., or have compaction strategies opt into higher-ratio 
compression strategies at higher "levels". If we do go down this path there are 
fewer interface changes (instead of adding and removing functions we just add 
ICompressor.Uses hints).
 # Versus the set of strings idea, it has compile time checks that are useful 
(which is the primary argument against sets of strings afaik).

After thinking about this problem space more I'm no longer convinced that 
giving general users more knobs here is the right choice (the table 
properties). By using a {{suitableUses}} hint, the database can internally 
optimize the following in future 4.x releases:
 * Flushes: "get this data off my heap as fast as possible". We don't care 
about ratio (since the products will be re-compacted shortly) or decompression 
speed, only care about compression speed.
 * Commitlog: "some compression is nice but get this data off my heap fast". We 
mostly care about compression speed, and only slightly about ratio.
 * Compaction: "The older the data the more compressed it should be". We care a 
lot about decompression speed and ratio, but don't want to pick expensive 
compressors at the high churn points (L0 in LCS, small tables in STCS, before 
the time window bucket in TWCS)

The interface still gives advanced users a backdoor (they extend the compressor 
they want to change the behavior of and change what capabilities it offers).
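As a concrete illustration of the hint idea (the enum constants and the method 
name below are illustrative only, not a committed {{ICompressor}} API):

{noformat}
import java.util.EnumSet;
import java.util.Set;

interface CompressorHints
{
    enum Use { FAST_COMPRESSION, GENERAL }

    /** The situations this compressor recommends itself for. */
    Set<Use> recommendedUses();
}

final class Lz4LikeCompressor implements CompressorHints
{
    public Set<Use> recommendedUses()
    {
        // Fast enough to keep up with flushes and the commitlog, and fine in general.
        return EnumSet.allOf(Use.class);
    }
}

final class ZstdLikeCompressor implements CompressorHints
{
    public Set<Use> recommendedUses()
    {
        // High ratio but slower to compress: let the database keep it off the flush path.
        return EnumSet.of(Use.GENERAL);
    }
}
{noformat}

The database-side call sites (flush, commitlog, compaction) would then consult 
{{recommendedUses()}} instead of exposing yet another user-facing knob.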

edit: I pinged this ticket into 
[slack|https://the-asf.slack.com/archives/CK23JSY2K/p1572881897039500] to seek 
more feedback.


was (Author: jolynch):
My rationale for the {{EnumSet}} over a boolean member function is:
 # Versus the boolean function idea, it doesn't break the ICompressor 
abstraction by letting compressors know that flushes exist. As in, it is very 
easy for an ICompressor author to claim to be good at {{FAST_COMPRESSION}}, but 
they probably can't make the call on whether it should be used in flushes or 
other situations. I could add an {{isFastCompressor}} boolean function, but given 
that {{ICompressor}} is a public API interface I think sets of capabilities will 
be more maintainable than a collection of boolean functions going forward, 
especially if we start adding more capabilities (see #2).
 # If we go down the path of _not_ adding more knobs and just try to have the 
database figure out the best way to compress data for users, this is easier to 
maintain long term since compressors can offer multiple types of hints to the 
database. For example, the database might refuse to use slow compressors for 
flushes, commitlogs, etc., or have compaction strategies opt into higher-ratio 
compression strategies at higher "levels". If we do go down this path there are 
fewer interface changes (instead of adding and removing functions we just add 
ICompressor.Uses hints).
 # Versus the set of strings idea, it has compile time checks that are useful 
(which is the primary argument against sets of strings afaik).

After thinking about this problem space more I'm no longer convinced that 
giving general users more knobs here is the right choice (the table 
properties). By using a {{suitableUses}} hint the database can internally 
optimize:
 * Flushes: "get this data off my heap as fast as possible". We don't care 
about ratio (since the products will be re-compacted shortly) or decompression 
speed, only care about compression speed.
 * Commitlog: "some compression is nice but get this data off my heap fast". We 
mostly care about compression speed, and only slightly about ratio.
 * Compaction: "The older the data the more compressed it should be". We care a 
lot about decompression speed and ratio, but don't want to pick expensive 
compressors at the high churn points (L0 in LCS, small tables in STCS, before 
the time window bucket in TWCS)

[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-04 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966938#comment-16966938
 ] 

Joey Lynch commented on CASSANDRA-15379:


My rationale for the {{EnumSet}} over a boolean member function is:
 # Versus the boolean function idea, it doesn't break the ICompressor 
abstraction by letting compressors know that flushes exist. As in, it is very 
easy for an ICompressor author to claim to be good at {{FAST_COMPRESSION}}, but 
they probably can't make the call on whether it should be used in flushes or 
other situations. I could add an {{isFastCompressor}} boolean function, but given 
that {{ICompressor}} is a public API interface I think sets of capabilities will 
be more maintainable than a collection of boolean functions going forward, 
especially if we start adding more capabilities (see #2).
 # If we go down the path of _not_ adding more knobs and just try to have the 
database figure out the best way to compress data for users, this is easier to 
maintain long term since compressors can offer multiple types of hints to the 
database. For example, the database might refuse to use slow compressors for 
flushes, commitlogs, etc., or have compaction strategies opt into higher-ratio 
compression strategies at higher "levels". If we do go down this path there are 
fewer interface changes (instead of adding and removing functions we just add 
ICompressor.Uses hints).
 # Versus the set of strings idea, it has compile time checks that are useful 
(which is the primary argument against sets of strings afaik).

After thinking about this problem space more I'm no longer convinced that 
giving general users more knobs here is the right choice (the table 
properties). By using a {{suitableUses}} hint the database can internally 
optimize:
 * Flushes: "get this data off my heap as fast as possible". We don't care 
about ratio (since the products will be re-compacted shortly) or decompression 
speed, only care about compression speed.
 * Commitlog: "some compression is nice but get this data off my heap fast". We 
mostly care about compression speed, and only slightly about ratio.
 * Compaction: "The older the data the more compressed it should be". We care a 
lot about decompression speed and ratio, but don't want to pick expensive 
compressors at the high churn points (L0 in LCS, small tables in STCS, before 
the time window bucket in TWCS)

The interface still gives advanced users a backdoor (they extend the compressor 
they want to change the behavior of and change what capabilities it offers).

> Make it possible to flush with a different compression strategy than we 
> compact with
> 
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/Config, Local/Memtable
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Normal
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on 
> some of our most dense clusters and have been observing close to 50% 
> reduction in footprint with Zstd on some of our workloads! Unfortunately, we 
> have been running into an issue where the flush might take so long (Zstd is 
> slower to compress than LZ4) that we can actually block the next flush and 
> cause instability.
> Internally we are working around this with a very simple patch which flushes 
> SSTables with the default compression strategy (LZ4) regardless of the table 
> params. This is a simple solution, but I think the ideal one might be for the 
> flush compression strategy to be configurable separately from the table 
> compression strategy (while defaulting to the same thing). Instead of 
> adding yet another compression option to the yaml (like hints and commitlog) 
> I was thinking of just adding it to the table parameters and then adding a 
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently 
> supported defaults are:
> # * compression   : How are SSTables compressed in general (flush, 
> compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> # supported
> default_table_parameters:
>   compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 16
>   flush_compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 4
> {noformat}
> This would have the nice effect as well of giving our configuration a path 
> forward to providing user specified defaults for table creation (so e.g. if a 
> particular user wanted to use a different default chunk_length_in_kb they can 
> do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.

[jira] [Updated] (CASSANDRA-15385) Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default

2019-11-04 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-15385:
--
  Fix Version/s: (was: 3.11.x)
 (was: 3.0.x)
 3.11.6
 3.0.20
  Since Version: 4.0-alpha
Source Control Link: 
[9b1f3796a65db46f15f2f2ad8af4180f71e3f53f|https://github.com/apache/cassandra/commit/9b1f3796a65db46f15f2f2ad8af4180f71e3f53f]
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default
> --
>
> Key: CASSANDRA-15385
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15385
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Tracing
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
> Fix For: 3.0.20, 3.11.6
>
>







[jira] [Commented] (CASSANDRA-15385) Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default

2019-11-04 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966880#comment-16966880
 ] 

Aleksey Yeschenko commented on CASSANDRA-15385:
---

Cheers - committed to 3.0 as 
[9b1f3796a65db46f15f2f2ad8af4180f71e3f53f|https://github.com/apache/cassandra/commit/9b1f3796a65db46f15f2f2ad8af4180f71e3f53f]
 and merged upwards.

> Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default
> --
>
> Key: CASSANDRA-15385
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15385
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Tracing
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>







[cassandra-dtest] branch master updated: Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default

2019-11-04 Thread aleksey
This is an automated email from the ASF dual-hosted git repository.

aleksey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new 64dca64  Ensure that tracing doesn't break connections in 3.x/4.0 
mixed mode by default
64dca64 is described below

commit 64dca6496874160709548e6dc9696417b837bda0
Author: Aleksey Yeschenko 
AuthorDate: Thu Oct 31 13:06:30 2019 +

Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by 
default

patch by Aleksey Yeschenko; reviewed by Sam Tunnicliffe for CASSANDRA-15385
---
 upgrade_tests/cql_tests.py | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/upgrade_tests/cql_tests.py b/upgrade_tests/cql_tests.py
index a03cf9c..d3475a3 100644
--- a/upgrade_tests/cql_tests.py
+++ b/upgrade_tests/cql_tests.py
@@ -5465,7 +5465,7 @@ class TestCQL(UpgradeTester):
 logger.debug("Querying {} node".format("upgraded" if is_upgraded 
else "old"))
 assert_all(cursor, "SELECT k FROM ks.test WHERE v = 0", [[0]])
 
-def test_tracing_prevents_startup_after_upgrading(self, 
fixture_dtest_setup):
+def test_tracing_prevents_startup_after_upgrading(self):
 """
 Test that after upgrading from 2.1 to 3.0, the system_traces.sessions 
table is properly upgraded to include
 the client column.
@@ -5476,13 +5476,6 @@ class TestCQL(UpgradeTester):
 cursor.execute("CREATE KEYSPACE foo WITH replication = {'class': 
'SimpleStrategy', 'replication_factor': 1}")
 cursor.execute("CREATE TABLE foo.bar (k int PRIMARY KEY, v int)")
 
-#It's possible to log an error when reading trace information because 
the schema at node differs
-#between versions
-if self.is_40_or_greater():
-fixture_dtest_setup.ignore_log_patterns = 
fixture_dtest_setup.ignore_log_patterns +\
-  ["Unknown column 
coordinator_port during deserialization",
-   "Unknown column 
source_port during deserialization"]
-
 for is_upgraded, cursor in self.do_upgrade(cursor):
 logger.debug("Querying {} node".format("upgraded" if is_upgraded 
else "old"))
 





[cassandra] branch cassandra-3.0 updated (b40d79c -> 9b1f379)

2019-11-04 Thread aleksey
This is an automated email from the ASF dual-hosted git repository.

aleksey pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from b40d79c  Make sure index summary redistributions don't start when 
compactions are paused
 add 9b1f379  Ensure that tracing doesn't break connections in 3.x/4.0 
mixed mode by default

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt | 1 +
 src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java | 3 +++
 src/java/org/apache/cassandra/tracing/TraceKeyspace.java| 2 ++
 3 files changed, 6 insertions(+)





[cassandra] branch trunk updated (f9b46ae -> 5d6b1c7)

2019-11-04 Thread aleksey
This is an automated email from the ASF dual-hosted git repository.

aleksey pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from f9b46ae  Merge branch 'cassandra-3.11' into trunk
 add 9b1f379  Ensure that tracing doesn't break connections in 3.x/4.0 
mixed mode by default
 add c2e241f  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 5d6b1c7  Merge branch 'cassandra-3.11' into trunk

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt | 11 ++-
 NEWS.txt| 11 +++
 2 files changed, 17 insertions(+), 5 deletions(-)





[cassandra] branch cassandra-3.11 updated (4b547f1 -> c2e241f)

2019-11-04 Thread aleksey
This is an automated email from the ASF dual-hosted git repository.

aleksey pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 4b547f1  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 9b1f379  Ensure that tracing doesn't break connections in 3.x/4.0 
mixed mode by default
 add c2e241f  Merge branch 'cassandra-3.0' into cassandra-3.11

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt | 1 +
 src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java | 2 ++
 src/java/org/apache/cassandra/tracing/TraceKeyspace.java| 2 ++
 3 files changed, 5 insertions(+)





[jira] [Commented] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Valentin Lorentz (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966832#comment-16966832
 ] 

Valentin Lorentz commented on CASSANDRA-15358:
--

> Does "{{Maximum memory usage reached}}" always precede the first 
> "{{java.lang.IllegalArgumentException: initialBuffer is not a direct 
> buffer.}}"?

 

Yes, "Maximum memory usage reached" is always printed right before the 
exception is printed for the first time (except for one occurence where there's 
some chatter about slow requests).

 

If I run my query again without restarting the process, the exception is raised 
again, without "Maximum memory usage reached" before it.

"Maximum memory usage reached" does get printed again every ~15min if I leave 
the process running, though.
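
For reference, the shape of the failing check is easy to reproduce in isolation. The sketch below is not Netty's or Cassandra's code; it only assumes, per the "it will allocate on heap" log line quoted elsewhere in this thread, that once the pool is exhausted an on-heap buffer can reach a path that insists on a direct buffer (the class and method names are invented for the example):

{code:java}
import java.nio.ByteBuffer;

// Illustrative only: a stand-in for the "must be direct" precondition seen in the
// reported stack trace. Names are invented for the example.
public class DirectBufferPrecondition
{
    static void requireDirect(ByteBuffer initialBuffer)
    {
        if (!initialBuffer.isDirect())
            throw new IllegalArgumentException("initialBuffer is not a direct buffer.");
    }

    public static void main(String[] args)
    {
        requireDirect(ByteBuffer.allocateDirect(8)); // passes: direct buffer
        requireDirect(ByteBuffer.allocate(8));       // throws: on-heap buffer, matching the logged error
    }
}
{code}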

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is reproducible 
> with different versions of Java (8, 11 & 12) [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 
> InboundMessageHandler.java:657 - 
> 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception 
> caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
> at io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87)
> at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59)
> at 
> org.apache.cassandra.net.BufferPoolAllocator$Wrapped.(BufferPoolAllocator.java:95)
> at 
> org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
> at 
> 

[jira] [Comment Edited] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966811#comment-16966811
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15358 at 11/4/19 4:41 PM:
-

Does "{{Maximum memory usage reached}}" always precede the first 
"{{java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.}}"?

There's a bug I spotted on inspection, but it shouldn't be possible to 
encounter, as it would entail converting the heap {{ByteBuffer}} allocated by 
the first message to a read only {{ByteBuffer}} before being recycled into the 
{{BufferPool}}, causing it to be treated as though it were a 
{{DirectByteBuffer}}.  I cannot imagine a scenario where this could happen, but 
that doesn't mean that it doesn't.  I will upload a branch to see if fixing 
this resolves the problem.


was (Author: benedict):
Does {{Maximum memory usage reached}} always precede the first 
{{java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.}}?

There's a bug I spotted on inspection, but it shouldn't be possible to 
encounter, as it would entail converting the heap {{ByteBuffer}} allocated by 
the first message to a read only {{ByteBuffer}} before being recycled into the 
{{BufferPool}}, causing it to be treated as though it were a 
{{DirectByteBuffer}}.  I cannot imagine a scenario where this could happen, but 
that doesn't mean that it doesn't.  I will upload a branch to see if fixing 
this resolves the problem.

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is reproducible 
> with different versions of Java (8, 11 & 12) [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 

[jira] [Commented] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966811#comment-16966811
 ] 

Benedict Elliott Smith commented on CASSANDRA-15358:


Does {{Maximum memory usage reached}} always precede the first 
{{java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.}}?

There's a bug I spotted on inspection, but it shouldn't be possible to 
encounter, as it would entail converting the heap {{ByteBuffer}} allocated by 
the first message to a read only {{ByteBuffer}} before being recycled into the 
{{BufferPool}}, causing it to be treated as though it were a 
{{DirectByteBuffer}}.  I cannot imagine a scenario where this could happen, but 
that doesn't mean that it doesn't.  I will upload a branch to see if fixing 
this resolves the problem.
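
To make the read-only-view pitfall above concrete, here is a small standalone JDK sketch. It assumes (an assumption, not taken from the Cassandra code) that the pool distinguishes heap from direct buffers via the backing array rather than {{isDirect()}}, in which case a read-only view of a heap buffer would indeed look like a direct one:

{code:java}
import java.nio.ByteBuffer;

// Standalone illustration, not Cassandra code: a read-only view of a heap ByteBuffer no
// longer exposes its backing array, so any check of the form !buffer.hasArray() would
// misclassify it as direct even though it still lives on the heap.
public class ReadOnlyHeapBufferDemo
{
    public static void main(String[] args)
    {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer readOnlyView = heap.asReadOnlyBuffer();

        System.out.println(heap.hasArray());          // true  - plain heap buffer
        System.out.println(readOnlyView.hasArray());  // false - array access is forbidden on read-only views
        System.out.println(readOnlyView.isDirect());  // false - it is still an on-heap buffer
    }
}
{code}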

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is reproducible 
> with different versions of Java (8, 11 & 12) [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 
> InboundMessageHandler.java:657 - 
> 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception 
> caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
> at io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87)
> at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59)
> at 
> org.apache.cassandra.net.BufferPoolAllocator$Wrapped.(BufferPoolAllocator.java:95)
> at 
> org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56)
> at 
> 

[jira] [Updated] (CASSANDRA-15081) LegacyLayout does not have same behavior as 2.x when handling unknown column names

2019-11-04 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15081:
---
Since Version: 3.11.2  (was: 3.11.1)

> LegacyLayout does not have same behavior as 2.x when handling unknown column 
> names
> --
>
> Key: CASSANDRA-15081
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15081
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Cameron Zemek
>Priority: High
>  Labels: patch, pull-request-available
> Attachments: 15081.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Due to a bug I haven't been able to reproduce, the production cluster had 
> unknown column names. To replicate the issue for this test, I did the 
> following:
> {noformat}
> $ ccm create -v 2.1.19 -n 1 -s bug
> $ cat > schema.cql << 'EOF'
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE test.unknowntest (id int primary key, payload text, "paylo!d" 
> text);
> EOF
> $ ccm node1 cqlsh -f schema.cql
> $ export CASSANDRA_INCLUDE=~/.ccm/bug/node1/bin/cassandra.in.sh
> $ cat > bug.json << 'EOF'
> [
> {"key": "1",
> "cells": [["","",1554432501209207],
> ["paylo!d","hello world",1554432501209207],
> ["payload","hello world",1554432501209207]]}
> ]
> EOF
> $ ~/.ccm/repository/2.1.19/tools/bin/json2sstable -K test -c unknowntest 
> ~/bug.json 
> ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-Data.db{noformat}
> Then test the behavior of unknown columns in 2.1:
> {noformat}
> $ ccm stop
> $ ccm create -v 2.1.19 -n 1 -s bug2_1_19
> $ cat > schema2.cql << 'EOF'
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE test.unknowntest (id int primary key, payload text);
> EOF
> $ ccm node1 cqlsh -f schema2.cql
> $ ccm stop
> $ cp ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-* 
> ~/.ccm/bug2_1_19/node1/data0/test/unknowntest-/
> $ ccm start
> $ ccm node1 cqlsh
> Connected to bug2_1_19 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 2.1.19 | CQL spec 3.2.1 | Native protocol v3]
> Use HELP for help.
> cqlsh> select * from test.unknowntest where id = 1;
> id | payload
> +-
> 1 | hello world
> (1 rows){noformat}
> Compared to 3.11.4 which did the following:
> {noformat}
> $ ccm stop
> $ ccm create -v 3.11.4 -n 1 -s bug3_11_4
> $ ccm node1 cqlsh -f schema2.cql
> $ ccm stop
> $ cp ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-* 
> ~/.ccm/bug3_11_4/node1/data0/test/unknowntest-/
> $ ccm start
> $ ccm node1 cqlsh
> Connected to bug3_11_4 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
> cqlsh> select * from test.unknowntest where id = 1;
> ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] 
> message="Operation failed - received 0 responses and 1 failures" 
> info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 
> 'consistency': 'ONE'}
> {noformat}
> In the logs this resulted in an IllegalStateException from LegacyLayout line 
> 1127.
> The expected behavior would be to ignore the column and return results the 
> same as in 2.1.
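
As a rough illustration of the expected 2.1-style behaviour (a self-contained sketch, not the actual LegacyLayout code; the column names mirror the repro above), skipping cells whose column is not in the schema instead of failing the read would look like this:

{code:java}
import java.util.*;

// Hedged sketch of the expected behaviour: drop cells whose column is unknown to the
// current schema and keep the rest of the row, rather than throwing and failing the read.
public class SkipUnknownColumns
{
    public static void main(String[] args)
    {
        Set<String> schemaColumns = new HashSet<>(Arrays.asList("id", "payload"));

        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("payload", "hello world");
        cells.put("paylo!d", "hello world"); // column not present in the current schema

        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : cells.entrySet())
        {
            if (!schemaColumns.contains(cell.getKey()))
                continue; // ignore the unknown column instead of aborting
            row.put(cell.getKey(), cell.getValue());
        }

        System.out.println(row); // prints {payload=hello world}, matching the 2.1 result
    }
}
{code}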






[jira] [Comment Edited] (CASSANDRA-15081) LegacyLayout does not have same behavior as 2.x when handling unknown column names

2019-11-04 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965633#comment-16965633
 ] 

Michael Semb Wever edited comment on CASSANDRA-15081 at 11/4/19 3:52 PM:
-

Thanks [~cam1982]. 

||branch||circleci||asf jenkins tests||asf jenkins dtests||
|[cassandra-3.11_15081|https://github.com/apache/cassandra/compare/trunk...instaclustr:3.11-15081]|[circleci|https://circleci.com/gh/instaclustr/workflows/cassandra/tree/3.11-15081]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/27//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/27/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/700//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/700]|

The tests in the pipeline (non-dtests) look ok compared to the 3.11 branch. 
(Most of the failures there are in the first "test" stage.) The same goes for 
the dtests compared against 3.11.

I'm double-checking how this affects the scenario described in CASSANDRA-13939, 
where cells are skipped and the deserializer corrects the file pointer for 
reading the next row, since we're now treating legacy unknown columns as such 
skipped cells as well. Any input on this [~cam1982]?

Also, I'm changing the "since version" field, since the bug (in its current 
form) has only existed since 3.11.2, after CASSANDRA-13939, when the 
AssertionFailedError was changed to an IllegalStateException.
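
The file-pointer concern above can be illustrated with a toy deserializer (this is not Cassandra's code, and the wire format below is invented for the example): a skipped cell's bytes must still be consumed so that the reader stays aligned for the next cell or row.

{code:java}
import java.io.*;

// Toy example only: skipping a cell must still advance the stream past its value,
// otherwise the next read starts mid-cell and the row is misparsed.
public class SkipButAdvance
{
    public static void main(String[] args) throws IOException
    {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encode()));
        while (in.available() > 0)
        {
            String name = in.readUTF();
            int length = in.readInt();
            if (name.equals("paylo!d"))
            {
                in.skipBytes(length); // unknown column: discard the value but advance the pointer
                continue;
            }
            byte[] value = new byte[length];
            in.readFully(value);
            System.out.println(name + " = " + new String(value, "UTF-8"));
        }
    }

    // Encodes two cells as (name, length, value) records in an invented format.
    static byte[] encode() throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeCell(out, "paylo!d", "hello world");
        writeCell(out, "payload", "hello world");
        return bytes.toByteArray();
    }

    static void writeCell(DataOutputStream out, String name, String value) throws IOException
    {
        byte[] v = value.getBytes("UTF-8");
        out.writeUTF(name);
        out.writeInt(v.length);
        out.write(v);
    }
}
{code}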


was (Author: michaelsembwever):
Thanks [~cam1982]. 

||branch||circleci||asf jenkins tests||asf jenkins dtests||
|[cassandra-3.11_15081|https://github.com/apache/cassandra/compare/trunk...instaclustr:3.11-15081]|[circleci|https://circleci.com/gh/instaclustr/workflows/cassandra/tree/3.11-15081]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/27//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/27/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/700//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/700]|

The tests in the pipeline (non-dtests) look ok compared to the 3.11 branch. 
(Most of the failures there are in the first "test" stage.) The same goes for 
the dtests compared against 3.11.

I'm double-checking how this affects the scenario described in CASSANDRA-13939, 
where cells are skipped and the deserializer corrects the file pointer for 
reading the next row, since we're now treating legacy unknown columns as such 
skipped cells as well. Any input on this [~cam1982]?



> LegacyLayout does not have same behavior as 2.x when handling unknown column 
> names
> --
>
> Key: CASSANDRA-15081
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15081
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Cameron Zemek
>Priority: High
>  Labels: patch, pull-request-available
> Attachments: 15081.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Due to a bug I haven't been able to reproduce, the production cluster had 
> unknown column names. To replicate the issue for this test, I did the 
> following:
> {noformat}
> $ ccm create -v 2.1.19 -n 1 -s bug
> $ cat > schema.cql << 'EOF'
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE test.unknowntest (id int primary key, payload text, "paylo!d" 
> text);
> EOF
> $ ccm node1 cqlsh -f schema.cql
> $ export CASSANDRA_INCLUDE=~/.ccm/bug/node1/bin/cassandra.in.sh
> $ cat > bug.json << 'EOF'
> [
> {"key": "1",
> "cells": [["","",1554432501209207],
> ["paylo!d","hello world",1554432501209207],
> ["payload","hello world",1554432501209207]]}
> ]
> EOF
> $ ~/.ccm/repository/2.1.19/tools/bin/json2sstable -K test -c unknowntest 
> ~/bug.json 
> ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-Data.db{noformat}
> Then test the behavior of unknown columns in 2.1:
> {noformat}
> $ ccm stop
> $ ccm create -v 2.1.19 -n 1 -s bug2_1_19
> $ cat > schema2.cql << 'EOF'
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE test.unknowntest (id int primary key, payload text);
> EOF
> $ ccm node1 cqlsh -f schema2.cql
> $ ccm stop
> $ cp ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-* 
> ~/.ccm/bug2_1_19/node1/data0/test/unknowntest-/
> $ ccm start
> $ ccm node1 cqlsh
> Connected to bug2_1_19 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 2.1.19 | CQL spec 3.2.1 | Native 

[jira] [Commented] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Valentin Lorentz (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966733#comment-16966733
 ] 

Valentin Lorentz commented on CASSANDRA-15358:
--

1:
{code:java}
INFO [SSTableBatchOpen:1] 2019-11-04 16:02:29,594 BufferPool.java:216 - Global 
buffer pool is enabled, when pool is exhausted (max is 512.000MiB) it will 
allocate on heap
{code}
2:

{code:java}
INFO  [ReadStage-1] 2019-11-04 16:04:57,612 NoSpamLogger.java:91 - Maximum 
memory usage reached (512.000MiB), cannot allocate chunk of 8.000MiB
ERROR [Messaging-EventLoop-3-1] 2019-11-04 16:06:40,133 
InboundMessageHandler.java:657 - 
128.93.66.191:7000->128.93.64.42:7000-LARGE_MESSAGES-0b6d3aaa unexpected 
exception caught while processing inbound messages; terminating connection
java.lang.IllegalArgumentException: initialBuffer is not a direct buffer. 
   at 
io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87)
   at 
io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59)
[...]
{code}
 

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is reproducible 
> with different versions of Java (8, 11 & 12) [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 
> InboundMessageHandler.java:657 - 
> 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception 
> caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
> at io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87)
> at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59)
> at 
> 

[jira] [Comment Edited] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Valentin Lorentz (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966733#comment-16966733
 ] 

Valentin Lorentz edited comment on CASSANDRA-15358 at 11/4/19 3:10 PM:
---

1:
{code:java}
INFO [SSTableBatchOpen:1] 2019-11-04 16:02:29,594 BufferPool.java:216 - Global 
buffer pool is enabled, when pool is exhausted (max is 512.000MiB) it will 
allocate on heap{code}
2:
{code:java}
INFO  [ReadStage-1] 2019-11-04 16:04:57,612 NoSpamLogger.java:91 - Maximum 
memory usage reached (512.000MiB), cannot allocate chunk of 8.000MiB
ERROR [Messaging-EventLoop-3-1] 2019-11-04 16:06:40,133 
InboundMessageHandler.java:657 - 
128.93.66.191:7000->128.93.64.42:7000-LARGE_MESSAGES-0b6d3aaa unexpected 
exception caught while processing inbound messages; terminating connection
java.lang.IllegalArgumentException: initialBuffer is not a direct buffer. 
   at 
io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87)
   at 
io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59)
[...]
{code}
 


was (Author: progval):
# `15:07  INFO [SSTableBatchOpen:1] 2019-11-04 16:02:29,594 
BufferPool.java:216 - Global buffer pool is enabled, when pool is exhausted 
(max is 512.000MiB) it will allocate on heap`
 # :

{code:java}
INFO  [ReadStage-1] 2019-11-04 16:04:57,612 NoSpamLogger.java:91 - Maximum 
memory usage reached (512.000MiB), cannot allocate chunk of 8.000MiB
ERROR [Messaging-EventLoop-3-1] 2019-11-04 16:06:40,133 
InboundMessageHandler.java:657 - 
128.93.66.191:7000->128.93.64.42:7000-LARGE_MESSAGES-0b6d3aaa unexpected 
exception caught while processing inbound messages; terminating connection
java.lang.IllegalArgumentException: initialBuffer is not a direct buffer. 
   at 
io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87)
   at 
io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59)
[...]
{code}
 

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is reproducible 
> with different versions of Java (8, 11 & 12) [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at 

[jira] [Updated] (CASSANDRA-11018) Drop column in results in corrupted table or tables state (reversible)

2019-11-04 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-11018:
---
Component/s: Legacy/Local Write-Read Paths

> Drop column in results in corrupted table or tables state (reversible)
> --
>
> Key: CASSANDRA-11018
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11018
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL, Legacy/Local Write-Read Paths
> Environment: Debian 3.16.7
>Reporter: Jason Kania
>Assignee: Sylvain Lebresne
>Priority: Low
> Fix For: 3.0.3, 3.3
>
>
> After dropping a column from a table, that table is no longer accessible from 
> various commands.
> Initial command in cqlsh:
> {code}
> alter table "sensorUnit" drop "lastCouplingCheckTime";
> {code}
> No errors were reported.
> Subsequently, the following commands fail as follows:
> {code}
> > nodetool compact
> root@marble:/var/log/cassandra# nodetool compact
> error: Unknown column lastCouplingCheckTime in table powermon.sensorUnit
> -- StackTrace --
> java.lang.AssertionError: Unknown column lastCouplingCheckTime in table 
> powermon.sensorUnit
> at 
> org.apache.cassandra.db.LegacyLayout.readLegacyAtom(LegacyLayout.java:964)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.readAtom(UnfilteredDeserializer.java:520)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.hasNext(UnfilteredDeserializer.java:503)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:446)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:422)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:289)
> at 
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:134)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:57)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:329)
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:109)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:100)
> at 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:442)
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150)
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
> at 
> org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:226)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:572)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Also I get the following from cqlsh commands:
> {code}
> cqlsh:sensorTrack> select * from "sensorUnit";
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh.py", line 1258, in perform_simple_statement
> result = future.result()
>   File 
> 

[jira] [Updated] (CASSANDRA-11018) Drop column in results in corrupted table or tables state (reversible)

2019-11-04 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-11018:
---
Description: 
After dropping a column from a table, that table is no longer accessible from 
various commands.

Initial command in cqlsh:
{code}
alter table "sensorUnit" drop "lastCouplingCheckTime";
{code}
No errors were reported.

Subsequently, the following commands fail as follows:
{code}
> nodetool compact

root@marble:/var/log/cassandra# nodetool compact
error: Unknown column lastCouplingCheckTime in table powermon.sensorUnit
-- StackTrace --
java.lang.AssertionError: Unknown column lastCouplingCheckTime in table 
powermon.sensorUnit
at 
org.apache.cassandra.db.LegacyLayout.readLegacyAtom(LegacyLayout.java:964)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.readAtom(UnfilteredDeserializer.java:520)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.hasNext(UnfilteredDeserializer.java:503)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:446)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:422)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:289)
at 
org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:134)
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:57)
at 
org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:329)
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:109)
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:100)
at 
org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:442)
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150)
at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
at 
org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:226)
at 
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:572)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
Also I get the following from cqlsh commands:
{code}
cqlsh:sensorTrack> select * from "sensorUnit";
Traceback (most recent call last):
  File "/usr/bin/cqlsh.py", line 1258, in perform_simple_statement
result = future.result()
  File 
"/usr/share/cassandra/lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cluster.py",
 line 3122, in result
raise self._final_exception
ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation 
failed - received 0 responses and 1 failures" info={'failures': 1, 
'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
{code}
However, after I re-added the column, access to the database was restored.
{code}
alter table "sensorUnit" add "lastCouplingCheckTime"
{code}
I was not able to reproduce this, as subsequent attempts to alter the table 
worked properly, but the problem occurred on two tables that were altered at 
the same time, so there may be a need to ensure a drop completes entirely when 
it is performed.

  was:
After dropping a column from a table, that table 

[jira] [Updated] (CASSANDRA-11018) Drop column in results in corrupted table or tables state (reversible)

2019-11-04 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-11018:
---
Fix Version/s: (was: 3.11.x)

> Drop column in results in corrupted table or tables state (reversible)
> --
>
> Key: CASSANDRA-11018
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11018
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
> Environment: Debian 3.16.7
>Reporter: Jason Kania
>Assignee: Sylvain Lebresne
>Priority: Low
> Fix For: 3.0.3, 3.3
>
>
> After dropping a column from a table, that table is no longer accessible from 
> various commands.
> Initial command in cqlsh:
> alter table "sensorUnit" drop "lastCouplingCheckTime";
> No errors were reported.
> Subsequently, the following commands fail as follows:
> > nodetool compact
> root@marble:/var/log/cassandra# nodetool compact
> error: Unknown column lastCouplingCheckTime in table powermon.sensorUnit
> -- StackTrace --
> java.lang.AssertionError: Unknown column lastCouplingCheckTime in table 
> powermon.sensorUnit
> at 
> org.apache.cassandra.db.LegacyLayout.readLegacyAtom(LegacyLayout.java:964)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.readAtom(UnfilteredDeserializer.java:520)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.hasNext(UnfilteredDeserializer.java:503)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:446)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:422)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:289)
> at 
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:134)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:57)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:329)
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:109)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:100)
> at 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:442)
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150)
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
> at 
> org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:226)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:572)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Also I get the following from cqlsh commands:
> cqlsh:sensorTrack> select * from "sensorUnit";
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh.py", line 1258, in perform_simple_statement
> result = future.result()
>   File 
> "/usr/share/cassandra/lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cluster.py",
>  line 3122, in result
> raise 

[jira] [Updated] (CASSANDRA-11018) Drop column in results in corrupted table or tables state (reversible)

2019-11-04 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-11018:
---
Fix Version/s: (was: 3.0.x)
   3.0.3
   3.3

> Drop column in results in corrupted table or tables state (reversible)
> --
>
> Key: CASSANDRA-11018
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11018
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
> Environment: Debian 3.16.7
>Reporter: Jason Kania
>Assignee: Sylvain Lebresne
>Priority: Low
> Fix For: 3.0.3, 3.3, 3.11.x
>
>
> After dropping a column from a table, that table is no longer accessible from 
> various commands.
> Initial command in cqlsh:
> alter table "sensorUnit" drop "lastCouplingCheckTime";
> No errors were reported.
> Subsequently, the following commands fail as follows:
> > nodetool compact
> root@marble:/var/log/cassandra# nodetool compact
> error: Unknown column lastCouplingCheckTime in table powermon.sensorUnit
> -- StackTrace --
> java.lang.AssertionError: Unknown column lastCouplingCheckTime in table 
> powermon.sensorUnit
> at 
> org.apache.cassandra.db.LegacyLayout.readLegacyAtom(LegacyLayout.java:964)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.readAtom(UnfilteredDeserializer.java:520)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$AtomIterator.hasNext(UnfilteredDeserializer.java:503)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:446)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:422)
> at 
> org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:289)
> at 
> org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:134)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:57)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:329)
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:109)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:100)
> at 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:442)
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150)
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
> at 
> org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:226)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:572)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Also I get the following from cqlsh commands:
> cqlsh:sensorTrack> select * from "sensorUnit";
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh.py", line 1258, in perform_simple_statement
> result = future.result()
>   File 
> 

[jira] [Comment Edited] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966714#comment-16966714
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15358 at 11/4/19 2:53 PM:
-

Interesting. Could you:
 # Find the log line that begins with "Global buffer pool is enabled, when pool 
is exhausted" and reproduce it in its entirety here?
 # Find if you have any log lines beginning with "Maximum memory usage reached" 
and reproduce them here?

Thanks!


was (Author: benedict):
Interesting. Could you:
 # Find the log line that begins with "Global buffer pool is enabled, when pool 
is exhausted" and reproduce it in its entirety here?
 # Find if you have any log lines beginning with "Maximum memory usage reached" 
and reproduce them here?

Thanks

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is repeated with 
> different versions of Java (8, 11 & 12). [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 
> InboundMessageHandler.java:657 - 
> 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception 
> caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
> at io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:87)
> at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:59)
> at 
> org.apache.cassandra.net.BufferPoolAllocator$Wrapped.<init>(BufferPoolAllocator.java:95)
> at 
> org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56)
> at 
> 

[jira] [Commented] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966714#comment-16966714
 ] 

Benedict Elliott Smith commented on CASSANDRA-15358:


Interesting. Could you:
 # Find the log line that begins with "Global buffer pool is enabled, when pool 
is exhausted" and reproduce it in its entirety here?
 # Find if you have any log lines beginning with "Maximum memory usage reached" 
and reproduce them here?

Thanks
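
For context, the {{IllegalArgumentException: initialBuffer is not a direct buffer}} in the 
quoted stack trace below boils down to a heap {{ByteBuffer}} reaching a constructor that 
requires a direct one. A minimal JDK-only sketch of that distinction (not Cassandra's actual 
{{BufferPoolAllocator}} code):

{code:java}
import java.nio.ByteBuffer;

public class DirectBufferCheck
{
    public static void main(String[] args)
    {
        // Heap buffer: backed by a byte[] inside the JVM heap.
        ByteBuffer heap = ByteBuffer.allocate(1024);
        // Direct buffer: off-heap memory, the kind Netty's UnpooledDirectByteBuf requires.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());   // false -> would trip the exception above
        System.out.println(direct.isDirect()); // true
    }
}
{code}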

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is repeated with 
> different versions of Java (8, 11 & 12). [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 
> InboundMessageHandler.java:657 - 
> 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception 
> caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
> at io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:87)
> at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:59)
> at 
> org.apache.cassandra.net.BufferPoolAllocator$Wrapped.<init>(BufferPoolAllocator.java:95)
> at 
> org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
> at 
> io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
> at 
> 

[jira] [Commented] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Valentin Lorentz (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966709#comment-16966709
 ] 

Valentin Lorentz commented on CASSANDRA-15358:
--

I am actually able to reproduce the issue (this allowed me to trim down the 
config I posted above).

Restarts temporarily solve the issue, but after a couple of million records are 
fetched with the script above, the error happens again on one of the nodes and 
keeps happening every time I run the query until I restart the node.

I just upgraded to 4.0-alpha2, and the problem is unchanged.

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with the Cassandra 4 alpha version. The same bug is repeated with 
> different versions of Java (8, 11 & 12). [~benedict]
>  
> Stack trace:
> {code:java}
> INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for 
> CQL clients on /1.3.0.6:9042 (unencrypted)...
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - 
> 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message 
> of type PING_REQ whose timeout expired before reaching the network
> WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 
> - CassandraRoleManager skipped default role setup: some nodes were not ready
> INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 
> - Setup task failed with error, rescheduling
> INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - 
> 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644)
> at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - 
> 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of 
> type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 
> - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect
>  
> ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 
> InboundMessageHandler.java:657 - 
> 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception 
> caught while processing inbound messages; terminating connection
> java.lang.IllegalArgumentException: initialBuffer is not a direct buffer.
> at io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:87)
> at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:59)
> at 
> org.apache.cassandra.net.BufferPoolAllocator$Wrapped.<init>(BufferPoolAllocator.java:95)
> at 
> org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
> at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
> at 
> io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
> at 
> 

[jira] [Commented] (CASSANDRA-8612) Read metrics should be updated on all types of reads

2019-11-04 Thread Abdul Aziz Ali (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966694#comment-16966694
 ] 

Abdul Aziz Ali commented on CASSANDRA-8612:
---

Hi [~cnlwsu], I'm interested in picking this up. Can you advise whether it's 
still unresolved?

> Read metrics should be updated on all types of reads
> 
>
> Key: CASSANDRA-8612
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8612
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Chris Lohfink
>Priority: Low
>  Labels: lhf, metrics
>
> Metrics like "sstables per read" are not updated on a range slice.  Although 
> separating things out for each type of read could make sense like we do for 
> latencies, only exposing the metrics for one type can be a little confusing 
> when people do a query and see nothing increases.  I think its sufficient to 
> use the same metrics for all reads.
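
A minimal sketch of the proposal, using the Dropwizard Metrics API that Cassandra's metrics are 
built on (the class and method names here are hypothetical, not the actual {{TableMetrics}} 
code): both read paths feed the same histogram, so "sstables per read" also moves for range 
queries.

{code:java}
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;

public class SharedReadMetrics
{
    private final Histogram sstablesPerRead;

    public SharedReadMetrics(MetricRegistry registry)
    {
        this.sstablesPerRead = registry.histogram("SSTablesPerReadHistogram");
    }

    // Single-partition reads already update the histogram today.
    public void onSinglePartitionRead(int sstablesTouched)
    {
        sstablesPerRead.update(sstablesTouched);
    }

    // The proposal: range reads update the very same histogram instead of skipping it.
    public void onRangeRead(int sstablesTouched)
    {
        sstablesPerRead.update(sstablesTouched);
    }
}
{code}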






[jira] [Issue Comment Deleted] (CASSANDRA-8612) Read metrics should be updated on all types of reads

2019-11-04 Thread ABDUL AZIZ ALI (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ABDUL AZIZ ALI updated CASSANDRA-8612:
--
Comment: was deleted

(was: Hi [~cnlwsu], I'm interested in picking this up. Can you advise whether 
it's still unresolved?)

> Read metrics should be updated on all types of reads
> 
>
> Key: CASSANDRA-8612
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8612
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Chris Lohfink
>Priority: Low
>  Labels: lhf, metrics
>
> Metrics like "sstables per read" are not updated on a range slice.  Although 
> separating things out for each type of read could make sense like we do for 
> latencies, only exposing the metrics for one type can be a little confusing 
> when people do a query and see nothing increases.  I think its sufficient to 
> use the same metrics for all reads.






[jira] [Updated] (CASSANDRA-13974) Bad prefix matching when figuring out data directory for an sstable

2019-11-04 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-13974:

Status: Ready to Commit  (was: Review In Progress)

LGTM. There's one nit which you could ignore or fix on commit: in the 
{{Directories}} constructor, the check for old-format directories was inlined 
in the 3.11 branch, but in the 4.0 version {{olderDirectoryExists}} is still 
there.

There are also a couple of failing dtests against trunk, but I'm having trouble 
even running those two tests locally; if they look OK to you, I'm +1.


> Bad prefix matching when figuring out data directory for an sstable
> ---
>
> Key: CASSANDRA-13974
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13974
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We do a "startsWith" check when getting data directory for an sstable, we 
> should match including File.separator
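
For illustration, the pitfall described above, using hypothetical paths rather than Cassandra's 
actual {{Directories}} code: a bare {{startsWith}} check matches a sibling data directory that 
merely shares a string prefix, while appending {{File.separator}} does not.

{code:java}
import java.io.File;

public class PrefixMatchSketch
{
    public static void main(String[] args)
    {
        String dataDir = "/data/disk1";
        String sstablePath = "/data/disk10/ks/tbl/na-1-big-Data.db";

        // String prefix match: wrongly claims the sstable lives under /data/disk1.
        System.out.println(sstablePath.startsWith(dataDir));                  // true
        // Matching including the separator avoids the false positive.
        System.out.println(sstablePath.startsWith(dataDir + File.separator)); // false
    }
}
{code}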






[jira] [Updated] (CASSANDRA-13974) Bad prefix matching when figuring out data directory for an sstable

2019-11-04 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-13974:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Bad prefix matching when figuring out data directory for an sstable
> ---
>
> Key: CASSANDRA-13974
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13974
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We do a "startsWith" check when getting data directory for an sstable, we 
> should match including File.separator






[jira] [Commented] (CASSANDRA-8612) Read metrics should be updated on all types of reads

2019-11-04 Thread ABDUL AZIZ ALI (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966652#comment-16966652
 ] 

ABDUL AZIZ ALI commented on CASSANDRA-8612:
---

Hi [~cnlwsu], I'm interested in picking this up. Can you advise whether it's 
still unresolved?

> Read metrics should be updated on all types of reads
> 
>
> Key: CASSANDRA-8612
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8612
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Chris Lohfink
>Priority: Low
>  Labels: lhf, metrics
>
> Metrics like "sstables per read" are not updated on a range slice.  Although 
> separating things out for each type of read could make sense like we do for 
> latencies, only exposing the metrics for one type can be a little confusing 
> when people do a query and see nothing increases.  I think its sufficient to 
> use the same metrics for all reads.






[jira] [Comment Edited] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue

2019-11-04 Thread Valentin Lorentz (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964055#comment-16964055
 ] 

Valentin Lorentz edited comment on CASSANDRA-15358 at 11/4/19 1:05 PM:
---

Hello,

 

I had a similar issue earlier today. It was triggered by a client using code 
similar to this one: 
[https://docs.datastax.com/en/developer/python-driver/3.20/query_paging/#handling-paged-results-with-callbacks]
 with these changes (a rough Java-driver equivalent is sketched below):
 * typo fix ({{handle_err}} -> {{handle_error}})
 * {{"SELECT * FROM users"}} replaced with {{SimpleStatement("SELECT * FROM revision", fetch_size=100)}}

 

The table I'm querying has ~1 billion keys and is defined with:

 
{code:java}
CREATE TYPE IF NOT EXISTS person (
    fullname        blob,
    name            blob,
    email           blob
);

CREATE TYPE IF NOT EXISTS microtimestamp (
    seconds         bigint,
    microseconds    int
);

CREATE TYPE IF NOT EXISTS microtimestamp_with_timezone (
    timestamp       frozen<microtimestamp>,
    offset          smallint,
    negative_utc    boolean
);

CREATE TABLE IF NOT EXISTS revision (
    id              blob PRIMARY KEY,
    date            microtimestamp_with_timezone,
    committer_date  microtimestamp_with_timezone,
    type            ascii,
    directory       blob,
    message         blob,
    author          person,
    committer       person,
    parents         frozen<list<blob>>,
    synthetic       boolean,
    metadata        text
);
{code}
Cluster was initially created with Cassandra 3.11.4, but was migrated to 
4.0-alpha1 a few weeks ago. There are four nodes in my cluster, and no 
replication.

cassandra.yaml was the same as the one shipped with 4.0-alpha1, except for some 
path/IP changes and the following overrides:
{code:java}
concurrent_reads: 32
concurrent_writes: 64
concurrent_counter_writes: 32
trickle_fsync: true
enable_user_defined_functions: true{code}
After a restart, I am unable to reproduce the issue, so I cannot tell if the 
issue was caused by my config.


was (Author: progval):
Hello,

 

I had a similar issue earlier today. It was triggered by a client using code 
similar to this one: 
[https://docs.datastax.com/en/developer/python-driver/3.20/query_paging/#handling-paged-results-with-callbacks]
 with these changes:
 * typo fix ({{handle_err}} -> {{handle_error}})
 * {{"SELECT * FROM users"}} replaced with {{SimpleStatement("SELECT * FROM revision", fetch_size=100)}}

 

The table I'm querying had ~1 billion keys, and is defined with:

 
{code:java}
CREATE TYPE IF NOT EXISTS person (
    fullname        blob,
    name            blob,
    email           blob
);

CREATE TYPE IF NOT EXISTS microtimestamp (
    seconds         bigint,
    microseconds    int
);

CREATE TYPE IF NOT EXISTS microtimestamp_with_timezone (
    timestamp       frozen<microtimestamp>,
    offset          smallint,
    negative_utc    boolean
);

CREATE TABLE IF NOT EXISTS revision (
    id              blob PRIMARY KEY,
    date            microtimestamp_with_timezone,
    committer_date  microtimestamp_with_timezone,
    type            ascii,
    directory       blob,
    message         blob,
    author          person,
    committer       person,
    parents         frozen<list<blob>>,
    synthetic       boolean,
    metadata        text
);
{code}
Cluster was initially created with Cassandra 3.11.4, but was migrated to 
4.0-alpha1 a few weeks ago. There are four nodes in my cluster, and no 
replication.

cassandra.yaml was the same as the one shipped with 4.0-alpha1, except for some 
paths/IP changes, and these changes:
{code:java}
prepared_statements_cache_size_mb: 10
key_cache_size_in_mb: 10
concurrent_reads: 32
concurrent_writes: 64
concurrent_counter_writes: 32
file_cache_size_in_mb: 512
disk_access_mode: mmap
trickle_fsync: true
enable_user_defined_functions: true{code}
After a restart, I am unable to reproduce the issue, so I cannot tell if the 
issue was caused by my config.

> Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
> 
>
> Key: CASSANDRA-15358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Santhosh Kumar Ramalingam
>Assignee: Benedict Elliott Smith
>Priority: Normal
>  Labels: 4.0, alpha
>
> Hitting a bug with cassandra 4 alpha version. The same bug is 

[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966637#comment-16966637
 ] 

Benedict Elliott Smith commented on CASSANDRA-15379:


What is your rationale for an {{EnumSet}} being more maintainable than a member 
function? As far as I understand, we explicitly intend to retire this 
functionality, so planning for future uses seems counterproductive to me.

If we're adding per-table config for this, why are we blanket changing the 
behaviour for all relevant compressors? This may well be surprising to users, 
and also seems to make the per-table config superfluous (or at least, only 
useful to restore the probably-assumed behaviour of using the same compressor 
for both flush and compaction).
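
For readers following along, a hypothetical sketch of the two API shapes being debated; neither 
interface is the actual {{ICompressor}} change in the patch, and all names are made up.

{code:java}
import java.util.EnumSet;
import java.util.Set;

public class CompressorApiSketch
{
    enum Use { GENERAL, FAST_FLUSH }

    // Option A: the compressor advertises a set of recommended uses (the EnumSet-style shape).
    interface SetBasedCompressor
    {
        Set<Use> recommendedUses();
    }

    // Option B: a single member function answering the one question needed today.
    interface PredicateBasedCompressor
    {
        boolean recommendedForFlush();
    }

    public static void main(String[] args)
    {
        SetBasedCompressor zstdLike = () -> EnumSet.of(Use.GENERAL);
        PredicateBasedCompressor lz4Like = () -> true;

        System.out.println(zstdLike.recommendedUses().contains(Use.FAST_FLUSH)); // false
        System.out.println(lz4Like.recommendedForFlush());                       // true
    }
}
{code}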

 

> Make it possible to flush with a different compression strategy than we 
> compact with
> 
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/Config, Local/Memtable
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Normal
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on 
> some of our most dense clusters and have been observing close to 50% 
> reduction in footprint with Zstd on some of our workloads! Unfortunately 
> though we have been running into an issue where the flush might take so long 
> (Zstd is slower to compress than LZ4) that we can actually block the next 
> flush and cause instability.
> Internally we are working around this with a very simple patch which flushes 
> SSTables with the default compression strategy (LZ4) regardless of the table 
> params. This is a simple solution, but I think the ideal solution might be 
> for the flush compression strategy to be configurable separately from the 
> table compression strategy (while defaulting to the same thing). Instead of 
> adding yet another compression option to the yaml (like hints and commitlog), 
> I was thinking of just adding it to the table parameters and then adding a 
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently 
> supported defaults are:
> # * compression   : How are SSTables compressed in general (flush, 
> compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> # supported
> default_table_parameters:
>   compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 16
>   flush_compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 4
> {noformat}
> This would also have the nice effect of giving our configuration a path 
> forward to providing user-specified defaults for table creation (so, e.g., if 
> a particular user wanted to use a different default chunk_length_in_kb, they 
> could do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.






[jira] [Updated] (CASSANDRA-15385) Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default

2019-11-04 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-15385:

Status: Ready to Commit  (was: Review In Progress)

+1 Both the C* and dtest changes LGTM 

> Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default
> --
>
> Key: CASSANDRA-15385
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15385
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Tracing
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>







[jira] [Updated] (CASSANDRA-15385) Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default

2019-11-04 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-15385:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Ensure that tracing doesn't break connections in 3.x/4.0 mixed mode by default
> --
>
> Key: CASSANDRA-15385
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15385
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Tracing
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>







[jira] [Assigned] (CASSANDRA-12090) Digest mismatch if static column is NULL

2019-11-04 Thread SandhyaMora (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SandhyaMora reassigned CASSANDRA-12090:
---

Assignee: Tommy Stendahl  (was: SandhyaMora)

> Digest mismatch if static column is NULL
> 
>
> Key: CASSANDRA-12090
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12090
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Normal
> Fix For: 3.0.9, 3.8
>
> Attachments: 12090.txt, trace.txt
>
>
> If a table has a static column and this column has a null value for a 
> partition, a SELECT on this partition will always trigger a digest mismatch, 
> but the following full data read will not trigger a read repair since there 
> is no mismatch in the data.
> This can be recreated using a 3 node ccm cluster with the following commands:
> {code:sql}
> CREATE KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': '3' };
> CREATE TABLE foo.foo ( key int, foo int, col int static, PRIMARY KEY (key, 
> foo) );
> CONSISTENCY QUORUM;
> INSERT INTO foo.foo (key, foo) VALUES ( 1,1);
> TRACING ON;
> SELECT * FROM foo.foo WHERE key = 1 and foo =1;
> {code}
> I have added the trace in an attachment. In the trace you can see that a 
> digest read is performed and that there is a digest mismatch, but the full 
> data read does not result in a mismatch. Repeating the SELECT statement will 
> give the same trace over and over.
> The problem seems to be that the name of the static column is included when 
> the digest response is calculated even if the column has no value. When the 
> digest for the data response is calculated, the column name is not included.
> I think this can be solved by updating {{UnfilteredRowIterators.digest()}} so 
> it excludes the static column if it has no value. I have a patch that does 
> this; it merges to both 3.0 and trunk.
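
A simplified illustration of the mismatch described above (not Cassandra's actual 
{{UnfilteredRowIterators.digest()}} code): one digest mixes in the static column name even 
though the column has no value, the other does not, so the two digests disagree while the 
underlying data is identical.

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class StaticColumnDigestSketch
{
    static byte[] digest(boolean includeEmptyStaticName) throws Exception
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update("key=1,foo=1".getBytes(StandardCharsets.UTF_8));  // the actual row data
        if (includeEmptyStaticName)
            md.update("col".getBytes(StandardCharsets.UTF_8));      // static column name, no value
        return md.digest();
    }

    public static void main(String[] args) throws Exception
    {
        // Digest response includes the name; the data-response digest does not -> mismatch.
        System.out.println(Arrays.equals(digest(true), digest(false))); // false
    }
}
{code}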






[jira] [Assigned] (CASSANDRA-12090) Digest mismatch if static column is NULL

2019-11-04 Thread SandhyaMora (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SandhyaMora reassigned CASSANDRA-12090:
---

Assignee: SandhyaMora  (was: Tommy Stendahl)

> Digest mismatch if static column is NULL
> 
>
> Key: CASSANDRA-12090
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12090
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Tommy Stendahl
>Assignee: SandhyaMora
>Priority: Normal
> Fix For: 3.0.9, 3.8
>
> Attachments: 12090.txt, trace.txt
>
>
> If a table has a static column and this column has a null value for a 
> partition, a SELECT on this partition will always trigger a digest mismatch, 
> but the following full data read will not trigger a read repair since there 
> is no mismatch in the data.
> This can be recreated using a 3 node ccm cluster with the following commands:
> {code:sql}
> CREATE KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': '3' };
> CREATE TABLE foo.foo ( key int, foo int, col int static, PRIMARY KEY (key, 
> foo) );
> CONSISTENCY QUORUM;
> INSERT INTO foo.foo (key, foo) VALUES ( 1,1);
> TRACING ON;
> SELECT * FROM foo.foo WHERE key = 1 and foo =1;
> {code}
> I have added the trace in an attachment. In the trace you can see that a 
> digest read is performed and that there is a digest mismatch, but the full 
> data read does not result in a mismatch. Repeating the SELECT statement will 
> give the same trace over and over.
> The problem seems to be that the name of the static column is included when 
> the digest response is calculated even if the column has no value. When the 
> digest for the data response is calculated, the column name is not included.
> I think this can be solved by updating {{UnfilteredRowIterators.digest()}} so 
> it excludes the static column if it has no value. I have a patch that does 
> this; it merges to both 3.0 and trunk.


