[jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair

2016-06-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344556#comment-15344556
 ] 

Stefan Podkowinski commented on CASSANDRA-3486:
---

I've now looked at this issue as a potential use case for CASSANDRA-12016 and 
added a test on top of it. The branch can be found at 
[WIP-3486|https://github.com/spodkowinski/cassandra/tree/WIP-3486] and the test 
I'm talking about in 
[ActiveRepairServiceMessagingTest.java|https://github.com/spodkowinski/cassandra/blob/3a9ba2edcfe5a3a774089884d5fa7f4df4c9b70c/test/unit/org/apache/cassandra/service/ActiveRepairServiceMessagingTest.java].
 

My goal was to make coordination between different nodes in repair scenarios 
easier to test. The basic cases covered so far are pretty simple, but I'd like 
to add more edge cases in the future. Nonetheless, I wanted to share this early 
on in case [~pauloricardomg] and others have any feedback on whether this 
approach would be helpful for making progress on this issue.
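
For illustration only, here is a rough, self-contained Java sketch (hypothetical 
class and verb names, not the classes from the WIP branch or the actual 
ActiveRepairServiceMessagingTest) of the kind of test such mocking could enable: 
outgoing repair messages are captured by a recording sink instead of hitting the 
network, so a plain unit test can assert on the coordination sequence.

{code}
// Hypothetical, self-contained sketch -- not the actual ActiveRepairServiceMessagingTest.
// Outgoing repair messages are recorded instead of transmitted, so the coordination
// sequence between nodes can be asserted on in a plain unit test.
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepairMessagingSketch
{
    interface MessageSink
    {
        void send(InetAddress to, String verb, String payload);
    }

    // Records messages rather than sending them over the wire
    static class RecordingSink implements MessageSink
    {
        final List<String> sent = new ArrayList<>();

        public void send(InetAddress to, String verb, String payload)
        {
            sent.add(verb + " -> " + to.getHostAddress());
        }
    }

    // Toy "repair coordinator": prepare a session against each neighbour
    static void prepareRepair(MessageSink sink, List<InetAddress> neighbours)
    {
        for (InetAddress peer : neighbours)
            sink.send(peer, "PREPARE", "parentSessionId=...");
    }

    public static void main(String[] args) throws Exception
    {
        RecordingSink sink = new RecordingSink();
        List<InetAddress> peers = Arrays.asList(InetAddress.getByName("127.0.0.2"),
                                                InetAddress.getByName("127.0.0.3"));
        prepareRepair(sink, peers);

        // Black-box assertion: exactly one PREPARE per neighbour, no real network involved
        if (sink.sent.size() != peers.size())
            throw new AssertionError("expected one PREPARE per peer, got " + sink.sent);
        System.out.println("captured: " + sink.sent);
    }
}
{code}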



> Node Tool command to stop repair
> 
>
> Key: CASSANDRA-3486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
> Environment: JVM
>Reporter: Vijay
>Assignee: Paulo Motta
>Priority: Minor
>  Labels: repair
> Fix For: 2.1.x
>
> Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, If the validation compaction is stopped then the repair 
> will hang. This ticket will allow users to kill the original repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11960) Hints are not seekable

2016-06-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344161#comment-15344161
 ] 

Stefan Podkowinski commented on CASSANDRA-11960:


I just noticed that {{HintsReader.PagesIterator}} will create pages based on 
the current position in the data input. My approach assumed that we could just 
skip pages from the iterator to resume dispatching hints from the last sent 
page. But this won't work as long as the pages aren't actually consumed; you'll 
always get the first hint as long as you don't move the input ahead. So I'm 
wondering whether the whole approach really makes sense or if we need to bite 
the bullet and make the reader seekable, if that's possible.
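
To make the problem concrete, here is a small, self-contained Java sketch 
(simplified stand-ins, not the real {{HintsReader}} classes) of an iterator whose 
pages are derived from the current input position: skipping pages without 
consuming them never advances the position, so every "next" page still starts at 
the first hint.

{code}
import java.io.ByteArrayInputStream;
import java.util.Iterator;

public class PositionBasedPagesSketch
{
    static class Page
    {
        final int startOffset;          // derived from the stream position at creation time
        Page(int startOffset) { this.startOffset = startOffset; }
    }

    // Simplified stand-in for a position-based pages iterator: each page simply
    // wraps whatever the current input position happens to be.
    static class Pages implements Iterator<Page>
    {
        final ByteArrayInputStream input;
        int position = 0;

        Pages(ByteArrayInputStream input) { this.input = input; }

        public boolean hasNext() { return input.available() > 0; }

        public Page next() { return new Page(position); }

        // Only consuming the page moves the underlying position forward
        void consume(Page page, int pageSize)
        {
            long skipped = input.skip(pageSize);
            position += (int) skipped;
        }
    }

    public static void main(String[] args)
    {
        Pages pages = new Pages(new ByteArrayInputStream(new byte[64]));

        Page first = pages.next();
        Page second = pages.next();    // "skipped" without consuming the first page
        // Both pages start at offset 0 -- skipping alone does not resume anywhere
        System.out.println(first.startOffset + " == " + second.startOffset);

        pages.consume(first, 16);
        Page third = pages.next();     // only now does a page start at a later offset
        System.out.println("after consuming: " + third.startOffset);
    }
}
{code}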


> Hints are not seekable
> --
>
> Key: CASSANDRA-11960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11960
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Stefan Podkowinski
>
> Got the following error message on trunk. No idea how to reproduce. But the 
> only thing the (not overridden) seek method does is throwing this exception.
> {code}
> ERROR [HintsDispatcher:2] 2016-06-05 18:51:09,397 CassandraDaemon.java:222 - 
> Exception in thread Thread[HintsDispatcher:2,1,main]
> java.lang.UnsupportedOperationException: Hints are not seekable.
>   at org.apache.cassandra.hints.HintsReader.seek(HintsReader.java:114) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatcher.seek(HintsDispatcher.java:79) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:257)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
>  ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11960) Hints are not seekable

2016-06-16 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11960:
---
Status: Awaiting Feedback  (was: In Progress)

> Hints are not seekable
> --
>
> Key: CASSANDRA-11960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11960
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Stefan Podkowinski
>
> Got the following error message on trunk. No idea how to reproduce. But the 
> only thing the (not overridden) seek method does is throwing this exception.
> {code}
> ERROR [HintsDispatcher:2] 2016-06-05 18:51:09,397 CassandraDaemon.java:222 - 
> Exception in thread Thread[HintsDispatcher:2,1,main]
> java.lang.UnsupportedOperationException: Hints are not seekable.
>   at org.apache.cassandra.hints.HintsReader.seek(HintsReader.java:114) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatcher.seek(HintsDispatcher.java:79) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:257)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
>  ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11960) Hints are not seekable

2016-06-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333995#comment-15333995
 ] 

Stefan Podkowinski commented on CASSANDRA-11960:


I've now created a patch that moves away from file-offset-based retries and 
instead replays the whole page. As described above, the 
{{RebufferingInputStream}} data input doesn't provide a way to seek to an offset. 
Although this should be possible to implement, I think such changes should be 
considered more carefully, as they would have to be made in the common io.utils 
code. Maybe we should open a separate ticket for that?

Although replaying a complete page isn't optimal, as we'll deliver duplicate 
hints, we don't guarantee at-most-once semantics for hints anyway. This is not 
great for non-idempotent operations, such as list appends (counters are not 
hinted), but the current implementation is clearly broken, so we have to do 
something about it. I'm open to ideas on how to further optimize this.
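
As a rough illustration of the trade-off (a simplified, self-contained model, 
not the actual patch): retries are tracked per page instead of per byte offset, 
so a failure in the middle of a page means the whole page is replayed and the 
hints delivered before the failure go out a second time.

{code}
import java.util.Arrays;
import java.util.List;

public class PageReplaySketch
{
    /**
     * Simplified model of page-based retry (not the actual patch): dispatch hints page by
     * page, and on a mid-page failure resume from the start of that page, accepting that
     * already-delivered hints within the page are sent again.
     */
    static int dispatch(List<List<String>> pages, int failAtHintOfPage, int failingPage)
    {
        int delivered = 0;
        for (int p = 0; p < pages.size(); p++)
        {
            boolean retried = false;
            int sentInPage = 0;
            while (true)
            {
                sentInPage = 0;
                boolean failed = false;
                for (String hint : pages.get(p))
                {
                    if (!retried && p == failingPage && sentInPage == failAtHintOfPage)
                    {
                        failed = true;       // simulated transmission error mid-page
                        break;
                    }
                    delivered++;             // counts duplicates on the retry pass
                    sentInPage++;
                }
                if (!failed)
                    break;
                retried = true;              // replay the whole page from its boundary
            }
        }
        return delivered;
    }

    public static void main(String[] args)
    {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("h1", "h2", "h3"),
                Arrays.asList("h4", "h5", "h6"));

        // Failure after 2 hints of page 1: those 2 hints are delivered twice on replay.
        int delivered = dispatch(pages, 2, 1);
        System.out.println("hints stored: 6, deliveries made: " + delivered); // 8
    }
}
{code}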



> Hints are not seekable
> --
>
> Key: CASSANDRA-11960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11960
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Stefan Podkowinski
>
> Got the following error message on trunk. No idea how to reproduce. But the 
> only thing the (not overridden) seek method does is throwing this exception.
> {code}
> ERROR [HintsDispatcher:2] 2016-06-05 18:51:09,397 CassandraDaemon.java:222 - 
> Exception in thread Thread[HintsDispatcher:2,1,main]
> java.lang.UnsupportedOperationException: Hints are not seekable.
>   at org.apache.cassandra.hints.HintsReader.seek(HintsReader.java:114) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatcher.seek(HintsDispatcher.java:79) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:257)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
>  ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12016) Create MessagingService mocking classes

2016-06-16 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-12016:
---
Status: Awaiting Feedback  (was: In Progress)

> Create MessagingService mocking classes
> ---
>
> Key: CASSANDRA-12016
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12016
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>
> Interactions between clients and nodes in the cluster are taking place by 
> exchanging messages through the {{MessagingService}}. Black box testing for 
> message based systems is usually pretty easy, as we're just dealing with 
> messages in/out. My suggestion would be to add tests that make use of this 
> fact by mocking message exchanges via MessagingService. Given the right use 
> case, this would turn out to be a much simpler and more efficient alternative 
> for dtests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12016) Create MessagingService mocking classes

2016-06-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333973#comment-15333973
 ] 

Stefan Podkowinski commented on CASSANDRA-12016:


Please find the suggested implementation in the linked WIP branch. An example 
of what a unit test using those classes looks like can be found 
[here|https://github.com/spodkowinski/cassandra/blob/3cd4ef203cd147713a6f8c4b1466703436124e0b/test/unit/org/apache/cassandra/hints/HintsServiceTest.java].
 I'm looking forward to any feedback.


> Create MessagingService mocking classes
> ---
>
> Key: CASSANDRA-12016
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12016
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>
> Interactions between clients and nodes in the cluster are taking place by 
> exchanging messages through the {{MessagingService}}. Black box testing for 
> message based systems is usually pretty easy, as we're just dealing with 
> messages in/out. My suggestion would be to add tests that make use of this 
> fact by mocking message exchanges via MessagingService. Given the right use 
> case, this would turn out to be a much simpler and more efficient alternative 
> for dtests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12016) Create MessagingService mocking classes

2016-06-16 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-12016:
--

 Summary: Create MessagingService mocking classes
 Key: CASSANDRA-12016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12016
 Project: Cassandra
  Issue Type: New Feature
  Components: Testing
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Interactions between clients and nodes in the cluster take place by exchanging 
messages through the {{MessagingService}}. Black-box testing for message-based 
systems is usually pretty easy, as we're just dealing with messages in and 
messages out. My suggestion would be to add tests that make use of this fact by 
mocking message exchanges via MessagingService. Given the right use case, this 
would turn out to be a much simpler and more efficient alternative to dtests.
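
A minimal, self-contained sketch of the idea (hypothetical names, not the 
proposed API): the outbound path of a toy "messaging service" is replaced by 
registered handlers, so a test only deals with messages in and messages out and 
never opens a socket.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MessageMockingSketch
{
    // Minimal message model (hypothetical, not the proposed API)
    static class Message
    {
        final String verb;
        final String payload;
        Message(String verb, String payload) { this.verb = verb; this.payload = payload; }
    }

    /**
     * Toy "messaging service" whose outbound path is entirely replaced by registered
     * handlers: a test registers a canned reply per verb and the node-under-test sees
     * only messages in / messages out, never a real socket.
     */
    static class MockMessaging
    {
        private final Map<String, Function<Message, Message>> cannedReplies = new ConcurrentHashMap<>();

        void whenReceives(String verb, Function<Message, Message> reply)
        {
            cannedReplies.put(verb, reply);
        }

        Message sendWithResponse(Message out)
        {
            Function<Message, Message> reply = cannedReplies.get(out.verb);
            if (reply == null)
                throw new AssertionError("unexpected outgoing message: " + out.verb);
            return reply.apply(out);
        }
    }

    public static void main(String[] args)
    {
        MockMessaging messaging = new MockMessaging();
        // The "remote node" is just a function: acknowledge every HINT we send out.
        messaging.whenReceives("HINT", request -> new Message("REQUEST_RESPONSE", "ack:" + request.payload));

        Message response = messaging.sendWithResponse(new Message("HINT", "mutation-1"));
        System.out.println(response.verb + " " + response.payload);   // REQUEST_RESPONSE ack:mutation-1
    }
}
{code}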



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11960) Hints are not seekable

2016-06-13 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski reassigned CASSANDRA-11960:
--

Assignee: Stefan Podkowinski

> Hints are not seekable
> --
>
> Key: CASSANDRA-11960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11960
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Stefan Podkowinski
>
> Got the following error message on trunk. No idea how to reproduce. But the 
> only thing the (not overridden) seek method does is throwing this exception.
> {code}
> ERROR [HintsDispatcher:2] 2016-06-05 18:51:09,397 CassandraDaemon.java:222 - 
> Exception in thread Thread[HintsDispatcher:2,1,main]
> java.lang.UnsupportedOperationException: Hints are not seekable.
>   at org.apache.cassandra.hints.HintsReader.seek(HintsReader.java:114) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatcher.seek(HintsDispatcher.java:79) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:257)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
>  ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11886) Streaming will miss sections for early opened sstables during compaction

2016-06-13 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327384#comment-15327384
 ] 

Stefan Podkowinski commented on CASSANDRA-11886:


Thanks both of you for responding so quickly and fixing the issue!

> Streaming will miss sections for early opened sstables during compaction
> 
>
> Key: CASSANDRA-11886
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11886
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Marcus Eriksson
>Priority: Critical
>  Labels: correctness, repair, streaming
> Fix For: 2.1.15, 2.2.7, 3.8, 3.0.8
>
> Attachments: 9700-test-2_1.patch
>
>
> Once validation compaction has been finished, all mismatching sstable 
> sections for a token range will be used for streaming as return by 
> {{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
> restrict the sstable candidates by checking if they can be found in 
> {{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
> {{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
> non-canonical sstables as well. In case of early opened sstables this becomes 
> a problem, as the tree will be update with {{OpenReason.EARLY}} replacements 
> that cannot be found in canonical. But whenever 
> {{getSSTableSectionsForRanges}} will get a early instance from the view, it 
> will fail to retrieve the corresponding canonical version from the map, as 
> the different generation will cause a hashcode mismatch. Please find a test 
> attached.
> As a consequence not all sections for a range are streamed. In our case this 
> has caused deleted data to reappear, as sections holding tombstones were left 
> out due to this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7622) Implement virtual tables

2016-06-10 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324531#comment-15324531
 ] 

Stefan Podkowinski edited comment on CASSANDRA-7622 at 6/10/16 2:34 PM:


bq. Getting access to metrics in a read only, non JMX fashion would be awesome 
from an operational perspective and be 100% worth it by itself.

[~rustyrazorblade], as much as I share your aversion towards JMX, I'm not 
really sure many ops people would even notice this new feature as long as we 
don't pull the plug on JMX. A lot of existing monitoring solutions are based on 
JMX (which indirectly includes solutions on top of jolokia), and I don't expect 
much enthusiasm among vendors for adopting virtual tables instead. So JMX will 
be here to stay, which raises the question of whom we have in mind for this 
feature.

Just starting with a simplified, non-replicated, read-only version of virtual 
tables also raises some red flags for me. We should at least be able to answer 
how advanced use cases could be implemented on top of the current query 
execution model. If we can't, virtual tables are probably just a dead-end road 
for any further steps we want to take to improve operational aspects.



was (Author: spo...@gmail.com):
bq. Getting access to metrics in a read only, non JMX fashion would be awesome 
from an operational perspective and be 100% worth it by itself.

[~rustyrazorblade], as much as I share your aversion towards JMX, I'm not 
really sure a lot of ops people would even notice this new feature as long as 
we don't pull the plug on JMX. All existing monitoring solutions are based on 
JMX (which indirectly includes solutions on top of jolokia) and I don't expect 
a lot of enthusiasm among vendors to adopt virtual tables instead. So JMX will 
be here to stay, which begs the question who we have in mind for this feature?

Just starting with a simplified, non-replicated, read-only version of virtual 
tables is also raising some red flags for me. We should be able to at least 
answer how advanced use cases could be implemented based on the current query 
execution model. If we can't, virtual tables are probably just a dead end road 
for any further steps we want to take to improve operational aspects.


> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
> Fix For: 3.x
>
>
> There are a variety of reasons to want virtual tables, which would be any 
> table that would be backed by an API, rather than data explicitly managed and 
> stored as sstables.
> One possible use case would be to expose JMX data through CQL as a 
> resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml 
> configuration information. So it would be an alternate approach to 
> CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not 
> presupposing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2016-06-10 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324531#comment-15324531
 ] 

Stefan Podkowinski commented on CASSANDRA-7622:
---

bq. Getting access to metrics in a read only, non JMX fashion would be awesome 
from an operational perspective and be 100% worth it by itself.

[~rustyrazorblade], as much as I share your aversion towards JMX, I'm not 
really sure a lot of ops people would even notice this new feature as long as 
we don't pull the plug on JMX. All existing monitoring solutions are based on 
JMX (which indirectly includes solutions on top of jolokia) and I don't expect 
a lot of enthusiasm among vendors to adopt virtual tables instead. So JMX will 
be here to stay, which begs the question who we have in mind for this feature?

Just starting with a simplified, non-replicated, read-only version of virtual 
tables is also raising some red flags for me. We should be able to at least 
answer how advanced use cases could be implemented based on the current query 
execution model. If we can't, virtual tables are probably just a dead end road 
for any further steps we want to take to improve operational aspects.


> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
> Fix For: 3.x
>
>
> There are a variety of reasons to want virtual tables, which would be any 
> table that would be backed by an API, rather than data explicitly managed and 
> stored as sstables.
> One possible use case would be to expose JMX data through CQL as a 
> resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml 
> configuration information. So it would be an alternate approach to 
> CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not 
> presupposing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-06-10 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324477#comment-15324477
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


You're correct in pointing out that live columns can prevent fully normalizing 
all RTs using the RTL approach in patch v4. It will still be more accurate than 
without RTL consolidation, but the question is whether the additional complexity 
is worth it. If you'd be more comfortable going with the patch you initially 
suggested yourself, I'm confident that this will still be a big improvement.


> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 
> 11349-2.1-v4.patch, 11349-2.1.patch, 11349-2.2-v4.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there was some inconstencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11960) Hints are not seekable

2016-06-07 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318414#comment-15318414
 ] 

Stefan Podkowinski commented on CASSANDRA-11960:


This looks like it was caused by CASSANDRA-5863, where the 
{{RandomAccessReader}}-based input was replaced with 
{{RebufferingInputStream}}.

The way I understand the current hint dispatching implementation, we deliver 
hints from a file by sending individual {{HintsReader.Page}}s until finished. 
In case of an error while transmitting a page, {{HintsStore.markDispatchOffset}} 
is used to remember, as a long offset, where to resume delivering hints from. 
Since byte-offset-based seeking isn't supported anymore due to the mentioned 
changes, the exception is thrown.
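
For illustration, a small self-contained sketch of that flow (simplified 
stand-ins for the involved classes, not the actual Cassandra code): a failed page 
transmission records a byte offset to resume from, and the retry then calls 
{{seek()}} on a reader whose buffered input can no longer honour it.

{code}
import java.util.OptionalLong;

public class HintDispatchFlowSketch
{
    // Simplified stand-in for the hints file reader after CASSANDRA-5863:
    // the underlying buffered input no longer supports random access.
    static class Reader
    {
        void seek(long offset)
        {
            throw new UnsupportedOperationException("Hints are not seekable.");
        }

        long deliverPagesFrom(long startOffset)
        {
            // pretend we transmitted some pages and then failed at byte 4096
            return 4096;
        }
    }

    // Simplified stand-in for remembering the dispatch offset per hints file
    static class Store
    {
        private OptionalLong dispatchOffset = OptionalLong.empty();

        void markDispatchOffset(long offset) { dispatchOffset = OptionalLong.of(offset); }
        OptionalLong getDispatchOffset()     { return dispatchOffset; }
    }

    public static void main(String[] args)
    {
        Store store = new Store();

        // First attempt: transmission fails mid-file, remember where to resume.
        long failedAt = new Reader().deliverPagesFrom(0);
        store.markDispatchOffset(failedAt);

        // Second attempt: resuming requires seeking to the stored byte offset,
        // which the buffered reader can no longer do -- hence the exception in the log.
        Reader retry = new Reader();
        try
        {
            retry.seek(store.getDispatchOffset().getAsLong());
        }
        catch (UnsupportedOperationException e)
        {
            System.out.println("resume failed: " + e.getMessage());
        }
    }
}
{code}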

> Hints are not seekable
> --
>
> Key: CASSANDRA-11960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11960
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>
> Got the following error message on trunk. No idea how to reproduce. But the 
> only thing the (not overridden) seek method does is throwing this exception.
> {code}
> ERROR [HintsDispatcher:2] 2016-06-05 18:51:09,397 CassandraDaemon.java:222 - 
> Exception in thread Thread[HintsDispatcher:2,1,main]
> java.lang.UnsupportedOperationException: Hints are not seekable.
>   at org.apache.cassandra.hints.HintsReader.seek(HintsReader.java:114) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatcher.seek(HintsDispatcher.java:79) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:257)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
>  ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
>  ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-06-02 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312087#comment-15312087
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


I've now attached a patch for the last-mentioned implementation as 
{{11349-2.1-v4.patch}} and {{11349-2.2-v4.patch}} to the ticket.

Test results are as follows (the reported failures cannot be reproduced locally 
and appear to be unrelated):

||2.1||2.2||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11349-2.1]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11349-2.2]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.2-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.2-testall/]|

Anyone willing to take another look and actually commit a patch for this issue? 
I've pushed my WIP branch 
[here|https://github.com/spodkowinski/cassandra/commits/WIP2-11349] with 
individual commits that might help during the review.


> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 
> 11349-2.1-v4.patch, 11349-2.1.patch, 11349-2.2-v4.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there was some inconstencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-06-02 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11349:
---
Attachment: 11349-2.2-v4.patch
11349-2.1-v4.patch

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 
> 11349-2.1-v4.patch, 11349-2.1.patch, 11349-2.2-v4.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there was some inconstencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11000) Mixing LWT and non-LWT operations can result in an LWT operation being acknowledged but not applied

2016-06-01 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309949#comment-15309949
 ] 

Stefan Podkowinski commented on CASSANDRA-11000:


Technically it should be possible to prevent applying LWTs in case the local 
data has timestamps in the future. But since in most cases you're updating 
individual columns, you'd have to do this on a column-by-column basis. Also, I 
can't really see why the behavior in this respect should be different from the 
behavior of regular updates, and I'd assume that most users would be rather 
surprised by it.

We could probably make LWTs and SERIAL reads mandatory via a table property to 
prevent _unintentionally_ mixing LWT and non-LWT statements. But there are still 
valid use cases where you would not want to take away this option (mixing 
LWTs/non-LWTs), e.g. when strictly updating different columns or optimizing for 
performance. So I doubt this would be really useful, as users enabling such a 
table property would already be aware of the details around this topic anyway 
and wouldn't need any hand-holding by locking down functionality.



> Mixing LWT and non-LWT operations can result in an LWT operation being 
> acknowledged but not applied
> ---
>
> Key: CASSANDRA-11000
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11000
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
> Environment: Cassandra 2.1, 2.2, and 3.0 on Linux and OS X.
>Reporter: Sebastian Marsching
>
> When mixing light-weight transaction (LWT, a.k.a. compare-and-set, 
> conditional update) operations with regular operations, it can happen that an 
> LWT operation is acknowledged (applied = True), even though the update has 
> not been applied and a SELECT operation still returns the old data.
> For example, consider the following table:
> {code}
> CREATE TABLE test (
> pk text,
> ck text,
> v text,
> PRIMARY KEY (pk, ck)
> );
> {code}
> We start with an empty table and insert data using a regular (non-LWT) 
> operation:
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> A following SELECT statement returns the data as expected. Now we do a 
> conditional update (LWT):
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> As expected, the update is applied and a following SELECT statement shows the 
> updated value.
> Now we do the same but use a time stamp that is slightly in the future (e.g. 
> a few seconds) for the INSERT statement (obviously $time$ needs to be 
> replaced by a time stamp that is slightly ahead of the system clock).
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123') USING TIMESTAMP 
> $time$;
> {code}
> Now, running the same UPDATE statement still report success (applied = True). 
> However, a subsequent SELECT yields the old value ('123') instead of the 
> updated value ('456'). Inspecting the time stamp of the value indicates that 
> it has not been replaced (the value from the original INSERT is still in 
> place).
> This behavior is exhibited in an single-node cluster running Cassandra 
> 2.1.11, 2.2.4, and 3.0.1.
> Testing this for a multi-node cluster is a bit more tricky, so I only tested 
> it with Cassandra 2.2.4. Here, I made one of the nodes lack behind in time 
> for a few seconds (using libfaketime). I used a replication factor of three 
> for the test keyspace. In this case, the behavior can be demonstrated even 
> without using an explicitly specified time stamp. Running
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> on a node with the regular clock followed by
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> on the node lagging behind results in the UPDATE to report success, but the 
> old value still being used.
> Interestingly, everything works as expected if using LWT operations 
> consistently: When running
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> UPDATE test SET v = '123' WHERE pk = 'foo' AND ck = 'bar' IF v = '456';
> {code}
> in an alternating fashion on two nodes (one with a "normal" clock, one with 
> the clock lagging behind), the updates are applied as expected. When checking 
> the time stamps ("{{SELECT WRITETIME(v) FROM test;}}"), one can see that the 
> time stamp is increased by just a single tick when the statement is executed 
> on the node lagging behind.
> I think that this problem is strongly related to (or maybe even the same as) 
> the one described in CASSANDRA-7801, even though CASSANDRA-7801 was mainly 
> concerned about a single-node cluster. However, the fact that this problem 
> still exists in current 

[jira] [Commented] (CASSANDRA-11000) Mixing LWT and non-LWT operations can result in an LWT operation being acknowledged but not applied

2016-05-31 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307808#comment-15307808
 ] 

Stefan Podkowinski commented on CASSANDRA-11000:


The reason for using LWTs is to make all Cassandra nodes agree on a certain 
order in which statements are executed. Executing non-LWT statements in between 
causes unexpected results, as those statements are not part of the Paxos 
execution path and have non-deterministic effects based on the (wall-clock) time 
at which they are actually executed.
This is described in the DataStax documentation 
[here|https://docs.datastax.com/en/cassandra/2.2/cassandra/dml/dmlLtwtTransactions.html],
 including a clear warning not to mix both types of statements.

> Mixing LWT and non-LWT operations can result in an LWT operation being 
> acknowledged but not applied
> ---
>
> Key: CASSANDRA-11000
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11000
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
> Environment: Cassandra 2.1, 2.2, and 3.0 on Linux and OS X.
>Reporter: Sebastian Marsching
>
> When mixing light-weight transaction (LWT, a.k.a. compare-and-set, 
> conditional update) operations with regular operations, it can happen that an 
> LWT operation is acknowledged (applied = True), even though the update has 
> not been applied and a SELECT operation still returns the old data.
> For example, consider the following table:
> {code}
> CREATE TABLE test (
> pk text,
> ck text,
> v text,
> PRIMARY KEY (pk, ck)
> );
> {code}
> We start with an empty table and insert data using a regular (non-LWT) 
> operation:
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> A following SELECT statement returns the data as expected. Now we do a 
> conditional update (LWT):
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> As expected, the update is applied and a following SELECT statement shows the 
> updated value.
> Now we do the same but use a time stamp that is slightly in the future (e.g. 
> a few seconds) for the INSERT statement (obviously $time$ needs to be 
> replaced by a time stamp that is slightly ahead of the system clock).
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123') USING TIMESTAMP 
> $time$;
> {code}
> Now, running the same UPDATE statement still report success (applied = True). 
> However, a subsequent SELECT yields the old value ('123') instead of the 
> updated value ('456'). Inspecting the time stamp of the value indicates that 
> it has not been replaced (the value from the original INSERT is still in 
> place).
> This behavior is exhibited in an single-node cluster running Cassandra 
> 2.1.11, 2.2.4, and 3.0.1.
> Testing this for a multi-node cluster is a bit more tricky, so I only tested 
> it with Cassandra 2.2.4. Here, I made one of the nodes lack behind in time 
> for a few seconds (using libfaketime). I used a replication factor of three 
> for the test keyspace. In this case, the behavior can be demonstrated even 
> without using an explicitly specified time stamp. Running
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> on a node with the regular clock followed by
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> on the node lagging behind results in the UPDATE to report success, but the 
> old value still being used.
> Interestingly, everything works as expected if using LWT operations 
> consistently: When running
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> UPDATE test SET v = '123' WHERE pk = 'foo' AND ck = 'bar' IF v = '456';
> {code}
> in an alternating fashion on two nodes (one with a "normal" clock, one with 
> the clock lagging behind), the updates are applied as expected. When checking 
> the time stamps ("{{SELECT WRITETIME(v) FROM test;}}"), one can see that the 
> time stamp is increased by just a single tick when the statement is executed 
> on the node lagging behind.
> I think that this problem is strongly related to (or maybe even the same as) 
> the one described in CASSANDRA-7801, even though CASSANDRA-7801 was mainly 
> concerned about a single-node cluster. However, the fact that this problem 
> still exists in current versions of Cassandra makes me suspect that either it 
> is a different problem or the original problem was not fixed completely with 
> the patch from CASSANDRA-7801.
> I found CASSANDRA-9655 which suggest removing the changes introduced with 
> CASSANDRA-7801 because they can be problematic under certain circumstances, 
> but I am not sure whether this is the right place to discuss the issue I am 
> 

[jira] [Updated] (CASSANDRA-11886) Streaming will miss sections for early opened sstables during compaction

2016-05-24 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11886:
---
Description: 
Once validation compaction has been finished, all mismatching sstable sections 
for a token range will be used for streaming as return by 
{{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
restrict the sstable candidates by checking if they can be found in 
{{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
{{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
non-canonical sstables as well. In case of early opened sstables this becomes a 
problem, as the tree will be update with {{OpenReason.EARLY}} replacements that 
cannot be found in canonical. But whenever {{getSSTableSectionsForRanges}} will 
get a early instance from the view, it will fail to retrieve the corresponding 
canonical version from the map, as the different generation will cause a 
hashcode mismatch. Please find a test attached.

As a consequence not all sections for a range are streamed. In our case this 
has caused deleted data to reappear, as sections holding tombstones were left 
out due to this behavior.

  was:
Once validation compaction has been finished, all mismatching sstable sections 
for a token range will be used for streaming as return by 
{{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
restrict the sstable candidates by checking if they can be found in 
{{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
{{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
non-canonical sstables as well. In case of early opened sstables this becomes a 
problem, as the tree will be update with {{OpenReason.EARLY}} replacements that 
cannot be found in canonical. In this case {{getSSTableSectionsForRanges}} will 
get a early instance from the view, but fails to retrieve the corresponding 
canonical version from the map, as the different generation will cause a 
hashcode mismatch. Please find a test attached.

As a consequence not all sections for a range are streamed. In our case this 
has caused deleted data to reappear, as sections holding tombstones were left 
out due to this behavior.


> Streaming will miss sections for early opened sstables during compaction
> 
>
> Key: CASSANDRA-11886
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11886
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Priority: Critical
>  Labels: correctness, repair, streaming
> Attachments: 9700-test-2_1.patch
>
>
> Once validation compaction has been finished, all mismatching sstable 
> sections for a token range will be used for streaming as return by 
> {{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
> restrict the sstable candidates by checking if they can be found in 
> {{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
> {{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
> non-canonical sstables as well. In case of early opened sstables this becomes 
> a problem, as the tree will be update with {{OpenReason.EARLY}} replacements 
> that cannot be found in canonical. But whenever 
> {{getSSTableSectionsForRanges}} will get a early instance from the view, it 
> will fail to retrieve the corresponding canonical version from the map, as 
> the different generation will cause a hashcode mismatch. Please find a test 
> attached.
> As a consequence not all sections for a range are streamed. In our case this 
> has caused deleted data to reappear, as sections holding tombstones were left 
> out due to this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11886) Streaming will miss sections for early opened sstables during compaction

2016-05-24 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11886:
---
Description: 
Once validation compaction has been finished, all mismatching sstable sections 
for a token range will be used for streaming as return by 
{{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
restrict the sstable candidates by checking if they can be found in 
{{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
{{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
non-canonical sstables as well. In case of early opened sstables this becomes a 
problem, as the tree will be update with {{OpenReason.EARLY}} replacements that 
cannot be found in canonical. In this case {{getSSTableSectionsForRanges}} will 
get a early instance from the view, but fails to retrieve the corresponding 
canonical version from the map, as the different generation will cause a 
hashcode mismatch. Please find a test attached.

As a consequence not all sections for a range are streamed. In our case this 
has caused deleted data to reappear, as sections holding tombstones were left 
out due to this behavior.

  was:
Once validation compaction has been finished, all mismatching sstable sections 
for a token range will be used for streaming as return by 
{{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
restrict the sstable candidates by checking if they can be found in 
{{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
{{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
sstables as well, that are not necessarily in canonical. In case of early 
opened sstables this becomes a problem, as the tree will be update with 
{{OpenReason.EARLY}} replacements that cannot be found in canonical. In this 
case {{getSSTableSectionsForRanges}} will get a early instance from the view, 
but fails to retrieve the corresponding canonical version from the map, as the 
different generation will cause a hashcode mismatch. Please find a test 
attached.

As a consequence not all sections for a range are streamed. In our case this 
has caused deleted data to reappear, as sections holding tombstones were left 
out due to this behavior.


> Streaming will miss sections for early opened sstables during compaction
> 
>
> Key: CASSANDRA-11886
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11886
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Priority: Critical
>  Labels: correctness, repair, streaming
> Attachments: 9700-test-2_1.patch
>
>
> Once validation compaction has been finished, all mismatching sstable 
> sections for a token range will be used for streaming as return by 
> {{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
> restrict the sstable candidates by checking if they can be found in 
> {{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
> {{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
> non-canonical sstables as well. In case of early opened sstables this becomes 
> a problem, as the tree will be update with {{OpenReason.EARLY}} replacements 
> that cannot be found in canonical. In this case 
> {{getSSTableSectionsForRanges}} will get a early instance from the view, but 
> fails to retrieve the corresponding canonical version from the map, as the 
> different generation will cause a hashcode mismatch. Please find a test 
> attached.
> As a consequence not all sections for a range are streamed. In our case this 
> has caused deleted data to reappear, as sections holding tombstones were left 
> out due to this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11886) Streaming will miss sections for early opened sstables during compaction

2016-05-24 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-11886:
--

 Summary: Streaming will miss sections for early opened sstables 
during compaction
 Key: CASSANDRA-11886
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11886
 Project: Cassandra
  Issue Type: Bug
Reporter: Stefan Podkowinski
Priority: Critical
 Attachments: 9700-test-2_1.patch

Once validation compaction has been finished, all mismatching sstable sections 
for a token range will be used for streaming as return by 
{{StreamSession.getSSTableSectionsForRanges}}. Currently 2.1 will try to 
restrict the sstable candidates by checking if they can be found in 
{{CANONICAL_SSTABLES}} and will ignore them otherwise. At the same time 
{{IntervalTree}} in the {{DataTracker}} will be build based on replaced 
sstables as well, that are not necessarily in canonical. In case of early 
opened sstables this becomes a problem, as the tree will be update with 
{{OpenReason.EARLY}} replacements that cannot be found in canonical. In this 
case {{getSSTableSectionsForRanges}} will get a early instance from the view, 
but fails to retrieve the corresponding canonical version from the map, as the 
different generation will cause a hashcode mismatch. Please find a test 
attached.

As a consequence not all sections for a range are streamed. In our case this 
has caused deleted data to reappear, as sections holding tombstones were left 
out due to this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-20 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293719#comment-15293719
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


I've now created a new patch version 
[here|https://github.com/spodkowinski/cassandra/commit/c8601f8cd3921e754bcbe8c9362cf3d2e7072e1e]
 that basically combines both of your ideas: doing the digest updates in the 
serializer and using {{RangeTombstoneList}} to normalize RT intervals. Tests 
look good; feel free to add your own. [~blambov], can you think of any further 
cases that would not be covered by this approach? 
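
For what it's worth, here is a tiny self-contained sketch of the normalization 
idea (toy types, not the actual patch or the {{RangeTombstoneList}} API): 
identical intervals are collapsed to their newest deletion before being fed into 
the digest, so a node that still carries two tombstones for the same interval 
hashes the same as one where compaction already merged them.

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class RangeTombstoneDigestSketch
{
    // Toy range tombstone: an interval plus its deletion timestamp
    static class RT
    {
        final String start, end;
        final long timestamp;
        RT(String start, String end, long timestamp) { this.start = start; this.end = end; this.timestamp = timestamp; }
    }

    /**
     * Normalize before digesting (simplified idea, not the actual patch): for each
     * distinct interval keep only the newest deletion, so a partition carrying two
     * identical tombstones digests the same as one where compaction already merged them.
     */
    static void updateDigest(MessageDigest digest, RT... tombstones)
    {
        Map<String, RT> normalized = new TreeMap<>();
        for (RT rt : tombstones)
        {
            String interval = rt.start + ":" + rt.end;
            RT existing = normalized.get(interval);
            if (existing == null || existing.timestamp < rt.timestamp)
                normalized.put(interval, rt);
        }
        for (RT rt : normalized.values())
            digest.update((rt.start + rt.end + rt.timestamp).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception
    {
        MessageDigest nodeA = MessageDigest.getInstance("MD5");
        MessageDigest nodeB = MessageDigest.getInstance("MD5");

        // Node A still has both tombstones on disk, node B already compacted them into one.
        updateDigest(nodeA, new RT("b", "b", 10L), new RT("b", "b", 20L));
        updateDigest(nodeB, new RT("b", "b", 20L));

        System.out.println(MessageDigest.isEqual(nodeA.digest(), nodeB.digest()));   // true
    }
}
{code}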

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-20 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291211#comment-15291211
 ] 

Stefan Podkowinski edited comment on CASSANDRA-11349 at 5/20/16 8:26 AM:
-

I've been debugging the most recently mentioned error case using the following 
cql/ccm statements and a local 2-node cluster.

{code}
create keyspace ks WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 2};
use ks;
CREATE TABLE IF NOT EXISTS table1 ( c1 text, c2 text, c3 text, c4 float,
 PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 
'false'};
DELETE FROM table1 USING TIMESTAMP 1463656272791 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'c';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272792 WHERE c1 = 'a' AND c2 = 'b';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272793 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'd';
ccm node1 flush
{code}

Timestamps have been added for easier tracking of the specific tombstone in the 
debugger.

{{ColumnIndex.Builder.buildForCompaction()}} will add tombstones to the tracker 
in the following order:

*Node1*

{{1463656272792: c1 = 'a' AND c2 = 'b'}}
First RT, added to unwritten + opened tombstones

{{1463656272791: c1 = 'a' AND c2 = 'b' AND c3 = 'c'}}
Overshadowed by the previously added RT while also being older. Will not be 
added and is simply ignored.

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}}
Overshadowed by the first and only RT added to opened so far, but newer, and 
will thus be added to unwritten + opened.

We end up with 2 unwritten tombstones (..92 + ..93) passed to the serializer for 
the message digest.


*Node2*

{{1463656272792: c1 = 'a' AND c2 = 'b'}} (EOC.START)
First RT, added to unwritten + opened tombstones

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} (EOC.END)
Comparison of the EOC flag (Tracker:251) of the previously added RT will cause 
it to be removed from the opened list (Tracker:258). Afterwards the current RT 
will be added to unwritten + opened.

{{1463656272792: c1 = 'a' AND c2 = 'b'}} ({color:red}again!{color})
Gets compared with the previously added RT, which supersedes the current one and 
thus stays in the list. The current RT will again be added to the unwritten + 
opened lists.

We end up with 3 unwritten RTs, including 1463656272792 twice.

-I still haven't been able to exactly pinpoint why the reducer will be called 
twice with the same TS, but since [~blambov] explicitly mentioned that 
possibility, I guess it's intended behavior (but why? :)).-

Running sstable2json makes it more obvious how node2 flushes the RTs:

{noformat}
[
{"key": "a",
 "cells": [["b:_","b:d:_",1463656272792,"t",1463731877],
   ["b:d:_","b:d:!",1463656272793,"t",1463731886],
   ["b:d:!","b:!",1463656272792,"t",1463731877]]}
]
{noformat}
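
The effect on the digest can be reproduced outside of Cassandra with a few lines 
(a toy example that simply hashes the flattened cell strings from the 
sstable2json output above; the exact serialization format doesn't matter for the 
point being made):

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class DuplicateRtDigest
{
    static String md5(String... cells) throws Exception
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String cell : cells)
            md.update(cell.getBytes(StandardCharsets.UTF_8));
        return Arrays.toString(md.digest());
    }

    public static void main(String[] args) throws Exception
    {
        // node1: the ..92 tombstone is hashed once
        String node1 = md5("b:_->b:!@1463656272792",
                           "b:d:_->b:d:!@1463656272793");
        // node2: logically the same deletes, but ..92 is split around ..93 and hashed twice
        String node2 = md5("b:_->b:d:_@1463656272792",
                           "b:d:_->b:d:!@1463656272793",
                           "b:d:!->b:!@1463656272792");
        System.out.println(node1.equals(node2)); // false -> MerkleTree mismatch despite identical data
    }
}
{code}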


was (Author: spo...@gmail.com):
I've been debuging the latest mentioned error case using the following cql/ccm 
statements and a local 2 node cluster.

{code}
create keyspace ks WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 2};
use ks;
CREATE TABLE IF NOT EXISTS table1 ( c1 text, c2 text, c3 text, c4 float,
 PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 
'false'};
DELETE FROM table1 USING TIMESTAMP 1463656272791 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'c';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272792 WHERE c1 = 'a' AND c2 = 'b';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272793 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'd';
ccm node1 flush
{code}

Timestamps have been added for easier tracking of the specific tombstone in the 
debugger.

ColmnIndex.Builder.buildForCompaction() will add tombstones in the following 
order to the tracker:

*Node1*

{{1463656272792: c1 = 'a' AND c2 = 'b'}}
First RT, added to unwritten + opened tombstones

{{1463656272791: c1 = 'a' AND c2 = 'b' AND c3 = 'c'}}
Overshadowed by RT added before while being older at the same time. Will not be 
added and simply ignored.

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}}
Overshaded by first and only RT added to opened so far, but newer and will thus 
be added to unwritten+opened

We end up with 2 unwritten tombstones (..92+..93) passed to the serializer for 
message digest.


*Node2*

{{1463656272792: c1 = 'a' AND c2 = 'b'}} (EOC.START)
First RT, added to unwritten + opened tombstones

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} (EOC.END)
comparision of EOC flag (Tracker:251) of previously added RT will cause having 
it removed from the opened list (Tracker:258). Afterwards the current RT will 
be added to unwritten + opened.

{{1463656272792: c1 = 'a' AND c2 = 'b'}} ({color:red}again!{color})
Gets compared with prev. added RT, which supersedes the current one and thus 
stays in the list. Will again be added to unwritten + opened list.

We end up with 3 unwritten RTs, 

[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-19 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291211#comment-15291211
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


I've been debugging the most recently mentioned error case using the following 
cql/ccm statements and a local 2-node cluster.

{code}
create keyspace ks WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 2};
use ks;
CREATE TABLE IF NOT EXISTS table1 ( c1 text, c2 text, c3 text, c4 float,
 PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 
'false'};
DELETE FROM table1 USING TIMESTAMP 1463656272791 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'c';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272792 WHERE c1 = 'a' AND c2 = 'b';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272793 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'd';
ccm node1 flush
{code}

Timestamps have been added for easier tracking of the specific tombstone in the 
debugger.

{{ColumnIndex.Builder.buildForCompaction()}} will add tombstones to the tracker 
in the following order:

*Node1*

{{1463656272792: c1 = 'a' AND c2 = 'b'}}
First RT, added to unwritten + opened tombstones

{{1463656272791: c1 = 'a' AND c2 = 'b' AND c3 = 'c'}}
Overshadowed by the previously added RT while also being older. Will not be 
added and is simply ignored.

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}}
Overshadowed by the first and only RT added to opened so far, but newer, and 
will thus be added to unwritten + opened.

We end up with 2 unwritten tombstones (..92 + ..93) passed to the serializer for 
the message digest.


*Node2*

{{1463656272792: c1 = 'a' AND c2 = 'b'}} (EOC.START)
First RT, added to unwritten + opened tombstones

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} (EOC.END)
Comparison of the EOC flag (Tracker:251) of the previously added RT will cause 
it to be removed from the opened list (Tracker:258). Afterwards the current RT 
will be added to unwritten + opened.

{{1463656272792: c1 = 'a' AND c2 = 'b'}} ({color:red}again!{color})
Gets compared with the previously added RT, which supersedes the current one and 
thus stays in the list. The current RT will again be added to the unwritten + 
opened lists.

We end up with 3 unwritten RTs, including 1463656272792 twice.

I still haven't been able to exactly pinpoint why the reducer will be called 
twice with the same TS, but since [~blambov] explicitly mentioned that 
possibility, I guess it's intended behavior (but why? :)). 

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences 

[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-11 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280088#comment-15280088
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


Thanks for the clarification. It's really helpful to understand how those parts 
are supposed to work together. 

The serializer approach seems to be a good way to handle this, but there 
are still 
[cases|https://github.com/spodkowinski/cassandra-dtest/blob/b110685bceddbcb63ebc744ba54a25cb268f2478/repair_tests/repair_test.py#L438:L451]
 \[1\] that are not handled correctly. I'm going to take a closer look to understand 
why. I'd also like to do some more testing for potential digest mismatch storms 
during rolling upgrades, but I wouldn't expect any blockers so far. 

\[1\] nosetests 
repair_tests/repair_test.py:TestRepair.shadowed_range_tombstone_digest_parallel_repair_test


> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-10 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277977#comment-15277977
 ] 

Stefan Podkowinski edited comment on CASSANDRA-11349 at 5/10/16 11:44 AM:
--

To quickly sum up the current behavior: {{ColumnIndex.Builder}} is created for 
each {{LazilyCompactedRow.update()}} call. The builder will iterate through all 
atoms produced by the {{MergeIterator}} and uses a {{RangeTombstone.Tracker}} 
instance for tombstone normalization. Tombstones will be added to the tracker 
from {{Builder.add()}} and by {{LCR.Reducer.getReduced()}}, which in turn will 
be called once for all atoms for the same column as considered by 
{{onDiskAtomComparator}}. 

[~blambov], so what you're saying is that we can't be sure that the 
{{MergeIterator}} will always be able to provide deterministically ordered 
values, as write order may be different, and we therefore cannot simply iterate 
through the reducer to create a correct digest. 

What I'm a bit concerned about while trying to understand Branimir's approach 
is that at some point {{getReduced()}} will add the RT to the tracker, while in 
another scenario the RT will be added later and will cause the serializer to be 
called differently as well. Or to put it in other words, if we can't be sure 
about the reducer returning deterministically ordered values, won't this affect 
the tracker and digest calculation in the builder as well?


was (Author: spo...@gmail.com):
To quickly sum up the current behavior.. {{ColumnIndex.Builder}} is created for 
each {{LazilyCompactedRow.update()}} call. The builder will iterate through all 
atoms produced by the {{MergeIterator}} and uses a {{RangeTombstone.Tracker}} 
instance for tombstone normalization. Tombstones will be added to the tracker 
from {{Builder.add()}} and by {{LCR.Reducer.getReduced()}}, which in turn will 
be called once for all atoms for the same column as considered by 
{{onDiskAtomComparator}}. 

[~blambov], so what you're saying is that we can't be sure that the 
{{MergeIterator}} will always be able to provide deterministic ordered values, 
as write order may be different and we therefor cannot simply iterate through 
the reducer to create a correct digest. 

What I'm a bit concerned about while trying to understand Branimir's approach 
is that at some point {{getReduced()}} will add the RT to the tracker while in 
another scenario the RT will be added later and will cause the serializer be 
called differently as well. Or to put this in other words, if we can't be sure 
about the reducer returning deterministic ordered values, won't this effect the 
tracker and digest calculation in the builder as well?

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> 

[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-10 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277977#comment-15277977
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


To quickly sum up the current behavior: {{ColumnIndex.Builder}} is created for 
each {{LazilyCompactedRow.update()}} call. The builder will iterate through all 
atoms produced by the {{MergeIterator}} and uses a {{RangeTombstone.Tracker}} 
instance for tombstone normalization. Tombstones will be added to the tracker 
from {{Builder.add()}} and by {{LCR.Reducer.getReduced()}}, which in turn will 
be called once for all atoms for the same column as considered by 
{{onDiskAtomComparator}}. 

[~blambov], so what you're saying is that we can't be sure that the 
{{MergeIterator}} will always be able to provide deterministically ordered 
values, as write order may be different, and we therefore cannot simply iterate 
through the reducer to create a correct digest. 

What I'm a bit concerned about while trying to understand Branimir's approach 
is that at some point {{getReduced()}} will add the RT to the tracker, while in 
another scenario the RT will be added later and will cause the serializer to be 
called differently as well. Or to put it in other words, if we can't be sure 
about the reducer returning deterministically ordered values, won't this affect 
the tracker and digest calculation in the builder as well?

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-03 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268627#comment-15268627
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


Sounds reasonable, and I agree that the code changes should be as unintrusive as 
possible. We're talking about 2.x, so we should avoid heavy refactoring. The 
mentioned class design could possibly still be improved, but that depends on 
where to go from here.

To wrap up the available patch options:

1) {{11349-2.1.patch}} with 2 changed lines would address the issue initially 
described
2) {{11349-2.1-v3.patch}} introduces a few more changes but will also create 
correct digests for shadowed range tombstones and cells (see 
[dtest|https://github.com/spodkowinski/cassandra-dtest/blob/CASSANDRA-11349/repair_tests/repair_test.py#L425])

Any opinions on this other than Fabien's and mine? It would be good to get 
some feedback from someone who would actually be willing to commit something 
like this.

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-02 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266474#comment-15266474
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


I'm not sure introducing a new tracker interface is the best way to handle 
this. It took me a while to actually figure out the differences between the 
{{update}} implementations in both trackers, since for the most part they share 
the same copied code. It would probably be better to have {{ValidationTracker}} 
subclass {{RegularCompactionTracker}} and implement 
{{remove/addUnwrittenTombstone}} as empty methods for validation.

The {{addRangeTombstone}} semantics also look like a case of leaky abstractions 
to me. It adds nothing at all for regular compaction, but serves as an early 
exit path for validation. 

Good news is that the dtests and unit tests seem to pass with the patch. :)
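
Roughly the structure I have in mind (a sketch only; {{RangeTombstone}} is a 
stand-in for the real type and the method signatures are made up for the 
example):

{code}
// rough structural sketch only
class RangeTombstone {}

class RegularCompactionTracker
{
    void addUnwrittenTombstone(RangeTombstone rt)
    {
        // regular compaction keeps track of tombstones that still need to be written out
    }

    void removeUnwrittenTombstone(RangeTombstone rt)
    {
        // ...and drops them again once they have been serialized
    }
}

// validation shares the common tracking code but never writes anything,
// so the unwritten-tombstone bookkeeping simply becomes a no-op
class ValidationTracker extends RegularCompactionTracker
{
    @Override
    void addUnwrittenTombstone(RangeTombstone rt) {}

    @Override
    void removeUnwrittenTombstone(RangeTombstone rt) {}
}
{code}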

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11684) Cleanup key ranges during compaction

2016-04-29 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11684:
---
Description: 
Currently cleanup is considered an optional, manual operation that users are 
told to run to free disk space after a node was affected by topology changes. 
However, unmanaged key ranges could also end up on a node in other ways, 
e.g. sstable files manually added by an admin. 

I'm also not sure unmanaged data is really that harmless and cleanup should 
really be optional, if you don't need to reclaim the disk space. When it comes 
to repairs, users are expected to purge a node after downtime in case it was 
not fully covered by a repair within gc_grace afterwards, in order to avoid 
re-introducing deleted data. But the same could happen with unmanaged data, 
e.g. after topology changes activate unmanaged ranges again or after restoring 
backups.

I'd therefore suggest to skip rewriting key ranges that no longer belong to a 
node and are older than gc_grace during compactions. 

Maybe we could also introduce another CLEANUP_COMPACTION operation to find 
candidates based on SSTable.first/last in case we don't have pending regular or 
tombstone compactions.

  was:
Currently cleanup is considered an optional, manual operation that users are 
told to run to free disk space after a node was affected by topology changes. 
However, unmanaged key ranges could also end up on a node through other ways, 
e.g. manual added sstable files by an admin or over streaming during repairs. 

I'm also not sure unmanaged data is really that harmless and cleanup should 
really be optional, if you don't need to reclaim the disk space. When it comes 
to repairs, users are expected to purge a node after downtime in case it was 
not fully covered by a repair within gc_grace afterwards, in order to avoid 
re-introducing deleted data. But the same could happen with unmanaged data, 
e.g. after topology changes activate unmanaged ranges again or after restoring 
backups.

I'd therefor suggest to avoid rewriting key ranges no longer belonging to a 
node and older than gc_grace during compactions. 

Maybe we could also introduce another CLEANUP_COMPACTION operation to find 
candidates based on SSTable.first/last in case we don't have pending regular or 
tombstone compactions.


> Cleanup key ranges during compaction
> 
>
> Key: CASSANDRA-11684
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11684
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>
> Currently cleanup is considered an optional, manual operation that users are 
> told to run to free disk space after a node was affected by topology changes. 
> However, unmanaged key ranges could also end up on a node in other ways, 
> e.g. sstable files manually added by an admin. 
> I'm also not sure unmanaged data is really that harmless and cleanup should 
> really be optional, if you don't need to reclaim the disk space. When it 
> comes to repairs, users are expected to purge a node after downtime in case 
> it was not fully covered by a repair within gc_grace afterwards, in order to 
> avoid re-introducing deleted data. But the same could happen with unmanaged 
> data, e.g. after topology changes activate unmanaged ranges again or after 
> restoring backups.
> I'd therefore suggest to skip rewriting key ranges that no longer belong to a 
> node and are older than gc_grace during compactions. 
> Maybe we could also introduce another CLEANUP_COMPACTION operation to find 
> candidates based on SSTable.first/last in case we don't have pending regular 
> or tombstone compactions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11684) Cleanup key ranges during compaction

2016-04-29 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-11684:
--

 Summary: Cleanup key ranges during compaction
 Key: CASSANDRA-11684
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11684
 Project: Cassandra
  Issue Type: Improvement
  Components: Compaction
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Currently cleanup is considered an optional, manual operation that users are 
told to run to free disk space after a node was affected by topology changes. 
However, unmanaged key ranges could also end up on a node in other ways, 
e.g. sstable files manually added by an admin or streamed in during repairs. 

I'm also not sure unmanaged data is really that harmless and cleanup should 
really be optional, if you don't need to reclaim the disk space. When it comes 
to repairs, users are expected to purge a node after downtime in case it was 
not fully covered by a repair within gc_grace afterwards, in order to avoid 
re-introducing deleted data. But the same could happen with unmanaged data, 
e.g. after topology changes activate unmanaged ranges again or after restoring 
backups.

I'd therefore suggest to skip rewriting key ranges that no longer belong to a 
node and are older than gc_grace during compactions. 

Maybe we could also introduce another CLEANUP_COMPACTION operation to find 
candidates based on SSTable.first/last in case we don't have pending regular or 
tombstone compactions.
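
A very rough sketch of the compaction-time check this would boil down to (all 
types are simplified stand-ins; how the owned ranges and the gc_grace cutoff 
would be passed to the compaction is left open):

{code}
import java.util.Collection;

public class CleanupDuringCompactionSketch
{
    // simplified stand-ins for token ranges and partitions
    interface TokenRange { boolean contains(long token); }
    interface Partition  { long token(); long maxTimestamp(); }

    /**
     * Decide whether a partition should be rewritten by the compaction.
     * Partitions outside the locally owned ranges are only dropped once they
     * are older than gc_grace, mirroring the rules applied to tombstones.
     * gcBefore is expected in the same unit as Partition.maxTimestamp().
     */
    static boolean shouldRewrite(Partition p, Collection<TokenRange> ownedRanges, long gcBefore)
    {
        for (TokenRange range : ownedRanges)
            if (range.contains(p.token()))
                return true;                 // locally owned -> always keep

        return p.maxTimestamp() >= gcBefore; // unowned but still within gc_grace -> keep for now
    }
}
{code}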



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254030#comment-15254030
 ] 

Stefan Podkowinski commented on CASSANDRA-11547:


bq. With this patch, we're not trying catch things at the smallest size, a la 
jHiccup, but really just want to catch things after large enough time distances.

Is this how clock drift actually happens? I was assuming that clocks between 
systems drift apart slowly over time, instead of jumping seconds forward or 
back.


> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.
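
For illustration, a minimal version of the checker described above (interval, 
offset and the way findings are reported are assumptions):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ClockDriftChecker
{
    private static final long INTERVAL_MS = 10_000; // n
    private static final long OFFSET_MS = 500;      // allowance for scheduling delays, GC pauses, etc.

    public static void main(String[] args)
    {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        final long[] last = { System.currentTimeMillis() };

        scheduler.scheduleAtFixedRate(() -> {
            long now = System.currentTimeMillis();
            long elapsed = now - last[0];
            if (elapsed < 0)
                System.err.println("clock jumped backward by " + (-elapsed) + "ms");
            else if (elapsed < INTERVAL_MS)
                System.err.println("clock running slow or moved backward: only " + elapsed + "ms elapsed");
            else if (elapsed > INTERVAL_MS + OFFSET_MS)
                System.err.println("clock jumped forward by " + (elapsed - INTERVAL_MS) + "ms");
            last[0] = now;
        }, INTERVAL_MS, INTERVAL_MS, TimeUnit.MILLISECONDS);
    }
}
{code}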



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgeable tombstones

2016-04-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253550#comment-15253550
 ] 

Stefan Podkowinski commented on CASSANDRA-11427:


bq. this only concerns partition and range tombstones, so this is imo a fairly 
minor efficiency problem, not a correction issue.

Although the performance overhead implied by this bug should be fairly low in 
most cases, it leaves people confused about why these read repairs happen in the 
first place. If you have monitoring in place telling you the system constantly 
has to read-repair data, you will want to investigate. That's what I did.

bq. So that I wonder if it's worth taking any risk just for 2.2. 

Any tests I can add to reduce the risks of possible side effects?


> Range slice queries CL > ONE trigger read-repair of purgeable tombstones
> 
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch, 11427-2.2_v2.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that already compacted the given tombstones. Clusters with periodic jobs for 
> scanning data ranges will likely see tombstones resurrected through RRs, just 
> to have them compacted again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgeable tombstone(s) that already have been compacted on the other 
> nodes. In this case a read-repair will be triggered for transferring the 
> purgeable tombstones to all other nodes that returned an empty result.
> The issue can be reproduced with the provided dtest or manually using the 
> following steps:
> {noformat}
> create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 2 };
> use test1;
> create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
> dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;
> delete from test1 where a = 'a';
> {noformat}
> {noformat}
> ccm flush;
> ccm node2 compact;
> {noformat}
> {noformat}
> use test1;
> consistency all;
> tracing on;
> select * from test1;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253514#comment-15253514
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


Can we keep the conversation going to get this patch into the next 2.x release?

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10756) Timeout failures in NativeTransportService.testConcurrentDestroys unit test

2016-04-13 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238871#comment-15238871
 ] 

Stefan Podkowinski commented on CASSANDRA-10756:


That will only make sure {{Server.stop}} is called once, but doesn't guarantee 
that it completes execution before concurrent threads enter destroy again. 

{noformat}
Thread1 -> destroy() -> stop() --[paused until after shutdown--]--> 
connectionTracker.closeAll()
Thread2 -> destroy() -> stop() -> executor.shutdown()
{noformat}

There's also a concurrency issue between threads pausing after entering 
{{stop}} and concurrent threads overwriting {{servers = 
Collections.emptyList();}}, which prevents close from being called on the servers.

I still think the best way to address this is with the proposed patch.
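
To illustrate the race with a stripped-down model (not the actual 
{{NativeTransportService}} code; the sleep stands in for 
{{connectionTracker.closeAll()}}):

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class DestroyRaceSketch
{
    static final AtomicBoolean stopCalled = new AtomicBoolean(false);

    // the guard ensures stop() runs once, but a second caller returns immediately,
    // without waiting for the first caller to finish closing connections
    static void stop() throws InterruptedException
    {
        if (stopCalled.compareAndSet(false, true))
        {
            TimeUnit.MILLISECONDS.sleep(200); // stands in for connectionTracker.closeAll()
            System.out.println("connections closed");
        }
    }

    static void destroy() throws InterruptedException
    {
        stop();
        System.out.println(Thread.currentThread().getName() + " shuts down executor");
    }

    public static void main(String[] args) throws Exception
    {
        Thread t1 = new Thread(() -> { try { destroy(); } catch (InterruptedException e) { } }, "t1");
        Thread t2 = new Thread(() -> { try { destroy(); } catch (InterruptedException e) { } }, "t2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        // typical output: "t2 shuts down executor" appears before "connections closed",
        // i.e. the executor can already be gone while t1 is still closing connections
    }
}
{code}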



> Timeout failures in NativeTransportService.testConcurrentDestroys unit test
> ---
>
> Key: CASSANDRA-10756
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10756
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Alex Petrov
>
> History of test on trunk 
> [here|http://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/NativeTransportServiceTest/testConcurrentDestroys/history/].
> I've seen these failures across 3.0/trunk for a while. I ran the test looping 
> locally for a while and the timeout is fairly easy to reproduce. The timeout 
> appears to be an indefinite hang and not a timing issue.
> When the timeout occurs, the following stack trace is at the end of the logs 
> for the unit test.
> {code}
> ERROR [ForkJoinPool.commonPool-worker-1] 2015-11-22 21:30:53,635 Failed to 
> submit a listener notification task. Event loop shut down?
> java.util.concurrent.RejectedExecutionException: event executor terminated
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:745)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:322)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:728)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.execute(DefaultPromise.java:671) 
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:641)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138) 
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroupFuture.<init>(DefaultChannelGroupFuture.java:116)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroup.close(DefaultChannelGroup.java:275)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroup.close(DefaultChannelGroup.java:167)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> org.apache.cassandra.transport.Server$ConnectionTracker.closeAll(Server.java:277)
>  [main/:na]
>   at org.apache.cassandra.transport.Server.close(Server.java:180) 
> [main/:na]
>   at org.apache.cassandra.transport.Server.stop(Server.java:116) 
> [main/:na]
>   at java.util.Collections$SingletonSet.forEach(Collections.java:4767) 
> ~[na:1.8.0_60]
>   at 
> org.apache.cassandra.service.NativeTransportService.stop(NativeTransportService.java:136)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.NativeTransportService.destroy(NativeTransportService.java:144)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.NativeTransportServiceTest.lambda$withService$102(NativeTransportServiceTest.java:201)
>  ~[classes/:na]
>   at java.util.stream.IntPipeline$3$1.accept(IntPipeline.java:233) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>  ~[na:1.8.0_60]
>   at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) 
> ~[na:1.8.0_60]
>   at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) 
> ~[na:1.8.0_60]
>   at 

[jira] [Commented] (CASSANDRA-11097) Idle session timeout for secure environments

2016-04-11 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235982#comment-15235982
 ] 

Stefan Podkowinski commented on CASSANDRA-11097:


Another option would be to address this at the native transport protocol level by 
extending the specification to a) allow clients to indicate whether the session is 
interactive/non-interactive and b) ask interactive clients to re-authenticate 
through the established connection after an inactivity timeout. 


> Idle session timeout for secure environments
> 
>
> Key: CASSANDRA-11097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11097
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Priority: Minor
>  Labels: lhf, ponies
>
> A thread on the user list pointed out that some use cases may prefer to have 
> a database disconnect sessions after some idle timeout. An example would be 
> an administrator who connected via ssh+cqlsh and then walked away. 
> Disconnecting that user and forcing it to re-authenticate could protect 
> against unauthorized access.
> It seems like it may be possible to do this using a netty 
> {{IdleStateHandler}} in a way that's low risk and perhaps off by default.  
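
For reference, a rough sketch of what the netty side of this could look like 
(assuming netty 4.x; handler names other than {{IdleStateHandler}} are made up 
and the timeout is arbitrary):

{code}
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

public class IdleSessionSketch
{
    // closes a connection once no reads or writes have happened for the configured time
    static class IdleDisconnectHandler extends ChannelInboundHandlerAdapter
    {
        @Override
        public void userEventTriggered(ChannelHandlerContext ctx, Object evt)
        {
            if (evt instanceof IdleStateEvent)
                ctx.close(); // force the client to reconnect and re-authenticate
            else
                ctx.fireUserEventTriggered(evt);
        }
    }

    static void configure(ChannelPipeline pipeline, int idleTimeoutSeconds)
    {
        pipeline.addLast(new IdleStateHandler(0, 0, idleTimeoutSeconds));
        pipeline.addLast(new IdleDisconnectHandler());
    }
}
{code}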



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11097) Idle session timeout for secure environments

2016-04-11 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235074#comment-15235074
 ] 

Stefan Podkowinski commented on CASSANDRA-11097:


This is probably better handled by the client, so we could enable the timeout 
specifically e.g. for cqlsh users but not for native transport connections 
from services. Also see 
[JAVA-204|https://datastax-oss.atlassian.net/browse/JAVA-204].

> Idle session timeout for secure environments
> 
>
> Key: CASSANDRA-11097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11097
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Priority: Minor
>  Labels: lhf, ponies
>
> A thread on the user list pointed out that some use cases may prefer to have 
> a database disconnect sessions after some idle timeout. An example would be 
> an administrator who connected via ssh+cqlsh and then walked away. 
> Disconnecting that user and forcing it to re-authenticate could protect 
> against unauthorized access.
> It seems like it may be possible to do this using a netty 
> {{IdleStateHandler}} in a way that's low risk and perhaps off by default.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgeable tombstones

2016-04-08 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232007#comment-15232007
 ] 

Stefan Podkowinski commented on CASSANDRA-11427:


I've now created {{11427-2.2_v2.patch}} by purging tombstones in 
{{ColumnFamilyStore.filter}} as suggested. I agree that it's easier and more 
efficient; I'm just wondering why it hasn't already been done there. Tests look 
good, WDYT [~slebresne]?

||2.2||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11427-2.2]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11427-2.2-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11427-2.2-testall/]|
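
For reference, the gist of the v2 change, boiled down to the essentials (the 
types are simplified; the actual patch works at the filter/resolver level rather 
than on a plain list): tombstones that are already purgeable are dropped before 
the responses are compared, so a node that has compacted them away no longer 
looks out of sync and no read repair gets scheduled for them.

{code}
import java.util.ArrayList;
import java.util.List;

public class PurgeBeforeResolveSketch
{
    // simplified tombstone: deleted locally at localDeletionTime (seconds)
    static final class Tombstone
    {
        final int localDeletionTime;
        Tombstone(int localDeletionTime) { this.localDeletionTime = localDeletionTime; }
    }

    // keep only tombstones that are not purgeable yet (i.e. newer than gcBefore)
    static List<Tombstone> purge(List<Tombstone> tombstones, int gcBefore)
    {
        List<Tombstone> result = new ArrayList<>();
        for (Tombstone t : tombstones)
            if (t.localDeletionTime >= gcBefore)
                result.add(t);
        return result;
    }
}
{code}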


> Range slice queries CL > ONE trigger read-repair of purgeable tombstones
> 
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that already compacted the given tombstones. Clusters with periodic jobs for 
> scanning data ranges will likely see tombstones resurrected through RRs, just 
> to have them compacted again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgeable tombstone(s) that already have been compacted on the other 
> nodes. In this case a read-repair will be triggered for transferring the 
> purgeable tombstones to all other nodes that returned an empty result.
> The issue can be reproduced with the provided dtest or manually using the 
> following steps:
> {noformat}
> create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 2 };
> use test1;
> create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
> dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;
> delete from test1 where a = 'a';
> {noformat}
> {noformat}
> ccm flush;
> ccm node2 compact;
> {noformat}
> {noformat}
> use test1;
> consistency all;
> tracing on;
> select * from test1;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgeable tombstones

2016-04-08 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11427:
---
Attachment: 11427-2.2_v2.patch

> Range slice queries CL > ONE trigger read-repair of purgeable tombstones
> 
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch, 11427-2.2_v2.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that already compacted the given tombstones. Clusters with periodic jobs for 
> scanning data ranges will likely see tombstones resurrected through RRs, just 
> to have them compacted again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgeable tombstone(s) that already have been compacted on the other 
> nodes. In this case a read-repair will be triggered for transferring the 
> purgeable tombstones to all other nodes that returned an empty result.
> The issue can be reproduced with the provided dtest or manually using the 
> following steps:
> {noformat}
> create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 2 };
> use test1;
> create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
> dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;
> delete from test1 where a = 'a';
> {noformat}
> {noformat}
> ccm flush;
> ccm node2 compact;
> {noformat}
> {noformat}
> use test1;
> consistency all;
> tracing on;
> select * from test1;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-07 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11349:
---
Attachment: 11349-2.1.patch

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-07 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11349:
---
Attachment: (was: 11349-2.1.patch)

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-07 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230127#comment-15230127
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


[~frousseau], what makes things more complicated here is that changes to LCR 
will affect regular compactions as well. Adding all tombstones as expired in 
your {{11349-2.1-v2.patch}} will have unwanted side effects for regular 
compactions, e.g. try {{RangeTombstoneMergeTest}} with it.

I've now spent some time trying to make use of the RT.Tracker there, but without 
much success. Adding non-expired range tombstones to the tracker from within 
LCR would cause corrupted sstables. Even adding a special case just for 
validation compaction would not handle all potential TS shadowing scenarios and 
would probably cause more harm than good (and potential digest mismatch storms). 
I'm not even sure it's possible given the current iterative MergeIterator > 
LazilyCompactedRow > RT.Tracker interaction. 

I'm now at a point where I'd suggest just sticking with {{11349-2.1.patch}} 
unless someone else has a better idea how to solve this. I've updated the 
[dtest PR|https://github.com/riptano/cassandra-dtest/pull/881] with two of the 
described shadowing scenarios that will only work with 3.0+ even after the 
patch, if someone wants to give it a try.

Cassci results for {{11349-2.1.patch}}:

||2.1||2.2||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11349-2.1]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11349-2.2]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.2-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.2-testall/]|



> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-02 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11349:
---
Attachment: (was: 11349-2.1.patch)

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-02 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11349:
---
Attachment: 11349-2.1.patch

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-02 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222902#comment-15222902
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


It makes sense just to modify {{onDiskAtomComparator}}. Given the generic name, 
I assumed the comparator is used in other places as well, but since it's only 
used in {{LazilyCompactedRow}}, we can just change the patch as suggested and 
simply remove the timestamp tie-break behaviour in {{onDiskAtomComparator}}. 

As for regular compactions, I agree with Tyler that this should not affect 
compactions the way it does validation compaction. Before the patch, 
{{LazilyCompactedRow}} would not reduce both RTs but instead have 
{{ColumnIndex.buildForCompaction()}} iterate over both RTs and have them added 
to the {{RangeTombstone.Tracker}}. The tracker would merge them the same way 
{{LCR.Reducer.getReduced}} does after the patch. However, I’m not fully sure 
whether there could be other, more complex cases where this would still 
cause problems.

Although the patch should fix the described issue, the way we deal with RTs 
during validation compaction is still not ideal. The problem is that LCR lacks 
some handling of relationships between RTs compared to 
{{RangeTombstone.Tracker}}. If we create digests column by column, we get wrong 
results for shadowing tombstones not sharing the same intervals.

{noformat}
CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 2};
USE test_rt;
CREATE TABLE IF NOT EXISTS table1 (
c1 text,
c2 text,
c3 text,
c4 float,
PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 
'false'};
DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b' AND c3 = 'c';

ccm node1 flush

DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';

ccm node1 repair test_rt table1
{noformat}


In this case the (c1, c2, c3) RT will always be repaired after it has been 
compacted with (c1, c2) on any node. 
So I’m wondering if we shouldn’t take a bolder approach here than the patch 
does. 
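
To make the effect easier to see in isolation, below is a minimal, self-contained sketch 
(plain Java, not Cassandra code; the {{ShadowedTombstoneDigest}} class and the atom 
strings are made up for illustration) of why per-atom digests diverge once one node has 
compacted a shadowed tombstone away while another node still carries it:

{noformat}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class ShadowedTombstoneDigest
{
    // Hash the atoms of a partition the way a per-atom validation digest would,
    // i.e. without resolving which tombstones are shadowed by wider ones.
    static byte[] digest(List<String> atoms) throws Exception
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String atom : atoms)
            md.update(atom.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }

    public static void main(String[] args) throws Exception
    {
        // Node 1 still carries the narrow (c1, c2, c3) tombstone plus the wider
        // (c1, c2) tombstone that shadows it.
        List<String> node1 = Arrays.asList("a:b:c@t1", "a:b@t2");
        // Node 2 has already compacted the shadowed tombstone away.
        List<String> node2 = Arrays.asList("a:b@t2");

        // Both nodes logically delete the same data, yet the digests differ,
        // so repair keeps streaming the shadowed tombstone back.
        System.out.println(Arrays.equals(digest(node1), digest(node2))); // false
    }
}
{noformat}

Resolving the shadowing relationship before digesting (roughly what 
{{RangeTombstone.Tracker}} does for regular compaction) would make both digests equal, 
which is presumably what a bolder approach would have to achieve.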


> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it

2016-03-31 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219651#comment-15219651
 ] 

Stefan Podkowinski commented on CASSANDRA-9325:
---

cassci test results:

||2.2||3.0||trunk||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9325-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9325-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9325-trunk]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9325-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9325-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9325-trunk-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9325-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9325-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9325-trunk-testall/]|

> cassandra-stress requires keystore for SSL but provides no way to configure it
> --
>
> Key: CASSANDRA-9325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: J.B. Langston
>Assignee: Stefan Podkowinski
>  Labels: lhf, stress
> Fix For: 2.2.x
>
> Attachments: 9325-2.1.patch
>
>
> Even though it shouldn't be required unless client certificate authentication 
> is enabled, the stress tool is looking for a keystore in the default location 
> of conf/.keystore with the default password of cassandra. There is no command 
> line option to override these defaults so you have to provide a keystore that 
> satisfies the default. It looks for conf/.keystore in the working directory, 
> so you need to create this in the directory you are running cassandra-stress 
> from. It doesn't really matter what's in the keystore; it just needs to exist 
> in the expected location and have a password of cassandra.
> Since the keystore might be required if client certificate authentication is 
> enabled, we need to add -transport parameters for keystore and 
> keystore-password.  Ideally, these should be optional and stress shouldn't 
> require the keystore unless client certificate authentication is enabled on 
> the server.
> In case it wasn't apparent, this is for Cassandra 2.1 and later's stress 
> tool.  I actually had even more problems getting Cassandra 2.0's stress tool 
> working with SSL and gave up on it.  We probably don't need to fix 2.0; we 
> can just document that it doesn't support SSL and recommend using 2.1 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it

2016-03-30 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218137#comment-15218137
 ] 

Stefan Podkowinski commented on CASSANDRA-9325:
---

I've rebased and recreated the patch to make sure it applies cleanly from 2.1 
up to trunk. [~tjake], let me know if you need me to fire up cassci runs for 
the patch.

> cassandra-stress requires keystore for SSL but provides no way to configure it
> --
>
> Key: CASSANDRA-9325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: J.B. Langston
>Assignee: Stefan Podkowinski
>  Labels: lhf, stress
> Fix For: 2.2.x
>
> Attachments: 9325-2.1.patch
>
>
> Even though it shouldn't be required unless client certificate authentication 
> is enabled, the stress tool is looking for a keystore in the default location 
> of conf/.keystore with the default password of cassandra. There is no command 
> line option to override these defaults so you have to provide a keystore that 
> satisfies the default. It looks for conf/.keystore in the working directory, 
> so you need to create this in the directory you are running cassandra-stress 
> from. It doesn't really matter what's in the keystore; it just needs to exist 
> in the expected location and have a password of cassandra.
> Since the keystore might be required if client certificate authentication is 
> enabled, we need to add -transport parameters for keystore and 
> keystore-password.  Ideally, these should be optional and stress shouldn't 
> require the keystore unless client certificate authentication is enabled on 
> the server.
> In case it wasn't apparent, this is for Cassandra 2.1 and later's stress 
> tool.  I actually had even more problems getting Cassandra 2.0's stress tool 
> working with SSL and gave up on it.  We probably don't need to fix 2.0; we 
> can just document that it doesn't support SSL and recommend using 2.1 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it

2016-03-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-9325:
--
Attachment: 9325-2.1.patch

> cassandra-stress requires keystore for SSL but provides no way to configure it
> --
>
> Key: CASSANDRA-9325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: J.B. Langston
>Assignee: Stefan Podkowinski
>  Labels: lhf, stress
> Fix For: 2.2.x
>
> Attachments: 9325-2.1.patch
>
>
> Even though it shouldn't be required unless client certificate authentication 
> is enabled, the stress tool is looking for a keystore in the default location 
> of conf/.keystore with the default password of cassandra. There is no command 
> line option to override these defaults so you have to provide a keystore that 
> satisfies the default. It looks for conf/.keystore in the working directory, 
> so you need to create this in the directory you are running cassandra-stress 
> from. It doesn't really matter what's in the keystore; it just needs to exist 
> in the expected location and have a password of cassandra.
> Since the keystore might be required if client certificate authentication is 
> enabled, we need to add -transport parameters for keystore and 
> keystore-password.  Ideally, these should be optional and stress shouldn't 
> require the keystore unless client certificate authentication is enabled on 
> the server.
> In case it wasn't apparent, this is for Cassandra 2.1 and later's stress 
> tool.  I actually had even more problems getting Cassandra 2.0's stress tool 
> working with SSL and gave up on it.  We probably don't need to fix 2.0; we 
> can just document that it doesn't support SSL and recommend using 2.1 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgeable tombstones

2016-03-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11427:
---
Summary: Range slice queries CL > ONE trigger read-repair of purgeable 
tombstones  (was: Range slice queries CL > ONE trigger read-repair of purgable 
tombstones)

> Range slice queries CL > ONE trigger read-repair of purgeable tombstones
> 
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that already compacted given tombstones. Clusters with periodical jobs for 
> scanning data ranges will likely see tombstones resurrected through RRs just 
> to have them compacted again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgeable tombstone(s) that already have been compacted on the other 
> nodes. In this case a read-repair will be triggered for transferring the 
> purgeable tombstones to all other nodes that returned an empty result.
> The issue can be reproduced with the provided dtest or manually using the 
> following steps:
> {noformat}
> create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 2 };
> use test1;
> create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
> dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;
> delete from test1 where a = 'a';
> {noformat}
> {noformat}
> ccm flush;
> ccm node2 compact;
> {noformat}
> {noformat}
> use test1;
> consistency all;
> tracing on;
> select * from test1;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgable tombstones

2016-03-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11427:
---
Reproduced In:   (was: 2.1.13)

> Range slice queries CL > ONE trigger read-repair of purgable tombstones
> ---
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that already compacted given tombstones. Clusters with periodical jobs for 
> scanning data ranges will likely see tombstones resurrected through RRs just 
> to have them compacted again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgeable tombstone(s) that already have been compacted on the other 
> nodes. In this case a read-repair will be triggered for transferring the 
> purgeable tombstones to all other nodes that returned an empty result.
> The issue can be reproduced with the provided dtest or manually using the 
> following steps:
> {noformat}
> create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 2 };
> use test1;
> create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
> dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;
> delete from test1 where a = 'a';
> {noformat}
> {noformat}
> ccm flush;
> ccm node2 compact;
> {noformat}
> {noformat}
> use test1;
> consistency all;
> tracing on;
> select * from test1;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgable tombstones

2016-03-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11427:
---
Description: 
Range queries will trigger read repairs for purgeable tombstones on hosts that 
already compacted given tombstones. Clusters with periodical jobs for scanning 
data ranges will likely see tombstones resurrected through RRs just to have 
them compacted again later at the destination host.

Executing range queries (e.g. for reading token ranges) will compare the actual 
data instead of using digests when executed with CL > ONE. Responses will be 
consolidated by {{RangeSliceResponseResolver.Reducer}}, where the result of 
{{RowDataResolver.resolveSuperset}} is used as the reference version for the 
results. {{RowDataResolver.scheduleRepairs}} will then send the superset to all 
nodes that returned a different result before. 

Unfortunately this does also involve cases where the superset is just made up 
of purgeable tombstone(s) that already have been compacted on the other nodes. 
In this case a read-repair will be triggered for transferring the purgeable 
tombstones to all other nodes that returned an empty result.

The issue can be reproduced with the provided dtest or manually using the 
following steps:

{noformat}
create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
'replication_factor' : 2 };
use test1;
create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
{'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;

delete from test1 where a = 'a';
{noformat}

{noformat}
ccm flush;
ccm node2 compact;
{noformat}

{noformat}
use test1;
consistency all;
tracing on;
select * from test1;
{noformat}



  was:
Range queries will trigger read repairs for purgeable tombstones on hosts that 
already compacted given tombstones. Clusters with periodical jobs for scanning 
data ranges will likely see tombstones resurrected through RRs just to have 
them compacted again later at the destination host.

Executing range queries (e.g. for reading token ranges) will compare the actual 
data instead of using digests when executed with CL > ONE. Responses will be 
consolidated by {{RangeSliceResponseResolver.Reducer}}, where the result of 
{{RowDataResolver.resolveSuperset}} is used as the reference version for the 
results. {{RowDataResolver.scheduleRepairs}} will then send the superset to all 
nodes that returned a different result before. 

Unfortunately this does also involve cases where the superset is just made up 
of purgeable tombstone(s) that already have been compacted on the other nodes. 
In this case a read-repair will be triggered for transferring the purgeable 
tombstones to all other nodes that returned an empty result.




> Range slice queries CL > ONE trigger read-repair of purgable tombstones
> ---
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that already compacted given tombstones. Clusters with periodical jobs for 
> scanning data ranges will likely see tombstones resurrected through RRs just 
> to have them compacted again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgeable tombstone(s) that already have been compacted on the other 
> nodes. In this case a read-repair will be triggered for transferring the 
> purgeable tombstones to all other nodes that returned an empty result.
> The issue can be reproduced with the provided dtest or manually using the 
> following steps:
> {noformat}
> create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 2 };
> use test1;
> create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
> dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;
> delete from test1 where a = 'a';
> {noformat}
> {noformat}
> ccm flush;
> ccm node2 

[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgable tombstones

2016-03-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11427:
---
Description: 
Range queries will trigger read repairs for purgeable tombstones on hosts that 
already compacted given tombstones. Clusters with periodical jobs for scanning 
data ranges will likely see tombstones resurrected through RRs just to have 
them compacted again later at the destination host.

Executing range queries (e.g. for reading token ranges) will compare the actual 
data instead of using digests when executed with CL > ONE. Responses will be 
consolidated by {{RangeSliceResponseResolver.Reducer}}, where the result of 
{{RowDataResolver.resolveSuperset}} is used as the reference version for the 
results. {{RowDataResolver.scheduleRepairs}} will then send the superset to all 
nodes that returned a different result before. 

Unfortunately this does also involve cases where the superset is just made up 
of purgeable tombstone(s) that already have been compacted on the other nodes. 
In this case a read-repair will be triggered for transferring the purgeable 
tombstones to all other nodes that returned an empty result.



  was:
Executing range queries (e.g. for reading token ranges) will compare the actual 
data instead of using digests when executed with CL > ONE. Responses will be 
consolidated by {{RangeSliceResponseResolver.Reducer}}, where the result of 
{{RowDataResolver.resolveSuperset}} is used as the reference version for the 
results. {{RowDataResolver.scheduleRepairs}} will then send the superset to all 
nodes that returned a different result before. 

Unfortunately this does also involve cases where the superset is just made up 
of purgable tombstone(s) that already have been compacted on the other nodes. 
In this case a read-repair will be triggered for transferring the purgable 
tombstones to all other nodes that returned an empty result.




> Range slice queries CL > ONE trigger read-repair of purgable tombstones
> ---
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that already compacted given tombstones. Clusters with periodical jobs for 
> scanning data ranges will likely see tombstones resurrected through RRs just 
> to have them compacted again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgeable tombstone(s) that already have been compacted on the other 
> nodes. In this case a read-repair will be triggered for transferring the 
> purgeable tombstones to all other nodes that returned an empty result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgable tombstones

2016-03-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11427:
---
Fix Version/s: 2.2.x
   2.1.x
   Status: Patch Available  (was: In Progress)

> Range slice queries CL > ONE trigger read-repair of purgable tombstones
> ---
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch
>
>
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgable tombstone(s) that already have been compacted on the other nodes. 
> In this case a read-repair will be triggered for transferring the purgable 
> tombstones to all other nodes that returned an empty result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgable tombstones

2016-03-30 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217871#comment-15217871
 ] 

Stefan Podkowinski commented on CASSANDRA-11427:


Proposed patch added (2.1 + 2.2).


||2.1||2.2||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11427-2.1]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11427-2.2]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11427-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11427-2.2-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11427-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11427-2.2-dtest/]|


> Range slice queries CL > ONE trigger read-repair of purgable tombstones
> ---
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Attachments: 11427-2.1.patch
>
>
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgable tombstone(s) that already have been compacted on the other nodes. 
> In this case a read-repair will be triggered for transferring the purgable 
> tombstones to all other nodes that returned an empty result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgable tombstones

2016-03-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11427:
---
Attachment: 11427-2.1.patch

> Range slice queries CL > ONE trigger read-repair of purgable tombstones
> ---
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Attachments: 11427-2.1.patch
>
>
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately this does also involve cases where the superset is just made up 
> of purgable tombstone(s) that already have been compacted on the other nodes. 
> In this case a read-repair will be triggered for transferring the purgable 
> tombstones to all other nodes that returned an empty result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgable tombstones

2016-03-24 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-11427:
--

 Summary: Range slice queries CL > ONE trigger read-repair of 
purgable tombstones
 Key: CASSANDRA-11427
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
 Project: Cassandra
  Issue Type: Bug
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Executing range queries (e.g. for reading token ranges) will compare the actual 
data instead of using digests when executed with CL > ONE. Responses will be 
consolidated by {{RangeSliceResponseResolver.Reducer}}, where the result of 
{{RowDataResolver.resolveSuperset}} is used as the reference version for the 
results. {{RowDataResolver.scheduleRepairs}} will then send the superset to all 
nodes that returned a different result before. 

Unfortunately this does also involve cases where the superset is just made up 
of purgable tombstone(s) that already have been compacted on the other nodes. 
In this case a read-repair will be triggered for transferring the purgable 
tombstones to all other nodes that returned an empty result.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-03-23 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11349:
---
   Labels: repair  (was: )
Fix Version/s: 2.2.x
   2.1.x
   Status: Patch Available  (was: In Progress)

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-03-23 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11349:
---
Attachment: 11349-2.1.patch

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
> Attachments: 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-03-23 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208345#comment-15208345
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:




I gave the patch some more thought and I'm now confident that the proposed 
change is the best way to address the issue. 

Basically what happens during validation compaction is that a scanner is 
created for each sstable. The {{CompactionIterable.Reducer}} will then create a 
{{LazilyCompactedRow}} with an iterable of {{OnDiskAtom}} for the same key in 
each sstable. The purpose of {{LazilyCompactedRow}} during validation 
compaction is to create a digest of the compacted version of all atoms that 
would represent a single row. This is done cell by cell, where each collection 
of atoms for a single cell name is consumed by {{LazilyCompactedRow.Reducer}}.
 
 The decision on whether {{LazilyCompactedRow.Reducer}} should finish merging 
a cell and move on to the next one is currently made by 
{{AbstractCellNameType.onDiskAtomComparator}}, as evaluated by 
{{MergeIterator.ManyToOne}}. However, the comparator compares not only by 
name, but also by {{DeletionTime}} in the case of {{RangeTombstone}}. As a 
consequence, {{MergeIterator.ManyToOne}} will advance when two 
{{RangeTombstone}}s with different deletion times are read, which breaks the 
"_will be called one or more times with cells that share the same column name_" 
contract in {{LazilyCompactedRow.Reducer}}.

The submitted patch introduces a new {{Comparator}} that basically works like 
{{onDiskAtomComparator}}, but does not compare deletion times. As simple as that.
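
As a rough illustration of that idea, here is a simplified, standalone sketch (plain 
Java, not the actual patch; the {{Atom}} type and its fields are made-up stand-ins for 
{{OnDiskAtom}}): comparing by name only means the merge iterator keeps reducing atoms 
that share a name, instead of treating tombstones with different deletion times as 
distinct atoms.

{noformat}
import java.util.Comparator;

// Hypothetical, stripped-down stand-in for an on-disk atom: a cell name plus an
// optional deletion time for range tombstones. Not Cassandra code.
class Atom
{
    final String name;
    final Long deletionTime; // null for regular cells

    Atom(String name, Long deletionTime)
    {
        this.name = name;
        this.deletionTime = deletionTime;
    }
}

public class NameOnlyComparatorSketch
{
    // Old behaviour (sketch): tie-break equal names by deletion time, so two
    // tombstones covering the same interval look like different atoms and the
    // merge iterator advances instead of reducing them together.
    static final Comparator<Atom> WITH_TIMESTAMP_TIE_BREAK = (a, b) -> {
        int cmp = a.name.compareTo(b.name);
        if (cmp != 0 || a.deletionTime == null || b.deletionTime == null)
            return cmp;
        return a.deletionTime.compareTo(b.deletionTime);
    };

    // Patched behaviour (sketch): compare by name only, so the reducer sees both
    // tombstones as the same cell and merges them before digesting.
    static final Comparator<Atom> NAME_ONLY = Comparator.comparing(a -> a.name);

    public static void main(String[] args)
    {
        Atom rt1 = new Atom("a:b", 1L);
        Atom rt2 = new Atom("a:b", 2L);
        System.out.println(WITH_TIMESTAMP_TIE_BREAK.compare(rt1, rt2)); // negative: treated as distinct
        System.out.println(NAME_ONLY.compare(rt1, rt2));                // 0: reduced together
    }
}
{noformat}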


||2.1||2.2||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11349-2.1]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11349-2.2]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.2-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11349-2.2-dtest/]|


The only other places where {{LazilyCompactedRow}} is used besides validation 
compaction are the cleanup and scrub functions, which shouldn't be affected, as 
those work on individual sstables and I assume there's no case where a single 
sstable can contain multiple identical range tombstones with different 
timestamps.


> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.

[jira] [Commented] (CASSANDRA-11405) Server encryption cannot be enabled with the IBM JRE 1.7

2016-03-23 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208330#comment-15208330
 ] 

Stefan Podkowinski commented on CASSANDRA-11405:


This has been addressed in CASSANDRA-10508, but likely won't be backported to 
2.2, as the issue doesn't affect supported JDKs.

> Server encryption cannot be enabled with the IBM JRE 1.7
> 
>
> Key: CASSANDRA-11405
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11405
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
> Environment: Linux, IBM JRE 1.7
>Reporter: Guillermo Vega-Toro
> Fix For: 2.2.6
>
>
> When enabling server encryption with the IBM JRE (algorithm: IbmX509), an 
> IllegalArgumentException is thrown from the IBM JSSE when the server is 
> started:
> ERROR 10:04:37,326 Exception encountered during startup
> java.lang.IllegalArgumentException: SSLv2Hello
> at com.ibm.jsse2.qb.a(qb.java:50)
> at com.ibm.jsse2.pb.a(pb.java:101)
> at com.ibm.jsse2.pb.(pb.java:77)
> at com.ibm.jsse2.oc.setEnabledProtocols(oc.java:77)
> at 
> org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:64)
> at 
> org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:425)
> at 
> org.apache.cassandra.net.MessagingService.listen(MessagingService.java:409)
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:693)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:437)
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:656)
> The problem is that the IBM JSSE does not support SSLv2Hello, but this 
> protocol is hard-coded in class org.apache.cassandra.security.SSLFactory:
> public static final String[] ACCEPTED_PROTOCOLS = new String[] {"SSLv2Hello", 
> "TLSv1", "TLSv1.1", "TLSv1.2"};
> public static SSLServerSocket getServerSocket(EncryptionOptions options, 
> InetAddress address, int port) throws IOException
> {
> SSLContext ctx = createSSLContext(options, true);
> SSLServerSocket serverSocket = 
> (SSLServerSocket)ctx.getServerSocketFactory().createServerSocket();
> serverSocket.setReuseAddress(true);
> String[] suits = 
> filterCipherSuites(serverSocket.getSupportedCipherSuites(), 
> options.cipher_suites);
> serverSocket.setEnabledCipherSuites(suits);
> serverSocket.setNeedClientAuth(options.require_client_auth);
> serverSocket.setEnabledProtocols(ACCEPTED_PROTOCOLS);
> serverSocket.bind(new InetSocketAddress(address, port), 500);
> return serverSocket;
> }
> This ACCEPTED_PROTOCOLS array should not be hard-coded. It should rather read 
> the protocols from configuration, or if the algorithm is IbmX509, simply do 
> not call setEnabledProtocols - with the IBM JSSE, the enabled protocol is 
> controlled by the protocol passed to SSLContext.getInstance.
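
A minimal sketch of what the suggested fix could look like, assuming a 
hypothetical {{acceptedProtocols}} parameter fed from configuration (this is 
illustrative only, not the actual Cassandra code): {{setEnabledProtocols}} is 
only called when a protocol list has been explicitly configured, so providers 
like the IBM JSSE keep their own defaults.

{noformat}
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLServerSocket;

public final class ConfigurableProtocolsSketch
{
    // acceptedProtocols == null (or empty) means "leave the provider's own
    // defaults untouched", which avoids passing SSLv2Hello to a JSSE
    // implementation that rejects it.
    public static SSLServerSocket getServerSocket(SSLContext ctx, InetAddress address, int port,
                                                  String[] acceptedProtocols) throws IOException
    {
        SSLServerSocket serverSocket = (SSLServerSocket) ctx.getServerSocketFactory().createServerSocket();
        serverSocket.setReuseAddress(true);
        if (acceptedProtocols != null && acceptedProtocols.length > 0)
            serverSocket.setEnabledProtocols(acceptedProtocols);
        serverSocket.bind(new InetSocketAddress(address, port), 500);
        return serverSocket;
    }
}
{noformat}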



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-03-21 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski reassigned CASSANDRA-11349:
--

Assignee: Stefan Podkowinski

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-03-21 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204971#comment-15204971
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


Looks like the {{MergeIterator.ManyToOne}} logic gets in the way of 
{{LazilyCompactedRow.Reducer}} doing its job. The iterator will stop adding 
atoms to the reducer and continue to advance once two range tombstones with 
different deletion times are about to be merged.

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10956) Enable authentication of native protocol users via client certificates

2016-03-15 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194993#comment-15194993
 ] 

Stefan Podkowinski commented on CASSANDRA-10956:




I’d assume that authentication should be handled by providing an IAuthenticator 
implementation, but I can see how this is not a good fit here, as we can’t 
provide any SASL support.
I also like that your approach can be used on top of regular authentication, 
e.g. by falling back to password-based authentication if no certificate has 
been provided.

Two small remarks regarding {{cassandra.yaml}}:

bq. Client supplied certificates must be present in the configured truststore 
when using this authentication

I first read this as saying that each individual client certificate must be 
present in the truststore. Maybe explicitly mention that importing a common CA 
into the truststore works as well.

bq. NOT_REQUIRED : no attempt is made to obtain user identity from the cert 
chain.

The reason for presenting this option in the yaml config is not really clear to 
me. It’s contrary to the idea of using the certificate authenticator.


> Enable authentication of native protocol users via client certificates
> --
>
> Key: CASSANDRA-10956
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10956
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Samuel Klock
>Assignee: Samuel Klock
> Attachments: 10956.patch
>
>
> Currently, the native protocol only supports user authentication via SASL.  
> While this is adequate for many use cases, it may be superfluous in scenarios 
> where clients are required to present an SSL certificate to connect to the 
> server.  If the certificate presented by a client is sufficient by itself to 
> specify a user, then an additional (series of) authentication step(s) via 
> SASL merely add overhead.  Worse, for uses wherein it's desirable to obtain 
> the identity from the client's certificate, it's necessary to implement a 
> custom SASL mechanism to do so, which increases the effort required to 
> maintain both client and server and which also duplicates functionality 
> already provided via SSL/TLS.
> Cassandra should provide a means of using certificates for user 
> authentication in the native protocol without any effort above configuring 
> SSL on the client and server.  Here's a possible strategy:
> * Add a new authenticator interface that returns {{AuthenticatedUser}} 
> objects based on the certificate chain presented by the client.
> * If this interface is in use, the user is authenticated immediately after 
> the server receives the {{STARTUP}} message.  It then responds with a 
> {{READY}} message.
> * Otherwise, the existing flow of control is used (i.e., if the authenticator 
> requires authentication, then an {{AUTHENTICATE}} message is sent to the 
> client).
> One advantage of this strategy is that it is backwards-compatible with 
> existing schemes; current users of SASL/{{IAuthenticator}} are not impacted.  
> Moreover, it can function as a drop-in replacement for SASL schemes without 
> requiring code changes (or even config changes) on the client side.
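
As a rough illustration of the proposed strategy (hypothetical names and 
simplified types; the actual patch may differ), such an authenticator would map 
the client's certificate chain straight to a user identity, e.g. the subject DN 
of the leaf certificate:

{noformat}
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;

// Hypothetical, simplified version of the proposed certificate authenticator:
// given the chain negotiated during the TLS handshake, derive the user name.
interface CertificateAuthenticatorSketch
{
    String authenticate(Certificate[] clientChain);
}

final class SubjectDnAuthenticator implements CertificateAuthenticatorSketch
{
    @Override
    public String authenticate(Certificate[] clientChain)
    {
        if (clientChain == null || clientChain.length == 0)
            throw new SecurityException("No client certificate presented");
        X509Certificate leaf = (X509Certificate) clientChain[0];
        // The identity comes from the certificate itself, no SASL round trips needed.
        return leaf.getSubjectX500Principal().getName();
    }
}
{noformat}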



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption

2016-03-12 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191830#comment-15191830
 ] 

Stefan Podkowinski commented on CASSANDRA-10735:


{quote}
The transport encryption options do not let you override the keystore and 
keystore password used for jdk ssl, it just uses defaults (conf/.keystore and 
cassandra for password).
{quote}

This has been addressed in CASSANDRA-9325


> Support netty openssl (netty-tcnative) for client encryption
> 
>
> Key: CASSANDRA-10735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10735
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Andy Tolbert
>Assignee: Aleksey Yeschenko
> Fix For: 3.x
>
> Attachments: netty-ssl-trunk.tgz, nettyssl-bench.tgz, 
> nettysslbench.png, nettysslbench_small.png, sslbench12-03.png
>
>
> The java-driver recently added support for using netty openssl via 
> [netty-tcnative|http://netty.io/wiki/forked-tomcat-native.html] in 
> [JAVA-841|https://datastax-oss.atlassian.net/browse/JAVA-841], this shows a 
> very measured improvement (numbers incoming on that ticket).   It seems 
> likely that this can offer improvement if implemented C* side as well.
> Since netty-tcnative has platform specific requirements, this should not be 
> made the default, but rather be an option that one can use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10805) Additional Compaction Logging

2016-03-08 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184916#comment-15184916
 ] 

Stefan Podkowinski commented on CASSANDRA-10805:


This looks really promising. I've played around with the branch and made some 
minor changes in a [PR|https://github.com/carlyeks/cassandra/pull/1/files].

However, I'm still not sure why you plan to implement your own file rolling 
logic. Having the files rolled by logback and archiving them manually 
afterwards would work perfectly fine for me.

> Additional Compaction Logging
> -
>
> Key: CASSANDRA-10805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10805
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Compaction, Observability
>Reporter: Carl Yeksigian
>Assignee: Carl Yeksigian
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, viewing the results of past compactions requires parsing the log 
> and looking at the compaction history system table, which doesn't have 
> information about, for example, flushed sstables not previously compacted.
> This is a proposal to extend the information captured for compaction. 
> Initially, this would be done through a JMX call, but if it proves to be 
> useful and not much overhead, it might be a feature that could be enabled for 
> the compaction strategy all the time.
> Initial log information would include:
> - The compaction strategy type controlling each column family
> - The set of sstables included in each compaction strategy
> - Information about flushes and compactions, including times and all involved 
> sstables
> - Information about sstables, including generation, size, and tokens
> - Any additional metadata the strategy wishes to add to a compaction or an 
> sstable, like the level of an sstable or the type of compaction being 
> performed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-03-01 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Attachment: 10508-trunk.patch

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: docs-impacting, lhf
> Fix For: 3.x
>
> Attachments: 10508-trunk.patch
>
>
> Currently each SSL connections will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8 which comes with solid defaults for these kind of SSL 
> settings and I'm wondering if the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> -Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.- (fixed in CASSANDRA-11164)
> Another way to effect used ciphers is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway and will stop using safer protocol sets on new JVM releases or user 
> request. Again, we should stick with the JVM defaults. Using the 
> {{jdk.tls.client.protocols}} systems property will always allow to restrict 
> the set of protocols in case another emergency fix is needed. 
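
For illustration, the kind of JVM-level control referred to above (a sketch 
only; these properties are normally set in the {{java.security}} file or via 
{{-D}} options at startup, and need to be in place before JSSE is first used):

{noformat}
import java.security.Security;

public final class JvmTlsDefaultsSketch
{
    public static void main(String[] args)
    {
        // Disable legacy protocols/algorithms globally instead of hard-coding
        // an accepted list in Cassandra; JSSE consults this security property.
        // (The value shown is only an example.)
        Security.setProperty("jdk.tls.disabledAlgorithms", "SSLv3, RC4, MD5withRSA");

        // Client-side protocol restriction can also be applied without code
        // changes, e.g. -Djdk.tls.client.protocols=TLSv1.2 at JVM startup.
        System.out.println(Security.getProperty("jdk.tls.disabledAlgorithms"));
    }
}
{noformat}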



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-03-01 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Description: 
Currently each SSL connections will be initialized using a hard-coded list of 
protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
now require Java 8 which comes with solid defaults for these kind of SSL 
settings and I'm wondering if the current behavior shouldn't be re-evaluated. 

In my impression the way cipher suites are currently whitelisted is 
problematic, as this will prevent the JVM from using more recent and more 
secure suites that haven't been added to the hard-coded list. JVM updates may 
also cause issues in case the limited number of ciphers cannot be used, e.g. 
see CASSANDRA-6613.

-Looking at the source I've also stumbled upon a bug in the 
{{filterCipherSuites()}} method that would return the filtered list of ciphers 
in undetermined order where the result is passed to 
{{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
order of preference ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) 
and therefore you may end up with weaker algorithms on the top. Currently it's 
not that critical, as we only whitelist a couple of ciphers anyway. But it adds 
to the question if it still really makes sense to work with the cipher list at 
all in the Cassandra code base.- (fixed in CASSANDRA-11164)

Another way to effect used ciphers is by changing the security properties. This 
is a more versatile way to work with cipher lists instead of relying on 
hard-coded values, see 
[here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
 for details.

The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted anyway 
and will stop using safer protocol sets on new JVM releases or user request. 
Again, we should stick with the JVM defaults. Using the 
{{jdk.tls.client.protocols}} systems property will always allow to restrict the 
set of protocols in case another emergency fix is needed. 



  was:
Currently each SSL connections will be initialized using a hard-coded list of 
protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
now require Java 8 which comes with solid defaults for these kind of SSL 
settings and I'm wondering if the current behavior shouldn't be re-evaluated. 

In my impression the way cipher suites are currently whitelisted is 
problematic, as this will prevent the JVM from using more recent and more 
secure suites that haven't been added to the hard-coded list. JVM updates may 
also cause issues in case the limited number of ciphers cannot be used, e.g. 
see CASSANDRA-6613.

-Looking at the source I've also stumbled upon a bug in the 
{{filterCipherSuites()}} method that would return the filtered list of ciphers 
in undetermined order where the result is passed to 
{{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
order of preference ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) 
and therefore you may end up with weaker algorithms on the top. Currently it's 
not that critical, as we only whitelist a couple of ciphers anyway. But it adds 
to the question if it still really makes sense to work with the cipher list at 
all in the Cassandra code base.-

Another way to effect used ciphers is by changing the security properties. This 
is a more versatile way to work with cipher lists instead of relying on 
hard-coded values, see 
[here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
 for details.

The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted anyway 
and will stop using safer protocol sets on new JVM releases or user request. 
Again, we should stick with the JVM defaults. Using the 
{{jdk.tls.client.protocols}} systems property will always allow to restrict the 
set of protocols in case another emergency fix is needed. 




> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: docs-impacting, lhf
> Fix For: 3.x
>
>
> Currently each SSL connections will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8 which comes with solid defaults for these kind of SSL 
> settings and I'm wondering if the current behavior shouldn't be re-evaluated.

[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-03-01 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Description: 
Currently each SSL connections will be initialized using a hard-coded list of 
protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
now require Java 8 which comes with solid defaults for these kind of SSL 
settings and I'm wondering if the current behavior shouldn't be re-evaluated. 

In my impression the way cipher suites are currently whitelisted is 
problematic, as this will prevent the JVM from using more recent and more 
secure suites that haven't been added to the hard-coded list. JVM updates may 
also cause issues in case the limited number of ciphers cannot be used, e.g. 
see CASSANDRA-6613.

-Looking at the source I've also stumbled upon a bug in the 
{{filterCipherSuites()}} method that would return the filtered list of ciphers 
in undetermined order where the result is passed to 
{{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
order of preference ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) 
and therefore you may end up with weaker algorithms on the top. Currently it's 
not that critical, as we only whitelist a couple of ciphers anyway. But it adds 
to the question if it still really makes sense to work with the cipher list at 
all in the Cassandra code base.-

Another way to effect used ciphers is by changing the security properties. This 
is a more versatile way to work with cipher lists instead of relying on 
hard-coded values, see 
[here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
 for details.

The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted anyway 
and will stop using safer protocol sets on new JVM releases or user request. 
Again, we should stick with the JVM defaults. Using the 
{{jdk.tls.client.protocols}} systems property will always allow to restrict the 
set of protocols in case another emergency fix is needed. 



  was:
Currently each SSL connections will be initialized using a hard-coded list of 
protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
now require Java 8 which comes with solid defaults for these kind of SSL 
settings and I'm wondering if the current behavior shouldn't be re-evaluated. 

In my impression the way cipher suites are currently whitelisted is 
problematic, as this will prevent the JVM from using more recent and more 
secure suites that haven't been added to the hard-coded list. JVM updates may 
also cause issues in case the limited number of ciphers cannot be used, e.g. 
see CASSANDRA-6613.

Looking at the source I've also stumbled upon a bug in the 
{{filterCipherSuites()}} method that would return the filtered list of ciphers 
in undetermined order where the result is passed to 
{{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
order of preference ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) 
and therefore you may end up with weaker algorithms on the top. Currently it's 
not that critical, as we only whitelist a couple of ciphers anyway. But it adds 
to the question if it still really makes sense to work with the cipher list at 
all in the Cassandra code base.

Another way to effect used ciphers is by changing the security properties. This 
is a more versatile way to work with cipher lists instead of relying on 
hard-coded values, see 
[here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
 for details.

The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted anyway 
and will stop using safer protocol sets on new JVM releases or user request. 
Again, we should stick with the JVM defaults. Using the 
{{jdk.tls.client.protocols}} systems property will always allow to restrict the 
set of protocols in case another emergency fix is needed. 

You can find a patch with where I ripped out the mentioned options here:
[Diff 
trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]


> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: docs-impacting, lhf
> Fix For: 3.x
>
>
> Currently each SSL connections will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", 

[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-03-01 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Status: Patch Available  (was: Open)

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: docs-impacting, lhf
> Fix For: 3.x
>
>
> Currently each SSL connections will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8 which comes with solid defaults for these kind of SSL 
> settings and I'm wondering if the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to effect used ciphers is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway and will stop using safer protocol sets on new JVM releases or user 
> request. Again, we should stick with the JVM defaults. Using the 
> {{jdk.tls.client.protocols}} systems property will always allow to restrict 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch with where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-03-01 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Issue Type: Improvement  (was: Bug)

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: docs-impacting, lhf
> Fix For: 3.x
>
>
> Currently each SSL connections will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8 which comes with solid defaults for these kind of SSL 
> settings and I'm wondering if the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to effect used ciphers is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway and will stop using safer protocol sets on new JVM releases or user 
> request. Again, we should stick with the JVM defaults. Using the 
> {{jdk.tls.client.protocols}} systems property will always allow to restrict 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch with where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-03-01 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173494#comment-15173494
 ] 

Stefan Podkowinski commented on CASSANDRA-10508:


Branch has been rebased on trunk. Scope of this ticket will now only include 
removing hard-coded default SSL protocols and ciphers, as the 
{{filterCipherSuites}} bug has now been fixed in CASSANDRA-11164.

||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...spodkowinski:CASSANDRA-10508-trunk]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-10508-trunk-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-10508-trunk-dtest/]|



> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: docs-impacting, lhf
> Fix For: 3.x
>
>
> Currently each SSL connections will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8 which comes with solid defaults for these kind of SSL 
> settings and I'm wondering if the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to effect used ciphers is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway and will stop using safer protocol sets on new JVM releases or user 
> request. Again, we should stick with the JVM defaults. Using the 
> {{jdk.tls.client.protocols}} systems property will always allow to restrict 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch with where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-26 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Attachment: (was: 10508-2.2.patch)

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: docs-impacting, lhf
> Fix For: 3.x
>
>
> Currently each SSL connections will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8 which comes with solid defaults for these kind of SSL 
> settings and I'm wondering if the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to effect used ciphers is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway and will stop using safer protocol sets on new JVM releases or user 
> request. Again, we should stick with the JVM defaults. Using the 
> {{jdk.tls.client.protocols}} systems property will always allow to restrict 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch with where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11238) Update cassandra-stress help

2016-02-26 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168675#comment-15168675
 ] 

Stefan Podkowinski commented on CASSANDRA-11238:


Are you referring to the username/password parameters for native 
transport/thrift? You can use {{cassandra-stress help -mode}} to have them 
listed.

> Update cassandra-stress help
> 
>
> Key: CASSANDRA-11238
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11238
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Lorina Poland
>Priority: Minor
> Fix For: 3.3
>
>
> Please add authentication options to cassandra-stress help - 
> username=username password=password
> Not sure how far back this one goes, in terms of versions. I marked 3.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-25 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11164:
---
Attachment: 11164-2.2_spod.patch

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt, 11164-2.2_spod.patch
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-25 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11164:
---
Attachment: (was: 11164-2.2_1_preserve_cipher_order.patch)

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-25 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11164:
---
Attachment: (was: 11164-2.2_2_call_filterCipherSuites_everywhere.patch)

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-25 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167015#comment-15167015
 ] 

Stefan Podkowinski commented on CASSANDRA-11164:


{{SSLFactoryTest.testServerSocketCiphers()}} has been fixed and will now work 
with JVMs that don't support all ciphers specified in the config. Removed 
{{UnknownHostException}} and added a line to CHANGES.txt.


||2.2||3.0||trunk||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11164]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11164-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-11164-trunk]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-trunk-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-trunk-dtest/]|

Patch merged upwards cleanly for me.

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt, 11164-2.2_1_preserve_cipher_order.patch, 
> 11164-2.2_2_call_filterCipherSuites_everywhere.patch
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10956) Enable authentication of native protocol users via client certificates

2016-02-24 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163275#comment-15163275
 ] 

Stefan Podkowinski commented on CASSANDRA-10956:


bq. Regarding anonymous authentication: would it be reasonable to make this 
behavior configurable? The intent is to enable operators to provide some level 
of access (perhaps read-only) to users who are not capable of authenticating.

I have a similar use case where we like to use different authenticators for 
application logins and employees. The idea would be to use 
{{PasswordAuthenticator}} for applications and LDAP for human users. One way to 
implement this would be to create a chain of individual authenticators that are 
to be checked by order, maybe with a default action if all of them fail 
(deny/anonymous). However, as you probably also want to use a different role 
manager depending on used authenticator, you'd need some kind of mapping for 
that as well and I don't like to end up creating iptables for the Cassandra 
login process.

Maybe the better option would be to allow configuring multiple native transport 
ports, each of them using a single authenticator and role manager. This should 
be relatively easy to implement and would also allow to restrict access on 
network level, e.g. setting up different firewall rules for the "applications" 
or "developer" ports. 


> Enable authentication of native protocol users via client certificates
> --
>
> Key: CASSANDRA-10956
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10956
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Samuel Klock
>Assignee: Samuel Klock
> Attachments: 10956.patch
>
>
> Currently, the native protocol only supports user authentication via SASL.  
> While this is adequate for many use cases, it may be superfluous in scenarios 
> where clients are required to present an SSL certificate to connect to the 
> server.  If the certificate presented by a client is sufficient by itself to 
> specify a user, then an additional (series of) authentication step(s) via 
> SASL merely add overhead.  Worse, for uses wherein it's desirable to obtain 
> the identity from the client's certificate, it's necessary to implement a 
> custom SASL mechanism to do so, which increases the effort required to 
> maintain both client and server and which also duplicates functionality 
> already provided via SSL/TLS.
> Cassandra should provide a means of using certificates for user 
> authentication in the native protocol without any effort above configuring 
> SSL on the client and server.  Here's a possible strategy:
> * Add a new authenticator interface that returns {{AuthenticatedUser}} 
> objects based on the certificate chain presented by the client.
> * If this interface is in use, the user is authenticated immediately after 
> the server receives the {{STARTUP}} message.  It then responds with a 
> {{READY}} message.
> * Otherwise, the existing flow of control is used (i.e., if the authenticator 
> requires authentication, then an {{AUTHENTICATE}} message is sent to the 
> client).
> One advantage of this strategy is that it is backwards-compatible with 
> existing schemes; current users of SASL/{{IAuthenticator}} are not impacted.  
> Moreover, it can function as a drop-in replacement for SASL schemes without 
> requiring code changes (or even config changes) on the client side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-23 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11164:
---
Attachment: 11164-2.2_2_call_filterCipherSuites_everywhere.patch

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt, 11164-2.2_1_preserve_cipher_order.patch, 
> 11164-2.2_2_call_filterCipherSuites_everywhere.patch
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-23 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11164:
---
Attachment: (was: 11164-on-10508-2.2.patch)

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt, 11164-2.2_1_preserve_cipher_order.patch, 
> 11164-2.2_2_call_filterCipherSuites_everywhere.patch
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-23 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11164:
---
Attachment: 11164-2.2_1_preserve_cipher_order.patch

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt, 11164-2.2_1_preserve_cipher_order.patch, 
> 11164-2.2_2_call_filterCipherSuites_everywhere.patch
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-23 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159131#comment-15159131
 ] 

Stefan Podkowinski commented on CASSANDRA-11164:


Scope of this ticket as reported would be:
- respect ordering of enabled ciphers
- apply cipher filtering wherever SSL is used

I've now created two patches for that:
- {{11164-2.2_1_preserve_cipher_order.patch}} - cherry-picked the 
{{filterCipherSuites}} implementation and unit test from CASSANDRA-10508, with 
some of your suggested changes
- {{11164-2.2_2_call_filterCipherSuites_everywhere.patch}} - this is 
{{11164-2.2.txt}} from Tom minus the {{filterCipherSuites}} implementation

||2.2||
|[Branch|https://github.com/spodkowinski/cassandra/commits/CASSANDRA-11164]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-11164-dtest/]|


> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt, 11164-on-10508-2.2.patch
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10458) cqlshrc: add option to always use ssl

2016-02-19 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154434#comment-15154434
 ] 

Stefan Podkowinski commented on CASSANDRA-10458:


Merge LGTM for this ticket.

> cqlshrc: add option to always use ssl
> -
>
> Key: CASSANDRA-10458
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10458
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Matt Wringe
>Assignee: Stefan Podkowinski
>  Labels: lhf
>
> I am currently running on a system in which my cassandra cluster is only 
> accessible over tls.
> The cqlshrc file is used to specify the host, the certificates and other 
> configurations, but one option it's missing is to always connect over ssl.
> I would like to be able to call 'cqlsh' instead of always having to specify 
> 'cqlsh --ssl'
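For illustration, the option would end up being a one-line cqlshrc setting along these lines (section and key names here are an assumption based on the existing cqlshrc layout, not copied from the patch):

{code}
; cqlshrc -- illustrative sketch only
[connection]
hostname = 127.0.0.1
port = 9042
; always connect over SSL, without having to pass --ssl on the command line
ssl = true

[ssl]
certfile = ~/keys/cassandra.cert
validate = true
{code}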



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-18 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152235#comment-15152235
 ] 

Stefan Podkowinski edited comment on CASSANDRA-10508 at 2/18/16 12:22 PM:
--

CASSANDRA-11164 will apply {{filterCipherSuites}} to 
{{getSupportedCipherSuites}} before calling {{setEnabledCipherSuites}}. This 
likely makes sense so I've added {{11164-on-10508-2.2.patch}} in 11164 that 
would merge.


was (Author: spo...@gmail.com):
CASSANDRA-11164 will apply {{filterCipherSuites}} to 
{{getSupportedCipherSuites}} before calling {{setEnabledCipherSuites}}. I'm not 
sure that's really needed, but I've added {{11164-on-10508-2.2.patch}} in 11164 
that would merge.

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 10508-2.2.patch
>
>
> Currently each SSL connection will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8, which comes with solid defaults for these kinds of SSL 
> settings, and I'm wondering whether the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to affect the ciphers used is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway, and the hard-coded list would keep us from using safer protocol sets 
> introduced in new JVM releases or requested by the user. Again, we should 
> stick with the JVM defaults. Using the {{jdk.tls.client.protocols}} system 
> property will always allow restricting 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]
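To illustrate the security-properties route mentioned in the description (these are standard JSSE knobs, nothing Cassandra-specific; the values are examples only):

{code}
# restrict client-side TLS protocol versions via cassandra-env.sh, no code change needed
JVM_OPTS="$JVM_OPTS -Djdk.tls.client.protocols=TLSv1.2"

# or disable weak algorithms globally in $JAVA_HOME/jre/lib/security/java.security
jdk.tls.disabledAlgorithms=SSLv3, RC4, MD5withRSA, DH keySize < 768
{code}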



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-18 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152235#comment-15152235
 ] 

Stefan Podkowinski edited comment on CASSANDRA-10508 at 2/18/16 12:21 PM:
--

CASSANDRA-11164 will apply {{filterCipherSuites}} to 
{{getSupportedCipherSuites}} before calling {{setEnabledCipherSuites}}. I'm not 
sure that's really needed, but I've added {{11164-on-10508-2.2.patch}} in 11164 
that would merge.


was (Author: spo...@gmail.com):
CASSANDRA-11164 will apply {{filterCipherSuites}} to 
{{getSupportedCipherSuites}} before calling {{setEnabledCipherSuites}}. I'm not 
sure that's really needed, but I've added {{11164-on-10508-2.2.patch}} that 
would merge.

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 10508-2.2.patch
>
>
> Currently each SSL connection will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8, which comes with solid defaults for these kinds of SSL 
> settings, and I'm wondering whether the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to affect the ciphers used is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway, and the hard-coded list would keep us from using safer protocol sets 
> introduced in new JVM releases or requested by the user. Again, we should 
> stick with the JVM defaults. Using the {{jdk.tls.client.protocols}} system 
> property will always allow restricting 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-18 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152235#comment-15152235
 ] 

Stefan Podkowinski commented on CASSANDRA-10508:


CASSANDRA-11164 will apply {{filterCipherSuites}} to 
{{getSupportedCipherSuites}} before calling {{setEnabledCipherSuites}}. I'm not 
sure that's really needed, but I've added {{11164-on-10508-2.2.patch}} that 
would merge.
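In call-site terms this boils down to something like the following sketch (assuming {{SSLFactory.filterCipherSuites()}} keeps its {{(supported, desired)}} signature; the actual call sites in the patch may look different):

{code}
import javax.net.ssl.SSLSocket;
import org.apache.cassandra.config.EncryptionOptions;
import org.apache.cassandra.security.SSLFactory;

// Sketch of the intended call order only, not the actual patch.
final class CipherSuiteWiring
{
    static void applyCipherSuites(SSLSocket socket, EncryptionOptions options)
    {
        // filter the configured suites against what the JVM actually supports ...
        String[] filtered = SSLFactory.filterCipherSuites(socket.getSupportedCipherSuites(),
                                                          options.cipher_suites);
        // ... and only then enable exactly that filtered set on the socket
        socket.setEnabledCipherSuites(filtered);
    }
}
{code}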

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 10508-2.2.patch
>
>
> Currently each SSL connection will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8, which comes with solid defaults for these kinds of SSL 
> settings, and I'm wondering whether the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to affect the ciphers used is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway, and the hard-coded list would keep us from using safer protocol sets 
> introduced in new JVM releases or requested by the user. Again, we should 
> stick with the JVM defaults. Using the {{jdk.tls.client.protocols}} system 
> property will always allow restricting 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11164) Order and filter cipher suites correctly

2016-02-18 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-11164:
---
Attachment: 11164-on-10508-2.2.patch

> Order and filter cipher suites correctly
> 
>
> Key: CASSANDRA-11164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11164
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom Petracca
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: 11164-2.2.txt, 11164-on-10508-2.2.patch
>
>
> As pointed out in https://issues.apache.org/jira/browse/CASSANDRA-10508, 
> SSLFactory.filterCipherSuites() doesn't respect the ordering of desired 
> ciphers in cassandra.yaml.
> Also the fix that occurred for 
> https://issues.apache.org/jira/browse/CASSANDRA-3278 is incomplete and needs 
> to be applied to all locations where we create an SSLSocket so that JCE is 
> not required out of the box or with additional configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-18 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152211#comment-15152211
 ] 

Stefan Podkowinski commented on CASSANDRA-10508:


There already is a {{protocol}} option in the encryption settings that is used 
for creating the SSLContext.

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 10508-2.2.patch
>
>
> Currently each SSL connection will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8, which comes with solid defaults for these kinds of SSL 
> settings, and I'm wondering whether the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to affect the ciphers used is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway, and the hard-coded list would keep us from using safer protocol sets 
> introduced in new JVM releases or requested by the user. Again, we should 
> stick with the JVM defaults. Using the {{jdk.tls.client.protocols}} system 
> property will always allow restricting 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-18 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Attachment: 10508-2.2.patch

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 10508-2.2.patch
>
>
> Currently each SSL connection will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8, which comes with solid defaults for these kinds of SSL 
> settings, and I'm wondering whether the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to affect the ciphers used is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway, and the hard-coded list would keep us from using safer protocol sets 
> introduced in new JVM releases or requested by the user. Again, we should 
> stick with the JVM defaults. Using the {{jdk.tls.client.protocols}} system 
> property will always allow restricting 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10956) Enable authentication of native protocol users via client certificates

2016-02-17 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150534#comment-15150534
 ] 

Stefan Podkowinski commented on CASSANDRA-10956:


This looks really useful, thanks! I've added some comments 
[here|https://github.com/spodkowinski/cassandra/commit/a8c95b62bfe0b0e3b692ec5c175c045cdcedb860]
 while looking at the code.  

However, I'm not sure whether the user should be automatically authenticated as 
anonymous in case no client cert was presented. After all, the 
{{ICertificateAuthenticator}} was explicitly configured for the cluster, so 
this behavior is rather unexpected. We also don't do this for the 
{{PasswordAuthenticator}}, i.e. fall back to anonymous in case the user decides 
not to log in.

> Enable authentication of native protocol users via client certificates
> --
>
> Key: CASSANDRA-10956
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10956
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Samuel Klock
>Assignee: Samuel Klock
> Attachments: 10956.patch
>
>
> Currently, the native protocol only supports user authentication via SASL.  
> While this is adequate for many use cases, it may be superfluous in scenarios 
> where clients are required to present an SSL certificate to connect to the 
> server.  If the certificate presented by a client is sufficient by itself to 
> specify a user, then an additional (series of) authentication step(s) via 
> SASL merely add overhead.  Worse, for uses wherein it's desirable to obtain 
> the identity from the client's certificate, it's necessary to implement a 
> custom SASL mechanism to do so, which increases the effort required to 
> maintain both client and server and which also duplicates functionality 
> already provided via SSL/TLS.
> Cassandra should provide a means of using certificates for user 
> authentication in the native protocol without any effort above configuring 
> SSL on the client and server.  Here's a possible strategy:
> * Add a new authenticator interface that returns {{AuthenticatedUser}} 
> objects based on the certificate chain presented by the client.
> * If this interface is in use, the user is authenticated immediately after 
> the server receives the {{STARTUP}} message.  It then responds with a 
> {{READY}} message.
> * Otherwise, the existing flow of control is used (i.e., if the authenticator 
> requires authentication, then an {{AUTHENTICATE}} message is sent to the 
> client).
> One advantage of this strategy is that it is backwards-compatible with 
> existing schemes; current users of SASL/{{IAuthenticator}} are not impacted.  
> Moreover, it can function as a drop-in replacement for SASL schemes without 
> requiring code changes (or even config changes) on the client side.
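To make the proposal a bit more concrete, the new interface could look roughly like this (the name {{ICertificateAuthenticator}} appears in the review comments above; the method signature below is an assumption, not taken from 10956.patch):

{code}
import java.security.cert.Certificate;
import org.apache.cassandra.auth.AuthenticatedUser;
import org.apache.cassandra.exceptions.AuthenticationException;

// Rough sketch only.
public interface ICertificateAuthenticator
{
    /**
     * Maps the client's validated certificate chain to a user. With such an
     * authenticator configured, this would be invoked right after STARTUP and,
     * on success, the server would answer with READY instead of AUTHENTICATE.
     */
    AuthenticatedUser authenticate(Certificate[] clientCertificateChain) throws AuthenticationException;
}
{code}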



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11089) cassandra-stress should allow specifying the Java driver's protocol version to be used

2016-02-17 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150291#comment-15150291
 ] 

Stefan Podkowinski commented on CASSANDRA-11089:


Using the 3.x stress tool with a 2.2 cluster already works, as both 
versions support protocol v4. 

> cassandra-stress should allow specifying the Java driver's protocol version 
> to be used
> --
>
> Key: CASSANDRA-11089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11089
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>  Labels: stress
> Fix For: 3.x
>
>
> It would be useful to use the *cassandra-stress* that comes with C* 3.x 
> against a C* 2.x cluster. In order for that to work, we should allow 
> specifying the Java driver's protocol version to be used for the connection.
> See also 
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L118-118
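For reference, the Java driver already exposes the knob such an option would have to pass through; below is a minimal sketch of the driver call (how the value would be wired into {{JavaDriverClient}} is left out and would be part of the change):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;
import com.datastax.driver.core.Session;

// Minimal sketch: pin the native protocol version so a 3.x stress build can talk
// to an older cluster.
public final class PinnedProtocolExample
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder()
                                      .addContactPoint("127.0.0.1")
                                      .withProtocolVersion(ProtocolVersion.V3)
                                      .build();
             Session session = cluster.connect())
        {
            System.out.println("Connected using protocol "
                               + cluster.getConfiguration().getProtocolOptions().getProtocolVersion());
        }
    }
}
{code}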



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it

2016-02-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148880#comment-15148880
 ] 

Stefan Podkowinski commented on CASSANDRA-9325:
---

Can be reproduced using, e.g., the following stress tool options:
{{./bin/cassandra-stress "write n=100k cl=ONE no-warmup" -transport 
truststore=$HOME/truststore.jks truststore-password=cassandra}}


> cassandra-stress requires keystore for SSL but provides no way to configure it
> --
>
> Key: CASSANDRA-9325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: J.B. Langston
>Assignee: Stefan Podkowinski
>  Labels: lhf, stress
> Fix For: 2.2.x
>
>
> Even though it shouldn't be required unless client certificate authentication 
> is enabled, the stress tool is looking for a keystore in the default location 
> of conf/.keystore with the default password of cassandra. There is no command 
> line option to override these defaults so you have to provide a keystore that 
> satisfies the default. It looks for conf/.keystore in the working directory, 
> so you need to create this in the directory you are running cassandra-stress 
> from. It doesn't really matter what's in the keystore; it just needs to exist 
> in the expected location and have a password of cassandra.
> Since the keystore might be required if client certificate authentication is 
> enabled, we need to add -transport parameters for keystore and 
> keystore-password.  Ideally, these should be optional and stress shouldn't 
> require the keystore unless client certificate authentication is enabled on 
> the server.
> In case it wasn't apparent, this is for Cassandra 2.1 and later's stress 
> tool.  I actually had even more problems getting Cassandra 2.0's stress tool 
> working with SSL and gave up on it.  We probably don't need to fix 2.0; we 
> can just document that it doesn't support SSL and recommend using 2.1 instead.
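A hypothetical invocation once the proposed {{-transport}} parameters exist could look like this (the {{keystore}}/{{keystore-password}} names are the proposal above, not released flags; the truststore options match the repro command above):

{code}
./bin/cassandra-stress "write n=100k cl=ONE no-warmup" -transport \
    truststore=$HOME/truststore.jks truststore-password=cassandra \
    keystore=$HOME/keystore.jks keystore-password=cassandra
{code}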



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it

2016-02-16 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski reassigned CASSANDRA-9325:
-

Assignee: Stefan Podkowinski

> cassandra-stress requires keystore for SSL but provides no way to configure it
> --
>
> Key: CASSANDRA-9325
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: J.B. Langston
>Assignee: Stefan Podkowinski
>  Labels: lhf, stress
> Fix For: 2.1.x
>
>
> Even though it shouldn't be required unless client certificate authentication 
> is enabled, the stress tool is looking for a keystore in the default location 
> of conf/.keystore with the default password of cassandra. There is no command 
> line option to override these defaults so you have to provide a keystore that 
> satisfies the default. It looks for conf/.keystore in the working directory, 
> so you need to create this in the directory you are running cassandra-stress 
> from. It doesn't really matter what's in the keystore; it just needs to exist 
> in the expected location and have a password of cassandra.
> Since the keystore might be required if client certificate authentication is 
> enabled, we need to add -transport parameters for keystore and 
> keystore-password.  Ideally, these should be optional and stress shouldn't 
> require the keystore unless client certificate authentication is enabled on 
> the server.
> In case it wasn't apparent, this is for Cassandra 2.1 and later's stress 
> tool.  I actually had even more problems getting Cassandra 2.0's stress tool 
> working with SSL and gave up on it.  We probably don't need to fix 2.0; we 
> can just document that it doesn't support SSL and recommend using 2.1 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10458) cqlshrc: add option to always use ssl

2016-02-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148780#comment-15148780
 ] 

Stefan Podkowinski commented on CASSANDRA-10458:


Do you mind reviewing this very simple patch, [~pauloricardomg]?

> cqlshrc: add option to always use ssl
> -
>
> Key: CASSANDRA-10458
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10458
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Matt Wringe
>Assignee: Stefan Podkowinski
>  Labels: lhf
>
> I am currently running on a system in which my cassandra cluster is only 
> accessible over tls.
> The cqlshrc file is used to specify the host, the certificates and other 
> configurations, but one option it's missing is to always connect over ssl.
> I would like to be able to call 'cqlsh' instead of always having to specify 
> 'cqlsh --ssl'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-10458) cqlshrc: add option to always use ssl

2016-02-16 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski reassigned CASSANDRA-10458:
--

Assignee: Stefan Podkowinski

> cqlshrc: add option to always use ssl
> -
>
> Key: CASSANDRA-10458
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10458
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Matt Wringe
>Assignee: Stefan Podkowinski
>  Labels: lhf
>
> I am currently running on a system in which my cassandra cluster is only 
> accessible over tls.
> The cqlshrc file is used to specify the host, the certificates and other 
> configurations, but one option it's missing is to always connect over ssl.
> I would like to be able to call 'cqlsh' instead of always having to specify 
> 'cqlsh --ssl'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10724) Allow option to only encrypt username/password transfer, not data

2016-02-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148668#comment-15148668
 ] 

Stefan Podkowinski commented on CASSANDRA-10724:


Username/password authentication only takes place for client-to-node 
communication at the beginning of _each_ connection, using SASL over an 
unencrypted or TLS-secured connection. In the case of TLS, all further data 
will be sent encrypted afterwards. I'm not aware of any way to downgrade the 
TLS connection to plaintext after authentication, if that's what you're 
suggesting. Can you elaborate on why you need to protect the user credentials, 
but would be fine with sending all the actual data unencrypted?
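For completeness, protecting the credentials already just means enabling client-to-node encryption in cassandra.yaml, roughly like this (abridged; see the bundled cassandra.yaml for the full set of options):

{code}
client_encryption_options:
    enabled: true
    optional: false
    keystore: conf/.keystore
    keystore_password: cassandra
{code}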

> Allow option to only encrypt username/password transfer, not data
> -
>
> Key: CASSANDRA-10724
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10724
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Thom Valley
>Priority: Minor
>
> Turning on SSL for both client->node and node->node connections is a 
> resource-intensive (expensive) operation.
> Having an option to encrypt only the username/password when it is passed (or 
> looked up) would greatly reduce the encryption/decryption overhead created by 
> turning on SSL for all traffic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9220) Hostname verification for node-to-node encryption

2016-02-16 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-9220:
--
Attachment: (was: sslhostverification-2.0.patch)

> Hostname verification for node-to-node encryption
> -
>
> Key: CASSANDRA-9220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9220
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 3.x
>
>
> This patch will introduce a new ssl server option: 
> {{require_endpoint_verification}}. 
> Setting it will enable hostname verification for inter-node SSL 
> communication. This is necessary to prevent man-in-the-middle attacks when 
> building a trust chain against a common CA. See 
> [here|https://tersesystems.com/2014/03/23/fixing-hostname-verification/] for 
> background details. 
> Clusters that solely rely on importing all node certificates into each trust 
> store (as described 
> [here|http://docs.datastax.com/en/cassandra/2.0/cassandra/security/secureSSLCertificates_t.html])
>  are not affected. 
> Clusters that use the same common CA to sign node certificates are 
> potentially affected. If the CA signing process allows other parties 
> to generate certs for different purposes, those certificates could in turn be 
> used for MITM attacks. The provided patch will allow enabling hostname 
> verification, to make sure not only that the cert is valid but also that it 
> has been created for the host that we're about to connect to.
> Corresponding dtest: [Test for 
> CASSANDRA-9220|https://github.com/riptano/cassandra-dtest/pull/237]
> Related patches from the client perspective: 
> [Java|https://datastax-oss.atlassian.net/browse/JAVA-716], 
> [Python|https://datastax-oss.atlassian.net/browse/PYTHON-296]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9220) Hostname verification for node-to-node encryption

2016-02-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148541#comment-15148541
 ] 

Stefan Podkowinski commented on CASSANDRA-9220:
---

I've now rebased and fixed the dtest, and it is now working fine for me. Please 
go ahead if you want to continue the review.

> Hostname verification for node-to-node encryption
> -
>
> Key: CASSANDRA-9220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9220
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 3.x
>
> Attachments: sslhostverification-2.0.patch
>
>
> This patch will introduce a new ssl server option: 
> {{require_endpoint_verification}}. 
> Setting it will enable hostname verification for inter-node SSL 
> communication. This is necessary to prevent man-in-the-middle attacks when 
> building a trust chain against a common CA. See 
> [here|https://tersesystems.com/2014/03/23/fixing-hostname-verification/] for 
> background details. 
> Clusters that solely rely on importing all node certificates into each trust 
> store (as described 
> [here|http://docs.datastax.com/en/cassandra/2.0/cassandra/security/secureSSLCertificates_t.html])
>  are not affected. 
> Clusters that use the same common CA to sign node certificates are 
> potentially affected. If the CA signing process allows other parties 
> to generate certs for different purposes, those certificates could in turn be 
> used for MITM attacks. The provided patch will allow enabling hostname 
> verification, to make sure not only that the cert is valid but also that it 
> has been created for the host that we're about to connect to.
> Corresponding dtest: [Test for 
> CASSANDRA-9220|https://github.com/riptano/cassandra-dtest/pull/237]
> Related patches from the client perspective: 
> [Java|https://datastax-oss.atlassian.net/browse/JAVA-716], 
> [Python|https://datastax-oss.atlassian.net/browse/PYTHON-296]
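For illustration, with {{require_endpoint_verification}} enabled the outbound socket would be configured roughly as follows before the handshake (standard JSSE API; the actual patch may wire this differently):

{code}
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;

// Sketch only: require the peer certificate to match the hostname we dialed.
final class HostnameVerificationExample
{
    static void enableHostnameVerification(SSLSocket socket)
    {
        SSLParameters params = socket.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS");
        socket.setSSLParameters(params);
    }
}
{code}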



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9220) Hostname verification for node-to-node encryption

2016-02-16 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-9220:
--
Description: 
This patch will introduce a new ssl server option: 
{{require_endpoint_verification}}. 

Setting it will enable hostname verification for inter-node SSL communication. 
This is necessary to prevent man-in-the-middle attacks when building a trust 
chain against a common CA. See 
[here|https://tersesystems.com/2014/03/23/fixing-hostname-verification/] for 
background details. 

Clusters that solely rely on importing all node certificates into each trust 
store (as described 
[here|http://docs.datastax.com/en/cassandra/2.0/cassandra/security/secureSSLCertificates_t.html])
 are not affected. 

Clusters that use the same common CA to sign node certificates are potentially 
affected. If the CA signing process allows other parties to generate 
certs for different purposes, those certificates could in turn be used for MITM 
attacks. The provided patch will allow enabling hostname verification to make 
sure not only that the cert is valid but also that it has been created for 
the host that we're about to connect to.

Corresponding dtest: [Test for 
CASSANDRA-9220|https://github.com/riptano/cassandra-dtest/pull/237]

Related patches from the client perspective: 
[Java|https://datastax-oss.atlassian.net/browse/JAVA-716], 
[Python|https://datastax-oss.atlassian.net/browse/PYTHON-296]

  was:
This patch will introduce a new ssl server option: 
{{require_endpoint_verification}}. 

Setting it will enable hostname verification for inter-node SSL communication. 
This is necessary to prevent man-in-the-middle attacks when building a trust 
chain against a common CA. See 
[here|https://tersesystems.com/2014/03/23/fixing-hostname-verification/] for 
background details. 

Clusters that solely rely on importing all node certificates into each trust 
store (as described 
[here|http://docs.datastax.com/en/cassandra/2.0/cassandra/security/secureSSLCertificates_t.html])
 are not affected. 

Clusters that use the same common CA to sign node certificates are potentially 
affected. If the CA signing process allows other parties to generate 
certs for different purposes, those certificates could in turn be used for MITM 
attacks. The provided patch will allow enabling hostname verification to make 
sure not only that the cert is valid but also that it has been created for 
the host that we're about to connect to.

Corresponding dtest: [Test for 
CASSANDRA-9220|https://github.com/riptano/cassandra-dtest/pull/237]

Github: 
2.0 -> 
[diff|https://github.com/apache/cassandra/compare/cassandra-2.0...spodkowinski:feat/sslhostverification],
 
[patch|https://github.com/apache/cassandra/compare/cassandra-2.0...spodkowinski:feat/sslhostverification.patch],
Trunk -> 
[diff|https://github.com/apache/cassandra/compare/trunk...spodkowinski:feat/sslhostverification],
 
[patch|https://github.com/apache/cassandra/compare/trunk...spodkowinski:feat/sslhostverification.patch]

Related patches from the client perspective: 
[Java|https://datastax-oss.atlassian.net/browse/JAVA-716], 
[Python|https://datastax-oss.atlassian.net/browse/PYTHON-296]


> Hostname verification for node-to-node encryption
> -
>
> Key: CASSANDRA-9220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9220
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 3.x
>
> Attachments: sslhostverification-2.0.patch
>
>
> This patch will introduce a new ssl server option: 
> {{require_endpoint_verification}}. 
> Setting it will enable hostname verification for inter-node SSL 
> communication. This is necessary to prevent man-in-the-middle attacks when 
> building a trust chain against a common CA. See 
> [here|https://tersesystems.com/2014/03/23/fixing-hostname-verification/] for 
> background details. 
> Clusters that solely rely on importing all node certificates into each trust 
> store (as described 
> [here|http://docs.datastax.com/en/cassandra/2.0/cassandra/security/secureSSLCertificates_t.html])
>  are not affected. 
> Clusters that use the same common CA to sign node certificates are 
> potentially affected. If the CA signing process allows other parties 
> to generate certs for different purposes, those certificates could in turn be 
> used for MITM attacks. The provided patch will allow enabling hostname 
> verification, to make sure not only that the cert is valid but also that it 
> has been created for the host that we're about to connect to.
> Corresponding dtest: [Test for 
> CASSANDRA-9220|https://github.com/riptano/cassandra-dtest/pull/237]
> Related patches from the client perspective: 
> [Java|https://datastax-oss.atlassian.net/browse/JAVA-716], 
> [Python|https://datastax-oss.atlassian.net/browse/PYTHON-296]

[jira] [Updated] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-16 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-10508:
---
Issue Type: Bug  (was: Improvement)

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: lhf
> Fix For: 3.x
>
>
> Currently each SSL connection will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8, which comes with solid defaults for these kinds of SSL 
> settings, and I'm wondering whether the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to affect the ciphers used is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway, and the hard-coded list would keep us from using safer protocol sets 
> introduced in new JVM releases or requested by the user. Again, we should 
> stick with the JVM defaults. Using the {{jdk.tls.client.protocols}} system 
> property will always allow restricting 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10508) Remove hard-coded SSL cipher suites and protocols

2016-02-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148517#comment-15148517
 ] 

Stefan Podkowinski commented on CASSANDRA-10508:


I've now created a new patch that uses the JVM default ciphers instead of 
the hard-coded list, as described in the ticket description, while still 
preserving the option to specify a custom list of ciphers to use. Branch can be found at 
[CASSANDRA-10508-2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...spodkowinski:CASSANDRA-10508-2.2]
 (merges up cleanly).

I'm also changing the ticket type to "Bug", as {{filterCipherSuites()}} will 
use weaker {{AES_128}} ciphers randomly (even with strong crypto extensions 
installed), because it does not preserve the sequence of 
preferred ciphers. This has been fixed and addressed in a unit test. Do you 
mind having a look, [~snazy]?
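The gist of the ordering fix, as a minimal sketch (not the committed code): iterate over the configured list so its preference order is preserved, and only drop suites the JVM does not support.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of an order-preserving filter, not the committed patch.
final class OrderPreservingCipherFilter
{
    static String[] filterCipherSuites(String[] supported, String[] desired)
    {
        Set<String> supportedSet = new HashSet<>(Arrays.asList(supported));
        List<String> kept = new ArrayList<>();
        for (String suite : desired)
            if (supportedSet.contains(suite))
                kept.add(suite);            // iterating 'desired' preserves the configured order
        return kept.toArray(new String[kept.size()]);
    }
}
{code}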

> Remove hard-coded SSL cipher suites and protocols
> -
>
> Key: CASSANDRA-10508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10508
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: lhf
> Fix For: 3.x
>
>
> Currently each SSL connection will be initialized using a hard-coded list of 
> protocols ("SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2") and cipher suites. We 
> now require Java 8, which comes with solid defaults for these kinds of SSL 
> settings, and I'm wondering whether the current behavior shouldn't be re-evaluated. 
> In my impression the way cipher suites are currently whitelisted is 
> problematic, as this will prevent the JVM from using more recent and more 
> secure suites that haven't been added to the hard-coded list. JVM updates may 
> also cause issues in case the limited number of ciphers cannot be used, e.g. 
> see CASSANDRA-6613.
> Looking at the source I've also stumbled upon a bug in the 
> {{filterCipherSuites()}} method that would return the filtered list of 
> ciphers in undetermined order where the result is passed to 
> {{setEnabledCipherSuites()}}. However, the list of ciphers will reflect the 
> order of preference 
> ([source|https://bugs.openjdk.java.net/browse/JDK-8087311]) and therefore you 
> may end up with weaker algorithms on the top. Currently it's not that 
> critical, as we only whitelist a couple of ciphers anyway. But it adds to the 
> question if it still really makes sense to work with the cipher list at all 
> in the Cassandra code base.
> Another way to affect the ciphers used is by changing the security properties. 
> This is a more versatile way to work with cipher lists instead of relying on 
> hard-coded values, see 
> [here|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#DisabledAlgorithms]
>  for details.
> The same applies to the protocols. Introduced in CASSANDRA-8265 to prevent 
> SSLv3 attacks, this is not necessary anymore as SSLv3 is now blacklisted 
> anyway, and the hard-coded list would keep us from using safer protocol sets 
> introduced in new JVM releases or requested by the user. Again, we should 
> stick with the JVM defaults. Using the {{jdk.tls.client.protocols}} system 
> property will always allow restricting 
> the set of protocols in case another emergency fix is needed. 
> You can find a patch where I ripped out the mentioned options here:
> [Diff 
> trunk|https://github.com/apache/cassandra/compare/trunk...spodkowinski:fix/ssloptions]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

