[jira] [Updated] (CASSANDRA-16126) Blog post on Cassandra Usage Report 2020

2020-09-16 Thread Sankalp Kohli (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankalp Kohli updated CASSANDRA-16126:
--
Reviewers: Nate McCall  (was: Nate McCall, Sankalp Kohli)

> Blog post on Cassandra Usage Report 2020
> 
>
> Key: CASSANDRA-16126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16126
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Blog
>Reporter: Melissa Logan
>Assignee: Melissa Logan
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Blog post on the 2020 Cassandra Usage Report.
> ML: 
> [https://lists.apache.org/thread.html/r8a91b1d63079739c8dfbdab1833c66d58326de91a659afa3fb2a41a7%40%3Cdev.cassandra.apache.org%3E]
>  
> PR: [https://github.com/apache/cassandra-website/pull/21] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14939) fix some operational holes in incremental repair

2020-05-12 Thread Sankalp Kohli (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105760#comment-17105760
 ] 

Sankalp Kohli commented on CASSANDRA-14939:
---

I think we should do this as part of 4.0-beta, as it is important to users who 
will use IR in 4.0. IR has major changes in 4.0, and we hope a lot more users will 
adopt it in 4.0!

 

cc [~jwest]

> fix some operational holes in incremental repair
> 
>
> Key: CASSANDRA-14939
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14939
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> Incremental repair has a few operational rough spots that make it more 
> difficult to fully automate and operate at scale than it should be.
> * Visibility into whether pending repair data exists for a given token range.
> * Ability to force promotion/demotion of data for completed sessions instead 
> of waiting for compaction.
> * Get the most recent repairedAt timestamp for a given token range.
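
As a rough illustration of the three gaps above, they could surface as JMX operations; the interface below is a hypothetical sketch only (every name is invented here, it is not the shape of an actual patch):

{code:java}
// Hypothetical sketch: JMX-style operations covering the three operational gaps above.
// None of these names exist in Cassandra; they only illustrate the desired capabilities.
public interface IncrementalRepairAdminMBean
{
    /** True if any sstables are still pinned to a pending repair session for the given range. */
    boolean hasPendingRepairData(String keyspace, String startToken, String endToken);

    /** Force promotion/demotion of sstables for a completed session instead of waiting for compaction. */
    void finalizeSession(String sessionId, boolean promote);

    /** Most recent repairedAt timestamp (millis) covering the given range, or 0 if never repaired. */
    long mostRecentRepairedAt(String keyspace, String startToken, String endToken);
}
{code}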



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12197) Integrate top threads command in nodetool

2019-11-15 Thread Sankalp Kohli (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975415#comment-16975415
 ] 

Sankalp Kohli commented on CASSANDRA-12197:
---

Should we not expose this via virtual tables rather than nodetool?

> Integrate top threads command in nodetool
> -
>
> Key: CASSANDRA-12197
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12197
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/nodetool
>Reporter: J.B. Langston
>Assignee: Ekaterina Dimitrova
>Priority: Low
>
> SJK (https://github.com/aragozin/jvm-tools) has a command called ttop that 
> displays the top threads within the JVM, sorted either by CPU utilization or 
> heap allocation rate. When diagnosing garbage collection or high cpu 
> utilization, this is very helpful information. It would be great if users 
> could get this directly from nodetool without having to download something 
> else. SJK is Apache 2.0 licensed, so it might be possible to leverage its code.
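
For reference, the kind of per-thread CPU ranking ttop produces can be approximated with the standard ThreadMXBean; the following is a minimal, self-contained sketch (plain JDK, not the SJK implementation and not a proposed nodetool command):

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Comparator;

public class TopThreads
{
    public static void main(String[] args)
    {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Comparator<Long> byCpu = Comparator.comparingLong(bean::getThreadCpuTime);

        // Rank live threads by accumulated CPU time and print the top ten.
        Arrays.stream(bean.getAllThreadIds())
              .boxed()
              .filter(id -> bean.getThreadInfo(id) != null) // skip threads that have just died
              .sorted(byCpu.reversed())
              .limit(10)
              .forEach(id -> System.out.printf("%-50s %8d ms%n",
                      bean.getThreadInfo(id).getThreadName(),
                      bean.getThreadCpuTime(id) / 1_000_000));
    }
}
{code}

Heap allocation rate, the other ttop sort key, needs the com.sun.management extensions and is left out of this sketch.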



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15351) Allow configuring timeouts on the per-request basis

2019-10-10 Thread Sankalp Kohli (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948838#comment-16948838
 ] 

Sankalp Kohli commented on CASSANDRA-15351:
---

Dupe of https://issues.apache.org/jira/browse/CASSANDRA-2848 ??

> Allow configuring timeouts on the per-request basis
> ---
>
> Key: CASSANDRA-15351
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15351
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Messaging/Client
>Reporter: Alex Petrov
>Priority: Normal
>
> Some queries need to be run with a higher timeout value, which should be 
> possible without allowing _all_ requests to be above this value.
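
To make the intent concrete, a minimal sketch of how a per-request override could be resolved without raising the limit for every request (all names here are hypothetical, this is not a proposed API):

{code:java}
import java.util.OptionalLong;

public final class RequestTimeouts
{
    private RequestTimeouts() {}

    /**
     * Resolve the timeout to enforce for a single request: use the per-request override
     * when present, otherwise the yaml default, but never exceed a hard cap so the
     * cluster-wide limit still holds for everything else. Values are in milliseconds.
     */
    public static long effectiveTimeoutMillis(OptionalLong perRequestOverride,
                                              long defaultTimeoutMillis,
                                              long hardCapMillis)
    {
        return Math.min(perRequestOverride.orElse(defaultTimeoutMillis), hardCapMillis);
    }
}
{code}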



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14723) Increase write availability during range movements

2019-07-26 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894032#comment-16894032
 ] 

sankalp kohli commented on CASSANDRA-14723:
---

Another extension of this idea is to use IR to reduce the window in which we 
need this. Here is how it would work, and I think it deserves its own Jira:
 # Don't bump the consistency level or do anything special when a new machine is 
joining as part of an expansion.
 # When the new machine has streamed all the data, bump the consistency 
level to require one more ack, or use this idea. 
 # Run IR on all nodes including the new one (a majority needs to be up).
 # The new node takes over membership. 

Assuming IR is faster than bootstrap, we will reduce the window in which we 
will need the extra acks. 

This idea is only possible when IR is running as part of the database, is run 
regularly, and is not backed up. 

> Increase write availability during range movements
> --
>
> Key: CASSANDRA-14723
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14723
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.x
>
>
> If pending ranges were paired with their ‘natural’ replicas, then instead of 
> requiring QUORUM+#pending we could require QUORUM, but count both 
> owners of the pending range as a combined 1, where they are only counted if 
> both return success.
> As an example, in a cluster with RF=3, a pending range movement would 
> currently require 3 nodes to respond, so must include at least one of the 
> pending nodes.  Under the proposed scheme, the 2 non-participating nodes 
> could respond as normal to reach a QUORUM, not requiring either of the 
> pending nodes to respond.
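
A sketch of the counting rule described above, assuming each pending replica is paired with one natural replica (all types and names are illustrative, not the coordinator's actual response handler):

{code:java}
import java.util.Map;
import java.util.Set;

public final class PendingPairQuorum
{
    private PendingPairQuorum() {}

    /**
     * Count acks toward QUORUM under the proposed scheme: a natural replica with a paired
     * pending replica contributes 1 only if BOTH acked; an unpaired natural replica counts
     * as usual. With RF=3 and one pending movement, the two unpaired replicas can still
     * form a quorum of 2 on their own.
     */
    public static boolean quorumReached(Set<String> ackedReplicas,
                                        Set<String> naturalReplicas,
                                        Map<String, String> naturalToPending, // natural -> paired pending, if any
                                        int quorum)
    {
        int acks = 0;
        for (String natural : naturalReplicas)
        {
            String pending = naturalToPending.get(natural);
            boolean counted = pending == null
                            ? ackedReplicas.contains(natural)
                            : ackedReplicas.contains(natural) && ackedReplicas.contains(pending);
            if (counted)
                acks++;
        }
        return acks >= quorum;
    }
}
{code}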



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630772#comment-16630772
 ] 

sankalp kohli commented on CASSANDRA-12126:
---

I agree with [~jjordan] that this is a correct response. 

Also, in the future, please open a new Jira if it is a different issue. 

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS reads. Here is how it can happen with RF=3:
> 1) You issue a CAS write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as accepted 
> but not committed, and B and C do not. 
> 2) Issue a CAS read and it goes only to B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS write that involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen again 
> and was never seen before. 
> Lamport's "Paxos Made Simple" paper discusses this issue in section 2.3: how 
> learners can find out whether a majority of the acceptors have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors (but not a majority) have something in 
> flight, we have no way of knowing whether it was accepted by a majority of acceptors. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority agree that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will cause the write from step 1 to never become 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next serial 
> read or never see it, which is what we want. 
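
A heavily hedged sketch of how the proposed fix to step 2 could look on the coordinator's serial-read path (every type and name below is invented for illustration; this is not Cassandra's Paxos code): if a quorum of promises carries no in-flight accepted value, propose an empty update at the current ballot before answering, so the orphaned accept from step 1 can never be completed later.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

public final class SerialReadSketch
{
    /** Minimal view of a prepare (promise) response; hypothetical type. */
    public record Promise(UUID ballot, byte[] acceptedValue) {}

    /** Marker for "propose an empty update at this ballot". */
    private static final byte[] EMPTY_UPDATE = new byte[0];

    private SerialReadSketch() {}

    /**
     * If any promise in the quorum carries an accepted-but-uncommitted value, finish that
     * proposal (today's behaviour, step 3). Otherwise return EMPTY_UPDATE so the serial read
     * seals the round and the write from step 1 can never resurface (the proposed fix to step 2).
     */
    public static byte[] valueToPropose(List<Promise> quorumPromises)
    {
        return quorumPromises.stream()
                             .filter(p -> p.acceptedValue() != null)
                             .max(Comparator.comparing(Promise::ballot))
                             .map(Promise::acceptedValue)
                             .orElse(EMPTY_UPDATE);
    }
}
{code}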



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-26 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629689#comment-16629689
 ] 

sankalp kohli commented on CASSANDRA-12126:
---

Why is the end result not correct? The second and third operations did not succeed 
because the first one did not finish? Can you combine the example with the earlier 
comment, please?

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS reads. Here is how it can happen with RF=3:
> 1) You issue a CAS write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as accepted 
> but not committed, and B and C do not. 
> 2) Issue a CAS read and it goes only to B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS write that involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen again 
> and was never seen before. 
> Lamport's "Paxos Made Simple" paper discusses this issue in section 2.3: how 
> learners can find out whether a majority of the acceptors have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors (but not a majority) have something in 
> flight, we have no way of knowing whether it was accepted by a majority of acceptors. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority agree that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will cause the write from step 1 to never become 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next serial 
> read or never see it, which is what we want. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-08-20 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586492#comment-16586492
 ] 

sankalp kohli commented on CASSANDRA-14346:
---

As per the dev mailing list, Reaper is also being considered for this, which is 
great news. Let's see how we can get the best out of these implementations. 

[~michaelsembwever] any timeline on when we can expect a patch for it? 

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focused on getting it production-ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-08-14 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580734#comment-16580734
 ] 

sankalp kohli commented on CASSANDRA-14346:
---

The 4.0 freeze is 2-3 weeks away; I am not sure this can make it into 4.0. We can 
always start the review, but I don't think it can merge by September 1st.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, 
> CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focused on getting it production-ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-10023) Emit a metric for number of local read and write calls

2018-07-20 Thread sankalp kohli (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-10023:
--
Reviewer:   (was: sankalp kohli)

> Emit a metric for number of local read and write calls
> --
>
> Key: CASSANDRA-10023
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10023
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Damien Stevenson
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: 10023-trunk-dtests.txt, 10023-trunk.txt, 
> CASSANDRA-10023.patch
>
>
> Many C* drivers have a feature to be replica-aware and choose a coordinator 
> that is a replica. We should add a metric which tells us whether all calls 
> to the coordinator are replica-aware.
> We have seen issues where clients think they are replica-aware when they 
> forget to add the routing key at various places in the code. 
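
One way such a metric could be wired up, sketched with the Dropwizard metrics library Cassandra already uses; the metric names and the caller-supplied replica check are invented for illustration:

{code:java}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class CoordinatorLocalityMetrics
{
    private final Counter replicaAwareRequests;
    private final Counter nonReplicaAwareRequests;

    public CoordinatorLocalityMetrics(MetricRegistry registry)
    {
        // Metric names are illustrative only.
        this.replicaAwareRequests = registry.counter("Coordinator.ReplicaAwareRequests");
        this.nonReplicaAwareRequests = registry.counter("Coordinator.NonReplicaAwareRequests");
    }

    /** Call once per coordinated request; the caller decides whether this node is a replica for the key. */
    public void record(boolean coordinatorIsReplicaForKey)
    {
        if (coordinatorIsReplicaForKey)
            replicaAwareRequests.inc();
        else
            nonReplicaAwareRequests.inc();
    }
}
{code}

A dashboard comparing the two counters would show immediately when a supposedly replica-aware client is missing its routing key.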



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14532) Partition level deletions past GCGS are not propagated/merged on read

2018-06-20 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518337#comment-16518337
 ] 

sankalp kohli commented on CASSANDRA-14532:
---

>From the response from the dev list: The solution here is to start using 
>incremental repair and use Marcus' patch to only drop tombstones from repaired 
>data. This way, if incremental repair is not run within gc grace, it will not 
>cause this issue. 

> Partition level deletions past GCGS are not propagated/merged on read
> -
>
> Key: CASSANDRA-14532
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14532
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
>
> So as [~jay.zhuang] mentioned on the mailing list here, it appears that 
> partition deletions that have passed GCGS are not propagated/merged properly 
> on read, and also not repaired via read repair.
> Steps to reproduce:
> {code}
> create keyspace test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> create table test.test (id int PRIMARY KEY , data text) WITH gc_grace_seconds 
> = 10;
> CONSISTENCY ALL;
> INSERT INTO test.test (id, data) values (1, 'test');
> ccm node2 stop
> CONSISTENCY QUORUM;
> DELETE from test.test where id = 1; // wait 10 seconds so HH doesn't 
> propagate tombstone when starting node2
> select * from test.test where id = 1 ;
>  id | data
> +--
> (0 rows)
> ccm node2 start
> CONSISTENCY ALL;
> select * from test.test where id = 1 ;
>  id | data
> +--
>   1 | test
> alter table test.test WITH gc_grace_seconds = 10; // GC
> select * from test.test where id = 1 ;
>  id | data
> +--
> (0 rows)
> {code}
> We've also found a seemingly related issue in compaction: when trying to 
> compact an SSTable which contains the partition deletion post-GCGS, the 
> partition deletion won't be removed via compaction. Likely the same code is 
> causing both bugs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14499) node-level disk quota

2018-06-11 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508939#comment-16508939
 ] 

sankalp kohli commented on CASSANDRA-14499:
---

Why not add nodetool stop insert/delete/select and have the sidecar call these 
when required?

> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it from taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  
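
To make the precedence rule concrete, a small sketch (the names are illustrative and not the proposed yaml keys): the absolute value wins when both settings are present, and gossip stays disabled while usage is at or above the resulting quota.

{code:java}
public final class DiskQuota
{
    private DiskQuota() {}

    /**
     * Resolve the usable-space quota in bytes. A value <= 0 means "not set".
     * When both the absolute value and the percentage are set, the absolute value wins.
     * Returns Long.MAX_VALUE when no quota is configured.
     */
    public static long quotaBytes(long totalDiskBytes, long absoluteQuotaBytes, double quotaPercent)
    {
        if (absoluteQuotaBytes > 0)
            return absoluteQuotaBytes;
        if (quotaPercent > 0)
            return (long) (totalDiskBytes * (quotaPercent / 100.0));
        return Long.MAX_VALUE;
    }

    /** Gossip should stay disabled while the data the node stores is at or above the quota. */
    public static boolean shouldDisableGossip(long storedBytes, long quotaBytes)
    {
        return storedBytes >= quotaBytes;
    }
}
{code}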



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14499) node-level disk quota

2018-06-11 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508936#comment-16508936
 ] 

sankalp kohli commented on CASSANDRA-14499:
---

We should provide a way to recover after an application has filled up a node. Can we 
allow deletes or truncates? 

> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it from taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14381) nodetool listsnapshots is missing snapshots

2018-04-13 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437587#comment-16437587
 ] 

sankalp kohli commented on CASSANDRA-14381:
---

We can include it, as I don't see any harm in listing it. 

> nodetool listsnapshots is missing snapshots
> ---
>
> Key: CASSANDRA-14381
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14381
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: MacOs 10.12.5
> Java 1.8.0_144
> Cassandra 3.11.2 (brew install)
>Reporter: Cyril Scetbon
>Priority: Major
>
> The output of *nodetool listsnapshots* is inconsistent with the snapshots 
> created:
> {code:java}
> $ nodetool listsnapshots
> Snapshot Details:
> There are no snapshots
> $ nodetool snapshot -t tag1 --table local system
> Requested creating snapshot(s) for [system] with snapshot name [tag1] and 
> options {skipFlush=false}
> Snapshot directory: tag1
> $ nodetool snapshot -t tag2 --table local system
> Requested creating snapshot(s) for [system] with snapshot name [tag2] and 
> options {skipFlush=false}
> Snapshot directory: tag2
> $ nodetool listsnapshots
> Snapshot Details:
> There are no snapshots
> $ ls 
> /usr/local/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/snapshots/
> tag1 tag2{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14381) nodetool listsnapshots is missing snapshots

2018-04-13 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437577#comment-16437577
 ] 

sankalp kohli commented on CASSANDRA-14381:
---

I don't remember fully, but why do we need snapshots for internal tables?

> nodetool listsnapshots is missing snapshots
> ---
>
> Key: CASSANDRA-14381
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14381
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: MacOs 10.12.5
> Java 1.8.0_144
> Cassandra 3.11.2 (brew install)
>Reporter: Cyril Scetbon
>Priority: Major
>
> The output of *nodetool listsnapshots* is inconsistent with the snapshots 
> created:
> {code:java}
> $ nodetool listsnapshots
> Snapshot Details:
> There are no snapshots
> $ nodetool snapshot -t tag1 --table local system
> Requested creating snapshot(s) for [system] with snapshot name [tag1] and 
> options {skipFlush=false}
> Snapshot directory: tag1
> $ nodetool snapshot -t tag2 --table local system
> Requested creating snapshot(s) for [system] with snapshot name [tag2] and 
> options {skipFlush=false}
> Snapshot directory: tag2
> $ nodetool listsnapshots
> Snapshot Details:
> There are no snapshots
> $ ls 
> /usr/local/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/snapshots/
> tag1 tag2{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9261) Prepare and Snapshot for repairs should use higher timeouts for expiring map

2018-03-29 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419107#comment-16419107
 ] 

sankalp kohli commented on CASSANDRA-9261:
--

Did it wait for 1 hour before timing out?

> Prepare and Snapshot for repairs should use higher timeouts for expiring map
> 
>
> Key: CASSANDRA-9261
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9261
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
> Fix For: 2.1.6, 2.2.0 beta 1
>
> Attachments: 0001-make-prepare-snapshot-timeout-to-1-hour.patch, 
> trunk_9261.txt
>
>
> We wait for 1 hour after sending the prepare message, but the expiring map will 
> remove it after the RPC timeout. 
> For the snapshot during repair, we only wait for the RPC timeout. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2018-03-22 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410535#comment-16410535
 ] 

sankalp kohli commented on CASSANDRA-7622:
--

[~djoshi3] Can do it [~jjirsa]

> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Chris Lohfink
>Priority: Major
> Fix For: 4.x
>
>
> There are a variety of reasons to want virtual tables, which would be any 
> table that would be backed by an API, rather than data explicitly managed and 
> stored as sstables.
> One possible use case would be to expose JMX data through CQL as a 
> resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml 
> configuration information. So it would be an alternate approach to 
> CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not 
> presupposing.
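
To illustrate the idea only (every name below is invented; this is not the interface that was eventually built): a virtual table is just a read path whose rows are produced from an API call, such as the loaded yaml configuration, rather than read from sstables.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Purely illustrative: a settings-style "table" whose rows come from live config, not sstables. */
public class SettingsVirtualTableSketch
{
    /** Produce (name, value) rows for every configuration entry currently in effect. */
    public Map<String, String> rows(Map<String, Object> loadedYamlConfig)
    {
        Map<String, String> rows = new LinkedHashMap<>();
        loadedYamlConfig.forEach((name, value) -> rows.put(name, String.valueOf(value)));
        return rows;
    }
}
{code}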



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12941) Backport CASSANDRA-9967

2018-02-25 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376289#comment-16376289
 ] 

sankalp kohli commented on CASSANDRA-12941:
---

[~haijuncao] Looks like there has been no activity on this Jira for a long time, and the 
3.0 release line has moved well ahead. Can I close this as won't fix?

> Backport CASSANDRA-9967
> ---
>
> Key: CASSANDRA-12941
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12941
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Materialized Views, Observability
>Reporter: Haijun Cao
>Priority: Trivial
> Fix For: 3.0.x
>
> Attachments: 12941-3.0.txt
>
>
> Backport CASSANDRA-9967
> Materialized views are available for use in 3.0.x; it would be nice to check 
> view build status by issuing one CQL query against the system_distributed table, 
> hence backport CASSANDRA-9967 to 3.0.x.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14197) SSTable upgrade should be automatic

2018-02-08 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-14197:
--
Reviewer: Ariel Weisberg

> SSTable upgrade should be automatic
> ---
>
> Key: CASSANDRA-14197
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14197
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> Upgradesstables should run automatically on node upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14217) nodetool verify needs to use the correct digest file and reload sstable metadata

2018-02-08 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-14217:
--
Reviewer: Ariel Weisberg

> nodetool verify needs to use the correct digest file and reload sstable 
> metadata
> 
>
> Key: CASSANDRA-14217
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14217
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x
>
>
> {{nodetool verify}} tries to use the wrong digest file when verifying old-version 
> sstables, and it also needs to reload the sstable metadata and notify 
> compaction strategies when it mutates the repairedAt field



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14201) Add a few options to nodetool verify

2018-02-08 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-14201:
--
Reviewer: Ariel Weisberg

> Add a few options to nodetool verify
> 
>
> Key: CASSANDRA-14201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14201
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> {{nodetool verify}} currently invokes the disk failure policy when it finds a 
> corrupt sstable - we should add an option to avoid that. It should also have 
> an option to check if all sstables are the latest version to be able to run 
> {{nodetool verify}} as a pre-upgrade check



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14151) [TRUNK] TestRepair.test_dead_sync_initiator failed due to ERROR in logs "SSTableTidier ran with no existing data file for an sstable that was not new"

2018-02-08 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-14151:
--
Reviewer: Blake Eggleston

> [TRUNK] TestRepair.test_dead_sync_initiator failed due to ERROR in logs 
> "SSTableTidier ran with no existing data file for an sstable that was not new"
> --
>
> Key: CASSANDRA-14151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14151
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.11.x, 4.x
>
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log, 
> stdout-novnodes.txt
>
>
> TestRepair.test_dead_sync_initiator failed due to finding the following 
> unexpected error in the node's logs:
> {code}
> ERROR [NonPeriodicTasks:1] 2018-01-06 03:38:50,229 LogTransaction.java:347 - 
> SSTableTidier ran with no existing data file for an sstable that was not new
> {code}
> If this is "okay/expected" behavior we should change the log level to 
> something different (which will fix the test) or if it's an actual bug use 
> this JIRA to fix it. I've attached all of the logs from all 3 instances from 
> the dtest run that hit this failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14187) [DTEST] repair_tests/repair_test.py:TestRepair.simple_sequential_repair_test

2018-02-08 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-14187:
--
Reviewer: Blake Eggleston

> [DTEST] repair_tests/repair_test.py:TestRepair.simple_sequential_repair_test
> 
>
> Key: CASSANDRA-14187
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14187
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>
> Getting all rows from a node times out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14145) Detecting data resurrection during read

2018-01-03 Thread sankalp kohli (JIRA)
sankalp kohli created CASSANDRA-14145:
-

 Summary:  Detecting data resurrection during read
 Key: CASSANDRA-14145
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14145
 Project: Cassandra
  Issue Type: Improvement
Reporter: sankalp kohli
Priority: Minor


We have seen several bugs in which deleted data gets resurrected. We should try 
to see if we can detect this on the read path and possibly fix it. Here are a 
few examples which brought back data:

A replica lost an sstable on startup, which caused that replica to lose the 
tombstone but not the data. The tombstone was past gc grace, which means this 
could resurrect data. We can detect such invalid states by looking at other 
replicas. 

If we are running incremental repair, Cassandra keeps repaired and 
non-repaired data separate. Every time incremental repair runs, it 
moves data from non-repaired to repaired. Repaired data across all replicas 
should be 100% consistent. 

Here is an example of how we can detect and mitigate the issue in most cases. 
Say we have 3 machines: A, B and C. All of these machines will have data split 
between repaired and non-repaired. 
1. Machine A, due to some bug, brings back data D. This data D is in the repaired 
dataset. All other replicas will have data D and tombstone T. 
2. A read for data D comes from the application and involves replicas A and B. The 
data being read is in the repaired state. A will respond to the 
coordinator with data D, and B will send nothing, as the tombstone is past gc 
grace. This will cause a digest mismatch. 
3. This patch will only kick in when there is a digest mismatch. The coordinator 
will ask both replicas to send back all data, like we do today, but with this 
patch the replicas will also indicate whether the data they return comes from the 
repaired or non-repaired set. If the data coming from the repaired set does not match, 
we know there is something wrong. At this time, the coordinator cannot determine whether 
replica A has resurrected some data or replica B has lost some data. We can 
still log an error saying we hit an invalid state (see the sketch after the challenges below).
4. Besides the log, we can take this further and even correct the response to 
the query. After logging an invalid state, we can ask replicas A and B (and also 
C if alive) to send back all data for this read, including gcable tombstones. If any 
machine returns a tombstone which is newer than this data, we know we cannot return 
this data. This way we can avoid returning data which has been deleted. 

Some challenges with this: 
1. When data is moved from non-repaired to repaired, there could be a race 
here. We can look at which incremental repairs have promoted things on which 
replica to avoid false positives.  
2. If the third replica is down and the live replica does not have any tombstone, 
we won't be able to break the tie in deciding whether data was actually deleted 
or resurrected. 
3. If the read is for the latest data only, we won't be able to detect it, as the 
read will be served from non-repaired data. 
4. If the replica where we lose a tombstone is the last replica to compact the 
tombstone, we won't be able to decide whether data is coming back or the rest of the 
replicas have lost that data. But we will still detect that something is wrong. 
5. We won't affect 99.9% of read queries, as we only do extra work during a 
digest mismatch.
6. CL.ONE reads will not be able to detect this. 
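
As referenced in step 3 above, here is a very rough sketch of the repaired-data check on digest mismatch (all types and names are invented for illustration; this is not a patch):

{code:java}
import java.util.Arrays;
import java.util.List;

public final class RepairedDataMismatchSketch
{
    /** Minimal view of a replica's full-data response; hypothetical type. */
    public record ReplicaResponse(String replica, byte[] repairedDigest, List<byte[]> rows) {}

    private RepairedDataMismatchSketch() {}

    /**
     * Only called after a regular digest mismatch (step 3): compare digests of the
     * *repaired* portion of each response. A mismatch means one replica has resurrected
     * data or another has lost it - we cannot tell which, but we can log the invalid
     * state and, per step 4, re-query including gcable tombstones before answering.
     */
    public static boolean repairedDataConsistent(ReplicaResponse a, ReplicaResponse b)
    {
        return Arrays.equals(a.repairedDigest(), b.repairedDigest());
    }
}
{code}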



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability

2017-11-13 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250338#comment-16250338
 ] 

sankalp kohli commented on CASSANDRA-13987:
---

+1 for doing this in 3.0+ 

> Multithreaded commitlog subtly changed durability
> -
>
> Key: CASSANDRA-13987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
> Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly 
> changed the way that commitlog durability worked. Everything still gets 
> written to an mmap file. However, not everything is replayable from the 
> mmaped file after a process crash, in periodic mode.
> In brief, the reason this changed is due to the chained markers that are 
> required for the multithreaded commit log. At each msync, we wait for 
> outstanding mutations to serialize into the commitlog, and update a marker 
> before and after the commits that have accumulated since the last sync. With 
> those markers, we can safely replay that section of the commitlog. Without 
> the markers, we have no guarantee that the commits in that section were 
> successfully written, thus we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the 
> same time (see ["There Is No 
> Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have 
> data loss if none of the nodes msync the commitlog. For example, with RF=3, 
> if quorum write succeeds on two nodes (and we acknowledge the write back to 
> the client), and then the process on both nodes OOMs (say, due to reading the 
> index for a 100GB partition), the write will be lost if neither process 
> msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. 
> The reason why this data is silently lost is due to the chained markers that 
> were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 
> 'durability' due to process crash, not host crash. (Note: operators should 
> use batch mode to ensure greater durability, but batch mode in its current 
> implementation is a) borked, and b) will burn through, *very* rapidly, SSDs 
> that don't have a non-volatile write cache sitting in front.) 
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which 
> means that a node could lose up to ten seconds of data due to process crash. 
> The unfortunate thing is that the data is still available, in the mmap file, 
> but we can't replay it due to incomplete chained markers.
> ftr, I don't believe we've ever had a stated policy about commitlog 
> durability wrt process crash. Pre-2.0 we naturally piggy-backed off the 
> memory mapped file and the fact that every mutation acquired a lock and 
> wrote into the mmap buffer, and the ability to replay everything out of it 
> came for free. With CASSANDRA-3578, that was subtly changed. 
> Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust 
> the durability 
> guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit]
>  of each commit in innodb via the {{innodb_flush_log_at_trx_commit}}. I'm 
> using that idea as a loose springboard for what to do here.
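
A schematic sketch of the replay rule the chained markers impose (nothing below is the actual commitlog replayer; the types are invented): a synced section is replayable only if its closing marker made it to disk, and everything after the last completed msync is abandoned.

{code:java}
import java.util.List;

public final class CommitLogReplaySketch
{
    /** A region of the log delimited by chained markers; hypothetical type. */
    public record Section(boolean endMarkerWritten, List<byte[]> mutations) {}

    private CommitLogReplaySketch() {}

    /** Count the mutations that are safe to replay: only sections whose closing marker was synced. */
    public static long replayableMutations(List<Section> sections)
    {
        return sections.stream()
                       .filter(Section::endMarkerWritten)
                       .mapToLong(s -> s.mutations().size())
                       .sum();
    }
}
{code}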



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability

2017-11-13 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250338#comment-16250338
 ] 

sankalp kohli edited comment on CASSANDRA-13987 at 11/13/17 10:07 PM:
--

 +1 for doing this in 3.0+ 


was (Author: kohlisankalp):
+1 for doing this in 3.0+ 

> Multithreaded commitlog subtly changed durability
> -
>
> Key: CASSANDRA-13987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
> Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly 
> changed the way that commitlog durability worked. Everything still gets 
> written to an mmap file. However, not everything is replayable from the 
> mmaped file after a process crash, in periodic mode.
> In brief, the reason this changed is due to the chained markers that are 
> required for the multithreaded commit log. At each msync, we wait for 
> outstanding mutations to serialize into the commitlog, and update a marker 
> before and after the commits that have accumulated since the last sync. With 
> those markers, we can safely replay that section of the commitlog. Without 
> the markers, we have no guarantee that the commits in that section were 
> successfully written, thus we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the 
> same time (see ["There Is No 
> Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have 
> data loss if none of the nodes msync the commitlog. For example, with RF=3, 
> if quorum write succeeds on two nodes (and we acknowledge the write back to 
> the client), and then the process on both nodes OOMs (say, due to reading the 
> index for a 100GB partition), the write will be lost if neither process 
> msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. 
> The reason why this data is silently lost is due to the chained markers that 
> were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 
> 'durability' due to process crash, not host crash. (Note: operators should 
> use batch mode to ensure greater durability, but batch mode in its current 
> implementation is a) borked, and b) will burn through, *very* rapidly, SSDs 
> that don't have a non-volatile write cache sitting in front.) 
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which 
> means that a node could lose up to ten seconds of data due to process crash. 
> The unfortunate thing is that the data is still available, in the mmap file, 
> but we can't replay it due to incomplete chained markers.
> ftr, I don't believe we've ever had a stated policy about commitlog 
> durability wrt process crash. Pre-2.0 we naturally piggy-backed off the 
> memory mapped file and the fact that every mutation acquired a lock and 
> wrote into the mmap buffer, and the ability to replay everything out of it 
> came for free. With CASSANDRA-3578, that was subtly changed. 
> Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust 
> the durability 
> guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit]
>  of each commit in innodb via the {{innodb_flush_log_at_trx_commit}}. I'm 
> using that idea as a loose springboard for what to do here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2017-11-13 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249851#comment-16249851
 ] 

sankalp kohli commented on CASSANDRA-14013:
---

Which commit log mode are you using? 

> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Gregor Uhlenheuer
>
> I am posting this bug in the hope of discovering the stupid mistake I am making, 
> because I can't imagine a reasonable answer for the behavior I see right now 
> :-)
> In short, I observe data loss in a keyspace called *snapshots* after 
> restarting the Cassandra service. Say I have 1000 records in a table 
> called *snapshots.test_idx*; then after a restart the table has fewer entries or 
> is even empty.
> My kind of "mysterious" observation is that it happens only in a keyspace 
> called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not 
> every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill 
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am making :-)
> This happened to me using both Cassandra 3.9 and 3.11.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13924) Continuous/Infectious Repair

2017-11-09 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246676#comment-16246676
 ] 

sankalp kohli commented on CASSANDRA-13924:
---

For mutation tracking, will you ack back to the replicas if the hint succeeds in a 
reasonable time? 

> Continuous/Infectious Repair
> 
>
> Key: CASSANDRA-13924
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13924
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Minor
>  Labels: CommunityFeedbackRequested
>
> I've been working on a way to keep data consistent without 
> scheduled/external/manual repair, because for large datasets repair is 
> extremely expensive. The basic gist is to introduce a new kind of hint that 
> keeps just the primary key of the mutation (indicating that PK needs repair) 
> and is recorded on replicas instead of coordinators during write time. Then a 
> periodic background task can issue read repairs to just the PKs that were 
> mutated. The initial performance degradation of this approach is non trivial, 
> but I believe that I can optimize it so that we are doing very little 
> additional work (see below in the design doc for some proposed optimizations).
> My extremely rough proof of concept (uses a local table instead of 
> HintStorage, etc) so far is [in a 
> branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:continuous_repair]
>  and has a rough [design 
> document|https://github.com/jolynch/cassandra/blob/continuous_repair/doc/source/architecture/continuous_repair.rst].
>  I'm working on getting benchmarks of the various optimizations, but I 
> figured I should start this ticket before I got too deep into it.
> I believe this approach is particularly good for high read rate clusters 
> requiring consistent low latency, and for clusters that mutate a relatively 
> small proportion of their data (since you never have to read the whole 
> dataset, just what's being mutated). I view this as something that works 
> _with_ incremental repair to reduce work required because with this technique 
> we could potentially flush repaired + unrepaired sstables directly from the 
> memtable. I also see this as something that would be enabled or disabled per 
> table since it is so use case specific (e.g. some tables don't need repair at 
> all). I think this is somewhat of a hybrid approach based on incremental 
> repair, ticklers (read all partitions @ ALL), mutation based repair 
> (CASSANDRA-8911), and hinted handoff. There are lots of tradeoffs, but I 
> think it's worth talking about.
> If anyone has feedback on the idea, I'd love to chat about it. 
> [~bdeggleston], [~aweisberg] I chatted with you guys a bit about this at 
> NGCC; if you have time I'd love to continue that conversation here.
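
A compressed sketch of the mechanism described above (every name is invented; the real proof of concept is in the linked branch): the replica records only the primary key at write time, and a periodic background task later read-repairs just those keys.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public final class PkRepairHintSketch
{
    // Primary keys mutated on this replica that have not yet been confirmed consistent.
    private final Set<String> dirtyPrimaryKeys = ConcurrentHashMap.newKeySet();

    /** Called on the replica at write time: remember only the primary key, not the mutation. */
    public void recordWrite(String primaryKey)
    {
        dirtyPrimaryKeys.add(primaryKey);
    }

    /** Periodic background pass: issue a read repair for each dirty key, then forget it. */
    public void runBackgroundPass(Consumer<String> issueReadRepair)
    {
        for (String pk : dirtyPrimaryKeys)
        {
            issueReadRepair.accept(pk);
            dirtyPrimaryKeys.remove(pk);
        }
    }
}
{code}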



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13924) Continuous/Infectious Repair

2017-11-09 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246470#comment-16246470
 ] 

sankalp kohli commented on CASSANDRA-13924:
---

I like this idea but want to propose the following changes to it. 

We track in the memtable, at the partition level, which data has been replicated to all 
replicas. This will require the coordinator to update the replicas once data is 
acked by all replicas. 

We flush the memtable as separate sstables containing repaired and non-repaired 
data. Incremental repair will take care of the non-repaired data. 

Another optimization we can build on top of this is to flush only repaired data 
when we need to flush, and keep non-repaired data a little longer. This will 
make sure it gets acked by the coordinator. The coordinator can also ack back to 
replicas if hints were successfully delivered. 

> Continuous/Infectious Repair
> 
>
> Key: CASSANDRA-13924
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13924
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Minor
>  Labels: CommunityFeedbackRequested
>
> I've been working on a way to keep data consistent without 
> scheduled/external/manual repair, because for large datasets repair is 
> extremely expensive. The basic gist is to introduce a new kind of hint that 
> keeps just the primary key of the mutation (indicating that PK needs repair) 
> and is recorded on replicas instead of coordinators during write time. Then a 
> periodic background task can issue read repairs to just the PKs that were 
> mutated. The initial performance degradation of this approach is non trivial, 
> but I believe that I can optimize it so that we are doing very little 
> additional work (see below in the design doc for some proposed optimizations).
> My extremely rough proof of concept (uses a local table instead of 
> HintStorage, etc) so far is [in a 
> branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:continuous_repair]
>  and has a rough [design 
> document|https://github.com/jolynch/cassandra/blob/continuous_repair/doc/source/architecture/continuous_repair.rst].
>  I'm working on getting benchmarks of the various optimizations, but I 
> figured I should start this ticket before I got too deep into it.
> I believe this approach is particularly good for high read rate clusters 
> requiring consistent low latency, and for clusters that mutate a relatively 
> small proportion of their data (since you never have to read the whole 
> dataset, just what's being mutated). I view this as something that works 
> _with_ incremental repair to reduce work required because with this technique 
> we could potentially flush repaired + unrepaired sstables directly from the 
> memtable. I also see this as something that would be enabled or disabled per 
> table since it is so use case specific (e.g. some tables don't need repair at 
> all). I think this is somewhat of a hybrid approach based on incremental 
> repair, ticklers (read all partitions @ ALL), mutation based repair 
> (CASSANDRA-8911), and hinted handoff. There are lots of tradeoffs, but I 
> think it's worth talking about.
> If anyone has feedback on the idea, I'd love to chat about it. 
> [~bdeggleston], [~aweisberg] I chatted with you guys a bit about this at 
> NGCC; if you have time I'd love to continue that conversation here.
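
For readers skimming the thread, a minimal sketch of the core idea described 
above, with invented names (the real prototype is in the linked branch and uses 
a local table rather than this in-memory set):

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch only: each replica records just the partition keys it mutated, and a
// periodic task later read-repairs exactly those keys (e.g. by reading at ALL).
public class PrimaryKeyHintLog
{
    private final Set<String> dirtyKeys = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Called on the replica's write path (not on the coordinator).
    public void markDirty(String partitionKey)
    {
        dirtyKeys.add(partitionKey);
    }

    // The supplied callback is expected to perform a repairing read of one key.
    public void start(Consumer<String> readRepairOneKey)
    {
        scheduler.scheduleWithFixedDelay(() -> {
            for (String key : dirtyKeys)
            {
                readRepairOneKey.accept(key);
                dirtyKeys.remove(key);
            }
        }, 1, 1, TimeUnit.MINUTES);
    }
}
{code}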



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13924) Continuous/Infectious Repair

2017-11-09 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-13924:
--
Description: 
eI've been working on a way to keep data consistent without 
scheduled/external/manual repair, because for large datasets repair is 
extremely expensive. The basic gist is to introduce a new kind of hint that 
keeps just the primary key of the mutation (indicating that PK needs repair) 
and is recorded on replicas instead of coordinators during write time. Then a 
periodic background task can issue read repairs to just the PKs that were 
mutated. The initial performance degradation of this approach is non trivial, 
but I believe that I can optimize it so that we are doing very little 
additional work (see below in the design doc for some proposed optimizations).

My extremely rough proof of concept (uses a local table instead of HintStorage, 
etc) so far is [in a 
branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:continuous_repair]
 and has a rough [design 
document|https://github.com/jolynch/cassandra/blob/continuous_repair/doc/source/architecture/continuous_repair.rst].
 I'm working on getting benchmarks of the various optimizations, but I figured 
I should start this ticket before I got too deep into it.

I believe this approach is particularly good for high read rate clusters 
requiring consistent low latency, and for clusters that mutate a relatively 
small proportion of their data (since you never have to read the whole dataset, 
just what's being mutated). I view this as something that works _with_ 
incremental repair to reduce work required because with this technique we could 
potentially flush repaired + unrepaired sstables directly from the memtable. I 
also see this as something that would be enabled or disabled per table since it 
is so use case specific (e.g. some tables don't need repair at all). I think 
this is somewhat of a hybrid approach based on incremental repair, ticklers 
(read all partitions @ ALL), mutation based repair (CASSANDRA-8911), and hinted 
handoff. There are lots of tradeoffs, but I think it's worth talking about.

If anyone has feedback on the idea, I'd love to chat about it. [~bdeggleston], 
[~aweisberg] I chatted with you guys a bit about this at NGCC; if you have time 
I'd love to continue that conversation here.

  was:
I've been working on a way to keep data consistent without 
scheduled/external/manual repair, because for large datasets repair is 
extremely expensive. The basic gist is to introduce a new kind of hint that 
keeps just the primary key of the mutation (indicating that PK needs repair) 
and is recorded on replicas instead of coordinators during write time. Then a 
periodic background task can issue read repairs to just the PKs that were 
mutated. The initial performance degradation of this approach is non trivial, 
but I believe that I can optimize it so that we are doing very little 
additional work (see below in the design doc for some proposed optimizations).

My extremely rough proof of concept (uses a local table instead of HintStorage, 
etc) so far is [in a 
branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:continuous_repair]
 and has a rough [design 
document|https://github.com/jolynch/cassandra/blob/continuous_repair/doc/source/architecture/continuous_repair.rst].
 I'm working on getting benchmarks of the various optimizations, but I figured 
I should start this ticket before I got too deep into it.

I believe this approach is particularly good for high read rate clusters 
requiring consistent low latency, and for clusters that mutate a relatively 
small proportion of their data (since you never have to read the whole dataset, 
just what's being mutated). I view this as something that works _with_ 
incremental repair to reduce work required because with this technique we could 
potentially flush repaired + unrepaired sstables directly from the memtable. I 
also see this as something that would be enabled or disabled per table since it 
is so use case specific (e.g. some tables don't need repair at all). I think 
this is somewhat of a hybrid approach based on incremental repair, ticklers 
(read all partitions @ ALL), mutation based repair (CASSANDRA-8911), and hinted 
handoff. There are lots of tradeoffs, but I think it's worth talking about.

If anyone has feedback on the idea, I'd love to chat about it. [~bdeggleston], 
[~aweisberg] I chatted with you guys a bit about this at NGCC; if you have time 
I'd love to continue that conversation here.


> Continuous/Infectious Repair
> 
>
> Key: CASSANDRA-13924
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13924
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>

[jira] [Commented] (CASSANDRA-12373) 3.0 breaks CQL compatibility with super columns families

2017-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172132#comment-16172132
 ] 

sankalp kohli commented on CASSANDRA-12373:
---

Is this not too late for 3.0? 

> 3.0 breaks CQL compatibility with super columns families
> 
>
> Key: CASSANDRA-12373
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12373
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Sylvain Lebresne
>Assignee: Alex Petrov
> Fix For: 3.0.x, 3.11.x
>
>
> This is a follow-up to CASSANDRA-12335 to fix the CQL side of super column 
> compatibility.
> The details and a proposed solution can be found in the comments of 
> CASSANDRA-12335 but the crux of the issue is that super column families show 
> up differently in CQL in 3.0.x/3.x compared to 2.x, hence breaking backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-08-21 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135790#comment-16135790
 ] 

sankalp kohli commented on CASSANDRA-8457:
--

[~slebresne]  I know you and Jason chatted on IRC. Are you +1 on this? The 
streaming patch has a +1 from Ariel. 

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
> Attachments: 8457-load.tgz
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13771) Emit metrics whenever we hit tombstone failures and warn thresholds

2017-08-17 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-13771:
--
Reviewer: Marcus Eriksson

> Emit metrics whenever we hit tombstone failures and warn thresholds
> ---
>
> Key: CASSANDRA-13771
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13771
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: TIRU ADDANKI
>Assignee: TIRU ADDANKI
>Priority: Minor
> Attachments: 13771.patch
>
>
> Many times we see Cassandra timeouts, but unless we check the logs we won’t 
> be able to tell whether the timeouts are the result of too many tombstones or 
> some other issue. It would be easier if we had metrics published whenever we 
> hit the tombstone failure/warning thresholds.
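
A rough sketch of the kind of counters the ticket asks for (this is not the 
attached 13771.patch; it simply uses the Dropwizard Metrics library Cassandra 
already depends on, with invented metric names):

{code}
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

// Sketch: meters the read path could mark whenever a query crosses the
// tombstone warn or failure threshold, so operators can alert on the rates.
public class TombstoneThresholdMetrics
{
    private final Meter warnings;
    private final Meter failures;

    public TombstoneThresholdMetrics(MetricRegistry registry)
    {
        warnings = registry.meter("TombstoneWarnings");
        failures = registry.meter("TombstoneFailures");
    }

    public void markWarning() { warnings.mark(); }
    public void markFailure() { failures.mark(); }
}
{code}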



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13771) Emit metrics whenever we hit tombstone failures and warn thresholds

2017-08-17 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130907#comment-16130907
 ] 

sankalp kohli commented on CASSANDRA-13771:
---

It will give a different picture, I would say :). We also need visibility into 
these metrics. 

> Emit metrics whenever we hit tombstone failures and warn thresholds
> ---
>
> Key: CASSANDRA-13771
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13771
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: TIRU ADDANKI
>Assignee: TIRU ADDANKI
>Priority: Minor
> Attachments: 13771.patch
>
>
> Many times we see Cassandra timeouts, but unless we check the logs we won’t 
> be able to tell whether the timeouts are the result of too many tombstones or 
> some other issue. It would be easier if we had metrics published whenever we 
> hit the tombstone failure/warning thresholds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-6246) EPaxos

2017-08-15 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127782#comment-16127782
 ] 

sankalp kohli commented on CASSANDRA-6246:
--

What are you looking for with this patch? 
It would help if you could rebase it so that someone can review it. 

> EPaxos
> --
>
> Key: CASSANDRA-6246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Ellis
>Assignee: Blake Eggleston
>  Labels: messaging-service-bump-required
> Fix For: 4.x
>
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is 
> that Multi-paxos requires leader election and hence, a period of 
> unavailability when the leader dies.
> EPaxos is a Paxos variant that requires (1) less messages than multi-paxos, 
> (2) is particularly useful across multiple datacenters, and (3) allows any 
> node to act as coordinator: 
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to 
> implement it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-10726) Read repair inserts should not be blocking

2017-08-10 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-10726:
-

Assignee: Xiaolong Jiang

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Richard Low
>Assignee: Xiaolong Jiang
> Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.
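
One shape the non-blocking option could take, sketched with plain JDK futures 
rather than Cassandra's actual read-repair classes (all names here are 
illustrative):

{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: return the resolved read result whether or not the repair mutations
// were acked in time, instead of failing the whole read with a timeout.
public class NonBlockingReadRepair
{
    public static <T> T readWithRepair(T resolvedResult,
                                       CompletableFuture<Void> repairWrites,
                                       long repairWaitMillis)
    {
        try
        {
            // Best effort: give slow replicas a short grace period...
            repairWrites.get(repairWaitMillis, TimeUnit.MILLISECONDS);
        }
        catch (TimeoutException e)
        {
            // ...but never fail the read just because the repair write was slow.
        }
        catch (Exception e)
        {
            // Interrupted or failed repair writes are likewise ignored on the read path.
        }
        return resolvedResult;
    }
}
{code}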



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-10726) Read repair inserts should not be blocking

2017-08-10 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-10726:
--
Reviewer: Marcus Eriksson  (was: Blake Eggleston)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Richard Low
>Assignee: Xiaolong Jiang
> Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-10726) Read repair inserts should not be blocking

2017-08-10 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-10726:
-

Assignee: (was: Marcus Eriksson)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Richard Low
> Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-10726) Read repair inserts should not be blocking

2017-08-10 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-10726:
-

Assignee: Marcus Eriksson  (was: Xiaolong Jiang)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Richard Low
>Assignee: Marcus Eriksson
> Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable

2017-05-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018167#comment-16018167
 ] 

sankalp kohli commented on CASSANDRA-13508:
---

Your benchmark is on which version of C*? 

> Make system.paxos table compaction strategy configurable
> 
>
> Key: CASSANDRA-13508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13508
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
> Fix For: 4.0, 4.x
>
> Attachments: test11.png, test2.png
>
>
> The default compaction strategy for {{system.paxos}} table is LCS for 
> performance reason: CASSANDRA-7753. But for CAS heavily used cluster, the 
> system is busy with {{system.paxos}} compaction.
> As the data in the {{paxos}} table is TTL'ed, TWCS might be a better fit. In 
> our test, it significantly reduced the number of compactions without impacting 
> the latency too much:
> !test11.png!
> The time window for TWCS is set to 2 minutes for the test.
> Here is the p99 latency impact:
> !test2.png!
> the yellow one is LCS, the purple one is TWCS. The average p99 increases by 
> about 10%.
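
For reference, the TWCS settings behind the benchmark above roughly correspond 
to the options map below (a sketch only: the option keys follow 
TimeWindowCompactionStrategy's documented names, and how such an override would 
actually be applied to a system table is exactly what this ticket proposes to 
make possible):

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: a two-minute time window for system.paxos, expressed as a plain
// compaction-options map.
public class PaxosCompactionOptions
{
    public static Map<String, String> twcsTwoMinuteWindow()
    {
        Map<String, String> options = new HashMap<>();
        options.put("class", "TimeWindowCompactionStrategy");
        options.put("compaction_window_unit", "MINUTES");
        options.put("compaction_window_size", "2");
        return options;
    }
}
{code}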



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation

2017-05-17 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014821#comment-16014821
 ] 

sankalp kohli commented on CASSANDRA-3200:
--

I think this will help a lot if you have many replicas. Reopening to see if we 
can work on it

> Repair: compare all trees together (for a given range/cf) instead of by pair 
> in isolation
> -
>
> Key: CASSANDRA-3200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: repair
>
> Currently, repair compares merkle trees pair by pair, in isolation from any 
> other tree. What that means concretely is that if I have three nodes A, B and 
> C (RF=3) with A and B in sync, but C having some range r inconsistent with 
> both A and B (since those are consistent), we will do the following transfers 
> of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know 
> whether A or C is more up to date. However, the transfer B -> C is 
> useless provided we do A -> C, since A and B are in sync. Not doing that 
> transfer will be a 25% improvement in that case. With RF=5 and only one node 
> inconsistent with all the others, that is almost a 40% improvement, etc...
> Given that the situation of one node not being in sync while the others are is 
> probably fairly common (one node died so it is behind), this could be a fair 
> improvement over what is transferred. In the case where we use repair to 
> completely rebuild a node, this will be a dramatic improvement, because it 
> will avoid the rebuilt node getting RF times the data it should get.
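
A back-of-envelope check of the claimed savings, assuming a single node is out 
of sync with all of its (mutually consistent) peers for a range:

{code}
// Sketch: count range transfers with today's pair-wise comparison versus the
// proposed global comparison, for one node out of sync with all of its peers.
public final class RepairTransferSavings
{
    // Today: every mismatching pair streams in both directions.
    static int pairwiseTransfers(int rf) { return 2 * (rf - 1); }

    // Proposed: the out-of-sync node still streams to every peer, but only one
    // in-sync peer needs to stream to it.
    static int globalTransfers(int rf) { return (rf - 1) + 1; }

    public static void main(String[] args)
    {
        for (int rf : new int[] { 3, 5 })
        {
            int current = pairwiseTransfers(rf);
            int proposed = globalTransfers(rf);
            System.out.printf("RF=%d: %d -> %d transfers (%.0f%% fewer)%n",
                              rf, current, proposed, 100.0 * (current - proposed) / current);
        }
    }
}
{code}

This prints 4 -> 3 transfers (25% fewer) for RF=3 and 8 -> 5 (about 38% fewer, 
the "almost 40%" above) for RF=5, matching the figures in the description.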



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation

2017-05-17 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-3200:


Assignee: (was: Sylvain Lebresne)

> Repair: compare all trees together (for a given range/cf) instead of by pair 
> in isolation
> -
>
> Key: CASSANDRA-3200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
>  Labels: repair
>
> Currently, repair compares merkle trees pair by pair, in isolation from any 
> other tree. What that means concretely is that if I have three nodes A, B and 
> C (RF=3) with A and B in sync, but C having some range r inconsistent with 
> both A and B (since those are consistent), we will do the following transfers 
> of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know 
> whether A or C is more up to date. However, the transfer B -> C is 
> useless provided we do A -> C, since A and B are in sync. Not doing that 
> transfer will be a 25% improvement in that case. With RF=5 and only one node 
> inconsistent with all the others, that is almost a 40% improvement, etc...
> Given that the situation of one node not being in sync while the others are is 
> probably fairly common (one node died so it is behind), this could be a fair 
> improvement over what is transferred. In the case where we use repair to 
> completely rebuild a node, this will be a dramatic improvement, because it 
> will avoid the rebuilt node getting RF times the data it should get.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Reopened] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation

2017-05-17 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reopened CASSANDRA-3200:
--

> Repair: compare all trees together (for a given range/cf) instead of by pair 
> in isolation
> -
>
> Key: CASSANDRA-3200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Minor
>  Labels: repair
>
> Currently, repair compares merkle trees pair by pair, in isolation from any 
> other tree. What that means concretely is that if I have three nodes A, B and 
> C (RF=3) with A and B in sync, but C having some range r inconsistent with 
> both A and B (since those are consistent), we will do the following transfers 
> of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know 
> whether A or C is more up to date. However, the transfer B -> C is 
> useless provided we do A -> C, since A and B are in sync. Not doing that 
> transfer will be a 25% improvement in that case. With RF=5 and only one node 
> inconsistent with all the others, that is almost a 40% improvement, etc...
> Given that the situation of one node not being in sync while the others are is 
> probably fairly common (one node died so it is behind), this could be a fair 
> improvement over what is transferred. In the case where we use repair to 
> completely rebuild a node, this will be a dramatic improvement, because it 
> will avoid the rebuilt node getting RF times the data it should get.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout

2017-05-15 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011512#comment-16011512
 ] 

sankalp kohli commented on CASSANDRA-7447:
--

Anyone interested in working on this? 

> New sstable format with support for columnar layout
> ---
>
> Key: CASSANDRA-7447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7447
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: sankalp kohli
>  Labels: performance, storage
> Fix For: 4.x
>
> Attachments: ngcc-storage.odp, storage_format.pdf
>
>
> h2. Storage Format Proposal
> C* has come a long way over the past few years, and unfortunately our storage 
> format hasn't kept pace with the data models we are now encouraging people to 
> utilise. This ticket proposes a collection of storage primitives that can be 
> combined to serve these data models more optimally.
> It would probably help to first state the data model at the most abstract 
> level. We have a fixed three-tier structure: We have the partition key, the 
> clustering columns, and the data columns. Each have their own characteristics 
> and so require their own specialised treatment.
> I should note that these changes will necessarily be delivered in stages, and 
> that we will be making some assumptions about what the most useful features 
> to support initially will be. Any features not supported will require 
> sticking with the old format until we extend support to all C* functionality.
> h3. Partition Key
> * This really has two components: the partition, and the value. Although the 
> partition is primarily used to distribute across nodes, it can also be used 
> to optimise lookups for a given key within a node
> * Generally partitioning is by hash, and for the moment I want to focus this 
> ticket on the assumption that this is the case
> * Given this, it makes sense to optimise our storage format to permit O(1) 
> searching of a given partition. It may be possible to achieve this with 
> little overhead based on the fact we store the hashes in order and know they 
> are approximately randomly distributed, as this effectively forms an 
> immutable contiguous split-ordered list (see Shalev/Shavit, or 
> CASSANDRA-7282), so we only need to store an amount of data based on how 
> imperfectly distributed the hashes are, or at worst a single value per block.
> * This should completely obviate the need for a separate key-cache, which 
> will be relegated to supporting the old storage format only
> h3. Primary Key / Clustering Columns
> * Given we have a hierarchical data model, I propose the use of a 
> cache-oblivious trie
> * The main advantage of the trie is that it is extremely compact and 
> _supports optimally efficient merges with other tries_ so that we can support 
> more efficient reads when multiple sstables are touched
> * The trie will be preceded by a small amount of related data; the full 
> partition key, a timestamp epoch (for offset-encoding timestamps) and any 
> other partition level optimisation data, such as (potentially) a min/max 
> timestamp to abort merges earlier
> * Initially I propose to limit the trie to byte-order comparable data types 
> only (the number of which we can expand through translations of the important 
> types that are not currently)
> * Crucially the trie will also encapsulate any range tombstones, so that 
> these are merged early in the process and avoids re-iterating the same data
> * Results in true bidirectional streaming without having to read entire range 
> into memory
> h3. Values
> There are generally two approaches to storing rows of data: columnar, or 
> row-oriented. The above two data structures can be combined with a value 
> storage scheme that is based on either. However, given the current model we 
> have of reading large 64Kb blocks for any read, I am inclined to focus on 
> columnar support first, as this delivers order-of-magnitude benefits to those 
> users with the correct workload, while for most workloads our 64Kb blocks are 
> large enough to store row-oriented data in a column-oriented fashion without 
> any performance degradation (I'm happy to consign very large row support to 
> phase 2). 
> Since we will most likely target both behaviours eventually, I am currently 
> inclined to suggest that static columns, sets and maps be targeted for a 
> row-oriented release, as they don't naturally fit in a columnar layout 
> without secondary heap-blocks. This may be easier than delivering heap-blocks 
> also, as it keeps both implementations relatively clean. This is certainly 
> open to debate, and I have no doubt there will be conflicting opinions here.
> Focusing on our columnar layout, the goals are:
> * Support sparse and 

[jira] [Assigned] (CASSANDRA-7447) New sstable format with support for columnar layout

2017-05-15 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-7447:


Assignee: sankalp kohli

> New sstable format with support for columnar layout
> ---
>
> Key: CASSANDRA-7447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7447
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: sankalp kohli
>  Labels: performance, storage
> Fix For: 4.x
>
> Attachments: ngcc-storage.odp, storage_format.pdf
>
>
> h2. Storage Format Proposal
> C* has come a long way over the past few years, and unfortunately our storage 
> format hasn't kept pace with the data models we are now encouraging people to 
> utilise. This ticket proposes a collection of storage primitives that can be 
> combined to serve these data models more optimally.
> It would probably help to first state the data model at the most abstract 
> level. We have a fixed three-tier structure: We have the partition key, the 
> clustering columns, and the data columns. Each have their own characteristics 
> and so require their own specialised treatment.
> I should note that these changes will necessarily be delivered in stages, and 
> that we will be making some assumptions about what the most useful features 
> to support initially will be. Any features not supported will require 
> sticking with the old format until we extend support to all C* functionality.
> h3. Partition Key
> * This really has two components: the partition, and the value. Although the 
> partition is primarily used to distribute across nodes, it can also be used 
> to optimise lookups for a given key within a node
> * Generally partitioning is by hash, and for the moment I want to focus this 
> ticket on the assumption that this is the case
> * Given this, it makes sense to optimise our storage format to permit O(1) 
> searching of a given partition. It may be possible to achieve this with 
> little overhead based on the fact we store the hashes in order and know they 
> are approximately randomly distributed, as this effectively forms an 
> immutable contiguous split-ordered list (see Shalev/Shavit, or 
> CASSANDRA-7282), so we only need to store an amount of data based on how 
> imperfectly distributed the hashes are, or at worst a single value per block.
> * This should completely obviate the need for a separate key-cache, which 
> will be relegated to supporting the old storage format only
> h3. Primary Key / Clustering Columns
> * Given we have a hierarchical data model, I propose the use of a 
> cache-oblivious trie
> * The main advantage of the trie is that it is extremely compact and 
> _supports optimally efficient merges with other tries_ so that we can support 
> more efficient reads when multiple sstables are touched
> * The trie will be preceded by a small amount of related data; the full 
> partition key, a timestamp epoch (for offset-encoding timestamps) and any 
> other partition level optimisation data, such as (potentially) a min/max 
> timestamp to abort merges earlier
> * Initially I propose to limit the trie to byte-order comparable data types 
> only (the number of which we can expand through translations of the important 
> types that are not currently)
> * Crucially the trie will also encapsulate any range tombstones, so that 
> these are merged early in the process and avoids re-iterating the same data
> * Results in true bidirectional streaming without having to read entire range 
> into memory
> h3. Values
> There are generally two approaches to storing rows of data: columnar, or 
> row-oriented. The above two data structures can be combined with a value 
> storage scheme that is based on either. However, given the current model we 
> have of reading large 64Kb blocks for any read, I am inclined to focus on 
> columnar support first, as this delivers order-of-magnitude benefits to those 
> users with the correct workload, while for most workloads our 64Kb blocks are 
> large enough to store row-oriented data in a column-oriented fashion without 
> any performance degradation (I'm happy to consign very large row support to 
> phase 2). 
> Since we will most likely target both behaviours eventually, I am currently 
> inclined to suggest that static columns, sets and maps be targeted for a 
> row-oriented release, as they don't naturally fit in a columnar layout 
> without secondary heap-blocks. This may be easier than delivering heap-blocks 
> also, as it keeps both implementations relatively clean. This is certainly 
> open to debate, and I have no doubt there will be conflicting opinions here.
> Focusing on our columnar layout, the goals are:
> * Support sparse and dense column storage
> * Efficient compression of tombstones, 

[jira] [Commented] (CASSANDRA-4650) RangeStreamer should be smarter when picking endpoints for streaming in case of N >=3 in each DC.

2017-04-28 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989100#comment-15989100
 ] 

sankalp kohli commented on CASSANDRA-4650:
--

+1

> RangeStreamer should be smarter when picking endpoints for streaming in case 
> of N >=3 in each DC.  
> ---
>
> Key: CASSANDRA-4650
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4650
> Project: Cassandra
>  Issue Type: Improvement
>Affects Versions: 1.1.5
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
>  Labels: streaming
> Fix For: 4.x
>
> Attachments: CASSANDRA-4650_trunk.txt, photo-1.JPG
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The getRangeFetchMap method in RangeStreamer should pick unique nodes to 
> stream data from when the number of replicas in each DC is three or more. 
> When N>=3 in a DC, there are two options for streaming a range. Consider an 
> example of 4 nodes in one datacenter and a replication factor of 3. 
> If a node goes down, it needs to recover 3 ranges of data. With the current 
> code, two nodes could get selected, as it orders the nodes by proximity. 
> Ideally we want to select 3 nodes for streaming the data. We can do this 
> by selecting unique nodes for each range.  
> Advantages:
> This will increase the performance of bootstrapping a node and will also put 
> less pressure on the nodes serving the data. 
> Note: This does not apply if N < 3 in each DC, as then data is streamed from 
> only 2 nodes. 
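
A simplified sketch of the selection idea, greedy and illustrative only (the 
real change would live in RangeStreamer.getRangeFetchMap; ranges and nodes are 
plain Strings here):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: for each range to fetch, prefer a source that has not been picked
// for any other range yet, falling back to the closest candidate.
public class UniqueSourceSelector
{
    // Candidate lists are assumed to be ordered by proximity, closest first.
    public static Map<String, String> pickSources(Map<String, List<String>> candidatesByRange)
    {
        Map<String, String> sourceByRange = new HashMap<>();
        Set<String> used = new HashSet<>();
        for (Map.Entry<String, List<String>> entry : candidatesByRange.entrySet())
        {
            List<String> candidates = entry.getValue();
            String chosen = candidates.stream()
                                      .filter(node -> !used.contains(node))
                                      .findFirst()
                                      .orElse(candidates.get(0));
            used.add(chosen);
            sourceByRange.put(entry.getKey(), chosen);
        }
        return sourceByRange;
    }
}
{code}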



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-18 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972997#comment-15972997
 ] 

sankalp kohli commented on CASSANDRA-13442:
---

Regarding cost, it will help you save roughly 30% if you go from RF=6 to 4 (2 in 
each DC) and more if you do this with RF=10 to 6 (3 in each DC). 
It will be quite complex; we can spec it out to see if it is worth the time. 
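
As a rough back-of-envelope check, ignoring the unrepaired data the transient 
replicas would still hold: going from 6 full copies to 4 saves about 
(6 - 4) / 6 ≈ 33% of replicated storage, and going from 10 to 6 saves about 40%. 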

> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-17 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972055#comment-15972055
 ] 

sankalp kohli commented on CASSANDRA-13442:
---

[~tjake]  We need to make changes to bootstrap and other operations so they 
know about this. 
This will also involve changes in the resolver so it knows which replicas are 
full vs partial and makes the best use of that. 

> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-17 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972051#comment-15972051
 ] 

sankalp kohli commented on CASSANDRA-13442:
---

This will be used with multiple DCs having 2 full replicas each. With a minimum 
of 2 DCs, you will still have 4 full replicas. It can also be used with 3 full 
replicas and 2 partial ones, which works for use cases with RF=5 in each DC. 

It will be opt-in at the keyspace level and won't affect anyone who doesn't 
want to use it. 
 

> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-17 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971858#comment-15971858
 ] 

sankalp kohli commented on CASSANDRA-13442:
---

cc [~jbellis] What do you think about this idea? 

> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13391) nodetool clearsnapshot should require --all to clear all snapshots

2017-03-29 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947896#comment-15947896
 ] 

sankalp kohli commented on CASSANDRA-13391:
---

+1 on the idea. 

> nodetool clearsnapshot should require --all to clear all snapshots
> --
>
> Key: CASSANDRA-13391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13391
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>
> Deleting all snapshots by default is insanely dangerous.  It would be as if 
> rm's default were -rf /.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-11530) Remove deprecated repair method in 4.0

2017-03-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932203#comment-15932203
 ] 

sankalp kohli commented on CASSANDRA-11530:
---

This does not remove any repair options, only the deprecated methods, correct? 

> Remove deprecated repair method in 4.0
> --
>
> Key: CASSANDRA-11530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11530
> Project: Cassandra
>  Issue Type: Task
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Minor
> Fix For: 4.x
>
>
> Once we hit 4.0, we should remove all deprecated repair JMX API.
> (nodetool has been using only new API since it is introduced.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13231) org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing

2017-03-02 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-13231:
--
Status: Patch Available  (was: Open)

> org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing
> ---
>
> Key: CASSANDRA-13231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13231
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 674.diff
>
>
> The testStandardDirs(org.apache.cassandra.db.DirectoriesTest) unit test 
> always fails. This appears to be due to a commit by Yuki for CASSANDRA-10587 
> which switched the SSTable descriptor to use the canonical path.
> From one of Yuki's comments in CASSANDRA-10587:
> "I ended up fixing Descriptor object to always have canonical path as its 
> directory.
> This way we don't need to think about given directory is relative or absolute.
> In fact, right now Descriptor (and corresponding SSTable) is not considered 
> equal between Descriptor's directory being relative and absolute. (Added 
> simple unit test to DescriptorTest)."
> The issue here is that canonical path will expand out differently than even 
> absolute path. In this case /var/folders -> /private/var/folders. The unit 
> test is looking for /var/folders/... but the Descriptor expands out to 
> /private/var/folders and the unit test fails.
> Descriptor#L88 seems to be the real root cause.
>[junit] Testcase: 
> testStandardDirs(org.apache.cassandra.db.DirectoriesTest):   FAILED
> [junit] 
> expected:
>  but 
> was:
> [junit] junit.framework.AssertionFailedError: 
> expected:
>  but 
> was:
> [junit]   at 
> org.apache.cassandra.db.DirectoriesTest.testStandardDirs(DirectoriesTest.java:159)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.db.DirectoriesTest FAILED
> I'm guessing given we went to canonicalPath() on purpose the "fix" here is to 
> call .getCanonicalFile() on both expected Files generated (snapshotDir and 
> backupsDir) for the junit assert.
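
A self-contained illustration of the mismatch, and of the suggested fix of 
canonicalising the expected paths before asserting (plain JDK code, not the 
attached 674.diff; the directory names are made up):

{code}
import java.io.File;
import java.io.IOException;

// On macOS /var is a symlink to /private/var, so the canonical path differs
// from the absolute path. Canonicalising both sides of the assertion makes
// the expected and actual directories comparable again.
public class CanonicalPathDemo
{
    public static void main(String[] args) throws IOException
    {
        File dir = new File("/var/folders");
        System.out.println("absolute:  " + dir.getAbsolutePath());   // /var/folders
        System.out.println("canonical: " + dir.getCanonicalPath());  // /private/var/folders on macOS

        File expectedSnapshotDir = new File(dir, "snapshots").getCanonicalFile();
        File actualSnapshotDir = new File("/private/var/folders/snapshots").getCanonicalFile();
        System.out.println("equal after canonicalising: " + expectedSnapshotDir.equals(actualSnapshotDir));
    }
}
{code}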



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CASSANDRA-13231) org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing

2017-03-02 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-13231:
-

Assignee: Michael Kjellman

> org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing
> ---
>
> Key: CASSANDRA-13231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13231
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 674.diff
>
>
> The testStandardDirs(org.apache.cassandra.db.DirectoriesTest) unit test 
> always fails. This appears to be due to a commit by Yuki for CASSANDRA-10587 
> which switched the SSTable descriptor to use the canonical path.
> From one of Yuki's comments in CASSANDRA-10587:
> "I ended up fixing Descriptor object to always have canonical path as its 
> directory.
> This way we don't need to think about given directory is relative or absolute.
> In fact, right now Descriptor (and corresponding SSTable) is not considered 
> equal between Descriptor's directory being relative and absolute. (Added 
> simple unit test to DescriptorTest)."
> The issue here is that canonical path will expand out differently than even 
> absolute path. In this case /var/folders -> /private/var/folders. The unit 
> test is looking for /var/folders/... but the Descriptor expands out to 
> /private/var/folders and the unit test fails.
> Descriptor#L88 seems to be the real root cause.
>[junit] Testcase: 
> testStandardDirs(org.apache.cassandra.db.DirectoriesTest):   FAILED
> [junit] 
> expected:
>  but 
> was:
> [junit] junit.framework.AssertionFailedError: 
> expected:
>  but 
> was:
> [junit]   at 
> org.apache.cassandra.db.DirectoriesTest.testStandardDirs(DirectoriesTest.java:159)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.db.DirectoriesTest FAILED
> I'm guessing given we went to canonicalPath() on purpose the "fix" here is to 
> call .getCanonicalFile() on both expected Files generated (snapshotDir and 
> backupsDir) for the junit assert.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-2848) Make the Client API support passing down timeouts

2017-03-01 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890894#comment-15890894
 ] 

sankalp kohli commented on CASSANDRA-2848:
--

[~slebresne] Are you still reviewing this?

> Make the Client API support passing down timeouts
> -
>
> Key: CASSANDRA-2848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2848
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Goffinet
>Assignee: Geoffrey Yu
>Priority: Minor
> Fix For: 3.11.x
>
> Attachments: 2848-trunk.txt, 2848-trunk-v2.txt
>
>
> Having a max server RPC timeout is good for the worst case, but many 
> applications that have middleware in front of Cassandra might have tighter 
> timeout requirements. In a fail-fast environment, if my application, starting 
> at say the front-end, only has 20ms to process a request, and it must connect 
> to X services down the stack, by the time it hits Cassandra we might only have 
> 10ms. I propose we optionally provide the ability to specify the timeout on 
> each call we make.
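
For illustration only, the kind of client-facing signature the proposal 
implies; the names below are invented and are not taken from the attached 
patches:

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical per-call timeout API: the caller passes whatever budget it has
// left, and the server-side maximum still acts as the upper bound.
public interface TimedQueryClient
{
    String get(String key, long timeout, TimeUnit unit) throws TimeoutException;

    void put(String key, String value, long timeout, TimeUnit unit) throws TimeoutException;
}
{code}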



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13233) no libsigar-universal64-macosx.dylib in java.library.path

2017-02-24 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-13233:
--
Assignee: Michael Kjellman
  Status: Patch Available  (was: Open)

> no libsigar-universal64-macosx.dylib in java.library.path
> -
>
> Key: CASSANDRA-13233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13233
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 28827709.diff
>
>
> The changes introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-7838 (Resolved; Fixed; 2.2.0 
> beta 1): "Warn user when OS settings are poor / integrate sigar" are not Mac 
> friendly.
> {code}
> INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing 
> SIGAR library
> DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no 
> libsigar-universal64-macosx.dylib in java.library.path
> org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in 
> java.library.path
> at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) 
> ~[sigar-1.6.4.jar:?]
> at org.hyperic.sigar.Sigar.(Sigar.java:100) 
> [sigar-1.6.4.jar:?]
> at 
> org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:47) [main/:?]
> at 
> org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:28) 
> [main/:?]
> at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:363) [main/:?]
> at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:342) 
> [main/:?]
> at 
> org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:291) 
> [main/:?]
> at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:42) 
> [main/:?]
> at 
> org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1278) 
> [main/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:369) 
> [classes/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:356) 
> [classes/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:351) 
> [classes/:?]
> at 
> org.apache.cassandra.batchlog.BatchTest.defineSchema(BatchTest.java:59) 
> [classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_66]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_66]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_66]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>  [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>  [junit-4.6.jar:?]
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>  [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) 
> [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) 
> [junit-4.6.jar:?]
> at org.junit.runners.ParentRunner.run(ParentRunner.java:220) 
> [junit-4.6.jar:?]
> at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) 
> [junit-4.6.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
>  [ant-junit.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
>  [ant-junit.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
>  [ant-junit.jar:?]
> INFO  [main] 2016-10-18T11:20:10,350 SigarLibrary.java:57 - Could not 
> initialize SIGAR library 
> org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem;
> {code}
> There are 2 issues addressed by the attached patch:
> # Create platform-aware (Windows, Darwin, Linux) implementations of CLibrary 
> (for instance, CLibrary today assumes all platforms support posix_fadvise, 
> but it doesn't exist in the Darwin kernel). If methods are defined with the 
> "native" JNI keyword in Java, then when the class is loaded a missing symbol 
> causes our JNA check to fail, incorrectly marking all of CLibrary as 
> "disabled" (jnaAvailable = false), even though on a platform like Darwin all 
> of the native methods except posix_fadvise are supported.
> # Replace the Sigar usage for getting the current pid with calls to the 
> CLibrary/native equivalent, and fall back to Sigar on platforms like Windows 
> that don't have that support with JDK8 (and lack a CLibrary equivalent).
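A rough sketch of the platform-dispatch idea in point 1 (class and method names here are illustrative, not the attached patch):

{code}
// Illustrative sketch only; not the attached patch.
public final class PlatformNativeExample
{
    enum Platform { LINUX, DARWIN, WINDOWS, OTHER }

    static Platform current()
    {
        String os = System.getProperty("os.name").toLowerCase();
        if (os.contains("linux"))   return Platform.LINUX;
        if (os.contains("mac"))     return Platform.DARWIN;
        if (os.contains("windows")) return Platform.WINDOWS;
        return Platform.OTHER;
    }

    // posix_fadvise exists on Linux but not in the Darwin kernel, so calls are
    // routed per platform instead of assuming one symbol set everywhere.
    static boolean supportsFadvise()
    {
        return current() == Platform.LINUX;
    }
}
{code}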

[jira] [Updated] (CASSANDRA-13205) Hint related logging should include the IP address of the destination in addition to host ID

2017-02-09 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-13205:
--
Reviewer: Blake Eggleston

> Hint related logging should include the IP address of the destination in 
> addition to host ID
> 
>
> Key: CASSANDRA-13205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13205
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
>Priority: Trivial
> Fix For: 3.0.x
>
>
> After the hint rewrite in 3.0, many of the hint related logs now use hostId 
> UUIDs rather than endpoint addresses. This complicates debugging 
> unnecessarily. We should include both.
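A minimal sketch of what such a log line could look like (illustrative only; the logger call and message wording are assumptions, not the committed change):

{code}
import java.net.InetAddress;
import java.util.UUID;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: log both identifiers so operators can correlate hints
// with a concrete node without a separate host-ID lookup.
public class HintLoggingExample
{
    private static final Logger logger = LoggerFactory.getLogger(HintLoggingExample.class);

    static void logDispatch(UUID hostId, InetAddress endpoint, long hintCount)
    {
        logger.info("Dispatching {} hints to node {} ({})", hintCount, hostId, endpoint);
    }
}
{code}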



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CASSANDRA-13204) Thread Leak in OutboundTcpConnection

2017-02-09 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-13204:
-

Assignee: Jason Brown

> Thread Leak in OutboundTcpConnection
> 
>
> Key: CASSANDRA-13204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13204
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: Jason Brown
>
> We found threads leaking from OutboundTcpConnection to machines which are not 
> part of the cluster but are still in Gossip for some reason. There are two 
> issues here; this JIRA will cover the second one, which is the most important.
> 1) The first issue is that Gossip has information about machines not in the 
> ring which have been replaced out. It causes Cassandra to connect to those 
> machines, but due to internode auth it won't be able to connect to them at 
> the socket level.
> 2) The second issue is a race between creating a connection and closing a 
> connection, which is triggered by the gossip bug explained above. Let me try 
> to explain it using the code.
> In OutboundTcpConnection, we call closeSocket(true), which sets isStopped=true 
> and also puts a close sentinel into the queue to exit the thread. On the ack 
> connection, Gossip tries to send a message which calls connect(), which will 
> block for 10 seconds, the RPC timeout. The reason we block is that Cassandra 
> might not be running there or internode auth will not let it connect. During 
> these 10 seconds, if Gossip calls closeSocket, it will put the close sentinel 
> into the queue. When we return from the connect method after 10 seconds, we 
> clear the backlog queue, causing this thread to leak.
> Evidence from the heap dump of the affected machine which is leaking threads:
> 1. Only the ack connection is leaking, not the command connection, which is 
> not used by Gossip.
> 2. We see threads blocked on the backlog queue with isStopped=true and the 
> backlog queue empty. This is happening on the threads which have already 
> leaked.
> 3. A running thread was blocked on connect waiting for the timeout (10 
> seconds), and we see the backlog queue contains the close sentinel. Once 
> connect returns false, we will clear the backlog and this thread will have 
> leaked.
> Interesting bits from jstack:
> 1282 threads named "MessagingService-Outgoing-/"
> Thread which is about to leak:
> "MessagingService-Outgoing-/" 
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.Net.connect0(Native Method)
>   at sun.nio.ch.Net.connect(Net.java:454)
>   at sun.nio.ch.Net.connect(Net.java:446)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>   - locked <> (a java.lang.Object)
>   - locked <> (a java.lang.Object)
>   - locked <> (a java.lang.Object)
>   at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:137)
>   at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:119)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:381)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:217)
> Thread already leaked:
> "MessagingService-Outgoing-/"
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.cassandra.utils.CoalescingStrategies$DisabledCoalescingStrategy.coalesceInternal(CoalescingStrategies.java:482)
>   at 
> org.apache.cassandra.utils.CoalescingStrategies$CoalescingStrategy.coalesce(CoalescingStrategies.java:213)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:190)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CASSANDRA-13204) Thread Leak in OutboundTcpConnection

2017-02-09 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-13204:
-

Assignee: (was: Jason Brown)

> Thread Leak in OutboundTcpConnection
> 
>
> Key: CASSANDRA-13204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13204
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We found threads leaking from OutboundTcpConnection to machines which are not 
> part of the cluster but are still in Gossip for some reason. There are two 
> issues here; this JIRA will cover the second one, which is the most important.
> 1) The first issue is that Gossip has information about machines not in the 
> ring which have been replaced out. It causes Cassandra to connect to those 
> machines, but due to internode auth it won't be able to connect to them at 
> the socket level.
> 2) The second issue is a race between creating a connection and closing a 
> connection, which is triggered by the gossip bug explained above. Let me try 
> to explain it using the code.
> In OutboundTcpConnection, we call closeSocket(true), which sets isStopped=true 
> and also puts a close sentinel into the queue to exit the thread. On the ack 
> connection, Gossip tries to send a message which calls connect(), which will 
> block for 10 seconds, the RPC timeout. The reason we block is that Cassandra 
> might not be running there or internode auth will not let it connect. During 
> these 10 seconds, if Gossip calls closeSocket, it will put the close sentinel 
> into the queue. When we return from the connect method after 10 seconds, we 
> clear the backlog queue, causing this thread to leak.
> Evidence from the heap dump of the affected machine which is leaking threads:
> 1. Only the ack connection is leaking, not the command connection, which is 
> not used by Gossip.
> 2. We see threads blocked on the backlog queue with isStopped=true and the 
> backlog queue empty. This is happening on the threads which have already 
> leaked.
> 3. A running thread was blocked on connect waiting for the timeout (10 
> seconds), and we see the backlog queue contains the close sentinel. Once 
> connect returns false, we will clear the backlog and this thread will have 
> leaked.
> Interesting bits from jstack:
> 1282 threads named "MessagingService-Outgoing-/"
> Thread which is about to leak:
> "MessagingService-Outgoing-/" 
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.Net.connect0(Native Method)
>   at sun.nio.ch.Net.connect(Net.java:454)
>   at sun.nio.ch.Net.connect(Net.java:446)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>   - locked <> (a java.lang.Object)
>   - locked <> (a java.lang.Object)
>   - locked <> (a java.lang.Object)
>   at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:137)
>   at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:119)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:381)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:217)
> Thread already leaked:
> "MessagingService-Outgoing-/"
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.cassandra.utils.CoalescingStrategies$DisabledCoalescingStrategy.coalesceInternal(CoalescingStrategies.java:482)
>   at 
> org.apache.cassandra.utils.CoalescingStrategies$CoalescingStrategy.coalesce(CoalescingStrategies.java:213)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:190)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CASSANDRA-13204) Thread Leak in OutboundTcpConnection

2017-02-09 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-13204:
-

Assignee: Jason Brown

> Thread Leak in OutboundTcpConnection
> 
>
> Key: CASSANDRA-13204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13204
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: Jason Brown
>
> We found threads leaking from OutboundTcpConnection to machines which are not 
> part of the cluster but are still in Gossip for some reason. There are two 
> issues here; this JIRA will cover the second one, which is the most important.
> 1) The first issue is that Gossip has information about machines not in the 
> ring which have been replaced out. It causes Cassandra to connect to those 
> machines, but due to internode auth it won't be able to connect to them at 
> the socket level.
> 2) The second issue is a race between creating a connection and closing a 
> connection, which is triggered by the gossip bug explained above. Let me try 
> to explain it using the code.
> In OutboundTcpConnection, we call closeSocket(true), which sets isStopped=true 
> and also puts a close sentinel into the queue to exit the thread. On the ack 
> connection, Gossip tries to send a message which calls connect(), which will 
> block for 10 seconds, the RPC timeout. The reason we block is that Cassandra 
> might not be running there or internode auth will not let it connect. During 
> these 10 seconds, if Gossip calls closeSocket, it will put the close sentinel 
> into the queue. When we return from the connect method after 10 seconds, we 
> clear the backlog queue, causing this thread to leak.
> Evidence from the heap dump of the affected machine which is leaking threads:
> 1. Only the ack connection is leaking, not the command connection, which is 
> not used by Gossip.
> 2. We see threads blocked on the backlog queue with isStopped=true and the 
> backlog queue empty. This is happening on the threads which have already 
> leaked.
> 3. A running thread was blocked on connect waiting for the timeout (10 
> seconds), and we see the backlog queue contains the close sentinel. Once 
> connect returns false, we will clear the backlog and this thread will have 
> leaked.
> Interesting bits from jstack:
> 1282 threads named "MessagingService-Outgoing-/"
> Thread which is about to leak:
> "MessagingService-Outgoing-/" 
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.Net.connect0(Native Method)
>   at sun.nio.ch.Net.connect(Net.java:454)
>   at sun.nio.ch.Net.connect(Net.java:446)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>   - locked <> (a java.lang.Object)
>   - locked <> (a java.lang.Object)
>   - locked <> (a java.lang.Object)
>   at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:137)
>   at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:119)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:381)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:217)
> Thread already leaked:
> "MessagingService-Outgoing-/"
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.cassandra.utils.CoalescingStrategies$DisabledCoalescingStrategy.coalesceInternal(CoalescingStrategies.java:482)
>   at 
> org.apache.cassandra.utils.CoalescingStrategies$CoalescingStrategy.coalesce(CoalescingStrategies.java:213)
>   at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:190)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13204) Thread Leak in OutboundTcpConnection

2017-02-09 Thread sankalp kohli (JIRA)
sankalp kohli created CASSANDRA-13204:
-

 Summary: Thread Leak in OutboundTcpConnection
 Key: CASSANDRA-13204
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13204
 Project: Cassandra
  Issue Type: Bug
Reporter: sankalp kohli


We found threads leaking from OutboundTcpConnection to machines which are not 
part of the cluster but are still in Gossip for some reason. There are two 
issues here; this JIRA will cover the second one, which is the most important.



1) The first issue is that Gossip has information about machines not in the 
ring which have been replaced out. It causes Cassandra to connect to those 
machines, but due to internode auth it won't be able to connect to them at the 
socket level.

2) The second issue is a race between creating a connection and closing a 
connection, which is triggered by the gossip bug explained above. Let me try to 
explain it using the code.

In OutboundTcpConnection, we call closeSocket(true), which sets isStopped=true 
and also puts a close sentinel into the queue to exit the thread. On the ack 
connection, Gossip tries to send a message which calls connect(), which will 
block for 10 seconds, the RPC timeout. The reason we block is that Cassandra 
might not be running there or internode auth will not let it connect. During 
these 10 seconds, if Gossip calls closeSocket, it will put the close sentinel 
into the queue. When we return from the connect method after 10 seconds, we 
clear the backlog queue, causing this thread to leak.

Evidence from the heap dump of the affected machine which is leaking threads:
1. Only the ack connection is leaking, not the command connection, which is not 
used by Gossip.
2. We see threads blocked on the backlog queue with isStopped=true and the 
backlog queue empty. This is happening on the threads which have already leaked.
3. A running thread was blocked on connect waiting for the timeout (10 seconds), 
and we see the backlog queue contains the close sentinel. Once connect returns 
false, we will clear the backlog and this thread will have leaked.


Interesting bits from jstack:
1282 threads named "MessagingService-Outgoing-/"

Thread which is about to leak:
"MessagingService-Outgoing-/" 
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
- locked <> (a java.lang.Object)
- locked <> (a java.lang.Object)
- locked <> (a java.lang.Object)
at 
org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:137)
at 
org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:119)
at 
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:381)
at 
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:217)

Thread already leaked:
"MessagingService-Outgoing-/"
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
org.apache.cassandra.utils.CoalescingStrategies$DisabledCoalescingStrategy.coalesceInternal(CoalescingStrategies.java:482)
at 
org.apache.cassandra.utils.CoalescingStrategies$CoalescingStrategy.coalesce(CoalescingStrategies.java:213)
at 
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:190)
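A simplified model of the race described above (this is not the actual OutboundTcpConnection code, just an illustration of how clearing the backlog can discard the close sentinel and strand the thread on the queue):

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified model of the race; not the real OutboundTcpConnection.
public class CloseSentinelRaceExample
{
    static final Object CLOSE_SENTINEL = new Object();
    final LinkedBlockingQueue<Object> backlog = new LinkedBlockingQueue<>();

    // Called by Gossip: enqueue the sentinel so the connection thread exits.
    void closeSocket()
    {
        backlog.offer(CLOSE_SENTINEL);
    }

    // Connection thread: blocks on the queue and exits only on the sentinel.
    void runLoop() throws InterruptedException
    {
        while (true)
        {
            Object msg = backlog.take();   // parks here forever once the sentinel is lost
            if (msg == CLOSE_SENTINEL)
                return;                    // normal shutdown path
            if (!connect())
                backlog.clear();           // bug illustrated: a CLOSE_SENTINEL enqueued
                                           // during the 10s connect() wait is discarded
        }
    }

    boolean connect()
    {
        // Stand-in for the 10 second wait against an unreachable or auth-rejected peer.
        try { TimeUnit.SECONDS.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return false;
    }
}
{code}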




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-7544) Allow storage port to be configurable per node

2017-02-01 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849445#comment-15849445
 ] 

sankalp kohli commented on CASSANDRA-7544:
--

[~brandon.williams] I see that you were assigned as the reviewer of this JIRA 
way back in 2014. Will you be reviewing this, or should we find someone else?

> Allow storage port to be configurable per node
> --
>
> Key: CASSANDRA-7544
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7544
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sam Overton
>Assignee: Ariel Weisberg
> Fix For: 3.x
>
>
> Currently storage_port must be configured identically on all nodes in a 
> cluster and it is assumed that this is the case when connecting to a remote 
> node.
> This prevents running in any environment that requires multiple nodes to be 
> able to bind to the same network interface, such as with many automatic 
> provisioning/deployment frameworks.
> The current solutions seem to be
> * use a separate network interface for each node deployed to the same box. 
> This puts a big requirement on IP allocation at large scale.
> * allow multiple clusters to be provisioned from the same resource pool, but 
> restrict allocation to a maximum of one node per host from each cluster, 
> assuming each cluster is running on a different storage port.
> It would make operations much simpler in these kind of environments if the 
> environment provisioning the resources could assign the ports to be used when 
> bringing up a new node on shared hardware.
> The changes required would be at least the following:
> 1. configure seeds as IP:port instead of just IP
> 2. gossip the storage port as part of a node's ApplicationState
> 3. refer internally to nodes by hostID instead of IP, since there will be 
> multiple nodes with the same IP
> (1) & (2) are mostly trivial and I already have a patch for these. The bulk 
> of the work to enable this is (3), and I would structure this as a separate 
> pre-requisite patch. 
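For item (1), a minimal sketch of treating a seed entry as "host:port" with a cluster-wide default when no port is given (illustrative only, and ignoring IPv6 literals, which would need bracket handling):

{code}
import java.net.InetSocketAddress;

// Sketch only: parse "host:port" seeds, falling back to the default storage port.
// IPv6 literals would need [host]:port handling, omitted here for brevity.
public class SeedAddressExample
{
    static InetSocketAddress parseSeed(String seed, int defaultStoragePort)
    {
        int idx = seed.lastIndexOf(':');
        if (idx < 0)
            return new InetSocketAddress(seed, defaultStoragePort);
        return new InetSocketAddress(seed.substring(0, idx),
                                     Integer.parseInt(seed.substring(idx + 1)));
    }
}
{code}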



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13157) reduce overheads of tracing

2017-01-25 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-13157:
--
Summary: reduce overheads of tracing  (was: reuce overheads of tracing)

> reduce overheads of tracing
> ---
>
> Key: CASSANDRA-13157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13157
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Sam Overton
>Assignee: Sam Overton
>
> Currently we store a string description for every trace line of every request 
> that gets traced, and this adds a completely unnecessary memory and IO 
> overhead. We could just store a minimal representation of the measurements 
> and join them with the human-readable descriptions later on.
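A small sketch of that idea (illustrative only; the event names and storage shape below are assumptions): persist a compact event identifier plus its raw arguments, and expand to the human-readable description only at display time.

{code}
// Sketch only: compact trace entries joined with descriptions at display time.
public class CompactTraceExample
{
    enum TraceEvent
    {
        MERGING_MEMTABLE_AND_SSTABLES("Merging data from memtable and %d sstables"),
        READ_REPAIR_SENT("Sending read-repair to %s");

        final String template;
        TraceEvent(String template) { this.template = template; }
    }

    static class CompactEntry
    {
        final TraceEvent event;   // small ordinal persisted instead of a full string
        final Object[] args;      // raw measurements / identifiers only

        CompactEntry(TraceEvent event, Object... args)
        {
            this.event = event;
            this.args = args;
        }

        String render() { return String.format(event.template, args); }
    }

    public static void main(String[] args)
    {
        // Expanded only when the trace is read back, not on the write path.
        System.out.println(new CompactEntry(TraceEvent.MERGING_MEMTABLE_AND_SSTABLES, 3).render());
    }
}
{code}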



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-13012) Paxos regression from CASSANDRA-12716

2016-12-07 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730419#comment-15730419
 ] 

sankalp kohli commented on CASSANDRA-13012:
---

In which JIRA was this issue added... do you remember?

> Paxos regression from CASSANDRA-12716
> -
>
> Key: CASSANDRA-13012
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13012
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Minor
>
> I introduced a dumb bug when reading the Paxos state in 
> {{SystemKeyspace.loadPaxosState}} where the new condition on 
> {{proposal_version}} and {{most_recent_commit_version}} is obviously way too 
> strong, and actually entirely unnecessary.
> This is consistently breaking the 
> {{paxos_tests.TestPaxos.contention_test_many_threads}} test, so I'm not sure 
> why I didn't catch that, sorry. Thanks to [~jkni], who noticed it first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8217) Fail snapshot based repair if Merkle tree does not use the same snapshot

2016-11-29 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-8217:
-
Assignee: (was: sankalp kohli)

> Fail snapshot based repair if Merkle tree does not use the same snapshot
> 
>
> Key: CASSANDRA-8217
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8217
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: sankalp kohli
>Priority: Minor
>
> Snapshot-based repair relies on having a snapshot on each replica. If for some 
> reason a snapshot is not found or is different, we should fail the repair. 
> This will avoid streaming a lot of data around.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12918) Create hint directory in FailureDetectorTest

2016-11-17 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12918:
--
Status: Patch Available  (was: Open)

> Create hint directory in FailureDetectorTest
> 
>
> Key: CASSANDRA-12918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12918
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12918_trunk.diff
>
>
> We see the below exception while running the FailureDetectorTest
> FSReadError in build/test/cassandra/hints:181
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:70)
>   at org.apache.cassandra.hints.HintsService.(HintsService.java:88)
>   at 
> org.apache.cassandra.hints.HintsService.(HintsService.java:63)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2239)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2252)
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2156)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1744)
>   at 
> org.apache.cassandra.gms.FailureDetectorTest.testConvictAfterLeft(FailureDetectorTest.java:75)
> Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/hints:181
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
>   at java.nio.file.Files.newDirectoryStream(Files.java:457)
>   at java.nio.file.Files.list(Files.java:3451)
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:62)
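The gist of the fix is to make sure the hints directory exists before HintsService tries to list it in the test; a minimal sketch (illustrative, not the attached diff):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.junit.BeforeClass;

// Sketch only: create the directory the test expects before HintsService loads.
public class HintsDirSetupExample
{
    @BeforeClass
    public static void createHintsDirectory() throws IOException
    {
        Files.createDirectories(Paths.get("build/test/cassandra/hints"));
    }
}
{code}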



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12918) Create hint directory in FailureDetectorTest

2016-11-17 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12918:
--
Status: Open  (was: Ready to Commit)

> Create hint directory in FailureDetectorTest
> 
>
> Key: CASSANDRA-12918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12918
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12918_trunk.diff
>
>
> We see the below exception while running the FailureDetectorTest
> FSReadError in build/test/cassandra/hints:181
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:70)
>   at org.apache.cassandra.hints.HintsService.(HintsService.java:88)
>   at 
> org.apache.cassandra.hints.HintsService.(HintsService.java:63)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2239)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2252)
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2156)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1744)
>   at 
> org.apache.cassandra.gms.FailureDetectorTest.testConvictAfterLeft(FailureDetectorTest.java:75)
> Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/hints:181
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
>   at java.nio.file.Files.newDirectoryStream(Files.java:457)
>   at java.nio.file.Files.list(Files.java:3451)
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:62)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12918) Create hint directory in FailureDetectorTest

2016-11-17 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12918:
--
Status: Ready to Commit  (was: Patch Available)

> Create hint directory in FailureDetectorTest
> 
>
> Key: CASSANDRA-12918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12918
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12918_trunk.diff
>
>
> We see the below exception while running the FailureDetectorTest
> FSReadError in build/test/cassandra/hints:181
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:70)
>   at org.apache.cassandra.hints.HintsService.(HintsService.java:88)
>   at 
> org.apache.cassandra.hints.HintsService.(HintsService.java:63)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2239)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2252)
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2156)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1744)
>   at 
> org.apache.cassandra.gms.FailureDetectorTest.testConvictAfterLeft(FailureDetectorTest.java:75)
> Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/hints:181
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
>   at java.nio.file.Files.newDirectoryStream(Files.java:457)
>   at java.nio.file.Files.list(Files.java:3451)
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:62)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12918) Create hint directory in FailureDetectorTest

2016-11-16 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12918:
--
Attachment: CASSANDRA-12918_trunk.diff

> Create hint directory in FailureDetectorTest
> 
>
> Key: CASSANDRA-12918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12918
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12918_trunk.diff
>
>
> We see the below exception while running the FailureDetectorTest
> FSReadError in build/test/cassandra/hints:181
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:70)
>   at org.apache.cassandra.hints.HintsService.(HintsService.java:88)
>   at 
> org.apache.cassandra.hints.HintsService.(HintsService.java:63)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2239)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2252)
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2156)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1744)
>   at 
> org.apache.cassandra.gms.FailureDetectorTest.testConvictAfterLeft(FailureDetectorTest.java:75)
> Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/hints:181
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
>   at java.nio.file.Files.newDirectoryStream(Files.java:457)
>   at java.nio.file.Files.list(Files.java:3451)
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:62)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12918) Create hint directory in FailureDetectorTest

2016-11-16 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12918:
--
Status: Patch Available  (was: Open)

> Create hint directory in FailureDetectorTest
> 
>
> Key: CASSANDRA-12918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12918
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12918_trunk.diff
>
>
> We see the below exception while running the FailureDetectorTest
> FSReadError in build/test/cassandra/hints:181
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:70)
>   at org.apache.cassandra.hints.HintsService.(HintsService.java:88)
>   at 
> org.apache.cassandra.hints.HintsService.(HintsService.java:63)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2239)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2252)
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2156)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1744)
>   at 
> org.apache.cassandra.gms.FailureDetectorTest.testConvictAfterLeft(FailureDetectorTest.java:75)
> Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/hints:181
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
>   at java.nio.file.Files.newDirectoryStream(Files.java:457)
>   at java.nio.file.Files.list(Files.java:3451)
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:62)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-12918) Create hint directory in FailureDetectorTest

2016-11-16 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-12918:
-

Assignee: sankalp kohli

> Create hint directory in FailureDetectorTest
> 
>
> Key: CASSANDRA-12918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12918
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
>
> We see the below exception while running the FailureDetectorTest
> FSReadError in build/test/cassandra/hints:181
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:70)
>   at org.apache.cassandra.hints.HintsService.(HintsService.java:88)
>   at 
> org.apache.cassandra.hints.HintsService.(HintsService.java:63)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2239)
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2252)
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2156)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1744)
>   at 
> org.apache.cassandra.gms.FailureDetectorTest.testConvictAfterLeft(FailureDetectorTest.java:75)
> Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/hints:181
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
>   at java.nio.file.Files.newDirectoryStream(Files.java:457)
>   at java.nio.file.Files.list(Files.java:3451)
>   at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:62)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12918) Create hint directory in FailureDetectorTest

2016-11-16 Thread sankalp kohli (JIRA)
sankalp kohli created CASSANDRA-12918:
-

 Summary: Create hint directory in FailureDetectorTest
 Key: CASSANDRA-12918
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12918
 Project: Cassandra
  Issue Type: Bug
Reporter: sankalp kohli
Priority: Trivial


We see the below exception while running the FailureDetectorTest

FSReadError in build/test/cassandra/hints:181
at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:70)
at org.apache.cassandra.hints.HintsService.(HintsService.java:88)
at 
org.apache.cassandra.hints.HintsService.(HintsService.java:63)
at 
org.apache.cassandra.service.StorageService.excise(StorageService.java:2239)
at 
org.apache.cassandra.service.StorageService.excise(StorageService.java:2252)
at 
org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2156)
at 
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1744)
at 
org.apache.cassandra.gms.FailureDetectorTest.testConvictAfterLeft(FailureDetectorTest.java:75)
Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/hints:181
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at java.nio.file.Files.list(Files.java:3451)
at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:62)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12786) Fix a bug in CASSANDRA-11005(Split consisten range movement flag)

2016-10-21 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596455#comment-15596455
 ] 

sankalp kohli commented on CASSANDRA-12786:
---

Yes, please commit it. 

> Fix a bug in CASSANDRA-11005(Split consisten range movement flag)
> -
>
> Key: CASSANDRA-12786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12786
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
> Attachments: CASSANDRA-12786.txt
>
>
> I missed a place in the code where we need to split this flag for bootstrap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12384) Include info about sstable on "Compacting large row” message

2016-10-14 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576131#comment-15576131
 ] 

sankalp kohli edited comment on CASSANDRA-12384 at 10/14/16 6:37 PM:
-

Attached v2; we don't need the dtest changes.


was (Author: kohlisankalp):
If you ignore the Dtest attached...It will work. 

> Include info about sstable on "Compacting large row” message
> 
>
> Key: CASSANDRA-12384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12384
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12384_3.0.txt, CASSANDRA-12384_trunk.txt, 
> CASSANDRA_12384_v2.diff
>
>
> On a message like this one, it would be helpful to understand which sstable 
> this large row is going in
> Compacting large row abc/xyz:38956kjhawf (xyz bytes) incrementally
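A minimal sketch of the extended message (illustrative only; the exact wording and log level in the attached patches may differ):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: include the target sstable so operators can tell where
// the large partition is being written.
public class LargeRowLoggingExample
{
    private static final Logger logger = LoggerFactory.getLogger(LargeRowLoggingExample.class);

    static void logLargeRow(String keyAndCf, long bytes, String targetSSTable)
    {
        logger.info("Compacting large row {} ({} bytes) incrementally into {}",
                    keyAndCf, bytes, targetSSTable);
    }
}
{code}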



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12384) Include info about sstable on "Compacting large row” message

2016-10-14 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12384:
--
Attachment: CASSANDRA_12384_v2.diff

> Include info about sstable on "Compacting large row” message
> 
>
> Key: CASSANDRA-12384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12384
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12384_3.0.txt, CASSANDRA-12384_trunk.txt, 
> CASSANDRA_12384_v2.diff
>
>
> On a message like this one, it would be helpful to understand which sstable 
> this large row is going in
> Compacting large row abc/xyz:38956kjhawf (xyz bytes) incrementally



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12384) Include info about sstable on "Compacting large row” message

2016-10-14 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12384:
--
Attachment: (was: Dtest.txt)

> Include info about sstable on "Compacting large row” message
> 
>
> Key: CASSANDRA-12384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12384
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12384_3.0.txt, CASSANDRA-12384_trunk.txt, 
> CASSANDRA_12384_v2.diff
>
>
> On a message like this one, it would be helpful to understand which sstable 
> this large row is going in
> Compacting large row abc/xyz:38956kjhawf (xyz bytes) incrementally



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12384) Include info about sstable on "Compacting large row” message

2016-10-14 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576131#comment-15576131
 ] 

sankalp kohli commented on CASSANDRA-12384:
---

If you ignore the attached dtest, it will work. 

> Include info about sstable on "Compacting large row” message
> 
>
> Key: CASSANDRA-12384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12384
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Trivial
> Attachments: CASSANDRA-12384_3.0.txt, CASSANDRA-12384_trunk.txt, 
> Dtest.txt
>
>
> On a message like this one, it would be helpful to understand which sstable 
> this large row is going in
> Compacting large row abc/xyz:38956kjhawf (xyz bytes) incrementally



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12786) Fix a bug in CASSANDRA-11005(Split consisten range movement flag)

2016-10-13 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12786:
--
Status: Patch Available  (was: Open)

> Fix a bug in CASSANDRA-11005(Split consisten range movement flag)
> -
>
> Key: CASSANDRA-12786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12786
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
> Attachments: CASSANDRA-12786.txt
>
>
> I missed a place in the code where we need to split this flag for bootstrap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12786) Fix a bug in CASSANDRA-11005(Split consisten range movement flag)

2016-10-13 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12786:
--
Attachment: CASSANDRA-12786.txt

> Fix a bug in CASSANDRA-11005(Split consisten range movement flag)
> -
>
> Key: CASSANDRA-12786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12786
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
> Attachments: CASSANDRA-12786.txt
>
>
> I missed a place in the code where we need to split this flag for bootstrap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12786) Fix a bug in CASSANDRA-11005(Split consisten range movement flag)

2016-10-13 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12786:
--
Description: I missed a place in the code where we need to split this flag 
for bootstrap  (was: I missed a place in the code where we need to split this 
flag. Also cleaning up the consistent range movement flag. )

> Fix a bug in CASSANDRA-11005(Split consisten range movement flag)
> -
>
> Key: CASSANDRA-12786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12786
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
>
> I missed a place in the code where we need to split this flag for bootstrap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12786) Fix a bug in CASSANDRA-11005(Split consisten range movement flag)

2016-10-13 Thread sankalp kohli (JIRA)
sankalp kohli created CASSANDRA-12786:
-

 Summary: Fix a bug in CASSANDRA-11005(Split consisten range 
movement flag)
 Key: CASSANDRA-12786
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12786
 Project: Cassandra
  Issue Type: Bug
Reporter: sankalp kohli
Priority: Minor


I missed a place in the code where we need to split this flag. Also cleaning up 
the consistent range movement flag. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-12786) Fix a bug in CASSANDRA-11005(Split consisten range movement flag)

2016-10-13 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-12786:
-

Assignee: sankalp kohli

> Fix a bug in CASSANDRA-11005(Split consisten range movement flag)
> -
>
> Key: CASSANDRA-12786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12786
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
>
> I missed a place in the code where we need to split this flag. Also cleaning 
> up the consistent range movement flag. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5988) Make hint TTL customizable

2016-10-13 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572584#comment-15572584
 ] 

sankalp kohli commented on CASSANDRA-5988:
--

Thanks [~iamaleksey]. 

> Make hint TTL customizable
> --
>
> Key: CASSANDRA-5988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5988
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Oleg Kibirev
>Assignee: Vishy Kasar
>  Labels: patch
> Fix For: 1.2.12, 2.0.3
>
> Attachments: 5988.txt
>
>
> Currently time to live for stored hints is hardcoded to be gc_grace_seconds. 
> This causes problems for applications using backdated deletes as a form of 
> optimistic locking. Hints for updates made to the same data on which delete 
> was attempted can persist for days, making it impossible to determine if 
> delete succeeded by doing read(ALL) after a reasonable delay. We need a way 
> to explicitly configure hint TTL, either through schema parameter or through 
> a yaml file.
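A minimal sketch of how a configurable cap could be combined with gc_grace_seconds (illustrative only; the system property name mirrors the one discussed later in this thread and is not presented as a 3.0 API):

{code}
// Sketch only: cap the hint TTL at the smaller of gc_grace_seconds and an
// operator-supplied system property.
public class HintTtlExample
{
    static final int MAX_HINT_TTL = Integer.getInteger("cassandra.maxHintTTL", Integer.MAX_VALUE);

    static int hintTtlSeconds(int gcGraceSeconds)
    {
        return Math.min(gcGraceSeconds, MAX_HINT_TTL);
    }
}
{code}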



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5988) Make hint TTL customizable

2016-10-12 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570311#comment-15570311
 ] 

sankalp kohli commented on CASSANDRA-5988:
--

Without a hint TTL, if we replay hints older than GC grace, that can bring back 
deleted data, right? If this is not there in 3.0, it should be fixed as Major 
if not blocker. 

> Make hint TTL customizable
> --
>
> Key: CASSANDRA-5988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5988
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Oleg Kibirev
>Assignee: Vishy Kasar
>  Labels: patch
> Fix For: 1.2.12, 2.0.3
>
> Attachments: 5988.txt
>
>
> Currently time to live for stored hints is hardcoded to be gc_grace_seconds. 
> This causes problems for applications using backdated deletes as a form of 
> optimistic locking. Hints for updates made to the same data on which delete 
> was attempted can persist for days, making it impossible to determine if 
> delete succeeded by doing read(ALL) after a reasonable delay. We need a way 
> to explicitly configure hint TTL, either through schema parameter or through 
> a yaml file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5988) Make hint TTL customizable

2016-10-11 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567501#comment-15567501
 ] 

sankalp kohli commented on CASSANDRA-5988:
--

[~iamaleksey] I could not find "cassandra.maxHintTTL" in 3.0.9. With the new 
hints in 3.0, how can we change this?

> Make hint TTL customizable
> --
>
> Key: CASSANDRA-5988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5988
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Oleg Kibirev
>Assignee: Vishy Kasar
>  Labels: patch
> Fix For: 1.2.12, 2.0.3
>
> Attachments: 5988.txt
>
>
> Currently time to live for stored hints is hardcoded to be gc_grace_seconds. 
> This causes problems for applications using backdated deletes as a form of 
> optimistic locking. Hints for updates made to the same data on which delete 
> was attempted can persist for days, making it impossible to determine if 
> delete succeeded by doing read(ALL) after a reasonable delay. We need a way 
> to explicitly configure hint TTL, either through schema parameter or through 
> a yaml file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11968) More metrics on native protocol requests & responses

2016-10-10 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563751#comment-15563751
 ] 

sankalp kohli commented on CASSANDRA-11968:
---

[~snazy] Any updates here?

> More metrics on native protocol requests & responses
> 
>
> Key: CASSANDRA-11968
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11968
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.x
>
>
> Proposal to add more metrics to the native protocol:
> - number of requests per request-type
> - number of responses by response-type
> - size of request messages in bytes
> - size of response messages in bytes
> - number of in-flight requests (from request arrival to response)
> (Will provide a patch soon)
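A rough sketch of what such metrics could look like with the Metrics library Cassandra already uses (metric names here are illustrative, not the final ones):

{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;

// Illustrative sketch only; metric names are not the final ones.
public class NativeProtocolMetricsExample
{
    private final MetricRegistry registry = new MetricRegistry();

    private final Counter inFlightRequests = registry.counter("client.InFlightRequests");
    private final Histogram requestBytes   = registry.histogram("client.RequestBytes");
    private final Histogram responseBytes  = registry.histogram("client.ResponseBytes");

    void onRequest(String requestType, long sizeInBytes)
    {
        registry.counter("client.Requests." + requestType).inc();   // count per request type
        requestBytes.update(sizeInBytes);
        inFlightRequests.inc();                                      // request arrived
    }

    void onResponse(String responseType, long sizeInBytes)
    {
        registry.counter("client.Responses." + responseType).inc(); // count per response type
        responseBytes.update(sizeInBytes);
        inFlightRequests.dec();                                      // response sent
    }
}
{code}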



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12681) Reject empty options and invalid DC names in replication configuration while creating or altering a keyspace.

2016-09-29 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534159#comment-15534159
 ] 

sankalp kohli commented on CASSANDRA-12681:
---

What API is it breaking? It protects users from accidental outages, right?

> Reject empty options and invalid DC names in replication configuration while 
> creating or altering a keyspace.
> -
>
> Key: CASSANDRA-12681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12681
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Nachiket Patil
>Assignee: Nachiket Patil
>Priority: Minor
> Fix For: 3.0.10, 3.10
>
> Attachments: trunkpatch.diff, v3.0patch.diff
>
>
> Add some restrictions around create / alter keyspace with 
> NetworkTopologyStrategy:
> 1. Do not accept an empty replication configuration (no DC options after the 
> class). Cassandra checks that SimpleStrategy has a replication_factor option 
> but does not check that at least one DC is present in the options for 
> NetworkTopologyStrategy.
> 2. Cassandra accepts any random string as a DC name replication option for 
> NetworkTopologyStrategy while creating or altering keyspaces. Add a 
> restriction that the options specified are valid datacenter names. Using an 
> incorrect value or making a simple typo in the DC name can cause an outage in 
> a production environment.
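A minimal sketch of the two checks (illustrative only, not the attached patches; it assumes the 'class' entry has already been stripped from the options map):

{code}
import java.util.Map;
import java.util.Set;

// Sketch only: reject an empty option map and any key that is not a known DC.
public class ReplicationOptionsValidationExample
{
    static void validate(Map<String, String> dcOptions, Set<String> knownDatacenters)
    {
        if (dcOptions.isEmpty())
            throw new IllegalArgumentException(
                "NetworkTopologyStrategy requires at least one datacenter option");

        for (String dc : dcOptions.keySet())
        {
            if (!knownDatacenters.contains(dc))
                throw new IllegalArgumentException("Unrecognized datacenter name: " + dc);
        }
    }
}
{code}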



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12681) Reject empty options and invalid DC names in replication configuration while creating or altering a keyspace.

2016-09-23 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517938#comment-15517938
 ] 

sankalp kohli commented on CASSANDRA-12681:
---

[~jjirsa] Can you please review this?

> Reject empty options and invalid DC names in replication configuration while 
> creating or altering a keyspace.
> -
>
> Key: CASSANDRA-12681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12681
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Nachiket Patil
>Assignee: Nachiket Patil
>Priority: Minor
> Attachments: trunkpatch.diff, v3.0patch.diff
>
>
> Add some restrictions around create / alter keyspace with 
> NetworkTopologyStrategy:
> 1. Do not accept an empty replication configuration (no DC options after 
> class). Cassandra checks that SimpleStrategy has a replication_factor option 
> but does not check that at least one DC is present in the options for 
> NetworkTopologyStrategy.
> 2. Cassandra accepts any random string as a DC name replication option for 
> NetworkTopologyStrategy while creating or altering keyspaces. Add a 
> restriction that the option specified is a valid datacenter name. Using an 
> incorrect value or a simple typo in the DC name can cause an outage in a 
> production environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505044#comment-15505044
 ] 

sankalp kohli commented on CASSANDRA-12668:
---

We tried a fanout of 4 in tests and found no change. I will update once we 
test the actual use case. 

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504919#comment-15504919
 ] 

sankalp kohli edited comment on CASSANDRA-12668 at 9/19/16 10:48 PM:
-

"Lower throughput != unusable"

The cluster was working in 2.0 and is no longer working in 2.1. That is 
unusable. Applications needs a certain throughput otherwise they will have ever 
increasing backlog. 

"Everything is about trade-offs, as was the swapping of the data structure in 
the first place. There's rarely a 100% free lunch."
The tradeoff where a use case is made unusable is not a trade off we should 
make. Is there any warning in NEWS.txt which says this tradeoff is made? 


"Just some fairly broad qualitative/correlative statements."
These are not statements but practical experience with a cluster which has been 
made unusable in 2.1. I will give out more information that you are asking.  


was (Author: kohlisankalp):
"Lower throughput != unusable"

The cluster was working in 2.0 and is no longer working in 2.1. That is 
unusable. Applications needs a certain throughput otherwise they will have ever 
increasing backlog. 

"Everything is about trade-offs, as was the swapping of the data structure in 
the first place. There's rarely a 100% free lunch."
The tradeoff where a use case is made unusable is not a trade off we should 
make. 


"Just some fairly broad qualitative/correlative statements."
These are not statements but practical experience with a cluster which has been 
made unusable in 2.1. I will give out more information that you are asking.  

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504919#comment-15504919
 ] 

sankalp kohli commented on CASSANDRA-12668:
---

"Lower throughput != unusable"

The cluster was working in 2.0 and is no longer working in 2.1. That is 
unusable. Applications needs a certain throughput otherwise they will have ever 
increasing backlog. 

"Everything is about trade-offs, as was the swapping of the data structure in 
the first place. There's rarely a 100% free lunch."
The tradeoff where a use case is made unusable is not a trade off we should 
make. 


"Just some fairly broad qualitative/correlative statements."
These are not statements but practical experience with a cluster which has been 
made unusable in 2.1. I will give out more information that you are asking.  

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504699#comment-15504699
 ] 

sankalp kohli edited comment on CASSANDRA-12668 at 9/19/16 9:14 PM:


The assertions are from a real cluster, but we also did the same work in 
testing. 

"Of course, it may well also be that for your test case it is inherently worse; 
not every use case can be improved."
I agree you cannot improve every use case, but here we have made a use case 
unusable. 

The reason for the lower throughput is the locking added in 7546, but the root 
cause is still the memtable Btree change. 

I will try a different fan factor and see if it helps. 


was (Author: kohlisankalp):
The assertions are from a real cluster, but we also did the same work in 
testing. 

"Of course, it may well also be that for your test case it is inherently worse; 
not every use case can be improved."
I agree you cannot improve every use case, but here we have made a use case 
worse. 

The reason for the lower throughput is the locking added in 7546, but the root 
cause is still the memtable Btree change. 

I will try a different fan factor and see if it helps. 

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504699#comment-15504699
 ] 

sankalp kohli commented on CASSANDRA-12668:
---

The assertions are from a real cluster, but we also did the same work in 
testing. 

"Of course, it may well also be that for your test case it is inherently worse; 
not every use case can be improved."
I agree you cannot improve every use case, but here we have made a use case 
worse. 

The reason for the lower throughput is the locking added in 7546, but the root 
cause is still the memtable Btree change. 

I will try a different fan factor and see if it helps. 

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504501#comment-15504501
 ] 

sankalp kohli commented on CASSANDRA-12668:
---

By "always synchronous" I assume you mean always locking instead of using CAS? 

We did a test where you always write to a few CQL partition simultaneous to 
create contention. We have seen 2.0 has a higher throughput than 2.1 and 
looking at allocation points to this memtable issue.

Then we made the configuration changes added in  7546 to always lock and that 
reduced the throughput. 

Looking at the heap dumps did not point that memtable is smaller in 2.1 vs 2.0. 
So I dont think this is an issue. 

Apart from the testing, the only clusters this is an issue is where we have 
contention and hence this is change is an issue. 
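
For reference, a minimal sketch of the kind of single-partition workload that 
reproduces this contention (the schema and values are made up, not taken from 
the actual test):

    -- a single wide CQL partition
    CREATE TABLE contended (
        pk  text,
        ck  int,
        val blob,
        PRIMARY KEY (pk, ck)
    );

    -- run concurrently from many client threads, always against the same
    -- partition key; contention is on the partition, so ck can vary per thread
    INSERT INTO contended (pk, ck, val)
    VALUES ('hot-partition', 42, 0xdeadbeef);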

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504471#comment-15504471
 ] 

sankalp kohli commented on CASSANDRA-12668:
---

The cluster was doing constant Java GC and was not able to stay up. 

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504471#comment-15504471
 ] 

sankalp kohli edited comment on CASSANDRA-12668 at 9/19/16 7:49 PM:


The cluster was doing constant Java GC and was not able to stay up. Is there 
anything else you are looking for?


was (Author: kohlisankalp):
The cluster was doing constant Java GC and was not able to stay up. 

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-12668:
--
Reproduced In: 2.1.15

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)
sankalp kohli created CASSANDRA-12668:
-

 Summary: Memtable Contention in 2.1
 Key: CASSANDRA-12668
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
 Project: Cassandra
  Issue Type: Bug
Reporter: sankalp kohli


We added a new Btree implementation in 2.1 which causes write performance to go 
down in Cassandra if there is a lot of contention in the memtable for a CQL 
partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
cluster to fall apart due to GC. We tried making the defaults added in 
CASSANDRA-7546 configurable but that did not help. Is there any way to fix this 
issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12668) Memtable Contention in 2.1

2016-09-19 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504422#comment-15504422
 ] 

sankalp kohli commented on CASSANDRA-12668:
---

cc [~benedict] [~brandon.williams]

> Memtable Contention in 2.1
> --
>
> Key: CASSANDRA-12668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>
> We added a new Btree implementation in 2.1 which causes write performance to 
> go down in Cassandra if there is a lot of contention in the memtable for a 
> CQL partition. Upgrading a cluster from 2.0 to 2.1 with contention causes the 
> cluster to fall apart due to GC. We tried making the defaults added in 
> CASSANDRA-7546 configurable but that did not help. Is there any way to fix 
> this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition

2016-09-16 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497594#comment-15497594
 ] 

sankalp kohli commented on CASSANDRA-12367:
---

I think we should return -1, not 0, if the key is not replicated to the node. 
The reason is that 0 should mean the key is not present on that instance, while 
-1 will tell you that you are not calling the correct instances. 

> Add an API to request the size of a CQL partition
> -
>
> Key: CASSANDRA-12367
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12367
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Geoffrey Yu
>Assignee: Geoffrey Yu
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 12367-trunk-v2.txt, 12367-trunk.txt
>
>
> It would be useful to have an API that we could use to get the total 
> serialized size of a CQL partition, scoped by keyspace and table, on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

