Git Push Summary

2014-02-03 Thread slebresne
Updated Tags:  refs/tags/2.0.5-tentative [deleted] 0191b359f


Git Push Summary

2014-02-03 Thread slebresne
Updated Tags:  refs/tags/2.0.5-tentative [created] b71372146


[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2014-02-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889330#comment-13889330
 ] 

Marcus Eriksson commented on CASSANDRA-6364:


About the ignore case, let's hard-code something for now: rate limit at one 
error log message per second, perhaps?
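A minimal sketch of the kind of rate limiting suggested here; the class and method names are hypothetical, not Cassandra's actual implementation:

```java
// Allow at most one log message per fixed interval; callers check
// shouldLog() before emitting, and dropped messages are simply skipped.
public final class RateLimitedLogger {
    private final long intervalNanos;
    private boolean loggedOnce = false;
    private long lastLogNanos;

    public RateLimitedLogger(long intervalNanos) {
        this.intervalNanos = intervalNanos;
    }

    /** Returns true (and records the time) if a message may be logged now. */
    public synchronized boolean shouldLog(long nowNanos) {
        if (!loggedOnce || nowNanos - lastLogNanos >= intervalNanos) {
            loggedOnce = true;
            lastLogNanos = nowNanos;
            return true;
        }
        return false;
    }
}
```

The caller passes in the clock value (e.g. System.nanoTime()), which keeps the gate trivially testable.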

I don't think we should default to 'ignore' in Config.java - if someone does a 
minor upgrade they most likely won't check NEWS or update their config files to 
add the new parameter.

The shipped config in cassandra.yaml looks wrong; it should be 
commit_failure_policy, not disk_failure_policy, I guess.

 There should be different disk_failure_policies for data and commit volumes 
 or commit volume failure should always cause node exit
 --

 Key: CASSANDRA-6364
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6364
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
 Environment: JBOD, single dedicated commit disk
Reporter: J. Ryan Earl
Assignee: Benedict
 Fix For: 2.0.5


 We're doing fault testing on a pre-production Cassandra cluster.  One of the 
 tests was to simulate failure of the commit volume/disk, which in our case 
 is on a dedicated disk.  We expected failure of the commit volume to be 
 handled somehow, but what we found was that Cassandra took no action when 
 the commit volume failed.  We simulated this simply by pulling the physical 
 disk that backed the commit volume, which resulted in filesystem I/O errors 
 on the mount point.
 What then happened was that the Cassandra heap filled up to the point that it 
 was spending 90% of its time doing garbage collection.  No errors were logged 
 regarding the failed commit volume.  Gossip on other nodes in the cluster 
 eventually flagged the node as down.  Gossip on the local node showed itself 
 as up, and all other nodes as down.
 The most serious problem was that connections to the coordinator on this node 
 became very slow due to the ongoing GC, as I assume uncommitted writes piled 
 up on the JVM heap.  What we believe should have happened is that Cassandra 
 should have caught the I/O error and exited with a useful log message, or 
 otherwise done some sort of useful cleanup.  Otherwise the node goes into a 
 sort of zombie state, spending most of its time in GC, and thus slowing down 
 any transactions that happen to use the coordinator on said node.
 A limit on in-memory, unflushed writes before refusing requests may also 
 work.  The point is, something should be done to handle the commit volume 
 dying, as doing nothing affects the entire cluster.  I should note, we are 
 using: disk_failure_policy: best_effort



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2014-02-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889343#comment-13889343
 ] 

Benedict commented on CASSANDRA-6364:
-

bq. I don't think we should default to 'ignore' in Config.java

Well, I wasn't too sure about this. On the one hand, switching the default to 
stop means we could overcautiously kill users' hosts unexpectedly, maybe 
resulting in interruption of service (especially for, say, our users running 
on SAN, as much as that is strongly discouraged). On the other hand, switching 
to ignore means we may not be durable. Neither is a great default, but both 
are better than before. I'm comfortable with either, so if you feel strongly 
it should be stop, I'll happily switch it. Perhaps I lean slightly in favour 
of it too, but it really depends on whether the user favours durability over 
availability, so there doesn't seem to be a single correct answer to me. Note 
that the default disk_failure_policy is also ignore, and the prior behaviour 
was closest to ignore, so introducing a default that results in a failing node 
is somewhat unprecedented for disk failure.

bq. The shipped config in cassandra.yaml looks wrong, should be 
commit_failure_policy, not disk_failure_policy I guess

Right, looks like I didn't update the first or last lines I copy-pasted. 
Thanks. 

bq. About the ignore case, lets hard code something for now - rate limit at one 
log error message per second perhaps?

If we're just rate limiting the log messages, I'd say one per minute might be 
better. But I'm not sure having the threads spin trying to make progress is 
useful. The PCLES, for instance, will just start burning one core until it can 
successfully sync, assuming it doesn't actually have to wait each time to 
encounter the error. I'm tempted to have a 1s pause after an error during 
which we just sleep the erroring thread.
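A hedged sketch of that back-off behaviour, with the messaging and real sync machinery replaced by a tiny injected interface (Syncer, run, and the sleeper callback are illustrative stand-ins, not the actual PeriodicCommitLogExecutorService code):

```java
import java.io.IOException;
import java.util.function.LongConsumer;

public final class SyncLoopSketch {
    static final long ERROR_BACKOFF_MS = 1000;

    interface Syncer { void syncSegment() throws IOException; }

    /**
     * Attempt up to maxAttempts syncs. On an I/O error, request a fixed
     * back-off via the sleeper instead of spinning on a dead volume.
     * Returns the number of failed attempts.
     */
    static int run(Syncer syncer, int maxAttempts, LongConsumer sleeper) {
        int failures = 0;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                syncer.syncSegment();
            } catch (IOException e) {
                failures++;
                sleeper.accept(ERROR_BACKOFF_MS); // pause, don't burn a core
            }
        }
        return failures;
    }
}
```

Injecting the sleeper keeps the loop testable without real sleeps; production code would pass Thread::sleep (wrapped to handle interruption).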

Another issue that slightly concerns me is what happens if the CLES sync() 
starts failing, but appends via the CLA don't. With ignore this could 
potentially result in us mapping in and allocating huge amounts of disk space, 
but not being able to sync or clear it. This might result in lots of swapping, 
and/or in us exceeding our max log space goal by a large margin. Since we 
never guarantee to keep to this, I'm not sure how much of a problem it would 
be, but an error down to ACLs that stops us syncing one file might potentially 
end up eating huge quantities of commit disk space. I'm tempted to have the 
CLA thread block once it hits twice its goal max space (or maybe introduce a 
second config parameter for a hard maximum). But I'm also tempted to leave 
these changes for the 2.1 branch, since it's a fairly specific failure case, 
and what we have is a big improvement over the current state of affairs.
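The "block at twice the goal max space" idea could look roughly like the following gate; all names here are hypothetical, and a real version would block on a condition rather than return false:

```java
// Cap commit log allocation at goal * multiplier; when the cap is hit the
// allocator should stop handing out segments until sync frees space.
public final class CommitLogSpaceGate {
    private final long goalBytes;
    private final int hardCapMultiplier;
    private long allocatedBytes;

    public CommitLogSpaceGate(long goalBytes, int hardCapMultiplier) {
        this.goalBytes = goalBytes;
        this.hardCapMultiplier = hardCapMultiplier;
    }

    /** True if a new segment of the given size may be allocated now. */
    public synchronized boolean tryAllocate(long segmentBytes) {
        if (allocatedBytes + segmentBytes > goalBytes * hardCapMultiplier)
            return false; // caller blocks here until space is released
        allocatedBytes += segmentBytes;
        return true;
    }

    /** Called when a segment is synced and recycled. */
    public synchronized void release(long segmentBytes) {
        allocatedBytes -= segmentBytes;
    }
}
```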




[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit

2014-02-03 Thread Dmitry Shohov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889346#comment-13889346
 ] 

Dmitry Shohov commented on CASSANDRA-6628:
--

I checked your patch and the JVM doesn't crash.

I also did some performance tests. I don't know the usage pattern for the 
comparator, but my simple tests show that my changes to the comparer would 
make it slower than the pure Java implementation :) I fully agree that it's 
better to use the Java implementation on Solaris sparcv9 than to change the 
Unsafe implementation.

 Cassandra crashes on Solaris sparcv9 using java 64bit
 -

 Key: CASSANDRA-6628
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6628
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: checked 1.2.x line and 2.0.x
Reporter: Dmitry Shohov
Assignee: Dmitry Shohov
 Fix For: 2.0.5

 Attachments: solaris_unsafe_fix.patch, tmp.patch


 When running Cassandra 2.0.4 (and other versions) on Solaris with 64-bit 
 Java, the JVM crashes. The issue is described once in CASSANDRA-4646 but was 
 closed as invalid.
 The reason for this crash is memory-alignment-related problems and incorrect 
 sun.misc.Unsafe usage. If you look at DirectByteBuffer in the JDK, you will 
 see that it checks os.arch before using getLong methods.
 I have a patch which checks os.arch and, if it is not one of the known 
 architectures, reads longs and ints byte by byte.
 Although the patch fixes the problem in Cassandra, it will still crash 
 without similar fixes in the lz4 library. I have already provided the patch 
 for the Unsafe usage in lz4.
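An illustrative sketch of the approach the patch describes (this mirrors the idea, not the actual patch): check os.arch once, and on architectures that require aligned access, assemble a long one byte at a time instead of issuing a single wide Unsafe read.

```java
public final class AlignedSafeReader {
    // Architectures known to tolerate unaligned access; the real JDK check
    // in java.nio (Bits) is similar but this list is an assumption here.
    static final boolean UNALIGNED =
        System.getProperty("os.arch", "").matches("i[3-6]86|x86(_64)?|amd64");

    /** Assemble a big-endian long from 8 bytes starting at offset. */
    static long getLongByByte(byte[] buf, int offset) {
        long v = 0;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | (buf[offset + i] & 0xFFL);
        return v;
    }
}
```

On sparcv9, code would take the getLongByByte path for any offset; on x86 it could keep the fast single-read path.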



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit

2014-02-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889349#comment-13889349
 ] 

Benedict commented on CASSANDRA-6628:
-

bq. my changes to comparer would make it slower than pure java implementation

This isn't very surprising given what they were doing, but always good to have 
the confirmation :-)

It should be possible to make a special Unsafe comparer tailored for sparcv9 
(and any other aligned-access-only architectures) that is quite a bit faster, 
in the manner I mention above, but it's not something we're likely to consider 
a priority in the near future. As always, feel free to have a crack at it 
yourself and submit a patch; I'd be more than happy to review.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit

2014-02-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889356#comment-13889356
 ] 

Benedict commented on CASSANDRA-6628:
-

This patch is ready for commit.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family

2014-02-03 Thread Sergio Bossa (JIRA)
Sergio Bossa created CASSANDRA-6645:
---

 Summary: upgradesstables causes NPE for secondary indexes without 
an underlying column family
 Key: CASSANDRA-6645
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6645
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sergio Bossa
Assignee: Sergio Bossa


SecondaryIndex#getIndexCfs is allowed to return null by contract, if the index 
is not backed by a column family, but this causes an NPE as 
StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do not 
check for null values.
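A minimal illustration of the fix the ticket calls for: when collecting column family stores, skip secondary indexes whose getIndexCfs() returns null, since the contract allows it. The types below are simplified stand-ins, not Cassandra's real classes.

```java
import java.util.ArrayList;
import java.util.List;

public final class ValidCfsCollector {
    interface SecondaryIndex { ColumnFamilyStore getIndexCfs(); }
    static final class ColumnFamilyStore { }

    /** Collect the backing stores of indexes that actually have one. */
    static List<ColumnFamilyStore> collect(List<SecondaryIndex> indexes) {
        List<ColumnFamilyStore> out = new ArrayList<>();
        for (SecondaryIndex idx : indexes) {
            ColumnFamilyStore cfs = idx.getIndexCfs();
            if (cfs != null)        // the missing check that caused the NPE
                out.add(cfs);
        }
        return out;
    }
}
```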



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family

2014-02-03 Thread Sergio Bossa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Bossa updated CASSANDRA-6645:


Attachment: CASSANDRA-6645.patch




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Git Push Summary

2014-02-03 Thread slebresne
Updated Tags:  refs/tags/cassandra-1.2.14 [created] f9bef16ae


Git Push Summary

2014-02-03 Thread slebresne
Updated Tags:  refs/tags/1.2.14-tentative [deleted] 6a9314408


[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889494#comment-13889494
 ] 

Marcus Eriksson commented on CASSANDRA-5351:


More complete version now pushed to 
https://github.com/krummas/cassandra/tree/marcuse/5351
Lots of testing is still required, but I think it is mostly 'feature-complete'.

Repair flow is now:
# Repair coordinator sends out Prepare messages to all neighbors
# All involved parties figure out which sstables should be included in the 
repair (if full repair, all sstables are included; otherwise only the ones 
with repairedAt set to 0). Note that we don't do any locking of the sstables; 
if they are gone when we do anticompaction it is fine - we will repair them 
next round.
# Repair coordinator prepares itself, waits until all neighbors have prepared, 
and sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step #2
# Coordinator waits for replies and then sends AnticompactionRequests to all 
nodes
# If we are doing a full repair, we simply skip anticompaction.
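Step 2 of the flow above can be sketched as a simple filter; here sstables are represented just by their repairedAt timestamps, and the names are illustrative rather than Cassandra's:

```java
import java.util.ArrayList;
import java.util.List;

public final class RepairSSTableFilter {
    static final long UNREPAIRED = 0L;

    /**
     * Pick the sstables to include: everything for a full repair, only
     * still-unrepaired sstables (repairedAt == 0) for an incremental one.
     */
    static List<Long> select(List<Long> repairedAtPerSSTable, boolean fullRepair) {
        if (fullRepair)
            return new ArrayList<>(repairedAtPerSSTable);
        List<Long> out = new ArrayList<>();
        for (long t : repairedAtPerSSTable)
            if (t == UNREPAIRED)
                out.add(t);
        return out;
    }
}
```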

notes:
* SSTables are tagged with repairedAt timestamps; compactions keep 
min(repairedAt) of the included sstables.
* nodetool repair defaults to the old behaviour. Use --incremental to use 
the new repairs.
* anticompaction
  - Splits an sstable into 2 new ones: one sstable with all keys that were in 
the repaired ranges and one with unrepaired data.
  - If the repaired ranges cover the entire sstable, we rewrite the sstable 
metadata. This means that the optimal way to run incremental repairs is to not 
do partitioner range repairs etc.
* Compaction
  * LCS
- We always first check if there are any unrepaired sstables to do STCS on; 
if there are, we do that. The reasoning is that new data (which needs 
compaction) is unrepaired.
- We keep all sstables in the LeveledManifest, then filter out the 
unrepaired ones when getting compaction candidates etc.
  * STCS
- Major compaction is done by taking the biggest set of sstables - so for a 
total major compaction, you will need to run nodetool compact twice.
- Minor compactions work the same way: the biggest set of sstables will be 
compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Sets repairedAt to UNREPAIRED - since we can drop rows during scrub, 
the new sstable is not repaired.
* Upgradesstables - Keeps repaired status
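The first note above (compaction output keeps min(repairedAt), so any unrepaired input keeps the output unrepaired) as a sketch, with hypothetical names:

```java
public final class RepairedAtMerge {
    static final long UNREPAIRED = 0L;

    /** repairedAt of a compaction result: the minimum over its inputs. */
    static long mergedRepairedAt(long[] inputRepairedAt) {
        long min = Long.MAX_VALUE;
        for (long t : inputRepairedAt)
            min = Math.min(min, t);
        return min;
    }
}
```

Because UNREPAIRED is 0, compacting repaired data together with unrepaired data always yields an unrepaired sstable, which is why the strategies segregate the two sets.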


 Avoid repairing already-repaired data by default
 

 Key: CASSANDRA-5351
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
  Labels: repair
 Fix For: 2.1

 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 
 5351_nodetool.log


 Repair has always built its merkle tree from all the data in a columnfamily, 
 which is guaranteed to work but is inefficient.
 We can improve this by remembering which sstables have already been 
 successfully repaired, and only repairing sstables new since the last repair. 
  (This automatically makes CASSANDRA-3362 much less of a problem too.)
 The tricky part is, compaction will (if not taught otherwise) mix repaired 
 data together with non-repaired.  So we should segregate unrepaired sstables 
 from the repaired ones.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889494#comment-13889494
 ] 

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:18 PM:


More complete version now pushed to  
https://github.com/krummas/cassandra/tree/marcuse/5351
Lots of testing required, but i think it is mostly 'feature-complete';

Repair flow is now:
# Repair coordinator sends out Prepare messages to all neighbors
# All involved parties figure out what sstables should be included in the 
repair (if full repair, all sstables are included) otherwise only the ones with 
repairedAt set to 0. Note that we don't do any locking of the sstables, if they 
are gone when we do anticompaction it is fine - we will repair them next round.
# Repair coordinator prepares itself and waits until all neighbors have 
prepared and sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step #2
# Coordinator waits for replies and then sends AnticompactionRequests to all 
nodes
# If we are doing full repair, we simply skip doing anticompaction.

notes;
* SSTables are tagged with repairedAt timestamps, compactions keep 
min(repairedAt) of the included sstables.
* nodetool repair defaults to use the old behaviour. Use --incremental to use 
the new repairs.
* anticompaction
  ** Split an sstable in 2 new ones. One sstable with all keys that were in the 
repaired ranges and one with unrepaired data.
  ** If the repaired ranges cover the entire sstable, we rewrite sstable 
metadata. This means that the optimal way to run incremental repairs is to not 
do partitioner range repairs etc.
* LCS
   ** We always first check if there are any unrepaired sstables to do STCS on, 
if there is, we do that. Reasoning being that new data (which needs compaction) 
is unrepaired.
   ** We keep all sstables in the LeveledManifest, then filter out the 
unrepaired ones when getting compaction candidates etc.
* STCS
  ** Major compaction is done by taking the biggest set of sstables - so for a 
total major compaction, you will need to run nodetool compact twice.
   ** Minors works the same way, the biggest set of sstables will be compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Set repairedAt to UNREPAIRED - since we can drop rows during repair 
new sstable is not repaired.
* Upgradesstables - Keep repaired status




[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889498#comment-13889498
 ] 

Marcus Eriksson commented on CASSANDRA-5351:


If it's unclear: the purpose of the Prepare step is that we want to do 
anticompaction once over all ranges involved in the repair. If we did not do 
that, we would have to anticompact 3 times for a nodetool repair with RF=3 
(and 3*256 times with 256 vnodes, I think).
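A back-of-envelope for that count, under the stated assumption that without Prepare each per-range session triggers its own anticompaction:

```java
public final class AnticompactionCount {
    /** One anticompaction per (replica, range) pair without a Prepare step. */
    static int withoutPrepare(int replicationFactor, int vnodesPerNode) {
        return replicationFactor * vnodesPerNode;
    }

    /** With Prepare, one anticompaction per node after all ranges finish. */
    static int withPrepare() {
        return 1;
    }
}
```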




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889494#comment-13889494
 ] 

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:30 PM:


More complete version now pushed to  
https://github.com/krummas/cassandra/tree/marcuse/5351
Lots of testing required, but i think it is mostly 'feature-complete';

Repair flow is now:
# Repair coordinator sends out Prepare messages to all neighbors
# All involved parties figure out what sstables should be included in the 
repair (if full repair, all sstables are included) otherwise only the ones with 
repairedAt set to 0. Note that we don't do any locking of the sstables, if they 
are gone when we do anticompaction it is fine - we will repair them next round.
# Repair coordinator prepares itself and waits until all neighbors have 
prepared and sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step #2
# Coordinator waits for replies and then sends AnticompactionRequests to all 
nodes
# If we are doing full repair, we simply skip doing anticompaction.

notes;
* SSTables are tagged with repairedAt timestamps, compactions keep 
min(repairedAt) of the included sstables.
* nodetool repair defaults to use the old behaviour. Use --incremental to use 
the new repairs.
* anticompaction
  ** Split an sstable in 2 new ones. One sstable with all keys that were in the 
repaired ranges and one with unrepaired data.
  ** If the repaired ranges cover the entire sstable, we rewrite sstable 
metadata. This means that the optimal way to run incremental repairs is to not 
do partitioner range repairs etc.
* LCS
   ** We always first check if there are any unrepaired sstables to do STCS on, 
if there is, we do that. Reasoning being that new data (which needs compaction) 
is unrepaired.
   ** We keep all sstables in the LeveledManifest, then filter out the 
unrepaired ones when getting compaction candidates etc.
* STCS
  ** Major compaction is done by taking the biggest set of sstables - so for a 
total major compaction, you will need to run nodetool compact twice.
   ** Minors works the same way, the biggest set of sstables will be compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Set repairedAt to UNREPAIRED - since we can drop rows during scrub 
new sstable is not repaired.
* Upgradesstables - Keep repaired status




[jira] [Commented] (CASSANDRA-6622) Streaming session failures during node replace using replace_address

2014-02-03 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889546#comment-13889546
 ] 

Brandon Williams commented on CASSANDRA-6622:
-

Can you attach logs from both the replacing node and the node that is failing 
the stream session?

 Streaming session failures during node replace using replace_address
 

 Key: CASSANDRA-6622
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622
 Project: Cassandra
  Issue Type: Bug
 Environment: RHEL6, cassandra-2.0.4
Reporter: Ravi Prasad
Assignee: Brandon Williams
 Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 
 6622-2.0.txt


 When using replace_address, the Gossiper ApplicationState is set to 
 hibernate, which is a down state. We are seeing that peer nodes receive the 
 streaming plan request even before the Gossiper on them marks the replacing 
 node as dead. As a result, streaming on the peer nodes convicts the replacing 
 node by closing the stream handler.
 I think making the StorageService thread on the replacing node sleep for 
 BROADCAST_INTERVAL before bootstrapping would avoid this scenario.
 Relevant logs from the peer node (note that the Gossiper on the peer node 
 marks the replacing node as down 2 seconds after the streaming init request):
 {noformat}
  INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 
 StreamResultFuture.java (line 116) [Stream 
 #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap
 
  INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 
 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is 
 complete
  WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 
 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed
  INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) 
 InetAddress /x.x.x.x is now DOWN
 ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 
 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred
 java.lang.RuntimeException: Outgoing stream handler has been closed
 at 
 org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175)
 at 
 org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
 at 
 org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
 at 
 org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
 at java.lang.Thread.run(Thread.java:722)
  INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java 
 (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with 
 /x.x.x.x is complete
  WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java 
 (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-4851) CQL3: improve support for paginating over composites

2014-02-03 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-4851:


Attachment: 4851.txt

Attaching a rather simple patch for this. 

The patch implements the tuple/vector syntax described above (so {{WHERE (c1, 
c2) > (1, 0)}} typically) as that's the easiest and imo the most natural syntax 
anyway when you want to do such a thing.

 CQL3: improve support for paginating over composites
 

 Key: CASSANDRA-4851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 4851.txt


 Consider the following table:
 {noformat}
 CREATE TABLE test (
 k int,
 c1 int,
 c2 int,
 PRIMARY KEY (k, c1, c2)
 )
 {noformat}
 with the following data:
 {noformat}
 k | c1 | c2
 ---+----+----
 0 | 0  | 0
 0 | 0  | 1
 0 | 1  | 0
 0 | 1  | 1
 {noformat}
 Currently, CQL3 allows to slice over either c1 or c2:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2
 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2
 {noformat}
 but you cannot express a query that returns the last 3 records. Indeed, for 
 that you would need to do a query like, say:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0)
 {noformat}
 but we don't support that.
 This can make it hard to paginate over, say, all records for {{k = 0}} (I'm 
 saying "can" because if the number of values for c2 cannot be very large, an 
 easy workaround could be to paginate by entire values of c1, which you can do).
 For the case where you only paginate to avoid OOMing on a query, 
 CASSANDRA-4415 will handle that and is probably the best solution. However, 
 there may be cases where the pagination is, say, user (as in, the user of your 
 application) triggered.
 I note that one solution would be to add OR support, at least in cases like 
 the one above. That's definitely doable, but on the other side we won't be 
 able to support full-blown OR, so it may not be very natural that we support 
 seemingly random combinations of OR and not others.
 Another solution would be to allow the following syntax:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0)
 {noformat}
 which would literally mean that you want records where the values of c1 and 
 c2 taken as a tuple are lexicographically greater than the tuple (0, 0). This 
 is less SQL-like (though maybe some SQL stores have that; it's a fairly 
 natural thing to have imo?), but would be much simpler to implement and 
 probably to use too.
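The proposed `(c1, c2) > (0, 0)` semantics amount to lexicographic comparison of the clustering columns; a minimal illustrative sketch (not the patch's actual code):

```java
import java.util.List;

// Lexicographic comparison of clustering-column tuples, i.e. the semantics of
// the proposed "(c1, c2) > (0, 0)" syntax (illustrative sketch only).
final class TupleCompare
{
    // Compare component by component; the first inequality decides the order.
    static int compare(List<Integer> a, List<Integer> b)
    {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++)
        {
            int cmp = Integer.compare(a.get(i), b.get(i));
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(a.size(), b.size());
    }
}
```

Against the sample data, the tuples (0, 1), (1, 0) and (1, 1) all compare greater than (0, 0), which is exactly the "last 3 records" the query is meant to return.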





[jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites

2014-02-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889582#comment-13889582
 ] 

Sylvain Lebresne commented on CASSANDRA-4851:
-

For info, I pushed a dtest too: 
https://github.com/riptano/cassandra-dtest/commit/da2fb8451b465299c095b320fbfc83c90467a49b

 CQL3: improve support for paginating over composites
 

 Key: CASSANDRA-4851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 4851.txt







[jira] [Commented] (CASSANDRA-5323) Revisit disabled dtests

2014-02-03 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889590#comment-13889590
 ] 

Michael Shuler commented on CASSANDRA-5323:
---

*All* dtests have been running on the cassandra-2.0 branch for a few days, without 
hanging up the entire test run :)

 Revisit disabled dtests
 ---

 Key: CASSANDRA-5323
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5323
 Project: Cassandra
  Issue Type: Test
Reporter: Ryan McGuire
Assignee: Michael Shuler

 The following dtests are disabled in buildbot, if they can be re-enabled 
 great, if they can't can they be fixed? 
 upgrade|decommission|sstable_gen|global_row|putget_2dc|cql3_insert





[jira] [Commented] (CASSANDRA-6640) Improve custom 2i performance and abstraction

2014-02-03 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889600#comment-13889600
 ] 

Sam Tunnicliffe commented on CASSANDRA-6640:


Second patch LGTM +1

 Improve custom 2i performance and abstraction
 -

 Key: CASSANDRA-6640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Miguel Angel Fernandez Diaz
Assignee: Miguel Angel Fernandez Diaz
  Labels: patch, performance
 Fix For: 2.1

 Attachments: 6640.diff, 6640v2.diff


 With the current implementation, the update method from SecondaryIndexManager 
 forces us to insert and delete a cell. That happens because we assume that we 
 need the value of the old cell in order to locate the cell that we are 
 updating in our custom secondary index implementation. 
 However, depending on the implementation, an insert and a delete operation 
 could have much worse performance than a simple update. Moreover, if our 
 custom secondary index doesn't use inverted indexes, we don't really need the 
 old cell information; the key information is enough. 
 Therefore, a good solution would be to make the update method more abstract. 
 Thus, the update method for PerColumnSecondaryIndex would also receive the 
 old cell information, and from that point we could decide whether we must 
 carry out the delete+insert operation or just an update operation.
 I attach a patch that implements this solution.
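A simplified sketch of the proposed abstraction (illustrative names, not the real Cassandra classes): the base class keeps delete+insert as the default, and an implementation that doesn't need the old cell overrides update() with a cheaper single write.

```java
// Simplified sketch of the proposed abstraction (not the real Cassandra
// classes). The base class keeps insert-new + delete-old as the default;
// an index that does not need the old cell overrides update().
abstract class PerColumnIndexSketch
{
    final StringBuilder ops = new StringBuilder(); // records operations, for illustration

    void insert(String rowKey, String cell) { ops.append("insert(").append(cell).append(")"); }
    void delete(String rowKey, String cell) { ops.append("delete(").append(cell).append(")"); }

    // Default: insert the new value before removing the old one, so the row
    // is never invisible to both index queries at once.
    void update(String rowKey, String oldCell, String newCell)
    {
        insert(rowKey, newCell);
        delete(rowKey, oldCell);
    }
}

final class KeyOnlyIndexSketch extends PerColumnIndexSketch
{
    // This index locates entries by row key alone, so the old cell value is
    // irrelevant and a single overwrite suffices.
    @Override
    void update(String rowKey, String oldCell, String newCell)
    {
        ops.append("overwrite(").append(newCell).append(")");
    }
}
```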





[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889646#comment-13889646
 ] 

Yuki Morishita commented on CASSANDRA-5351:
---

[~krummas] Still reviewing your patch, but I see the following problems:

* Sequential (snapshot) repair is now the default, so to use this feature users 
need to do 'nodetool repair -par -inc', since I don't see incremental repair 
code in the snapshot repair code path.
* Performing anti-compaction in the repair thread sequentially for all parent 
sessions seems problematic performance-wise.

And for STCS major compaction, I prefer not to change the current behavior; 
dropping the compacted SSTable to UNREPAIRED is fine, I think. Changing it 
would be a surprise for users (despite major compaction not being recommended).

I'll look deeper, but that's what I have currently.

 Avoid repairing already-repaired data by default
 

 Key: CASSANDRA-5351
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
  Labels: repair
 Fix For: 2.1

 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 
 5351_nodetool.log


 Repair has always built its merkle tree from all the data in a columnfamily, 
 which is guaranteed to work but is inefficient.
 We can improve this by remembering which sstables have already been 
 successfully repaired, and only repairing sstables new since the last repair. 
  (This automatically makes CASSANDRA-3362 much less of a problem too.)
 The tricky part is, compaction will (if not taught otherwise) mix repaired 
 data together with non-repaired.  So we should segregate unrepaired sstables 
 from the repaired ones.
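The segregation idea can be sketched minimally (illustrative, not the real Cassandra code): give each sstable a repairedAt stamp and have repair build its merkle trees from only the unrepaired group.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of segregating repaired from unrepaired sstables
// (not the real Cassandra code). An sstable with repairedAt == 0 is
// unrepaired; only those need to feed the next repair.
final class RepairSets
{
    static final class SSTableStub
    {
        final String name;
        final long repairedAt; // 0 == unrepaired
        SSTableStub(String name, long repairedAt) { this.name = name; this.repairedAt = repairedAt; }
    }

    // Select only the sstables that still need repairing.
    static List<SSTableStub> unrepaired(List<SSTableStub> all)
    {
        List<SSTableStub> out = new ArrayList<>();
        for (SSTableStub s : all)
            if (s.repairedAt == 0)
                out.add(s);
        return out;
    }
}
```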





git commit: Improve custom 2i performance and abstraction Patch by Miguel Angel Fernandez Diaz, reviewed by Sam Tunnicliffe for CASSANDRA-6640

2014-02-03 Thread brandonwilliams
Updated Branches:
  refs/heads/trunk aa29b6af6 -> fc91071c0


Improve custom 2i performance and abstraction
Patch by Miguel Angel Fernandez Diaz, reviewed by Sam Tunnicliffe for
CASSANDRA-6640


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fc91071c
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fc91071c
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fc91071c

Branch: refs/heads/trunk
Commit: fc91071c01c33500774de83944bf5f937397c089
Parents: aa29b6a
Author: Brandon Williams brandonwilli...@apache.org
Authored: Mon Feb 3 11:33:37 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Mon Feb 3 11:33:37 2014 -0600

--
 .../db/index/AbstractSimplePerColumnSecondaryIndex.java   | 7 +--
 .../apache/cassandra/db/index/PerColumnSecondaryIndex.java| 3 ++-
 .../org/apache/cassandra/db/index/SecondaryIndexManager.java  | 7 +++
 test/unit/org/apache/cassandra/db/RangeTombstoneTest.java | 2 +-
 .../org/apache/cassandra/db/SecondaryIndexCellSizeTest.java   | 2 +-
 5 files changed, 12 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java
--
diff --git 
a/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java
 
b/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java
index 5987d7a..e2a6608 100644
--- 
a/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java
+++ 
b/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java
@@ -135,9 +135,12 @@ public abstract class AbstractSimplePerColumnSecondaryIndex extends PerColumnSec
         indexCfs.apply(valueKey, cfi, SecondaryIndexManager.nullUpdater, opGroup, null);
     }
 
-    public void update(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup)
+    public void update(ByteBuffer rowKey, Cell oldCol, Cell col, OpOrder.Group opGroup)
     {
+        // insert the new value before removing the old one, so we never have a period
+        // where the row is invisible to both queries (the opposite seems preferable); see CASSANDRA-5540
         insert(rowKey, col, opGroup);
+        delete(rowKey, oldCol, opGroup);
     }
 
     public void removeIndex(ByteBuffer columnName)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java
--
diff --git 
a/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java 
b/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java
index e094c4c..79087d2 100644
--- a/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java
+++ b/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java
@@ -49,9 +49,10 @@ public abstract class PerColumnSecondaryIndex extends SecondaryIndex
      * update a column from the index
      *
      * @param rowKey the underlying row key which is indexed
+     * @param oldCol the previous column info
      * @param col all the column info
      */
-    public abstract void update(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup);
+    public abstract void update(ByteBuffer rowKey, Cell oldCol, Cell col, OpOrder.Group opGroup);
 
     public String getNameForSystemKeyspace(ByteBuffer column)
     {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
--
diff --git a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java 
b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
index 66e549d..946e3be 100644
--- a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
+++ b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
@@ -676,11 +676,10 @@ public class SecondaryIndexManager
         {
             if (index instanceof PerColumnSecondaryIndex)
             {
-                // insert the new value before removing the old one, so we never have a period
-                // where the row is invisible to both queries (the opposite seems preferable); see CASSANDRA-5540
                 if (!cell.isMarkedForDelete(System.currentTimeMillis()))
-                    ((PerColumnSecondaryIndex) index).insert(key.key, cell, opGroup);
-                ((PerColumnSecondaryIndex) index).delete(key.key, oldCell, opGroup);
+                    ((PerColumnSecondaryIndex) index).update(key.key, oldCell, cell, 

[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889667#comment-13889667
 ] 

Marcus Eriksson commented on CASSANDRA-5351:


[~yukim] thanks for the comments;

bq. I don't see incremental repair codes in snapshot repair code path.
right.. snapshot repairs... will look at them tomorrow

bq. STCS major compaction, I prefer not to change current behavior
Dropping sstables to UNREPAIRED during major compaction means that all repaired-data 
status is cleared for the node. Maybe we could make major compaction do 2 
separate compactions? Ending up with 2 sstables should be fine for users, right?

bq. Performing anti-compaction in repair thread sequentially for all parent 
sessions seems problematic performance-wise.
I will move anticompaction out of the repair thread as well
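Making major compaction do two separate compactions, as suggested, amounts to grouping the candidates by repaired status and compacting each group independently (illustrative sketch, not Cassandra's compaction code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Cassandra's compaction code): a major compaction
// that never mixes repaired and unrepaired inputs. It yields up to two
// candidate groups, one per status, so repaired data is not demoted to
// UNREPAIRED by the compaction.
final class MajorCompactionSketch
{
    // Inputs are reduced to their repairedAt stamps here; 0 means unrepaired.
    static List<List<Long>> groupForCompaction(List<Long> repairedAtStamps)
    {
        List<Long> repaired = new ArrayList<>();
        List<Long> unrepaired = new ArrayList<>();
        for (long stamp : repairedAtStamps)
            (stamp == 0 ? unrepaired : repaired).add(stamp);

        List<List<Long>> groups = new ArrayList<>();
        if (!repaired.isEmpty())
            groups.add(repaired);
        if (!unrepaired.isEmpty())
            groups.add(unrepaired);
        return groups;
    }
}
```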

 Avoid repairing already-repaired data by default
 

 Key: CASSANDRA-5351
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
  Labels: repair
 Fix For: 2.1

 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 
 5351_nodetool.log







[jira] [Created] (CASSANDRA-6646) Disk Failure Policy ignores CorruptBlockException

2014-02-03 Thread sankalp kohli (JIRA)
sankalp kohli created CASSANDRA-6646:


 Summary: Disk Failure Policy ignores CorruptBlockException 
 Key: CASSANDRA-6646
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6646
 Project: Cassandra
  Issue Type: Improvement
Reporter: sankalp kohli
Priority: Minor


If Cassandra is using compression and has a bad drive or sstable, it will throw 
a CorruptBlockException. 
The Disk Failure Policy only works if the error is an FSError, and does not work 
for IOExceptions like this. 
We need to better handle such exceptions, as they cause nodes to not respond to 
the coordinator, causing the client to time out. 
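A hedged sketch of one possible direction (hypothetical names, not Cassandra's actual classes): funnel corruption-style IOExceptions through the same policy switch that FSError already reaches.

```java
import java.io.IOException;

// Hypothetical sketch (not Cassandra's actual classes): apply the configured
// disk failure policy to corruption-style IOExceptions as well, instead of
// letting them escape and stall the request until the client times out.
final class FailurePolicySketch
{
    enum Policy { IGNORE, STOP, BEST_EFFORT }

    static String handleCorruption(Policy policy, IOException cause)
    {
        switch (policy)
        {
            case STOP:        return "stop-transports"; // stop gossip and native transport
            case BEST_EFFORT: return "blacklist-disk";  // stop using the affected data directory
            default:          return "log-and-continue: " + cause.getMessage();
        }
    }
}
```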





[jira] [Commented] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup

2014-02-03 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889826#comment-13889826
 ] 

Vijay commented on CASSANDRA-6590:
--

Sorry, I was sending a different message during the startup; fixed and pushed to 
https://github.com/Vijay2win/cassandra/tree/6590-v3. Thanks!



 Gossip does not heal after a temporary partition at startup
 ---

 Key: CASSANDRA-6590
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6590
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Brandon Williams
Assignee: Vijay
 Fix For: 2.0.6

 Attachments: 0001-CASSANDRA-6590.patch, 0001-logging-for-6590.patch, 
 6590_disable_echo.txt


 See CASSANDRA-6571 for background.  If a node is partitioned on startup when 
 the echo command is sent, but then the partition heals, the halves of the 
 partition will never mark each other up despite being able to communicate.  
 This stems from CASSANDRA-3533.





git commit: Let scrub optionally skip broken counter partitions

2014-02-03 Thread aleksey
Updated Branches:
  refs/heads/cassandra-2.0 b71372146 -> 728c4fa9b


Let scrub optionally skip broken counter partitions

patch by Tyler Hobbs; reviewed by Aleksey Yeschenko for CASSANDRA-5930


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/728c4fa9
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/728c4fa9
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/728c4fa9

Branch: refs/heads/cassandra-2.0
Commit: 728c4fa9bf2b2c11dbc61c8e5536b1542abc1ccb
Parents: b713721
Author: Aleksey Yeschenko alek...@apache.org
Authored: Mon Feb 3 23:01:31 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Mon Feb 3 23:01:31 2014 +0300

--
 CHANGES.txt |  4 +
 NEWS.txt| 12 ++-
 .../apache/cassandra/db/ColumnFamilyStore.java  |  4 +-
 .../db/compaction/CompactionManager.java| 12 +--
 .../cassandra/db/compaction/Scrubber.java   | 37 ++---
 .../cassandra/service/StorageService.java   |  4 +-
 .../cassandra/service/StorageServiceMBean.java  |  2 +-
 .../org/apache/cassandra/tools/NodeCmd.java |  6 +-
 .../org/apache/cassandra/tools/NodeProbe.java   |  4 +-
 .../cassandra/tools/StandaloneScrubber.java |  6 +-
 .../apache/cassandra/tools/NodeToolHelp.yaml|  6 +-
 .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++--
 12 files changed, 140 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 13b4c5b..a1a58a3 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,3 +1,7 @@
+2.0.6
+ * Let scrub optionally skip broken counter partitions (CASSANDRA-5930)
+
+
 2.0.5
  * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609)
  * Add ks.cf names to tombstone logging (CASSANDRA-6597)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/NEWS.txt
--
diff --git a/NEWS.txt b/NEWS.txt
index 92446c8..b21fbaa 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -14,11 +14,21 @@ restore snapshots created with the previous major version 
using the
 using the provided 'sstableupgrade' tool.
 
 
+2.0.6
+=
+
+New features
+
+- Scrub can now optionally skip corrupt counter partitions. Please note
+  that this will lead to the loss of all the counter updates in the skipped
+  partition. See the --skip-corrupted option.
+
+
 2.0.5
 =
 
 New features
-
+
 - Batchlog replay can be, and is throttled by default now.
   See batchlog_replay_throttle_in_kb setting in cassandra.yaml.
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 8750026..38d87db 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -1115,12 +1115,12 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 CompactionManager.instance.performCleanup(ColumnFamilyStore.this, 
renewer);
 }
 
-public void scrub(boolean disableSnapshot) throws ExecutionException, 
InterruptedException
+public void scrub(boolean disableSnapshot, boolean skipCorrupted) throws 
ExecutionException, InterruptedException
 {
 // skip snapshot creation during scrub, SEE JIRA 5891
 if(!disableSnapshot)
 snapshotWithoutFlush(pre-scrub- + System.currentTimeMillis());
-CompactionManager.instance.performScrub(ColumnFamilyStore.this);
+CompactionManager.instance.performScrub(ColumnFamilyStore.this, 
skipCorrupted);
 }
 
 public void sstablesRewrite(boolean excludeCurrentVersion) throws 
ExecutionException, InterruptedException

http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
--
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java 
b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
index 168ee02..48900c8 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
@@ -227,13 +227,13 @@ public class CompactionManager implements 
CompactionManagerMBean
 executor.submit(runnable).get();
 }
 
-public void 

[jira] [Updated] (CASSANDRA-6622) Streaming session failures during node replace using replace_address

2014-02-03 Thread Ravi Prasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prasad updated CASSANDRA-6622:
---

Attachment: logs.tgz

 Streaming session failures during node replace using replace_address
 

 Key: CASSANDRA-6622
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622
 Project: Cassandra
  Issue Type: Bug
 Environment: RHEL6, cassandra-2.0.4
Reporter: Ravi Prasad
Assignee: Brandon Williams
 Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 
 6622-2.0.txt, logs.tgz







[jira] [Commented] (CASSANDRA-6622) Streaming session failures during node replace using replace_address

2014-02-03 Thread Ravi Prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889856#comment-13889856
 ] 

Ravi Prasad commented on CASSANDRA-6622:


In the attached logs, .72 was the replacing node and .73 is where the streaming 
session failed. I had trace logging turned on in .73 for 
org.apache.cassandra.gms. Looks like it is the FailureDetector that is 
convicting. I should mention that this was with 
'0001-don-t-signal-restart-of-dead-states.txt' applied on cassandra-2.0.4.

 Streaming session failures during node replace using replace_address
 

 Key: CASSANDRA-6622
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622
 Project: Cassandra
  Issue Type: Bug
 Environment: RHEL6, cassandra-2.0.4
Reporter: Ravi Prasad
Assignee: Brandon Williams
 Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 
 6622-2.0.txt, logs.tgz







[2/4] Merge branch 'cassandra-2.0' into trunk

2014-02-03 Thread aleksey
http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/test/unit/org/apache/cassandra/db/ScrubTest.java
--
diff --cc test/unit/org/apache/cassandra/db/ScrubTest.java
index 38c8b62,08dd435..d8ab9ff
--- a/test/unit/org/apache/cassandra/db/ScrubTest.java
+++ b/test/unit/org/apache/cassandra/db/ScrubTest.java
@@@ -39,9 -41,9 +41,11 @@@ import org.apache.cassandra.db.compacti
  import org.apache.cassandra.exceptions.ConfigurationException;
  import org.apache.cassandra.db.columniterator.IdentityQueryFilter;
  import org.apache.cassandra.db.compaction.CompactionManager;
++import org.apache.cassandra.exceptions.WriteTimeoutException;
  import org.apache.cassandra.io.sstable.*;
  import org.apache.cassandra.utils.ByteBufferUtil;
  
++import static org.apache.cassandra.Util.cellname;
  import static org.apache.cassandra.Util.column;
  import static org.junit.Assert.assertEquals;
  import static org.junit.Assert.fail;
@@@ -76,6 -79,53 +81,53 @@@ public class ScrubTest extends SchemaLo
  }
  
  @Test
 -public void testScrubCorruptedCounterRow() throws IOException, 
InterruptedException, ExecutionException
++public void testScrubCorruptedCounterRow() throws IOException, 
InterruptedException, ExecutionException, WriteTimeoutException
+ {
+ CompactionManager.instance.disableAutoCompaction();
+ Keyspace keyspace = Keyspace.open(KEYSPACE);
+ ColumnFamilyStore cfs = keyspace.getColumnFamilyStore(COUNTER_CF);
+ cfs.clearUnsafe();
+ 
+ fillCounterCF(cfs, 2);
+ 
+ List<Row> rows = cfs.getRangeSlice(Util.range("", ""), null, new 
IdentityQueryFilter(), 1000);
+ assertEquals(2, rows.size());
+ 
+ SSTableReader sstable = cfs.getSSTables().iterator().next();
+ 
+ // overwrite one row with garbage
+ long row0Start = 
sstable.getPosition(RowPosition.forKey(ByteBufferUtil.bytes("0"), 
sstable.partitioner), SSTableReader.Operator.EQ).position;
+ long row1Start = 
sstable.getPosition(RowPosition.forKey(ByteBufferUtil.bytes("1"), 
sstable.partitioner), SSTableReader.Operator.EQ).position;
+ long startPosition = row0Start < row1Start ? row0Start : row1Start;
+ long endPosition = row0Start < row1Start ? row1Start : row0Start;
+ 
+ RandomAccessFile file = new RandomAccessFile(sstable.getFilename(), 
"rw");
+ file.seek(startPosition);
+ file.writeBytes(StringUtils.repeat('z', (int) (endPosition - 
startPosition)));
+ file.close();
+ 
+ // with skipCorrupted == false, the scrub is expected to fail
+ Scrubber scrubber = new Scrubber(cfs, sstable, false);
+ try
+ {
+ scrubber.scrub();
+ fail("Expected a CorruptSSTableException to be thrown");
+ }
+ catch (IOError err) {}
+ 
+ // with skipCorrupted == true, the corrupt row will be skipped
+ scrubber = new Scrubber(cfs, sstable, true);
+ scrubber.scrub();
+ scrubber.close();
+ cfs.replaceCompactedSSTables(Collections.singletonList(sstable), 
Collections.singletonList(scrubber.getNewSSTable()), OperationType.SCRUB);
+ assertEquals(1, cfs.getSSTables().size());
+ 
+ // verify that we can read all of the rows, and there is now one less 
row
+ rows = cfs.getRangeSlice(Util.range("", ""), null, new 
IdentityQueryFilter(), 1000);
+ assertEquals(1, rows.size());
+ }
+ 
+ @Test
  public void testScrubDeletedRow() throws IOException, ExecutionException, 
InterruptedException, ConfigurationException
  {
  CompactionManager.instance.disableAutoCompaction();
@@@ -207,4 -256,20 +258,20 @@@
  
  cfs.forceBlockingFlush();
  }
+ 
 -protected void fillCounterCF(ColumnFamilyStore cfs, int rowsPerSSTable) 
throws ExecutionException, InterruptedException, IOException
++protected void fillCounterCF(ColumnFamilyStore cfs, int rowsPerSSTable) 
throws ExecutionException, InterruptedException, IOException, 
WriteTimeoutException
+ {
+ for (int i = 0; i < rowsPerSSTable; i++)
+ {
+ String key = String.valueOf(i);
+ ColumnFamily cf = 
TreeMapBackedSortedColumns.factory.create(KEYSPACE, COUNTER_CF);
 -RowMutation rm = new RowMutation(KEYSPACE, 
ByteBufferUtil.bytes(key), cf);
 -rm.addCounter(COUNTER_CF, ByteBufferUtil.bytes("Column1"), 100);
++Mutation rm = new Mutation(KEYSPACE, ByteBufferUtil.bytes(key), 
cf);
++rm.addCounter(COUNTER_CF, cellname("Column1"), 100);
+ CounterMutation cm = new CounterMutation(rm, 
ConsistencyLevel.ONE);
+ cm.apply();
+ }
+ 
+ cfs.forceBlockingFlush();
+ }
+ 
 -}
 +}
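The skipCorrupted contract this test exercises can be sketched as a toy model (plain Python, illustrative only — not Cassandra's actual Scrubber; the names here are invented for the sketch):

```python
class CorruptPartitionError(Exception):
    """Stand-in for the IOError/CorruptSSTableException the test expects."""
    pass

def scrub(partitions, skip_corrupted=False):
    """Toy scrub: copy readable partitions to a new list.

    A partition whose data is None models an unreadable/corrupt partition.
    With skip_corrupted=False, the first corrupt partition aborts the scrub;
    with skip_corrupted=True, it is dropped (its updates are lost), matching
    the behavior the test verifies on the corrupted counter row.
    """
    clean = []
    for key, data in partitions:
        if data is None:
            if skip_corrupted:
                continue  # skip the corrupt partition and keep going
            raise CorruptPartitionError(key)
        clean.append((key, data))
    return clean
```

As in the test, scrubbing two rows where one is corrupted either fails outright or yields one fewer row, depending on the flag.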



[1/4] git commit: Let scrub optionally skip broken counter partitions

2014-02-03 Thread aleksey
Updated Branches:
  refs/heads/trunk fc91071c0 -> 63f110b5e


Let scrub optionally skip broken counter partitions

patch by Tyler Hobbs; reviewed by Aleksey Yeschenko for CASSANDRA-5930


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/728c4fa9
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/728c4fa9
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/728c4fa9

Branch: refs/heads/trunk
Commit: 728c4fa9bf2b2c11dbc61c8e5536b1542abc1ccb
Parents: b713721
Author: Aleksey Yeschenko alek...@apache.org
Authored: Mon Feb 3 23:01:31 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Mon Feb 3 23:01:31 2014 +0300

--
 CHANGES.txt |  4 +
 NEWS.txt| 12 ++-
 .../apache/cassandra/db/ColumnFamilyStore.java  |  4 +-
 .../db/compaction/CompactionManager.java| 12 +--
 .../cassandra/db/compaction/Scrubber.java   | 37 ++---
 .../cassandra/service/StorageService.java   |  4 +-
 .../cassandra/service/StorageServiceMBean.java  |  2 +-
 .../org/apache/cassandra/tools/NodeCmd.java |  6 +-
 .../org/apache/cassandra/tools/NodeProbe.java   |  4 +-
 .../cassandra/tools/StandaloneScrubber.java |  6 +-
 .../apache/cassandra/tools/NodeToolHelp.yaml|  6 +-
 .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++--
 12 files changed, 140 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 13b4c5b..a1a58a3 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,3 +1,7 @@
+2.0.6
+ * Let scrub optionally skip broken counter partitions (CASSANDRA-5930)
+
+
 2.0.5
  * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609)
  * Add ks.cf names to tombstone logging (CASSANDRA-6597)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/NEWS.txt
--
diff --git a/NEWS.txt b/NEWS.txt
index 92446c8..b21fbaa 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -14,11 +14,21 @@ restore snapshots created with the previous major version 
using the
 using the provided 'sstableupgrade' tool.
 
 
+2.0.6
+=
+
+New features
+
+- Scrub can now optionally skip corrupt counter partitions. Please note
+  that this will lead to the loss of all the counter updates in the skipped
+  partition. See the --skip-corrupted option.
+
+
 2.0.5
 =
 
 New features
-
+
 - Batchlog replay can be, and is throttled by default now.
   See batchlog_replay_throttle_in_kb setting in cassandra.yaml.
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 8750026..38d87db 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -1115,12 +1115,12 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 CompactionManager.instance.performCleanup(ColumnFamilyStore.this, 
renewer);
 }
 
-public void scrub(boolean disableSnapshot) throws ExecutionException, 
InterruptedException
+public void scrub(boolean disableSnapshot, boolean skipCorrupted) throws 
ExecutionException, InterruptedException
 {
 // skip snapshot creation during scrub, SEE JIRA 5891
 if(!disableSnapshot)
 snapshotWithoutFlush("pre-scrub-" + System.currentTimeMillis());
-CompactionManager.instance.performScrub(ColumnFamilyStore.this);
+CompactionManager.instance.performScrub(ColumnFamilyStore.this, 
skipCorrupted);
 }
 
 public void sstablesRewrite(boolean excludeCurrentVersion) throws 
ExecutionException, InterruptedException

http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
--
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java 
b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
index 168ee02..48900c8 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
@@ -227,13 +227,13 @@ public class CompactionManager implements 
CompactionManagerMBean
 executor.submit(runnable).get();
 }
 
-public void performScrub(ColumnFamilyStore 

[4/4] git commit: Merge branch 'cassandra-2.0' into trunk

2014-02-03 Thread aleksey
Merge branch 'cassandra-2.0' into trunk

Conflicts:
CHANGES.txt
src/java/org/apache/cassandra/tools/NodeCmd.java
src/resources/org/apache/cassandra/tools/NodeToolHelp.yaml


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/63f110b5
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/63f110b5
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/63f110b5

Branch: refs/heads/trunk
Commit: 63f110b5e058217c1d7e3d178b367b918ca2f856
Parents: fc91071 728c4fa
Author: Aleksey Yeschenko alek...@apache.org
Authored: Mon Feb 3 23:32:23 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Mon Feb 3 23:32:23 2014 +0300

--
 CHANGES.txt |  4 +
 NEWS.txt| 12 ++-
 .../apache/cassandra/db/ColumnFamilyStore.java  |  4 +-
 .../db/compaction/CompactionManager.java| 12 +--
 .../cassandra/db/compaction/Scrubber.java   | 37 ++---
 .../cassandra/service/StorageService.java   |  4 +-
 .../cassandra/service/StorageServiceMBean.java  |  2 +-
 .../org/apache/cassandra/tools/NodeProbe.java   |  4 +-
 .../org/apache/cassandra/tools/NodeTool.java| 11 ++-
 .../cassandra/tools/StandaloneScrubber.java |  6 +-
 .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++--
 11 files changed, 141 insertions(+), 36 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/CHANGES.txt
--
diff --cc CHANGES.txt
index 6ca163a,a1a58a3..f9da65c
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,32 -1,7 +1,36 @@@
 +2.1
 + * add listsnapshots command to nodetool (CASSANDRA-5742)
 + * Introduce AtomicBTreeColumns (CASSANDRA-6271)
 + * Multithreaded commitlog (CASSANDRA-3578)
 + * allocate fixed index summary memory pool and resample cold index summaries 
 +   to use less memory (CASSANDRA-5519)
 + * Removed multithreaded compaction (CASSANDRA-6142)
 + * Parallelize fetching rows for low-cardinality indexes (CASSANDRA-1337)
 + * change logging from log4j to logback (CASSANDRA-5883)
 + * switch to LZ4 compression for internode communication (CASSANDRA-5887)
 + * Stop using Thrift-generated Index* classes internally (CASSANDRA-5971)
 + * Remove 1.2 network compatibility code (CASSANDRA-5960)
 + * Remove leveled json manifest migration code (CASSANDRA-5996)
 + * Remove CFDefinition (CASSANDRA-6253)
 + * Use AtomicIntegerFieldUpdater in RefCountedMemory (CASSANDRA-6278)
 + * User-defined types for CQL3 (CASSANDRA-5590)
 + * Use of o.a.c.metrics in nodetool (CASSANDRA-5871, 6406)
 + * Batch read from OTC's queue and cleanup (CASSANDRA-1632)
 + * Secondary index support for collections (CASSANDRA-4511, 6383)
 + * SSTable metadata(Stats.db) format change (CASSANDRA-6356)
 + * Push composites support in the storage engine
 +   (CASSANDRA-5417, CASSANDRA-6520)
 + * Add snapshot space used to cfstats (CASSANDRA-6231)
 + * Add cardinality estimator for key count estimation (CASSANDRA-5906)
 + * CF id is changed to be non-deterministic. Data dir/key cache are created
 +   uniquely for CF id (CASSANDRA-5202)
 + * New counters implementation (CASSANDRA-6504)
 +
 +
+ 2.0.6
+  * Let scrub optionally skip broken counter partitions (CASSANDRA-5930)
+ 
+ 
  2.0.5
   * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609)
   * Add ks.cf names to tombstone logging (CASSANDRA-6597)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/NEWS.txt
--
diff --cc NEWS.txt
index 72b898e,b21fbaa..185f60c
--- a/NEWS.txt
+++ b/NEWS.txt
@@@ -13,37 -13,17 +13,47 @@@ restore snapshots created with the prev
  'sstableloader' tool. You can upgrade the file format of your snapshots
  using the provided 'sstableupgrade' tool.
  
 +2.1
 +===
 +
 +New features
 +
 +   - SSTable data directory name is slightly changed. Each directory will
 + have hex string appended after CF name, e.g.
 + ks/cf-5be396077b811e3a3ab9dc4b9ac088d/
 + This hex string part represents unique ColumnFamily ID.
 + Note that existing directories are used as is, so only newly created
 + directories after upgrade have new directory name format.
 +   - Saved key cache files also have ColumnFamily ID in their file name.
 +
 +Upgrading
 +-
 +   - Rolling upgrades from anything pre-2.0.5 is not supported.
 +   - For leveled compaction users, 2.0 must be atleast started before
 + upgrading to 2.1 due to the fact that the old JSON leveled
 + manifest is migrated into the sstable metadata files on startup
 + in 2.0 and this code is gone from 2.1.
 +   - For size-tiered compaction users, Cassandra now defaults to ignoring
 + 

[Cassandra Wiki] Trivial Update of ThirdPartySupport by AlekseyYeschenko

2014-02-03 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ThirdPartySupport page has been changed by AlekseyYeschenko:
https://wiki.apache.org/cassandra/ThirdPartySupport?action=diff&rev1=42&rev2=43

  Companies providing support for Apache Cassandra are not endorsed by the 
Apache Software Foundation, although some of these companies employ 
[[Committers]] to the Apache project.
  
- Companies that employ Apache Cassandra [[Committers]]:
+ == Companies that employ Apache Cassandra Committers: ==
  
  {{http://www.datastax.com/wp-content/themes/datastax-custom/images/logo.png}} 
[[http://datastax.com|Datastax]], the commercial leader in Apache Cassandra™ 
offers products and services that make it easy for customers to build, deploy 
and operate elastically scalable and cloud-optimized applications and data 
services. [[http://datastax.com|DataStax]] has over 100 customers, including 
leaders such as Netflix, Cisco, Rackspace, HP, Constant Contact and 
[[http://www.datastax.com/cassandrausers|more]], and spanning verticals 
including web, financial services, telecommunications, logistics and government.
  
- Other companies:
+ == Other companies: ==
  
  {{http://www.acunu.com/uploads/1/1/5/5/11559475/1335714080.png}} 
[[http://www.acunu.com|Acunu]] are world experts in Apache Cassandra and 
beyond. Some of the most challenging Cassandra deployments already rely on 
Acunu's technology, training and support. With a focus real time applications, 
Acunu makes it easy to build Cassandra based real-time Big Data solutions that 
derive instant answers from event streams and deliver fresh insight
  


[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-02-03 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889939#comment-13889939
 ] 

Lyuben Todorov commented on CASSANDRA-6572:
---

I think it would be easier for users to understand what's going on if we record 
the CQL query string in QP#processStatement and pass it to a function in SP (so 
the majority of the work can be done in SP while still letting us capture the 
CQL string, which is easy to understand), and then save it to a system table 
(haven't thought about the name yet) along with the timestamp of execution. 
This will give us a good starting point. 
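A minimal sketch of that record-then-replay idea (toy Python; all names are hypothetical, this is not Cassandra code):

```python
import time

class WorkloadRecorder:
    """Toy sketch: capture each CQL string with its execution timestamp
    so the workload can be replayed later against another cluster."""

    def __init__(self):
        self.log = []  # stand-in for the proposed system table

    def record(self, cql_string):
        # called where the query string is still available (cf. QP#processStatement)
        self.log.append((time.time(), cql_string))

    def replay(self, execute):
        # re-issue queries in their original execution order
        for ts, cql in sorted(self.log):
            execute(cql)
```

In the real feature the `execute` callback would submit each statement back through the query processor; here it is just a function parameter.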

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.6


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a trace

2014-02-03 Thread Robert Coli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889940#comment-13889940
 ] 

Robert Coli commented on CASSANDRA-6510:


This bug seems to have the implication that no ConsistencyLevel has had its 
supposed meaning for the duration of the bug, because there is no guarantee 
that the acknowledged-to-the-client local write actually succeeds? Is that 
correct?

If so, this issue seems quite fundamental and serious; why did automated 
testing not surface it? Is there now a test which covers this case?

What is the "since" for this issue? Looks like at least 1.2.0?

 Don't drop local mutations without a trace
 --

 Key: CASSANDRA-6510
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
 Fix For: 1.2.14, 2.0.4

 Attachments: 6510.txt


 SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
 mutations get dropped without leaving a trace. SP.insertLocal() should be 
 using LocalMutationRunnable instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a trace

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889949#comment-13889949
 ] 

Jonathan Ellis commented on CASSANDRA-6510:
---

Nope, that's not the implication.  You can easily see from the code that 
{{responseHandler.response}} only gets called after {{rm.apply}}.  That is, no 
write is acknowledge if it hasn't actually been applied.

 Don't drop local mutations without a trace
 --

 Key: CASSANDRA-6510
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
 Fix For: 1.2.14, 2.0.4

 Attachments: 6510.txt


 SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
 mutations get dropped without leaving a trace. SP.insertLocal() should be 
 using LocalMutationRunnable instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6510) Don't drop local mutations without a hint

2014-02-03 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-6510:
-

Description: 
SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
mutations get dropped without leaving a hint. SP.insertLocal() should be using 
LocalMutationRunnable instead.

Note: hints are the context here, not consistency.
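The contrast between the two runnables can be sketched as a toy model (illustrative Python; the real classes are Cassandra internals, and these helpers are invented for the sketch):

```python
def droppable(mutation, timed_out):
    """Toy DroppableRunnable: a timed-out local mutation is silently
    dropped -- the behavior this ticket fixes."""
    if timed_out:
        return  # dropped without a hint
    mutation()

def local_mutation(mutation, timed_out, hints):
    """Toy LocalMutationRunnable: on timeout, record a hint so the
    mutation can still be delivered later."""
    if timed_out:
        hints.append(mutation)  # leave a hint instead of vanishing
        return
    mutation()
```

The difference is only what happens on the timeout path: one variant loses the mutation, the other parks it for later delivery.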

  was:SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
mutations get dropped without leaving a trace. SP.insertLocal() should be using 
LocalMutationRunnable instead.

Summary: Don't drop local mutations without a hint  (was: Don't drop 
local mutations without a trace)

 Don't drop local mutations without a hint
 -

 Key: CASSANDRA-6510
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
 Fix For: 1.2.14, 2.0.4

 Attachments: 6510.txt


 SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
 mutations get dropped without leaving a hint. SP.insertLocal() should be 
 using LocalMutationRunnable instead.
 Note: hints are the context here, not consistency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a hint

2014-02-03 Thread Robert Coli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889968#comment-13889968
 ] 

Robert Coli commented on CASSANDRA-6510:


Thanks for the clarification. Others who look to JIRA to understand impact will 
appreciate not having to try to deduce it from reading the patch.

What is the "since" for this issue? Looks like at least 1.2.0?

 Don't drop local mutations without a hint
 -

 Key: CASSANDRA-6510
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
 Fix For: 1.2.14, 2.0.4

 Attachments: 6510.txt


 SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
 mutations get dropped without leaving a hint. SP.insertLocal() should be 
 using LocalMutationRunnable instead.
 Note: hints are the context here, not consistency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


git commit: Unscarify 6510 CHANGES

2014-02-03 Thread aleksey
Updated Branches:
  refs/heads/cassandra-1.2 3f9875c7f -> 814a91209


Unscarify 6510 CHANGES


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120

Branch: refs/heads/cassandra-1.2
Commit: 814a91209418206f791eda1cebc83262c9e225f0
Parents: 3f9875c
Author: Aleksey Yeschenko alek...@apache.org
Authored: Tue Feb 4 01:00:29 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Feb 4 01:00:29 2014 +0300

--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 68bed3b..981f977 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,7 +5,7 @@
  * Allow executing CREATE statements multiple times (CASSANDRA-6471)
  * Don't send confusing info with timeouts (CASSANDRA-6491)
  * Don't resubmit counter mutation runnables internally (CASSANDRA-6427)
- * Don't drop local mutations without a trace (CASSANDRA-6510)
+ * Don't drop local mutations without a hint (CASSANDRA-6510)
  * Don't allow null max_hint_window_in_ms (CASSANDRA-6419)
  * Validate SliceRange start and finish lengths (CASSANDRA-6521)
  * fsync compression metadata (CASSANDRA-6531)



[1/3] git commit: Unscarify 6510 CHANGES

2014-02-03 Thread aleksey
Updated Branches:
  refs/heads/trunk 63f110b5e -> 78f71420c


Unscarify 6510 CHANGES


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120

Branch: refs/heads/trunk
Commit: 814a91209418206f791eda1cebc83262c9e225f0
Parents: 3f9875c
Author: Aleksey Yeschenko alek...@apache.org
Authored: Tue Feb 4 01:00:29 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Feb 4 01:00:29 2014 +0300

--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 68bed3b..981f977 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,7 +5,7 @@
  * Allow executing CREATE statements multiple times (CASSANDRA-6471)
  * Don't send confusing info with timeouts (CASSANDRA-6491)
  * Don't resubmit counter mutation runnables internally (CASSANDRA-6427)
- * Don't drop local mutations without a trace (CASSANDRA-6510)
+ * Don't drop local mutations without a hint (CASSANDRA-6510)
  * Don't allow null max_hint_window_in_ms (CASSANDRA-6419)
  * Validate SliceRange start and finish lengths (CASSANDRA-6521)
  * fsync compression metadata (CASSANDRA-6531)



[2/3] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0

2014-02-03 Thread aleksey
Merge branch 'cassandra-1.2' into cassandra-2.0

Conflicts:
CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/066d00ba
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/066d00ba
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/066d00ba

Branch: refs/heads/trunk
Commit: 066d00ba5183a3a37b962334d0442edaaf9bebc8
Parents: 728c4fa 814a912
Author: Aleksey Yeschenko alek...@apache.org
Authored: Tue Feb 4 01:01:42 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Feb 4 01:01:42 2014 +0300

--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/066d00ba/CHANGES.txt
--
diff --cc CHANGES.txt
index a1a58a3,981f977..4440942
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -77,51 -45,9 +77,51 @@@ Merged from 1.2
 (CASSANDRA-6413)
   * (Hadoop) add describe_local_ring (CASSANDRA-6268)
   * Fix handling of concurrent directory creation failure (CASSANDRA-6459)
 + * Allow executing CREATE statements multiple times (CASSANDRA-6471)
 + * Don't send confusing info with timeouts (CASSANDRA-6491)
 + * Don't resubmit counter mutation runnables internally (CASSANDRA-6427)
-  * Don't drop local mutations without a trace (CASSANDRA-6510)
++ * Don't drop local mutations without a hint (CASSANDRA-6510)
 + * Don't allow null max_hint_window_in_ms (CASSANDRA-6419)
 + * Validate SliceRange start and finish lengths (CASSANDRA-6521)
  
  
 -1.2.12
 +2.0.3
 + * Fix FD leak on slice read path (CASSANDRA-6275)
 + * Cancel read meter task when closing SSTR (CASSANDRA-6358)
 + * free off-heap IndexSummary during bulk (CASSANDRA-6359)
 + * Recover from IOException in accept() thread (CASSANDRA-6349)
 + * Improve Gossip tolerance of abnormally slow tasks (CASSANDRA-6338)
 + * Fix trying to hint timed out counter writes (CASSANDRA-6322)
 + * Allow restoring specific columnfamilies from archived CL (CASSANDRA-4809)
 + * Avoid flushing compaction_history after each operation (CASSANDRA-6287)
 + * Fix repair assertion error when tombstones expire (CASSANDRA-6277)
 + * Skip loading corrupt key cache (CASSANDRA-6260)
 + * Fixes for compacting larger-than-memory rows (CASSANDRA-6274)
 + * Compact hottest sstables first and optionally omit coldest from
 +   compaction entirely (CASSANDRA-6109)
 + * Fix modifying column_metadata from thrift (CASSANDRA-6182)
 + * cqlsh: fix LIST USERS output (CASSANDRA-6242)
 + * Add IRequestSink interface (CASSANDRA-6248)
 + * Update memtable size while flushing (CASSANDRA-6249)
 + * Provide hooks around CQL2/CQL3 statement execution (CASSANDRA-6252)
 + * Require Permission.SELECT for CAS updates (CASSANDRA-6247)
 + * New CQL-aware SSTableWriter (CASSANDRA-5894)
 + * Reject CAS operation when the protocol v1 is used (CASSANDRA-6270)
 + * Correctly throw error when frame too large (CASSANDRA-5981)
 + * Fix serialization bug in PagedRange with 2ndary indexes (CASSANDRA-6299)
 + * Fix CQL3 table validation in Thrift (CASSANDRA-6140)
 + * Fix bug missing results with IN clauses (CASSANDRA-6327)
 + * Fix paging with reversed slices (CASSANDRA-6343)
 + * Set minTimestamp correctly to be able to drop expired sstables 
(CASSANDRA-6337)
 + * Support NaN and Infinity as float literals (CASSANDRA-6003)
 + * Remove RF from nodetool ring output (CASSANDRA-6289)
 + * Fix attempting to flush empty rows (CASSANDRA-6374)
 + * Fix potential out of bounds exception when paging (CASSANDRA-6333)
 +Merged from 1.2:
 + * Optimize FD phi calculation (CASSANDRA-6386)
 + * Improve initial FD phi estimate when starting up (CASSANDRA-6385)
 + * Don't list CQL3 table in CLI describe even if named explicitely 
 +   (CASSANDRA-5750)
   * Invalidate row cache when dropping CF (CASSANDRA-6351)
   * add non-jamm path for cached statements (CASSANDRA-6293)
   * (Hadoop) Require CFRR batchSize to be at least 2 (CASSANDRA-6114)



[1/2] git commit: Unscarify 6510 CHANGES

2014-02-03 Thread aleksey
Updated Branches:
  refs/heads/cassandra-2.0 728c4fa9b -> 066d00ba5


Unscarify 6510 CHANGES


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120

Branch: refs/heads/cassandra-2.0
Commit: 814a91209418206f791eda1cebc83262c9e225f0
Parents: 3f9875c
Author: Aleksey Yeschenko alek...@apache.org
Authored: Tue Feb 4 01:00:29 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Feb 4 01:00:29 2014 +0300

--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 68bed3b..981f977 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,7 +5,7 @@
  * Allow executing CREATE statements multiple times (CASSANDRA-6471)
  * Don't send confusing info with timeouts (CASSANDRA-6491)
  * Don't resubmit counter mutation runnables internally (CASSANDRA-6427)
- * Don't drop local mutations without a trace (CASSANDRA-6510)
+ * Don't drop local mutations without a hint (CASSANDRA-6510)
  * Don't allow null max_hint_window_in_ms (CASSANDRA-6419)
  * Validate SliceRange start and finish lengths (CASSANDRA-6521)
  * fsync compression metadata (CASSANDRA-6531)



[3/3] git commit: Merge branch 'cassandra-2.0' into trunk

2014-02-03 Thread aleksey
Merge branch 'cassandra-2.0' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/78f71420
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/78f71420
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/78f71420

Branch: refs/heads/trunk
Commit: 78f71420c33f588dcbb82bcbd689bb4aad6dd92f
Parents: 63f110b 066d00b
Author: Aleksey Yeschenko alek...@apache.org
Authored: Tue Feb 4 01:02:08 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Feb 4 01:02:08 2014 +0300

--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/78f71420/CHANGES.txt
--



[2/2] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0

2014-02-03 Thread aleksey
Merge branch 'cassandra-1.2' into cassandra-2.0

Conflicts:
CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/066d00ba
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/066d00ba
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/066d00ba

Branch: refs/heads/cassandra-2.0
Commit: 066d00ba5183a3a37b962334d0442edaaf9bebc8
Parents: 728c4fa 814a912
Author: Aleksey Yeschenko alek...@apache.org
Authored: Tue Feb 4 01:01:42 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Feb 4 01:01:42 2014 +0300

--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/066d00ba/CHANGES.txt
--
diff --cc CHANGES.txt
index a1a58a3,981f977..4440942
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -77,51 -45,9 +77,51 @@@ Merged from 1.2
 (CASSANDRA-6413)
   * (Hadoop) add describe_local_ring (CASSANDRA-6268)
   * Fix handling of concurrent directory creation failure (CASSANDRA-6459)
 + * Allow executing CREATE statements multiple times (CASSANDRA-6471)
 + * Don't send confusing info with timeouts (CASSANDRA-6491)
 + * Don't resubmit counter mutation runnables internally (CASSANDRA-6427)
-  * Don't drop local mutations without a trace (CASSANDRA-6510)
++ * Don't drop local mutations without a hint (CASSANDRA-6510)
 + * Don't allow null max_hint_window_in_ms (CASSANDRA-6419)
 + * Validate SliceRange start and finish lengths (CASSANDRA-6521)
  
  
 -1.2.12
 +2.0.3
 + * Fix FD leak on slice read path (CASSANDRA-6275)
 + * Cancel read meter task when closing SSTR (CASSANDRA-6358)
 + * free off-heap IndexSummary during bulk (CASSANDRA-6359)
 + * Recover from IOException in accept() thread (CASSANDRA-6349)
 + * Improve Gossip tolerance of abnormally slow tasks (CASSANDRA-6338)
 + * Fix trying to hint timed out counter writes (CASSANDRA-6322)
 + * Allow restoring specific columnfamilies from archived CL (CASSANDRA-4809)
 + * Avoid flushing compaction_history after each operation (CASSANDRA-6287)
 + * Fix repair assertion error when tombstones expire (CASSANDRA-6277)
 + * Skip loading corrupt key cache (CASSANDRA-6260)
 + * Fixes for compacting larger-than-memory rows (CASSANDRA-6274)
 + * Compact hottest sstables first and optionally omit coldest from
 +   compaction entirely (CASSANDRA-6109)
 + * Fix modifying column_metadata from thrift (CASSANDRA-6182)
 + * cqlsh: fix LIST USERS output (CASSANDRA-6242)
 + * Add IRequestSink interface (CASSANDRA-6248)
 + * Update memtable size while flushing (CASSANDRA-6249)
 + * Provide hooks around CQL2/CQL3 statement execution (CASSANDRA-6252)
 + * Require Permission.SELECT for CAS updates (CASSANDRA-6247)
 + * New CQL-aware SSTableWriter (CASSANDRA-5894)
 + * Reject CAS operation when the protocol v1 is used (CASSANDRA-6270)
 + * Correctly throw error when frame too large (CASSANDRA-5981)
 + * Fix serialization bug in PagedRange with 2ndary indexes (CASSANDRA-6299)
 + * Fix CQL3 table validation in Thrift (CASSANDRA-6140)
 + * Fix bug missing results with IN clauses (CASSANDRA-6327)
 + * Fix paging with reversed slices (CASSANDRA-6343)
 + * Set minTimestamp correctly to be able to drop expired sstables 
(CASSANDRA-6337)
 + * Support NaN and Infinity as float literals (CASSANDRA-6003)
 + * Remove RF from nodetool ring output (CASSANDRA-6289)
 + * Fix attempting to flush empty rows (CASSANDRA-6374)
 + * Fix potential out of bounds exception when paging (CASSANDRA-6333)
 +Merged from 1.2:
 + * Optimize FD phi calculation (CASSANDRA-6386)
 + * Improve initial FD phi estimate when starting up (CASSANDRA-6385)
 + * Don't list CQL3 table in CLI describe even if named explicitely 
 +   (CASSANDRA-5750)
   * Invalidate row cache when dropping CF (CASSANDRA-6351)
   * add non-jamm path for cached statements (CASSANDRA-6293)
   * (Hadoop) Require CFRR batchSize to be at least 2 (CASSANDRA-6114)



[jira] [Comment Edited] (CASSANDRA-6510) Don't drop local mutations without a hint

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889949#comment-13889949
 ] 

Jonathan Ellis edited comment on CASSANDRA-6510 at 2/3/14 10:01 PM:


Nope, that's not the implication.  You can see from the code that 
{{responseHandler.response}} only gets called after {{rm.apply}}.  That is, no 
write is acknowledged if it hasn't actually been applied.


was (Author: jbellis):
Nope, that's not the implication.  You can easily see from the code that 
{{responseHandler.response}} only gets called after {{rm.apply}}.  That is, no 
write is acknowledge if it hasn't actually been applied.

 Don't drop local mutations without a hint
 -

 Key: CASSANDRA-6510
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
 Fix For: 1.2.14, 2.0.4

 Attachments: 6510.txt


 SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
 mutations get dropped without leaving a hint. SP.insertLocal() should be 
 using LocalMutationRunnable instead.
 Note: hints are the context here, not consistency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5863) Create a Decompressed Chunk [block] Cache

2014-02-03 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889997#comment-13889997
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


Just to keep everybody updated: I didn't forget about this, although I got 
distracted by multiple things coming up simultaneously. I tried multiple ways of 
using the existing cache, none of which yielded good performance. As Jake mentioned 
previously, the hard part is tracking hotness of the sections plus a low-cost lookup 
for already decompressed chunks. I have one more idea how to make it work; will 
keep you posted...

 Create a Decompressed Chunk [block] Cache
 -

 Key: CASSANDRA-5863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Pavel Yaskevich
  Labels: performance
 Fix For: 2.1


 Currently, for every read, the CRAR reads each compressed chunk into a 
 byte[], sends it to ICompressor, gets back another byte[] and verifies a 
 checksum.  
 This process is where the majority of time is spent in a read request.  
 Before compression, we would have zero-copy of data and could respond 
 directly from the page-cache.
 It would be useful to have some kind of Chunk cache that could speed up this 
 process for hot data. Initially this could be an off-heap cache, but it would 
 be great to put these decompressed chunks onto a SSD so the hot data lives on 
 a fast disk similar to https://github.com/facebook/flashcache.
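The read path described above (read compressed chunk, decompress, verify checksum) could be fronted by a small LRU map keyed by file and chunk offset, so hot chunks skip the decompress step on repeat reads. A minimal sketch, not Cassandra's actual API — the Key record and capacity handling are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChunkCache {
    // Hypothetical cache key: which file and which compressed chunk within it.
    record Key(String file, long chunkOffset) {}

    private final Map<Key, byte[]> lru;

    ChunkCache(int maxChunks) {
        // accessOrder=true turns LinkedHashMap into an LRU; the eldest
        // (least recently used) entry is evicted once capacity is exceeded.
        this.lru = new LinkedHashMap<Key, byte[]>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Key, byte[]> e) {
                return size() > maxChunks;
            }
        };
    }

    byte[] get(Key k) { return lru.get(k); }                    // null on miss
    void put(Key k, byte[] decompressed) { lru.put(k, decompressed); }
}
```

On a hit, the caller can serve the decompressed bytes directly instead of going through ICompressor and the checksum again.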





[jira] [Commented] (CASSANDRA-5863) Create a Decompressed Chunk [block] Cache

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890008#comment-13890008
 ] 

Jonathan Ellis commented on CASSANDRA-5863:
---

Thanks for the update.

 Create a Decompressed Chunk [block] Cache
 -

 Key: CASSANDRA-5863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Pavel Yaskevich
  Labels: performance
 Fix For: 2.1


 Currently, for every read, the CRAR reads each compressed chunk into a 
 byte[], sends it to ICompressor, gets back another byte[] and verifies a 
 checksum.  
 This process is where the majority of time is spent in a read request.  
 Before compression, we would have zero-copy of data and could respond 
 directly from the page-cache.
 It would be useful to have some kind of Chunk cache that could speed up this 
 process for hot data. Initially this could be an off-heap cache, but it would 
 be great to put these decompressed chunks onto a SSD so the hot data lives on 
 a fast disk similar to https://github.com/facebook/flashcache.





[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890012#comment-13890012
 ] 

Jonathan Ellis commented on CASSANDRA-6572:
---

Why split the work across two classes instead of doing it all in QP?

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.6


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.





[jira] [Updated] (CASSANDRA-6510) Don't drop local mutations without a hint

2014-02-03 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-6510:
-

Since Version: 1.2.1

 Don't drop local mutations without a hint
 -

 Key: CASSANDRA-6510
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
 Fix For: 1.2.14, 2.0.4

 Attachments: 6510.txt


 SP.insertLocal() uses a regular DroppableRunnable, thus timed out local 
 mutations get dropped without leaving a hint. SP.insertLocal() should be 
 using LocalMutationRunnable instead.
 Note: hints are the context here, not consistency.





[jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890409#comment-13890409
 ] 

Jonathan Ellis commented on CASSANDRA-4851:
---

Agreed that this syntax is convenient.

 CQL3: improve support for paginating over composites
 

 Key: CASSANDRA-4851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 4851.txt


 Consider the following table:
 {noformat}
 CREATE TABLE test (
 k int,
 c1 int,
 c2 int,
 PRIMARY KEY (k, c1, c2)
 )
 {noformat}
 with the following data:
 {noformat}
 k | c1 | c2
 
 0 | 0  | 0
 0 | 0  | 1
 0 | 1  | 0
 0 | 1  | 1
 {noformat}
 Currently, CQL3 allows slicing over either c1 or c2:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2
 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2
 {noformat}
 but you cannot express a query that returns the 3 last records. Indeed, for 
 that you would need to do a query like say:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0)
 {noformat}
 but we don't support that.
 This can make it hard to paginate over, say, all records for {{k = 0}} (I'm 
 saying can because if the value for c2 cannot be very large, an easy 
 workaround could be to paginate by entire value of c1, which you can do).
 For the case where you only paginate to avoid OOMing on a query, 
 CASSANDRA-4415 will handle that and is probably the best solution. However, there 
 may be cases where the pagination is, say, user (as in, the user of your 
 application) triggered.
 I note that one solution would be to add OR support, at least in cases like 
 the one above. That's definitely doable, but on the other hand, we won't be 
 able to support full-blown OR, so it may not be very natural that we support 
 seemingly random combinations of OR and not others.
 Another solution would be to allow the following syntax:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0)
 {noformat}
 which would literally mean that you want records where the values of c1 and 
 c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This 
 is less SQL-like (though maybe some SQL stores have that; it's a fairly natural 
 thing to have imo), but would be much simpler to implement and probably to use too.
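The semantics the proposed tuple syntax implies can be sketched as a lexicographic comparison of clustering values. A hedged illustration in plain Java (not CQL, and not the patch itself; names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class TupleCompare {
    // Lexicographic comparison of two equal-length int tuples, as
    // (c1, c2) > (0, 0) would evaluate it: compare component by
    // component, and the first difference decides.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        // The (c1, c2) rows for k = 0 from the example, in clustering order.
        List<int[]> rows = Arrays.asList(
            new int[]{0, 0}, new int[]{0, 1}, new int[]{1, 0}, new int[]{1, 1});
        int[] bound = {0, 0};
        long matching = rows.stream().filter(r -> compare(r, bound) > 0).count();
        System.out.println(matching); // prints 3: the 3 last records
    }
}
```

This is exactly the "resume after the last row seen" primitive that user-triggered pagination needs.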





[jira] [Updated] (CASSANDRA-4851) CQL3: improve support for paginating over composites

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4851:
--

   Reviewer: Aleksey Yeschenko
Component/s: API

 CQL3: improve support for paginating over composites
 

 Key: CASSANDRA-4851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 4851.txt


 Consider the following table:
 {noformat}
 CREATE TABLE test (
 k int,
 c1 int,
 c2 int,
 PRIMARY KEY (k, c1, c2)
 )
 {noformat}
 with the following data:
 {noformat}
 k | c1 | c2
 
 0 | 0  | 0
 0 | 0  | 1
 0 | 1  | 0
 0 | 1  | 1
 {noformat}
 Currently, CQL3 allows slicing over either c1 or c2:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2
 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2
 {noformat}
 but you cannot express a query that returns the 3 last records. Indeed, for 
 that you would need to do a query like say:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0)
 {noformat}
 but we don't support that.
 This can make it hard to paginate over, say, all records for {{k = 0}} (I'm 
 saying can because if the value for c2 cannot be very large, an easy 
 workaround could be to paginate by entire value of c1, which you can do).
 For the case where you only paginate to avoid OOMing on a query, 
 CASSANDRA-4415 will handle that and is probably the best solution. However, there 
 may be cases where the pagination is, say, user (as in, the user of your 
 application) triggered.
 I note that one solution would be to add OR support, at least in cases like 
 the one above. That's definitely doable, but on the other hand, we won't be 
 able to support full-blown OR, so it may not be very natural that we support 
 seemingly random combinations of OR and not others.
 Another solution would be to allow the following syntax:
 {noformat}
 SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0)
 {noformat}
 which would literally mean that you want records where the values of c1 and 
 c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This 
 is less SQL-like (though maybe some SQL stores have that; it's a fairly natural 
 thing to have imo), but would be much simpler to implement and probably to use too.





[jira] [Commented] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890411#comment-13890411
 ] 

Jonathan Ellis commented on CASSANDRA-6645:
---

On the first hunk, should 
{{indexManager.getIndexForColumn(expression.column_name)}} even be including 
non-CF indexes?

 upgradesstables causes NPE for secondary indexes without an underlying column 
 family
 

 Key: CASSANDRA-6645
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6645
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sergio Bossa
Assignee: Sergio Bossa
 Fix For: 2.0.6

 Attachments: CASSANDRA-6645.patch


 SecondaryIndex#getIndexCfs is allowed to return null by contract, if the 
 index is not backed by a column family, but this causes an NPE as 
 StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do 
 not check for null values.
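A minimal sketch of the kind of null guard the fix presumably adds when collecting index column families (names are simplified and hypothetical; only the contract that getIndexCfs may return null is from the report):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexCfsGuard {
    // Hypothetical stand-in for SecondaryIndex: backingCfs is null for
    // indexes not backed by a column family, mirroring getIndexCfs().
    record Index(String name, String backingCfs) {}

    static List<String> validColumnFamilies(List<Index> indexes) {
        List<String> out = new ArrayList<>();
        for (Index i : indexes) {
            if (i.backingCfs() == null)
                continue; // previously dereferenced unconditionally -> NPE
            out.add(i.backingCfs());
        }
        return out;
    }
}
```

With the guard, upgradesstables simply skips CF-less indexes instead of throwing.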





[jira] [Updated] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6645:
--

 Reviewer: Jonathan Ellis
Fix Version/s: 2.0.6

 upgradesstables causes NPE for secondary indexes without an underlying column 
 family
 

 Key: CASSANDRA-6645
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6645
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sergio Bossa
Assignee: Sergio Bossa
 Fix For: 2.0.6

 Attachments: CASSANDRA-6645.patch


 SecondaryIndex#getIndexCfs is allowed to return null by contract, if the 
 index is not backed by a column family, but this causes an NPE as 
 StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do 
 not check for null values.





[3/3] git commit: merge from 2.0

2014-02-03 Thread jbellis
merge from 2.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b25a63a8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b25a63a8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b25a63a8

Branch: refs/heads/trunk
Commit: b25a63a81d22e409e607ca28c39e20604332cb5d
Parents: 78f7142 039e9b9
Author: Jonathan Ellis jbel...@apache.org
Authored: Mon Feb 3 23:51:16 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Mon Feb 3 23:51:16 2014 -0600

--
 CHANGES.txt |   2 +
 .../cassandra/config/DatabaseDescriptor.java|  20 ++-
 .../org/apache/cassandra/io/util/Memory.java| 123 ++-
 .../cassandra/service/CassandraDaemon.java  |   2 +-
 .../cassandra/utils/FastByteComparisons.java|   6 +
 5 files changed, 145 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b25a63a8/CHANGES.txt
--
diff --cc CHANGES.txt
index 6a4c507,b1fade1..28278c4
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,33 -1,6 +1,35 @@@
 +2.1
 + * add listsnapshots command to nodetool (CASSANDRA-5742)
 + * Introduce AtomicBTreeColumns (CASSANDRA-6271)
 + * Multithreaded commitlog (CASSANDRA-3578)
 + * allocate fixed index summary memory pool and resample cold index summaries 
 +   to use less memory (CASSANDRA-5519)
 + * Removed multithreaded compaction (CASSANDRA-6142)
 + * Parallelize fetching rows for low-cardinality indexes (CASSANDRA-1337)
 + * change logging from log4j to logback (CASSANDRA-5883)
 + * switch to LZ4 compression for internode communication (CASSANDRA-5887)
 + * Stop using Thrift-generated Index* classes internally (CASSANDRA-5971)
 + * Remove 1.2 network compatibility code (CASSANDRA-5960)
 + * Remove leveled json manifest migration code (CASSANDRA-5996)
 + * Remove CFDefinition (CASSANDRA-6253)
 + * Use AtomicIntegerFieldUpdater in RefCountedMemory (CASSANDRA-6278)
 + * User-defined types for CQL3 (CASSANDRA-5590)
 + * Use of o.a.c.metrics in nodetool (CASSANDRA-5871, 6406)
 + * Batch read from OTC's queue and cleanup (CASSANDRA-1632)
 + * Secondary index support for collections (CASSANDRA-4511, 6383)
 + * SSTable metadata(Stats.db) format change (CASSANDRA-6356)
 + * Push composites support in the storage engine
 +   (CASSANDRA-5417, CASSANDRA-6520)
 + * Add snapshot space used to cfstats (CASSANDRA-6231)
 + * Add cardinality estimator for key count estimation (CASSANDRA-5906)
 + * CF id is changed to be non-deterministic. Data dir/key cache are created
 +   uniquely for CF id (CASSANDRA-5202)
 + * New counters implementation (CASSANDRA-6504)
 +
 +
  2.0.6
+  * Fix direct Memory on architectures that do not support unaligned long 
access
+(CASSANDRA-6628)
   * Let scrub optionally skip broken counter partitions (CASSANDRA-5930)
  
  

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b25a63a8/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
--
diff --cc src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 2793237,bd5db69..378fa8a
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@@ -177,9 -177,9 +177,9 @@@ public class DatabaseDescripto
  /* evaluate the DiskAccessMode Config directive, which also affects 
indexAccessMode selection */
  if (conf.disk_access_mode == Config.DiskAccessMode.auto)
  {
- conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard;
+ conf.disk_access_mode = hasLargeAddressSpace() ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard;
  indexAccessMode = conf.disk_access_mode;
 -logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode );
 +logger.info("DiskAccessMode 'auto' determined to be {}, indexAccessMode is {}", conf.disk_access_mode, indexAccessMode);
  }
  else if (conf.disk_access_mode == 
Config.DiskAccessMode.mmap_index_only)
  {
@@@ -1384,8 -1324,19 +1384,24 @@@
  }
  }
  
 +public static int getIndexSummaryResizeIntervalInMinutes()
 +{
 +return conf.index_summary_resize_interval_in_minutes;
 +}
++
+ public static boolean hasLargeAddressSpace()
+ {
+ // currently we just check if it's a 64bit arch, but anyway we only really care if the address space is large
+ String datamodel = System.getProperty("sun.arch.data.model");
+ if (datamodel != null)
+ {
+ switch (datamodel)
+   

[2/3] git commit: Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628

2014-02-03 Thread jbellis
Fix direct Memory on architectures that do not support unaligned long access
patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/039e9b9a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/039e9b9a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/039e9b9a

Branch: refs/heads/trunk
Commit: 039e9b9a18cbe78091231a4538b6d428deacc771
Parents: 066d00b
Author: Jonathan Ellis jbel...@apache.org
Authored: Mon Feb 3 23:50:08 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Mon Feb 3 23:50:08 2014 -0600

--
 CHANGES.txt |   2 +
 .../cassandra/config/DatabaseDescriptor.java|  20 ++-
 .../org/apache/cassandra/io/util/Memory.java| 123 ++-
 .../cassandra/service/CassandraDaemon.java  |   2 +-
 .../cassandra/utils/FastByteComparisons.java|   6 +
 5 files changed, 145 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 4440942..b1fade1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 2.0.6
+ * Fix direct Memory on architectures that do not support unaligned long access
+   (CASSANDRA-6628)
  * Let scrub optionally skip broken counter partitions (CASSANDRA-5930)
 
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
--
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java 
b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 44d9d3a..bd5db69 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -172,12 +172,12 @@ public class DatabaseDescriptor
 }
 
 if (conf.commitlog_total_space_in_mb == null)
-conf.commitlog_total_space_in_mb = System.getProperty("os.arch").contains("64") ? 1024 : 32;
+conf.commitlog_total_space_in_mb = hasLargeAddressSpace() ? 1024 : 32;
 
 /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */
 if (conf.disk_access_mode == Config.DiskAccessMode.auto)
 {
-conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard;
+conf.disk_access_mode = hasLargeAddressSpace() ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard;
 indexAccessMode = conf.disk_access_mode;
 logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode );
 }
@@ -1323,4 +1323,20 @@ public class DatabaseDescriptor
 throw new RuntimeException(e);
 }
 }
+
+public static boolean hasLargeAddressSpace()
+{
+// currently we just check if it's a 64bit arch, but anyway we only really care if the address space is large
+String datamodel = System.getProperty("sun.arch.data.model");
+if (datamodel != null)
+{
+switch (datamodel)
+{
+case "64": return true;
+case "32": return false;
+}
+}
+String arch = System.getProperty("os.arch");
+return arch.contains("64") || arch.contains("sparcv9");
+}
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/io/util/Memory.java
--
diff --git a/src/java/org/apache/cassandra/io/util/Memory.java 
b/src/java/org/apache/cassandra/io/util/Memory.java
index f276190..263205b 100644
--- a/src/java/org/apache/cassandra/io/util/Memory.java
+++ b/src/java/org/apache/cassandra/io/util/Memory.java
@@ -17,9 +17,10 @@
  */
 package org.apache.cassandra.io.util;
 
-import sun.misc.Unsafe;
+import java.nio.ByteOrder;
 
 import org.apache.cassandra.config.DatabaseDescriptor;
+import sun.misc.Unsafe;
 
 /**
  * An off-heap region of memory that must be manually free'd when no longer 
needed.
@@ -30,6 +31,16 @@ public class Memory
 private static final IAllocator allocator = 
DatabaseDescriptor.getoffHeapMemoryAllocator();
 private static final long BYTE_ARRAY_BASE_OFFSET = 
unsafe.arrayBaseOffset(byte[].class);
 
+private static final boolean bigEndian = 
ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
+private static final boolean unaligned;
+
+static
+{
+String arch = System.getProperty("os.arch");
+unaligned = 
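The static initializer above is cut off by the archive; a hedged reconstruction of the kind of platform probe it performs (the exact architecture allow-list is an assumption, not the committed code):

```java
import java.nio.ByteOrder;

public class PlatformProbe {
    // Endianness matters once multi-byte values are assembled byte-by-byte
    // on platforms that cannot read an unaligned long directly.
    static final boolean BIG_ENDIAN =
        ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

    // Assumed allow-list: x86 variants tolerate unaligned access, while
    // SPARC (and most ARM of that era) fault or trap on it.
    static boolean unaligned(String arch) {
        return arch.equals("i386") || arch.equals("x86")
            || arch.equals("amd64") || arch.equals("x86_64");
    }
}
```

When unaligned is false, Memory falls back to composing longs from individual byte reads in the native byte order.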

[1/3] git commit: Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628

2014-02-03 Thread jbellis
Updated Branches:
  refs/heads/cassandra-2.0 066d00ba5 -> 039e9b9a1
  refs/heads/trunk 78f71420c -> b25a63a81


Fix direct Memory on architectures that do not support unaligned long access
patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/039e9b9a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/039e9b9a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/039e9b9a

Branch: refs/heads/cassandra-2.0
Commit: 039e9b9a18cbe78091231a4538b6d428deacc771
Parents: 066d00b
Author: Jonathan Ellis jbel...@apache.org
Authored: Mon Feb 3 23:50:08 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Mon Feb 3 23:50:08 2014 -0600

--
 CHANGES.txt |   2 +
 .../cassandra/config/DatabaseDescriptor.java|  20 ++-
 .../org/apache/cassandra/io/util/Memory.java| 123 ++-
 .../cassandra/service/CassandraDaemon.java  |   2 +-
 .../cassandra/utils/FastByteComparisons.java|   6 +
 5 files changed, 145 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 4440942..b1fade1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 2.0.6
+ * Fix direct Memory on architectures that do not support unaligned long access
+   (CASSANDRA-6628)
  * Let scrub optionally skip broken counter partitions (CASSANDRA-5930)
 
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
--
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java 
b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 44d9d3a..bd5db69 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -172,12 +172,12 @@ public class DatabaseDescriptor
 }
 
 if (conf.commitlog_total_space_in_mb == null)
-conf.commitlog_total_space_in_mb = System.getProperty("os.arch").contains("64") ? 1024 : 32;
+conf.commitlog_total_space_in_mb = hasLargeAddressSpace() ? 1024 : 32;
 
 /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */
 if (conf.disk_access_mode == Config.DiskAccessMode.auto)
 {
-conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard;
+conf.disk_access_mode = hasLargeAddressSpace() ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard;
 indexAccessMode = conf.disk_access_mode;
 logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode );
 }
@@ -1323,4 +1323,20 @@ public class DatabaseDescriptor
 throw new RuntimeException(e);
 }
 }
+
+public static boolean hasLargeAddressSpace()
+{
+// currently we just check if it's a 64bit arch, but anyway we only really care if the address space is large
+String datamodel = System.getProperty("sun.arch.data.model");
+if (datamodel != null)
+{
+switch (datamodel)
+{
+case "64": return true;
+case "32": return false;
+}
+}
+String arch = System.getProperty("os.arch");
+return arch.contains("64") || arch.contains("sparcv9");
+}
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/io/util/Memory.java
--
diff --git a/src/java/org/apache/cassandra/io/util/Memory.java 
b/src/java/org/apache/cassandra/io/util/Memory.java
index f276190..263205b 100644
--- a/src/java/org/apache/cassandra/io/util/Memory.java
+++ b/src/java/org/apache/cassandra/io/util/Memory.java
@@ -17,9 +17,10 @@
  */
 package org.apache.cassandra.io.util;
 
-import sun.misc.Unsafe;
+import java.nio.ByteOrder;
 
 import org.apache.cassandra.config.DatabaseDescriptor;
+import sun.misc.Unsafe;
 
 /**
  * An off-heap region of memory that must be manually free'd when no longer 
needed.
@@ -30,6 +31,16 @@ public class Memory
 private static final IAllocator allocator = 
DatabaseDescriptor.getoffHeapMemoryAllocator();
 private static final long BYTE_ARRAY_BASE_OFFSET = 
unsafe.arrayBaseOffset(byte[].class);
 
+private static final boolean bigEndian = 
ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
+private static final 

[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-02-03 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890421#comment-13890421
 ] 

Lyuben Todorov commented on CASSANDRA-6572:
---

I thought it would make more sense to have this kind of functionality in 
StorageProxy, but it makes sense to keep it simple by only coding it in QP, and it 
will be better if someone else wishes to extend the functionality of the 
workload recording. 

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.6


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.





[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890422#comment-13890422
 ] 

Jonathan Ellis commented on CASSANDRA-5263:
---

Why don't we just pick the depth that covers the appropriate number of 
partitions/hashes?  2**16 = 64K, 2**17 = 128K, etc.
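The heuristic in the comment amounts to ceil(log2(partitions)), capped at a maximum depth. A hedged sketch (names are illustrative, not the eventual patch):

```java
public class MerkleDepth {
    // Smallest depth whose leaf count 2^depth covers the partition
    // estimate, capped at maxDepth (15 is the current hardcoded limit).
    static int depthFor(long estimatedPartitions, int maxDepth) {
        long n = Math.max(1, estimatedPartitions);
        int depth = 64 - Long.numberOfLeadingZeros(n - 1); // ceil(log2(n))
        return Math.min(depth, maxDepth);
    }
}
```

So 64K partitions yields depth 16, 128K yields 17, and tiny tables no longer pay for a full-depth tree.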

 Allow Merkle tree maximum depth to be configurable
 --

 Key: CASSANDRA-5263
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Affects Versions: 1.1.9
Reporter: Ahmed Bashir
Assignee: Minh Do

 Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.  
 This value should be configurable, just like phi_convict_threshold and other 
 properties.
 Given a cluster with nodes responsible for a large number of row keys, Merkle 
 tree comparisons can result in a large amount of unnecessary row keys being 
 streamed.
 Empirical testing indicates that reasonable changes to this depth (18, 20, 
 etc) don't affect the Merkle tree generation and differencing timings all 
 that much, and they can significantly reduce the amount of data being 
 streamed during repair. 





[jira] [Updated] (CASSANDRA-6472) Node hangs when Drop Keyspace / Table is executed

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6472:
--

Priority: Minor  (was: Major)
Assignee: Mikhail Stepura  (was: Benedict)

[~mishail] can you shed any light on the cqlsh hang?

 Node hangs when Drop Keyspace / Table is executed
 -

 Key: CASSANDRA-6472
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6472
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: amorton
Assignee: Mikhail Stepura
Priority: Minor
 Fix For: 2.1


 from http://www.mail-archive.com/user@cassandra.apache.org/msg33566.html
 CommitLogSegmentManager.flushDataFrom() returns a FutureTask to wait on the 
 flushes, but the task is not started in flushDataFrom(). 
 The CLSM manager thread does not use the result, and forceRecycleAll 
 (eventually called when making schema mods) does not start it, so it hangs when 
 calling get().
 plan to patch so flushDataFrom() returns a Future. 
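The hang is inherent to java.util.concurrent.FutureTask: get() blocks until someone actually runs the task. A self-contained demonstration of the failure mode (not the Cassandra code itself):

```java
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class UnstartedFutureDemo {
    public static void main(String[] args) throws Exception {
        FutureTask<String> flush = new FutureTask<>(() -> "flushed");

        // get() on a task nobody ran blocks forever; bounded here to show it.
        // This is the bug: forceRecycleAll waited on a task never started.
        try {
            flush.get(100, TimeUnit.MILLISECONDS);
            throw new AssertionError("should have timed out");
        } catch (TimeoutException expected) {
            // task state is still NEW, so get() cannot complete
        }

        flush.run();                     // starting the task...
        System.out.println(flush.get()); // ...unblocks get(): prints "flushed"
    }
}
```

Returning a future whose underlying work has actually been submitted (as the proposed patch does) removes the deadlock.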





[jira] [Resolved] (CASSANDRA-5549) Remove Table.switchLock

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-5549.
---

Resolution: Fixed
  Reviewer: Jonathan Ellis

 Remove Table.switchLock
 ---

 Key: CASSANDRA-5549
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Benedict
  Labels: performance
 Fix For: 2.1

 Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png


 As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write 
 path.  ReentrantReadWriteLock is not lightweight, even if there is no 
 contention per se between readers and writers of the lock (in Cassandra, 
 memtable updates and switches).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-5549) Remove Table.switchLock

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-5549:
--

Issue Type: Improvement  (was: Bug)

 Remove Table.switchLock
 ---

 Key: CASSANDRA-5549
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Benedict
  Labels: performance
 Fix For: 2.1

 Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png


 As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write 
 path.  ReentrantReadWriteLock is not lightweight, even if there is no 
 contention per se between readers and writers of the lock (in Cassandra, 
 memtable updates and switches).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (CASSANDRA-6594) CqlRecordWriter marked final

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-6594.
---

   Resolution: Fixed
Fix Version/s: 2.1
     Reviewer: Jonathan Ellis
     Assignee: Luca Rosellini

SGTM; committed

 CqlRecordWriter marked final
 

 Key: CASSANDRA-6594
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6594
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Luca Rosellini
Assignee: Luca Rosellini
  Labels: CQL3, HADOOP
 Fix For: 2.1

 Attachments: CqlRecordWriter.diff


 We have a use case in which we need a custom implementation of 
 CqlRecordWriter. It would be nice to have an extensible version of it (this 
 would save us the pain of replicating upstream changes). 
 See attached patch. 
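
The patch is the standard final-to-extensible refactor: drop `final` on the class and widen field visibility from `private` to `protected`. A simplified, self-contained sketch of the pattern (class and field names here are stand-ins, not the real Cassandra types):

```java
import java.util.HashMap;
import java.util.Map;

// Base class made extensible: previously `final class` with `private` fields.
class RecordWriter {
    protected final Map<String, String> clients = new HashMap<>();  // was: private

    protected void write(String key, String value) {
        clients.put(key, value);
    }
}

// A downstream customization that `final` previously made impossible
// without forking and replicating upstream changes.
class AuditingRecordWriter extends RecordWriter {
    int writes = 0;

    @Override
    protected void write(String key, String value) {
        writes++;              // custom behavior, then reuse the base logic
        super.write(key, value);
    }
}

public class ExtensibleWriterDemo {
    public static void main(String[] args) {
        AuditingRecordWriter w = new AuditingRecordWriter();
        w.write("k1", "v1");
        w.write("k2", "v2");
        System.out.println(w.writes + " writes, " + w.clients.size() + " clients");
    }
}
```

Subclasses can now override behavior while the base class keeps ownership of the shared state, which is exactly the trade-off the attached diff makes.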



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


git commit: make CqlRecordWriter extensible

2014-02-03 Thread jbellis
Updated Branches:
  refs/heads/trunk b25a63a81 -> 0842681e2


make CqlRecordWriter extensible


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0842681e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0842681e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0842681e

Branch: refs/heads/trunk
Commit: 0842681e214229b5d83283574911d70b5b050586
Parents: b25a63a
Author: Jonathan Ellis jbel...@apache.org
Authored: Tue Feb 4 00:07:30 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Tue Feb 4 00:07:30 2014 -0600

--
 .../apache/cassandra/hadoop/cql3/CqlRecordWriter.java | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/0842681e/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java
--
diff --git a/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java 
b/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java
index 27d1c70..e354ad6 100644
--- a/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java
+++ b/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java
@@ -59,21 +59,21 @@ import org.apache.thrift.transport.TTransport;
  *
  * @see CqlOutputFormat
  */
-final class CqlRecordWriter extends AbstractColumnFamilyRecordWriter<Map<String, ByteBuffer>, List<ByteBuffer>>
+class CqlRecordWriter extends AbstractColumnFamilyRecordWriter<Map<String, ByteBuffer>, List<ByteBuffer>>
 {
     private static final Logger logger = LoggerFactory.getLogger(CqlRecordWriter.class);
 
     // handles for clients for each range running in the threadpool
-    private final Map<Range, RangeClient> clients;
+    protected final Map<Range, RangeClient> clients;
 
     // host to prepared statement id mappings
-    private ConcurrentHashMap<Cassandra.Client, Integer> preparedStatements = new ConcurrentHashMap<Cassandra.Client, Integer>();
+    protected final ConcurrentHashMap<Cassandra.Client, Integer> preparedStatements = new ConcurrentHashMap<Cassandra.Client, Integer>();
 
-    private final String cql;
+    protected final String cql;
 
-    private AbstractType<?> keyValidator;
-    private String [] partitionKeyColumns;
-    private List<String> clusterColumns;
+    protected AbstractType<?> keyValidator;
+    protected String [] partitionKeyColumns;
+    protected List<String> clusterColumns;
 
 /**
  * Upon construction, obtain the map that this writer will use to collect



[jira] [Commented] (CASSANDRA-6157) Selectively Disable hinted handoff for a data center

2014-02-03 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890427#comment-13890427
 ] 

Lyuben Todorov commented on CASSANDRA-6157:
---

[~kohlisankalp] Still working on this?

 Selectively Disable hinted handoff for a data center
 

 Key: CASSANDRA-6157
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6157
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: sankalp kohli
Priority: Minor
 Fix For: 2.0.6

 Attachments: trunk-6157-v2.diff, trunk-6157-v3.diff, 
 trunk-6157-v4.diff, trunk-6157.txt


 Cassandra supports disabling hints or reducing the hint window. 
 It would be helpful to have a switch that stops hints to a down data center 
 but continues hints to other DCs.
 This is helpful during data center failover, as hints would put more 
 unnecessary pressure on the DC that is taking double traffic.  Also, since 
 Cassandra is then running with reduced redundancy, we don't want to disable 
 hints within the DC. 
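
The switch being requested boils down to a per-DC predicate consulted before a hint is stored for a dead replica. A minimal sketch of that idea (names are illustrative, not the Cassandra implementation):

```java
import java.util.Set;

// Before storing a hint for a down replica, check the replica's data center
// against a set of DCs for which hinting has been disabled.
public class DcHintFilter {
    private final Set<String> disabledDcs;

    public DcHintFilter(Set<String> disabledDcs) {
        this.disabledDcs = disabledDcs;
    }

    boolean shouldHint(String targetDc) {
        return !disabledDcs.contains(targetDc);
    }

    public static void main(String[] args) {
        // Fail over away from us-east: stop hinting to it, keep hinting elsewhere.
        DcHintFilter f = new DcHintFilter(Set.of("us-east"));
        System.out.println("us-east: " + f.shouldHint("us-east"));
        System.out.println("eu-west: " + f.shouldHint("eu-west"));
    }
}
```

Hints within the surviving DCs continue as normal, so the reduced-redundancy concern in the description is addressed while the failed DC no longer accumulates hint pressure.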



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6568) sstables incorrectly getting marked as not live

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6568:
--

Fix Version/s: 2.0.6
               1.2.15

 sstables incorrectly getting marked as not live
 -

 Key: CASSANDRA-6568
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6568
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.12 with several 1.2.13 patches
Reporter: Chris Burroughs
Assignee: Marcus Eriksson
 Fix For: 1.2.15, 2.0.6


 {noformat}
 -rw-rw-r-- 14 cassandra cassandra 1.4G Nov 25 19:46 
 /data/sstables/data/ks/cf/ks-cf-ic-402383-Data.db
 -rw-rw-r-- 14 cassandra cassandra  13G Nov 26 00:04 
 /data/sstables/data/ks/cf/ks-cf-ic-402430-Data.db
 -rw-rw-r-- 14 cassandra cassandra  13G Nov 26 05:03 
 /data/sstables/data/ks/cf/ks-cf-ic-405231-Data.db
 -rw-rw-r-- 31 cassandra cassandra  21G Nov 26 08:38 
 /data/sstables/data/ks/cf/ks-cf-ic-405232-Data.db
 -rw-rw-r--  2 cassandra cassandra 2.6G Dec  3 13:44 
 /data/sstables/data/ks/cf/ks-cf-ic-434662-Data.db
 -rw-rw-r-- 14 cassandra cassandra 1.5G Dec  5 09:05 
 /data/sstables/data/ks/cf/ks-cf-ic-438698-Data.db
 -rw-rw-r--  2 cassandra cassandra 3.1G Dec  6 12:10 
 /data/sstables/data/ks/cf/ks-cf-ic-440983-Data.db
 -rw-rw-r--  2 cassandra cassandra  96M Dec  8 01:52 
 /data/sstables/data/ks/cf/ks-cf-ic-444041-Data.db
 -rw-rw-r--  2 cassandra cassandra 3.3G Dec  9 16:37 
 /data/sstables/data/ks/cf/ks-cf-ic-451116-Data.db
 -rw-rw-r--  2 cassandra cassandra 876M Dec 10 11:23 
 /data/sstables/data/ks/cf/ks-cf-ic-453552-Data.db
 -rw-rw-r--  2 cassandra cassandra 891M Dec 11 03:21 
 /data/sstables/data/ks/cf/ks-cf-ic-454518-Data.db
 -rw-rw-r--  2 cassandra cassandra 102M Dec 11 12:27 
 /data/sstables/data/ks/cf/ks-cf-ic-455429-Data.db
 -rw-rw-r--  2 cassandra cassandra 906M Dec 11 23:54 
 /data/sstables/data/ks/cf/ks-cf-ic-455533-Data.db
 -rw-rw-r--  1 cassandra cassandra 214M Dec 12 05:02 
 /data/sstables/data/ks/cf/ks-cf-ic-456426-Data.db
 -rw-rw-r--  1 cassandra cassandra 203M Dec 12 10:49 
 /data/sstables/data/ks/cf/ks-cf-ic-456879-Data.db
 -rw-rw-r--  1 cassandra cassandra  49M Dec 12 12:03 
 /data/sstables/data/ks/cf/ks-cf-ic-456963-Data.db
 -rw-rw-r-- 18 cassandra cassandra  20G Dec 25 01:09 
 /data/sstables/data/ks/cf/ks-cf-ic-507770-Data.db
 -rw-rw-r--  3 cassandra cassandra  12G Jan  8 04:22 
 /data/sstables/data/ks/cf/ks-cf-ic-567100-Data.db
 -rw-rw-r--  3 cassandra cassandra 957M Jan  8 22:51 
 /data/sstables/data/ks/cf/ks-cf-ic-569015-Data.db
 -rw-rw-r--  2 cassandra cassandra 923M Jan  9 17:04 
 /data/sstables/data/ks/cf/ks-cf-ic-571303-Data.db
 -rw-rw-r--  1 cassandra cassandra 821M Jan 10 08:20 
 /data/sstables/data/ks/cf/ks-cf-ic-574642-Data.db
 -rw-rw-r--  1 cassandra cassandra  18M Jan 10 08:48 
 /data/sstables/data/ks/cf/ks-cf-ic-574723-Data.db
 {noformat}
 I tried to do a user-defined compaction on sstables from November and got it 
 is not an active sstable.  The live sstable count from JMX was about 7, while 
 on disk there were over 20; live vs. total size showed about a ~50 GiB 
 difference.
 Forcing a GC from jconsole had no effect.  However, restarting the node 
 resulted in live sstables/bytes *increasing* to match what was on disk, and 
 user-defined compaction could now compact the November sstables.  This cluster 
 was last restarted in mid-December.
 I'm not sure what effect not live had on other operations of the cluster.  
 From the logs it seems that the files were sent at least at some point as part 
 of repair, but I don't know if they were being used for read requests or not.  
 Because the problem that got me looking in the first place was poor 
 performance, I suspect they were used for reads (and the reads were slow 
 because so many sstables were being read).  Based on their age, I presume they 
 were at the least being excluded from compaction.
 I'm not aware of any isLive() or getRefCount() to programmatically confirm 
 which nodes have this problem.  In this cluster almost all columns have a 
 14-day TTL; based on the number of nodes with November sstables, this appears 
 to be occurring on a significant fraction of the nodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6557) CommitLogSegment may be duplicated in unlikely race scenario

2014-02-03 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6557:
--

Priority: Minor  (was: Major)

 CommitLogSegment may be duplicated in unlikely race scenario
 

 Key: CASSANDRA-6557
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6557
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 2.1
Reporter: Benedict
Priority: Minor
 Fix For: 2.1


 In the unlikely event that the thread that switched to a new CLS has not 
 finished executing the cleanup of its switch by the time the CLS has finished 
 being used, it is possible for the same segment to be 'switched' in again. 
 This would be benign except that it is added to the activeSegments queue a 
 second time also, which would permit it to be recycled twice, creating two 
 different CLS objects in memory pointing to the same CLS on disk, after which 
 all bets are off.
 The issue is highly unlikely to occur, but highly unlikely means it will 
 probably happen eventually. I've fixed this based on my patch for 
 CASSANDRA-5549, using the NonBlockingQueue I introduce there to simplify the 
 logic and make it more obviously correct.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890436#comment-13890436
 ] 

Jonathan Ellis commented on CASSANDRA-6572:
---

Do you have any strong feelings here [~iamaleksey] on the QP/SP divide?

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.6


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890438#comment-13890438
 ] 

Yuki Morishita commented on CASSANDRA-5351:
---

bq. Dropping sstable to UNREPAIRED during major compaction means that all 
repaired data status is cleared for the node.

That's what I meant. The current major compaction produces one SSTable, and I 
think changing that behavior might confuse users. My opinion is to keep it as 
is.

Additional review comments:

* Does PrepareMessage need to carry around dataCenters? Only the coordinator 
sends out messages, so I think you can drop it (also from ParentRepairSession).
* Using the CF ID is preferred over the keyspace name/CF name pair.
* PrepareMessage is sent per CF, but that can produce a lot of round trips. 
Isn't one message per replica node enough?
* I think we need cleanup for parentRepairSessions when something bad happens. 
Otherwise the ParentRepairSession entries in the map keep references to 
SSTables.

I just worked on the first one above; the commit is here (on top of your 
branch): 
https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361


 Avoid repairing already-repaired data by default
 

 Key: CASSANDRA-5351
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
  Labels: repair
 Fix For: 2.1

 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 
 5351_nodetool.log


 Repair has always built its merkle tree from all the data in a columnfamily, 
 which is guaranteed to work but is inefficient.
 We can improve this by remembering which sstables have already been 
 successfully repaired, and only repairing sstables new since the last repair. 
  (This automatically makes CASSANDRA-3362 much less of a problem too.)
 The tricky part is, compaction will (if not taught otherwise) mix repaired 
 data together with non-repaired.  So we should segregate unrepaired sstables 
 from the repaired ones.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890438#comment-13890438
 ] 

Yuki Morishita edited comment on CASSANDRA-5351 at 2/4/14 6:20 AM:
---

bq. Dropping sstable to UNREPAIRED during major compaction means that all 
repaired data status is cleared for the node.

That's what I meant. The current major compaction produces one SSTable, and I 
think changing that behavior might confuse users. My opinion is to keep it as 
is.

Additional review comments:

* Does PrepareMessage need to carry around dataCenters? Only the coordinator 
sends out messages, so I think you can drop it (also from ParentRepairSession).
* Using the CF ID is preferred over the keyspace name/CF name pair.
* PrepareMessage is sent per CF, but that can produce a lot of round trips. 
Isn't one message per replica node enough?
* I think we need cleanup for parentRepairSessions when something bad happens. 
Otherwise the ParentRepairSession entries in the map keep references to 
SSTables.

I just worked on the first one above; the commit is here (on top of your 
branch): 
https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361



was (Author: yukim):
bq. Dropping sstable to UNREPAIRED during major compaction means that all 
repaired data status is cleared for the node.

That's what I meant. Current major compaction produces one SSTable and I think 
changing that behavior would confuse users, maybe. My opinion is to keep it as 
is, but .

Additional review comments:

* Does PrepareMessage needs to carry around dataCenters? Only coordinator sends 
out messages so I think you can drop it(also from ParentRepairSession).
* CF ID is preferred to use over Keyspace name/CF name pair.
* PrepareMessage is sent per CF but it can produce a lot of round trip. Isn't 
one message per replica node enough?
* I think we need clean up for parentRepairSessions when something bad 
happened. Otherwise ParentRepairSession in the map keep reference to SSTables.

I just worked on the first one above and the commit is here(on top of your 
branch): 
https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361


 Avoid repairing already-repaired data by default
 

 Key: CASSANDRA-5351
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
  Labels: repair
 Fix For: 2.1

 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 
 5351_nodetool.log


 Repair has always built its merkle tree from all the data in a columnfamily, 
 which is guaranteed to work but is inefficient.
 We can improve this by remembering which sstables have already been 
 successfully repaired, and only repairing sstables new since the last repair. 
  (This automatically makes CASSANDRA-3362 much less of a problem too.)
 The tricky part is, compaction will (if not taught otherwise) mix repaired 
 data together with non-repaired.  So we should segregate unrepaired sstables 
 from the repaired ones.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890447#comment-13890447
 ] 

Jonathan Ellis commented on CASSANDRA-5351:
---

bq. Dropping sstable to UNREPAIRED during major compaction means that all 
repaired data status is cleared for the node. Maybe we could make major 
compaction do 2 separate compactions? Ending up with 2 sstables should be fine 
for users right?

I think this is a better approach than stomping on the repair information.  
People major compact to free up disk space or improve read performance; either 
way, having a small amount of data in an unrepaired sandbox should be 
acceptable.  (If I am wrong, we can add a utility to clear repaired flags, or 
add a flag to compact to treat everything as unrepaired...  but I'd rather not 
add this complexity unless we see a clear demand for it.)
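
The "two separate compactions" idea amounts to bucketing the candidate sstables by repaired status and compacting each bucket on its own, so major compaction ends with up to two outputs instead of one. A simplified sketch of that grouping step (SSTable here is a stand-in record, not the real Cassandra class):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitMajorCompaction {
    record SSTable(String name, boolean repaired) {}

    // Partition candidate sstables into repaired / unrepaired buckets;
    // each bucket would then be major-compacted separately.
    static Map<Boolean, List<SSTable>> partition(List<SSTable> input) {
        Map<Boolean, List<SSTable>> groups = new HashMap<>();
        for (SSTable t : input)
            groups.computeIfAbsent(t.repaired(), k -> new ArrayList<>()).add(t);
        return groups;
    }

    public static void main(String[] args) {
        List<SSTable> tables = List.of(
            new SSTable("a", true), new SSTable("b", true), new SSTable("c", false));
        Map<Boolean, List<SSTable>> groups = partition(tables);
        System.out.println(groups.get(true).size() + " repaired, "
                + groups.get(false).size() + " unrepaired");
    }
}
```

Because the two buckets never mix, the repairedAt metadata survives the major compaction, which is the property the comment above argues for.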

 Avoid repairing already-repaired data by default
 

 Key: CASSANDRA-5351
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
  Labels: repair
 Fix For: 2.1

 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 
 5351_nodetool.log


 Repair has always built its merkle tree from all the data in a columnfamily, 
 which is guaranteed to work but is inefficient.
 We can improve this by remembering which sstables have already been 
 successfully repaired, and only repairing sstables new since the last repair. 
  (This automatically makes CASSANDRA-3362 much less of a problem too.)
 The tricky part is, compaction will (if not taught otherwise) mix repaired 
 data together with non-repaired.  So we should segregate unrepaired sstables 
 from the repaired ones.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable

2014-02-03 Thread Minh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890449#comment-13890449
 ] 

Minh Do commented on CASSANDRA-5263:


If I understand correctly, are you saying that if N is the total number of rows 
in all SSTables on a node for a given token range, then depth = log2(N)?  This 
works if a node does not hold too many rows.  Can we safely assume that a node 
does not hold more than 2^24 rows (16.7M rows)?  For that many rows we would 
need to build a Merkle tree of depth 24, which requires about 1.6G of heap; 
beyond that, I would say we run into heap allocation issues.  I was thinking 
earlier that depth 20 is the maximum allowable depth, and I worked my way down 
from there to compute lower-depth trees.

 Allow Merkle tree maximum depth to be configurable
 --

 Key: CASSANDRA-5263
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Affects Versions: 1.1.9
Reporter: Ahmed Bashir
Assignee: Minh Do

 Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.  
 This value should be configurable, just like phi_convict_threshold and other 
 properties.
 Given a cluster with nodes responsible for a large number of row keys, Merkle 
 tree comparisons can result in a large amount of unnecessary row keys being 
 streamed.
 Empirical testing indicates that reasonable changes to this depth (18, 20, 
 etc) don't affect the Merkle tree generation and differencing timings all 
 that much, and they can significantly reduce the amount of data being 
 streamed during repair. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable

2014-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890451#comment-13890451
 ] 

Jonathan Ellis commented on CASSANDRA-5263:
---

No, we can't assume that, but capping it at 20 is certainly better than capping 
it at 16 as it does now.

 Allow Merkle tree maximum depth to be configurable
 --

 Key: CASSANDRA-5263
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Affects Versions: 1.1.9
Reporter: Ahmed Bashir
Assignee: Minh Do

 Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.  
 This value should be configurable, just like phi_convict_threshold and other 
 properties.
 Given a cluster with nodes responsible for a large number of row keys, Merkle 
 tree comparisons can result in a large amount of unnecessary row keys being 
 streamed.
 Empirical testing indicates that reasonable changes to this depth (18, 20, 
 etc) don't affect the Merkle tree generation and differencing timings all 
 that much, and they can significantly reduce the amount of data being 
 streamed during repair. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)