Git Push Summary
Updated Tags: refs/tags/2.0.5-tentative [deleted] 0191b359f
Git Push Summary
Updated Tags: refs/tags/2.0.5-tentative [created] b71372146
[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit
[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889330#comment-13889330 ]

Marcus Eriksson commented on CASSANDRA-6364:
--------------------------------------------

About the ignore case, let's hard-code something for now - rate limit at one log error message per second, perhaps?

I don't think we should default to 'ignore' in Config.java - if someone does a minor upgrade, they most likely won't check NEWS or update their config files to add the new parameter.

The shipped config in cassandra.yaml looks wrong; it should be commit_failure_policy, not disk_failure_policy, I guess.

There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit
---------------------------------------------------------------------------------------------------------------------------------

                Key: CASSANDRA-6364
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6364
            Project: Cassandra
         Issue Type: Improvement
         Components: Core
        Environment: JBOD, single dedicated commit disk
           Reporter: J. Ryan Earl
           Assignee: Benedict
            Fix For: 2.0.5

We're doing fault testing on a pre-production Cassandra cluster. One of the tests was to simulate failure of the commit volume/disk, which in our case is on a dedicated disk. We expected failure of the commit volume to be handled somehow, but what we found was that no action was taken by Cassandra when the commit volume failed. We simulated this simply by pulling the physical disk that backed the commit volume, which resulted in filesystem I/O errors on the mount point. What then happened was that the Cassandra heap filled up to the point that it was spending 90% of its time doing garbage collection. No errors were logged in regards to the failed commit volume. Gossip on other nodes in the cluster eventually flagged the node as down. Gossip on the local node showed itself as up, and all other nodes as down. The most serious problem was that connections to the coordinator on this node became very slow due to the ongoing GC, as I assume uncommitted writes piled up on the JVM heap.

What we believe should have happened is that Cassandra should have caught the I/O error and exited with a useful log message, or otherwise done some sort of useful cleanup. Otherwise the node goes into a sort of zombie state, spending most of its time in GC, and thus slowing down any transactions that happen to use the coordinator on said node. A limit on in-memory, unflushed writes before refusing requests may also work. The point being, something should be done to handle the commit volume dying, as doing nothing ends up affecting the entire cluster. I should note, we are using: disk_failure_policy: best_effort

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
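The policy dispatch being debated here (die vs. stop vs. ignore on a commit-disk error) can be pictured in a few lines. This is an illustrative Python model under assumed names, not Cassandra's actual Config/CommitLog code:

```python
import logging

log = logging.getLogger("commitlog")

def handle_commit_disk_error(policy: str, error: OSError) -> str:
    """Dispatch on a hypothetical commit_failure_policy value.
    Returns the action taken, so callers (and tests) can observe it."""
    if policy == "die":
        # Most drastic: log and terminate the process (a real daemon
        # would call sys.exit(1) or similar here).
        log.error("Commit disk failure: %s; exiting", error)
        return "exit"
    elif policy == "stop":
        # Stop accepting traffic but keep the JVM alive for inspection.
        log.error("Commit disk failure: %s; shutting down transports", error)
        return "stop"
    elif policy == "ignore":
        # Keep running; the log message itself should be rate-limited
        # (see the rate-limit discussion in the comments).
        log.warning("Commit disk failure: %s; ignoring per policy", error)
        return "continue"
    raise ValueError(f"unknown commit_failure_policy: {policy}")
```

The trade-off discussed in the thread is exactly which of these should be the default when the operator has not set anything.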
[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit
[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889343#comment-13889343 ]

Benedict commented on CASSANDRA-6364:
-------------------------------------

bq. I don't think we should default to 'ignore' in Config.java

Well, I wasn't too sure about this. On the one hand, switching the default to stop means we could over-cautiously kill users' hosts unexpectedly, maybe resulting in interruption of service (especially, say, for our users running on SAN, as much as that is strongly discouraged). Whereas switching to ignore means we may not be durable. Neither is a great default, but both are better than before. I'm comfortable with both, so if you feel strongly it should be stop, I'll happily switch it. Perhaps I lean slightly in favour of it too, but it depends on whether the user favours durability over availability, really, so there doesn't seem to be a single correct answer to me. Note that the default disk_failure_policy is also ignore, and the prior behaviour was closest to ignore, so introducing a default that results in a failing node is somewhat unprecedented for disk failure.

bq. The shipped config in cassandra.yaml looks wrong, should be commit_failure_policy, not disk_failure_policy I guess

Right, looks like I didn't update the first or last lines I copy-pasted. Thanks.

bq. About the ignore case, lets hard code something for now - rate limit at one log error message per second perhaps?

If we're just rate limiting the log messages, I'd say one per minute might be better. But I'm not sure having the threads spin trying to make progress is useful. The PCLES, for instance, will just start burning one core until it can successfully sync, assuming it doesn't actually have to wait each time to encounter the error. I'm tempted to have a 1s pause after an error, during which we just sleep the erroring thread. Another issue that slightly concerns me is what happens if the CLES sync() starts failing, but the append and the CLA don't.

With ignore this could potentially result in us mapping in and allocating huge amounts of disk space, but not being able to sync or clear it. This might either result in lots of swapping, and/or us exceeding our max log space goal by a large margin. Since we never guarantee to keep to this, I'm not sure how much of a problem it would be, but an error down to ACLs that stops us syncing one file might potentially end up eating huge quantities of commit disk space. I'm tempted to have the CLA thread block once it hits twice its goal max space (or maybe introduce a second config parameter for a hard maximum). But I'm also tempted to leave these changes for the 2.1 branch, since it's a fairly specific failure case, and what we have is a big improvement over the current state of affairs.
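The rate limiting discussed above ("one log error message per second" or "one per minute") is a small, self-contained mechanism. A minimal Python sketch under assumed names - this is not Cassandra's actual logging code, just the idea:

```python
import time

class RateLimitedLogger:
    """Emit at most one message per `interval` seconds; drop the rest.
    The injectable clock makes the behaviour testable without sleeping."""

    def __init__(self, interval: float, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = float("-inf")  # so the first message always passes
        self.emitted = 0

    def error(self, msg: str) -> bool:
        """Return True if the message was emitted, False if suppressed."""
        now = self.clock()
        if now - self._last >= self.interval:
            self._last = now
            self.emitted += 1
            print(f"ERROR: {msg}")  # stand-in for a real logger call
            return True
        return False
```

The separate suggestion of a 1s sleep after an error addresses a different problem (a spinning sync thread burning a core), and would live in the erroring thread itself rather than in the logger.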
[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit
[ https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889346#comment-13889346 ]

Dmitry Shohov commented on CASSANDRA-6628:
------------------------------------------

I checked your patch and the JVM doesn't crash. I also did some performance tests. I don't know the usage pattern for the comparator, but my simple tests show that my changes to the comparer would make it slower than the pure Java implementation :) I fully agree that it's better to use the Java implementation on Solaris sparcv9 than to change the Unsafe implementation.

Cassandra crashes on Solaris sparcv9 using java 64bit
-----------------------------------------------------

                Key: CASSANDRA-6628
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6628
            Project: Cassandra
         Issue Type: Bug
         Components: Core
        Environment: checked 1.2.x line and 2.0.x
           Reporter: Dmitry Shohov
           Assignee: Dmitry Shohov
            Fix For: 2.0.5
        Attachments: solaris_unsafe_fix.patch, tmp.patch

When running Cassandra 2.0.4 (and other versions) on Solaris with 64-bit Java, the JVM crashes. The issue is described once in CASSANDRA-4646 but was closed as invalid. The reason for this crash is memory-alignment-related problems and incorrect sun.misc.Unsafe usage. If you look into DirectByteBuffer in the JDK, you will see that it checks os.arch before using getLong methods. I have a patch which checks os.arch and, if it is not one of the known architectures, reads longs and ints byte by byte. Although the patch fixes the problem in Cassandra, it will still crash without similar fixes in the lz4 library. I already provided the patch for the Unsafe usage in lz4.
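The "reads longs and ints byte by byte" fallback mentioned in the description is the standard portable technique for architectures that fault on unaligned wide loads. A Python sketch of the idea (the actual patch is Java; names here are illustrative):

```python
def read_long_unaligned(buf: bytes, offset: int) -> int:
    """Assemble a big-endian 64-bit value one byte at a time, the portable
    fallback for aligned-access-only architectures like sparcv9, where a
    single 8-byte load at an arbitrary offset would trap."""
    value = 0
    for i in range(8):
        value = (value << 8) | buf[offset + i]
    # Reinterpret as signed 64-bit, matching what Java's getLong yields.
    if value >= 1 << 63:
        value -= 1 << 64
    return value
```

This is slower than a single aligned load, which is consistent with the benchmark result reported in the comment above; the crash avoidance, not speed, is the point of the fallback.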
[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit
[ https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889349#comment-13889349 ]

Benedict commented on CASSANDRA-6628:
-------------------------------------

bq. my changes to comparer would make it slower than pure java implementation

This isn't very surprising given what they were doing, but it's always good to have the confirmation :-) It should be possible to make a special Unsafe comparer tailored for sparcv9 (and any other aligned-access-only architectures) that is quite a bit faster, in the manner I mention above, but it's not something we're likely to consider a priority in the near future. As always, feel free to have a crack at it yourself and submit a patch; I'd be more than happy to review.
[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit
[ https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889356#comment-13889356 ]

Benedict commented on CASSANDRA-6628:
-------------------------------------

This patch is ready for commit.
[jira] [Created] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
Sergio Bossa created CASSANDRA-6645:
------------------------------------

            Summary: upgradesstables causes NPE for secondary indexes without an underlying column family
                Key: CASSANDRA-6645
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6645
            Project: Cassandra
         Issue Type: Bug
         Components: Core
           Reporter: Sergio Bossa
           Assignee: Sergio Bossa

SecondaryIndex#getIndexCfs is allowed by contract to return null if the index is not backed by a column family, but this causes an NPE, as StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do not check for null values.
[jira] [Updated] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
[ https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Bossa updated CASSANDRA-6645:
------------------------------------

    Attachment: CASSANDRA-6645.patch
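The fix for this class of NPE amounts to tolerating the null that getIndexCfs is contractually allowed to return. A minimal Python sketch of the pattern (hypothetical data shapes, not the actual StorageService code):

```python
def valid_column_families(indexes):
    """Yield only the backing column families that actually exist.
    getIndexCfs may legitimately return None for an index that is not
    backed by a column family; skip those instead of dereferencing them
    (dereferencing None is the NPE reported in this issue)."""
    for index in indexes:
        cfs = index.get("index_cfs")  # stand-in for SecondaryIndex#getIndexCfs
        if cfs is None:
            continue
        yield cfs
```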
Git Push Summary
Updated Tags: refs/tags/cassandra-1.2.14 [created] f9bef16ae
Git Push Summary
Updated Tags: refs/tags/1.2.14-tentative [deleted] 6a9314408
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ]

Marcus Eriksson commented on CASSANDRA-5351:
--------------------------------------------

More complete version now pushed to https://github.com/krummas/cassandra/tree/marcuse/5351

Lots of testing required, but I think it is mostly 'feature-complete'. The repair flow is now:

# The repair coordinator sends out Prepare messages to all neighbours.
# All involved parties figure out what sstables should be included in the repair: if it is a full repair, all sstables are included; otherwise only the ones with repairedAt set to 0. Note that we don't do any locking of the sstables - if they are gone when we do anticompaction, that is fine, we will repair them next round.
# The repair coordinator prepares itself, waits until all neighbours have prepared, and sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step 2.
# The coordinator waits for replies and then sends AnticompactionRequests to all nodes. If we are doing a full repair, we simply skip anticompaction.

Notes:
* SSTables are tagged with repairedAt timestamps; compactions keep min(repairedAt) of the included sstables.
* nodetool repair defaults to the old behaviour. Use --incremental to use the new repairs.
* Anticompaction:
** Splits an sstable into 2 new ones: one with all keys that were in the repaired ranges, and one with the unrepaired data.
** If the repaired ranges cover the entire sstable, we just rewrite the sstable metadata. This means that the optimal way to run incremental repairs is to not do partitioner range repairs etc.
* Compaction:
** LCS:
*** We always first check if there are any unrepaired sstables to do STCS on; if there are, we do that. The reasoning is that new data (which needs compaction) is unrepaired.
*** We keep all sstables in the LeveledManifest, then filter out the unrepaired ones when getting compaction candidates etc.
** STCS:
*** Major compaction is done by taking the biggest set of sstables - so for a total major compaction, you will need to run nodetool compact twice.
*** Minors work the same way: the biggest set of sstables will be compacted.
* Streaming: a streamed sstable keeps its repairedAt time.
* BulkLoader: loaded sstables are unrepaired.
* Scrub: sets repairedAt to UNREPAIRED - since we can drop rows during scrub, the new sstable is not repaired.
* Upgradesstables: keeps the repaired status.

Avoid repairing already-repaired data by default
------------------------------------------------

                Key: CASSANDRA-5351
                URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
            Project: Cassandra
         Issue Type: Task
         Components: Core
           Reporter: Jonathan Ellis
           Assignee: Lyuben Todorov
             Labels: repair
            Fix For: 2.1
        Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log

Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is that compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones.
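The two key bookkeeping rules above - compaction keeps min(repairedAt) of its inputs, and incremental repair only considers sstables with repairedAt == 0 - can be sketched compactly. An illustrative Python model (field and function names are assumptions, not Cassandra's actual classes):

```python
from dataclasses import dataclass

UNREPAIRED = 0  # repairedAt == 0 marks an sstable as never repaired

@dataclass
class SSTable:
    name: str
    repaired_at: int  # timestamp of last successful repair, 0 if none

def compact(sstables):
    """The merged output keeps min(repairedAt): if any input is
    unrepaired, the result is unrepaired, so repaired and unrepaired
    data never silently merge into a 'repaired' sstable."""
    return SSTable("+".join(s.name for s in sstables),
                   min(s.repaired_at for s in sstables))

def repair_candidates(sstables, incremental=True):
    """Incremental repair only builds merkle trees over unrepaired
    sstables; a full repair includes everything."""
    if not incremental:
        return list(sstables)
    return [s for s in sstables if s.repaired_at == UNREPAIRED]
```

This also shows why sstables are segregated by repaired status in practice: compacting a repaired sstable with an unrepaired one would reset the output to unrepaired and throw away repair progress.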
[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ]

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:18 PM (list formatting corrected; the body is otherwise identical to the comment above).
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889498#comment-13889498 ]

Marcus Eriksson commented on CASSANDRA-5351:
--------------------------------------------

If it's unclear: the purpose of the Prepare step is that we want to do anticompaction once over all ranges involved in the repair. If we did not do that, we would have to anticompact 3 times for a nodetool repair with RF=3 (and 3*256 times with 256 vnodes, I think).
[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ]

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:30 PM (the Scrub note now reads "since we can drop rows during scrub new sstable is not repaired"; the body is otherwise identical to the comment above).
[jira] [Commented] (CASSANDRA-6622) Streaming session failures during node replace using replace_address
[ https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889546#comment-13889546 ]

Brandon Williams commented on CASSANDRA-6622:
---------------------------------------------

Can you attach logs from both the replacing node and the node that is failing the stream session?

Streaming session failures during node replace using replace_address
--------------------------------------------------------------------

                Key: CASSANDRA-6622
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6622
            Project: Cassandra
         Issue Type: Bug
        Environment: RHEL6, cassandra-2.0.4
           Reporter: Ravi Prasad
           Assignee: Brandon Williams
        Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 6622-2.0.txt

When using replace_address, the Gossiper ApplicationState is set to hibernate, which is a down state. We are seeing that the peer nodes receive the streaming plan request even before the Gossiper on them marks the replacing node as dead. As a result, streaming on the peer nodes convicts the replacing node by closing the stream handler. I think making the StorageService thread on the replacing node sleep for BROADCAST_INTERVAL before bootstrapping would avoid this scenario.
Relevant logs from the peer node (note that the Gossiper on the peer node marks the replacing node as down 2 seconds after the streaming init request):
{noformat}
 INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 StreamResultFuture.java (line 116) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap
 INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete
 WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed
 INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) InetAddress /x.x.x.x is now DOWN
ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
        at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175)
        at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
        at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
        at java.lang.Thread.run(Thread.java:722)
 INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete
 WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed
{noformat}
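The mitigation proposed in the description - pausing up to one BROADCAST_INTERVAL before bootstrapping so that peers' gossipers can observe the replacing node's hibernate state - is essentially a bounded wait on a predicate. A hedged Python sketch (function and parameter names are illustrative, not Cassandra's API):

```python
import time

def wait_for_gossip_settle(is_state_visible, broadcast_interval: float,
                           poll: float = 0.1, clock=time.monotonic,
                           sleep=time.sleep) -> bool:
    """Wait up to `broadcast_interval` seconds for `is_state_visible()`
    to report that peers have seen our gossip state; return True if it
    did, False if we timed out. Injectable clock/sleep keep it testable."""
    deadline = clock() + broadcast_interval
    while clock() < deadline:
        if is_state_visible():
            return True
        sleep(poll)
    return False
```

Whether Cassandra should poll a condition or simply sleep unconditionally is part of what this ticket debates; the sketch only illustrates the timing relationship between gossip propagation and the start of streaming.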
[jira] [Updated] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-4851: Attachment: 4851.txt Attaching a rather simple patch for this. The patch implements the tuple/vector syntax described above (so {{WHERE (c1, c2) > (1, 0)}} typically), as that's the easier and imo the most natural syntax anyway when you want to do such a thing. CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying can because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. 
That's definitely doable, but on the other hand we won't be able to support full-blown OR, so it may not be very natural that we support seemingly random combinations of OR and not others. Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have imo), but it would be much simpler to implement and probably simpler to use too. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
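The tuple semantics proposed in this ticket can be sketched outside Cassandra. The following standalone Java snippet (illustrative names, not Cassandra internals) filters rows by lexicographic tuple order, which is what a `(c1, c2) > (0, 0)` relation asks the server to do:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of the lexicographic tuple comparison behind a relation like
 * WHERE (c1, c2) > (0, 0): keep rows whose clustering columns, taken as
 * a tuple, sort strictly after the given bound.
 */
public class TupleSlice {
    /** Lexicographic compare of two equal-length int tuples. */
    public static int compareTuples(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    /** Rows of (c1, c2) strictly greater than the bound, in order. */
    public static List<int[]> after(List<int[]> rows, int[] bound) {
        return rows.stream()
                   .filter(r -> compareTuples(r, bound) > 0)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // the four rows of the example table, as (c1, c2) pairs
        List<int[]> rows = Arrays.asList(
            new int[]{0, 0}, new int[]{0, 1}, new int[]{1, 0}, new int[]{1, 1});
        // (c1, c2) > (0, 0) keeps the last three rows
        System.out.println(after(rows, new int[]{0, 0}).size()); // 3
    }
}
```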
[jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889582#comment-13889582 ] Sylvain Lebresne commented on CASSANDRA-4851: - For info, I pushed a dtest too: https://github.com/riptano/cassandra-dtest/commit/da2fb8451b465299c095b320fbfc83c90467a49b CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying can because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. That's definitely doable, but on the other hand we won't be able to support full-blown OR, so it may not be very natural that we support seemingly random combinations of OR and not others. 
Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have imo), but it would be much simpler to implement and probably simpler to use too. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5323) Revisit disabled dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889590#comment-13889590 ] Michael Shuler commented on CASSANDRA-5323: --- *All* dtests have been running on the cassandra-2.0 branch for a few days, without hanging up the entire test run :) Revisit disabled dtests --- Key: CASSANDRA-5323 URL: https://issues.apache.org/jira/browse/CASSANDRA-5323 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Michael Shuler The following dtests are disabled in buildbot; if they can be re-enabled, great, and if they can't, can they be fixed? upgrade|decommission|sstable_gen|global_row|putget_2dc|cql3_insert -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6640) Improve custom 2i performance and abstraction
[ https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889600#comment-13889600 ] Sam Tunnicliffe commented on CASSANDRA-6640: Second patch LGTM +1 Improve custom 2i performance and abstraction - Key: CASSANDRA-6640 URL: https://issues.apache.org/jira/browse/CASSANDRA-6640 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Miguel Angel Fernandez Diaz Assignee: Miguel Angel Fernandez Diaz Labels: patch, performance Fix For: 2.1 Attachments: 6640.diff, 6640v2.diff With the current implementation, the update method in SecondaryIndexManager forces an insert and a delete of a cell. That happens because we assume that we need the value of the old cell in order to locate the cell we are updating in our custom secondary index implementation. However, depending on the implementation, insert and delete operations could have much worse performance than a simple update. Moreover, if our custom secondary index doesn't use inverted indexes, we don't really need the old cell information; the key information is enough. Therefore, a good solution would be to make the update method more abstract. Thus, the update method for PerColumnSecondaryIndex would also receive the old cell information, and from that point we could decide whether we must carry out the delete+insert operation or just an update operation. I attach a patch that implements this solution. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
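The abstraction being reviewed here can be modeled with a small toy (not Cassandra's API): the manager calls one `update(key, oldValue, newValue)` hook whose default falls back to insert plus delete, while an index that never needs the old value overrides `update` with a single cheap write:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy base index: default update mirrors insert + delete of the old entry. */
abstract class ToyIndex {
    final Map<String, String> data = new HashMap<>();
    int writes = 0;

    void insert(String key, String value) { data.put(key, value); writes++; }
    void delete(String key, String oldValue) { writes++; } // old value locates the entry in a real inverted index

    void update(String key, String oldValue, String newValue) {
        insert(key, newValue);
        delete(key, oldValue);
    }
}

/** Relies on the inherited two-step update path. */
class InvertedToyIndex extends ToyIndex { }

/** Key-addressed index: the old value is irrelevant, one put suffices. */
class DirectToyIndex extends ToyIndex {
    @Override
    void update(String key, String oldValue, String newValue) {
        data.put(key, newValue);
        writes++;
    }
}

public class IndexUpdateDemo {
    public static void main(String[] args) {
        ToyIndex a = new InvertedToyIndex();
        ToyIndex b = new DirectToyIndex();
        a.update("row1", "old", "new");
        b.update("row1", "old", "new");
        System.out.println(a.writes + " vs " + b.writes); // 2 vs 1
    }
}
```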
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889646#comment-13889646 ] Yuki Morishita commented on CASSANDRA-5351: --- [~krummas] Still reviewing your patch, but I see the following problems: * Sequential (snapshot) repair is now the default, so to use this feature users need to run 'nodetool repair -par -inc', since I don't see incremental repair code in the snapshot repair code path. * Performing anti-compaction in the repair thread sequentially for all parent sessions seems problematic performance-wise. And for STCS major compaction, I prefer not to change the current behavior; dropping the compacted SSTable to UNREPAIRED is fine, I think. I think it would be a surprise for users otherwise (even though major compaction is not recommended). I'll look deeper, but that's what I have currently. Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
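The core selection step of incremental repair, as described in the ticket, can be sketched as follows. This is an illustration, not Cassandra's implementation; here an sstable with `repairedAt == 0` stands for "not yet repaired":

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch: an incremental repair only builds Merkle trees over sstables
 * that are not yet marked repaired, skipping already-repaired data.
 */
public class IncrementalRepair {
    static final class SSTable {
        final String name;
        final long repairedAt; // 0 == unrepaired (assumed convention)
        SSTable(String name, long repairedAt) { this.name = name; this.repairedAt = repairedAt; }
    }

    /** Only unrepaired sstables participate in the next repair session. */
    public static List<String> candidates(List<SSTable> all) {
        return all.stream()
                  .filter(s -> s.repairedAt == 0)
                  .map(s -> s.name)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SSTable> tables = Arrays.asList(
            new SSTable("a", 1391400000000L), // repaired in a previous session
            new SSTable("b", 0),
            new SSTable("c", 0));
        System.out.println(candidates(tables)); // [b, c]
    }
}
```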
git commit: Improve custom 2i performance and abstraction Patch by Miguel Angel Fernandez Diaz, reviewed by Sam Tunnicliffe for CASSANDRA-6640
Updated Branches: refs/heads/trunk aa29b6af6 - fc91071c0 Improve custom 2i performance and abstraction Patch by Miguel Angel Fernandez Diaz, reviewed by Sam Tunnicliffe for CASSANDRA-6640 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fc91071c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fc91071c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fc91071c Branch: refs/heads/trunk Commit: fc91071c01c33500774de83944bf5f937397c089 Parents: aa29b6a Author: Brandon Williams brandonwilli...@apache.org Authored: Mon Feb 3 11:33:37 2014 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Mon Feb 3 11:33:37 2014 -0600 -- .../db/index/AbstractSimplePerColumnSecondaryIndex.java | 7 +-- .../apache/cassandra/db/index/PerColumnSecondaryIndex.java| 3 ++- .../org/apache/cassandra/db/index/SecondaryIndexManager.java | 7 +++ test/unit/org/apache/cassandra/db/RangeTombstoneTest.java | 2 +- .../org/apache/cassandra/db/SecondaryIndexCellSizeTest.java | 2 +- 5 files changed, 12 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java -- diff --git a/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java b/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java index 5987d7a..e2a6608 100644 --- a/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java +++ b/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java @@ -135,9 +135,12 @@ public abstract class AbstractSimplePerColumnSecondaryIndex extends PerColumnSec indexCfs.apply(valueKey, cfi, SecondaryIndexManager.nullUpdater, opGroup, null); } -public void update(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup) -{ +public void update(ByteBuffer rowKey, Cell oldCol, Cell col, OpOrder.Group 
opGroup) +{ +// insert the new value before removing the old one, so we never have a period +// where the row is invisible to both queries (the opposite seems preferable); see CASSANDRA-5540 insert(rowKey, col, opGroup); +delete(rowKey, oldCol, opGroup); } public void removeIndex(ByteBuffer columnName) http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java -- diff --git a/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java b/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java index e094c4c..79087d2 100644 --- a/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java +++ b/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java @@ -49,9 +49,10 @@ public abstract class PerColumnSecondaryIndex extends SecondaryIndex * update a column from the index * * @param rowKey the underlying row key which is indexed + * @param oldCol the previous column info * @param col all the column info */ -public abstract void update(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup); +public abstract void update(ByteBuffer rowKey, Cell oldCol, Cell col, OpOrder.Group opGroup); public String getNameForSystemKeyspace(ByteBuffer column) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java -- diff --git a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java index 66e549d..946e3be 100644 --- a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java +++ b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java @@ -676,11 +676,10 @@ public class SecondaryIndexManager { if (index instanceof PerColumnSecondaryIndex) { -// insert the new value before removing the old one, so we never have a period -// where the row is invisible to both queries (the opposite seems preferable); see 
CASSANDRA-5540 if (!cell.isMarkedForDelete(System.currentTimeMillis())) -((PerColumnSecondaryIndex) index).insert(key.key, cell, opGroup); -((PerColumnSecondaryIndex) index).delete(key.key, oldCell, opGroup); +((PerColumnSecondaryIndex) index).update(key.key, oldCell, cell,
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889667#comment-13889667 ] Marcus Eriksson commented on CASSANDRA-5351: [~yukim] thanks for the comments; bq. I don't see incremental repair codes in snapshot repair code path. right.. snapshot repairs... will look at them tomorrow bq. STCS major compaction, I prefer not to change current behavior Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. Maybe we could make major compaction do 2 separate compactions? Ending up with 2 sstables should be fine for users right? bq. Performing anti-compaction in repair thread sequentially for all parent sessions seems problematic performance-wise. I will move anticompaction out of the repair thread as well Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
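The "2 separate compactions" idea floated in this comment can be sketched as grouping the inputs by repaired status, so a major compaction would produce up to two output sstables instead of demoting everything to UNREPAIRED. Illustrative names only, not Cassandra's compaction code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: split a major compaction into one task per repaired status. */
public class MajorCompactionSplit {
    static final class SSTable {
        final String name;
        final boolean repaired;
        SSTable(String name, boolean repaired) { this.name = name; this.repaired = repaired; }
    }

    /** Partition the inputs into repaired and unrepaired compaction groups. */
    public static Map<Boolean, List<SSTable>> compactionGroups(List<SSTable> all) {
        return all.stream().collect(Collectors.partitioningBy(s -> s.repaired));
    }

    public static void main(String[] args) {
        List<SSTable> tables = Arrays.asList(
            new SSTable("a", true), new SSTable("b", false), new SSTable("c", true));
        Map<Boolean, List<SSTable>> groups = compactionGroups(tables);
        // compacting each group separately preserves repaired status
        System.out.println(groups.get(true).size() + " repaired, "
                         + groups.get(false).size() + " unrepaired"); // 2 repaired, 1 unrepaired
    }
}
```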
[jira] [Created] (CASSANDRA-6646) Disk Failure Policy ignores CorruptBlockException
sankalp kohli created CASSANDRA-6646: Summary: Disk Failure Policy ignores CorruptBlockException Key: CASSANDRA-6646 URL: https://issues.apache.org/jira/browse/CASSANDRA-6646 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Priority: Minor If Cassandra is using compression and has a bad drive or sstable, it will throw a CorruptBlockException. The disk failure policy only works if the error is an FSError and does not apply to IOExceptions like this. We need to handle such exceptions better, as they cause nodes to stop responding to the coordinator, causing the client to time out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
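The gap the ticket describes, corruption surfacing as a plain IOException that bypasses the failure policy, can be modeled in miniature. The class names below imitate Cassandra's (`CorruptBlockException`, `FSReadError`) but the code is a standalone illustration, not the project's actual error-handling API:

```java
/** Sketch: funnel block-corruption errors through a disk failure policy
 * handler instead of letting them escape as raw IOExceptions. */
public class FailurePolicyDemo {
    static class CorruptBlockException extends java.io.IOException {
        CorruptBlockException(String m) { super(m); }
    }

    static class FSReadError extends RuntimeException {
        FSReadError(Throwable cause) { super(cause); }
    }

    static int blocksHandledByPolicy = 0;

    /** Stand-in for the configured disk_failure_policy reacting to the error. */
    static void handleFSError(FSReadError e) { blocksHandledByPolicy++; }

    /** Wrap a block read so corruption reaches the policy handler. */
    public static byte[] readBlock(boolean corrupt) {
        try {
            if (corrupt) throw new CorruptBlockException("bad checksum");
            return new byte[]{1, 2, 3};
        } catch (java.io.IOException e) {
            handleFSError(new FSReadError(e)); // the policy now sees the corruption
            return new byte[0];
        }
    }

    public static void main(String[] args) {
        readBlock(true);
        System.out.println(blocksHandledByPolicy); // 1
    }
}
```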
[jira] [Commented] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889826#comment-13889826 ] Vijay commented on CASSANDRA-6590: -- Sorry was shooting a different message during the startup, fixed and pushed to https://github.com/Vijay2win/cassandra/tree/6590-v3. Thanks! Gossip does not heal after a temporary partition at startup --- Key: CASSANDRA-6590 URL: https://issues.apache.org/jira/browse/CASSANDRA-6590 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Vijay Fix For: 2.0.6 Attachments: 0001-CASSANDRA-6590.patch, 0001-logging-for-6590.patch, 6590_disable_echo.txt See CASSANDRA-6571 for background. If a node is partitioned on startup when the echo command is sent, but then the partition heals, the halves of the partition will never mark each other up despite being able to communicate. This stems from CASSANDRA-3533. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
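The failure mode in this ticket, an echo sent exactly once at startup that is lost to a partition, can be contrasted with a retrying scheme in a toy simulation. This is not Vijay's patch, just a model of why a one-shot echo never heals while a per-round retry does:

```java
/** Toy model: a node is only marked alive after an echo round-trip;
 * retrying every gossip round heals once the partition does. */
public class EchoRetry {
    private static int attemptsUsed;

    /** Simulated echo: fails for the first failuresBeforeSuccess attempts. */
    private static boolean sendEcho(int failuresBeforeSuccess) {
        attemptsUsed++;
        return attemptsUsed > failuresBeforeSuccess;
    }

    /** Retry the echo each round until acked; -1 means never marked up. */
    public static int roundsUntilAlive(int failuresBeforeSuccess, int maxRounds) {
        attemptsUsed = 0;
        for (int round = 1; round <= maxRounds; round++)
            if (sendEcho(failuresBeforeSuccess))
                return round;
        return -1;
    }

    public static void main(String[] args) {
        // partition drops 2 echoes, then heals: a retrying node recovers in round 3
        System.out.println(roundsUntilAlive(2, 10)); // 3
        // a single-shot echo (maxRounds == 1) never marks the peer up
        System.out.println(roundsUntilAlive(2, 1));  // -1
    }
}
```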
git commit: Let scrub optionally skip broken counter partitions
Updated Branches: refs/heads/cassandra-2.0 b71372146 - 728c4fa9b Let scrub optionally skip broken counter partitions patch by Tyler Hobbs; reviewed by Aleksey Yeschenko for CASSANDRA-5930 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/728c4fa9 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/728c4fa9 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/728c4fa9 Branch: refs/heads/cassandra-2.0 Commit: 728c4fa9bf2b2c11dbc61c8e5536b1542abc1ccb Parents: b713721 Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Feb 3 23:01:31 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Feb 3 23:01:31 2014 +0300 -- CHANGES.txt | 4 + NEWS.txt| 12 ++- .../apache/cassandra/db/ColumnFamilyStore.java | 4 +- .../db/compaction/CompactionManager.java| 12 +-- .../cassandra/db/compaction/Scrubber.java | 37 ++--- .../cassandra/service/StorageService.java | 4 +- .../cassandra/service/StorageServiceMBean.java | 2 +- .../org/apache/cassandra/tools/NodeCmd.java | 6 +- .../org/apache/cassandra/tools/NodeProbe.java | 4 +- .../cassandra/tools/StandaloneScrubber.java | 6 +- .../apache/cassandra/tools/NodeToolHelp.yaml| 6 +- .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++-- 12 files changed, 140 insertions(+), 38 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 13b4c5b..a1a58a3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,3 +1,7 @@ +2.0.6 + * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) + + 2.0.5 * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609) * Add ks.cf names to tombstone logging (CASSANDRA-6597) http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/NEWS.txt -- diff --git a/NEWS.txt b/NEWS.txt index 92446c8..b21fbaa 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -14,11 +14,21 @@ restore snapshots created with the previous 
major version using the provided 'sstableupgrade' tool. +2.0.6 += + +New features + +- Scrub can now optionally skip corrupt counter partitions. Please note + that this will lead to the loss of all the counter updates in the skipped + partition. See the --skip-corrupted option. + + 2.0.5 = New features - + - Batchlog replay can be, and is throttled by default now. See batchlog_replay_throttle_in_kb setting in cassandra.yaml. http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java index 8750026..38d87db 100644 --- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java +++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java @@ -1115,12 +1115,12 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean CompactionManager.instance.performCleanup(ColumnFamilyStore.this, renewer); } -public void scrub(boolean disableSnapshot) throws ExecutionException, InterruptedException +public void scrub(boolean disableSnapshot, boolean skipCorrupted) throws ExecutionException, InterruptedException { // skip snapshot creation during scrub, SEE JIRA 5891 if(!disableSnapshot) snapshotWithoutFlush("pre-scrub-" + System.currentTimeMillis()); -CompactionManager.instance.performScrub(ColumnFamilyStore.this); +CompactionManager.instance.performScrub(ColumnFamilyStore.this, skipCorrupted); } public void sstablesRewrite(boolean excludeCurrentVersion) throws ExecutionException, InterruptedException http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index 168ee02..48900c8 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java 
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -227,13 +227,13 @@ public class CompactionManager implements CompactionManagerMBean executor.submit(runnable).get(); } -public void
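The `skipCorrupted` flag this commit threads through scrub can be reduced to a minimal model: on a corrupt partition, either abort the scrub or drop that partition and continue, losing its data as the NEWS.txt entry warns. The sketch below is illustrative, not the Scrubber implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Minimal model of scrub's skipCorrupted behaviour. */
public class ScrubSketch {
    /** Returns how many partitions survive the scrub. */
    public static int scrub(int[] partitions, Set<Integer> corrupt, boolean skipCorrupted) {
        int kept = 0;
        for (int p : partitions) {
            if (corrupt.contains(p)) {
                if (!skipCorrupted)
                    throw new IllegalStateException("corrupt partition " + p);
                continue; // skipped: that partition's updates are lost
            }
            kept++;
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<Integer> corrupt = new HashSet<>(Arrays.asList(1));
        // skipCorrupted == true: drop partition 1, keep the other
        System.out.println(scrub(new int[]{0, 1}, corrupt, true)); // 1
        // skipCorrupted == false: the scrub aborts on the corrupt partition
        try {
            scrub(new int[]{0, 1}, corrupt, false);
        } catch (IllegalStateException expected) {
            System.out.println("scrub aborted: " + expected.getMessage());
        }
    }
}
```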
[jira] [Updated] (CASSANDRA-6622) Streaming session failures during node replace using replace_address
[ https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prasad updated CASSANDRA-6622: --- Attachment: logs.tgz Streaming session failures during node replace using replace_address Key: CASSANDRA-6622 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622 Project: Cassandra Issue Type: Bug Environment: RHEL6, cassandra-2.0.4 Reporter: Ravi Prasad Assignee: Brandon Williams Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 6622-2.0.txt, logs.tgz When using replace_address, Gossiper ApplicationState is set to hibernate, which is a down state. We are seeing that the peer nodes are seeing streaming plan request even before the Gossiper on them marks the replacing node as dead. As a result, streaming on peer nodes convicts the replacing node by closing the stream handler. I think, making the StorageService thread on the replacing node, sleep for BROADCAST_INTERVAL before bootstrapping, would avoid this scenario. 
Relevant logs from peer node (see that the Gossiper on peer node mark the replacing node as down, 2 secs after the streaming init request): {noformat} INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 StreamResultFuture.java (line 116) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) InetAddress /x.x.x.x is now DOWN ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred java.lang.RuntimeException: Outgoing stream handler has been closed at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175) at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436) at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358) at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293) at java.lang.Thread.run(Thread.java:722) INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6622) Streaming session failures during node replace using replace_address
[ https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889856#comment-13889856 ] Ravi Prasad commented on CASSANDRA-6622: In attached logs, .72 was the replacing node, .73 is where the streaming session failed. I had trace logging turned on in .73 for org.apache.cassandra.gms. Looks like, it is FailureDetector is convicting. I have to mention that this was with '0001-don-t-signal-restart-of-dead-states.txt' applied on cassandra-2.0.4. Streaming session failures during node replace using replace_address Key: CASSANDRA-6622 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622 Project: Cassandra Issue Type: Bug Environment: RHEL6, cassandra-2.0.4 Reporter: Ravi Prasad Assignee: Brandon Williams Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 6622-2.0.txt, logs.tgz When using replace_address, Gossiper ApplicationState is set to hibernate, which is a down state. We are seeing that the peer nodes are seeing streaming plan request even before the Gossiper on them marks the replacing node as dead. As a result, streaming on peer nodes convicts the replacing node by closing the stream handler. I think, making the StorageService thread on the replacing node, sleep for BROADCAST_INTERVAL before bootstrapping, would avoid this scenario. 
Relevant logs from peer node (see that the Gossiper on peer node mark the replacing node as down, 2 secs after the streaming init request): {noformat} INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 StreamResultFuture.java (line 116) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) InetAddress /x.x.x.x is now DOWN ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred java.lang.RuntimeException: Outgoing stream handler has been closed at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175) at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436) at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358) at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293) at java.lang.Thread.run(Thread.java:722) INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[2/4] Merge branch 'cassandra-2.0' into trunk
http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/test/unit/org/apache/cassandra/db/ScrubTest.java -- diff --cc test/unit/org/apache/cassandra/db/ScrubTest.java index 38c8b62,08dd435..d8ab9ff --- a/test/unit/org/apache/cassandra/db/ScrubTest.java +++ b/test/unit/org/apache/cassandra/db/ScrubTest.java @@@ -39,9 -41,9 +41,11 @@@ import org.apache.cassandra.db.compacti import org.apache.cassandra.exceptions.ConfigurationException; import org.apache.cassandra.db.columniterator.IdentityQueryFilter; import org.apache.cassandra.db.compaction.CompactionManager; ++import org.apache.cassandra.exceptions.WriteTimeoutException; import org.apache.cassandra.io.sstable.*; import org.apache.cassandra.utils.ByteBufferUtil; ++import static org.apache.cassandra.Util.cellname; import static org.apache.cassandra.Util.column; import static org.junit.Assert.assertEquals; import static org.junit.Assert.fail; @@@ -76,6 -79,53 +81,53 @@@ public class ScrubTest extends SchemaLo } @Test -public void testScrubCorruptedCounterRow() throws IOException, InterruptedException, ExecutionException ++public void testScrubCorruptedCounterRow() throws IOException, InterruptedException, ExecutionException, WriteTimeoutException + { + CompactionManager.instance.disableAutoCompaction(); + Keyspace keyspace = Keyspace.open(KEYSPACE); + ColumnFamilyStore cfs = keyspace.getColumnFamilyStore(COUNTER_CF); + cfs.clearUnsafe(); + + fillCounterCF(cfs, 2); + + List<Row> rows = cfs.getRangeSlice(Util.range("", ""), null, new IdentityQueryFilter(), 1000); + assertEquals(2, rows.size()); + + SSTableReader sstable = cfs.getSSTables().iterator().next(); + + // overwrite one row with garbage + long row0Start = sstable.getPosition(RowPosition.forKey(ByteBufferUtil.bytes("0"), sstable.partitioner), SSTableReader.Operator.EQ).position; + long row1Start = sstable.getPosition(RowPosition.forKey(ByteBufferUtil.bytes("1"), sstable.partitioner), SSTableReader.Operator.EQ).position; + long startPosition = row0Start < row1Start ? 
row0Start : row1Start; + long endPosition = row0Start < row1Start ? row1Start : row0Start; + + RandomAccessFile file = new RandomAccessFile(sstable.getFilename(), "rw"); + file.seek(startPosition); + file.writeBytes(StringUtils.repeat('z', (int) (endPosition - startPosition))); + file.close(); + + // with skipCorrupted == false, the scrub is expected to fail + Scrubber scrubber = new Scrubber(cfs, sstable, false); + try + { + scrubber.scrub(); + fail("Expected a CorruptSSTableException to be thrown"); + } + catch (IOError err) {} + + // with skipCorrupted == true, the corrupt row will be skipped + scrubber = new Scrubber(cfs, sstable, true); + scrubber.scrub(); + scrubber.close(); + cfs.replaceCompactedSSTables(Collections.singletonList(sstable), Collections.singletonList(scrubber.getNewSSTable()), OperationType.SCRUB); + assertEquals(1, cfs.getSSTables().size()); + + // verify that we can read all of the rows, and there is now one less row + rows = cfs.getRangeSlice(Util.range("", ""), null, new IdentityQueryFilter(), 1000); + assertEquals(1, rows.size()); + } + + @Test public void testScrubDeletedRow() throws IOException, ExecutionException, InterruptedException, ConfigurationException { CompactionManager.instance.disableAutoCompaction(); @@@ -207,4 -256,20 +258,20 @@@ cfs.forceBlockingFlush(); } + -protected void fillCounterCF(ColumnFamilyStore cfs, int rowsPerSSTable) throws ExecutionException, InterruptedException, IOException ++protected void fillCounterCF(ColumnFamilyStore cfs, int rowsPerSSTable) throws ExecutionException, InterruptedException, IOException, WriteTimeoutException + { + for (int i = 0; i < rowsPerSSTable; i++) + { + String key = String.valueOf(i); + ColumnFamily cf = TreeMapBackedSortedColumns.factory.create(KEYSPACE, COUNTER_CF); -RowMutation rm = new RowMutation(KEYSPACE, ByteBufferUtil.bytes(key), cf); -rm.addCounter(COUNTER_CF, ByteBufferUtil.bytes("Column1"), 100); ++Mutation rm = new Mutation(KEYSPACE, ByteBufferUtil.bytes(key), cf); 
++rm.addCounter(COUNTER_CF, cellname("Column1"), 100); + CounterMutation cm = new CounterMutation(rm, ConsistencyLevel.ONE); + cm.apply(); + } + + cfs.forceBlockingFlush(); + } + -} +}
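The corruption trick the test above relies on, seeking to a row's start offset and overwriting the byte range with 'z' characters, works on any file. Here is a standalone version of that step (names and the temp-file setup are illustrative):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

/** Overwrite a byte range of a file in place, as the scrub test does. */
public class CorruptRange {
    public static void corrupt(File f, long start, long end) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(start);
            for (long i = start; i < end; i++)
                raf.write('z'); // garbage byte, like StringUtils.repeat('z', n)
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sstable", ".db");
        Files.write(f.toPath(), "0123456789".getBytes());
        corrupt(f, 2, 6); // clobber offsets [2, 6)
        System.out.println(new String(Files.readAllBytes(f.toPath()))); // 01zzzz6789
        f.delete();
    }
}
```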
[1/4] git commit: Let scrub optionally skip broken counter partitions
Updated Branches: refs/heads/trunk fc91071c0 -> 63f110b5e Let scrub optionally skip broken counter partitions patch by Tyler Hobbs; reviewed by Aleksey Yeschenko for CASSANDRA-5930 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/728c4fa9 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/728c4fa9 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/728c4fa9 Branch: refs/heads/trunk Commit: 728c4fa9bf2b2c11dbc61c8e5536b1542abc1ccb Parents: b713721 Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Feb 3 23:01:31 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Feb 3 23:01:31 2014 +0300 -- CHANGES.txt | 4 + NEWS.txt| 12 ++- .../apache/cassandra/db/ColumnFamilyStore.java | 4 +- .../db/compaction/CompactionManager.java| 12 +-- .../cassandra/db/compaction/Scrubber.java | 37 ++--- .../cassandra/service/StorageService.java | 4 +- .../cassandra/service/StorageServiceMBean.java | 2 +- .../org/apache/cassandra/tools/NodeCmd.java | 6 +- .../org/apache/cassandra/tools/NodeProbe.java | 4 +- .../cassandra/tools/StandaloneScrubber.java | 6 +- .../apache/cassandra/tools/NodeToolHelp.yaml| 6 +- .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++-- 12 files changed, 140 insertions(+), 38 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 13b4c5b..a1a58a3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,3 +1,7 @@ +2.0.6 + * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) + + 2.0.5 * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609) * Add ks.cf names to tombstone logging (CASSANDRA-6597) http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/NEWS.txt -- diff --git a/NEWS.txt b/NEWS.txt index 92446c8..b21fbaa 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -14,11 +14,21 @@ restore snapshots created with the previous major version
using the provided 'sstableupgrade' tool. +2.0.6 += = +New features + +- Scrub can now optionally skip corrupt counter partitions. Please note + that this will lead to the loss of all the counter updates in the skipped + partition. See the --skip-corrupted option. + + 2.0.5 = New features - + - Batchlog replay can be, and is throttled by default now. See batchlog_replay_throttle_in_kb setting in cassandra.yaml. http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java index 8750026..38d87db 100644 --- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java +++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java @@ -1115,12 +1115,12 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean CompactionManager.instance.performCleanup(ColumnFamilyStore.this, renewer); } -public void scrub(boolean disableSnapshot) throws ExecutionException, InterruptedException +public void scrub(boolean disableSnapshot, boolean skipCorrupted) throws ExecutionException, InterruptedException { // skip snapshot creation during scrub, SEE JIRA 5891 if(!disableSnapshot) snapshotWithoutFlush("pre-scrub-" + System.currentTimeMillis()); -CompactionManager.instance.performScrub(ColumnFamilyStore.this); +CompactionManager.instance.performScrub(ColumnFamilyStore.this, skipCorrupted); } public void sstablesRewrite(boolean excludeCurrentVersion) throws ExecutionException, InterruptedException http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index 168ee02..48900c8 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java +++
b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -227,13 +227,13 @@ public class CompactionManager implements CompactionManagerMBean executor.submit(runnable).get(); } -public void performScrub(ColumnFamilyStore
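The skipCorrupted flag threaded through performScrub() and the surrounding signatures selects between two failure policies when a corrupt partition is encountered: abort the scrub, or count the partition, drop it, and continue. A minimal sketch of that control flow, using hypothetical names rather than Cassandra's actual Scrubber internals:

```java
import java.util.ArrayList;
import java.util.List;

public class SkipCorruptedSketch
{
    // Stand-in for reading partition i; throws when the partition is corrupt.
    interface PartitionSource { String next(int i); }

    // Scrub n partitions. With skipCorrupted, corrupt partitions are counted
    // and dropped; without it, the first corruption aborts the whole scrub.
    static List<String> scrub(PartitionSource src, int n, boolean skipCorrupted)
    {
        List<String> good = new ArrayList<>();
        int skipped = 0;
        for (int i = 0; i < n; i++)
        {
            try
            {
                good.add(src.next(i));
            }
            catch (IllegalStateException corruption)
            {
                if (!skipCorrupted)
                    throw corruption; // abort: surface the corruption to the caller
                skipped++;            // skip-corrupted mode: drop the partition, keep going
            }
        }
        System.out.println("scrubbed " + good.size() + ", skipped " + skipped);
        return good;
    }

    public static void main(String[] args)
    {
        PartitionSource src = i ->
        {
            if (i == 1) throw new IllegalStateException("corrupt partition " + i);
            return "row" + i;
        };
        System.out.println(scrub(src, 3, true)); // prints [row0, row2]
    }
}
```

This is why the NEWS entry warns about data loss: in skip mode the corrupt partition's counter updates are simply gone from the rewritten sstable.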
[4/4] git commit: Merge branch 'cassandra-2.0' into trunk
Merge branch 'cassandra-2.0' into trunk Conflicts: CHANGES.txt src/java/org/apache/cassandra/tools/NodeCmd.java src/resources/org/apache/cassandra/tools/NodeToolHelp.yaml Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/63f110b5 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/63f110b5 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/63f110b5 Branch: refs/heads/trunk Commit: 63f110b5e058217c1d7e3d178b367b918ca2f856 Parents: fc91071 728c4fa Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Feb 3 23:32:23 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Feb 3 23:32:23 2014 +0300 -- CHANGES.txt | 4 + NEWS.txt| 12 ++- .../apache/cassandra/db/ColumnFamilyStore.java | 4 +- .../db/compaction/CompactionManager.java| 12 +-- .../cassandra/db/compaction/Scrubber.java | 37 ++--- .../cassandra/service/StorageService.java | 4 +- .../cassandra/service/StorageServiceMBean.java | 2 +- .../org/apache/cassandra/tools/NodeProbe.java | 4 +- .../org/apache/cassandra/tools/NodeTool.java| 11 ++- .../cassandra/tools/StandaloneScrubber.java | 6 +- .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++-- 11 files changed, 141 insertions(+), 36 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/CHANGES.txt -- diff --cc CHANGES.txt index 6ca163a,a1a58a3..f9da65c --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,32 -1,7 +1,36 @@@ +2.1 + * add listsnapshots command to nodetool (CASSANDRA-5742) + * Introduce AtomicBTreeColumns (CASSANDRA-6271) + * Multithreaded commitlog (CASSANDRA-3578) + * allocate fixed index summary memory pool and resample cold index summaries + to use less memory (CASSANDRA-5519) + * Removed multithreaded compaction (CASSANDRA-6142) + * Parallelize fetching rows for low-cardinality indexes (CASSANDRA-1337) + * change logging from log4j to logback (CASSANDRA-5883) + * switch to LZ4 compression for internode communication 
(CASSANDRA-5887) + * Stop using Thrift-generated Index* classes internally (CASSANDRA-5971) + * Remove 1.2 network compatibility code (CASSANDRA-5960) + * Remove leveled json manifest migration code (CASSANDRA-5996) + * Remove CFDefinition (CASSANDRA-6253) + * Use AtomicIntegerFieldUpdater in RefCountedMemory (CASSANDRA-6278) + * User-defined types for CQL3 (CASSANDRA-5590) + * Use of o.a.c.metrics in nodetool (CASSANDRA-5871, 6406) + * Batch read from OTC's queue and cleanup (CASSANDRA-1632) + * Secondary index support for collections (CASSANDRA-4511, 6383) + * SSTable metadata(Stats.db) format change (CASSANDRA-6356) + * Push composites support in the storage engine + (CASSANDRA-5417, CASSANDRA-6520) + * Add snapshot space used to cfstats (CASSANDRA-6231) + * Add cardinality estimator for key count estimation (CASSANDRA-5906) + * CF id is changed to be non-deterministic. Data dir/key cache are created + uniquely for CF id (CASSANDRA-5202) + * New counters implementation (CASSANDRA-6504) + + + 2.0.6 + * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) + + 2.0.5 * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609) * Add ks.cf names to tombstone logging (CASSANDRA-6597) http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/NEWS.txt -- diff --cc NEWS.txt index 72b898e,b21fbaa..185f60c --- a/NEWS.txt +++ b/NEWS.txt @@@ -13,37 -13,17 +13,47 @@@ restore snapshots created with the prev 'sstableloader' tool. You can upgrade the file format of your snapshots using the provided 'sstableupgrade' tool. +2.1 +=== + +New features + + - SSTable data directory name is slightly changed. Each directory will + have hex string appended after CF name, e.g. + ks/cf-5be396077b811e3a3ab9dc4b9ac088d/ + This hex string part represents unique ColumnFamily ID. + Note that existing directories are used as is, so only newly created + directories after upgrade have new directory name format. 
+ - Saved key cache files also have ColumnFamily ID in their file name. + +Upgrading +- + - Rolling upgrades from anything pre-2.0.5 is not supported. + - For leveled compaction users, 2.0 must be atleast started before + upgrading to 2.1 due to the fact that the old JSON leveled + manifest is migrated into the sstable metadata files on startup + in 2.0 and this code is gone from 2.1. + - For size-tiered compaction users, Cassandra now defaults to ignoring +
[Cassandra Wiki] Trivial Update of ThirdPartySupport by AlekseyYeschenko
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ThirdPartySupport page has been changed by AlekseyYeschenko: https://wiki.apache.org/cassandra/ThirdPartySupport?action=diff&rev1=42&rev2=43 Companies providing support for Apache Cassandra are not endorsed by the Apache Software Foundation, although some of these companies employ [[Committers]] to the Apache project. - Companies that employ Apache Cassandra [[Committers]]: + == Companies that employ Apache Cassandra Committers: == {{http://www.datastax.com/wp-content/themes/datastax-custom/images/logo.png}} [[http://datastax.com|Datastax]], the commercial leader in Apache Cassandra™, offers products and services that make it easy for customers to build, deploy and operate elastically scalable and cloud-optimized applications and data services. [[http://datastax.com|DataStax]] has over 100 customers, including leaders such as Netflix, Cisco, Rackspace, HP, Constant Contact and [[http://www.datastax.com/cassandrausers|more]], and spanning verticals including web, financial services, telecommunications, logistics and government. - Other companies: + == Other companies: == {{http://www.acunu.com/uploads/1/1/5/5/11559475/1335714080.png}} [[http://www.acunu.com|Acunu]] are world experts in Apache Cassandra and beyond. Some of the most challenging Cassandra deployments already rely on Acunu's technology, training and support. With a focus on real-time applications, Acunu makes it easy to build Cassandra based real-time Big Data solutions that derive instant answers from event streams and deliver fresh insight
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889939#comment-13889939 ] Lyuben Todorov commented on CASSANDRA-6572: --- I think it would be easier for users to understand what's going on if we record the CQL query string in QP#processStatement and pass it to a function in SP (so the majority of the work can be done in SP but still allow us to capture the CQL string, which is easy to understand), and then save that to a system table (not thought about the name yet) along with the timestamp of execution. This will give us a good starting point. Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
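The recording scheme proposed in the comment above — capture the raw CQL string at the query-processor boundary and persist it with its execution timestamp, then replay the statements in order — can be sketched as follows. All names here are hypothetical illustrations, not the eventual Cassandra API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class QueryLogSketch
{
    // One recorded query: the raw CQL text plus when it ran.
    static final class Entry
    {
        final long timestampMillis;
        final String cql;
        Entry(long timestampMillis, String cql) { this.timestampMillis = timestampMillis; this.cql = cql; }
    }

    private final List<Entry> log = new ArrayList<>();

    // Called from the query-processing path before execution is handed off;
    // in the real system this row would go to a system table, not a list.
    void record(String cql)
    {
        log.add(new Entry(System.currentTimeMillis(), cql));
    }

    // Replay hands each recorded statement, in recorded order, to an executor.
    void replay(Consumer<String> executor)
    {
        for (Entry e : log)
            executor.accept(e.cql);
    }

    public static void main(String[] args)
    {
        QueryLogSketch rec = new QueryLogSketch();
        rec.record("SELECT * FROM ks.cf WHERE k = 1");
        rec.record("INSERT INTO ks.cf (k, v) VALUES (2, 'x')");
        rec.replay(System.out::println); // prints the two statements in order
    }
}
```

Recording the raw string (rather than the parsed statement) is what makes the log human-readable, which is the point Lyuben is arguing for.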
[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a trace
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889940#comment-13889940 ] Robert Coli commented on CASSANDRA-6510: This bug seems to have the implication that no ConsistencyLevel has had its supposed meaning for the duration of the bug, because there is no guarantee that the acknowledged-to-the-client local write actually succeeds? Is that correct? If so, this issue seems quite fundamental and serious; why did automated testing not surface it? Is there now a test which covers this case? What is the since for this issue? Looks like at least 1.2.0? Don't drop local mutations without a trace -- Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a trace. SP.insertLocal() should be using LocalMutationRunnable instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a trace
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889949#comment-13889949 ] Jonathan Ellis commented on CASSANDRA-6510: --- Nope, that's not the implication. You can easily see from the code that {{responseHandler.response}} only gets called after {{rm.apply}}. That is, no write is acknowledge if it hasn't actually been applied. Don't drop local mutations without a trace -- Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a trace. SP.insertLocal() should be using LocalMutationRunnable instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-6510: - Description: SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. was:SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a trace. SP.insertLocal() should be using LocalMutationRunnable instead. Summary: Don't drop local mutations without a hint (was: Don't drop local mutations without a trace) Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889968#comment-13889968 ] Robert Coli commented on CASSANDRA-6510: Thanks for the clarification. Others who look to JIRA to understand impact will appreciate not having to try to deduce it from reading the patch. What is the since for this issue? Looks like at least 1.2.0? Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
git commit: Unscarify 6510 CHANGES
Updated Branches: refs/heads/cassandra-1.2 3f9875c7f -> 814a91209 Unscarify 6510 CHANGES Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120 Branch: refs/heads/cassandra-1.2 Commit: 814a91209418206f791eda1cebc83262c9e225f0 Parents: 3f9875c Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:00:29 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:00:29 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 68bed3b..981f977 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -5,7 +5,7 @@ * Allow executing CREATE statements multiple times (CASSANDRA-6471) * Don't send confusing info with timeouts (CASSANDRA-6491) * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) + * Don't drop local mutations without a hint (CASSANDRA-6510) * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) * Validate SliceRange start and finish lengths (CASSANDRA-6521) * fsync compression metadata (CASSANDRA-6531)
[1/3] git commit: Unscarify 6510 CHANGES
Updated Branches: refs/heads/trunk 63f110b5e -> 78f71420c Unscarify 6510 CHANGES Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120 Branch: refs/heads/trunk Commit: 814a91209418206f791eda1cebc83262c9e225f0 Parents: 3f9875c Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:00:29 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:00:29 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 68bed3b..981f977 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -5,7 +5,7 @@ * Allow executing CREATE statements multiple times (CASSANDRA-6471) * Don't send confusing info with timeouts (CASSANDRA-6491) * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) + * Don't drop local mutations without a hint (CASSANDRA-6510) * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) * Validate SliceRange start and finish lengths (CASSANDRA-6521) * fsync compression metadata (CASSANDRA-6531)
[2/3] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0
Merge branch 'cassandra-1.2' into cassandra-2.0 Conflicts: CHANGES.txt Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/066d00ba Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/066d00ba Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/066d00ba Branch: refs/heads/trunk Commit: 066d00ba5183a3a37b962334d0442edaaf9bebc8 Parents: 728c4fa 814a912 Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:01:42 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:01:42 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/066d00ba/CHANGES.txt -- diff --cc CHANGES.txt index a1a58a3,981f977..4440942 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -77,51 -45,9 +77,51 @@@ Merged from 1.2 (CASSANDRA-6413) * (Hadoop) add describe_local_ring (CASSANDRA-6268) * Fix handling of concurrent directory creation failure (CASSANDRA-6459) + * Allow executing CREATE statements multiple times (CASSANDRA-6471) + * Don't send confusing info with timeouts (CASSANDRA-6491) + * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) ++ * Don't drop local mutations without a hint (CASSANDRA-6510) + * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) + * Validate SliceRange start and finish lengths (CASSANDRA-6521) -1.2.12 +2.0.3 + * Fix FD leak on slice read path (CASSANDRA-6275) + * Cancel read meter task when closing SSTR (CASSANDRA-6358) + * free off-heap IndexSummary during bulk (CASSANDRA-6359) + * Recover from IOException in accept() thread (CASSANDRA-6349) + * Improve Gossip tolerance of abnormally slow tasks (CASSANDRA-6338) + * Fix trying to hint timed out counter writes (CASSANDRA-6322) + * Allow restoring specific columnfamilies from archived CL (CASSANDRA-4809) + * Avoid flushing 
compaction_history after each operation (CASSANDRA-6287) + * Fix repair assertion error when tombstones expire (CASSANDRA-6277) + * Skip loading corrupt key cache (CASSANDRA-6260) + * Fixes for compacting larger-than-memory rows (CASSANDRA-6274) + * Compact hottest sstables first and optionally omit coldest from + compaction entirely (CASSANDRA-6109) + * Fix modifying column_metadata from thrift (CASSANDRA-6182) + * cqlsh: fix LIST USERS output (CASSANDRA-6242) + * Add IRequestSink interface (CASSANDRA-6248) + * Update memtable size while flushing (CASSANDRA-6249) + * Provide hooks around CQL2/CQL3 statement execution (CASSANDRA-6252) + * Require Permission.SELECT for CAS updates (CASSANDRA-6247) + * New CQL-aware SSTableWriter (CASSANDRA-5894) + * Reject CAS operation when the protocol v1 is used (CASSANDRA-6270) + * Correctly throw error when frame too large (CASSANDRA-5981) + * Fix serialization bug in PagedRange with 2ndary indexes (CASSANDRA-6299) + * Fix CQL3 table validation in Thrift (CASSANDRA-6140) + * Fix bug missing results with IN clauses (CASSANDRA-6327) + * Fix paging with reversed slices (CASSANDRA-6343) + * Set minTimestamp correctly to be able to drop expired sstables (CASSANDRA-6337) + * Support NaN and Infinity as float literals (CASSANDRA-6003) + * Remove RF from nodetool ring output (CASSANDRA-6289) + * Fix attempting to flush empty rows (CASSANDRA-6374) + * Fix potential out of bounds exception when paging (CASSANDRA-6333) +Merged from 1.2: + * Optimize FD phi calculation (CASSANDRA-6386) + * Improve initial FD phi estimate when starting up (CASSANDRA-6385) + * Don't list CQL3 table in CLI describe even if named explicitely + (CASSANDRA-5750) * Invalidate row cache when dropping CF (CASSANDRA-6351) * add non-jamm path for cached statements (CASSANDRA-6293) * (Hadoop) Require CFRR batchSize to be at least 2 (CASSANDRA-6114)
[1/2] git commit: Unscarify 6510 CHANGES
Updated Branches: refs/heads/cassandra-2.0 728c4fa9b -> 066d00ba5 Unscarify 6510 CHANGES Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120 Branch: refs/heads/cassandra-2.0 Commit: 814a91209418206f791eda1cebc83262c9e225f0 Parents: 3f9875c Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:00:29 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:00:29 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 68bed3b..981f977 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -5,7 +5,7 @@ * Allow executing CREATE statements multiple times (CASSANDRA-6471) * Don't send confusing info with timeouts (CASSANDRA-6491) * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) + * Don't drop local mutations without a hint (CASSANDRA-6510) * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) * Validate SliceRange start and finish lengths (CASSANDRA-6521) * fsync compression metadata (CASSANDRA-6531)
[3/3] git commit: Merge branch 'cassandra-2.0' into trunk
Merge branch 'cassandra-2.0' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/78f71420 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/78f71420 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/78f71420 Branch: refs/heads/trunk Commit: 78f71420c33f588dcbb82bcbd689bb4aad6dd92f Parents: 63f110b 066d00b Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:02:08 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:02:08 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/78f71420/CHANGES.txt --
[2/2] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0
Merge branch 'cassandra-1.2' into cassandra-2.0 Conflicts: CHANGES.txt Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/066d00ba Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/066d00ba Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/066d00ba Branch: refs/heads/cassandra-2.0 Commit: 066d00ba5183a3a37b962334d0442edaaf9bebc8 Parents: 728c4fa 814a912 Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:01:42 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:01:42 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/066d00ba/CHANGES.txt -- diff --cc CHANGES.txt index a1a58a3,981f977..4440942 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -77,51 -45,9 +77,51 @@@ Merged from 1.2 (CASSANDRA-6413) * (Hadoop) add describe_local_ring (CASSANDRA-6268) * Fix handling of concurrent directory creation failure (CASSANDRA-6459) + * Allow executing CREATE statements multiple times (CASSANDRA-6471) + * Don't send confusing info with timeouts (CASSANDRA-6491) + * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) ++ * Don't drop local mutations without a hint (CASSANDRA-6510) + * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) + * Validate SliceRange start and finish lengths (CASSANDRA-6521) -1.2.12 +2.0.3 + * Fix FD leak on slice read path (CASSANDRA-6275) + * Cancel read meter task when closing SSTR (CASSANDRA-6358) + * free off-heap IndexSummary during bulk (CASSANDRA-6359) + * Recover from IOException in accept() thread (CASSANDRA-6349) + * Improve Gossip tolerance of abnormally slow tasks (CASSANDRA-6338) + * Fix trying to hint timed out counter writes (CASSANDRA-6322) + * Allow restoring specific columnfamilies from archived CL (CASSANDRA-4809) + * Avoid flushing 
compaction_history after each operation (CASSANDRA-6287) + * Fix repair assertion error when tombstones expire (CASSANDRA-6277) + * Skip loading corrupt key cache (CASSANDRA-6260) + * Fixes for compacting larger-than-memory rows (CASSANDRA-6274) + * Compact hottest sstables first and optionally omit coldest from + compaction entirely (CASSANDRA-6109) + * Fix modifying column_metadata from thrift (CASSANDRA-6182) + * cqlsh: fix LIST USERS output (CASSANDRA-6242) + * Add IRequestSink interface (CASSANDRA-6248) + * Update memtable size while flushing (CASSANDRA-6249) + * Provide hooks around CQL2/CQL3 statement execution (CASSANDRA-6252) + * Require Permission.SELECT for CAS updates (CASSANDRA-6247) + * New CQL-aware SSTableWriter (CASSANDRA-5894) + * Reject CAS operation when the protocol v1 is used (CASSANDRA-6270) + * Correctly throw error when frame too large (CASSANDRA-5981) + * Fix serialization bug in PagedRange with 2ndary indexes (CASSANDRA-6299) + * Fix CQL3 table validation in Thrift (CASSANDRA-6140) + * Fix bug missing results with IN clauses (CASSANDRA-6327) + * Fix paging with reversed slices (CASSANDRA-6343) + * Set minTimestamp correctly to be able to drop expired sstables (CASSANDRA-6337) + * Support NaN and Infinity as float literals (CASSANDRA-6003) + * Remove RF from nodetool ring output (CASSANDRA-6289) + * Fix attempting to flush empty rows (CASSANDRA-6374) + * Fix potential out of bounds exception when paging (CASSANDRA-6333) +Merged from 1.2: + * Optimize FD phi calculation (CASSANDRA-6386) + * Improve initial FD phi estimate when starting up (CASSANDRA-6385) + * Don't list CQL3 table in CLI describe even if named explicitely + (CASSANDRA-5750) * Invalidate row cache when dropping CF (CASSANDRA-6351) * add non-jamm path for cached statements (CASSANDRA-6293) * (Hadoop) Require CFRR batchSize to be at least 2 (CASSANDRA-6114)
[jira] [Comment Edited] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889949#comment-13889949 ] Jonathan Ellis edited comment on CASSANDRA-6510 at 2/3/14 10:01 PM: Nope, that's not the implication. You can see from the code that {{responseHandler.response}} only gets called after {{rm.apply}}. That is, no write is acknowledged if it hasn't actually been applied. was (Author: jbellis): Nope, that's not the implication. You can easily see from the code that {{responseHandler.response}} only gets called after {{rm.apply}}. That is, no write is acknowledge if it hasn't actually been applied. Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5863) Create a Decompressed Chunk [block] Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889997#comment-13889997 ] Pavel Yaskevich commented on CASSANDRA-5863: Just to keep everybody updated: I didn't forget about this, although I got distracted by multiple things coming simultaneously. I tried multiple ways of using the existing cache, none of which yielded good performance. As Jake mentioned previously, the hard part is tracking hotness of the sections plus low-cost lookup for already-decompressed chunks. I have one more idea for how to make it work; will keep you posted... Create a Decompressed Chunk [block] Cache - Key: CASSANDRA-5863 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863 Project: Cassandra Issue Type: New Feature Components: Core Reporter: T Jake Luciani Assignee: Pavel Yaskevich Labels: performance Fix For: 2.1 Currently, for every read, the CRAR reads each compressed chunk into a byte[], sends it to ICompressor, gets back another byte[] and verifies a checksum. This process is where the majority of time is spent in a read request. Before compression, we would have zero-copy of data and could respond directly from the page-cache. It would be useful to have some kind of Chunk cache that could speed up this process for hot data. Initially this could be a off heap cache but it would be great to put these decompressed chunks onto a SSD so the hot data lives on a fast disk similar to https://github.com/facebook/flashcache. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
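The read path described in the ticket (read compressed chunk, decompress, verify checksum, discard) suggests caching decompressed chunks keyed by (file, chunk offset), so hot chunks skip decompression entirely. A minimal on-heap LRU sketch of that idea — purely illustrative, since the design under discussion targets off-heap or SSD-backed storage and must also track section hotness:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ChunkCacheSketch
{
    // Cache key: which file, and which chunk (by offset) within it.
    static final class Key
    {
        final String file;
        final long chunkOffset;
        Key(String file, long chunkOffset) { this.file = file; this.chunkOffset = chunkOffset; }
        @Override public boolean equals(Object o)
        {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return chunkOffset == k.chunkOffset && file.equals(k.file);
        }
        @Override public int hashCode() { return file.hashCode() * 31 + Long.hashCode(chunkOffset); }
    }

    private final int capacity;
    // An access-ordered LinkedHashMap gives LRU eviction in a few lines.
    private final LinkedHashMap<Key, byte[]> cache;

    ChunkCacheSketch(int capacity)
    {
        this.capacity = capacity;
        this.cache = new LinkedHashMap<Key, byte[]>(16, 0.75f, true)
        {
            @Override protected boolean removeEldestEntry(Map.Entry<Key, byte[]> eldest)
            {
                return size() > ChunkCacheSketch.this.capacity;
            }
        };
    }

    // On a hit, decompression is skipped entirely; on a miss, the supplied
    // decompression runs once and the result is remembered.
    synchronized byte[] get(String file, long offset, Supplier<byte[]> decompress)
    {
        return cache.computeIfAbsent(new Key(file, offset), k -> decompress.get());
    }

    synchronized int size() { return cache.size(); }

    public static void main(String[] args)
    {
        ChunkCacheSketch cache = new ChunkCacheSketch(2);
        byte[] a = cache.get("Data.db", 0, () -> new byte[]{1, 2, 3}); // miss: "decompresses"
        byte[] b = cache.get("Data.db", 0, () -> { throw new AssertionError("should not decompress on a hit"); });
        System.out.println(a == b); // prints true: second read served from cache
    }
}
```

The tricky part the comment alludes to is exactly what this sketch glosses over: a global synchronized LRU would be a contention point on the read path, and on-heap byte[] values create GC pressure, which is why the ticket leans toward off-heap or flashcache-style designs.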
[jira] [Commented] (CASSANDRA-5863) Create a Decompressed Chunk [block] Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890008#comment-13890008 ] Jonathan Ellis commented on CASSANDRA-5863: --- Thanks for the update. Create a Decompressed Chunk [block] Cache - Key: CASSANDRA-5863 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863 Project: Cassandra Issue Type: New Feature Components: Core Reporter: T Jake Luciani Assignee: Pavel Yaskevich Labels: performance Fix For: 2.1 Currently, for every read, the CRAR reads each compressed chunk into a byte[], sends it to ICompressor, gets back another byte[] and verifies a checksum. This process is where the majority of time is spent in a read request. Before compression, we would have zero-copy of data and could respond directly from the page-cache. It would be useful to have some kind of Chunk cache that could speed up this process for hot data. Initially this could be an off-heap cache, but it would be great to put these decompressed chunks onto an SSD so the hot data lives on a fast disk, similar to https://github.com/facebook/flashcache.
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890012#comment-13890012 ] Jonathan Ellis commented on CASSANDRA-6572: --- Why split the work across two classes instead of doing it all in QP? Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-6510: - Since Version: 1.2.1 Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890409#comment-13890409 ] Jonathan Ellis commented on CASSANDRA-4851: --- Agreed that this syntax is convenient. CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying "can" because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. That's definitely doable, but on the other hand, we won't be able to support full-blown OR, so it may not be very natural to support seemingly random combinations of OR and not others.
Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have, imo), but would be much simpler to implement and probably to use too.
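For illustration, the lexicographic ordering that {{(c1, c2) > (0, 0)}} would express can be written out directly. This is just a sketch of the comparison semantics, not CQL's server-side implementation:

```java
public class TupleCompare {
    /** True iff (a1, a2) > (b1, b2) in lexicographic (clustering) order. */
    public static boolean greater(int a1, int a2, int b1, int b2) {
        if (a1 != b1)
            return a1 > b1; // first component decides when it differs
        return a2 > b2;     // otherwise fall through to the second
    }
}
```

Applied to the sample data, (0, 1), (1, 0) and (1, 1) all compare greater than (0, 0) — exactly the three last records the description wants.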
[jira] [Updated] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-4851: -- Reviewer: Aleksey Yeschenko Component/s: API CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Components: API Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying "can" because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. That's definitely doable, but on the other hand, we won't be able to support full-blown OR, so it may not be very natural to support seemingly random combinations of OR and not others.
Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have, imo), but would be much simpler to implement and probably to use too.
[jira] [Commented] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
[ https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890411#comment-13890411 ] Jonathan Ellis commented on CASSANDRA-6645: --- On the first hunk, should {{indexManager.getIndexForColumn(expression.column_name)}} even be including non-CF indexes? upgradesstables causes NPE for secondary indexes without an underlying column family Key: CASSANDRA-6645 URL: https://issues.apache.org/jira/browse/CASSANDRA-6645 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sergio Bossa Assignee: Sergio Bossa Fix For: 2.0.6 Attachments: CASSANDRA-6645.patch SecondaryIndex#getIndexCfs is allowed to return null by contract, if the index is not backed by a column family, but this causes an NPE as StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do not check for null values. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
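The missing null check described here can be shown in miniature. The interface and names below are illustrative stand-ins for {{SecondaryIndex#getIndexCfs}} and the callers in {{StorageService}}, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

public class IndexCfsFilter {
    /** Minimal stand-in for SecondaryIndex: getIndexCfs() may return null by contract. */
    public interface Index {
        String getIndexCfs();
    }

    /** Collect index-backing CFs, skipping indexes that have none. */
    public static List<String> validColumnFamilies(List<Index> indexes) {
        List<String> cfs = new ArrayList<>();
        for (Index idx : indexes) {
            String cf = idx.getIndexCfs();
            if (cf != null) // without this check, upgradesstables-style iteration NPEs
                cfs.add(cf);
        }
        return cfs;
    }

    public static List<String> demo() {
        Index withoutCf = () -> null;     // index not backed by a column family
        Index withCf = () -> "idx_cf";    // index backed by a column family
        return validColumnFamilies(List.of(withCf, withoutCf));
    }
}
```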
[jira] [Updated] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
[ https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6645: -- Reviewer: Jonathan Ellis Fix Version/s: 2.0.6 upgradesstables causes NPE for secondary indexes without an underlying column family Key: CASSANDRA-6645 URL: https://issues.apache.org/jira/browse/CASSANDRA-6645 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sergio Bossa Assignee: Sergio Bossa Fix For: 2.0.6 Attachments: CASSANDRA-6645.patch SecondaryIndex#getIndexCfs is allowed to return null by contract, if the index is not backed by a column family, but this causes an NPE as StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do not check for null values. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[3/3] git commit: merge from 2.0
merge from 2.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b25a63a8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b25a63a8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b25a63a8 Branch: refs/heads/trunk Commit: b25a63a81d22e409e607ca28c39e20604332cb5d Parents: 78f7142 039e9b9 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 3 23:51:16 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 3 23:51:16 2014 -0600 -- CHANGES.txt | 2 + .../cassandra/config/DatabaseDescriptor.java| 20 ++- .../org/apache/cassandra/io/util/Memory.java| 123 ++- .../cassandra/service/CassandraDaemon.java | 2 +- .../cassandra/utils/FastByteComparisons.java| 6 + 5 files changed, 145 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/b25a63a8/CHANGES.txt -- diff --cc CHANGES.txt index 6a4c507,b1fade1..28278c4 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,33 -1,6 +1,35 @@@ +2.1 + * add listsnapshots command to nodetool (CASSANDRA-5742) + * Introduce AtomicBTreeColumns (CASSANDRA-6271) + * Multithreaded commitlog (CASSANDRA-3578) + * allocate fixed index summary memory pool and resample cold index summaries + to use less memory (CASSANDRA-5519) + * Removed multithreaded compaction (CASSANDRA-6142) + * Parallelize fetching rows for low-cardinality indexes (CASSANDRA-1337) + * change logging from log4j to logback (CASSANDRA-5883) + * switch to LZ4 compression for internode communication (CASSANDRA-5887) + * Stop using Thrift-generated Index* classes internally (CASSANDRA-5971) + * Remove 1.2 network compatibility code (CASSANDRA-5960) + * Remove leveled json manifest migration code (CASSANDRA-5996) + * Remove CFDefinition (CASSANDRA-6253) + * Use AtomicIntegerFieldUpdater in RefCountedMemory (CASSANDRA-6278) + * User-defined types for CQL3 (CASSANDRA-5590) + * Use of o.a.c.metrics in nodetool (CASSANDRA-5871, 6406) + * Batch 
read from OTC's queue and cleanup (CASSANDRA-1632) + * Secondary index support for collections (CASSANDRA-4511, 6383) + * SSTable metadata(Stats.db) format change (CASSANDRA-6356) + * Push composites support in the storage engine + (CASSANDRA-5417, CASSANDRA-6520) + * Add snapshot space used to cfstats (CASSANDRA-6231) + * Add cardinality estimator for key count estimation (CASSANDRA-5906) + * CF id is changed to be non-deterministic. Data dir/key cache are created + uniquely for CF id (CASSANDRA-5202) + * New counters implementation (CASSANDRA-6504) + + 2.0.6 + * Fix direct Memory on architectures that do not support unaligned long access +(CASSANDRA-6628) * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) http://git-wip-us.apache.org/repos/asf/cassandra/blob/b25a63a8/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --cc src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 2793237,bd5db69..378fa8a --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@@ -177,9 -177,9 +177,9 @@@ public class DatabaseDescripto /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */ if (conf.disk_access_mode == Config.DiskAccessMode.auto) { - conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; + conf.disk_access_mode = hasLargeAddressSpace() ?
Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; indexAccessMode = conf.disk_access_mode; -logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode ); +logger.info("DiskAccessMode 'auto' determined to be {}, indexAccessMode is {}", conf.disk_access_mode, indexAccessMode); } else if (conf.disk_access_mode == Config.DiskAccessMode.mmap_index_only) { @@@ -1384,8 -1324,19 +1384,24 @@@ } } +public static int getIndexSummaryResizeIntervalInMinutes() +{ +return conf.index_summary_resize_interval_in_minutes; +} ++ + public static boolean hasLargeAddressSpace() + { + // currently we just check if it's a 64bit arch, but really we only care if the address space is large + String datamodel = System.getProperty("sun.arch.data.model"); + if (datamodel != null) + { + switch (datamodel) +
[2/3] git commit: Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628
Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/039e9b9a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/039e9b9a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/039e9b9a Branch: refs/heads/trunk Commit: 039e9b9a18cbe78091231a4538b6d428deacc771 Parents: 066d00b Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 3 23:50:08 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 3 23:50:08 2014 -0600 -- CHANGES.txt | 2 + .../cassandra/config/DatabaseDescriptor.java| 20 ++- .../org/apache/cassandra/io/util/Memory.java| 123 ++- .../cassandra/service/CassandraDaemon.java | 2 +- .../cassandra/utils/FastByteComparisons.java| 6 + 5 files changed, 145 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 4440942..b1fade1 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,6 @@ 2.0.6 + * Fix direct Memory on architectures that do not support unaligned long access + (CASSANDRA-6628) * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 44d9d3a..bd5db69 100644 --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@ -172,12 +172,12 @@ public class DatabaseDescriptor } if (conf.commitlog_total_space_in_mb == null) -conf.commitlog_total_space_in_mb = System.getProperty("os.arch").contains("64") ?
1024 : 32; +conf.commitlog_total_space_in_mb = hasLargeAddressSpace() ? 1024 : 32; /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */ if (conf.disk_access_mode == Config.DiskAccessMode.auto) { -conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; +conf.disk_access_mode = hasLargeAddressSpace() ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; indexAccessMode = conf.disk_access_mode; logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode ); } @@ -1323,4 +1323,20 @@ public class DatabaseDescriptor throw new RuntimeException(e); } } + +public static boolean hasLargeAddressSpace() +{ +// currently we just check if it's a 64bit arch, but really we only care if the address space is large +String datamodel = System.getProperty("sun.arch.data.model"); +if (datamodel != null) +{ +switch (datamodel) +{ +case "64": return true; +case "32": return false; +} +} +String arch = System.getProperty("os.arch"); +return arch.contains("64") || arch.contains("sparcv9"); +} } http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/io/util/Memory.java -- diff --git a/src/java/org/apache/cassandra/io/util/Memory.java b/src/java/org/apache/cassandra/io/util/Memory.java index f276190..263205b 100644 --- a/src/java/org/apache/cassandra/io/util/Memory.java +++ b/src/java/org/apache/cassandra/io/util/Memory.java @@ -17,9 +17,10 @@ */ package org.apache.cassandra.io.util; -import sun.misc.Unsafe; +import java.nio.ByteOrder; import org.apache.cassandra.config.DatabaseDescriptor; +import sun.misc.Unsafe; /** * An off-heap region of memory that must be manually free'd when no longer needed.
@@ -30,6 +31,16 @@ public class Memory private static final IAllocator allocator = DatabaseDescriptor.getoffHeapMemoryAllocator(); private static final long BYTE_ARRAY_BASE_OFFSET = unsafe.arrayBaseOffset(byte[].class); +private static final boolean bigEndian = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN); +private static final boolean unaligned; + +static +{ +String arch = System.getProperty("os.arch"); +unaligned =
[1/3] git commit: Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628
Updated Branches: refs/heads/cassandra-2.0 066d00ba5 - 039e9b9a1 refs/heads/trunk 78f71420c - b25a63a81 Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/039e9b9a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/039e9b9a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/039e9b9a Branch: refs/heads/cassandra-2.0 Commit: 039e9b9a18cbe78091231a4538b6d428deacc771 Parents: 066d00b Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 3 23:50:08 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 3 23:50:08 2014 -0600 -- CHANGES.txt | 2 + .../cassandra/config/DatabaseDescriptor.java| 20 ++- .../org/apache/cassandra/io/util/Memory.java| 123 ++- .../cassandra/service/CassandraDaemon.java | 2 +- .../cassandra/utils/FastByteComparisons.java| 6 + 5 files changed, 145 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 4440942..b1fade1 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,6 @@ 2.0.6 + * Fix direct Memory on architectures that do not support unaligned long access + (CASSANDRA-6628) * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 44d9d3a..bd5db69 100644 --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@ -172,12 +172,12 @@ public class DatabaseDescriptor } if (conf.commitlog_total_space_in_mb == null) 
-conf.commitlog_total_space_in_mb = System.getProperty("os.arch").contains("64") ? 1024 : 32; +conf.commitlog_total_space_in_mb = hasLargeAddressSpace() ? 1024 : 32; /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */ if (conf.disk_access_mode == Config.DiskAccessMode.auto) { -conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; +conf.disk_access_mode = hasLargeAddressSpace() ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; indexAccessMode = conf.disk_access_mode; logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode ); } @@ -1323,4 +1323,20 @@ public class DatabaseDescriptor throw new RuntimeException(e); } } + +public static boolean hasLargeAddressSpace() +{ +// currently we just check if it's a 64bit arch, but really we only care if the address space is large +String datamodel = System.getProperty("sun.arch.data.model"); +if (datamodel != null) +{ +switch (datamodel) +{ +case "64": return true; +case "32": return false; +} +} +String arch = System.getProperty("os.arch"); +return arch.contains("64") || arch.contains("sparcv9"); +} } http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/io/util/Memory.java -- diff --git a/src/java/org/apache/cassandra/io/util/Memory.java b/src/java/org/apache/cassandra/io/util/Memory.java index f276190..263205b 100644 --- a/src/java/org/apache/cassandra/io/util/Memory.java +++ b/src/java/org/apache/cassandra/io/util/Memory.java @@ -17,9 +17,10 @@ */ package org.apache.cassandra.io.util; -import sun.misc.Unsafe; +import java.nio.ByteOrder; import org.apache.cassandra.config.DatabaseDescriptor; +import sun.misc.Unsafe; /** * An off-heap region of memory that must be manually free'd when no longer needed.
@@ -30,6 +31,16 @@ public class Memory private static final IAllocator allocator = DatabaseDescriptor.getoffHeapMemoryAllocator(); private static final long BYTE_ARRAY_BASE_OFFSET = unsafe.arrayBaseOffset(byte[].class); +private static final boolean bigEndian = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN); +private static final
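For reference, the address-space detection introduced by the patch above can be restated as a self-contained, compilable sketch. It is parameterized for testability; the real method reads the {{sun.arch.data.model}} and {{os.arch}} system properties directly:

```java
public class AddressSpace {
    /** Parameterized version of the patch's 64-bit check, for illustration. */
    public static boolean hasLargeAddressSpace(String datamodel, String arch) {
        // sun.arch.data.model, when present, states the data-model width directly
        if (datamodel != null) {
            switch (datamodel) {
                case "64": return true;
                case "32": return false;
            }
        }
        // otherwise fall back to os.arch heuristics
        return arch.contains("64") || arch.contains("sparcv9");
    }
}
```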
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890421#comment-13890421 ] Lyuben Todorov commented on CASSANDRA-6572: --- I thought it would make more sense to have this kind of functionality in StorageProxy, but keeping it simple by coding only in QP makes sense, and it will be better if someone else wishes to extend the functionality of the workload recording. Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well.
[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890422#comment-13890422 ] Jonathan Ellis commented on CASSANDRA-5263: --- Why don't we just pick the depth that covers the appropriate number of partitions/hashes? 2**16 = 64K, 2**17 = 128K, etc. Allow Merkle tree maximum depth to be configurable -- Key: CASSANDRA-5263 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263 Project: Cassandra Issue Type: Improvement Components: Config Affects Versions: 1.1.9 Reporter: Ahmed Bashir Assignee: Minh Do Currently, the maximum depth allowed for Merkle trees is hardcoded as 15. This value should be configurable, just like phi_convict_threshold and other properties. Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons can result in a large amount of unnecessary row keys being streamed. Empirical testing indicates that reasonable changes to this depth (18, 20, etc) don't affect the Merkle tree generation and differencing timings all that much, and they can significantly reduce the amount of data being streamed during repair.
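Jonathan's suggestion amounts to choosing the smallest depth whose leaf count (2^depth) covers the estimated number of partitions, still subject to a cap. A minimal sketch, with a hypothetical helper name:

```java
public class MerkleDepth {
    /** Smallest depth whose 2^depth leaves cover the partition estimate, capped at maxDepth. */
    public static int depthFor(long estimatedPartitions, int maxDepth) {
        int depth = 0;
        while (depth < maxDepth && (1L << depth) < estimatedPartitions)
            depth++; // 2**16 = 64K, 2**17 = 128K, ...
        return depth;
    }
}
```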
[jira] [Updated] (CASSANDRA-6472) Node hangs when Drop Keyspace / Table is executed
[ https://issues.apache.org/jira/browse/CASSANDRA-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6472: -- Priority: Minor (was: Major) Assignee: Mikhail Stepura (was: Benedict) [~mishail] can you shed any light on the cqlsh hang? Node hangs when Drop Keyspace / Table is executed - Key: CASSANDRA-6472 URL: https://issues.apache.org/jira/browse/CASSANDRA-6472 Project: Cassandra Issue Type: Bug Components: Core Reporter: amorton Assignee: Mikhail Stepura Priority: Minor Fix For: 2.1 from http://www.mail-archive.com/user@cassandra.apache.org/msg33566.html CommitLogSegmentManager.flushDataFrom() returns a FutureTask to wait on the flushes, but the task is not started in flushDataFrom(). The CLSM manager thread does not use the result, and forceRecycleAll (eventually called when making schema mods) does not start it, so it hangs when calling get(). Plan to patch so flushDataFrom() returns a Future.
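The hang is easy to reproduce in miniature: a {{FutureTask}} that is constructed but never run will block forever in {{get()}}. The sketch below is illustrative, not the actual CommitLogSegmentManager code:

```java
import java.util.concurrent.FutureTask;

public class FlushFutureSketch {
    /** Mirrors the bug: the FutureTask is returned but never started. */
    public static FutureTask<String> flushDataFrom() {
        return new FutureTask<>(() -> "flushed");
    }

    /** The fix in miniature: make sure the task runs, so get() can complete. */
    public static boolean flushAndWait() {
        FutureTask<String> task = flushDataFrom();
        task.run(); // without this, task.get() blocks forever
        return task.isDone();
    }
}
```

A caller that does `flushDataFrom().get()` without anyone invoking `run()` reproduces exactly the forceRecycleAll hang described above.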
[jira] [Resolved] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-5549. --- Resolution: Fixed Reviewer: Jonathan Ellis Remove Table.switchLock --- Key: CASSANDRA-5549 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path. ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers and writers of the lock (in Cassandra, memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-5549: -- Issue Type: Improvement (was: Bug) Remove Table.switchLock --- Key: CASSANDRA-5549 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path. ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers and writers of the lock (in Cassandra, memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (CASSANDRA-6594) CqlRecordWriter marked final
[ https://issues.apache.org/jira/browse/CASSANDRA-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-6594. --- Resolution: Fixed Fix Version/s: 2.1 Reviewer: Jonathan Ellis Assignee: Luca Rosellini SGTM; committed CqlRecordWriter marked final Key: CASSANDRA-6594 URL: https://issues.apache.org/jira/browse/CASSANDRA-6594 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Luca Rosellini Assignee: Luca Rosellini Labels: CQL3, HADOOP Fix For: 2.1 Attachments: CqlRecordWriter.diff We have a use case in which we need a custom implementation of CqlRecordWriter. It would be nice to have an extensible version of it (it would save us the pain of replicating upstream changes). See attached patch.
git commit: make CqlRecordWriter extensible
Updated Branches: refs/heads/trunk b25a63a81 -> 0842681e2 make CqlRecordWriter extensible Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0842681e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0842681e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0842681e Branch: refs/heads/trunk Commit: 0842681e214229b5d83283574911d70b5b050586 Parents: b25a63a Author: Jonathan Ellis jbel...@apache.org Authored: Tue Feb 4 00:07:30 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Tue Feb 4 00:07:30 2014 -0600 -- .../apache/cassandra/hadoop/cql3/CqlRecordWriter.java | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/0842681e/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java -- diff --git a/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java b/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java index 27d1c70..e354ad6 100644 --- a/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java +++ b/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java @@ -59,21 +59,21 @@ import org.apache.thrift.transport.TTransport; * * @see CqlOutputFormat */ -final class CqlRecordWriter extends AbstractColumnFamilyRecordWriter<Map<String, ByteBuffer>, List<ByteBuffer>> +class CqlRecordWriter extends AbstractColumnFamilyRecordWriter<Map<String, ByteBuffer>, List<ByteBuffer>> { private static final Logger logger = LoggerFactory.getLogger(CqlRecordWriter.class); // handles for clients for each range running in the threadpool -private final Map<Range, RangeClient> clients; +protected final Map<Range, RangeClient> clients; // host to prepared statement id mappings -private ConcurrentHashMap<Cassandra.Client, Integer> preparedStatements = new ConcurrentHashMap<Cassandra.Client, Integer>(); +protected final ConcurrentHashMap<Cassandra.Client, Integer> preparedStatements = new
ConcurrentHashMap<Cassandra.Client, Integer>(); -private final String cql; +protected final String cql; -private AbstractType<?> keyValidator; -private String [] partitionKeyColumns; -private List<String> clusterColumns; +protected AbstractType<?> keyValidator; +protected String [] partitionKeyColumns; +protected List<String> clusterColumns; /** * Upon construction, obtain the map that this writer will use to collect
[jira] [Commented] (CASSANDRA-6157) Selectively Disable hinted handoff for a data center
[ https://issues.apache.org/jira/browse/CASSANDRA-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890427#comment-13890427 ] Lyuben Todorov commented on CASSANDRA-6157: --- [~kohlisankalp] Still working on this? Selectively Disable hinted handoff for a data center Key: CASSANDRA-6157 URL: https://issues.apache.org/jira/browse/CASSANDRA-6157 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Assignee: sankalp kohli Priority: Minor Fix For: 2.0.6 Attachments: trunk-6157-v2.diff, trunk-6157-v3.diff, trunk-6157-v4.diff, trunk-6157.txt Cassandra supports disabling hints or reducing the window for hints. It would be helpful to have a switch which stops hints to a down data center but continues hints to other DCs. This is helpful during data center failover, as hints would put more unnecessary pressure on the DC taking double traffic. Also, since Cassandra is now under reduced redundancy, we don't want to disable hints within the DC.
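The requested switch boils down to consulting a per-DC disable list before writing a hint. A minimal sketch with illustrative names, not the patch's actual API:

```java
import java.util.Set;

public class DcHintFilter {
    /** Hint a replica only if hinted handoff is not disabled for its data center. */
    public static boolean shouldHint(Set<String> disabledDcs, String targetDc) {
        return !disabledDcs.contains(targetDc);
    }
}
```

With the failed DC in the disabled set, local-DC replicas keep receiving hints while the DC taking over double traffic is spared.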
[jira] [Updated] (CASSANDRA-6568) sstables incorrectly getting marked as not live
[ https://issues.apache.org/jira/browse/CASSANDRA-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6568: -- Fix Version/s: 2.0.6 1.2.15 sstables incorrectly getting marked as not live - Key: CASSANDRA-6568 URL: https://issues.apache.org/jira/browse/CASSANDRA-6568 Project: Cassandra Issue Type: Bug Components: Core Environment: 1.2.12 with several 1.2.13 patches Reporter: Chris Burroughs Assignee: Marcus Eriksson Fix For: 1.2.15, 2.0.6 {noformat} -rw-rw-r-- 14 cassandra cassandra 1.4G Nov 25 19:46 /data/sstables/data/ks/cf/ks-cf-ic-402383-Data.db -rw-rw-r-- 14 cassandra cassandra 13G Nov 26 00:04 /data/sstables/data/ks/cf/ks-cf-ic-402430-Data.db -rw-rw-r-- 14 cassandra cassandra 13G Nov 26 05:03 /data/sstables/data/ks/cf/ks-cf-ic-405231-Data.db -rw-rw-r-- 31 cassandra cassandra 21G Nov 26 08:38 /data/sstables/data/ks/cf/ks-cf-ic-405232-Data.db -rw-rw-r-- 2 cassandra cassandra 2.6G Dec 3 13:44 /data/sstables/data/ks/cf/ks-cf-ic-434662-Data.db -rw-rw-r-- 14 cassandra cassandra 1.5G Dec 5 09:05 /data/sstables/data/ks/cf/ks-cf-ic-438698-Data.db -rw-rw-r-- 2 cassandra cassandra 3.1G Dec 6 12:10 /data/sstables/data/ks/cf/ks-cf-ic-440983-Data.db -rw-rw-r-- 2 cassandra cassandra 96M Dec 8 01:52 /data/sstables/data/ks/cf/ks-cf-ic-444041-Data.db -rw-rw-r-- 2 cassandra cassandra 3.3G Dec 9 16:37 /data/sstables/data/ks/cf/ks-cf-ic-451116-Data.db -rw-rw-r-- 2 cassandra cassandra 876M Dec 10 11:23 /data/sstables/data/ks/cf/ks-cf-ic-453552-Data.db -rw-rw-r-- 2 cassandra cassandra 891M Dec 11 03:21 /data/sstables/data/ks/cf/ks-cf-ic-454518-Data.db -rw-rw-r-- 2 cassandra cassandra 102M Dec 11 12:27 /data/sstables/data/ks/cf/ks-cf-ic-455429-Data.db -rw-rw-r-- 2 cassandra cassandra 906M Dec 11 23:54 /data/sstables/data/ks/cf/ks-cf-ic-455533-Data.db -rw-rw-r-- 1 cassandra cassandra 214M Dec 12 05:02 /data/sstables/data/ks/cf/ks-cf-ic-456426-Data.db -rw-rw-r-- 1 cassandra cassandra 203M Dec 12 10:49 
/data/sstables/data/ks/cf/ks-cf-ic-456879-Data.db -rw-rw-r-- 1 cassandra cassandra 49M Dec 12 12:03 /data/sstables/data/ks/cf/ks-cf-ic-456963-Data.db -rw-rw-r-- 18 cassandra cassandra 20G Dec 25 01:09 /data/sstables/data/ks/cf/ks-cf-ic-507770-Data.db -rw-rw-r-- 3 cassandra cassandra 12G Jan 8 04:22 /data/sstables/data/ks/cf/ks-cf-ic-567100-Data.db -rw-rw-r-- 3 cassandra cassandra 957M Jan 8 22:51 /data/sstables/data/ks/cf/ks-cf-ic-569015-Data.db -rw-rw-r-- 2 cassandra cassandra 923M Jan 9 17:04 /data/sstables/data/ks/cf/ks-cf-ic-571303-Data.db -rw-rw-r-- 1 cassandra cassandra 821M Jan 10 08:20 /data/sstables/data/ks/cf/ks-cf-ic-574642-Data.db -rw-rw-r-- 1 cassandra cassandra 18M Jan 10 08:48 /data/sstables/data/ks/cf/ks-cf-ic-574723-Data.db {noformat} I tried to do a user-defined compaction on sstables from November and got "it is not an active sstable". The live sstable count from JMX was about 7, while on disk there were over 20; live vs. total size showed about a 50 GiB difference. Forcing a gc from jconsole had no effect. However, restarting the node resulted in live sstables/bytes *increasing* to match what was on disk, and user compaction could then compact the November sstables. This cluster was last restarted in mid December. I'm not sure what effect "not live" had on other operations of the cluster. From the logs it seems that the files were sent at least at some point as part of repair, but I don't know if they were being used for read requests or not. Because the problem that got me looking in the first place was poor performance, I suspect they were used for reads (and the reads were slow because so many sstables were being read). I presume, based on their age at the least, that they were being excluded from compaction. I'm not aware of any isLive() or getRefCount() to programmatically confirm which nodes have this problem.
In this cluster almost all columns have a 14-day TTL; based on the number of nodes with November sstables, it appears to be occurring on a significant fraction of the nodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6557) CommitLogSegment may be duplicated in unlikely race scenario
[ https://issues.apache.org/jira/browse/CASSANDRA-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6557: -- Priority: Minor (was: Major) CommitLogSegment may be duplicated in unlikely race scenario Key: CASSANDRA-6557 URL: https://issues.apache.org/jira/browse/CASSANDRA-6557 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.1 Reporter: Benedict Priority: Minor Fix For: 2.1 In the unlikely event that the thread that switched to a new CLS has not finished executing the cleanup of its switch by the time the CLS has finished being used, it is possible for the same segment to be 'switched' in again. This would be benign except that it is added to the activeSegments queue a second time also, which would permit it to be recycled twice, creating two different CLS objects in memory pointing to the same CLS on disk, after which all bets are off. The issue is highly unlikely to occur, but highly unlikely means it will probably happen eventually. I've fixed this based on my patch for CASSANDRA-5549, using the NonBlockingQueue I introduce there to simplify the logic and make it more obviously correct. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
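One way to make the double-switch race benign is to guard recycling with a compare-and-set, so that of two racing threads only one can enqueue the segment for reuse. This is a sketch of the general technique only, not Benedict's actual NonBlockingQueue-based patch; `Segment` and its members are illustrative stand-ins.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a CAS flag ensures a segment is recycled exactly once,
// so the duplicate "switch" described in the ticket becomes a no-op instead
// of putting the same segment on the active queue twice.
public class Segment
{
    private final AtomicBoolean recycled = new AtomicBoolean(false);
    final AtomicInteger recycles = new AtomicInteger(); // stands in for activeSegments.add(this)

    public boolean tryRecycle()
    {
        // Only the thread that wins the CAS performs the recycle.
        if (!recycled.compareAndSet(false, true))
            return false; // lost the race: another thread already recycled this segment
        recycles.incrementAndGet();
        return true;
    }

    public static void main(String[] args) throws InterruptedException
    {
        Segment seg = new Segment();
        Thread t1 = new Thread(seg::tryRecycle);
        Thread t2 = new Thread(seg::tryRecycle);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(seg.recycles.get()); // always 1, never 2
    }
}
```

Without the flag, both threads could add the segment, producing the two-in-memory-objects-one-on-disk state the ticket warns about.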
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890436#comment-13890436 ] Jonathan Ellis commented on CASSANDRA-6572: --- Do you have any strong feelings here [~iamaleksey] on the QP/SP divide? Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890438#comment-13890438 ] Yuki Morishita commented on CASSANDRA-5351: --- bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. That's what I meant. Current major compaction produces one SSTable, and I think changing that behavior might confuse users. My opinion is to keep it as is. Additional review comments: * Does PrepareMessage need to carry around dataCenters? Only the coordinator sends out messages, so I think you can drop it (also from ParentRepairSession). * Using the CF ID is preferred over the keyspace name/CF name pair. * PrepareMessage is sent per CF, but that can produce a lot of round trips. Isn't one message per replica node enough? * I think we need cleanup for parentRepairSessions when something bad happens; otherwise the ParentRepairSession in the map keeps references to SSTables. I just worked on the first one above and the commit is here (on top of your branch): https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361 Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired.
So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890438#comment-13890438 ] Yuki Morishita edited comment on CASSANDRA-5351 at 2/4/14 6:20 AM: --- bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. That's what I meant. Current major compaction produces one SSTable and I think changing that behavior would confuse users, maybe. My opinion is to keep it as is . Additional review comments: * Does PrepareMessage needs to carry around dataCenters? Only coordinator sends out messages so I think you can drop it(also from ParentRepairSession). * CF ID is preferred to use over Keyspace name/CF name pair. * PrepareMessage is sent per CF but it can produce a lot of round trip. Isn't one message per replica node enough? * I think we need clean up for parentRepairSessions when something bad happened. Otherwise ParentRepairSession in the map keep reference to SSTables. I just worked on the first one above and the commit is here(on top of your branch): https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361 was (Author: yukim): bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. That's what I meant. Current major compaction produces one SSTable and I think changing that behavior would confuse users, maybe. My opinion is to keep it as is, but . Additional review comments: * Does PrepareMessage needs to carry around dataCenters? Only coordinator sends out messages so I think you can drop it(also from ParentRepairSession). * CF ID is preferred to use over Keyspace name/CF name pair. * PrepareMessage is sent per CF but it can produce a lot of round trip. Isn't one message per replica node enough? * I think we need clean up for parentRepairSessions when something bad happened. 
Otherwise ParentRepairSession in the map keep reference to SSTables. I just worked on the first one above and the commit is here(on top of your branch): https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361 Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890447#comment-13890447 ] Jonathan Ellis commented on CASSANDRA-5351: --- bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. Maybe we could make major compaction do 2 separate compactions? Ending up with 2 sstables should be fine for users right? I think this is a better approach than stomping on the repair information. People major compact to free up disk space or improve read performance; either way, having a small amount of data in an unrepaired sandbox should be acceptable. (If I am wrong, we can add a utility to clear repaired flags, or add a flag to compact to treat everything as unrepaired... but I'd rather not add this complexity unless we see a clear demand for it.) Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
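The "two separate compactions" idea from the comment above reduces to a partition step before compacting: split the candidates by repaired status and compact each group on its own, so a major compaction yields up to two sstables instead of stomping on repair state. A minimal sketch, assuming sstable metadata records a repairedAt timestamp with 0 meaning unrepaired, as this ticket proposes; `SSTable` and `splitByRepairStatus` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

public class MajorCompactionSplit
{
    // Hypothetical stand-in for an sstable with repair metadata.
    public static class SSTable
    {
        final long repairedAt; // 0 means unrepaired
        public SSTable(long repairedAt) { this.repairedAt = repairedAt; }
        public boolean isRepaired() { return repairedAt != 0; }
    }

    // Returns {repaired, unrepaired}; each group would then be compacted
    // independently, keeping repaired data segregated from unrepaired.
    public static List<List<SSTable>> splitByRepairStatus(List<SSTable> input)
    {
        List<SSTable> repaired = new ArrayList<>();
        List<SSTable> unrepaired = new ArrayList<>();
        for (SSTable s : input)
            (s.isRepaired() ? repaired : unrepaired).add(s);
        List<List<SSTable>> groups = new ArrayList<>();
        groups.add(repaired);
        groups.add(unrepaired);
        return groups;
    }
}
```

As Jonathan notes, ending up with a small unrepaired sstable alongside the repaired one should be acceptable for the disk-space and read-performance motivations behind major compaction.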
[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890449#comment-13890449 ] Minh Do commented on CASSANDRA-5263: If I understand correctly, are you saying that if N is the total number of rows in all SSTables on a node for a given token range, then depth = log N with log base 2? This works if a node does not hold too many rows. Can we safely assume that a node does not hold more than 2^24 rows (or 16.7M rows)? For that many rows we would need to build a Merkle tree of depth 24, which requires about 1.6G of heap. Beyond this number, I would say we run into heap allocation issues. I was thinking earlier that depth 20 was the maximum allowable depth, and I worked my way down from there to compute lower-depth trees. Allow Merkle tree maximum depth to be configurable -- Key: CASSANDRA-5263 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263 Project: Cassandra Issue Type: Improvement Components: Config Affects Versions: 1.1.9 Reporter: Ahmed Bashir Assignee: Minh Do Currently, the maximum depth allowed for Merkle trees is hardcoded as 15. This value should be configurable, just like phi_convict_threshold and other properties. Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons can result in a large amount of unnecessary row keys being streamed. Empirical testing indicates that reasonable changes to this depth (18, 20, etc) don't affect the Merkle tree generation and differencing timings all that much, and they can significantly reduce the amount of data being streamed during repair. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890451#comment-13890451 ] Jonathan Ellis commented on CASSANDRA-5263: --- No, we can't assume that, but capping it at 20 is certainly better than capping it at 16 as it does now. Allow Merkle tree maximum depth to be configurable -- Key: CASSANDRA-5263 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263 Project: Cassandra Issue Type: Improvement Components: Config Affects Versions: 1.1.9 Reporter: Ahmed Bashir Assignee: Minh Do Currently, the maximum depth allowed for Merkle trees is hardcoded as 15. This value should be configurable, just like phi_convict_threshold and other properties. Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons can result in a large amount of unnecessary row keys being streamed. Empirical testing indicates that reasonable changes to this depth (18, 20, etc) don't affect the Merkle tree generation and differencing timings all that much, and they can significantly reduce the amount of data being streamed during repair. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
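The heap figure in Minh Do's comment checks out on a back-of-envelope basis: a full binary hash tree of depth d has 2^(d+1) - 1 nodes, and at roughly 48 bytes per tree node (an assumed figure, not a measured one), depth 24 lands near 1.6 GB. A quick sketch of the arithmetic:

```java
public class MerkleSize
{
    // A full binary tree of the given depth has 2^(depth+1) - 1 nodes.
    public static long nodeCount(int depth)
    {
        return (1L << (depth + 1)) - 1;
    }

    // Rough heap estimate; bytesPerNode (~48) is an assumption covering the
    // object header, child references, and hash payload.
    public static long estimateBytes(int depth, long bytesPerNode)
    {
        return nodeCount(depth) * bytesPerNode;
    }

    // depth = ceil(log2(rows)): one leaf per row.
    public static int depthForRows(long rows)
    {
        return 64 - Long.numberOfLeadingZeros(rows - 1);
    }

    public static void main(String[] args)
    {
        System.out.println(estimateBytes(24, 48)); // ~1.6e9 bytes for depth 24
        System.out.println(depthForRows(1L << 24)); // 2^24 rows -> depth 24
    }
}
```

The same arithmetic shows why depth 20 is a far safer cap: 2^21 - 1 nodes at 48 bytes is only about 100 MB.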