[jira] [Commented] (CASSANDRA-6888) Store whether a counter sstable still uses some local/remote shards in the sstable metadata
[ https://issues.apache.org/jira/browse/CASSANDRA-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959755#comment-13959755 ] Marcus Eriksson commented on CASSANDRA-6888: +1 Store whether a counter sstable still uses some local/remote shards in the sstable metadata -- Key: CASSANDRA-6888 URL: https://issues.apache.org/jira/browse/CASSANDRA-6888 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Aleksey Yeschenko Fix For: 2.1 beta2 CASSANDRA-6504 has made it so we don't distinguish different types of shards in counters. Yet, even though we no longer generate those local/remote types of shards, they won't disappear just by running upgradesstables; they need to be compacted away (and even then, they really only disappear if there has been a new update on the counter post-6504). But we want to get rid of those ultimately, since they make things like CASSANDRA-6506 less optimal. Now, even though the final step of that remains to be discussed, the first step is probably to keep track of whether such shards still exist in the system or not. That part is simple: we can just store a boolean in the SSTableMetadata to say whether or not said sstable still has at least one Cell using such an old shard type. -- This message was sent by Atlassian JIRA (v6.2#6252)
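The boolean described above is what the 57b18e60 commit further down this digest ends up adding; a condensed sketch of that collection step follows — the method shape is mine, but the loop body is adapted from the patch to ColumnFamily.getColumnStats():
{code}
import org.apache.cassandra.db.Cell;
import org.apache.cassandra.db.CounterCell;

// Condensed from the 57b18e60 patch later in this digest: while collecting
// per-sstable column stats, remember whether any counter cell still carries
// a pre-6504 local/remote shard.
static boolean hasLegacyCounterShards(Iterable<Cell> cells)
{
    boolean hasLegacyCounterShards = false;
    for (Cell cell : cells)
        if (cell instanceof CounterCell)
            hasLegacyCounterShards |= ((CounterCell) cell).hasLegacyShards();
    // The flag travels with ColumnStats into the sstable's StatsMetadata, so
    // anything reading the metadata can tell whether legacy shards remain.
    return hasLegacyCounterShards;
}
{code}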
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959757#comment-13959757 ] Marcus Eriksson commented on CASSANDRA-6696: [~benedict] do you mean having a background job move data around after upgrade? Or hanging on startup and rewriting everything? The current version would end up with data on the correct disks eventually through compactions, but I agree it would be nice to be able to just care about the disks when flushing and streaming. Manually copying sstables into the datadirs and calling 'nodetool refresh' would also need some care. Drive replacement in JBOD can cause data to reappear. -- Key: CASSANDRA-6696 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Assignee: Marcus Eriksson Fix For: 3.0 In JBOD, when someone gets a bad drive, the bad drive is replaced with a new empty one and repair is run. This can cause deleted data to come back in some cases. The same is true for corrupt sstables, where we delete the corrupt sstable and run repair. Here is an example: say we have 3 nodes A, B and C with RF=3 and GC grace=10 days. row=sankalp col=sankalp was written 20 days back and successfully went to all three nodes. Then a delete/tombstone was written successfully for the same row and column 15 days back. Since this tombstone is older than gc grace, it got compacted away together with the actual data in nodes A and B, so there is no trace of this row/column in nodes A and B. Now in node C, say the original data is in drive1 and the tombstone is in drive2. Compaction has not yet reclaimed the data and tombstone. Drive2 becomes corrupt and is replaced with a new empty drive. Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp has come back to life. Now, after replacing the drive, we run repair. This data will be propagated to all nodes. Note: this is still a problem even if we run repair every gc grace. -- This message was sent by Atlassian JIRA (v6.2#6252)
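As a side note on the "care about the disks when flushing and streaming" idea above, here is a purely hypothetical sketch — none of these names are Cassandra API — of routing each partition to a fixed data directory by token span, so that a replaced disk corresponds to a well-defined slice of the ring rather than an arbitrary mix of data and tombstones:
{code}
import java.io.File;
import java.util.List;

// Hypothetical sketch only: give each data directory a contiguous token span,
// so flushes and streams always land a partition on the same disk.
final class DiskBoundary
{
    final File directory;
    final long upperToken; // partitions with token <= upperToken belong here

    DiskBoundary(File directory, long upperToken)
    {
        this.directory = directory;
        this.upperToken = upperToken;
    }
}

final class FlushRouting
{
    // boundaries must be sorted by upperToken, ascending, and cover the ring.
    static File pickDirectory(List<DiskBoundary> boundaries, long token)
    {
        for (DiskBoundary b : boundaries)
            if (token <= b.upperToken)
                return b.directory;
        // Past the last boundary: clamp to the last disk.
        return boundaries.get(boundaries.size() - 1).directory;
    }
}
{code}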
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959841#comment-13959841 ] Benedict commented on CASSANDRA-6694: - rebased and pushed -f Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (CASSANDRA-6982) start_column in get_paged_slice has odd behavior
[ https://issues.apache.org/jira/browse/CASSANDRA-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo reassigned CASSANDRA-6982: -- Assignee: Edward Capriolo start_column in get_paged_slice has odd behavior --- Key: CASSANDRA-6982 URL: https://issues.apache.org/jira/browse/CASSANDRA-6982 Project: Cassandra Issue Type: Bug Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Critical get_paged_slice is described as so: {code} /** returns a range of columns, wrapping to the next rows if necessary to collect max_results. */ list<KeySlice> get_paged_slice(1:required string column_family, 2:required KeyRange range, 3:required binary start_column, 4:required ConsistencyLevel consistency_level=ConsistencyLevel.ONE) throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te), {code} The term max_results is not defined; I take it to mean key_range.count. The larger issue I have found is that start_column seems to be ignored in some cases. testNormal() produces this error: junit.framework.ComparisonFailure: null expected:<[c]> but was:<[a]> The problem seems to be KeyRanges that use tokens and not keys. {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(); kr.setEnd_token(); {code} A failing test is here: https://github.com/edwardcapriolo/cassandra/compare/pg?expand=1 Is this a bug? It feels like one, or is this just undefined behaviour? If it is a bug I would like to fix it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6982) start_column in get_paged_slice has odd behavior
[ https://issues.apache.org/jira/browse/CASSANDRA-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959995#comment-13959995 ] Edward Capriolo commented on CASSANDRA-6982: This is not so much a bug as Thrift allowing users to do something that does not work. We should reject that request. I will explain. This works: {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_key(ByteBufferUtil.bytes("aslice")); kr.setEnd_key(ByteBufferUtil.bytes("aslice")); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {code} When you specify an empty start key and a start column you get what you would expect. This is correct but may not be intuitive. {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_key(ByteBufferUtil.bytes("")); kr.setEnd_key(ByteBufferUtil.bytes("")); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {code} Your slice is starting before the row in question. You get back columns a,b,c not c,d,e. The problem comes when using tokens. With Murmur3 and Random Partitioner the relation between tokens and keys is not one to one. The pig unit tests fire up ByteOrderedPartitioner; the rest of our testing is Murmur3. Here are the things we should not allow (not sure if I am supposed to hex encode, so I tried both): {quote} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(ByteBufferUtil.bytesToHex(ByteBuffer.wrap(l.token.toString().getBytes()))); kr.setEnd_token(ByteBufferUtil.bytesToHex(ByteBuffer.wrap(l.token.toString().getBytes()))); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {quote} {quote} Murmur3Partitioner m = new Murmur3Partitioner(); LongToken l = m.getToken(ByteBufferUtil.bytes("aslice")); KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(l.toString()); kr.setEnd_token(l.toString()); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {quote} Because the relationship of token to key is not 1 to 1, there is no way to start at a specific row. Since you cannot start at a specific row, the start_column is meaningless. I *think* we should reject a KeyRange using a start_token and a start_column. We should throw an InvalidRequestException. start_column in get_paged_slice has odd behavior --- Key: CASSANDRA-6982 URL: https://issues.apache.org/jira/browse/CASSANDRA-6982 Project: Cassandra Issue Type: Bug Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Critical get_paged_slice is described as so: {code} /** returns a range of columns, wrapping to the next rows if necessary to collect max_results. */ list<KeySlice> get_paged_slice(1:required string column_family, 2:required KeyRange range, 3:required binary start_column, 4:required ConsistencyLevel consistency_level=ConsistencyLevel.ONE) throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te), {code} The term max_results is not defined; I take it to mean key_range.count. The larger issue I have found is that start_column seems to be ignored in some cases. testNormal() produces this error: junit.framework.ComparisonFailure: null expected:<[c]> but was:<[a]> The problem seems to be KeyRanges that use tokens and not keys. {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(); kr.setEnd_token(); {code} A failing test is here: https://github.com/edwardcapriolo/cassandra/compare/pg?expand=1 Is this a bug?
It feels like one, or is this just undefined behaviour? If it is a bug I would like to fix it. -- This message was sent by Atlassian JIRA (v6.2#6252)
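A minimal sketch of the rejection proposed above — the helper class and its placement are hypothetical, not existing Cassandra code; only isSetStart_token is the standard Thrift-generated accessor:
{code}
import java.nio.ByteBuffer;

import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.cassandra.thrift.KeyRange;

public final class PagedSliceValidation
{
    // Hypothetical helper: reject the combination that cannot work, since a
    // token does not identify a unique starting row under Murmur3Partitioner
    // or RandomPartitioner.
    public static void validate(KeyRange range, ByteBuffer startColumn)
        throws InvalidRequestException
    {
        if (range.isSetStart_token() && startColumn != null && startColumn.remaining() > 0)
            throw new InvalidRequestException("start_column cannot be combined with "
                                              + "start_token: tokens are not 1:1 with keys, "
                                              + "so paging cannot resume at a specific row");
    }
}
{code}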
[jira] [Comment Edited] (CASSANDRA-6982) start_column in get_paged_slice has odd behavior
[ https://issues.apache.org/jira/browse/CASSANDRA-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959995#comment-13959995 ] Edward Capriolo edited comment on CASSANDRA-6982 at 4/4/14 2:29 PM: This is not so much a bug as Thrift allowing users to do something that does not work. We should reject that request. I will explain. This works: {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_key(ByteBufferUtil.bytes("aslice")); kr.setEnd_key(ByteBufferUtil.bytes("aslice")); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {code} When you specify an empty start key and a start column you get what you would expect. This is correct but may not be intuitive. {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_key(ByteBufferUtil.bytes("")); kr.setEnd_key(ByteBufferUtil.bytes("")); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {code} Your slice is starting before the row in question. You get back columns a,b,c not c,d,e. The problem comes when using tokens. With Murmur3 and Random Partitioner the relation between tokens and keys is not one to one. The pig unit tests fire up ByteOrderedPartitioner; the rest of our testing is Murmur3. Here are the things we should not allow (not sure if I am supposed to hex encode, so I tried both): {quote} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(ByteBufferUtil.bytesToHex(ByteBuffer.wrap(l.token.toString().getBytes()))); kr.setEnd_token(ByteBufferUtil.bytesToHex(ByteBuffer.wrap(l.token.toString().getBytes()))); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {quote} {quote} Murmur3Partitioner m = new Murmur3Partitioner(); LongToken l = m.getToken(ByteBufferUtil.bytes("aslice")); KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(l.toString()); kr.setEnd_token(l.toString()); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {quote} Because the relationship of token to key is not 1 to 1, there is no way to start at a specific row. Since you cannot start at a specific row, the start_column is meaningless. I *think* we should reject a KeyRange using a start_token and a start_column when the partitioner does not provide 1 to 1 tokens. We should throw an InvalidRequestException. was (Author: appodictic): This is not so much a bug as Thrift allowing users to do something that does not work. We should reject that request. I will explain. This works: {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_key(ByteBufferUtil.bytes("aslice")); kr.setEnd_key(ByteBufferUtil.bytes("aslice")); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {code} When you specify an empty start key and a start column you get what you would expect. This is correct but may not be intuitive. {code} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_key(ByteBufferUtil.bytes("")); kr.setEnd_key(ByteBufferUtil.bytes("")); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {code} Your slice is starting before the row in question. You get back columns a,b,c not c,d,e. The problem comes when using tokens. With Murmur3 and Random Partitioner the relation between tokens and keys is not one to one. The pig unit tests fire up ByteOrderedPartitioner; the rest of our testing is Murmur3.
Here are the things we should not allow (not sure if I am supposed to hex encode, so I tried both): {quote} KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(ByteBufferUtil.bytesToHex(ByteBuffer.wrap(l.token.toString().getBytes()))); kr.setEnd_token(ByteBufferUtil.bytesToHex(ByteBuffer.wrap(l.token.toString().getBytes()))); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {quote} {quote} Murmur3Partitioner m = new Murmur3Partitioner(); LongToken l = m.getToken(ByteBufferUtil.bytes("aslice")); KeyRange kr = new KeyRange(); kr.setCount(3); kr.setStart_token(l.toString()); kr.setEnd_token(l.toString()); List<KeySlice> t = server.get_paged_slice("Standard1", kr, ByteBufferUtil.bytes("c"), ConsistencyLevel.ONE); {quote} Because the relationship of token to key is not 1 to 1, there is no way to start at a specific row. Since you cannot start at a specific row, the start_column is meaningless. I *think* we should reject a KeyRange using a start_token and a start_column. We should throw an InvalidRequestException.
git commit: Track presence of legacy counter shards in sstables
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 6d901f90a -> 57b18e600 Track presence of legacy counter shards in sstables patch by Aleksey Yeschenko; reviewed by Marcus Eriksson for CASSANDRA-6888 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/57b18e60 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/57b18e60 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/57b18e60 Branch: refs/heads/cassandra-2.1 Commit: 57b18e600c6d79d19d29f3569b81cb946ef9ee57 Parents: 6d901f9 Author: Aleksey Yeschenko alek...@apache.org Authored: Fri Apr 4 17:36:15 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Fri Apr 4 17:36:15 2014 +0300 -- CHANGES.txt | 1 + .../org/apache/cassandra/db/ColumnFamily.java | 12 ++- .../org/apache/cassandra/db/CounterCell.java | 5 ++ .../db/compaction/LazilyCompactedRow.java | 12 +-- .../cassandra/db/context/CounterContext.java | 18 + .../cassandra/io/sstable/ColumnStats.java | 12 ++- .../apache/cassandra/io/sstable/Descriptor.java | 3 + .../cassandra/io/sstable/SSTableWriter.java | 26 --- .../metadata/LegacyMetadataSerializer.java | 1 + .../io/sstable/metadata/MetadataCollector.java | 67 ++--- .../io/sstable/metadata/StatsMetadata.java | 14 .../io/sstable/SSTableMetadataTest.java | 77 +--- 12 files changed, 194 insertions(+), 54 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/57b18e60/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index ac2f624..4cfc957 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -40,6 +40,7 @@ * Optimize CounterColumn#reconcile() (CASSANDRA-6953) * Properly remove 1.2 sstable support in 2.1 (CASSANDRA-6869) * Lock counter cells, not partitions (CASSANDRA-6880) + * Track presence of legacy counter shards in sstables (CASSANDRA-6888) Merged from 2.0: * Allow compaction of system tables during startup (CASSANDRA-6913) * Restrict Windows to parallel repairs (CASSANDRA-6907) http://git-wip-us.apache.org/repos/asf/cassandra/blob/57b18e60/src/java/org/apache/cassandra/db/ColumnFamily.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnFamily.java b/src/java/org/apache/cassandra/db/ColumnFamily.java index e7aab37..da404b0 100644 --- a/src/java/org/apache/cassandra/db/ColumnFamily.java +++ b/src/java/org/apache/cassandra/db/ColumnFamily.java @@ -402,6 +402,7 @@ public abstract class ColumnFamily implements Iterable<Cell>, IRowCacheEntry int maxLocalDeletionTime = Integer.MIN_VALUE; List<ByteBuffer> minColumnNamesSeen = Collections.emptyList(); List<ByteBuffer> maxColumnNamesSeen = Collections.emptyList(); +boolean hasLegacyCounterShards = false; for (Cell cell : this) { if (deletionInfo().getTopLevelDeletion().localDeletionTime < Integer.MAX_VALUE) @@ -420,8 +421,17 @@ public abstract class ColumnFamily implements Iterable<Cell>, IRowCacheEntry tombstones.update(deletionTime); minColumnNamesSeen = ColumnNameHelper.minComponents(minColumnNamesSeen, cell.name, metadata.comparator); maxColumnNamesSeen = ColumnNameHelper.maxComponents(maxColumnNamesSeen, cell.name, metadata.comparator); +if (cell instanceof CounterCell) +hasLegacyCounterShards = hasLegacyCounterShards || ((CounterCell) cell).hasLegacyShards(); } -return new ColumnStats(getColumnCount(), minTimestampSeen, maxTimestampSeen, maxLocalDeletionTime, tombstones, minColumnNamesSeen, maxColumnNamesSeen); +return new ColumnStats(getColumnCount(), + minTimestampSeen, + maxTimestampSeen, + maxLocalDeletionTime, + tombstones, + minColumnNamesSeen, +
maxColumnNamesSeen, + hasLegacyCounterShards); } public boolean isMarkedForDelete() http://git-wip-us.apache.org/repos/asf/cassandra/blob/57b18e60/src/java/org/apache/cassandra/db/CounterCell.java -- diff --git a/src/java/org/apache/cassandra/db/CounterCell.java b/src/java/org/apache/cassandra/db/CounterCell.java index 6b588ef..fc4ac3f 100644 --- a/src/java/org/apache/cassandra/db/CounterCell.java +++ b/src/java/org/apache/cassandra/db/CounterCell.java @@ -182,6 +182,11 @@ public class CounterCell extends Cell
[1/2] git commit: Track presence of legacy counter shards in sstables
Repository: cassandra Updated Branches: refs/heads/trunk f4e8fc3f6 -> 0015f37a3 Track presence of legacy counter shards in sstables patch by Aleksey Yeschenko; reviewed by Marcus Eriksson for CASSANDRA-6888 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/57b18e60 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/57b18e60 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/57b18e60 Branch: refs/heads/trunk Commit: 57b18e600c6d79d19d29f3569b81cb946ef9ee57 Parents: 6d901f9 Author: Aleksey Yeschenko alek...@apache.org Authored: Fri Apr 4 17:36:15 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Fri Apr 4 17:36:15 2014 +0300 -- CHANGES.txt | 1 + .../org/apache/cassandra/db/ColumnFamily.java | 12 ++- .../org/apache/cassandra/db/CounterCell.java | 5 ++ .../db/compaction/LazilyCompactedRow.java | 12 +-- .../cassandra/db/context/CounterContext.java | 18 + .../cassandra/io/sstable/ColumnStats.java | 12 ++- .../apache/cassandra/io/sstable/Descriptor.java | 3 + .../cassandra/io/sstable/SSTableWriter.java | 26 --- .../metadata/LegacyMetadataSerializer.java | 1 + .../io/sstable/metadata/MetadataCollector.java | 67 ++--- .../io/sstable/metadata/StatsMetadata.java | 14 .../io/sstable/SSTableMetadataTest.java | 77 +--- 12 files changed, 194 insertions(+), 54 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/57b18e60/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index ac2f624..4cfc957 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -40,6 +40,7 @@ * Optimize CounterColumn#reconcile() (CASSANDRA-6953) * Properly remove 1.2 sstable support in 2.1 (CASSANDRA-6869) * Lock counter cells, not partitions (CASSANDRA-6880) + * Track presence of legacy counter shards in sstables (CASSANDRA-6888) Merged from 2.0: * Allow compaction of system tables during startup (CASSANDRA-6913) * Restrict Windows to parallel repairs (CASSANDRA-6907) http://git-wip-us.apache.org/repos/asf/cassandra/blob/57b18e60/src/java/org/apache/cassandra/db/ColumnFamily.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnFamily.java b/src/java/org/apache/cassandra/db/ColumnFamily.java index e7aab37..da404b0 100644 --- a/src/java/org/apache/cassandra/db/ColumnFamily.java +++ b/src/java/org/apache/cassandra/db/ColumnFamily.java @@ -402,6 +402,7 @@ public abstract class ColumnFamily implements Iterable<Cell>, IRowCacheEntry int maxLocalDeletionTime = Integer.MIN_VALUE; List<ByteBuffer> minColumnNamesSeen = Collections.emptyList(); List<ByteBuffer> maxColumnNamesSeen = Collections.emptyList(); +boolean hasLegacyCounterShards = false; for (Cell cell : this) { if (deletionInfo().getTopLevelDeletion().localDeletionTime < Integer.MAX_VALUE) @@ -420,8 +421,17 @@ public abstract class ColumnFamily implements Iterable<Cell>, IRowCacheEntry tombstones.update(deletionTime); minColumnNamesSeen = ColumnNameHelper.minComponents(minColumnNamesSeen, cell.name, metadata.comparator); maxColumnNamesSeen = ColumnNameHelper.maxComponents(maxColumnNamesSeen, cell.name, metadata.comparator); +if (cell instanceof CounterCell) +hasLegacyCounterShards = hasLegacyCounterShards || ((CounterCell) cell).hasLegacyShards(); } -return new ColumnStats(getColumnCount(), minTimestampSeen, maxTimestampSeen, maxLocalDeletionTime, tombstones, minColumnNamesSeen, maxColumnNamesSeen); +return new ColumnStats(getColumnCount(), + minTimestampSeen, + maxTimestampSeen, + maxLocalDeletionTime, + tombstones, + minColumnNamesSeen, + maxColumnNamesSeen, +
hasLegacyCounterShards); } public boolean isMarkedForDelete() http://git-wip-us.apache.org/repos/asf/cassandra/blob/57b18e60/src/java/org/apache/cassandra/db/CounterCell.java -- diff --git a/src/java/org/apache/cassandra/db/CounterCell.java b/src/java/org/apache/cassandra/db/CounterCell.java index 6b588ef..fc4ac3f 100644 --- a/src/java/org/apache/cassandra/db/CounterCell.java +++ b/src/java/org/apache/cassandra/db/CounterCell.java @@ -182,6 +182,11 @@ public class CounterCell extends Cell
[2/2] git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0015f37a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0015f37a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0015f37a Branch: refs/heads/trunk Commit: 0015f37a3fa6ff34a63566e253433dbc4d3cf384 Parents: f4e8fc3 57b18e6 Author: Aleksey Yeschenko alek...@apache.org Authored: Fri Apr 4 17:39:20 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Fri Apr 4 17:39:20 2014 +0300 -- CHANGES.txt | 1 + .../org/apache/cassandra/db/ColumnFamily.java | 12 ++- .../org/apache/cassandra/db/CounterCell.java| 5 ++ .../db/compaction/LazilyCompactedRow.java | 12 +-- .../cassandra/db/context/CounterContext.java| 18 + .../cassandra/io/sstable/ColumnStats.java | 12 ++- .../apache/cassandra/io/sstable/Descriptor.java | 3 + .../cassandra/io/sstable/SSTableWriter.java | 26 --- .../metadata/LegacyMetadataSerializer.java | 1 + .../io/sstable/metadata/MetadataCollector.java | 67 ++--- .../io/sstable/metadata/StatsMetadata.java | 14 .../io/sstable/SSTableMetadataTest.java | 77 +--- 12 files changed, 194 insertions(+), 54 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/0015f37a/CHANGES.txt -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/0015f37a/src/java/org/apache/cassandra/db/ColumnFamily.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/0015f37a/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java --
[jira] [Commented] (CASSANDRA-6553) Benchmark counter improvements (counters++)
[ https://issues.apache.org/jira/browse/CASSANDRA-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960040#comment-13960040 ] Aleksey Yeschenko commented on CASSANDRA-6553: -- [~rhatch] One last time this week, could you run QUORUM only, counter cache ON only: https://github.com/iamaleksey/cassandra/tree/cassandra-2.0 vs. https://github.com/iamaleksey/cassandra/tree/6553 ? (for screenshots) Thanks. Benchmark counter improvements (counters++) --- Key: CASSANDRA-6553 URL: https://issues.apache.org/jira/browse/CASSANDRA-6553 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Russ Hatch Fix For: 2.1 beta2 Attachments: 6553.txt, 6553.uber.quorum.bdplab.read.png, 6553.uber.quorum.bdplab.write.png, high_cl_one.png, high_cl_quorum.png, low_cl_one.png, low_cl_quorum.png, tracing.txt, uber_cl_one.png, uber_cl_quorum.png Benchmark the difference in performance between CASSANDRA-6504 and trunk. * Updating totally unrelated counters (different partitions) * Updating the same counters a lot (same cells in the same partition) * Different cells in the same few partitions (hot counter partition) benchmark: https://github.com/apache/cassandra/tree/1218bcacba7edefaf56cf8440d0aea5794c89a1e (old counters) compared to: https://github.com/apache/cassandra/tree/714c423360c36da2a2b365efaf9c5c4f623ed133 (new counters) So far, the above changes should only affect the write path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination
[ https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960050#comment-13960050 ] Shridhar commented on CASSANDRA-6311: - [~alexliu68] We downloaded cassandra-2.0.6 and added the patch (6311-v11.txt) on top of it. We are still getting the same error as in CASSANDRA-6151. Add CqlRecordReader to take advantage of native CQL pagination -- Key: CASSANDRA-6311 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Fix For: 2.0.7 Attachments: 6311-v10.txt, 6311-v11.txt, 6311-v3-2.0-branch.txt, 6311-v4.txt, 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6311-v7.txt, 6311-v8.txt, 6311-v9.txt, 6331-2.0-branch.txt, 6331-v2-2.0-branch.txt Since the latest CQL pagination is done and should be more efficient, we need to update CqlPagingRecordReader to use it instead of the custom thrift paging. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6747) MessagingService should handle failures on remote nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-6747: -- Fix Version/s: 2.1 beta2 MessagingService should handle failures on remote nodes. Key: CASSANDRA-6747 URL: https://issues.apache.org/jira/browse/CASSANDRA-6747 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: sankalp kohli Priority: Minor Labels: Core Fix For: 2.1 beta2 Attachments: CASSANDRA-6747.diff While going through the code of MessagingService, I discovered that we don't handle callbacks on failure very well. If a verb handler on the remote machine throws an exception, it goes right through the uncaught exception handler. The machine which triggered the message will keep waiting and will time out. On timeout, it will do some stuff hard-coded in the MS, like hints, and add to latency. There is no way in IAsyncCallback to specify what to do on timeouts and also on failures. Here are some examples which I found will help if we enhance this system to also propagate failures back. So IAsyncCallback will have methods like onFailure. 1) From ActiveRepairService.prepareForRepair: IAsyncCallback callback = new IAsyncCallback() { @Override public void response(MessageIn msg) { prepareLatch.countDown(); } @Override public boolean isLatencyForSnitch() { return false; } }; List<UUID> cfIds = new ArrayList<>(columnFamilyStores.size()); for (ColumnFamilyStore cfs : columnFamilyStores) cfIds.add(cfs.metadata.cfId); for (InetAddress neighbour : endpoints) { PrepareMessage message = new PrepareMessage(parentRepairSession, cfIds, ranges); MessageOut<RepairMessage> msg = message.createMessage(); MessagingService.instance().sendRR(msg, neighbour, callback); } try { prepareLatch.await(1, TimeUnit.HOURS); } catch (InterruptedException e) { parentRepairSessions.remove(parentRepairSession); throw new RuntimeException("Did not get replies from all endpoints.", e); } 2) During the snapshot phase in repair, if SnapshotVerbHandler throws an exception, we will wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
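A sketch of what the proposed onFailure hook could look like — the interface name and signature below are illustrative of the description above, not quoted from the attached patch:
{code}
import java.net.InetAddress;

import org.apache.cassandra.net.IAsyncCallback;

// Illustrative shape of the proposal: callbacks gain a failure hook, so a
// verb handler throwing on the remote node is reported back instead of
// being discovered only via timeout.
interface IAsyncCallbackWithFailure<T> extends IAsyncCallback<T>
{
    void onFailure(InetAddress from);
}

// The prepareForRepair example above could then fail fast instead of
// waiting out the one-hour latch (prepareFailed is a hypothetical
// AtomicBoolean in the caller):
//
// IAsyncCallbackWithFailure<Object> callback = new IAsyncCallbackWithFailure<Object>()
// {
//     public void response(MessageIn<Object> msg) { prepareLatch.countDown(); }
//     public boolean isLatencyForSnitch()         { return false; }
//     public void onFailure(InetAddress from)
//     {
//         prepareFailed.set(true);
//         prepareLatch.countDown(); // release the latch immediately
//     }
// };
{code}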
[jira] [Updated] (CASSANDRA-6747) MessagingService should handle failures on remote nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-6747: -- Reviewer: Yuki Morishita [~kohlisankalp] I like your approach. One thing you need to change: in SnapshotTask's callback#onFailure, you can't just throw RuntimeException; you have to call task.setException so repair knows there was an exception during snapshotting. MessagingService should handle failures on remote nodes. Key: CASSANDRA-6747 URL: https://issues.apache.org/jira/browse/CASSANDRA-6747 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: sankalp kohli Priority: Minor Labels: Core Fix For: 2.1 beta2 Attachments: CASSANDRA-6747.diff While going through the code of MessagingService, I discovered that we don't handle callbacks on failure very well. If a verb handler on the remote machine throws an exception, it goes right through the uncaught exception handler. The machine which triggered the message will keep waiting and will time out. On timeout, it will do some stuff hard-coded in the MS, like hints, and add to latency. There is no way in IAsyncCallback to specify what to do on timeouts and also on failures. Here are some examples which I found will help if we enhance this system to also propagate failures back. So IAsyncCallback will have methods like onFailure. 1) From ActiveRepairService.prepareForRepair: IAsyncCallback callback = new IAsyncCallback() { @Override public void response(MessageIn msg) { prepareLatch.countDown(); } @Override public boolean isLatencyForSnitch() { return false; } }; List<UUID> cfIds = new ArrayList<>(columnFamilyStores.size()); for (ColumnFamilyStore cfs : columnFamilyStores) cfIds.add(cfs.metadata.cfId); for (InetAddress neighbour : endpoints) { PrepareMessage message = new PrepareMessage(parentRepairSession, cfIds, ranges); MessageOut<RepairMessage> msg = message.createMessage(); MessagingService.instance().sendRR(msg, neighbour, callback); } try { prepareLatch.await(1, TimeUnit.HOURS); } catch (InterruptedException e) { parentRepairSessions.remove(parentRepairSession); throw new RuntimeException("Did not get replies from all endpoints.", e); } 2) During the snapshot phase in repair, if SnapshotVerbHandler throws an exception, we will wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6747) MessagingService should handle failures on remote nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960085#comment-13960085 ] sankalp kohli commented on CASSANDRA-6747: -- Please review v2 with your suggestions. MessagingService should handle failures on remote nodes. Key: CASSANDRA-6747 URL: https://issues.apache.org/jira/browse/CASSANDRA-6747 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: sankalp kohli Priority: Minor Labels: Core Fix For: 2.1 beta2 Attachments: CASSANDRA-6747-v2.diff, CASSANDRA-6747.diff While going through the code of MessagingService, I discovered that we don't handle callbacks on failure very well. If a verb handler on the remote machine throws an exception, it goes right through the uncaught exception handler. The machine which triggered the message will keep waiting and will time out. On timeout, it will do some stuff hard-coded in the MS, like hints, and add to latency. There is no way in IAsyncCallback to specify what to do on timeouts and also on failures. Here are some examples which I found will help if we enhance this system to also propagate failures back. So IAsyncCallback will have methods like onFailure. 1) From ActiveRepairService.prepareForRepair: IAsyncCallback callback = new IAsyncCallback() { @Override public void response(MessageIn msg) { prepareLatch.countDown(); } @Override public boolean isLatencyForSnitch() { return false; } }; List<UUID> cfIds = new ArrayList<>(columnFamilyStores.size()); for (ColumnFamilyStore cfs : columnFamilyStores) cfIds.add(cfs.metadata.cfId); for (InetAddress neighbour : endpoints) { PrepareMessage message = new PrepareMessage(parentRepairSession, cfIds, ranges); MessageOut<RepairMessage> msg = message.createMessage(); MessagingService.instance().sendRR(msg, neighbour, callback); } try { prepareLatch.await(1, TimeUnit.HOURS); } catch (InterruptedException e) { parentRepairSessions.remove(parentRepairSession); throw new RuntimeException("Did not get replies from all endpoints.", e); } 2) During the snapshot phase in repair, if SnapshotVerbHandler throws an exception, we will wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6747) MessagingService should handle failures on remote nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sankalp kohli updated CASSANDRA-6747: - Attachment: CASSANDRA-6747-v2.diff MessagingService should handle failures on remote nodes. Key: CASSANDRA-6747 URL: https://issues.apache.org/jira/browse/CASSANDRA-6747 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: sankalp kohli Priority: Minor Labels: Core Fix For: 2.1 beta2 Attachments: CASSANDRA-6747-v2.diff, CASSANDRA-6747.diff While going through the code of MessagingService, I discovered that we don't handle callbacks on failure very well. If a verb handler on the remote machine throws an exception, it goes right through the uncaught exception handler. The machine which triggered the message will keep waiting and will time out. On timeout, it will do some stuff hard-coded in the MS, like hints, and add to latency. There is no way in IAsyncCallback to specify what to do on timeouts and also on failures. Here are some examples which I found will help if we enhance this system to also propagate failures back. So IAsyncCallback will have methods like onFailure. 1) From ActiveRepairService.prepareForRepair: IAsyncCallback callback = new IAsyncCallback() { @Override public void response(MessageIn msg) { prepareLatch.countDown(); } @Override public boolean isLatencyForSnitch() { return false; } }; List<UUID> cfIds = new ArrayList<>(columnFamilyStores.size()); for (ColumnFamilyStore cfs : columnFamilyStores) cfIds.add(cfs.metadata.cfId); for (InetAddress neighbour : endpoints) { PrepareMessage message = new PrepareMessage(parentRepairSession, cfIds, ranges); MessageOut<RepairMessage> msg = message.createMessage(); MessagingService.instance().sendRR(msg, neighbour, callback); } try { prepareLatch.await(1, TimeUnit.HOURS); } catch (InterruptedException e) { parentRepairSessions.remove(parentRepairSession); throw new RuntimeException("Did not get replies from all endpoints.", e); } 2) During the snapshot phase in repair, if SnapshotVerbHandler throws an exception, we will wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6553) Benchmark counter improvements (counters++)
[ https://issues.apache.org/jira/browse/CASSANDRA-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-6553: -- Attachment: logs.tar.gz Adding logs in logs.tar.gz. These should be ordered the same as the graph links above (starting with the no counter cache tests). Like so: no counter cache/low contention/2.0/write/cl.one no counter cache/low contention/2.0/read/cl.one no counter cache/low contention/2.1/write/cl.one no counter cache/low contention/2.1/read/cl.one counter cache enabled/uber contention/aleksey's patched 2.1/read/cl.quorum Benchmark counter improvements (counters++) --- Key: CASSANDRA-6553 URL: https://issues.apache.org/jira/browse/CASSANDRA-6553 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Russ Hatch Fix For: 2.1 beta2 Attachments: 6553.txt, 6553.uber.quorum.bdplab.read.png, 6553.uber.quorum.bdplab.write.png, high_cl_one.png, high_cl_quorum.png, logs.tar.gz, low_cl_one.png, low_cl_quorum.png, tracing.txt, uber_cl_one.png, uber_cl_quorum.png Benchmark the difference in performance between CASSANDRA-6504 and trunk. * Updating totally unrelated counters (different partitions) * Updating the same counters a lot (same cells in the same partition) * Different cells in the same few partitions (hot counter partition) benchmark: https://github.com/apache/cassandra/tree/1218bcacba7edefaf56cf8440d0aea5794c89a1e (old counters) compared to: https://github.com/apache/cassandra/tree/714c423360c36da2a2b365efaf9c5c4f623ed133 (new counters) So far, the above changes should only affect the write path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960112#comment-13960112 ] Jonathan Ellis commented on CASSANDRA-6694: --- Is there a case to be made here that there's more abstraction than necessary? Because I'm still having trouble wrapping my head around it. Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960117#comment-13960117 ] Benedict commented on CASSANDRA-6694: - Well, it's probably indicative of something wrong, but I don't think it's the level of abstraction. Probably I can re-organise it to make it clearer, though. Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960187#comment-13960187 ] Benedict commented on CASSANDRA-6694: - Rebased, reorganised and pushed to [6694-reorg|https://github.com/belliottsmith/cassandra/tree/6694-reorg] Does that make it clearer what's going on? Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960192#comment-13960192 ] Benedict commented on CASSANDRA-6694: - We basically have: BBAllocator (and implementors); BBPool + BBPoolAllocator (and implementors); NativePool + NativeAllocator. BBPoolAllocator creates a BBAllocator per session, by wrapping the session's OpOrder.Group. BBAllocator is used to construct Buffer* implementations (necessary without further refactoring, as that's how CellName implementors work, and we don't want to rip those apart in this commit). DataAllocator wraps the above to create arbitrary implementations (i.e. Native* or Buffer*, atm). Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
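Rendered as Java, the layering Benedict lists reads roughly like this — the type names come from his comment, but the method shapes are guesses, not the actual branch:
{code}
import java.nio.ByteBuffer;

import org.apache.cassandra.db.Cell;
import org.apache.cassandra.utils.concurrent.OpOrder;

// Guessed method shapes for the hierarchy named in the comment above --
// only the type names come from the branch description.
interface BBAllocator
{
    ByteBuffer allocate(int size); // hands out buffers for one write session
}

interface BBPool
{
    // One allocator per session, tied to the session's OpOrder.Group so the
    // pool knows when the underlying memory can be recycled.
    BBAllocator newAllocator(OpOrder.Group writeOp);
}

// Sits on top: decides whether a cell is materialised as a Buffer* (heap /
// ByteBuffer-backed) or Native* (off-heap) implementation.
interface DataAllocator
{
    Cell clone(Cell cell, BBAllocator allocator);
}
{code}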
[jira] [Commented] (CASSANDRA-6913) Compaction of system keyspaces during startup can cause early loading of non-system keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960220#comment-13960220 ] Ravi Prasad commented on CASSANDRA-6913: we were noticing occasional FileNotFoundExceptions due to compaction leftovers at startup on restart, after upgrading to cassandra-2.0 (CASSANDRA-5151). I think this fixes that issue. Would it make sense to change CHANGES.txt to 'Avoid early loading of non-system keyspaces before compaction-leftovers cleanup at startup' instead of https://github.com/apache/cassandra/blob/56d84a7c028c0498158efb1a3cadea149ab7c1cd/CHANGES.txt#L2 ? Compaction of system keyspaces during startup can cause early loading of non-system keyspaces - Key: CASSANDRA-6913 URL: https://issues.apache.org/jira/browse/CASSANDRA-6913 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 2.0.7, 2.1 beta2 Attachments: 6913.txt This then can result in an inconsistent CFS state, as cleanup of e.g. compaction leftovers does not get reflected in DataTracker. It happens because StorageService.getLoad() iterates over and opens all CFS, and this is called by Compaction. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6913) Compaction of system keyspaces during startup can cause early loading of non-system keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960234#comment-13960234 ] Benedict commented on CASSANDRA-6913: - Hi [~ravilr], yes, that's exactly the symptom you'd expect when hitting this issue. +1 to the CHANGES.txt suggestion, even if it is a bit of a mouthful. Compaction of system keyspaces during startup can cause early loading of non-system keyspaces - Key: CASSANDRA-6913 URL: https://issues.apache.org/jira/browse/CASSANDRA-6913 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 2.0.7, 2.1 beta2 Attachments: 6913.txt This then can result in an inconsistent CFS state, as cleanup of e.g. compaction leftovers does not get reflected in DataTracker. It happens because StorageService.getLoad() iterates over and opens all CFS, and this is called by Compaction. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6977) attempting to create 10K column families fails with 100 node cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960312#comment-13960312 ] Daniel Meyer commented on CASSANDRA-6977: - I am not sure if memory is the issue here. I monitored memory with visualvm and found the maximum used heap to be only 1GB. There were no OOM errors in the logs. Further, if memory were the issue I would expect the 5 node cluster to run into this too; however, with a 5 node cluster this issue does not occur and we are able to create the 10K cfs without a problem (albeit it takes a while). attempting to create 10K column families fails with 100 node cluster Key: CASSANDRA-6977 URL: https://issues.apache.org/jira/browse/CASSANDRA-6977 Project: Cassandra Issue Type: Bug Environment: 100 nodes, Ubuntu 12.04.3 LTS, AWS m1.large instances Reporter: Daniel Meyer Attachments: 100_nodes_all_data.png, all_data_5_nodes.png, keyspace_create.py, logs.tar, tpstats.txt, visualvm_tracer_data.csv During this test we are attempting to create a total of 1K keyspaces with 10 column families each to bring the total column families to 10K. With a 5 node cluster this operation can be completed; however, it fails with 100 nodes. Please see the two charts. For the 5 node case the time required to create each keyspace and subsequent 10 column families increases linearly until the number of keyspaces is 1K. For a 100 node cluster there is a sudden increase in latency between 450 keyspaces and 550 keyspaces. The test ends when the test script times out. After the test script times out it is impossible to reconnect to the cluster with the datastax python driver because it cannot connect to the host: cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'10.199.5.98': OperationTimedOut()}) It was found that running the following stress command does work from the same machine the test script runs on. cassandra-stress -d 10.199.5.98 -l 2 -e QUORUM -L3 -b -o INSERT It should be noted that this test was initially done with DSE 4.0 and c* version 2.0.5.24 and in that case it was not possible to run stress against the cluster even locally on a node due to not finding the host. Attached are system logs from one of the nodes, charts showing schema creation latency for 5 and 100 node clusters, visualvm tracer data for cpu, memory, num_threads and gc runs, tpstats output and the test script. The test script was on an m1.large aws instance outside of the cluster under test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960318#comment-13960318 ] Brandon Williams commented on CASSANDRA-6971: - Hmm, so the ks was created on node1, node2 saw it and applied it, but node3 never noticed any of this and never got it, so we must have a problem with the schema pull code. That's about all I can tell at INFO; if you can get DEBUG there may be more. nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: node1.log, node2.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960319#comment-13960319 ] Brandon Williams commented on CASSANDRA-6971: - gossipinfo from this state would help rule out a problem there. nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: node1.log, node2.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6553) Benchmark counter improvements (counters++)
[ https://issues.apache.org/jira/browse/CASSANDRA-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960339#comment-13960339 ] Russ Hatch commented on CASSANDRA-6553: --- [~iamaleksey] -- here you go: http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.low_contention_CL_quorum_4_4.json http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.high_contention_CL_quorum_4_4.json http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.user_contention_CL_quorum_4_4.json btw, if you want to change the top title or legend information for better presentation, you can update the json. Clicking a legend will disable/enable that dataset. If you want to make changes but don't want to push them to github, you can clone the cassandra_performance repo and run it locally from the 'graph' directory with python -m SimpleHTTPServer Benchmark counter improvements (counters++) --- Key: CASSANDRA-6553 URL: https://issues.apache.org/jira/browse/CASSANDRA-6553 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Russ Hatch Fix For: 2.1 beta2 Attachments: 6553.txt, 6553.uber.quorum.bdplab.read.png, 6553.uber.quorum.bdplab.write.png, high_cl_one.png, high_cl_quorum.png, logs.tar.gz, low_cl_one.png, low_cl_quorum.png, tracing.txt, uber_cl_one.png, uber_cl_quorum.png Benchmark the difference in performance between CASSANDRA-6504 and trunk. * Updating totally unrelated counters (different partitions) * Updating the same counters a lot (same cells in the same partition) * Different cells in the same few partitions (hot counter partition) benchmark: https://github.com/apache/cassandra/tree/1218bcacba7edefaf56cf8440d0aea5794c89a1e (old counters) compared to: https://github.com/apache/cassandra/tree/714c423360c36da2a2b365efaf9c5c4f623ed133 (new counters) So far, the above changes should only affect the write path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6553) Benchmark counter improvements (counters++)
[ https://issues.apache.org/jira/browse/CASSANDRA-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960339#comment-13960339 ] Russ Hatch edited comment on CASSANDRA-6553 at 4/4/14 8:03 PM: --- [~iamaleksey] -- here you go: http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.low_contention_CL_quorum_4_4.json http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.high_contention_CL_quorum_4_4.json http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.user_contention_CL_quorum_4_4.json btw, if you want to change the top title or legend information for better presentation, you can update the json. Clicking a color in the legend will disable/enable that dataset. If you want to make changes but don't want to push them to github, you can clone the cassandra_performance repo and run it locally from the 'graph' directory with python -m SimpleHTTPServer was (Author: rhatch): [~iamaleksey] -- here you go: http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.low_contention_CL_quorum_4_4.json http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.high_contention_CL_quorum_4_4.json http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.user_contention_CL_quorum_4_4.json btw, if you want to change the top title or legend information for better presentation, you can update the json. Clicking a legend will disable/enable that dataset. If you want to make changes but don't want to push them to github, you can clone the cassandra_performance repo and run it locally from the 'graph' directory with python -m SimpleHTTPServer Benchmark counter improvements (counters++) --- Key: CASSANDRA-6553 URL: https://issues.apache.org/jira/browse/CASSANDRA-6553 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Russ Hatch Fix For: 2.1 beta2 Attachments: 6553.txt, 6553.uber.quorum.bdplab.read.png, 6553.uber.quorum.bdplab.write.png, high_cl_one.png, high_cl_quorum.png, logs.tar.gz, low_cl_one.png, low_cl_quorum.png, tracing.txt, uber_cl_one.png, uber_cl_quorum.png Benchmark the difference in performance between CASSANDRA-6504 and trunk. * Updating totally unrelated counters (different partitions) * Updating the same counters a lot (same cells in the same partition) * Different cells in the same few partitions (hot counter partition) benchmark: https://github.com/apache/cassandra/tree/1218bcacba7edefaf56cf8440d0aea5794c89a1e (old counters) compared to: https://github.com/apache/cassandra/tree/714c423360c36da2a2b365efaf9c5c4f623ed133 (new counters) So far, the above changes should only affect the write path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6957) testNewRepairedSSTable fails intermittently
[ https://issues.apache.org/jira/browse/CASSANDRA-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960351#comment-13960351 ] Jonathan Ellis commented on CASSANDRA-6957: --- Is there a way there could be a race and we end up wanting to put an sstable back in the same level it started? If so we'd want something like v3. testNewRepairedSSTable fails intermittently --- Key: CASSANDRA-6957 URL: https://issues.apache.org/jira/browse/CASSANDRA-6957 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Marcus Eriksson Fix For: 2.1 beta2 Attachments: 0001-doh-clear-out-L0-as-well.patch, 6957-v2.txt, 6957-v3.txt, system.log.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6957) testNewRepairedSSTable fails intermittently
[ https://issues.apache.org/jira/browse/CASSANDRA-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6957: -- Attachment: 6957-v3.txt testNewRepairedSSTable fails intermittently --- Key: CASSANDRA-6957 URL: https://issues.apache.org/jira/browse/CASSANDRA-6957 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Marcus Eriksson Fix For: 2.1 beta2 Attachments: 0001-doh-clear-out-L0-as-well.patch, 6957-v2.txt, 6957-v3.txt, system.log.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6831) Updates to COMPACT STORAGE tables via cli drop CQL information
[ https://issues.apache.org/jira/browse/CASSANDRA-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960368#comment-13960368 ] Mikhail Stepura commented on CASSANDRA-6831: Here's what I've seen on the 2.1 branch. The table was created in ``cqlsh``
{code:title=cqlsh}
[cqlsh 5.0.0 | Cassandra 2.1.0-beta1-SNAPSHOT | CQL spec 3.1.5 | Native protocol v2]
Use HELP for help.
cqlsh>
cqlsh> CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use test;
cqlsh:test> CREATE TABLE foo (bar text, baz text, qux text, PRIMARY KEY(bar, baz) ) WITH COMPACT STORAGE;
cqlsh:test> DESCRIBE TABLE foo;

CREATE TABLE test.foo (
    bar text,
    baz text,
    qux text,
    PRIMARY KEY (bar, baz)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (baz ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{keys:ALL, rows_per_partition:NONE}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND populate_io_cache_on_flush = false
    AND read_repair_chance = 0.1
    AND speculative_retry = '99.0PERCENTILE'
{code}
Then I did the following in ``cassandra-cli``
{code:title=cassandra-cli}
mstepura-mac:cassandra mikhail$ bin/cassandra-cli
Connected to: Test Cluster on 127.0.0.1/9160
Welcome to Cassandra CLI version 2.1.0-beta1-SNAPSHOT

The CLI is deprecated and will be removed in Cassandra 3.0. Consider migrating to cqlsh.
CQL is fully backwards compatible with Thrift data; see http://www.datastax.com/dev/blog/thrift-to-cql3

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use test;
Authenticated to keyspace: test
[default@test] UPDATE COLUMN FAMILY foo WITH comment='hey this is a comment';
org.apache.thrift.transport.TTransportException
{code}
Meanwhile in the logs
{code}
ERROR 20:14:17 Exception in thread Thread[MigrationStage:1,5,main]
java.lang.AssertionError: There shouldn't be more than one compact value defined: got ColumnDefinition{name=qux, type=org.apache.cassandra.db.marshal.UTF8Type, kind=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null} and ColumnDefinition{name=value, type=org.apache.cassandra.db.marshal.UTF8Type, kind=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null}
	at org.apache.cassandra.config.CFMetaData.rebuild(CFMetaData.java:1981) ~[main/:na]
	at org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1751) ~[main/:na]
	at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1791) ~[main/:na]
	at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:320) ~[main/:na]
	at org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:306) ~[main/:na]
	at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:181) ~[main/:na]
	at org.apache.cassandra.service.MigrationManager$2.runMayThrow(MigrationManager.java:306) ~[main/:na]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
ERROR 20:14:17 Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: There shouldn't be more than one compact value defined: got ColumnDefinition{name=qux, type=org.apache.cassandra.db.marshal.UTF8Type, kind=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null} and ColumnDefinition{name=value, type=org.apache.cassandra.db.marshal.UTF8Type, kind=COMPACT_VALUE, componentIndex=null, indexName=null, indexType=null}
	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:411) ~[main/:na]
	at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:288) ~[main/:na]
	at org.apache.cassandra.service.MigrationManager.announceColumnFamilyUpdate(MigrationManager.java:242) ~[main/:na]
	at org.apache.cassandra.thrift.CassandraServer.system_update_column_family(CassandraServer.java:1676) ~[main/:na]
	at
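The assertion in the log enforces a schema invariant: a COMPACT STORAGE table may define at most one COMPACT_VALUE column, while the cli update path here ends up producing two (the user-named {{qux}} plus a synthesized {{value}}). A rough stand-alone illustration of that invariant check, with a stand-in ColumnDefinition class rather than Cassandra's CFMetaData:
{code}
import java.util.Arrays;
import java.util.List;

public class CompactValueInvariant
{
    // stand-in for Cassandra's ColumnDefinition
    static final class ColumnDefinition
    {
        final String name;
        final String kind; // e.g. PARTITION_KEY, CLUSTERING_COLUMN, COMPACT_VALUE

        ColumnDefinition(String name, String kind) { this.name = name; this.kind = kind; }

        public String toString() { return "ColumnDefinition{name=" + name + ", kind=" + kind + "}"; }
    }

    // mirrors the invariant the AssertionError above enforces: at most one compact value
    static ColumnDefinition findCompactValue(List<ColumnDefinition> defs)
    {
        ColumnDefinition compactValue = null;
        for (ColumnDefinition def : defs)
        {
            if (!"COMPACT_VALUE".equals(def.kind))
                continue;
            assert compactValue == null
                : "There shouldn't be more than one compact value defined: got " + compactValue + " and " + def;
            compactValue = def;
        }
        return compactValue;
    }

    public static void main(String[] args)
    {
        // the broken state from the log: both 'qux' and the default 'value' survive
        findCompactValue(Arrays.asList(
                new ColumnDefinition("qux", "COMPACT_VALUE"),
                new ColumnDefinition("value", "COMPACT_VALUE"))); // fails when run with -ea
    }
}
{code}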
[jira] [Updated] (CASSANDRA-6959) Reusing Keyspace and CF names raises assertion errors
[ https://issues.apache.org/jira/browse/CASSANDRA-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-6959: Reproduced In: 2.1 beta1, 2.0.6 (was: 2.0.6, 2.1 beta1) Fix Version/s: 2.1 beta2 Assignee: Benedict Let's just get the CL part for 2.1 fixed then. Reusing Keyspace and CF names raises assertion errors - Key: CASSANDRA-6959 URL: https://issues.apache.org/jira/browse/CASSANDRA-6959 Project: Cassandra Issue Type: Bug Reporter: Ryan McGuire Assignee: Benedict Fix For: 2.1 beta2 The [dtest I introduced|https://github.com/riptano/cassandra-dtest/commit/36960090d219ab8dbc7f108faa91c3ea5cea2bec] to test CASSANDRA-6924 introduces some log errors which I think may be related to CASSANDRA-5202. On 2.1:
{code}
ERROR [MigrationStage:1] 2014-03-31 14:36:43,463 CommitLogSegmentManager.java:306 - Failed waiting for a forced recycle of in-use commit log segments
java.lang.AssertionError: null
	at org.apache.cassandra.db.commitlog.CommitLogSegmentManager.forceRecycleAll(CommitLogSegmentManager.java:301) ~[main/:na]
	at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:160) [main/:na]
	at org.apache.cassandra.db.DefsTables.dropColumnFamily(DefsTables.java:497) [main/:na]
	at org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:296) [main/:na]
	at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:181) [main/:na]
	at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) [main/:na]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [main/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
{code}
On 2.0:
{code}
ERROR [ReadStage:3] 2014-03-31 13:28:11,014 CassandraDaemon.java (line 198) Exception in thread Thread[ReadStage:3,5,main]
java.lang.AssertionError
	at org.apache.cassandra.db.filter.ExtendedFilter$WithClauses.getExtraFilter(ExtendedFilter.java:258)
	at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1744)
	at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1699)
	at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:119)
	at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:39)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{code}
To reproduce, you may need to comment out the assertion in that test, as it is not 100% reproducible on the first try. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6913) Compaction of system keyspaces during startup can cause early loading of non-system keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960388#comment-13960388 ] Jonathan Ellis commented on CASSANDRA-6913: --- Done. Compaction of system keyspaces during startup can cause early loading of non-system keyspaces - Key: CASSANDRA-6913 URL: https://issues.apache.org/jira/browse/CASSANDRA-6913 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 2.0.7, 2.1 beta2 Attachments: 6913.txt This then can result in an inconsistent CFS state, as cleanup of e.g. compaction leftovers does not get reflected in DataTracker. It happens because StorageService.getLoad() iterates over and opens all CFS, and this is called by Compaction. -- This message was sent by Atlassian JIRA (v6.2#6252)
git commit: Ensure safe resource cleanup when replacing SSTables
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 57b18e600 -> 5ebadc11e

Ensure safe resource cleanup when replacing SSTables

Patch by belliotsmith; reviewed by Tyler Hobbs for CASSANDRA-6912

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5ebadc11
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5ebadc11
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5ebadc11

Branch: refs/heads/cassandra-2.1
Commit: 5ebadc11e36749e6479f9aba19406db3aacdaf41
Parents: 57b18e6
Author: belliottsmith git...@sub.laerad.com
Authored: Fri Apr 4 15:37:09 2014 -0500
Committer: Tyler Hobbs ty...@datastax.com
Committed: Fri Apr 4 15:37:09 2014 -0500
--
 CHANGES.txt | 1 +
 .../org/apache/cassandra/db/DataTracker.java | 28 +-
 .../cassandra/io/sstable/IndexSummary.java | 2 +-
 .../io/sstable/IndexSummaryManager.java | 22 +-
 .../cassandra/io/sstable/SSTableReader.java | 318 +--
 .../cassandra/utils/AlwaysPresentFilter.java | 3 +-
 .../org/apache/cassandra/utils/BloomFilter.java | 3 +-
 .../org/apache/cassandra/utils/IFilter.java | 2 +
 .../org/apache/cassandra/utils/obs/IBitSet.java | 2 +
 .../cassandra/utils/obs/OffHeapBitSet.java | 2 +-
 .../apache/cassandra/utils/obs/OpenBitSet.java | 2 +-
 .../io/sstable/IndexSummaryManagerTest.java | 2 +-
 .../cassandra/io/sstable/SSTableReaderTest.java | 28 +-
 13 files changed, 278 insertions(+), 137 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ebadc11/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 4cfc957..0f1ae93 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -41,6 +41,7 @@
  * Properly remove 1.2 sstable support in 2.1 (CASSANDRA-6869)
  * Lock counter cells, not partitions (CASSANDRA-6880)
  * Track presence of legacy counter shards in sstables (CASSANDRA-6888)
+ * Ensure safe resource cleanup when replacing sstables (CASSANDRA-6912)
 Merged from 2.0:
  * Allow compaction of system tables during startup (CASSANDRA-6913)
  * Restrict Windows to parallel repairs (CASSANDRA-6907)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ebadc11/src/java/org/apache/cassandra/db/DataTracker.java
--
diff --git a/src/java/org/apache/cassandra/db/DataTracker.java b/src/java/org/apache/cassandra/db/DataTracker.java
index c8fc699..9c8f9a0 100644
--- a/src/java/org/apache/cassandra/db/DataTracker.java
+++ b/src/java/org/apache/cassandra/db/DataTracker.java
@@ -192,14 +192,17 @@ public class DataTracker
     public boolean markCompacting(Iterable<SSTableReader> sstables)
     {
         assert sstables != null && !Iterables.isEmpty(sstables);
+        while (true)
+        {
+            View currentView = view.get();
+            Set<SSTableReader> inactive = Sets.difference(ImmutableSet.copyOf(sstables), currentView.compacting);
+            if (inactive.size() < Iterables.size(sstables))
+                return false;
-        View currentView = view.get();
-        Set<SSTableReader> inactive = Sets.difference(ImmutableSet.copyOf(sstables), currentView.compacting);
-        if (inactive.size() < Iterables.size(sstables))
-            return false;
-
-        View newView = currentView.markCompacting(inactive);
-        return view.compareAndSet(currentView, newView);
+            View newView = currentView.markCompacting(inactive);
+            if (view.compareAndSet(currentView, newView))
+                return true;
+        }
     }

 /**
@@ -333,14 +336,6 @@ public class DataTracker
  */
     public void replaceReaders(Collection<SSTableReader> oldSSTables, Collection<SSTableReader> newSSTables)
     {
-        // data component will be unchanged but the index summary will be a different size
-        // (since we save that to make restart fast)
-        long sizeIncrease = 0;
-        for (SSTableReader sstable : oldSSTables)
-            sizeIncrease -= sstable.bytesOnDisk();
-        for (SSTableReader sstable : newSSTables)
-            sizeIncrease += sstable.bytesOnDisk();
-
         View currentView, newView;
         do
         {
@@ -349,9 +344,6 @@
         }
         while (!view.compareAndSet(currentView, newView));

-        StorageMetrics.load.inc(sizeIncrease);
-        cfstore.metric.liveDiskSpaceUsed.inc(sizeIncrease);
-
         for (SSTableReader sstable : newSSTables)
             sstable.setTrackedBy(this);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ebadc11/src/java/org/apache/cassandra/io/sstable/IndexSummary.java
--
diff --git
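The markCompacting change above swaps a single compare-and-set attempt for a retry loop, so a concurrent-but-disjoint update to the View no longer makes the caller spuriously fail. A minimal, self-contained sketch of that pattern with stand-in types (not Cassandra's DataTracker/View):
{code}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class CompactingTracker
{
    // immutable snapshot, replaced wholesale on every update
    private final AtomicReference<Set<String>> compacting =
            new AtomicReference<Set<String>>(new HashSet<String>());

    public boolean markCompacting(Set<String> candidates)
    {
        while (true)
        {
            Set<String> current = compacting.get();
            // precondition: fail if any candidate is already being compacted
            for (String candidate : candidates)
                if (current.contains(candidate))
                    return false;

            Set<String> updated = new HashSet<String>(current);
            updated.addAll(candidates);
            // another thread may have swapped the reference; retry from the top
            if (compacting.compareAndSet(current, updated))
                return true;
        }
    }
}
{code}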
[1/2] git commit: Ensure safe resource cleanup when replacing SSTables
Repository: cassandra
Updated Branches:
  refs/heads/trunk 0015f37a3 -> 64bc45849

Ensure safe resource cleanup when replacing SSTables

Patch by belliotsmith; reviewed by Tyler Hobbs for CASSANDRA-6912

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5ebadc11
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5ebadc11
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5ebadc11

Branch: refs/heads/trunk
Commit: 5ebadc11e36749e6479f9aba19406db3aacdaf41
Parents: 57b18e6
Author: belliottsmith git...@sub.laerad.com
Authored: Fri Apr 4 15:37:09 2014 -0500
Committer: Tyler Hobbs ty...@datastax.com
Committed: Fri Apr 4 15:37:09 2014 -0500
--
 CHANGES.txt | 1 +
 .../org/apache/cassandra/db/DataTracker.java | 28 +-
 .../cassandra/io/sstable/IndexSummary.java | 2 +-
 .../io/sstable/IndexSummaryManager.java | 22 +-
 .../cassandra/io/sstable/SSTableReader.java | 318 +--
 .../cassandra/utils/AlwaysPresentFilter.java | 3 +-
 .../org/apache/cassandra/utils/BloomFilter.java | 3 +-
 .../org/apache/cassandra/utils/IFilter.java | 2 +
 .../org/apache/cassandra/utils/obs/IBitSet.java | 2 +
 .../cassandra/utils/obs/OffHeapBitSet.java | 2 +-
 .../apache/cassandra/utils/obs/OpenBitSet.java | 2 +-
 .../io/sstable/IndexSummaryManagerTest.java | 2 +-
 .../cassandra/io/sstable/SSTableReaderTest.java | 28 +-
 13 files changed, 278 insertions(+), 137 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ebadc11/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 4cfc957..0f1ae93 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -41,6 +41,7 @@
  * Properly remove 1.2 sstable support in 2.1 (CASSANDRA-6869)
  * Lock counter cells, not partitions (CASSANDRA-6880)
  * Track presence of legacy counter shards in sstables (CASSANDRA-6888)
+ * Ensure safe resource cleanup when replacing sstables (CASSANDRA-6912)
 Merged from 2.0:
  * Allow compaction of system tables during startup (CASSANDRA-6913)
  * Restrict Windows to parallel repairs (CASSANDRA-6907)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ebadc11/src/java/org/apache/cassandra/db/DataTracker.java
--
diff --git a/src/java/org/apache/cassandra/db/DataTracker.java b/src/java/org/apache/cassandra/db/DataTracker.java
index c8fc699..9c8f9a0 100644
--- a/src/java/org/apache/cassandra/db/DataTracker.java
+++ b/src/java/org/apache/cassandra/db/DataTracker.java
@@ -192,14 +192,17 @@ public class DataTracker
     public boolean markCompacting(Iterable<SSTableReader> sstables)
     {
         assert sstables != null && !Iterables.isEmpty(sstables);
+        while (true)
+        {
+            View currentView = view.get();
+            Set<SSTableReader> inactive = Sets.difference(ImmutableSet.copyOf(sstables), currentView.compacting);
+            if (inactive.size() < Iterables.size(sstables))
+                return false;
-        View currentView = view.get();
-        Set<SSTableReader> inactive = Sets.difference(ImmutableSet.copyOf(sstables), currentView.compacting);
-        if (inactive.size() < Iterables.size(sstables))
-            return false;
-
-        View newView = currentView.markCompacting(inactive);
-        return view.compareAndSet(currentView, newView);
+            View newView = currentView.markCompacting(inactive);
+            if (view.compareAndSet(currentView, newView))
+                return true;
+        }
     }

 /**
@@ -333,14 +336,6 @@ public class DataTracker
  */
     public void replaceReaders(Collection<SSTableReader> oldSSTables, Collection<SSTableReader> newSSTables)
     {
-        // data component will be unchanged but the index summary will be a different size
-        // (since we save that to make restart fast)
-        long sizeIncrease = 0;
-        for (SSTableReader sstable : oldSSTables)
-            sizeIncrease -= sstable.bytesOnDisk();
-        for (SSTableReader sstable : newSSTables)
-            sizeIncrease += sstable.bytesOnDisk();
-
         View currentView, newView;
         do
         {
@@ -349,9 +344,6 @@
         }
         while (!view.compareAndSet(currentView, newView));

-        StorageMetrics.load.inc(sizeIncrease);
-        cfstore.metric.liveDiskSpaceUsed.inc(sizeIncrease);
-
         for (SSTableReader sstable : newSSTables)
             sstable.setTrackedBy(this);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5ebadc11/src/java/org/apache/cassandra/io/sstable/IndexSummary.java
--
diff --git
[2/2] git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/64bc4584
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/64bc4584
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/64bc4584

Branch: refs/heads/trunk
Commit: 64bc45849fd2a488b766ca9ddfe8456dae50a187
Parents: 0015f37 5ebadc1
Author: Tyler Hobbs ty...@datastax.com
Authored: Fri Apr 4 15:37:58 2014 -0500
Committer: Tyler Hobbs ty...@datastax.com
Committed: Fri Apr 4 15:37:58 2014 -0500
--
 CHANGES.txt | 1 +
 .../org/apache/cassandra/db/DataTracker.java | 28 +-
 .../cassandra/io/sstable/IndexSummary.java | 2 +-
 .../io/sstable/IndexSummaryManager.java | 22 +-
 .../cassandra/io/sstable/SSTableReader.java | 318 +--
 .../cassandra/utils/AlwaysPresentFilter.java | 3 +-
 .../org/apache/cassandra/utils/BloomFilter.java | 3 +-
 .../org/apache/cassandra/utils/IFilter.java | 2 +
 .../org/apache/cassandra/utils/obs/IBitSet.java | 2 +
 .../cassandra/utils/obs/OffHeapBitSet.java | 2 +-
 .../apache/cassandra/utils/obs/OpenBitSet.java | 2 +-
 .../io/sstable/IndexSummaryManagerTest.java | 2 +-
 .../cassandra/io/sstable/SSTableReaderTest.java | 28 +-
 13 files changed, 278 insertions(+), 137 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/64bc4584/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/64bc4584/test/unit/org/apache/cassandra/io/sstable/SSTableReaderTest.java
--
[jira] [Updated] (CASSANDRA-6912) SSTableReader.isReplaced does not allow for safe resource cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-6912: --- Since Version: 2.1 beta1 SSTableReader.isReplaced does not allow for safe resource cleanup - Key: CASSANDRA-6912 URL: https://issues.apache.org/jira/browse/CASSANDRA-6912 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict Fix For: 2.1 beta2 There are a number of possible race conditions on resource cleanup from the use of cloneWithNewSummarySamplingLevel, because the replacement sstable can be itself replaced/obsoleted while the prior sstable is still referenced (this is actually quite easy with compaction, but can happen in other circumstances less commonly). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-3668) Parallel streaming for sstableloader
[ https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960435#comment-13960435 ] Joshua McKenzie commented on CASSANDRA-3668: A quick update on this - going the route of multiple StreamSessions per StreamPlan with the current architecture is going to require some restructuring. The current design assumes a single socket for streaming and multiple StreamSessions means multiple ConnectionHandlers, all of which assume ownership of polling the readChannel on a socket. To respect the single-socket-for-streaming paradigm we currently have, I'm working on promoting IncomingMessageHandler and OutgoingMessageHandler into higher-level abstractions that are responsible for polling the socket and dispatching to various StreamSessions based on deserialized session indices on the inbound or following the current PriorityQueue polling mechanism for the outbound rather than the current paradigm of being owned by a StreamSession. It doesn't look like we're at risk of a bottleneck on network resources even over a single socket as my prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. Parallel streaming for sstableloader Key: CASSANDRA-3668 URL: https://issues.apache.org/jira/browse/CASSANDRA-3668 Project: Cassandra Issue Type: Improvement Components: API Reporter: Manish Zope Assignee: Joshua McKenzie Priority: Minor Labels: streaming Fix For: 2.1 beta2 Attachments: 3668-1.1-v2.txt, 3668-1.1.txt, 3688-reply_before_closing_writer.txt, sstable-loader performance.txt Original Estimate: 48h Remaining Estimate: 48h One of my colleagues reported a bug regarding degraded performance of the sstable generator and sstable loader. ISSUE: https://issues.apache.org/jira/browse/CASSANDRA-3589 As stated in the above issue, the generator performance has been rectified, but sstableloader performance is still an issue. 3589 is marked as a duplicate of 3618; both issues show resolved status, but the problem with sstableloader still exists, so this issue is being opened so that the sstableloader problem does not go unnoticed. FYI: We have tested the generator part with the patch given in 3589. It's working fine. Please let us know if you require further inputs from our side. -- This message was sent by Atlassian JIRA (v6.2#6252)
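Joshua's description of promoting the message handlers implies a dispatcher keyed by a per-message session index. A hypothetical sketch of that routing idea follows; StreamMessage and StreamSession here are stand-in interfaces, not Cassandra's actual streaming classes:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StreamDispatcher
{
    // stand-ins for the real streaming types
    interface StreamMessage { int sessionIndex(); }
    interface StreamSession { void receive(StreamMessage message); }

    private final Map<Integer, StreamSession> sessions =
            new ConcurrentHashMap<Integer, StreamSession>();

    public void register(int index, StreamSession session)
    {
        sessions.put(index, session);
    }

    // called by the single socket-reading thread for each deserialized message
    public void dispatch(StreamMessage message)
    {
        StreamSession session = sessions.get(message.sessionIndex());
        if (session == null)
            throw new IllegalStateException("Unknown session index: " + message.sessionIndex());
        session.receive(message);
    }
}
{code}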
[jira] [Comment Edited] (CASSANDRA-3668) Parallel streaming for sstableloader
[ https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960435#comment-13960435 ] Joshua McKenzie edited comment on CASSANDRA-3668 at 4/4/14 9:19 PM: A quick update on this - going the route of multiple StreamSessions per StreamPlan is going to require some restructuring. The current design assumes a single socket for streaming and changing to multiple StreamSessions means multiple ConnectionHandlers, all of which assume ownership of polling the readChannel on a socket. To respect the single-socket-for-streaming paradigm we currently have, I'm working on promoting IncomingMessageHandler and OutgoingMessageHandler into higher-level abstractions that are responsible for polling the socket and dispatching to various StreamSessions based on deserialized session indices on the inbound or following the current PriorityQueue polling mechanism for the outbound rather than the current paradigm of being owned by a StreamSession. It doesn't look like we're at risk of a bottleneck on network resources even over a single socket as my prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. was (Author: joshuamckenzie): A quick update on this - going the route of multiple StreamSessions per StreamPlan with the current architecture is going to require some restructuring. The current design assumes a single socket for streaming and multiple StreamSessions means multiple ConnectionHandlers, all of which assume ownership of polling the readChannel on a socket. To respect the single-socket-for-streaming paradigm we currently have, I'm working on promoting IncomingMessageHandler and OutgoingMessageHandler into higher-level abstractions that are responsible for polling the socket and dispatching to various StreamSessions based on deserialized session indices on the inbound or following the current PriorityQueue polling mechanism for the outbound rather than the current paradigm of being owned by a StreamSession. It doesn't look like we're at risk of a bottleneck on network resources even over a single socket as my prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. Parallel streaming for sstableloader Key: CASSANDRA-3668 URL: https://issues.apache.org/jira/browse/CASSANDRA-3668 Project: Cassandra Issue Type: Improvement Components: API Reporter: Manish Zope Assignee: Joshua McKenzie Priority: Minor Labels: streaming Fix For: 2.1 beta2 Attachments: 3668-1.1-v2.txt, 3668-1.1.txt, 3688-reply_before_closing_writer.txt, sstable-loader performance.txt Original Estimate: 48h Remaining Estimate: 48h One of my colleagues reported a bug regarding degraded performance of the sstable generator and sstable loader. ISSUE: https://issues.apache.org/jira/browse/CASSANDRA-3589 As stated in the above issue, the generator performance has been rectified, but sstableloader performance is still an issue. 3589 is marked as a duplicate of 3618; both issues show resolved status, but the problem with sstableloader still exists, so this issue is being opened so that the sstableloader problem does not go unnoticed. FYI: We have tested the generator part with the patch given in 3589. It's working fine. Please let us know if you require further inputs from our side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-6983) DirectoriesTest fails when run as root
Brandon Williams created CASSANDRA-6983: --- Summary: DirectoriesTest fails when run as root Key: CASSANDRA-6983 URL: https://issues.apache.org/jira/browse/CASSANDRA-6983 Project: Cassandra Issue Type: Bug Components: Tests Reporter: Brandon Williams Assignee: Yuki Morishita Fix For: 2.0.7 When you run the DirectoriesTest as a normal user, it passes because it fails to create the 'bad' directory:
{noformat}
[junit] - Standard Error -
[junit] ERROR 16:16:18,111 Failed to create /tmp/cassandra4119802552776680052unittest/ks/bad directory
[junit] WARN 16:16:18,112 Blacklisting /tmp/cassandra4119802552776680052unittest/ks/bad for writes
[junit] - ---
{noformat}
But when you run the test as root, it succeeds in making the directory, causing an assertion failure that it's unwritable:
{noformat}
[junit] Testcase: testDiskFailurePolicy_best_effort(org.apache.cassandra.db.DirectoriesTest): FAILED
[junit]
[junit] junit.framework.AssertionFailedError:
[junit] at org.apache.cassandra.db.DirectoriesTest.testDiskFailurePolicy_best_effort(DirectoriesTest.java:199)
{noformat}
It seems to me that we shouldn't be relying on failing to make the directory. If we're just going to test a nonexistent dir, why try to make one at all? And if that is supposed to succeed, then we have a problem with either the test or blacklisting. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-3668) Parallel streaming for sstableloader
[ https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960435#comment-13960435 ] Joshua McKenzie edited comment on CASSANDRA-3668 at 4/4/14 9:25 PM: (edit) Scratch what I wrote previously - we're good with the multiple StreamSessions per peer, I just need to iron out a socket-connection race on startup of streams. Prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. was (Author: joshuamckenzie): A quick update on this - going the route of multiple StreamSessions per StreamPlan is going to require some restructuring. The current design assumes a single socket for streaming and changing to multiple StreamSessions means multiple ConnectionHandlers, all of which assume ownership of polling the readChannel on a socket. To respect the single-socket-for-streaming paradigm we currently have, I'm working on promoting IncomingMessageHandler and OutgoingMessageHandler into higher-level abstractions that are responsible for polling the socket and dispatching to various StreamSessions based on deserialized session indices on the inbound or following the current PriorityQueue polling mechanism for the outbound rather than the current paradigm of being owned by a StreamSession. It doesn't look like we're at risk of a bottleneck on network resources even over a single socket as my prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. Parallel streaming for sstableloader Key: CASSANDRA-3668 URL: https://issues.apache.org/jira/browse/CASSANDRA-3668 Project: Cassandra Issue Type: Improvement Components: API Reporter: Manish Zope Assignee: Joshua McKenzie Priority: Minor Labels: streaming Fix For: 2.1 beta2 Attachments: 3668-1.1-v2.txt, 3668-1.1.txt, 3688-reply_before_closing_writer.txt, sstable-loader performance.txt Original Estimate: 48h Remaining Estimate: 48h One of my colleagues reported a bug regarding degraded performance of the sstable generator and sstable loader. ISSUE: https://issues.apache.org/jira/browse/CASSANDRA-3589 As stated in the above issue, the generator performance has been rectified, but sstableloader performance is still an issue. 3589 is marked as a duplicate of 3618; both issues show resolved status, but the problem with sstableloader still exists, so this issue is being opened so that the sstableloader problem does not go unnoticed. FYI: We have tested the generator part with the patch given in 3589. It's working fine. Please let us know if you require further inputs from our side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-3668) Parallel streaming for sstableloader
[ https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960435#comment-13960435 ] Joshua McKenzie edited comment on CASSANDRA-3668 at 4/4/14 9:26 PM: (edit) We're good with the multiple StreamSessions per peer, I just need to iron out a socket-connection race on startup of streams. Prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. was (Author: joshuamckenzie): (edit) Scratch what I wrote previously - we're good with the multiple StreamSessions per peer, I just need to iron out a socket-connection race on startup of streams. Prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. Parallel streaming for sstableloader Key: CASSANDRA-3668 URL: https://issues.apache.org/jira/browse/CASSANDRA-3668 Project: Cassandra Issue Type: Improvement Components: API Reporter: Manish Zope Assignee: Joshua McKenzie Priority: Minor Labels: streaming Fix For: 2.1 beta2 Attachments: 3668-1.1-v2.txt, 3668-1.1.txt, 3688-reply_before_closing_writer.txt, sstable-loader performance.txt Original Estimate: 48h Remaining Estimate: 48h One of my colleagues reported a bug regarding degraded performance of the sstable generator and sstable loader. ISSUE: https://issues.apache.org/jira/browse/CASSANDRA-3589 As stated in the above issue, the generator performance has been rectified, but sstableloader performance is still an issue. 3589 is marked as a duplicate of 3618; both issues show resolved status, but the problem with sstableloader still exists, so this issue is being opened so that the sstableloader problem does not go unnoticed. FYI: We have tested the generator part with the patch given in 3589. It's working fine. Please let us know if you require further inputs from our side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-3668) Parallel streaming for sstableloader
[ https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960435#comment-13960435 ] Joshua McKenzie edited comment on CASSANDRA-3668 at 4/4/14 9:29 PM: We should be good with multiple StreamSessions per peer with some minimal code-changes to clean up and consolidate StreamSessions and ProgressInfo data. Prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. was (Author: joshuamckenzie): (edit) We're good with the multiple StreamSessions per peer, I just need to iron out a socket-connection race on startup of streams. Prelim parallelized stream testing is peaking at ~ 55MB/s on 5 connections-per-host vs. 49MB/s on 4 connections - diminishing returns as we get higher. Compared to the 24MB/s I'm benchmarking on a single connection it's still a respectable increase. Parallel streaming for sstableloader Key: CASSANDRA-3668 URL: https://issues.apache.org/jira/browse/CASSANDRA-3668 Project: Cassandra Issue Type: Improvement Components: API Reporter: Manish Zope Assignee: Joshua McKenzie Priority: Minor Labels: streaming Fix For: 2.1 beta2 Attachments: 3668-1.1-v2.txt, 3668-1.1.txt, 3688-reply_before_closing_writer.txt, sstable-loader performance.txt Original Estimate: 48h Remaining Estimate: 48h One of my colleagues reported a bug regarding degraded performance of the sstable generator and sstable loader. ISSUE: https://issues.apache.org/jira/browse/CASSANDRA-3589 As stated in the above issue, the generator performance has been rectified, but sstableloader performance is still an issue. 3589 is marked as a duplicate of 3618; both issues show resolved status, but the problem with sstableloader still exists, so this issue is being opened so that the sstableloader problem does not go unnoticed. FYI: We have tested the generator part with the patch given in 3589. It's working fine. Please let us know if you require further inputs from our side. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960445#comment-13960445 ] Benedict commented on CASSANDRA-6934: - Initial patch available [here|https://github.com/belliottsmith/cassandra/tree/6934] Optimises to some extent the various compare() implementations for AbstractCType, and at the same time slightly optimises compare in BTree to avoid unwrapping the special +/-Inf values except when absolutely necessary, and to perform (potentially) one fewer comparison per update() when not updating identical values, and to not perform a wasteful start-of-range compare() when _not_ inserting. There are further performance improvements that can be made to AbstractCType.compare() and its inheritors, but they're a little more invasive, and since CASSANDRA-6694 will entail some optimisation work to make comparisons less expensive, I will wait until then to do anything more. I need to make some tweaks to stress so I can properly test the impact of this patch on CQL, as there's no easy way to perform inserts of random columns. As shown with CASSANDRA-6553, there is a marked improvement for simple composites, and some quick and dirty benchmarking on my local box for thrift columns with only the general purpose improvements showed a lesser but still marked impact. Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6961) nodes should go into hibernate when join_ring is false
[ https://issues.apache.org/jira/browse/CASSANDRA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960455#comment-13960455 ] Tyler Hobbs commented on CASSANDRA-6961: I'm seeing some issues with repair while one node is running with join_ring=false. Here's what I did: * Start a three node ccm cluster * Start a stress write with RF=3 * Stop node3 * Start node3 * Run a repair against node3 It looks like the repair finishes everything diffing and streaming, but the repair command hangs, and netstats shows continuously increasing completed Command/Response counts. nodes should go into hibernate when join_ring is false -- Key: CASSANDRA-6961 URL: https://issues.apache.org/jira/browse/CASSANDRA-6961 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 2.0.7 Attachments: 6961.txt The impetus here is this: a node that was down for some period and comes back can serve stale information. We know from CASSANDRA-768 that we can't just wait for hints, and know that tangentially related CASSANDRA-3569 prevents us from having the node in a down (from the FD's POV) state handle streaming. We can *almost* set join_ring to false, then repair, and then join the ring to narrow the window (actually, you can do this and everything succeeds because the node doesn't know it's a member yet, which is probably a bit of a bug.) If instead we modified this to put the node in hibernate, like replace_address does, it could work almost like replace, except you could run a repair (manually) while in the hibernate state, and then flip to normal when it's done. This won't prevent the staleness 100%, but it will greatly reduce the chance if the node has been down a significant amount of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6961) nodes should go into hibernate when join_ring is false
[ https://issues.apache.org/jira/browse/CASSANDRA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960455#comment-13960455 ] Tyler Hobbs edited comment on CASSANDRA-6961 at 4/4/14 9:49 PM: I'm seeing some issues with repair while one node is running with join_ring=false. Here's what I did: * Start a three node ccm cluster * Start a stress write with RF=3 * Stop node3 * Start node3 with join_ring=false * Run a repair against node3 It looks like the repair finishes everything diffing and streaming, but the repair command hangs, and netstats shows continuously increasing completed Command/Response counts. was (Author: thobbs): I'm seeing some issues with repair while one node is running with join_ring=false. Here's what I did: * Start a three node ccm cluster * Start a stress write with RF=3 * Stop node3 * Start node3 * Run a repair against node3 It looks like the repair finishes everything diffing and streaming, but the repair command hangs, and netstats shows continuously increasing completed Command/Response counts. nodes should go into hibernate when join_ring is false -- Key: CASSANDRA-6961 URL: https://issues.apache.org/jira/browse/CASSANDRA-6961 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 2.0.7 Attachments: 6961.txt The impetus here is this: a node that was down for some period and comes back can serve stale information. We know from CASSANDRA-768 that we can't just wait for hints, and know that tangentially related CASSANDRA-3569 prevents us from having the node in a down (from the FD's POV) state handle streaming. We can *almost* set join_ring to false, then repair, and then join the ring to narrow the window (actually, you can do this and everything succeeds because the node doesn't know it's a member yet, which is probably a bit of a bug.) If instead we modified this to put the node in hibernate, like replace_address does, it could work almost like replace, except you could run a repair (manually) while in the hibernate state, and then flip to normal when it's done. This won't prevent the staleness 100%, but it will greatly reduce the chance if the node has been down a significant amount of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6863) Incorrect read repair of range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6863: -- Attachment: 6863-v2.txt Incorrect read repair of range tombstones -- Key: CASSANDRA-6863 URL: https://issues.apache.org/jira/browse/CASSANDRA-6863 Project: Cassandra Issue Type: Bug Environment: 2.0 Reporter: Oleg Anastasyev Attachments: 6863-v2.txt, 6863-v2.txt, ReadRepairRangeThombstoneDiff.txt, ReadRepairsDebugLogger.txt Rows with range tombstones are read repaired for every replica if RR is triggered (this is because CF.diff() returns non-null if !isEmpty(), which in turn returns false if the range tombstone list is not empty). Also, the full range tombstone list is sent to all nodes, which could be a problem if you have a wide partition. Fixed this by evaluating the diff on range tombstone lists as well as on the deletionInfo of the endpoint CF versions. Also return null from CF.diff if there is no diff in the RTL. A second patch (ReadRepairsDebugLogger.txt) adds some debug logging to look at read repairs. You may find it useful as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6961) nodes should go into hibernate when join_ring is false
[ https://issues.apache.org/jira/browse/CASSANDRA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960468#comment-13960468 ] Brandon Williams commented on CASSANDRA-6961: - Hmm, I can't reproduce that, even wiping the node before starting it with join_ring=false: {noformat} [2014-04-04 21:51:30,888] Repair command #1 finished {noformat} and nodetool exits. nodes should go into hibernate when join_ring is false -- Key: CASSANDRA-6961 URL: https://issues.apache.org/jira/browse/CASSANDRA-6961 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 2.0.7 Attachments: 6961.txt The impetus here is this: a node that was down for some period and comes back can serve stale information. We know from CASSANDRA-768 that we can't just wait for hints, and know that tangentially related CASSANDRA-3569 prevents us from having the node in a down (from the FD's POV) state handle streaming. We can *almost* set join_ring to false, then repair, and then join the ring to narrow the window (actually, you can do this and everything succeeds because the node doesn't know it's a member yet, which is probably a bit of a bug.) If instead we modified this to put the node in hibernate, like replace_address does, it could work almost like replace, except you could run a repair (manually) while in the hibernate state, and then flip to normal when it's done. This won't prevent the staleness 100%, but it will greatly reduce the chance if the node has been down a significant amount of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-4050) Rewrite RandomAccessReader to use FileChannel / nio to address Windows file access violations
[ https://issues.apache.org/jira/browse/CASSANDRA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-4050: --- Description: On Windows w/older java I/O libraries the files are not opened with FILE_SHARE_DELETE. This causes problems as hard-links cannot be deleted while the original file is opened - our snapshots are a big problem in particular. The nio library and FileChannels open with FILE_SHARE_DELETE which should help remedy this problem. Original text: I'm using Cassandra 1.0.8, on Windows 7. When I take a snapshot of the database, I find that I am unable to delete the snapshot directory (i.e., dir named {datadir}\{keyspacename}\snapshots\{snapshottag}) while Cassandra is running: The action can't be completed because the folder or a file in it is open in another program. Close the folder or file and try again [in Windows Explorer]. If I terminate Cassandra, then I can delete the directory with no problem. I expect to be able to move or delete the snapshotted files while Cassandra is running, as this should not affect the runtime operation of Cassandra. was: I'm using Cassandra 1.0.8, on Windows 7. When I take a snapshot of the database, I find that I am unable to delete the snapshot directory (i.e., dir named {datadir}\{keyspacename}\snapshots\{snapshottag}) while Cassandra is running: The action can't be completed because the folder or a file in it is open in another program. Close the folder or file and try again [in Windows Explorer]. If I terminate Cassandra, then I can delete the directory with no problem. I expect to be able to move or delete the snapshotted files while Cassandra is running, as this should not affect the runtime operation of Cassandra. Summary: Rewrite RandomAccessReader to use FileChannel / nio to address Windows file access violations (was: Unable to remove snapshot files on Windows while original sstables are live) Rewrite RandomAccessReader to use FileChannel / nio to address Windows file access violations - Key: CASSANDRA-4050 URL: https://issues.apache.org/jira/browse/CASSANDRA-4050 Project: Cassandra Issue Type: Bug Environment: Windows 7 Reporter: Jim Newsham Assignee: Joshua McKenzie Priority: Minor Attachments: CASSANDRA-4050_v1.patch On Windows w/older java I/O libraries the files are not opened with FILE_SHARE_DELETE. This causes problems as hard-links cannot be deleted while the original file is opened - our snapshots are a big problem in particular. The nio library and FileChannels open with FILE_SHARE_DELETE which should help remedy this problem. Original text: I'm using Cassandra 1.0.8, on Windows 7. When I take a snapshot of the database, I find that I am unable to delete the snapshot directory (i.e., dir named {datadir}\{keyspacename}\snapshots\{snapshottag}) while Cassandra is running: The action can't be completed because the folder or a file in it is open in another program. Close the folder or file and try again [in Windows Explorer]. If I terminate Cassandra, then I can delete the directory with no problem. I expect to be able to move or delete the snapshotted files while Cassandra is running, as this should not affect the runtime operation of Cassandra. -- This message was sent by Atlassian JIRA (v6.2#6252)
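The ticket's premise is that java.nio FileChannels open files with FILE_SHARE_DELETE on Windows, so hard links (such as snapshots) remain deletable while a reader is open. A minimal sketch of the reader style this rewrite moves toward; the path argument and buffer size are illustrative, not from the patch:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NioReaderSketch
{
    public static void main(String[] args) throws IOException
    {
        // FileChannel.open (java.nio) requests FILE_SHARE_DELETE on Windows,
        // so a hard link to this file (e.g. a snapshot) can be deleted while
        // the channel is open; an old-style RandomAccessFile handle blocks that.
        try (FileChannel channel = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ))
        {
            ByteBuffer buffer = ByteBuffer.allocate(4096);
            channel.read(buffer, 0); // positional read; no shared file-pointer state
        }
    }
}
{code}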
[jira] [Commented] (CASSANDRA-6863) Incorrect read repair of range tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960467#comment-13960467 ] Jonathan Ellis commented on CASSANDRA-6863: --- The approach is sound, but I'm worried about the upgrade scenario. Consensus on irc was that we should make this a 2.1 feature, and enable it when we detect the entire cluster is on 2.1. v2 attached Incorrect read repair of range tombstones -- Key: CASSANDRA-6863 URL: https://issues.apache.org/jira/browse/CASSANDRA-6863 Project: Cassandra Issue Type: Bug Environment: 2.0 Reporter: Oleg Anastasyev Attachments: 6863-v2.txt, 6863-v2.txt, ReadRepairRangeThombstoneDiff.txt, ReadRepairsDebugLogger.txt Rows with range tombstones are read repaired for every replica if RR is triggered (this is because CF.diff() returns non-null if !isEmpty(), which in turn returns false if the range tombstone list is not empty). Also, the full range tombstone list is sent to all nodes, which could be a problem if you have a wide partition. Fixed this by evaluating the diff on range tombstone lists as well as on the deletionInfo of the endpoint CF versions. Also return null from CF.diff if there is no diff in the RTL. A second patch (ReadRepairsDebugLogger.txt) adds some debug logging to look at read repairs. You may find it useful as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-4050) Rewrite RandomAccessReader to use FileChannel / nio to address Windows file access violations
[ https://issues.apache.org/jira/browse/CASSANDRA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960469#comment-13960469 ] Joshua McKenzie commented on CASSANDRA-4050: I'll rebase your branch against trunk and post a revised patch early next week. I know how much you love rebasing and I figure I owe you one for the house-cleaning on this patch. ;) Rewrite RandomAccessReader to use FileChannel / nio to address Windows file access violations - Key: CASSANDRA-4050 URL: https://issues.apache.org/jira/browse/CASSANDRA-4050 Project: Cassandra Issue Type: Bug Environment: Windows 7 Reporter: Jim Newsham Assignee: Joshua McKenzie Priority: Minor Attachments: CASSANDRA-4050_v1.patch On Windows w/older java I/O libraries the files are not opened with FILE_SHARE_DELETE. This causes problems as hard-links cannot be deleted while the original file is opened - our snapshots are a big problem in particular. The nio library and FileChannels open with FILE_SHARE_DELETE which should help remedy this problem. Original text: I'm using Cassandra 1.0.8, on Windows 7. When I take a snapshot of the database, I find that I am unable to delete the snapshot directory (i.e., dir named {datadir}\{keyspacename}\snapshots\{snapshottag}) while Cassandra is running: The action can't be completed because the folder or a file in it is open in another program. Close the folder or file and try again [in Windows Explorer]. If I terminate Cassandra, then I can delete the directory with no problem. I expect to be able to move or delete the snapshotted files while Cassandra is running, as this should not affect the runtime operation of Cassandra. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6283) Windows 7 data files kept open / can't be deleted after compaction.
[ https://issues.apache.org/jira/browse/CASSANDRA-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960479#comment-13960479 ] Joshua McKenzie commented on CASSANDRA-6283: On CASSANDRA-4050 we're converting our RandomAccessReader to use nio, which should fix the "can't delete a hard-link while the original file is open" problem for most use-cases. Unfortunately you cannot delete hard-linked files on Windows if you have a memory-mapped segment in the original file - I've done some benchmarking on CASSANDRA-6890 regarding removing memory-mapped I/O, and the performance cost / feature loss is high enough that we're going to keep it for now. I'll put together a patch for this ticket to create something similar to an SSTableDeletingTask for a snapshot folder - walk the files and try to delete them, re-scheduling a job to try and clear this folder again after a GC if there are any failures due to access violations. That combined with CASSANDRA-4050 should give us immediate and full clearing on compressed cfs, and partial / incrementally improving clearing on snapshots where there are memory-mapped readers into the original sstables. I don't like having partially cleared out snapshots floating around on the file-system though. I'd guess this will cause some confusion for people in the future. Windows 7 data files kept open / can't be deleted after compaction. Key: CASSANDRA-6283 URL: https://issues.apache.org/jira/browse/CASSANDRA-6283 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows 7 (32) / Java 1.7.0.45 Reporter: Andreas Schnitzerling Assignee: Joshua McKenzie Labels: compaction Fix For: 2.0.7 Attachments: 6283_StreamWriter_patch.txt, leakdetect.patch, neighbor-log.zip, root-log.zip, screenshot-1.jpg, system.log Files cannot be deleted; the patch from CASSANDRA-5383 (Win7 deleting problem) doesn't help on Windows 7 on Cassandra 2.0.2. Even the 2.1 snapshot is not working. The cause: opened file handles seem to be lost and not closed properly. Windows 7 reports that another process is still using the file (but it's obviously Cassandra). Only a restart of the server lets the files be deleted. But after heavy use (changes) of tables, there are about 24K files in the data folder (instead of 35 after every restart) and Cassandra crashes. I experimented and found that a finalizer fixes the problem. So after GC the files will be deleted (not optimal, but working fine). It has now run 2 days continuously without problems. Possible fix/test: I wrote the following finalizer at the end of class org.apache.cassandra.io.util.RandomAccessReader:
{code:title=RandomAccessReader.java|borderStyle=solid}
@Override
protected void finalize() throws Throwable
{
    deallocate();
    super.finalize();
}
{code}
Can somebody test / develop / patch it? Thx. -- This message was sent by Atlassian JIRA (v6.2#6252)
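Joshua's proposed patch amounts to a best-effort delete task that reschedules itself when handles are still pinned. A hedged sketch of that shape; the class name, executor, and retry delay are all illustrative, not from the actual patch:
{code}
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SnapshotDeletingTask implements Runnable
{
    private static final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    private final File snapshotDir;

    public SnapshotDeletingTask(File snapshotDir)
    {
        this.snapshotDir = snapshotDir;
    }

    @Override
    public void run()
    {
        boolean allDeleted = true;
        File[] files = snapshotDir.listFiles();
        if (files != null)
            for (File f : files)
                allDeleted &= f.delete(); // fails on Windows while a mapped reader pins the file

        if (allDeleted)
            snapshotDir.delete();
        else
            executor.schedule(this, 30, TimeUnit.SECONDS); // retry later, e.g. after a GC
    }
}
{code}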
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960546#comment-13960546 ] Benedict commented on CASSANDRA-6934: - I decided to put in some of the extra optimisations now after all - whenever the clustering/ordering components of a type all support unsigned comparison, we now avoid almost all virtual method calls. This gives a 10-20% bump in some very quick and dirty tests on my box versus the prior optimisation, and probably has a larger effect on clustering columns (which still can't easily be benchmarked, but I will fix that next week). Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 AbstractCompositeType is called a lot, so it deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but we should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)
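For context: a type "supports unsigned comparison" when ordering its serialized form as unsigned bytes agrees with the type's own comparator, which lets a whole composite be compared in one tight loop with no per-component virtual dispatch. A minimal standalone sketch of that core loop (the real code operates on ByteBuffers and is considerably more involved):
{code:title=UnsignedCompareSketch.java|borderStyle=solid}
public final class UnsignedCompareSketch
{
    // Lexicographic comparison treating each byte as unsigned (0..255).
    public static int compareUnsigned(byte[] a, byte[] b)
    {
        int minLength = Math.min(a.length, b.length);
        for (int i = 0; i < minLength; i++)
        {
            // Mask to compare as unsigned rather than signed -128..127.
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return a.length - b.length; // shorter prefix sorts first
    }
}
{code}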
[jira] [Created] (CASSANDRA-6984) NullPointerException in Streaming During Repair
Tyler Hobbs created CASSANDRA-6984: -- Summary: NullPointerException in Streaming During Repair Key: CASSANDRA-6984 URL: https://issues.apache.org/jira/browse/CASSANDRA-6984 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Yuki Morishita In cassandra-2.0, I can trigger a NullPointerException with a repair. These steps should reproduce the issue: * create a three node ccm cluster (with vnodes) * start a stress write (I'm using {{tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1}}) * stop node3 while stress is running, then wait a minute * start node 3 * run ccm node3 repair In the logs for node1, I see this: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 17:40:08,547 CassandraDaemon.java (line 198) Exception in thread Thread[STREAM-OUT-/127.0.0.3,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375) at java.lang.Thread.run(Thread.java:724) {noformat} After applying Yuki's suggested patch: {noformat} diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java index 356138b..b06a818 100644 --- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java +++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java @@ -366,7 +366,7 @@ public class ConnectionHandler { throw new AssertionError(e); } -catch (IOException e) +catch (Throwable e) { session.onError(e); } {noformat} I see a new NPE: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 18:12:35,912 StreamSession.java (line 420) [Stream #9b592af0-bc4e-11e3-a6f9-43eb3a328df9] Streaming error occurred java.lang.NullPointerException at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:465) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:60) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:355) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
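The repro steps above, condensed into a shell session for convenience (assuming ccm is on the PATH and a cassandra-2.0 checkout; exact option spellings may vary across ccm versions):
{noformat}
ccm create repro -v git:cassandra-2.0 -n 3 --vnodes -s    # three-node cluster with vnodes
tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1 &
ccm node3 stop && sleep 60                                # stop node3 mid-stress, wait a minute
ccm node3 start
ccm node3 repair                                          # node1's log should show the NPE
{noformat}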
[jira] [Updated] (CASSANDRA-6984) NullPointerException in Streaming During Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-6984: Fix Version/s: 2.0.7 NullPointerException in Streaming During Repair --- Key: CASSANDRA-6984 URL: https://issues.apache.org/jira/browse/CASSANDRA-6984 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Yuki Morishita Fix For: 2.0.7 In cassandra-2.0, I can trigger a NullPointerException with a repair. These steps should reproduce the issue: * create a three node ccm cluster (with vnodes) * start a stress write (I'm using {{tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1}}) * stop node3 while stress is running, then wait a minute * start node 3 * run ccm node3 repair In the logs for node1, I see this: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 17:40:08,547 CassandraDaemon.java (line 198) Exception in thread Thread[STREAM-OUT-/127.0.0.3,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375) at java.lang.Thread.run(Thread.java:724) {noformat} After applying Yuki's suggested patch: {noformat} diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java index 356138b..b06a818 100644 --- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java +++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java @@ -366,7 +366,7 @@ public class ConnectionHandler { throw new AssertionError(e); } -catch (IOException e) +catch (Throwable e) { session.onError(e); } {noformat} I see a new NPE: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 18:12:35,912 StreamSession.java (line 420) [Stream #9b592af0-bc4e-11e3-a6f9-43eb3a328df9] Streaming error occurred java.lang.NullPointerException at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:465) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:60) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:355) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6984) NullPointerException in Streaming During Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-6984: Priority: Blocker (was: Major) NullPointerException in Streaming During Repair --- Key: CASSANDRA-6984 URL: https://issues.apache.org/jira/browse/CASSANDRA-6984 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Yuki Morishita Priority: Blocker Fix For: 2.0.7 In cassandra-2.0, I can trigger a NullPointerException with a repair. These steps should reproduce the issue: * create a three node ccm cluster (with vnodes) * start a stress write (I'm using {{tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1}}) * stop node3 while stress is running, then wait a minute * start node 3 * run ccm node3 repair In the logs for node1, I see this: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 17:40:08,547 CassandraDaemon.java (line 198) Exception in thread Thread[STREAM-OUT-/127.0.0.3,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375) at java.lang.Thread.run(Thread.java:724) {noformat} After applying Yuki's suggested patch: {noformat} diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java index 356138b..b06a818 100644 --- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java +++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java @@ -366,7 +366,7 @@ public class ConnectionHandler { throw new AssertionError(e); } -catch (IOException e) +catch (Throwable e) { session.onError(e); } {noformat} I see a new NPE: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 18:12:35,912 StreamSession.java (line 420) [Stream #9b592af0-bc4e-11e3-a6f9-43eb3a328df9] Streaming error occurred java.lang.NullPointerException at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:465) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:60) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:355) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6961) nodes should go into hibernate when join_ring is false
[ https://issues.apache.org/jira/browse/CASSANDRA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960799#comment-13960799 ] Tyler Hobbs commented on CASSANDRA-6961: CASSANDRA-6984 was the cause of the hung repair. nodes should go into hibernate when join_ring is false -- Key: CASSANDRA-6961 URL: https://issues.apache.org/jira/browse/CASSANDRA-6961 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 2.0.7 Attachments: 6961.txt The impetus here is this: a node that was down for some period and comes back can serve stale information. We know from CASSANDRA-768 that we can't just wait for hints, and know that tangentially related CASSANDRA-3569 prevents us from having the node in a down (from the FD's POV) state handle streaming. We can *almost* set join_ring to false, then repair, and then join the ring to narrow the window (actually, you can do this and everything succeeds because the node doesn't know it's a member yet, which is probably a bit of a bug.) If instead we modified this to put the node in hibernate, like replace_address does, it could work almost like replace, except you could run a repair (manually) while in the hibernate state, and then flip to normal when it's done. This won't prevent the staleness 100%, but it will greatly reduce the chance if the node has been down a significant amount of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
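The proposed operator flow, sketched as a shell session - the join_ring flag and nodetool join already exist; the hibernate state is what this ticket would add:
{noformat}
bin/cassandra -Dcassandra.join_ring=false   # start without joining (would hibernate, per this ticket)
bin/nodetool repair                         # repair manually while not yet a normal member
bin/nodetool join                           # flip to normal once the repair completes
{noformat}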
[jira] [Commented] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960834#comment-13960834 ] Russ Hatch commented on CASSANDRA-6971: --- Here's the gossipinfo output. This is several minutes after the test failed. Schema uuid's still mismatch. {noformat} rhatch@whatup:/tmp/dtest-VZ3n7v/test/node1$ bin/nodetool -p 7100 gossipinfo /127.0.0.2 DC:datacenter1 HOST_ID:8929d0a0-5a4f-4a6f-85c3-665bb3aaf140 SCHEMA:19969d6d-daaa-328f-ade0-8640043e37b9 NET_VERSION:6 RPC_ADDRESS:127.0.0.2 RACK:rack1 LOAD:52150.0 STATUS:NORMAL,-3074457345618258603 RELEASE_VERSION:1.2.0-SNAPSHOT /127.0.0.3 DC:datacenter1 SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f HOST_ID:864a00d4-f661-421e-81c3-90e7b8f90ef1 NET_VERSION:6 RPC_ADDRESS:127.0.0.3 RACK:rack1 LOAD:14361.0 STATUS:NORMAL,3074457345618258602 RELEASE_VERSION:1.2.0-SNAPSHOT /127.0.0.1 DC:datacenter1 HOST_ID:06bbda3d-5265-4134-8248-11e0a2ddf798 RPC_ADDRESS:127.0.0.1 NET_VERSION:6 SCHEMA:19969d6d-daaa-328f-ade0-8640043e37b9 RACK:rack1 LOAD:52153.0 STATUS:NORMAL,-9223372036854775808 RELEASE_VERSION:1.2.0-SNAPSHOT rhatch@whatup:/tmp/dtest-VZ3n7v/test/node1$ bin/nodetool -p 7200 gossipinfo /127.0.0.2 HOST_ID:8929d0a0-5a4f-4a6f-85c3-665bb3aaf140 SCHEMA:19969d6d-daaa-328f-ade0-8640043e37b9 NET_VERSION:6 LOAD:52150.0 RPC_ADDRESS:127.0.0.2 RACK:rack1 DC:datacenter1 RELEASE_VERSION:1.2.0-SNAPSHOT STATUS:NORMAL,-3074457345618258603 /127.0.0.3 HOST_ID:864a00d4-f661-421e-81c3-90e7b8f90ef1 SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f NET_VERSION:6 LOAD:14361.0 RPC_ADDRESS:127.0.0.3 RACK:rack1 DC:datacenter1 RELEASE_VERSION:1.2.0-SNAPSHOT STATUS:NORMAL,3074457345618258602 /127.0.0.1 HOST_ID:06bbda3d-5265-4134-8248-11e0a2ddf798 SCHEMA:19969d6d-daaa-328f-ade0-8640043e37b9 NET_VERSION:6 LOAD:52153.0 RPC_ADDRESS:127.0.0.1 RACK:rack1 DC:datacenter1 RELEASE_VERSION:1.2.0-SNAPSHOT STATUS:NORMAL,-9223372036854775808 rhatch@whatup:/tmp/dtest-VZ3n7v/test/node1$ bin/nodetool -p 7300 gossipinfo /127.0.0.2 NET_VERSION:6 RELEASE_VERSION:1.2.0-SNAPSHOT DC:datacenter1 SCHEMA:19969d6d-daaa-328f-ade0-8640043e37b9 HOST_ID:8929d0a0-5a4f-4a6f-85c3-665bb3aaf140 LOAD:52150.0 STATUS:NORMAL,-3074457345618258603 RACK:rack1 RPC_ADDRESS:127.0.0.2 /127.0.0.3 NET_VERSION:6 RELEASE_VERSION:1.2.0-SNAPSHOT DC:datacenter1 SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f HOST_ID:864a00d4-f661-421e-81c3-90e7b8f90ef1 LOAD:14361.0 STATUS:NORMAL,3074457345618258602 RACK:rack1 RPC_ADDRESS:127.0.0.3 /127.0.0.1 NET_VERSION:6 RELEASE_VERSION:1.2.0-SNAPSHOT DC:datacenter1 SCHEMA:19969d6d-daaa-328f-ade0-8640043e37b9 HOST_ID:06bbda3d-5265-4134-8248-11e0a2ddf798 LOAD:52153.0 STATUS:NORMAL,-9223372036854775808 RACK:rack1 RPC_ADDRESS:127.0.0.1 {noformat} nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: node1.log, node2.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. 
The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960844#comment-13960844 ] Brandon Williams commented on CASSANDRA-6971: - Ok, so we know the problem is in the 'passive' detection. That narrows it down, thanks. nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-6971: -- Attachment: node3.log node2.log node1.log attaching logs with debug output. nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-6971: Attachment: 6971-debugging.txt Can you get debug logs with this extra debugging patch applied? nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: 6971-debugging.txt, node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960864#comment-13960864 ] Pavel Yaskevich commented on CASSANDRA-6694: Sorry guys, I've been busy with multiple things this week; I will try to take a look at this over the weekend. Slightly More Off-Heap Memtables Key: CASSANDRA-6694 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 beta2 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This translates to 8-byte object overhead, a 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte per cell overhead), and a 4-byte object reference for maintaining our internal list of allocations, which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting. The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
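A toy illustration of the "4-byte address via alignment tricks" idea; the 16-byte alignment and everything else below are assumptions for exposition, not the ticket's implementation:
{code:title=CompressedAddressSketch.java|borderStyle=solid}
public final class CompressedAddressSketch
{
    // If every allocation is 16-byte aligned, the low 4 bits of an address are
    // always zero, so a 32-bit reference can span 2^32 * 16 = 64GB of memory.
    private static final int ALIGNMENT_SHIFT = 4;

    static int encode(long address)
    {
        assert (address & ((1L << ALIGNMENT_SHIFT) - 1)) == 0 : "unaligned address";
        return (int) (address >>> ALIGNMENT_SHIFT);
    }

    static long decode(int reference)
    {
        // Widen without sign extension, then restore the dropped low bits.
        return (reference & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }
}
{code}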
[jira] [Commented] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960884#comment-13960884 ] Russ Hatch commented on CASSANDRA-6971: --- weird, with the logging patch I didn't see any output from MigrationManager. I did see this exception though (maybe it was occurring before and I just didn't notice): {noformat} DEBUG [Thrift:2] 2014-04-04 19:03:59,051 CustomTThreadPoolServer.java (line 209) Thrift transport error occurred during processing of message. org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: 6971-debugging.txt, node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6984) NullPointerException in Streaming During Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960891#comment-13960891 ] Jack Krupansky commented on CASSANDRA-6984: --- Is there any suggested workaround before a patch becomes generally available? Such as some other repair or rebuild sequence or parameters? Here's a SO user who appears to be hitting this with DataStax Enterprise, which uses C* 2.0. http://stackoverflow.com/questions/22837895/restarting-a-failed-stalled-stream-during-bootstrap-of-new-node NullPointerException in Streaming During Repair --- Key: CASSANDRA-6984 URL: https://issues.apache.org/jira/browse/CASSANDRA-6984 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Yuki Morishita Priority: Blocker Fix For: 2.0.7 In cassandra-2.0, I can trigger a NullPointerException with a repair. These steps should reproduce the issue: * create a three node ccm cluster (with vnodes) * start a stress write (I'm using {{tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1}}) * stop node3 while stress is running, then wait a minute * start node 3 * run ccm node3 repair In the logs for node1, I see this: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 17:40:08,547 CassandraDaemon.java (line 198) Exception in thread Thread[STREAM-OUT-/127.0.0.3,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375) at java.lang.Thread.run(Thread.java:724) {noformat} After applying Yuki's suggested patch: {noformat} diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java index 356138b..b06a818 100644 --- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java +++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java @@ -366,7 +366,7 @@ public class ConnectionHandler { throw new AssertionError(e); } -catch (IOException e) +catch (Throwable e) { session.onError(e); } {noformat} I see a new NPE: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 18:12:35,912 StreamSession.java (line 420) [Stream #9b592af0-bc4e-11e3-a6f9-43eb3a328df9] Streaming error occurred java.lang.NullPointerException at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:465) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:60) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:355) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960892#comment-13960892 ] Brandon Williams commented on CASSANDRA-6971: - That's unrelated and it's at debug for a reason, it just means a client dropped the connection on us. nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: 6971-debugging.txt, node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6984) NullPointerException in Streaming During Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960893#comment-13960893 ] Brandon Williams commented on CASSANDRA-6984: - Not really, it's a streaming problem. NullPointerException in Streaming During Repair --- Key: CASSANDRA-6984 URL: https://issues.apache.org/jira/browse/CASSANDRA-6984 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Yuki Morishita Priority: Blocker Fix For: 2.0.7 In cassandra-2.0, I can trigger a NullPointerException with a repair. These steps should reproduce the issue: * create a three node ccm cluster (with vnodes) * start a stress write (I'm using {{tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1}}) * stop node3 while stress is running, then wait a minute * start node 3 * run ccm node3 repair In the logs for node1, I see this: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 17:40:08,547 CassandraDaemon.java (line 198) Exception in thread Thread[STREAM-OUT-/127.0.0.3,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375) at java.lang.Thread.run(Thread.java:724) {noformat} After applying Yuki's suggested patch: {noformat} diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java index 356138b..b06a818 100644 --- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java +++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java @@ -366,7 +366,7 @@ public class ConnectionHandler { throw new AssertionError(e); } -catch (IOException e) +catch (Throwable e) { session.onError(e); } {noformat} I see a new NPE: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 18:12:35,912 StreamSession.java (line 420) [Stream #9b592af0-bc4e-11e3-a6f9-43eb3a328df9] Streaming error occurred java.lang.NullPointerException at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:465) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:60) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:355) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6984) NullPointerException in Streaming During Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960899#comment-13960899 ] Yuki Morishita commented on CASSANDRA-6984: --- From the stacktrace above, this is caused by CASSANDRA-6818, which has not been released yet. I cannot tell what the user on SO is hitting, since exceptions other than IOException are hidden by the NPE in ConnectionHandler. NullPointerException in Streaming During Repair --- Key: CASSANDRA-6984 URL: https://issues.apache.org/jira/browse/CASSANDRA-6984 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Yuki Morishita Priority: Blocker Fix For: 2.0.7 In cassandra-2.0, I can trigger a NullPointerException with a repair. These steps should reproduce the issue: * create a three node ccm cluster (with vnodes) * start a stress write (I'm using {{tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1}}) * stop node3 while stress is running, then wait a minute * start node 3 * run ccm node3 repair In the logs for node1, I see this: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 17:40:08,547 CassandraDaemon.java (line 198) Exception in thread Thread[STREAM-OUT-/127.0.0.3,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375) at java.lang.Thread.run(Thread.java:724) {noformat} After applying Yuki's suggested patch: {noformat} diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java index 356138b..b06a818 100644 --- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java +++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java @@ -366,7 +366,7 @@ public class ConnectionHandler { throw new AssertionError(e); } -catch (IOException e) +catch (Throwable e) { session.onError(e); } {noformat} I see a new NPE: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 18:12:35,912 StreamSession.java (line 420) [Stream #9b592af0-bc4e-11e3-a6f9-43eb3a328df9] Streaming error occurred java.lang.NullPointerException at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:465) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:60) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:355) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-6971: -- Attachment: debug3.log debug2.log debug1.log attaching logs with debug patch (from a test run when the problem happened of course). nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: 6971-debugging.txt, debug1.log, debug2.log, debug3.log, node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-6971: Attachment: 6971.txt My guess is that passiveAnnounce hasn't yet been called when the onAlive/onRestart events fire close together, so we need to also check onChange. Patch to do so, plus the debugging. nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Attachments: 6971-debugging.txt, 6971.txt, debug1.log, debug2.log, debug3.log, node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
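A fragment sketching the shape of that guess - this is an assumption about what 6971.txt does, not the patch itself, and requestSchemaPull is a placeholder name:
{code}
public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
{
    if (state != ApplicationState.SCHEMA)
        return;
    // The passive announce may have raced onAlive/onRestart, so compare
    // schema versions here too and pull if the remote one differs.
    if (!Schema.instance.getVersion().toString().equals(value.value))
        requestSchemaPull(endpoint); // placeholder for submitting a migration task
}
{code}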
[jira] [Assigned] (CASSANDRA-6971) nodes not catching up to creation of new keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams reassigned CASSANDRA-6971: --- Assignee: Brandon Williams nodes not catching up to creation of new keyspace - Key: CASSANDRA-6971 URL: https://issues.apache.org/jira/browse/CASSANDRA-6971 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Assignee: Brandon Williams Attachments: 6971-debugging.txt, 6971.txt, debug1.log, debug2.log, debug3.log, node1.log, node1.log, node2.log, node2.log, node3.log, node3.log The dtest suite is running a test which creates a 3 node cluster, then adds a keyspace and column family. For some reason the 3 nodes are not agreeing on the schema version. The problem is intermittent -- either the nodes all agree on schema quickly, or they seem to stay stuck in limbo. The simplest way to reproduce is to run the dtest (simple_increment_test): https://github.com/riptano/cassandra-dtest/blob/master/counter_tests.py using nosetests: {noformat} nosetests -vs counter_tests.py:TestCounters.simple_increment_test {noformat} If the problem is reproduced nose will return this: ProgrammingError: Bad Request: Keyspace 'ks' does not exist I am not yet sure if the bug is reproducible outside of the dtest suite. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6984) NullPointerException in Streaming During Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-6984: -- Attachment: 6984-2.0.txt Turning on DEBUG for o.a.c.streaming shows that the node receives the ACK and removes the task on completion before executing fileSent. I think simply adding a null check will solve the problem. NullPointerException in Streaming During Repair --- Key: CASSANDRA-6984 URL: https://issues.apache.org/jira/browse/CASSANDRA-6984 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Yuki Morishita Priority: Blocker Fix For: 2.0.7 Attachments: 6984-2.0.txt In cassandra-2.0, I can trigger a NullPointerException with a repair. These steps should reproduce the issue: * create a three node ccm cluster (with vnodes) * start a stress write (I'm using {{tools/bin/cassandra-stress --replication-factor=3 -n 1000 -k -t 1}}) * stop node3 while stress is running, then wait a minute * start node 3 * run ccm node3 repair In the logs for node1, I see this: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 17:40:08,547 CassandraDaemon.java (line 198) Exception in thread Thread[STREAM-OUT-/127.0.0.3,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375) at java.lang.Thread.run(Thread.java:724) {noformat} After applying Yuki's suggested patch: {noformat} diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java index 356138b..b06a818 100644 --- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java +++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java @@ -366,7 +366,7 @@ public class ConnectionHandler { throw new AssertionError(e); } -catch (IOException e) +catch (Throwable e) { session.onError(e); } {noformat} I see a new NPE: {noformat} ERROR [STREAM-OUT-/127.0.0.3] 2014-04-04 18:12:35,912 StreamSession.java (line 420) [Stream #9b592af0-bc4e-11e3-a6f9-43eb3a328df9] Streaming error occurred java.lang.NullPointerException at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:465) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:60) at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383) at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:355) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
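A fragment sketching the suggested null check; the names follow the 2.0 streaming code loosely and may not match 6984-2.0.txt exactly:
{code}
// In StreamSession.fileSent: the receiver's ACK can complete the transfer and
// remove the task concurrently, so the lookup may return null.
StreamTransferTask task = transfers.get(header.cfId);
if (task != null) // already completed and removed by the ACK path
    task.scheduleTimeout(header.sequenceNumber, 12, TimeUnit.HOURS);
{code}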