[jira] [Commented] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned
[ https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700737#comment-14700737 ] Joel Knighton commented on CASSANDRA-10068: --- [~krummas] Instructions on setting up the environment are available at https://github.com/riptano/jepsen/tree/cassandra/cassandra. Specifically, the test under consideration can be run as {code} lein with-profile +trunk test :only cassandra.mv-test/mv-crash-subset-decommission {code} That said, I understand the environment setup is a bit laborious, and I'm still working on reproducing this with the provided dtest. > Batchlog replay fails with exception after a node is decommissioned > --- > > Key: CASSANDRA-10068 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10068 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Marcus Eriksson > Fix For: 3.0 beta 2 > > Attachments: n1.log, n2.log, n3.log, n4.log, n5.log > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. > At the conclusion of the test, a batchlog replay is initiated through > nodetool and hits the following assertion due to a missing host ID: > https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 > A nodetool status on the node with failed batchlog replay shows the following > entry for the decommissioned node: > DN 10.0.0.5 ? 256 ? null > rack1 > On the unaffected nodes, there is no entry for the decommissioned node as > expected. > There are occasional hits of the same assertions for logs in other nodes; it > looks like the issue might occasionally resolve itself, but one node seems to > have the errant null entry indefinitely. 
> In logs for the nodes, this possibly unrelated exception also appears: > java.lang.RuntimeException: Trying to get the view natural endpoint on a > non-data replica > at > org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) > ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] > I have a running cluster with the issue on my machine; it is also repeatable. > Nothing stands out in the logs of the decommissioned node (n4) for me. The > logs of each node in the cluster are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9434) If a node loses schema_columns SSTables it could delete all secondary indexes from the schema
[ https://issues.apache.org/jira/browse/CASSANDRA-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700697#comment-14700697 ] Richard Low commented on CASSANDRA-9434: Thanks Aleksey. So it sounds like we should close this as behaves correctly? > If a node loses schema_columns SSTables it could delete all secondary indexes > from the schema > - > > Key: CASSANDRA-9434 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9434 > Project: Cassandra > Issue Type: Bug >Reporter: Richard Low >Assignee: Aleksey Yeschenko > Fix For: 2.0.x > > > It is possible that a single bad node can delete all secondary indexes if it > restarts and cannot read its schema_columns SSTables. Here's a reproduction: > * Create a 2 node cluster (we saw it on 2.0.11) > * Create the schema: > {code} > create keyspace myks with replication = {'class':'SimpleStrategy', > 'replication_factor':1}; > use myks; > create table mytable (a text, b text, c text, PRIMARY KEY (a, b) ); > create index myindex on mytable(b); > {code} > NB index must be on clustering column to repro > * Kill one node > * Wipe its commitlog and system/schema_columns sstables. > * Start it again > * Run on this node > select index_name from system.schema_columns where keyspace_name = 'myks' and > columnfamily_name = 'mytable' and column_name = 'b'; > and you'll see the index is null. > * Run 'describe schema' on the other node. Sometimes it will not show the > index, but you might need to bounce for it to disappear. > I think the culprit is SystemKeyspace.copyAllAliasesToColumnsProper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
[ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700684#comment-14700684 ] Stefania commented on CASSANDRA-8630: - Thanks for your analysis. I repeated the tests, 3 identical runs each time, albeit with a smaller data set. They still indicate that it is the uncompressed case where something has gone wrong, not the compressed case; more specifically, I traced the slowness to mmap disk access. Here are the results. Because I am on a 64-bit machine, {{disk_access_mode=auto}} resolves to {{mmap}} (although I am not sure at which version this behavior started, so it may not be true for all versions). In the 'uncomp-std' test I forced the disk access mode to standard. ||Version||Run 1||Run 2||Run 3||Rounded AVG|| |8630 comp|17.91|18.31|17.94|18| |8630 uncomp|28.06|28.95|28.02|28| |8630 uncomp-std|19.31|18.09|18.9|19| |TRUNK comp|17.95|17.64|17.72|18| |TRUNK uncomp|20.81|20.01|18.81|20| |2.2 comp|19.95|20.33|19.97|20| |2.2 uncomp|19.14|19.18|20.1|19| |2.1 comp|21.61|20.43|20.43|21| |2.1 uncomp|20.4|19.67|19.71|20| |2.0 comp|18.8|19.42|19.66|19| |2.0 uncomp|19.48|19.55|19.68|20| Notes: * Reduced data to 1M entries, which corresponds to approximately 220 MB of data. This allowed me to keep the machine _more or less_ idle during the tests. * All tests done with Java 8 update 51 except for 2.0, which was done with Java 7 update 80. * Tests performed on a 64-bit Linux laptop with an SSD. * Compaction strategy was the default used by the stress tool: SizeTieredCompactionStrategy. Next I need to understand why mmap is so slow; I think I must have broken something when I moved the segments to the RAR. bq. I usually set the file read and write and contention thresholds to one millisecond. What parameters do you use to achieve this? 
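For reference, the 'uncomp-std' runs above were produced by overriding the access mode in cassandra.yaml. A minimal fragment; the accepted values listed in the comment are from memory and may vary by version:

```yaml
# cassandra.yaml -- force standard (buffered) reads instead of mmap.
# Accepted values (approximately, version-dependent):
#   auto | mmap | mmap_index_only | standard
disk_access_mode: standard
```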
> Faster sequential IO (on compaction, streaming, etc) > > > Key: CASSANDRA-8630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8630 > Project: Cassandra > Issue Type: Improvement > Components: Core, Tools >Reporter: Oleg Anastasyev >Assignee: Stefania > Labels: compaction, performance > Fix For: 3.x > > Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, > flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz > > > When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot > of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int). > This is because default implementations of readShort,readLong, etc as well as > their matching write* are implemented with numerous calls of byte by byte > read and write. > This makes a lot of syscalls as well. > A quick microbench shows than just reimplementation of these methods in > either way gives 8x speed increase. > A patch attached implements RandomAccessReader.read and > SequencialWriter.write methods in more efficient way. > I also eliminated some extra byte copies in CompositeType.split and > ColumnNameHelper.maxComponents, which were on my profiler's hotspot method > list during tests. > A stress tests on my laptop show that this patch makes compaction 25-30% > faster on uncompressed sstables and 15% faster for compressed ones. > A deployment to production shows much less CPU load for compaction. > (I attached a cpu load graph from one of our production, orange is niced CPU > load - i.e. compaction; yellow is user - i.e. not compaction related tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10086) Add a "CLEAR" cqlsh command to clear the console
[ https://issues.apache.org/jira/browse/CASSANDRA-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700652#comment-14700652 ] Paul O'Fallon commented on CASSANDRA-10086: --- Thanks! Yes, I can certainly reorder the changes to cqlsh. I'll also see what I can do regarding the tests. I'll update the issue later in the week. > Add a "CLEAR" cqlsh command to clear the console > > > Key: CASSANDRA-10086 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10086 > Project: Cassandra > Issue Type: Improvement >Reporter: Paul O'Fallon >Priority: Trivial > Labels: cqlsh, doc-impacting > Attachments: 10086.txt > > > It would be very helpful to have a "CLEAR" command to clear the cqlsh > console. I learned (after researching a patch for this) that lowercase > CTRL+L will clear the screen, but having a discrete command would make that > more obvious. To match the expectations of Windows users, an alias to "CLS" > would be nice as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10113) Undroppable messages can be dropped if message queue gets large
Yuki Morishita created CASSANDRA-10113: -- Summary: Undroppable messages can be dropped if message queue gets large Key: CASSANDRA-10113 URL: https://issues.apache.org/jira/browse/CASSANDRA-10113 Project: Cassandra Issue Type: Bug Reporter: Yuki Morishita Assignee: Yuki Morishita Priority: Minor When outgoing messages are queued, OutboundTcpConnection checks the size of the backlog, and [if it gets more than 1024, it drops expired messages silently from the backlog|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L150]. {{expireMessages()}} only checks the message's timeout, which can be {{request_timeout_in_ms}} (10 sec default) for non-read/write messages, and does not check whether the message is droppable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
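A rough sketch of the fix this ticket implies: when trimming the backlog, expire only messages whose verb is actually droppable. The names here ({{Verb}}, {{QueuedMessage}}, {{DROPPABLE_VERBS}}) are illustrative stand-ins, not the exact Cassandra internals:

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.EnumSet;

// Illustrative model of OutboundTcpConnection's backlog expiry, with the
// extra droppability check this ticket asks for.
public class BacklogExpiry
{
    enum Verb { MUTATION, READ, GOSSIP, STREAM_REPLY }

    // Only read/write-style verbs are safe to drop silently; the real set
    // in Cassandra is different and larger -- this is just a sketch.
    static final EnumSet<Verb> DROPPABLE_VERBS = EnumSet.of(Verb.MUTATION, Verb.READ);

    static class QueuedMessage
    {
        final Verb verb;
        final long enqueuedAtMillis;
        final long timeoutMillis;

        QueuedMessage(Verb verb, long enqueuedAtMillis, long timeoutMillis)
        {
            this.verb = verb;
            this.enqueuedAtMillis = enqueuedAtMillis;
            this.timeoutMillis = timeoutMillis;
        }

        boolean isTimedOut(long nowMillis)
        {
            return nowMillis - enqueuedAtMillis > timeoutMillis;
        }
    }

    // Remove expired messages from the backlog, but never undroppable ones.
    static int expireMessages(Queue<QueuedMessage> backlog, long nowMillis)
    {
        int dropped = 0;
        Iterator<QueuedMessage> it = backlog.iterator();
        while (it.hasNext())
        {
            QueuedMessage m = it.next();
            if (m.isTimedOut(nowMillis) && DROPPABLE_VERBS.contains(m.verb))
            {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }
}
```

With this check, an expired GOSSIP or STREAM_REPLY message stays queued even when the backlog is over the threshold, while an expired MUTATION is still dropped.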
[jira] [Commented] (CASSANDRA-10112) Apply disk_failure_policy to transaction logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700562#comment-14700562 ] Stefania commented on CASSANDRA-10112: -- If we stashed corrupt sstable files regardless of transactions, then we would also fulfill the requirements of CASSANDRA-9812. > Apply disk_failure_policy to transaction logs > - > > Key: CASSANDRA-10112 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10112 > Project: Cassandra > Issue Type: Improvement >Reporter: Stefania >Assignee: Stefania > > Transaction logs were introduced by CASSANDRA-7066 and are read during > start-up. In case of file system errors, such as disk corruption, we > currently log a panic error and leave the sstable files and transaction logs > as they are; this is to avoid rolling back a transaction (i.e. deleting > files) by mistake. > We should instead look at the {{disk_failure_policy}} and refuse to start > unless the failure policy is {{ignore}}. > We should also consider stashing files that cannot be read during startup, > either transaction logs or sstables, by moving them to a dedicated > sub-folder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10112) Apply disk_failure_policy to transaction logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700537#comment-14700537 ] Stefania commented on CASSANDRA-10112: -- I think we should stash files when the {{disk_failure_policy}} is {{ignore}}, or we could add a new policy for this. I'm not sure for the case when we refuse to start though. Perhaps in this case we should add this functionality to the offline sstable utility tool and let the operator either clean-up or stash. > Apply disk_failure_policy to transaction logs > - > > Key: CASSANDRA-10112 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10112 > Project: Cassandra > Issue Type: Improvement >Reporter: Stefania >Assignee: Stefania > > Transaction logs were introduced by CASSANDRA-7066 and are read during > start-up. In case of file system errors, such as disk corruption, we > currently log a panic error and leave the sstable files and transaction logs > as they are; this is to avoid rolling back a transaction (i.e. deleting > files) by mistake. > We should instead look at the {{disk_failure_policy}} and refuse to start > unless the failure policy is {{ignore}}. > We should also consider stashing files that cannot be read during startup, > either transaction logs or sstables, by moving them to a dedicated > sub-folder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700533#comment-14700533 ] Stefania commented on CASSANDRA-7066: - I agree on stashing files with {{ignore}}, not sure for the case when we refuse to start though. Perhaps in this case we should add this functionality to the offline sstable utility tool and let the operator either clean-up or stash. I've opened CASSANDRA-10112, let's move the discussion there. [~benedict] could you commit the fix for two COVERITY defects? Commit [here|https://github.com/stef1927/cassandra/commit/c94d8a12dc43cd4705f1aa8cf384fb8c3290a5f9]. {code} ** CID 1316515: FindBugs: Internationalization (FB.DM_DEFAULT_ENCODING) /src/java/org/apache/cassandra/db/lifecycle/TransactionLog.java: 363 in org.apache.cassandra.db.lifecycle.TransactionLog$TransactionFile.readRecord(java.lang.String, boolean)() *** CID 1316515: FindBugs: Internationalization (FB.DM_DEFAULT_ENCODING) /src/java/org/apache/cassandra/db/lifecycle/TransactionLog.java: 363 in org.apache.cassandra.db.lifecycle.TransactionLog$TransactionFile.readRecord(java.lang.String, boolean)() 357 if (!matcher.matches() || matcher.groupCount() != 2) 358 { 359 handleReadRecordError(String.format("cannot parse line \"%s\"", line), isLast); 360 return Record.make(line, isLast); 361 } 362 >>> CID 1316515: FindBugs: Internationalization (FB.DM_DEFAULT_ENCODING) >>> Found reliance on default encoding: String.getBytes(). 
363 byte[] bytes = matcher.group(1).getBytes(); 364 checksum.update(bytes, 0, bytes.length); 365 366 if (checksum.getValue() != Long.valueOf(matcher.group(2))) 367 handleReadRecordError(String.format("invalid line checksum %s for \"%s\"", matcher.group(2), line), isLast); 368 ** CID 1316514: FindBugs: Internationalization (FB.DM_DEFAULT_ENCODING) /src/java/org/apache/cassandra/db/lifecycle/TransactionLog.java: 231 in org.apache.cassandra.db.lifecycle.TransactionLog$Record.getBytes()() *** CID 1316514: FindBugs: Internationalization (FB.DM_DEFAULT_ENCODING) /src/java/org/apache/cassandra/db/lifecycle/TransactionLog.java: 231 in org.apache.cassandra.db.lifecycle.TransactionLog$Record.getBytes()() 225 { 226 return String.format("%s:[%s,%d,%d]", type.toString(), relativeFilePath, updateTime, numFiles); 227 } 228 229 public byte[] getBytes() 230 { >>> CID 1316514: FindBugs: Internationalization (FB.DM_DEFAULT_ENCODING) >>> Found reliance on default encoding: String.getBytes(). 231 return record.getBytes(); 232 } 233 234 public boolean verify(String parentFolder, boolean lastRecordIsCorrupt) 235 { 236 if (type != RecordType.REMOVE) {code} > Simplify (and unify) cleanup of compaction leftovers > > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Stefania >Priority: Minor > Labels: benedict-to-commit, compaction > Fix For: 3.0 alpha 1 > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. 
The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily cleanup completed files, or conversely not cleanup files > that have been superceded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
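A minimal illustration of the fix for the two Coverity DM_DEFAULT_ENCODING findings quoted above: pass an explicit charset rather than relying on the platform default, so checksums of the log lines are stable across machines. Whether the actual patch pins UTF-8 or another charset is an assumption here; the point is only that the charset must be explicit:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ExplicitEncoding
{
    // Before: line.getBytes() -- the result depends on the JVM's default
    // charset, so the same transaction log could checksum differently on
    // different platforms.
    // After: the charset is pinned explicitly.
    static long checksumOf(String line)
    {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        CRC32 checksum = new CRC32();
        checksum.update(bytes, 0, bytes.length);
        return checksum.getValue();
    }
}
```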
[jira] [Created] (CASSANDRA-10112) Apply disk_failure_policy to transaction logs
Stefania created CASSANDRA-10112: Summary: Apply disk_failure_policy to transaction logs Key: CASSANDRA-10112 URL: https://issues.apache.org/jira/browse/CASSANDRA-10112 Project: Cassandra Issue Type: Improvement Reporter: Stefania Assignee: Stefania Transaction logs were introduced by CASSANDRA-7066 and are read during start-up. In case of file system errors, such as disk corruption, we currently log a panic error and leave the sstable files and transaction logs as they are; this is to avoid rolling back a transaction (i.e. deleting files) by mistake. We should instead look at the {{disk_failure_policy}} and refuse to start unless the failure policy is {{ignore}}. We should also consider stashing files that cannot be read during startup, either transaction logs or sstables, by moving them to a dedicated sub-folder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
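A hedged sketch of the "stash" idea discussed above: instead of deleting files that fail verification at startup, move them into a dedicated sub-folder so an operator can inspect them later. The folder name, policy enum, and method shape are illustrative, not the final Cassandra API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CorruptFileStasher
{
    // Simplified stand-in for cassandra.yaml's disk_failure_policy values.
    enum DiskFailurePolicy { STOP, IGNORE }

    // Move an unreadable sstable or txn log file into an "unreadable"
    // sub-folder next to it; refuse to proceed under any other policy.
    static Path stash(Path corruptFile, DiskFailurePolicy policy) throws IOException
    {
        if (policy != DiskFailurePolicy.IGNORE)
            throw new IllegalStateException("refusing to start: corrupt file " + corruptFile);

        Path stashDir = corruptFile.getParent().resolve("unreadable");
        Files.createDirectories(stashDir);
        Path target = stashDir.resolve(corruptFile.getFileName());
        return Files.move(corruptFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Stashing rather than deleting preserves the "never roll back a transaction by mistake" property described above, while still getting the unreadable files out of the live data directory.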
[jira] [Commented] (CASSANDRA-10109) Windows dtest 3.0: ttl_test.py failures
[ https://issues.apache.org/jira/browse/CASSANDRA-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700480#comment-14700480 ] Stefania commented on CASSANDRA-10109: -- This is bad news. As far as I understand the documentation, this means that on Windows we cannot list files in a directory atomically, third paragraph [here|http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#newDirectoryStream(java.nio.file.Path)]. So we could list some sstable temporary files but not the txn log file, later they get deleted along with their txn log file by a racing thread, and if we fail to list the txn log file we classify these sstable files incorrectly as final files. However, these files shouldn't exist any longer since the txn log is deleted last, so this would result in NoSuchFileExceptions when trying to read the files. I think we should check that all final files exist before returning them and repeat the process in case some files no longer exist. This should only be done when we don't have atomic listing. [~benedict] do you think this would be enough or do you see other potential races? > Windows dtest 3.0: ttl_test.py failures > --- > > Key: CASSANDRA-10109 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10109 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joshua McKenzie > Labels: Windows > Fix For: 3.0.x > > > ttl_test.py:TestTTL.update_column_ttl_with_default_ttl_test2 > ttl_test.py:TestTTL.update_multiple_columns_ttl_test > ttl_test.py:TestTTL.update_single_column_ttl_test > Errors locally are different than CI from yesterday. Yesterday on CI we have > timeouts and general node hangs. 
Today on all 3 tests when run locally I see: > {noformat} > Traceback (most recent call last): > File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown > raise AssertionError('Unexpected error in %s node log: %s' % (node.name, > errors)) > AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 > 16:53:43,120 NoSpamLogger.java:97 - This platform does not support atomic > directory streams (SecureDirectoryStream); race conditions when loading > sstable files could occurr'] > {noformat} > This traces back to the commit for CASSANDRA-7066 today by [~Stefania] and > [~benedict]. Stefania - care to take this ticket and also look further into > whether or not we're going to have issues with 7066 on Windows? That error > message certainly *sounds* like it's not a good thing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
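The list-then-recheck workaround proposed in the comment above can be sketched as follows. This is purely illustrative (the real fix would live in the lifecycle/transaction-log code); it snapshots the directory, verifies every listed file still exists, and retries if a racing delete invalidated the snapshot:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch for platforms without SecureDirectoryStream, where directory
// listings are not atomic with respect to concurrent deletes.
public class RetryingLister
{
    static List<Path> listStable(Path dir, int maxAttempts) throws IOException
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            List<Path> snapshot = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir))
            {
                for (Path p : stream)
                    snapshot.add(p);
            }
            // If everything we listed is still present, the snapshot is
            // consistent enough to classify final vs temporary files;
            // otherwise a concurrent delete raced us -- list again.
            boolean allPresent = true;
            for (Path p : snapshot)
                if (!Files.exists(p))
                    allPresent = false;
            if (allPresent)
                return snapshot;
        }
        throw new IOException("directory " + dir + " kept changing while listing");
    }
}
```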
[jira] [Assigned] (CASSANDRA-10109) Windows dtest 3.0: ttl_test.py failures
[ https://issues.apache.org/jira/browse/CASSANDRA-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania reassigned CASSANDRA-10109: Assignee: Stefania > Windows dtest 3.0: ttl_test.py failures > --- > > Key: CASSANDRA-10109 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10109 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joshua McKenzie >Assignee: Stefania > Labels: Windows > Fix For: 3.0.x > > > ttl_test.py:TestTTL.update_column_ttl_with_default_ttl_test2 > ttl_test.py:TestTTL.update_multiple_columns_ttl_test > ttl_test.py:TestTTL.update_single_column_ttl_test > Errors locally are different than CI from yesterday. Yesterday on CI we have > timeouts and general node hangs. Today on all 3 tests when run locally I see: > {noformat} > Traceback (most recent call last): > File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown > raise AssertionError('Unexpected error in %s node log: %s' % (node.name, > errors)) > AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 > 16:53:43,120 NoSpamLogger.java:97 - This platform does not support atomic > directory streams (SecureDirectoryStream); race conditions when loading > sstable files could occurr'] > {noformat} > This traces back to the commit for CASSANDRA-7066 today by [~Stefania] and > [~benedict]. Stefania - care to take this ticket and also look further into > whether or not we're going to have issues with 7066 on Windows? That error > message certainly *sounds* like it's not a good thing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
[ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700433#comment-14700433 ] Ariel Weisberg edited comment on CASSANDRA-8630 at 8/17/15 11:43 PM: - The difference in performance seems too big to be explained by what is covered here. Maybe a native call like compression/decompression got slower due to additional copying? Flight recorder happily ignores native code. Flight recording of 8630 with compression, hot packages ||Package|Sample Count|Percentage(%)|| |org.apache.cassandra.db.rows|1,577|25.403| |org.apache.cassandra.utils|1,498|24.13| |org.apache.cassandra.utils.btree|670|10.793| |com.googlecode.concurrentlinkedhashmap|598|9.633| |java.util|585|9.423| |org.apache.cassandra.io.sstable|430|6.927| |org.apache.cassandra.db.partitions|183|2.948| |org.apache.cassandra.cache|162|2.61| |org.apache.cassandra.io.util|139|2.239| |org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$$Lambda$93|77|1.24| |org.apache.cassandra.db|74|1.192| Flight recording trunk, hot packages ||Package|Sample Count|Percentage(%)|| |org.apache.cassandra.utils|1,771|26.732| |org.apache.cassandra.db.rows|1,599|24.136| |com.googlecode.concurrentlinkedhashmap|631|9.525| |java.util|590|8.906| |org.apache.cassandra.utils.btree|565|8.528| |org.apache.cassandra.io.sstable|438|6.611| |org.apache.cassandra.io.util|330|4.981| |org.apache.cassandra.db.partitions|124|1.872| |org.apache.cassandra.cache|121|1.826| |org.apache.cassandra.io.sstable.format.big|105|1.585| |org.apache.cassandra.db|102|1.54| > Faster sequential IO (on compaction, streaming, etc) > > > Key: CASSANDRA-8630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8630 > Project: Cassandra > Issue Type: Improvement > Components: Core, Tools >Reporter: Oleg Anastasyev >Assignee: Stefania > Labels: compaction, performance > Fix For: 3.x > > Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, > flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz > > > When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot > of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int). > This is because default implementations of readShort,readLong, etc as well as > their matching write* are implemented with numerous calls of byte by byte > read and write. > This makes a lot of syscalls as well. > A quick microbench shows than just reimplementation of these methods in > either way gives 8x speed increase. > A patch attached implements RandomAccessReader.read and > SequencialWriter.write methods in more efficient way. 
> I also eliminated some extra byte copies in CompositeType.split and > ColumnNameHelper.maxComponents, which were on my profiler's hotspot method > list during tests. > A stress tests on my laptop show that this patch makes compaction 25-30% > faster on uncompressed sstables and 15% faster for compressed ones. > A deployment to production shows much less CPU load for compaction. > (I attached a cpu load graph from one of our production, orange is niced CPU > load - i.e. compaction; yellow is user - i.e. not compaction related tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
[ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700433#comment-14700433 ] Ariel Weisberg commented on CASSANDRA-8630: --- Flight recording of 8630 with compression, hot packages ||Package|Sample Count|Percentage(%)|| |org.apache.cassandra.db.rows|1,577|25.403| |org.apache.cassandra.utils|1,498|24.13| |org.apache.cassandra.utils.btree|670|10.793| |com.googlecode.concurrentlinkedhashmap|598|9.633| |java.util|585|9.423| |org.apache.cassandra.io.sstable|430|6.927| |org.apache.cassandra.db.partitions|183|2.948| |org.apache.cassandra.cache|162|2.61| |org.apache.cassandra.io.util|139|2.239| |org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$$Lambda$93|77|1.24| |org.apache.cassandra.db|74|1.192| Flight recording trunk, hot packages ||Package|Sample Count|Percentage(%)|| |org.apache.cassandra.utils|1,771|26.732| |org.apache.cassandra.db.rows|1,599|24.136| |com.googlecode.concurrentlinkedhashmap|631|9.525| |java.util|590|8.906| |org.apache.cassandra.utils.btree|565|8.528| |org.apache.cassandra.io.sstable|438|6.611| |org.apache.cassandra.io.util|330|4.981| |org.apache.cassandra.db.partitions|124|1.872| |org.apache.cassandra.cache|121|1.826| |org.apache.cassandra.io.sstable.format.big|105|1.585| |org.apache.cassandra.db|102|1.54| > Faster sequential IO (on compaction, streaming, etc) > > > Key: CASSANDRA-8630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8630 > Project: Cassandra > Issue Type: Improvement > Components: Core, Tools >Reporter: Oleg Anastasyev >Assignee: Stefania > Labels: compaction, performance > Fix For: 3.x > > Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, > flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz > > > When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot > of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int). 
> This is because default implementations of readShort,readLong, etc as well as > their matching write* are implemented with numerous calls of byte by byte > read and write. > This makes a lot of syscalls as well. > A quick microbench shows than just reimplementation of these methods in > either way gives 8x speed increase. > A patch attached implements RandomAccessReader.read and > SequencialWriter.write methods in more efficient way. > I also eliminated some extra byte copies in CompositeType.split and > ColumnNameHelper.maxComponents, which were on my profiler's hotspot method > list during tests. > A stress tests on my laptop show that this patch makes compaction 25-30% > faster on uncompressed sstables and 15% faster for compressed ones. > A deployment to production shows much less CPU load for compaction. > (I attached a cpu load graph from one of our production, orange is niced CPU > load - i.e. compaction; yellow is user - i.e. not compaction related tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8671) Give compaction strategy more control over where sstables are created, including for flushing and streaming.
[ https://issues.apache.org/jira/browse/CASSANDRA-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700430#comment-14700430 ] Blake Eggleston commented on CASSANDRA-8671: Here's the branch: https://github.com/bdeggleston/cassandra/tree/8671-2 and the last test runs: http://cassci.datastax.com/job/bdeggleston-8671-2-testall/2/ http://cassci.datastax.com/job/bdeggleston-8671-2-dtest/2/ Doesn't look like there's anything failing that isn't already failing on cassandra-3.0 > Give compaction strategy more control over where sstables are created, > including for flushing and streaming. > > > Key: CASSANDRA-8671 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8671 > Project: Cassandra > Issue Type: Improvement >Reporter: Blake Eggleston >Assignee: Blake Eggleston > Fix For: 3.x > > Attachments: > 0001-C8671-creating-sstable-writers-for-flush-and-stream-.patch, > 8671-giving-compaction-strategies-more-control-over.txt > > > This would enable routing different partitions to different disks based on > some user defined parameters. > My initial take on how to do this would be to make an interface from > SSTableWriter, and have a table's compaction strategy do all SSTableWriter > instantiation. Compaction strategies could then implement their own > SSTableWriter implementations (which basically wrap one or more normal > sstablewriters) for compaction, flushing, and streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
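The "wrap one or more normal sstablewriters" idea from the ticket description can be sketched as follows. All interface and class names here are illustrative, not the committed API: a compaction strategy hands out writer instances, and a composite writer routes each partition to one of several underlying writers via a user-defined function:

```java
import java.util.function.Function;

public class WriterRouting
{
    // Stand-in for an SSTableWriter-style interface the strategy would
    // instantiate for compaction, flushing, and streaming.
    interface PartitionWriter
    {
        void append(String partitionKey);
        String location();
    }

    // Trivial concrete writer that just counts appends for one location.
    static class CountingWriter implements PartitionWriter
    {
        final String location;
        int appended = 0;

        CountingWriter(String location) { this.location = location; }
        public void append(String partitionKey) { appended++; }
        public String location() { return location; }
    }

    // Composite writer: picks a target per partition using a user-defined
    // routing function, e.g. to send different partitions to different disks.
    static class RoutingWriter implements PartitionWriter
    {
        final PartitionWriter[] targets;
        final Function<String, Integer> route;

        RoutingWriter(Function<String, Integer> route, PartitionWriter... targets)
        {
            this.route = route;
            this.targets = targets;
        }

        public void append(String partitionKey)
        {
            targets[route.apply(partitionKey)].append(partitionKey);
        }

        public String location() { return "composite"; }
    }
}
```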
[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
[ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700401#comment-14700401 ] Ariel Weisberg commented on CASSANDRA-8630: --- ||Version|Time 1|Time 2|Time 3| |8630 uncompressed|197|204|198| |8630 compressed|263|262|261| |3.x uncompressed|199|198|198| |3.x compressed|200|198|198| My intuition is that the compressed case has something bad happening, and that there is no impact from the changes in the uncompressed case. That kind of suggests the time/bottleneck is elsewhere. I am looking at the flight recordings now. Did you measure on OS X or Linux? FYI I usually set the file read and write and contention thresholds to one millisecond. Doesn't seem to impact performance, but does provide a clearer picture. > Faster sequential IO (on compaction, streaming, etc) > > > Key: CASSANDRA-8630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8630 > Project: Cassandra > Issue Type: Improvement > Components: Core, Tools >Reporter: Oleg Anastasyev >Assignee: Stefania > Labels: compaction, performance > Fix For: 3.x > > Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, > flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz > > > When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot > of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int). > This is because default implementations of readShort,readLong, etc as well as > their matching write* are implemented with numerous calls of byte by byte > read and write. > This makes a lot of syscalls as well. > A quick microbench shows than just reimplementation of these methods in > either way gives 8x speed increase. > A patch attached implements RandomAccessReader.read and > SequencialWriter.write methods in more efficient way. 
> I also eliminated some extra byte copies in CompositeType.split and > ColumnNameHelper.maxComponents, which were on my profiler's hotspot method > list during tests. > Stress tests on my laptop show that this patch makes compaction 25-30% > faster on uncompressed sstables and 15% faster for compressed ones. > A deployment to production shows much less CPU load for compaction. > (I attached a cpu load graph from one of our production clusters; orange is niced CPU > load - i.e. compaction; yellow is user - i.e. non-compaction-related tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
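The byte-by-byte cost described above can be made concrete with a small sketch (illustrative only, not code from the patch; class and method names are invented). The default DataInputStream.readLong decodes a long with eight single-byte read() calls — eight syscalls on an unbuffered stream — whereas decoding from a buffer that was filled by one large read touches the kernel once:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Contrasts byte-at-a-time decoding (what DataInputStream.readLong does
// over a stream whose read() returns one byte) with a bulk decode from an
// in-memory buffer, the approach the patch takes for
// RandomAccessReader/SequentialWriter. Illustrative names only.
public class BulkReadSketch {
    // Byte-by-byte: readLong issues eight read() calls on the stream.
    static long readLongByteByByte(byte[] src) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(src));
        return in.readLong();
    }

    // Bulk: decode straight from a buffer already filled by one large read.
    static long readLongBulk(byte[] src) {
        return ByteBuffer.wrap(src).getLong();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = ByteBuffer.allocate(8).putLong(0x1122334455667788L).array();
        // Both decodings agree; only the number of underlying reads differs.
        assert readLongByteByByte(encoded) == readLongBulk(encoded);
    }
}
```

On a real stream each of those eight reads can be a syscall, which is where the reported CPU goes; the 8x figure in the description is plausibly dominated by exactly this.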
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700387#comment-14700387 ] Ariel Weisberg commented on CASSANDRA-9738: --- Good stuff. Coverage of the OHC key cache provider looks good.
* How are the 2i paths tested?
* The null case in makeVal isn't tested; maybe not that interesting.
* SerializationHeader forKeyCache is racy and can result in an undersize array clobbering a properly sized one. But... it doesn't retrieve the value it sets, so odds are it will eventually work out to be the longer one. It works; it's just intentionally racy.
* In CacheService, does that comment about the singleton weigher even make sense anymore?
* NIODataInputStream has a derived class, DataInputBuffer, that exposes the constructor you made public.
* The string encoding and decoding helpers you wrote seem like they should be factored out somewhere else, maybe ByteBufferUtil? Also, you don't specify a string encoding, and there may be some issues with the serialized size of non-Latin characters lurking as well.
* An enhancement we can file for later is to replace those strings with vints that reference a map of possible table names. For persistence, definitely fully qualify, but in memory we can store more entries that way.
> Migrate key-cache to be fully off-heap > -- > > Key: CASSANDRA-9738 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 > Project: Cassandra > Issue Type: Sub-task >Reporter: Robert Stupp >Assignee: Robert Stupp > Fix For: 3.0 beta 2 > > > Key cache still uses a concurrent map on-heap. This could go to off-heap and > feels doable now after CASSANDRA-8099. > Evaluation should be done in advance based on a POC to prove that a pure > off-heap key cache buys a performance and/or gc-pressure improvement. > In theory, elimination of on-heap management of the map should buy us some > benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
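The review note above about an unspecified string encoding is worth spelling out: when serialization size is estimated from character count but bytes are produced by an encoder, non-Latin text diverges. A minimal sketch (not code from the patch; the helper name is invented):

```java
import java.nio.charset.StandardCharsets;

// Illustrates the non-Latin serialized-size pitfall mentioned in the
// review: UTF-8 byte length equals char count only for ASCII. Helper
// name is invented, not from the patch under review.
public class EncodingSizeSketch {
    static int utf8Size(String s) {
        // Pinning the charset makes the size deterministic across JVMs;
        // String.getBytes() with no argument uses the platform default.
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String latin = "keyspace1";
        // "キースペース" via escapes: 6 chars, 18 UTF-8 bytes.
        String nonLatin = "\u30ad\u30fc\u30b9\u30da\u30fc\u30b9";
        assert utf8Size(latin) == latin.length();
        assert utf8Size(nonLatin) > nonLatin.length();
    }
}
```

A size computed as `2 + s.length()` (a common short-with-length pattern) would under-allocate for the second string, which is the kind of lurking issue the comment warns about.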
[jira] [Updated] (CASSANDRA-10111) reconnecting snitch can bypass cluster name check
[ https://issues.apache.org/jira/browse/CASSANDRA-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Burroughs updated CASSANDRA-10111: Summary: reconnecting snitch can bypass cluster name check (was: reconnecting snitch can bypass name check) > reconnecting snitch can bypass cluster name check > - > > Key: CASSANDRA-10111 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10111 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: 2.0.x >Reporter: Chris Burroughs > Labels: gossip > > Setup: > * Two clusters: A & B > * Both are two-DC clusters > * Both use GossipingPropertyFileSnitch with different > listen_address/broadcast_address > A new node was added to cluster A with a broadcast_address of an existing > node in cluster B (due to an out of date DNS entry). Cluster B added all of > the nodes from cluster A, somehow bypassing the cluster name mismatch check > for these nodes. The first reference to cluster A nodes in cluster B's logs is > when they were added: > {noformat} > INFO [GossipStage:1] 2015-08-17 15:08:33,858 Gossiper.java (line 983) Node > /8.37.70.168 is now part of the cluster > {noformat} > Cluster B nodes then tried to gossip to cluster A nodes, but cluster A kept > them out with 'ClusterName mismatch'. Cluster B however tried to > send reads/writes to cluster A and general mayhem ensued. > Obviously this is a Bad (TM) config that Should Not Be Done. However, since > the consequences of crazy merged clusters are really bad (the reason there is > the name mismatch check in the first place), I think the hole is reasonable to > plug. I'm not sure exactly what the code path is that skips the check in > GossipDigestSynVerbHandler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10111) reconnecting snitch can bypass name check
Chris Burroughs created CASSANDRA-10111: --- Summary: reconnecting snitch can bypass name check Key: CASSANDRA-10111 URL: https://issues.apache.org/jira/browse/CASSANDRA-10111 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.0.x Reporter: Chris Burroughs Setup: * Two clusters: A & B * Both are two-DC clusters * Both use GossipingPropertyFileSnitch with different listen_address/broadcast_address A new node was added to cluster A with a broadcast_address of an existing node in cluster B (due to an out of date DNS entry). Cluster B added all of the nodes from cluster A, somehow bypassing the cluster name mismatch check for these nodes. The first reference to cluster A nodes in cluster B's logs is when they were added: {noformat} INFO [GossipStage:1] 2015-08-17 15:08:33,858 Gossiper.java (line 983) Node /8.37.70.168 is now part of the cluster {noformat} Cluster B nodes then tried to gossip to cluster A nodes, but cluster A kept them out with 'ClusterName mismatch'. Cluster B however tried to send reads/writes to cluster A and general mayhem ensued. Obviously this is a Bad (TM) config that Should Not Be Done. However, since the consequences of crazy merged clusters are really bad (the reason there is the name mismatch check in the first place), I think the hole is reasonable to plug. I'm not sure exactly what the code path is that skips the check in GossipDigestSynVerbHandler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
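The check being bypassed can be sketched abstractly (all names here are invented; the real check lives in the gossip handshake around GossipDigestSynVerbHandler): a node should drop a gossip SYN whose cluster name differs from its own before any of the peer's endpoints reach local cluster state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the cluster-name gate described in the ticket,
// not Cassandra code: reject gossip from a peer whose cluster name
// mismatches before merging its endpoints into local state.
public class ClusterNameGateSketch {
    final String localClusterName;
    final Map<String, String> knownEndpoints = new HashMap<>();

    ClusterNameGateSketch(String localClusterName) {
        this.localClusterName = localClusterName;
    }

    // Returns true only if the digest was accepted.
    boolean onGossipSyn(String peerClusterName, String peerAddress) {
        if (!localClusterName.equals(peerClusterName)) {
            // The 'ClusterName mismatch' path: drop the message so the
            // peer is never added to cluster state.
            return false;
        }
        knownEndpoints.put(peerAddress, peerClusterName);
        return true;
    }
}
```

The bug report is that some path — apparently via the reconnecting snitch — adds endpoints without ever passing through this gate, so cluster B learned cluster A's nodes even though direct gossip was being rejected.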
[jira] [Commented] (CASSANDRA-9434) If a node loses schema_columns SSTables it could delete all secondary indexes from the schema
[ https://issues.apache.org/jira/browse/CASSANDRA-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700301#comment-14700301 ] Aleksey Yeschenko commented on CASSANDRA-9434: -- So, the good news is that this issue will not happen in 2.1, 2.2, or 3.0. In those, we assume that this migration has already been performed in 2.0. Furthermore, in 3.0 the indexes are kept in a totally separate table from columns. The bad news is that 2.0 is EOL and that I don't know a solid heuristic for determining whether or not we have this data missing. It's possible for a pre-upgrade 2.0 node to have a completely empty {{system.schema_columns}} table (sans the system tables' own columns) if no {{REGULAR}} columns were defined on any of the tables. > If a node loses schema_columns SSTables it could delete all secondary indexes > from the schema > - > > Key: CASSANDRA-9434 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9434 > Project: Cassandra > Issue Type: Bug >Reporter: Richard Low >Assignee: Aleksey Yeschenko > Fix For: 2.0.x > > > It is possible that a single bad node can delete all secondary indexes if it > restarts and cannot read its schema_columns SSTables. Here's a reproduction: > * Create a 2-node cluster (we saw it on 2.0.11) > * Create the schema: > {code} > create keyspace myks with replication = {'class':'SimpleStrategy', > 'replication_factor':1}; > use myks; > create table mytable (a text, b text, c text, PRIMARY KEY (a, b) ); > create index myindex on mytable(b); > {code} > NB the index must be on a clustering column to repro > * Kill one node > * Wipe its commitlog and system/schema_columns sstables. > * Start it again > * Run on this node > select index_name from system.schema_columns where keyspace_name = 'myks' and > columnfamily_name = 'mytable' and column_name = 'b'; > and you'll see the index is null. > * Run 'describe schema' on the other node. 
Sometimes it will not show the > index, but you might need to bounce for it to disappear. > I think the culprit is SystemKeyspace.copyAllAliasesToColumnsProper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8716) "java.util.concurrent.ExecutionException: java.lang.AssertionError: Memory was freed" when running cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700286#comment-14700286 ] Evin Callahan commented on CASSANDRA-8716: -- +1 for a workaround > "java.util.concurrent.ExecutionException: java.lang.AssertionError: Memory > was freed" when running cleanup > -- > > Key: CASSANDRA-8716 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8716 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Centos 6.6, Cassandra 2.0.12, Oracle JDK 1.7.0_67 >Reporter: Imri Zvik >Assignee: Robert Stupp >Priority: Minor > Labels: qa-resolved > Fix For: 2.0.13 > > Attachments: 8716.txt, system.log.gz > > > {code}Error occurred during cleanup > java.util.concurrent.ExecutionException: java.lang.AssertionError: Memory was > freed > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:234) > at > org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:272) > at > org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:1115) > at > org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:2177) > at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) > at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) > at > 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) > at > javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) > at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) > at sun.rmi.transport.Transport$1.run(Transport.java:177) > at sun.rmi.transport.Transport$1.run(Transport.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:173) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.AssertionError: Memory was freed > at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:259) > at org.apache.cassandra.io.util.Memory.getInt(Memory.java:211) > at > org.apache.cassandra.io.sstable.IndexSummary.getIndex(IndexSummary.java:79) > at > org.apache.cassandra.io.ss
[jira] [Commented] (CASSANDRA-6717) Modernize schema tables
[ https://issues.apache.org/jira/browse/CASSANDRA-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700242#comment-14700242 ] Sam Tunnicliffe commented on CASSANDRA-6717: I've fixed BatchLogManagerTest and a number of the thrift dtests. Also, updated the bundled java driver with the latest from the [JAVA-875|https://datastax-oss.atlassian.net/browse/JAVA-875] branch and the bundled python driver with the latest from [PYTHON-276|https://datastax-oss.atlassian.net/browse/PYTHON-276]. I'll commit to 3.0 when cassci is happy. For future reference, the command to build a source dist of the python driver for internal use during dev is {code}python setup.py egg_info -b-`git rev-parse --short HEAD` sdist --formats=zip{code} > Modernize schema tables > --- > > Key: CASSANDRA-6717 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6717 > Project: Cassandra > Issue Type: Sub-task >Reporter: Sylvain Lebresne >Assignee: Sam Tunnicliffe > Labels: client-impacting, doc-impacting > Fix For: 3.0 beta 1 > > > There are a few problems/improvements that can be addressed in the way we store > schema: > # CASSANDRA-4988: as explained on the ticket, storing the comparator is now > redundant (or almost; we'd also need to store whether the table is COMPACT or not, > which we currently don't, but that is easy and probably a good idea anyway); it > can be entirely reconstructed from the infos in schema_columns (the same is > true of key_validator and subcomparator, and replacing default_validator by a > COMPACT_VALUE column in all cases is relatively simple). And storing the > comparator as an opaque string broke concurrent updates of sub-parts of said > comparator (concurrent collection addition, or altering 2 separate clustering > columns, typically), so it's really worth removing it. > # CASSANDRA-4603: it's time to get rid of those ugly json maps. 
I'll note > that schema_keyspaces is a problem due to its use of COMPACT STORAGE, but I > think we should fix it once and for all nonetheless (see below). > # For CASSANDRA-6382, and to allow indexing both map keys and values at the > same time, we'd need to be able to have more than one index definition for a > given column. > # There are a few mismatches in table options between the ones stored in the > schema and the ones used when declaring/altering a table, which would be nice > to fix. The compaction, compression and replication maps are ones already > mentioned in CASSANDRA-4603, but also, for some reason, > 'dclocal_read_repair_chance' in CQL is called just 'local_read_repair_chance' > in the schema table, and 'min/max_compaction_threshold' are column family > options in the schema but just compaction options in CQL (which makes more > sense). > None of those issues are major, and we could probably deal with them > independently, but it might be simpler to just fix them all in one shot, so I > wanted to sum them all up here. In particular, the fact that > 'schema_keyspaces' uses COMPACT STORAGE is annoying (for the replication map, > but it may limit future stuff too), which suggests we should migrate it to a > new, non-COMPACT table. And while that's arguably a detail, it wouldn't hurt > to rename schema_columnfamilies to schema_tables for the years to come, since > that's the preferred vernacular for CQL. > Overall, what I would suggest is to move all schema tables to a new keyspace, > named 'schema' for instance (or 'system_schema', but I prefer the shorter > version), and fix all the issues above at once. Since we currently don't > exchange schema between nodes of different versions, all we'd need to do that > is a one-shot startup migration, and overall, I think it could be simpler for > clients to deal with one clear migration than to have to handle minor > individual changes all over the place. 
I also think it's somewhat cleaner > conceptually to have schema tables in their own keyspace since they are > replicated through a different mechanism than other system tables. > If we do that, we could, for instance, migrate to the following schema tables > (details up for discussion of course): > {noformat} > CREATE TYPE user_type ( > name text, > column_names list, > column_types list > ) > CREATE TABLE keyspaces ( > name text PRIMARY KEY, > durable_writes boolean, > replication map, > user_types map > ) > CREATE TYPE trigger_definition ( > name text, > options map > ) > CREATE TABLE tables ( > keyspace text, > name text, > id uuid, > table_type text, // COMPACT, CQL or SUPER > dropped_columns map, > triggers map, > // options > comment text, > compaction map, > compression map, > read_repair_chance double, > dclocal_read_repair_chance double, > gc_grace_seconds int, > caching text, > rows_per
[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)
[ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700234#comment-14700234 ] Ariel Weisberg commented on CASSANDRA-8630: --- I have an empty box I can run it on. Which compaction strategy are you taking those numbers from? When I run the test it does it 3 times, once for each strategy. > Faster sequential IO (on compaction, streaming, etc) > > > Key: CASSANDRA-8630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8630 > Project: Cassandra > Issue Type: Improvement > Components: Core, Tools >Reporter: Oleg Anastasyev >Assignee: Stefania > Labels: compaction, performance > Fix For: 3.x > > Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, > flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz > > > When a node is doing a lot of sequential IO (streaming, compacting, etc), a lot > of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int). > This is because the default implementations of readShort, readLong, etc, as well as > their matching write* methods, are implemented with numerous byte-by-byte > reads and writes. > This makes a lot of syscalls as well. > A quick microbenchmark shows that just reimplementing these methods > gives an 8x speed increase. > The attached patch implements the RandomAccessReader.read and > SequentialWriter.write methods in a more efficient way. > I also eliminated some extra byte copies in CompositeType.split and > ColumnNameHelper.maxComponents, which were on my profiler's hotspot method > list during tests. > Stress tests on my laptop show that this patch makes compaction 25-30% > faster on uncompressed sstables and 15% faster for compressed ones. > A deployment to production shows much less CPU load for compaction. > (I attached a cpu load graph from one of our production clusters; orange is niced CPU > load - i.e. compaction; yellow is user - i.e. non-compaction-related tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10102) java.lang.UnsupportedOperationException after upgrade to 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-10102: --- Summary: java.lang.UnsupportedOperationException after upgrade to 3.0 (was: java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1) > java.lang.UnsupportedOperationException after upgrade to 3.0 > > > Key: CASSANDRA-10102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10102 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch > Attachments: node1.log, node2.log, node3.log > > > Upgrade tests are showing a potential issue. I'm seeing this during rolling > upgrades to 3.0 alpha 1, after one node has been upgraded to the alpha. > I will attach cassandra logs here, node1.log is where most of the failures > are seen. > {noformat} > ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,888 > CassandraDaemon.java:189 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.UnsupportedOperationException: null > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) > ~[main/:na] > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) > ~[main/:na] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) > ~[main/:na] > INFO [GossipStage:1] 2015-08-17 12:22:06,914 StorageService.java:1886 - Node > /127.0.0.2 state jump to normal > ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,915 > CassandraDaemon.java:189 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.UnsupportedOperationException: null 
> at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) > ~[main/:na] > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) > ~[main/:na] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) > ~[main/:na] > {noformat} > Another exception showing in logs: > {noformat} > ERROR [SharedPool-Worker-1] 2015-08-17 12:22:19,358 ErrorMessage.java:336 - > Unexpected exception during request > java.lang.UnsupportedOperationException: Version is 9 > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serializedSize(PartitionUpdate.java:760) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:334) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:246) > ~[main/:na] > at > org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) > ~[main/:na] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:67) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:587) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:737) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:702) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:1084) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:125) > ~[main/:na] > at > 
org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:942) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:549) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:720) > ~[main/:na] > at > org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:613) > ~[main/:na] > at > org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:599) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryProcessor.processStat
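The two exceptions quoted above fit the same pattern: a serializer gated on messaging version with no legacy path for pre-3.0 peers, which surfaces during a rolling upgrade when mixed versions coexist. A hypothetical sketch of that gate (all names invented except the quoted log text; in 3.0, MessagingService.VERSION_30 is 10, so the "Version is 9" message plausibly refers to a 2.2-era peer):

```java
import java.nio.charset.StandardCharsets;

// Invented sketch of a version-gated deserializer: older framing versions
// are rejected outright, producing the UnsupportedOperationException seen
// in the logs when a 3.0 node talks to a not-yet-upgraded peer.
public class VersionGateSketch {
    static final int VERSION_30 = 10; // assumption: 3.0's messaging version

    static String deserialize(byte[] bytes, int version) {
        if (version < VERSION_30)
            // No legacy decode path implemented yet for this message type.
            throw new UnsupportedOperationException("Version is " + version);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Under this reading, the fix is either to implement the legacy serialization path or to avoid sending the new message shapes to old-version peers during the upgrade window.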
[jira] [Created] (CASSANDRA-10110) Windows dtest 3.0: udtencoding_test.py:TestUDTEncoding.udt_test
Joshua McKenzie created CASSANDRA-10110: --- Summary: Windows dtest 3.0: udtencoding_test.py:TestUDTEncoding.udt_test Key: CASSANDRA-10110 URL: https://issues.apache.org/jira/browse/CASSANDRA-10110 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x Currently broken by CASSANDRA-7066 (thus depending on CASSANDRA-10109). Error message from CI yesterday was: {noformat} File "D:\Python27\lib\unittest\case.py", line 329, in run testMethod() File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\udtencoding_test.py", line 15, in udt_test cluster.populate(3).start() File "build\bdist.win-amd64\egg\ccmlib\cluster.py", line 249, in start p = node.start(update_pid=False, jvm_args=jvm_args, profile_options=profile_options) File "build\bdist.win-amd64\egg\ccmlib\node.py", line 447, in start common.check_socket_available(itf) File "build\bdist.win-amd64\egg\ccmlib\common.py", line 343, in check_socket_available raise UnavailableSocketError("Inet address %s:%s is not available: %s" % (addr, port, msg)) 'Inet address 127.0.0.1:7000 is not available: [Errno 10013] An attempt was made to access a socket in a way forbidden by its access permissions\n >> begin captured logging << \ndtest: DEBUG: cluster ccm directory: d:\\temp\\dtest-dpsz3i\n- >> end captured logging << -' {noformat} Failure history: [regression in build #17|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/lastCompletedBuild/testReport/udtencoding_test/TestUDTEncoding/udt_test/history/]. Doesn't look like there was any real change to explain that though. Env: Not sure if repro locally since CASSANDRA-7066 error is in the way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10109) Windows dtest 3.0: ttl_test.py failures
Joshua McKenzie created CASSANDRA-10109: --- Summary: Windows dtest 3.0: ttl_test.py failures Key: CASSANDRA-10109 URL: https://issues.apache.org/jira/browse/CASSANDRA-10109 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x ttl_test.py:TestTTL.update_column_ttl_with_default_ttl_test2 ttl_test.py:TestTTL.update_multiple_columns_ttl_test ttl_test.py:TestTTL.update_single_column_ttl_test Errors locally are different than CI from yesterday. Yesterday on CI we have timeouts and general node hangs. Today on all 3 tests when run locally I see: {noformat} Traceback (most recent call last): File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown raise AssertionError('Unexpected error in %s node log: %s' % (node.name, errors)) AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 16:53:43,120 NoSpamLogger.java:97 - This platform does not support atomic directory streams (SecureDirectoryStream); race conditions when loading sstable files could occurr'] {noformat} This traces back to the commit for CASSANDRA-7066 today by [~Stefania] and [~benedict]. Stefania - care to take this ticket and also look further into whether or not we're going to have issues with 7066 on Windows? That error message certainly *sounds* like it's not a good thing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700217#comment-14700217 ] Aleksey Yeschenko commented on CASSANDRA-9917: -- I'm going to be a pedant and bring up another example, just because: you have nodes A and B, and A has some hints/batches for B; A is down for 1.5 hours, then A comes up, but B goes down for 1.5 hours. No single node has been down for longer than the max hint window, but, assuming a gc gs/max hints window of 3 hours, none of the batches or hints have survived. You need repair. And with current MVs - to drop and recreate the MV? What I'm saying is that whatever we do, it's going to be broken. Also that {{max_hints_window_in_ms}} should not be part of any calculations whatsoever, as you can ultimately infer nothing from it. So let's just validate that gc gs is not set to {{0}} and properly document its effects in the MV documentation. > MVs should validate gc grace seconds on the tables involved > --- > > Key: CASSANDRA-9917 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Assignee: Paulo Motta > Labels: materializedviews > Fix For: 3.0 beta 2 > > > For correctness reasons (potential resurrection of dropped values), batchlog > entries are TTL'd with the lowest gc grace seconds of all the tables involved > in a batch. > It means that if gc gs is set to 0 on one of the tables, the batchlog entry > will be dead on arrival, and never replayed. > We should probably warn against such LOGGED writes taking place, in general, > but for MVs, we must validate that gc gs on the base table (and on the MV > table, if we should allow altering gc gs there at all) is never set too low, > or else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
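The TTL rule in the description above reduces to a one-line minimum. A minimal sketch, with an invented method name (the real logic lives in the batchlog code paths): a batchlog entry's TTL is the smallest gc_grace_seconds among the tables in the batch, so any table with gc_gs = 0 makes the entry dead on arrival.

```java
import java.util.List;

// Sketch of the batchlog TTL rule described in the ticket (method name
// invented): the entry survives only as long as the most aggressive
// gc_grace_seconds of the tables involved.
public class BatchlogTtlSketch {
    static int batchlogTtl(List<Integer> gcGraceSecondsPerTable) {
        return gcGraceSecondsPerTable.stream().min(Integer::compare).orElse(0);
    }

    public static void main(String[] args) {
        // A default table (10 days) batched with a gc_gs = 0 table:
        // TTL is 0, so the entry is never replayed.
        assert batchlogTtl(List.of(864000, 0)) == 0;
    }
}
```

This is why the ticket asks for validation at MV/table definition time rather than a runtime calculation involving the hint window.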
[jira] [Updated] (CASSANDRA-10094) Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-10094: Attachment: 10094-2.2.txt Backport attached. Also available [here|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:10094-2.2]. Test already available: * [dtest results|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10094-2.2-dtest/lastCompletedBuild/testReport/] * [utest results|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10094-2.2-testall/lastCompletedBuild/testReport/] > Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM > failure > --- > > Key: CASSANDRA-10094 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10094 > Project: Cassandra > Issue Type: Bug >Reporter: Joshua McKenzie >Assignee: Paulo Motta > Labels: Windows > Fix For: 2.2.x > > Attachments: 10094-2.2.txt > > > Error: > {noformat} > junit.framework.AssertionFailedError: > at > org.apache.cassandra.db.CommitLogFailurePolicyTest.testCommitLogFailureBeforeInitialization_mustKillJVM(CommitLogFailurePolicyTest.java:149) > {noformat} > [Failure > History|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db/CommitLogFailurePolicyTest/testCommitLogFailureBeforeInitialization_mustKillJVM/history/]: > Consistent since build #85 > Env: CI only. Cannot repro locally -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10108) Windows dtest 3.0: sstablesplit_test.py:TestSSTableSplit.split_test fails
Joshua McKenzie created CASSANDRA-10108: --- Summary: Windows dtest 3.0: sstablesplit_test.py:TestSSTableSplit.split_test fails Key: CASSANDRA-10108 URL: https://issues.apache.org/jira/browse/CASSANDRA-10108 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x Locally: {noformat} -- ma-28-big-Data.db- Exception in thread "main" java.lang.NoClassDefFoundError: org/supercsv/prefs/CsvPreference$Builder at org.apache.cassandra.config.Config.(Config.java:240) at org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:105) at org.apache.cassandra.service.StorageService.getPartitioner(StorageService.java:220) at org.apache.cassandra.service.StorageService.(StorageService.java:206) at org.apache.cassandra.service.StorageService.(StorageService.java:211) at org.apache.cassandra.schema.LegacySchemaTables.getSchemaPartitionsForTable(LegacySchemaTables.java:295) at org.apache.cassandra.schema.LegacySchemaTables.readSchemaFromSystemTables(LegacySchemaTables.java:210) at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:108) at org.apache.cassandra.tools.StandaloneSplitter.main(StandaloneSplitter.java:58) Caused by: java.lang.ClassNotFoundException: org.supercsv.prefs.CsvPreference$Builder at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 9 more Number of sstables after split: 1. expected 21.0 {noformat} on CI: {noformat} 21.0 not less than or equal to 2 and [node1 ERROR] Exception calling "CompareTo" with "1" argument(s): "Object must be of type String." 
At D:\temp\dtest-i3xwjx\test\node1\conf\cassandra-env.ps1:336 char:9 + if ($env:JVM_VERSION.CompareTo("1.8.0_40" -eq -1)) + ~ + CategoryInfo : NotSpecified: (:) [], MethodInvocationException + FullyQualifiedErrorId : ArgumentException -- ma-28-big-Data.db- {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/lastCompletedBuild/testReport/sstablesplit_test/TestSSTableSplit/split_test/history/] Env: both CI and local -- This message was sent by Atlassian JIRA (v6.3.4#6332)
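A side note on the CI error quoted above: the {{MethodInvocationException}} comes from a misplaced closing parenthesis in the generated cassandra-env.ps1. As written, {{"1.8.0_40" -eq -1}} is evaluated first (yielding the Boolean {{$false}}), and that Boolean is what gets passed to {{CompareTo}}, hence "Object must be of type String." The intended comparison was presumably:

```
# cassandra-env.ps1, line 336 -- broken form quoted in the log:
#   if ($env:JVM_VERSION.CompareTo("1.8.0_40" -eq -1))
# CompareTo receives the Boolean result of ("1.8.0_40" -eq -1).
# Presumably intended form, with the closing paren moved:
if ($env:JVM_VERSION.CompareTo("1.8.0_40") -eq -1)
```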
[jira] [Updated] (CASSANDRA-10107) Windows dtest 3.0: TestScrub and TestScrubIndexes failures
[ https://issues.apache.org/jira/browse/CASSANDRA-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-10107: Summary: Windows dtest 3.0: TestScrub and TestScrubIndexes failures (was: Windows dtest 3.0: ) > Windows dtest 3.0: TestScrub and TestScrubIndexes failures > -- > > Key: CASSANDRA-10107 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10107 > Project: Cassandra > Issue Type: Sub-task > Components: dows dtest 3.0: TestScrub / TestScrubIndexes failures >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie > Labels: Windows > Fix For: 3.0.x > > > scrub_test.py:TestScrub.test_standalone_scrub > scrub_test.py:TestScrub.test_standalone_scrub_essential_files_only > scrub_test.py:TestScrubIndexes.test_standalone_scrub > Somewhat different messages between CI and local, but consistent on env. > Locally, I see: > {noformat} > dtest: DEBUG: ERROR 20:41:20 This platform does not support atomic directory > streams (SecureDirectoryStream); race conditions when loading sstable files > could occurr > {noformat} > Consistently fails, both on CI and locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10107) Windows dtest 3.0:
Joshua McKenzie created CASSANDRA-10107: --- Summary: Windows dtest 3.0: Key: CASSANDRA-10107 URL: https://issues.apache.org/jira/browse/CASSANDRA-10107 Project: Cassandra Issue Type: Sub-task Components: dows dtest 3.0: TestScrub / TestScrubIndexes failures Reporter: Joshua McKenzie Assignee: Joshua McKenzie Fix For: 3.0.x scrub_test.py:TestScrub.test_standalone_scrub scrub_test.py:TestScrub.test_standalone_scrub_essential_files_only scrub_test.py:TestScrubIndexes.test_standalone_scrub Somewhat different messages between CI and local, but consistent on env. Locally, I see: {noformat} dtest: DEBUG: ERROR 20:41:20 This platform does not support atomic directory streams (SecureDirectoryStream); race conditions when loading sstable files could occurr {noformat} Consistently fails, both on CI and locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10094) Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700177#comment-14700177 ] Joshua McKenzie commented on CASSANDRA-10094: - Backport and attach here, I'll review and commit. Sound good? > Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM > failure > --- > > Key: CASSANDRA-10094 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10094 > Project: Cassandra > Issue Type: Bug >Reporter: Joshua McKenzie >Assignee: Paulo Motta > Labels: Windows > Fix For: 2.2.x > > > Error: > {noformat} > junit.framework.AssertionFailedError: > at > org.apache.cassandra.db.CommitLogFailurePolicyTest.testCommitLogFailureBeforeInitialization_mustKillJVM(CommitLogFailurePolicyTest.java:149) > {noformat} > [Failure > History|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db/CommitLogFailurePolicyTest/testCommitLogFailureBeforeInitialization_mustKillJVM/history/]: > Consistent since build #85 > Env: CI only. Cannot repro locally -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10055) High CPU load for Cassandra 2.1.8
[ https://issues.apache.org/jira/browse/CASSANDRA-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700173#comment-14700173 ] vijay commented on CASSANDRA-10055: --- Benedict, the jstack and top outputs were taken relatively close to each other; I will try to get more statistics on this and get back. Thanks > High CPU load for Cassandra 2.1.8 > - > > Key: CASSANDRA-10055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10055 > Project: Cassandra > Issue Type: Bug > Components: Config > Environment: Prod >Reporter: vijay > Attachments: dstst-lcdn.log, dstst-lcdn2.log, dstst-lcdn3.log, > dstst-lcdn4.log, dstst-lcdn5.log, dstst-lcdn6.log, js.log, js2.log, js3.log, > js4.log, js5.log, js6.log, top-bHn1-2.log, top-bHn1-3.log, top-bHn1-4.log, > top-bHn1-5.log, top-bHn1-6.log, top-bHn1.log > > > We are seeing high CPU load (about 80% to 100%) in Cassandra 2.1.8 when doing > data ingest; we did not have this issue in the 2.0.x versions of Cassandra. > We tested this on different cloud platforms and the results are the same. > CPU: Tested with M3 2xlarge AWS instances > Ingest rate: Injecting 1 million inserts; each insert is 1000 bytes. > No other operations are happening except inserts in Cassandra. > Let me know if more info is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10106) Windows dtest 3.0: TestRepair multiple failures
Joshua McKenzie created CASSANDRA-10106: --- Summary: Windows dtest 3.0: TestRepair multiple failures Key: CASSANDRA-10106 URL: https://issues.apache.org/jira/browse/CASSANDRA-10106 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x repair_test.py:TestRepair.dc_repair_test repair_test.py:TestRepair.local_dc_repair_test repair_test.py:TestRepair.simple_parallel_repair_test repair_test.py:TestRepair.simple_sequential_repair_test All failing w/the following error: {noformat} File "D:\Python27\lib\unittest\case.py", line 358, in run self.tearDown() File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\dtest.py", line 532, in tearDown raise AssertionError('Unexpected error in %s node log: %s' % (node.name, errors)) "Unexpected error in node3 node log: ['ERROR [STREAM-IN-/127.0.0.1] 2015-08-17 00:41:09,426 StreamSession.java:520 - [Stream #a69fc140-4478-11e5-a8ae-4f8718583077] Streaming error occurred java.io.IOException: An existing connection was forcibly closed by the remote host \\tat sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_45] \\tat sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_45] \\tat sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_45] \\tat sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_45] \\tat sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_45] \\tat org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53) ~[main/:na] \\tat org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261) ~[main/:na] \\tat java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]']\n >> begin captured logging << \ndtest: DEBUG: cluster ccm directory: d:\\temp\\dtest-3kmbjb\ndtest: DEBUG: Starting cluster..\ndtest: DEBUG: Inserting data...\ndtest: DEBUG: Checking data on node3...\ndtest: DEBUG: Checking data on node1...\ndtest: DEBUG: Checking data on node2...\ndtest: DEBUG: 
starting repair...\ndtest: DEBUG: Repair time: 5.3782098\ndtest: DEBUG: removing ccm cluster test at: d:\\temp\\dtest-3kmbjb\ndtest: DEBUG: clearing ssl stores from [d:\\temp\\dtest-3kmbjb] directory\n- >> end captured logging << -" {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/repair_test/TestRepair/dc_repair_test/history/] Env: ci and local -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10094) Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700138#comment-14700138 ] Paulo Motta commented on CASSANDRA-10094: - [~JoshuaMcKenzie] I originally created that patch for 2.2+ in the context of [CASSANDRA-8515|https://issues.apache.org/jira/browse/CASSANDRA-8515]. However, at the time the tests were not executing correctly on cassci due to unrelated problems. In the meantime, I noticed one of the tests was still not working on Windows and updated the original patch, but forgot to mention it on the JIRA ticket (sorry about that). After the tests were passing on cassci, [~benedict] probably committed an older version of the patch. I suppose d2da7606abebd98b11f8b7ec692aa7dcf5388151 was basically meant to bring the original commit in line with the updated patch; however, it was only applied to 3.0+, so it needs to be backported to 2.2. > Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM > failure > --- > > Key: CASSANDRA-10094 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10094 > Project: Cassandra > Issue Type: Bug >Reporter: Joshua McKenzie >Assignee: Paulo Motta > Labels: Windows > Fix For: 2.2.x > > > Error: > {noformat} > junit.framework.AssertionFailedError: > at > org.apache.cassandra.db.CommitLogFailurePolicyTest.testCommitLogFailureBeforeInitialization_mustKillJVM(CommitLogFailurePolicyTest.java:149) > {noformat} > [Failure > History|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db/CommitLogFailurePolicyTest/testCommitLogFailureBeforeInitialization_mustKillJVM/history/]: > Consistent since build #85 > Env: CI only. Cannot repro locally -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10105) Windows dtest 3.0: TestOfflineTools failures
Joshua McKenzie created CASSANDRA-10105: --- Summary: Windows dtest 3.0: TestOfflineTools failures Key: CASSANDRA-10105 URL: https://issues.apache.org/jira/browse/CASSANDRA-10105 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Assignee: Joshua McKenzie Fix For: 3.0.x offline_tools_test.py:TestOfflineTools.sstablelevelreset_test offline_tools_test.py:TestOfflineTools.sstableofflinerelevel_test Both tests fail with the following: {noformat} Traceback (most recent call last): File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown raise AssertionError('Unexpected error in %s node log: %s' % (node.name, errors)) AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 15:55:05,060 NoSpamLogger.java:97 - This platform does not support atomic directory streams (SecureDirectoryStream); race conditions when loading sstable files could occurr'] {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/jmx_test/TestJMX/netstats_test/history/] Env: ci and local -- This message was sent by Atlassian JIRA (v6.3.4#6332)
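For context on the quoted warning: it is logged when the platform's directory streams are not {{SecureDirectoryStream}} instances, which the default Windows filesystem provider does not supply. A minimal sketch of that detection follows; this is an assumed approach with hypothetical names, not Cassandra's exact code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SecureDirectoryStream;

public class SecureDirStreamCheck {
    // A platform supports atomic directory listing when the default provider
    // hands back a SecureDirectoryStream (true on Linux, false on Windows).
    public static boolean supportsSecureDirectoryStream(Path dir) {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            return stream instanceof SecureDirectoryStream;
        } catch (IOException e) {
            return false;
        }
    }
}
```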
[jira] [Commented] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700124#comment-14700124 ] Blake Eggleston commented on CASSANDRA-9749: [~benedict] I think most of that discussion occurred in the first 10 or so comments on this ticket. At least I don't remember there being another discussion outside of it. > CommitLogReplayer continues startup after encountering errors > - > > Key: CASSANDRA-9749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Branimir Lambov > Fix For: 2.2.x > > Attachments: 9749-coverage.tgz > > > There are a few places where the commit log recovery method either skips > sections or just returns when it encounters errors. > Specifically if it can't read the header here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 > Or if there are compressor problems here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 > and here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 > Whether these are user-fixable or not, I think we should require more direct > user intervention (ie: fix what's wrong, or remove the bad file and restart) > since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10104) Windows dtest 3.0: jmx_test.py:TestJMX.netstats_test fails
Joshua McKenzie created CASSANDRA-10104: --- Summary: Windows dtest 3.0: jmx_test.py:TestJMX.netstats_test fails Key: CASSANDRA-10104 URL: https://issues.apache.org/jira/browse/CASSANDRA-10104 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x {noformat} Unexpected error in node1 node log: ['ERROR [HintedHandoff:2] 2015-08-16 23:14:04,419 CassandraDaemon.java:191 - Exception in thread Thread[HintedHandoff:2,1,main] org.apache.cassandra.exceptions.WriteFailureException: Operation failed - received 0 responses and 1 failures \tat org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:106) ~[main/:na] \tat org.apache.cassandra.db.HintedHandOffManager.checkDelivered(HintedHandOffManager.java:358) ~[main/:na] \tat org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:414) ~[main/:na] \tat org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:346) ~[main/:na] \tat org.apache.cassandra.db.HintedHandOffManager.access$400(HintedHandOffManager.java:91) ~[main/:na] \tat org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:537) ~[main/:na] \tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_45] \tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_45] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]'] >> begin captured logging << dtest: DEBUG: cluster ccm directory: d:\temp\dtest-j1ttp3 dtest: DEBUG: Nodetool command 'D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra\bin\nodetool.bat -h localhost -p 7100 netstats' failed; exit status: 1; stdout: Starting NodeTool ; stderr: nodetool: Failed to connect to 'localhost:7100' - ConnectException: 'Connection refused: connect'. 
dtest: DEBUG: removing ccm cluster test at: d:\temp\dtest-j1ttp3 dtest: DEBUG: clearing ssl stores from [d:\temp\dtest-j1ttp3] directory - >> end captured logging << - {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/jmx_test/TestJMX/netstats_test/history/]. Looks to have regressed on build [#5|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/5/] which seems unlikely given the commit. Env: Both, though on a local run the test fails due to: {noformat} Traceback (most recent call last): File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown raise AssertionError('Unexpected error in %s node log: %s' % (node.name, errors)) AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 15:42:07,717 NoSpamLogger.java:97 - This platform does not support atomic directory streams (SecureDirectoryStream); race conditions when loading sstable files could occurr', 'ERROR [main] 2015-08-17 15:50:43,978 NoSpamLogger.java:97 - This platform does not support atomic directory streams (SecureDirectoryStream); race conditions when loading sstable files could occurr'] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10082) Transactional classes shouldn't also implement streams, channels, etc
[ https://issues.apache.org/jira/browse/CASSANDRA-10082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700117#comment-14700117 ] Blake Eggleston commented on CASSANDRA-10082: - I wouldn't say it's never ever the right thing to do. Though I would say it's not right for most AutoClosable use cases that intersect with Transactional implementations (especially OutputStream). In fact, if you're using a Transactional class as an OutputStream for the purpose of making a write a noop, you may be committing reviewer abuse :). Regarding the class living in SequentialWriter, it's not meant to be a general purpose wrapper, but a one off thing for SequentialWriter. I usually wouldn't create a generic solution unless it turns out to be a problem with more than just SequentialWriter. > Transactional classes shouldn't also implement streams, channels, etc > - > > Key: CASSANDRA-10082 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10082 > Project: Cassandra > Issue Type: Improvement >Reporter: Blake Eggleston >Assignee: Blake Eggleston > Attachments: > 0001-replacing-SequentialWriter-OutputStream-extension-wi.patch > > > Since the close method on the Transactional interface means "abort if commit > hasn't been called", mixing Transactional and AutoCloseable interfaces where > close means "we're done here" is pretty much never the right thing to do. > The only class that does this is SequentialWriter. It's not used in a way > that causes a problem, but it's still a potential hazard for future > development. > The attached patch replaces the SequentialWriter OutputStream implementation > with a wrapper class that implements the expected behavior on close, and adds > a warning to the Transactional interface. It also adds a unit test that > demonstrates the problem without the fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
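The hazard under discussion can be sketched in miniature. This is an illustrative toy with hypothetical names, not the actual SequentialWriter or Transactional code: when {{close()}} means "abort unless commit was called", a caller with ordinary AutoCloseable expectations silently loses its work.

```java
// Illustrative only: a resource with Transactional-style close semantics.
class TxnResource implements AutoCloseable {
    private boolean committed = false;
    private boolean aborted = false;

    void commit() { committed = true; }

    // Transactional semantics: close() == "abort if commit() wasn't called".
    @Override
    public void close() {
        if (!committed)
            aborted = true;
    }

    boolean wasAborted() { return aborted; }
}

public class CloseSemanticsDemo {
    // A caller treating the resource as a plain AutoCloseable ("close means
    // we're done here") silently discards everything it wrote.
    public static boolean writeWithoutCommit() {
        TxnResource r = new TxnResource();
        try {
            // ... write some data, but never call r.commit() ...
        } finally {
            r.close();
        }
        return r.wasAborted(); // true: close() aborted the "transaction"
    }

    public static boolean writeWithCommit() {
        TxnResource r = new TxnResource();
        try {
            r.commit();
        } finally {
            r.close();
        }
        return r.wasAborted(); // false: commit happened before close
    }
}
```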
[jira] [Commented] (CASSANDRA-10102) java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1
[ https://issues.apache.org/jira/browse/CASSANDRA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700110#comment-14700110 ] Russ Hatch commented on CASSANDRA-10102: [~iamaleksey] I'm still seeing some kind of issue post-upgrade on 3.0 HEAD, not sure if it's the same problem or not: one node shows: {noformat} ERROR [SharedPool-Worker-1] 2015-08-17 13:38:03,311 Message.java:611 - Unexpected exception during request; channel = [id: 0x68fac00f, /127.0.0.1:57115 => /127.0.0.1:9042] java.lang.AssertionError: null at org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:520) ~[main/:na] at org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:461) ~[main/:na] at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) ~[main/:na] at org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:72) ~[main/:na] at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:583) ~[main/:na] at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:733) ~[main/:na] at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:676) ~[main/:na] at org.apache.cassandra.net.MessagingService.sendRRWithFailure(MessagingService.java:659) ~[main/:na] at org.apache.cassandra.service.AbstractReadExecutor.makeRequests(AbstractReadExecutor.java:103) ~[main/:na] at org.apache.cassandra.service.AbstractReadExecutor.makeDataRequests(AbstractReadExecutor.java:76) ~[main/:na] at org.apache.cassandra.service.AbstractReadExecutor$AlwaysSpeculatingReadExecutor.executeAsync(AbstractReadExecutor.java:323) ~[main/:na] at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.doInitialQueries(StorageProxy.java:1599) ~[main/:na] at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1554) ~[main/:na] at 
org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1501) ~[main/:na] at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1420) ~[main/:na] at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:457) ~[main/:na] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:232) ~[main/:na] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:202) ~[main/:na] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:72) ~[main/:na] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:204) ~[main/:na] at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:470) ~[main/:na] at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:447) ~[main/:na] at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:139) ~[main/:na] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507) [main/:na] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401) [main/:na] at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [main/:na] at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] {noformat} and another node shows: {noformat} ERROR [HintedHandoff:2] 2015-08-17 13:38:07,612 CassandraDaemon.java:191 - Exception in thread Thread[HintedHandoff:2,1,main] java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException at org.apache.cassandra.db.HintedHandOffManager.compact(HintedHandOffManager.java:281) ~[main/:na] at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:535) ~[main/:na] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
[jira] [Commented] (CASSANDRA-10094) Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700104#comment-14700104 ] Joshua McKenzie commented on CASSANDRA-10094: - The commit message on that isn't clear as to where it originates: {noformat} commit d2da7606abebd98b11f8b7ec692aa7dcf5388151 Author: Benedict Elliott Smith Date: Mon Aug 17 09:52:13 2015 +0100 fix CommitLogFailurePolicyTest {noformat} [~benedict]: What ticket was that for, if any? And could we get a backport of that fix to 2.2 in order to fix the failing test on that branch? > Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM > failure > --- > > Key: CASSANDRA-10094 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10094 > Project: Cassandra > Issue Type: Bug >Reporter: Joshua McKenzie >Assignee: Paulo Motta > Labels: Windows > Fix For: 2.2.x > > > Error: > {noformat} > junit.framework.AssertionFailedError: > at > org.apache.cassandra.db.CommitLogFailurePolicyTest.testCommitLogFailureBeforeInitialization_mustKillJVM(CommitLogFailurePolicyTest.java:149) > {noformat} > [Failure > History|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db/CommitLogFailurePolicyTest/testCommitLogFailureBeforeInitialization_mustKillJVM/history/]: > Consistent since build #85 > Env: CI only. Cannot repro locally -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
[ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey updated CASSANDRA-10089: - Reproduced In: 2.2.x, 3.0.x > NullPointerException in Gossip handleStateNormal > > > Key: CASSANDRA-10089 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10089 > Project: Cassandra > Issue Type: Bug >Reporter: Stefania >Assignee: Stefania > > Whilst comparing dtests for CASSANDRA-9970 I found [this failing > dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/] > in 2.2: > {code} > Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 > 15:39:57,873 CassandraDaemon.java:183 - Exception in thread > Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat > org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) > ~[main/:na] \tat > org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > ~[main/:na] \tat > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[main/:na] \tat > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] \tat > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > 
~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]'] > {code} > I wasn't able to find it on unpatched branches but it is clearly not related > to CASSANDRA-9970, if anything it could have been a side effect of > CASSANDRA-9871. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
[ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700098#comment-14700098 ] Jim Witschey commented on CASSANDRA-10089: -- The NPE [here|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastCompletedBuild/testReport/junit/consistency_test/TestConsistency/short_read_reversed_test/] looks similar, so this likely affects trunk as well. > NullPointerException in Gossip handleStateNormal > > > Key: CASSANDRA-10089 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10089 > Project: Cassandra > Issue Type: Bug >Reporter: Stefania >Assignee: Stefania > > Whilst comparing dtests for CASSANDRA-9970 I found [this failing > dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/] > in 2.2: > {code} > Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 > 15:39:57,873 CassandraDaemon.java:183 - Exception in thread > Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat > org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) > ~[main/:na] \tat > org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > ~[main/:na] \tat > 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[main/:na] \tat > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] \tat > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]'] > {code} > I wasn't able to find it on unpatched branches but it is clearly not related > to CASSANDRA-9970, if anything it could have been a side effect of > CASSANDRA-9871. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10043) A NullPointerException is thrown if the column name is unknown for an IN relation
[ https://issues.apache.org/jira/browse/CASSANDRA-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700095#comment-14700095 ] Benjamin Lerer commented on CASSANDRA-10043: [~snazy] could you review? > A NullPointerException is thrown if the column name is unknown for an IN > relation > - > > Key: CASSANDRA-10043 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10043 > Project: Cassandra > Issue Type: Bug >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Attachments: 10043-2.2.txt, 10043-3.0.txt > > > {code} > cqlsh:test> create table newTable (a int, b int, c int, primary key(a, b)); > cqlsh:test> select * from newTable where d in (1, 2); > ServerError: message="java.lang.NullPointerException"> > {code} > The problem seems to occur only for {{IN}} restrictions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10043) A NullPointerException is thrown if the column name is unknown for an IN relation
[ https://issues.apache.org/jira/browse/CASSANDRA-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-10043: --- Attachment: 10043-3.0.txt 10043-2.2.txt The patches fix the problem and add some unit tests to verify the behaviour. * The results of the unit test for 2.2 are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-10043-2.2-dtest/lastCompletedBuild/testReport/] * The results of the Dtest for 2.2 are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-10043-2.2-dtest/lastCompletedBuild/testReport/] * The results of the unit test for 3.0 are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-10043-3.0-dtest/lastCompletedBuild/testReport/] * The results of the Dtest for 3.0 are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-10043-3.0-dtest/lastCompletedBuild/testReport/] > A NullPointerException is thrown if the column name is unknown for an IN > relation > - > > Key: CASSANDRA-10043 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10043 > Project: Cassandra > Issue Type: Bug >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Attachments: 10043-2.2.txt, 10043-3.0.txt > > > {code} > cqlsh:test> create table newTable (a int, b int, c int, primary key(a, b)); > cqlsh:test> select * from newTable where d in (1, 2); > ServerError: message="java.lang.NullPointerException"> > {code} > The problem seems to occur only for {{IN}} restrictions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
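The failure mode is a classic unchecked-null lookup. A minimal illustration with hypothetical names (not the actual patch): a schema lookup that can return null should be validated at the boundary, so the user sees a clear invalid-request error for {{d}} rather than a server-side NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the bug class: a column lookup that may return
// null is validated eagerly, turning a would-be NPE into a clear user error.
public class UnknownColumnDemo {
    private static final Map<String, String> COLUMNS = new HashMap<>();
    static {
        COLUMNS.put("a", "int");
        COLUMNS.put("b", "int");
        COLUMNS.put("c", "int");
    }

    // Defensive lookup: fail with a descriptive message instead of returning null.
    public static String lookupOrThrow(String name) {
        String type = COLUMNS.get(name);
        if (type == null)
            throw new IllegalArgumentException("Undefined column name " + name);
        return type;
    }

    public static boolean rejectsUnknownColumn(String name) {
        try {
            lookupOrThrow(name);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```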
[jira] [Commented] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700089#comment-14700089 ] Jonathan Ellis commented on CASSANDRA-9917: --- bq. we now need repair Which is why "low" gcgs should be defined as lower than max hint window, because that's what causes problems. > MVs should validate gc grace seconds on the tables involved > --- > > Key: CASSANDRA-9917 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Assignee: Paulo Motta > Labels: materializedviews > Fix For: 3.0 beta 2 > > > For correctness reasons (potential resurrection of dropped values), batchlog > entries are TTL'd with the lowest gc grace seconds of all the tables involved > in a batch. > It means that if gc gs is set to 0 in one of the tables, the batchlog entry > will be dead on arrival, and never replayed. > We should probably warn against such LOGGED writes taking place, in general, > but for MVs, we must validate that gc gs on the base table (and on the MV > table, if we should allow altering gc gs there at all), is never set too low, > or else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9446) Failure detector should ignore local pauses per endpoint
[ https://issues.apache.org/jira/browse/CASSANDRA-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9446: -- Assignee: Stefania (was: Brandon Williams) > Failure detector should ignore local pauses per endpoint > > > Key: CASSANDRA-9446 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9446 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Stefania >Priority: Minor > Attachments: 9446.txt, 9644-v2.txt > > > In CASSANDRA-9183, we added a feature to ignore local pauses. But it only > avoids marking 2 endpoints as down. > We should do this per endpoint as suggested by Brandon in CASSANDRA-9183. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10086) Add a "CLEAR" cqlsh command to clear the console
[ https://issues.apache.org/jira/browse/CASSANDRA-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700070#comment-14700070 ] Jonathan Ellis commented on CASSANDRA-10086: I don't think we need it earlier than 3.0 since there is a simple alternative in ctrl-L as Paul notes. > Add a "CLEAR" cqlsh command to clear the console > > > Key: CASSANDRA-10086 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10086 > Project: Cassandra > Issue Type: Improvement >Reporter: Paul O'Fallon >Priority: Trivial > Labels: cqlsh, doc-impacting > Attachments: 10086.txt > > > It would be very helpful to have a "CLEAR" command to clear the cqlsh > console. I learned (after researching a patch for this) that lowercase > CTRL+L will clear the screen, but having a discrete command would make that > more obvious. To match the expectations of Windows users, an alias to "CLS" > would be nice as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
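A minimal sketch of the cross-platform behavior the CLEAR/CLS proposal describes (function names are hypothetical, not the cqlsh implementation):

```python
import os
import sys

def clear_command(os_name=os.name):
    """Pick the console-clear command the way a CLEAR command with a
    CLS alias might: 'cls' on Windows ('nt'), 'clear' elsewhere."""
    return "cls" if os_name == "nt" else "clear"

def clear_screen():
    # Explicit version of what CTRL+L does; fall back to the ANSI
    # clear-screen escape sequence if the shell command fails.
    if os.system(clear_command()) != 0:
        sys.stdout.write("\x1b[2J\x1b[H")
```

The 'cls'/'clear' split is the Windows-vs-Unix distinction the ticket raises for the CLS alias.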
[jira] [Commented] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700069#comment-14700069 ] Aleksey Yeschenko commented on CASSANDRA-9917: -- bq. True, but let me clarify: if no node in the cluster is down for longer than max hint window, and you have no hardware failures, and you have batch commitlog enabled, then you won't need repair. Fair? I might be misunderstanding the context, but no, in general this is not true. A request times out, a hint - or a batchlog entry gets written - a table in the mutation has a low gc gs - the batchlog/hint entry expires before it can be replayed, and we now need repair. bq. Especially since Paulo independently came up with the same value we use as default max hint window here. Right. Independently. > MVs should validate gc grace seconds on the tables involved > --- > > Key: CASSANDRA-9917 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Assignee: Paulo Motta > Labels: materializedviews > Fix For: 3.0 beta 2 > > > For correctness reasons (potential resurrection of dropped values), batchlog > entries are TTL'd with the lowest gc grace seconds of all the tables involved > in a batch. > It means that if gc gs is set to 0 in one of the tables, the batchlog entry > will be dead on arrival, and never replayed. > We should probably warn against such LOGGED writes taking place, in general, > but for MVs, we must validate that gc gs on the base table (and on the MV > table, if we should allow altering gc gs there at all), is never set too low, > or else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8970) Allow custom time_format on cqlsh COPY TO
[ https://issues.apache.org/jira/browse/CASSANDRA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700067#comment-14700067 ] Jonathan Ellis commented on CASSANDRA-8970: --- bq. it becomes a "nice to have" if COPY FROM could interpret the default exported timestamps correctly Did anyone create a ticket for that? > Allow custom time_format on cqlsh COPY TO > - > > Key: CASSANDRA-8970 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8970 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Aaron Ploetz >Priority: Trivial > Labels: cqlsh > Fix For: 2.1.x > > Attachments: CASSANDRA-8970.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > When executing a COPY TO from cqlsh, the user currently has no control > over the format of exported timestamp columns. If the user has indicated a > {{time_format}} in their cqlshrc file, that format will be used. Otherwise, > the system default format will be used. > The problem comes into play when the timestamp format used on a COPY TO is > not valid when the data is sent back into Cassandra with a COPY FROM. > For instance, if a user has {{time_format = %Y-%m-%d %H:%M:%S%Z}} specified > in their cqlshrc, COPY TO will format timestamp columns like this: > {{userid|posttime|postcontent}} > {{0|2015-03-14 14:59:00CDT|rtyeryerweh}} > {{0|2015-03-14 14:58:00CDT|sdfsdfsdgfjdsgojr}} > {{0|2015-03-12 14:27:00CDT|sdgfjdsgojr}} > Executing a COPY FROM on that same file will produce an "unable to coerce to > formatted date(long)" error. > Right now, the only way to change the way timestamps are formatted is to exit > cqlsh, modify the {{time_format}} property in cqlshrc, and restart cqlsh. > The ability to specify a COPY option of TIME_FORMAT with a Python strftime > format would allow the user to quickly alter the timestamp format for > export, without reconfiguring cqlsh.
> {{aploetz@cqlsh:stackoverflow> COPY posts1 TO '/home/aploetz/posts1.csv' WITH > DELIMITER='|' AND HEADER=true AND TIME_FORMAT='%Y-%m-%d %H:%M:%S%z';}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
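The root of the coercion error described above is that {{%Z}} renders a timezone *name* (e.g. CDT), which {{strptime}} cannot reliably parse back, while the numeric-offset {{%z}} round-trips cleanly. A quick Python check of the two formats (timezone offset chosen for illustration; CDT is UTC-5):

```python
from datetime import datetime, timezone, timedelta

fmt = "%Y-%m-%d %H:%M:%S%z"  # numeric offset: round-trips cleanly
cdt = timezone(timedelta(hours=-5))  # CDT is UTC-5
ts = datetime(2015, 3, 14, 14, 59, 0, tzinfo=cdt)

exported = ts.strftime(fmt)                 # "2015-03-14 14:59:00-0500"
reimported = datetime.strptime(exported, fmt)
assert reimported == ts
# With %Z instead, the exported value carries a zone *name* (like the
# "CDT" suffix in the ticket), which strptime cannot coerce back.
```

This is why the proposed TIME_FORMAT option matters: it lets the user pick a round-trippable format at export time instead of editing cqlshrc.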
[jira] [Commented] (CASSANDRA-10094) Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700063#comment-14700063 ] Paulo Motta commented on CASSANDRA-10094: - I'm able to reproduce it locally. The test is fixed after applying d2da7606abebd98b11f8b7ec692aa7dcf5388151, which was committed to 3.0+. * [dtest results|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10094-2.2-dtest/lastCompletedBuild/testReport/] * [utest results|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10094-2.2-testall/lastCompletedBuild/testReport/] > Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM > failure > --- > > Key: CASSANDRA-10094 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10094 > Project: Cassandra > Issue Type: Bug >Reporter: Joshua McKenzie >Assignee: Paulo Motta > Labels: Windows > Fix For: 2.2.x > > > Error: > {noformat} > junit.framework.AssertionFailedError: > at > org.apache.cassandra.db.CommitLogFailurePolicyTest.testCommitLogFailureBeforeInitialization_mustKillJVM(CommitLogFailurePolicyTest.java:149) > {noformat} > [Failure > History|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db/CommitLogFailurePolicyTest/testCommitLogFailureBeforeInitialization_mustKillJVM/history/]: > Consistent since build #85 > Env: CI only. Cannot repro locally -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700046#comment-14700046 ] Jonathan Ellis commented on CASSANDRA-9917: --- bq. It only affects the decision to write a hint in the first place (down for longer than the window? stop writing hints)... It's most often true that if a node has been down for longer than max_hint_window_in_ms, it is going to have data missing, yes. But there are no guarantees that it being down for shorter than that means it doesn't. True, but let me clarify: if no node in the cluster is down for longer than max hint window, and you have no hardware failures, and you have batch commitlog enabled, then you won't need repair. Fair? I don't really see a difference vs min_batchlog_ttl. Especially since Paulo independently came up with the same value we use as default max hint window here. Let's not offer users more tuning knobs than they can meaningfully distinguish between. > MVs should validate gc grace seconds on the tables involved > --- > > Key: CASSANDRA-9917 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Assignee: Paulo Motta > Labels: materializedviews > Fix For: 3.0 beta 2 > > > For correctness reasons (potential resurrection of dropped values), batchlog > entries are TTL'd with the lowest gc grace seconds of all the tables involved > in a batch. > It means that if gc gs is set to 0 in one of the tables, the batchlog entry > will be dead on arrival, and never replayed. > We should probably warn against such LOGGED writes taking place, in general, > but for MVs, we must validate that gc gs on the base table (and on the MV > table, if we should allow altering gc gs there at all), is never set too low, > or else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
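The TTL rule under discussion in CASSANDRA-9917 (batchlog entries expire with the lowest gc grace seconds of any table in the batch, so a zero makes them dead on arrival) is easy to state as code. A sketch only, not the actual Java implementation; function name and default value of 864000 seconds (10 days) are illustrative:

```python
def batchlog_entry_ttl(gc_grace_seconds):
    """TTL for a batchlog entry: the minimum gc_grace_seconds across
    all tables written by the batch (the rule from the description)."""
    return min(gc_grace_seconds)

# With all tables at the 10-day default, replay has ample time:
assert batchlog_entry_ttl([864000, 864000]) == 864000
# One table at gc_grace_seconds=0 makes the entry dead on arrival:
assert batchlog_entry_ttl([864000, 0]) == 0
```

This is why the thread argues for defining "too low" relative to max hint window: any TTL shorter than the window can expire before replay happens.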
[jira] [Commented] (CASSANDRA-10102) java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1
[ https://issues.apache.org/jira/browse/CASSANDRA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700032#comment-14700032 ] Aleksey Yeschenko commented on CASSANDRA-10102: --- Can you test with the current cassandra-3.0 head? There was an issue with the alpha, and that was CASSANDRA-9704 not committed as planned. It is now, and this *should* work. > java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1 > --- > > Key: CASSANDRA-10102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10102 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch > Attachments: node1.log, node2.log, node3.log > > > Upgrade tests are showing a potential issue. I'm seeing this during rolling > upgrades to 3.0 alpha 1, after one node has been upgraded to the alpha. > I will attach cassandra logs here, node1.log is where most of the failures > are seen. > {noformat} > ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,888 > CassandraDaemon.java:189 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.UnsupportedOperationException: null > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) > ~[main/:na] > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) > ~[main/:na] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) > ~[main/:na] > INFO [GossipStage:1] 2015-08-17 12:22:06,914 StorageService.java:1886 - Node > /127.0.0.2 state jump to normal > ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,915 > CassandraDaemon.java:189 - Exception in thread > 
Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.UnsupportedOperationException: null > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) > ~[main/:na] > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) > ~[main/:na] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) > ~[main/:na] > {noformat} > Another exception showing in logs: > {noformat} > ERROR [SharedPool-Worker-1] 2015-08-17 12:22:19,358 ErrorMessage.java:336 - > Unexpected exception during request > java.lang.UnsupportedOperationException: Version is 9 > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serializedSize(PartitionUpdate.java:760) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:334) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:246) > ~[main/:na] > at > org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) > ~[main/:na] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:67) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:587) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:737) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:702) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:1084) > ~[main/:na] > at > 
org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:125) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:942) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:549) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:720) > ~[main/:na] > at > org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:613) > ~[main/:na] > at > org.apache.cassandra.cql3.statements.ModificationStatement.execute(Modi
[jira] [Created] (CASSANDRA-10103) Windows dtest 3.0: incremental_repair_test.py:TestIncRepair.sstable_repairedset_test fails
Joshua McKenzie created CASSANDRA-10103: --- Summary: Windows dtest 3.0: incremental_repair_test.py:TestIncRepair.sstable_repairedset_test fails Key: CASSANDRA-10103 URL: https://issues.apache.org/jira/browse/CASSANDRA-10103 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x {noformat} File "D:\Python27\lib\unittest\case.py", line 329, in run testMethod() File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\incremental_repair_test.py", line 165, in sstable_repairedset_test self.assertGreaterEqual(len(uniquematches), 2) File "D:\Python27\lib\unittest\case.py", line 948, in assertGreaterEqual self.fail(self._formatMessage(msg, standardMsg)) File "D:\Python27\lib\unittest\case.py", line 410, in fail raise self.failureException(msg) '0 not greater than or equal to 2\n >> begin captured logging << \ndtest: DEBUG: cluster ccm directory: d:\\temp\\dtest-pq7lpx\ndtest: DEBUG: []\n- >> end captured logging << -' {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/hintedhandoff_test/TestHintedHandoffConfig/hintedhandoff_dc_disabled_test/history/] Env: both CI and local -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10102) java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1
[ https://issues.apache.org/jira/browse/CASSANDRA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-10102: --- Attachment: node3.log node2.log node1.log adding logs > java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1 > --- > > Key: CASSANDRA-10102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10102 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch > Attachments: node1.log, node2.log, node3.log > > > Upgrade tests are showing a potential issue. I'm seeing this during rolling > upgrades to 3.0 alpha 1, after one node has been upgraded to the alpha. > I will attach cassandra logs here, node1.log is where most of the failures > are seen. > {noformat} > ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,888 > CassandraDaemon.java:189 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.UnsupportedOperationException: null > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) > ~[main/:na] > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) > ~[main/:na] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) > ~[main/:na] > INFO [GossipStage:1] 2015-08-17 12:22:06,914 StorageService.java:1886 - Node > /127.0.0.2 state jump to normal > ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,915 > CassandraDaemon.java:189 - Exception in thread > Thread[MessagingService-Incoming-/127.0.0.1,5,main] > java.lang.UnsupportedOperationException: null > at > 
org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) > ~[main/:na] > at > org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) > ~[main/:na] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[main/:na] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) > ~[main/:na] > {noformat} > Another exception showing in logs: > {noformat} > ERROR [SharedPool-Worker-1] 2015-08-17 12:22:19,358 ErrorMessage.java:336 - > Unexpected exception during request > java.lang.UnsupportedOperationException: Version is 9 > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serializedSize(PartitionUpdate.java:760) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:334) > ~[main/:na] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:246) > ~[main/:na] > at > org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) > ~[main/:na] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:67) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:587) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:737) > ~[main/:na] > at > org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:702) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:1084) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:125) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:942) > 
~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:549) > ~[main/:na] > at > org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:720) > ~[main/:na] > at > org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:613) > ~[main/:na] > at > org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:599) > ~[main/:na] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:204) > ~[main/:na]
[jira] [Created] (CASSANDRA-10102) java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1
Russ Hatch created CASSANDRA-10102: -- Summary: java.lang.UnsupportedOperationException after upgrade to 3.0 alpha1 Key: CASSANDRA-10102 URL: https://issues.apache.org/jira/browse/CASSANDRA-10102 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Upgrade tests are showing a potential issue. I'm seeing this during rolling upgrades to 3.0 alpha 1, after one node has been upgraded to the alpha. I will attach cassandra logs here, node1.log is where most of the failures are seen. {noformat} ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,888 CassandraDaemon.java:189 - Exception in thread Thread[MessagingService-Incoming-/127.0.0.1,5,main] java.lang.UnsupportedOperationException: null at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) ~[main/:na] at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) ~[main/:na] at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) ~[main/:na] INFO [GossipStage:1] 2015-08-17 12:22:06,914 StorageService.java:1886 - Node /127.0.0.2 state jump to normal ERROR [MessagingService-Incoming-/127.0.0.1] 2015-08-17 12:22:06,915 CassandraDaemon.java:189 - Exception in thread Thread[MessagingService-Incoming-/127.0.0.1,5,main] java.lang.UnsupportedOperationException: null at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:485) ~[main/:na] at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:444) ~[main/:na] at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) ~[main/:na] at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90) ~[main/:na] {noformat} Another exception showing in logs: {noformat} ERROR [SharedPool-Worker-1] 2015-08-17 12:22:19,358 ErrorMessage.java:336 - Unexpected exception during request java.lang.UnsupportedOperationException: Version is 9 at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serializedSize(PartitionUpdate.java:760) ~[main/:na] at org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:334) ~[main/:na] at org.apache.cassandra.db.Mutation$MutationSerializer.serializedSize(Mutation.java:246) ~[main/:na] at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) ~[main/:na] at org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:67) ~[main/:na] at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:587) ~[main/:na] at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:737) ~[main/:na] at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:702) ~[main/:na] at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:1084) ~[main/:na] at org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:125) ~[main/:na] at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:942) ~[main/:na] at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:549) ~[main/:na] at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:720) ~[main/:na] at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:613) ~[main/:na] at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:599) ~[main/:na] at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:204) ~[main/:na] at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:470) ~[main/:na] at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:447) ~[main/:na] at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:139) ~[main/:na] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507) [main/:na] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401) [main/:na]
[jira] [Created] (CASSANDRA-10101) Windows dtest 3.0: HintedHandoff tests failing
Joshua McKenzie created CASSANDRA-10101: --- Summary: Windows dtest 3.0: HintedHandoff tests failing Key: CASSANDRA-10101 URL: https://issues.apache.org/jira/browse/CASSANDRA-10101 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Assignee: Joshua McKenzie Fix For: 3.0.x hintedhandoff_test.py:TestHintedHandoffConfig.hintedhandoff_dc_disabled_test hintedhandoff_test.py:TestHintedHandoffConfig.hintedhandoff_dc_reenabled_test hintedhandoff_test.py:TestHintedHandoffConfig.hintedhandoff_disabled_test hintedhandoff_test.py:TestHintedHandoffConfig.hintedhandoff_enabled_test hintedhandoff_test.py:TestHintedHandoffConfig.nodetool_test All are failing with some variant of the following: {noformat} File "D:\Python27\lib\unittest\case.py", line 329, in run testMethod() File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\hintedhandoff_test.py", line 130, in hintedhandoff_dc_disabled_test self.assertEqual('Hinted handoff is running\nData center dc1 is disabled', res.rstrip()) File "D:\Python27\lib\unittest\case.py", line 513, in assertEqual assertion_func(first, second, msg=msg) File "D:\Python27\lib\unittest\case.py", line 506, in _baseAssertEqual raise self.failureException(msg) "'Hinted handoff is running\\nData center dc1 is disabled' != 'Starting NodeTool\\r\\nHinted handoff is running\\r\\nData center dc1 is disabled'\n >> begin captured logging << \ndtest: DEBUG: cluster ccm directory: d:\\temp\\dtest-pddrcf\n- >> end captured logging << -" {noformat} Failure history: consistent for all jobs Env: Both ci and local -- This message was sent by Atlassian JIRA (v6.3.4#6332)
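The failures above are a test-expectation mismatch rather than a product bug: on Windows, nodetool prepends a "Starting NodeTool" banner and emits CRLF line endings, so the assertEqual never matches. A sketch of a dtest-side normalization (helper name is hypothetical):

```python
def normalize_nodetool_output(raw):
    """Strip the Windows-only 'Starting NodeTool' banner and
    normalize CRLF endings so assertions match on all platforms."""
    lines = raw.replace("\r\n", "\n").rstrip().split("\n")
    return "\n".join(l for l in lines if l.strip() != "Starting NodeTool")

# The exact mismatch from the captured failure:
win = "Starting NodeTool\r\nHinted handoff is running\r\nData center dc1 is disabled"
assert normalize_nodetool_output(win) == "Hinted handoff is running\nData center dc1 is disabled"
```

Unix output passes through unchanged, so the same assertion works in both CI environments.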
[jira] [Created] (CASSANDRA-10100) Windows dtest 3.0: commitlog_test.py:TestCommitLog.stop_failure_policy_test fails
Joshua McKenzie created CASSANDRA-10100: --- Summary: Windows dtest 3.0: commitlog_test.py:TestCommitLog.stop_failure_policy_test fails Key: CASSANDRA-10100 URL: https://issues.apache.org/jira/browse/CASSANDRA-10100 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x {noformat} FAIL: stop_failure_policy_test (commitlog_test.TestCommitLog) -- Traceback (most recent call last): File "c:\src\cassandra-dtest\commitlog_test.py", line 258, in stop_failure_policy_test self.assertTrue(failure, "Cannot find the commitlog failure message in logs") AssertionError: Cannot find the commitlog failure message in logs >> begin captured logging << {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/commitlog_test/TestCommitLog/small_segment_size_test/history/] Env: Both CI and local -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9872) only check KeyCache when it is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699957#comment-14699957 ] Chris Burroughs commented on CASSANDRA-9872: I *think* this is correct (and simpler!) with all of the 3.0 branch changes. I looked into adding unit tests to `KeyCacheTest` but it's pretty end to end and I didn't see any existing tests that call `getCachedPosition` directly. > only check KeyCache when it is enabled > -- > > Key: CASSANDRA-9872 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9872 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Burroughs >Assignee: Chris Burroughs > Labels: cache, metrics > Attachments: j9872-2.0-v1.txt, j9872-3.0-v1.txt > > > If the KeyCache exists (because at least one column family is using it) we > currently check the key cache even for requests to column families where the > key cache is disabled. I think it would be better to only check the cache if > entries *could* be there. > * This will align the key cache with how the row cache behaves. > * This makes the key cache metrics much more useful. For example, > 'requests' becomes 'requests to things that could be in the key cache' and > not just 'total requests'. > * This might be a micro-optimization saving a few metric updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699956#comment-14699956 ] Yuki Morishita commented on CASSANDRA-9882: --- +1 for fixup. Also created CASSANDRA-10099 to further discuss concurrency issue. > DTCS (maybe other strategies) can block flushing when there are lots of > sstables > > > Key: CASSANDRA-9882 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9882 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jeremiah Jordan >Assignee: Marcus Eriksson > Labels: dtcs > Fix For: 2.1.9, 2.0.17, 2.2.1, 3.0 beta 1 > > > MemtableFlushWriter tasks can get blocked by Compaction > getNextBackgroundTask. This is in a wonky cluster with 200k sstables in the > CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we > are trying to make these new "smart" strategies that may take some time to > calculate what to do. > {noformat} > "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 > waiting for monitor entry [0x7ff78a667000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237) > - waiting to lock <0x0006fcdbbf60> (a > org.apache.cassandra.db.compaction.WrappingCompactionStrategy) > at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518) > at > org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178) > at > org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) > at > org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475) > at > org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > 
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: > - <0x000743b3ac38> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b > waiting for monitor entry [0x7ff78b8ee000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237) > - waiting to lock <0x0006fcdbbf60> (a > org.apache.cassandra.db.compaction.WrappingCompactionStrategy) > at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518) > at > org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178) > at > org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) > at > org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475) > at > org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 > runnable [0x7fecce3ea000] >java.lang.Thread.State: RUNNABLE 
> at > org.apache.cassandra.io.sstable.SSTableReader.equals(SSTableReader.java:628) > at > com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:206) > at > com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:220) > at > com.google.common.collect.ImmutableSet.access$000(ImmutableSet.java:74) > at > com.google.common.collect.ImmutableSet$Builder.build(ImmutableSet.java:531) > at com.google.common.collect.Sets$1.immutableCopy(Sets.java:606) > at > org.apache.cassandra.db.ColumnFamilyStore.getOverlappingSSTables(Colu
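The contention in the traces above can be reduced to a small sketch. All class and method names here are illustrative stand-ins, not Cassandra's actual API: a flush thread calls a `synchronized` notification handler on the strategy object while a compaction thread holds the same monitor inside a slow `getNextBackgroundTask`, so the flush blocks for the duration of the scan.

```java
import java.util.concurrent.CountDownLatch;

public class FlushBlockedByCompaction {
    // Stand-in for WrappingCompactionStrategy: both paths contend on one monitor.
    static class Strategy {
        synchronized void handleNotification() { /* flush path waits here */ }
        synchronized void getNextBackgroundTask() throws InterruptedException {
            Thread.sleep(200); // stands in for scanning a very large sstable set
        }
    }

    // Returns how long the flush-side notification was blocked, in ms.
    static long blockedMillis() throws Exception {
        Strategy strategy = new Strategy();
        CountDownLatch started = new CountDownLatch(1);
        Thread compaction = new Thread(() -> {
            started.countDown();
            try { strategy.getNextBackgroundTask(); } catch (InterruptedException ignored) {}
        });
        compaction.start();
        started.await();
        Thread.sleep(50); // let the compaction thread take the monitor first
        long t0 = System.nanoTime();
        strategy.handleNotification(); // BLOCKED until compaction releases the lock
        long waitedMs = (System.nanoTime() - t0) / 1_000_000;
        compaction.join();
        return waitedMs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("flush notification waited ~" + blockedMillis() + " ms");
    }
}
```

With 200k sstables in the CF, the "scan" step dominates, which is why the `MemtableFlushWriter` threads show as `BLOCKED` on the strategy's monitor.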
[jira] [Created] (CASSANDRA-10099) Improve concurrency in CompactionStrategyManager
Yuki Morishita created CASSANDRA-10099: -- Summary: Improve concurrency in CompactionStrategyManager Key: CASSANDRA-10099 URL: https://issues.apache.org/jira/browse/CASSANDRA-10099 Project: Cassandra Issue Type: Improvement Reporter: Yuki Morishita Fix For: 3.x Continue discussion from CASSANDRA-9882. CompactionStrategyManager (WrappingCompactionStrategy for <3.0) tracks SSTable changes, mainly to separate repaired / unrepaired SSTables (plus LCS level management). This is a blocking operation and can block flushes, etc., when determining the next background task takes a long time. Explore ways to mitigate this concurrency issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10098) Windows dtest 3.0: commitlog_test.py:TestCommitLog.small_segment_size_test fails
Joshua McKenzie created CASSANDRA-10098: --- Summary: Windows dtest 3.0: commitlog_test.py:TestCommitLog.small_segment_size_test fails Key: CASSANDRA-10098 URL: https://issues.apache.org/jira/browse/CASSANDRA-10098 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x {noformat} File "D:\Python27\lib\unittest\case.py", line 329, in run testMethod() File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\tools.py", line 243, in wrapped f(obj) File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\commitlog_test.py", line 226, in small_segment_size_test self._commitlog_test(segment_size_in_mb, 62.5, 13, files_error=0.2) File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\commitlog_test.py", line 99, in _commitlog_test error=files_error) File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\assertions.py", line 62, in assert_almost_equal assert vmin > vmax * (1.0 - error) or vmin == vmax, "values not within %.2f%% of the max: %s" % (error * 100, args) 'values not within 20.00% of the max: (10, 13)\n >> begin captured logging << \ndtest: DEBUG: cluster ccm directory: d:\\temp\\dtest-qnguzs\n- >> end captured logging << -' {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/commitlog_test/TestCommitLog/small_segment_size_test/] env: Both ci and local -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9872) only check KeyCache when it is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Burroughs updated CASSANDRA-9872: --- Attachment: j9872-3.0-v1.txt > only check KeyCache when it is enabled > -- > > Key: CASSANDRA-9872 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9872 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Chris Burroughs >Assignee: Chris Burroughs > Labels: cache, metrics > Attachments: j9872-2.0-v1.txt, j9872-3.0-v1.txt > > > If the KeyCache exists (because at least one column family is using it) we > currently check the key cache even for requests to column families where the > key cache is disabled. I think it would be better to only check the cache if > entries *could* be there. > * This will align the key cache with how the row cache behaves. > * This makes the key cache metrics much more useful. For example, > 'requests' becomes 'requests to things that could be in the key cache' and > not just 'total requests'. > * This might be a micro-optimization, saving a few metric updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
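The proposed behavior can be sketched minimally. All names here are hypothetical (this is not Cassandra's cache API): the shared key cache is consulted only when key caching is enabled for the table being read, so the 'requests' metric counts only requests that could actually hit.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyCacheGuard {
    static Map<String, Long> keyCache = new HashMap<>(); // shared across tables
    static long requests = 0;                            // the 'requests' metric

    // Skip both the lookup and the metric update when this table disables the cache.
    static Long lookup(String key, boolean keyCacheEnabledForTable) {
        if (!keyCacheEnabledForTable)
            return null;
        requests++;
        return keyCache.get(key);
    }

    public static void main(String[] args) {
        keyCache.put("k1", 42L);
        lookup("k1", true);   // counted: this table uses the key cache
        lookup("k1", false);  // not counted: cache disabled for this table
        System.out.println("requests=" + requests); // prints requests=1
    }
}
```

The guard is what aligns the key cache with the row cache's behavior and makes the hit-rate metric meaningful.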
[jira] [Created] (CASSANDRA-10097) Windows dtest 3.0: bootstrap_test.py:TestBootstrap.bootstrap_with_reset_bootstrap_state_test fails
Joshua McKenzie created CASSANDRA-10097: --- Summary: Windows dtest 3.0: bootstrap_test.py:TestBootstrap.bootstrap_with_reset_bootstrap_state_test fails Key: CASSANDRA-10097 URL: https://issues.apache.org/jira/browse/CASSANDRA-10097 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Fix For: 3.0.x {noformat} File "D:\Python27\lib\unittest\case.py", line 329, in run testMethod() File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\tools.py", line 243, in wrapped f(obj) File "D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra-dtest\bootstrap_test.py", line 184, in bootstrap_with_reset_bootstrap_state_test node3.watch_log_for("Resetting bootstrap progress to start fresh", from_mark=mark) File "build\bdist.win-amd64\egg\ccmlib\node.py", line 382, in watch_log_for raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + reads) {noformat} Failure history: [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/bootstrap_test/TestBootstrap/bootstrap_with_reset_bootstrap_state_test/history/] Env: both ci and locally -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9917) MVs should validate gc grace seconds on the tables involved
[ https://issues.apache.org/jira/browse/CASSANDRA-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-9917: - Reviewer: Aleksey Yeschenko (was: Marcus Eriksson) > MVs should validate gc grace seconds on the tables involved > --- > > Key: CASSANDRA-9917 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9917 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Assignee: Paulo Motta > Labels: materializedviews > Fix For: 3.0 beta 2 > > > For correctness reasons (potential resurrection of dropped values), batchlog > entries are TTLs with the lowest gc grace second of all the tables involved > in a batch. > It means that if gc gs is set to 0 in one of the tables, the batchlog entry > will be dead on arrival, and never replayed. > We should probably warn against such LOGGED writes taking place, in general, > but for MVs, we must validate that gc gs on the base table (and on the MV > table, if we should allow altering gc gs there at all), is never set too low, > or else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
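The hazard described in the ticket comes down to a minimum: the batchlog entry's TTL is the lowest gc grace seconds of all tables in the batch, so a single table with gc_grace_seconds = 0 makes the entry dead on arrival. A sketch (the function name is illustrative, not Cassandra's actual code):

```java
public class BatchlogTtl {
    // Batchlog entries are TTL'd with the lowest gc_grace_seconds of the
    // tables involved, to avoid resurrecting data dropped past gc grace.
    static int batchlogTtl(int... gcGraceSecondsOfTables) {
        int ttl = Integer.MAX_VALUE;
        for (int gcgs : gcGraceSecondsOfTables)
            ttl = Math.min(ttl, gcgs);
        return ttl;
    }

    public static void main(String[] args) {
        System.out.println(batchlogTtl(864000, 864000)); // prints 864000 (10 days, the default)
        System.out.println(batchlogTtl(864000, 0));      // prints 0: entry expires immediately
    }
}
```

This is why the ticket asks for validation that gc grace seconds on the base table (and possibly the MV table) is never set low enough to zero out the batchlog TTL.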
[jira] [Created] (CASSANDRA-10096) SerializationHelper should provide a rewindable in-order tester
Benedict created CASSANDRA-10096: Summary: SerializationHelper should provide a rewindable in-order tester Key: CASSANDRA-10096 URL: https://issues.apache.org/jira/browse/CASSANDRA-10096 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Priority: Minor Fix For: 3.x When deserializing a row we perform a logarithmic lookup on column name for every cell. There is also a lot of unnecessary indirection to reach this method call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
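The "rewindable in-order tester" idea can be sketched as follows. The class and method names are hypothetical, not the eventual SerializationHelper API: when cells arrive in column order, membership in the selected column set can be tested with a single advancing cursor instead of a logarithmic lookup per cell, and `rewind()` resets the cursor for the next row.

```java
public class InOrderTester {
    private final String[] sortedColumns; // columns the query selects, sorted
    private int idx = 0;

    InOrderTester(String[] sortedColumns) { this.sortedColumns = sortedColumns; }

    // Columns must be tested in non-decreasing order between rewinds;
    // each call advances the cursor at most past smaller columns, so a
    // whole row costs O(cells + selected) comparisons rather than
    // O(cells * log(selected)).
    boolean test(String column) {
        while (idx < sortedColumns.length) {
            int cmp = sortedColumns[idx].compareTo(column);
            if (cmp == 0) return true;
            if (cmp > 0) return false; // column not selected; cursor stays put
            idx++;
        }
        return false;
    }

    void rewind() { idx = 0; }

    public static void main(String[] args) {
        InOrderTester tester = new InOrderTester(new String[]{"a", "c", "d"});
        System.out.println(tester.test("a")); // true
        System.out.println(tester.test("b")); // false
        System.out.println(tester.test("c")); // true
        tester.rewind();
        System.out.println(tester.test("a")); // true again after rewind
    }
}
```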
[jira] [Commented] (CASSANDRA-9922) Add Materialized View WHERE schema support
[ https://issues.apache.org/jira/browse/CASSANDRA-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699880#comment-14699880 ] Aleksey Yeschenko commented on CASSANDRA-9922: -- Basically, {{WHERE}} support will be a new feature, and only new nodes will be able to support it. Same way with 3.0 and materialized views - can't properly use them until your whole cluster is on 3.0. So putting this in 3.0.0 would be nice, and would be my preference - if we find time, but if not, I'm ready to deal with schema-level ugliness necessary later in 3.x. > Add Materialized View WHERE schema support > -- > > Key: CASSANDRA-9922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9922 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian > Labels: materializedviews > Fix For: 3.x > > > In order to provide forward compatibility with the 3.x series, we should add > schema support for capturing the where clause of the MV. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9901) Make AbstractType.isByteOrderComparable abstract
[ https://issues.apache.org/jira/browse/CASSANDRA-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699879#comment-14699879 ] Sylvain Lebresne commented on CASSANDRA-9901: - bq. One of the advantages of using an enum that I did not enumerate was the possibility of performing more efficient despatch for "mixed" clustering data Alright, fair enough. I guess one of the things that put me off is that {{COMPARE_COMPARABLE}} sounds really weird/unintuitive to me. If we go with the {{compareValue}} change I discuss above, then I'd suggest renaming the enum to something like: {noformat} enum ComparisonType { UNCOMPARABLE, BYTE_ORDER, CUSTOM } {noformat} and the {{compareValue}} discussed above could instead be {{compareCustom()}}. We also wouldn't need {{isByteOrderComparable}} I believe, since it would be directly handled by {{compare}} (which won't be a virtual call). bq. Admittedly I haven't confirmed this, but it looks fine to me I read that too quickly and missed the package check, my bad. I guess it's not entirely fool-proof, but we probably can't do much better short of having a white-list, which would be ugly, so I'm fine with that. bq. I prefer to log more often than less, since there's more chance of it being spotted. I don't think we rebuild so often - just during schema changes, no? {{rebuild}} happens every time a {{CFMetaData}} is created and validated, which means at least on every startup and multiple times per schema change (since it's called during validation), and that's not counting the cases I forget. A bit of context is also that I strongly suspect that while there are likely people already using custom types, I don't think there are all that many creating new custom types now that we provide a relatively rich set of types out of the box (that was not always the case). 
So I'm more worried about annoying people that have existing custom types, for whom the message is basically useless since it's not currently actionable, and they can't miss the change anyway since they'll have to update their own code. In fact, I'm not even really sure a warning is necessary in the first place. As said, for people already having a custom type, the warning is mostly an annoyance. And for new users that might decide to write a custom type, I think being extra clear in the javadoc of the {{compareCustom()}} method that you should not implement it in newly created types would be fair enough warning (we can additionally add to the {{AbstractType}} javadoc that creating custom subclasses is frowned upon nowadays). > Make AbstractType.isByteOrderComparable abstract > > > Key: CASSANDRA-9901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9901 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Fix For: 3.0 beta 2 > > > I can't recall _precisely_ what was agreed at the NGCC, but I'm reasonably > sure we agreed to make this method abstract, put some javadoc explaining we > may require fields to yield true in the near future, and potentially log a > warning on startup if a user-defined type returns false. > This should make it into 3.0, IMO, so that we can look into migrating to > byte-order comparable types in the post-3.0 world. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
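The dispatch scheme discussed in the comment above can be sketched as follows. The enum name and `compareCustom()` follow the comment's own proposal; everything else (`Type`, raw `byte[]` values, the example instances) is illustrative, not Cassandra's actual `AbstractType` API.

```java
public class ComparisonDispatch {
    enum ComparisonType { UNCOMPARABLE, BYTE_ORDER, CUSTOM }

    static abstract class Type {
        final ComparisonType comparisonType;
        Type(ComparisonType comparisonType) { this.comparisonType = comparisonType; }

        // Only CUSTOM types override this; new types should not implement it.
        int compareCustom(byte[] a, byte[] b) { throw new UnsupportedOperationException(); }

        // The entry point dispatches on the enum, so the common byte-order
        // path involves no virtual call into the subtype.
        final int compare(byte[] a, byte[] b) {
            switch (comparisonType) {
                case BYTE_ORDER: return compareUnsigned(a, b);
                case CUSTOM:     return compareCustom(a, b);
                default:         throw new UnsupportedOperationException("type is not comparable");
            }
        }
    }

    // Unsigned lexicographic comparison, as byte-order comparability requires.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    static final Type BYTES = new Type(ComparisonType.BYTE_ORDER) {};
    static final Type REVERSED = new Type(ComparisonType.CUSTOM) {
        @Override int compareCustom(byte[] a, byte[] b) { return compareUnsigned(b, a); }
    };

    public static void main(String[] args) {
        byte[] x = {1}, y = {2};
        System.out.println(BYTES.compare(x, y) < 0);    // true
        System.out.println(REVERSED.compare(x, y) > 0); // true
    }
}
```

In this shape `isByteOrderComparable` is subsumed by the enum field, matching the suggestion that `compare` itself handles the dispatch.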
[jira] [Commented] (CASSANDRA-9922) Add Materialized View WHERE schema support
[ https://issues.apache.org/jira/browse/CASSANDRA-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699871#comment-14699871 ] Aleksey Yeschenko commented on CASSANDRA-9922: -- Right, but it's not just about schema. If an older node doesn't have the code to handle {{WHERE}}, it won't be able to support those queries either way - even if there is schema support for it. > Add Materialized View WHERE schema support > -- > > Key: CASSANDRA-9922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9922 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian > Labels: materializedviews > Fix For: 3.x > > > In order to provide forward compatibility with the 3.x series, we should add > schema support for capturing the where clause of the MV. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9857) Deal with backward compatibilty issue of broken AbstractBounds serialization
[ https://issues.apache.org/jira/browse/CASSANDRA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9857: -- Fix Version/s: (was: 3.0.0 rc1) 3.0 beta 2 > Deal with backward compatibilty issue of broken AbstractBounds serialization > > > Key: CASSANDRA-9857 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9857 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne > Fix For: 3.0 beta 2 > > > This ticket is related to CASSANDRA-9856 and CASSANDRA-9775. Even if the > broken/incomplete serialization of {{AbstractBounds}} is not a problem per-se > for pre-3.0 versions, it's still a problem for trunk and even though it's > fixed by CASSANDRA-9775 for 3.0 nodes, it might be a problem for 3.0 nodes > talking to older nodes. > As the paging tests where those that exposed the problem (on trunk) in the > first place, it would be nice to modify said paging tests to work on mixed > version clustering so we can valid if it is a problem. If it is, then we'll > probably have to add redundant checks on trunk so they ignore anything the > 3.0 node sends incorrectly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9922) Add Materialized View WHERE schema support
[ https://issues.apache.org/jira/browse/CASSANDRA-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699844#comment-14699844 ] Carl Yeksigian commented on CASSANDRA-9922: --- In order to actually support using the WHERE clause, we'll need to make sure that the nodes have been upgraded, otherwise old nodes won't be processing mutations in the same way as upgraded nodes. If this does slip, we'll already be preventing using WHERE clauses in the case we haven't upgraded all the nodes to support it. > Add Materialized View WHERE schema support > -- > > Key: CASSANDRA-9922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9922 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian > Labels: materializedviews > Fix For: 3.x > > > In order to provide forward compatibility with the 3.x series, we should add > schema support for capturing the where clause of the MV. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10076) Windows dtest 2.2: thrift_hsha_test.py:ThriftHSHATest.test_6285
[ https://issues.apache.org/jira/browse/CASSANDRA-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-10076: Assignee: Paulo Motta Reviewer: Joshua McKenzie > Windows dtest 2.2: thrift_hsha_test.py:ThriftHSHATest.test_6285 > --- > > Key: CASSANDRA-10076 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10076 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joshua McKenzie >Assignee: Paulo Motta > Labels: Windows > Fix For: 2.2.x > > > [Error > text|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/61/testReport/thrift_hsha_test/ThriftHSHATest/test_6285/]: > {noformat} > Unexpected error in node1 node log: ['ERROR > [MessagingService-Outgoing-/127.0.0.2] 2015-08-13 18:27:05,264 > OutboundTcpConnection.java:318 - error writing to /127.0.0.2 > java.lang.RuntimeException: java.io.IOException: An established connection > was aborted by the software in your host machine \tat > org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:85) > ~[main/:na] \tat > org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:70) > ~[main/:na] \tat > org.apache.cassandra.db.Mutation$MutationSerializer.serialize(Mutation.java:286) > ~[main/:na] \tat > org.apache.cassandra.db.Mutation$MutationSerializer.serialize(Mutation.java:272) > ~[main/:na] \tat > org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:125) > ~[main/:na] \tat > org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:335) > [main/:na] \tat > org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:287) > [main/:na] \tat > org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:221) > [main/:na] Caused by: java.io.IOException: An established connection was > aborted by the software in your host machine \tat > sun.nio.ch.SocketDispatcher.write0(Native Method) ~[na:1.8.0_51] \tat > 
sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51) ~[na:1.8.0_51] > \tat sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_51] > \tat sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_51] \tat > sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_51] > \tat > {noformat} > [Failure > History|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/61/testReport/thrift_hsha_test/ThriftHSHATest/test_6285/history/] > (flaky) > Env: CI only. Passes locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned
[ https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-10068: -- Fix Version/s: (was: 3.0.0 rc1) 3.0 beta 2 > Batchlog replay fails with exception after a node is decommissioned > --- > > Key: CASSANDRA-10068 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10068 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Marcus Eriksson > Fix For: 3.0 beta 2 > > Attachments: n1.log, n2.log, n3.log, n4.log, n5.log > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. > At the conclusion of the test, a batchlog replay is initiated through > nodetool and hits the following assertion due to a missing host ID: > https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 > A nodetool status on the node with failed batchlog replay shows the following > entry for the decommissioned node: > DN 10.0.0.5 ? 256 ? null > rack1 > On the unaffected nodes, there is no entry for the decommissioned node as > expected. > There are occasional hits of the same assertions for logs in other nodes; it > looks like the issue might occasionally resolve itself, but one node seems to > have the errant null entry indefinitely. > In logs for the nodes, this possibly unrelated exception also appears: > java.lang.RuntimeException: Trying to get the view natural endpoint on a > non-data replica > at > org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) > ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] > I have a running cluster with the issue on my machine; it is also repeatable. > Nothing stands out in the logs of the decommissioned node (n4) for me. The > logs of each node in the cluster are attached. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10095) Fix dtests on 3.0 branch on Windows
Joshua McKenzie created CASSANDRA-10095: --- Summary: Fix dtests on 3.0 branch on Windows Key: CASSANDRA-10095 URL: https://issues.apache.org/jira/browse/CASSANDRA-10095 Project: Cassandra Issue Type: Bug Reporter: Joshua McKenzie Assignee: Joshua McKenzie Fix For: 3.0.x Parent ticket to track subtasks for dtest failures on Windows on the 3.0 branch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9922) Add Materialized View WHERE schema support
[ https://issues.apache.org/jira/browse/CASSANDRA-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699829#comment-14699829 ] Aleksey Yeschenko commented on CASSANDRA-9922: -- Would be nice to do this in rc1, but we really can add it later, even though it will be more painful. Not too painful. > Add Materialized View WHERE schema support > -- > > Key: CASSANDRA-9922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9922 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian > Labels: materializedviews > Fix For: 3.x > > > In order to provide forward compatibility with the 3.x series, we should add > schema support for capturing the where clause of the MV. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9922) Add Materialized View WHERE schema support
[ https://issues.apache.org/jira/browse/CASSANDRA-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-9922: - Fix Version/s: (was: 3.0.0 rc1) 3.x > Add Materialized View WHERE schema support > -- > > Key: CASSANDRA-9922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9922 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian > Labels: materializedviews > Fix For: 3.x > > > In order to provide forward compatibility with the 3.x series, we should add > schema support for capturing the where clause of the MV. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699822#comment-14699822 ] Ariel Weisberg edited comment on CASSANDRA-9749 at 8/17/15 5:06 PM: I remember that conversation as well. Part of that was making an effort to distinguish between expected failures (end of log) and unexpected ones. I think there is going to be some pain there because it's really hard to tell the two apart. I am not dead set against stop as a default behavior for replay, I just don't think linking the two settings together -isn't- is a good idea. Extra negatives were not intentional. was (Author: aweisberg): I remember that conversation as well. Part of that was making an effort to distinguish between expected failures (end of log) and unexpected ones. I think there is going to be some pain there because it's really hard to tell the two apart. I am not dead set against stop as a default behavior for replay, I just don't think linking the two settings together -isn't- a good idea. Extra negatives were not intentional. > CommitLogReplayer continues startup after encountering errors > - > > Key: CASSANDRA-9749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Branimir Lambov > Fix For: 2.2.x > > Attachments: 9749-coverage.tgz > > > There are a few places where the commit log recovery method either skips > sections or just returns when it encounters errors. 
> Specifically if it can't read the header here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 > Or if there are compressor problems here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 > and here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 > Whether these are user-fixable or not, I think we should require more direct > user intervention (ie: fix what's wrong, or remove the bad file and restart) > since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699822#comment-14699822 ] Ariel Weisberg edited comment on CASSANDRA-9749 at 8/17/15 5:06 PM: I remember that conversation as well. Part of that was making an effort to distinguish between expected failures (end of log) and unexpected ones. I think there is going to be some pain there because it's really hard to tell the two apart. I am not dead set against stop as a default behavior for replay, I just don't think linking the two settings together -isn't- a good idea. Extra negatives were not intentional. was (Author: aweisberg): I remember that conversation as well. Part of that was making an effort to distinguish between expected failures (end of log) and unexpected ones. I think there is going to be some pain there because it's really hard to tell the two apart. I am not dead set against stop as a default behavior for replay, I just don't think linking the two settings together isn't a good idea. > CommitLogReplayer continues startup after encountering errors > - > > Key: CASSANDRA-9749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Branimir Lambov > Fix For: 2.2.x > > Attachments: 9749-coverage.tgz > > > There are a few places where the commit log recovery method either skips > sections or just returns when it encounters errors. 
> Specifically if it can't read the header here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 > Or if there are compressor problems here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 > and here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 > Whether these are user-fixable or not, I think we should require more direct > user intervention (ie: fix what's wrong, or remove the bad file and restart) > since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699826#comment-14699826 ] Benedict commented on CASSANDRA-9749: - bq. I just don't think linking the two settings together isn't a good idea. That was too many negatives for me to parse (and be confident you'd typed correctly) :) I'll note FTR I don't (and didn't) have a strong position on this. > CommitLogReplayer continues startup after encountering errors > - > > Key: CASSANDRA-9749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Branimir Lambov > Fix For: 2.2.x > > Attachments: 9749-coverage.tgz > > > There are a few places where the commit log recovery method either skips > sections or just returns when it encounters errors. > Specifically if it can't read the header here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 > Or if there are compressor problems here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 > and here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 > Whether these are user-fixable or not, I think we should require more direct > user intervention (ie: fix what's wrong, or remove the bad file and restart) > since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9505) Expose sparse formatting via JMX and/or sstablemetadata
[ https://issues.apache.org/jira/browse/CASSANDRA-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko resolved CASSANDRA-9505. -- Resolution: Not A Problem Agreed. This is a non-issue. > Expose sparse formatting via JMX and/or sstablemetadata > --- > > Key: CASSANDRA-9505 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9505 > Project: Cassandra > Issue Type: Improvement >Reporter: Jim Witschey > Fix For: 3.0.0 rc1 > > > It'd be helpful for us in TE if we could differentiate between data written > in the sparse and dense formats as described > [here|https://github.com/pcmanus/cassandra/blob/8099/guide_8099.md#storage-format-on-disk-and-on-wire]. > It'd help us to measure speed and space performance and to make sure the > format is chosen correctly and consistently. > I don't know if this would be best exposed through a JMX endpoint, > {{sstablemetadata}}, or both, but those seem like the most obvious exposure > points. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699822#comment-14699822 ] Ariel Weisberg commented on CASSANDRA-9749: --- I remember that conversation as well. Part of that was making an effort to distinguish between expected failures (end of log) and unexpected ones. I think there is going to be some pain there because it's really hard to tell the two apart. I am not dead set against stop as a default behavior for replay, I just don't think linking the two settings together isn't a good idea. > CommitLogReplayer continues startup after encountering errors > - > > Key: CASSANDRA-9749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Branimir Lambov > Fix For: 2.2.x > > Attachments: 9749-coverage.tgz > > > There are a few places where the commit log recovery method either skips > sections or just returns when it encounters errors. > Specifically if it can't read the header here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 > Or if there are compressor problems here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 > and here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 > Whether these are user-fixable or not, I think we should require more direct > user intervention (ie: fix what's wrong, or remove the bad file and restart) > since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9892) Add support for unsandboxed UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699812#comment-14699812 ] Jonathan Ellis commented on CASSANDRA-9892: --- Let's push this to 3.2 rather than feature creeping 3.0. > Add support for unsandboxed UDF > --- > > Key: CASSANDRA-9892 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9892 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Robert Stupp >Priority: Minor > Fix For: 3.x > > > From discussion on CASSANDRA-9402, > The approach postgresql takes is to distinguish between "trusted" (sandboxed) > and "untrusted" (anything goes) UDF languages. > Creating an untrusted language always requires superuser mode. Once that is > done, creating functions in it requires nothing special. > Personally I would be fine with this approach, but I think it would be more > useful to have the extra permission on creating the function, and also > wouldn't require adding explicit CREATE LANGUAGE. > So I'd suggest just providing different CQL permissions for trusted and > untrusted, i.e. if you have CREATE FUNCTION permission that allows you to > create sandboxed UDF, but you can only create unsandboxed if you have CREATE > UNTRUSTED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
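The permission split proposed in the description boils down to a simple check: CREATE FUNCTION is the baseline for any UDF, and unsandboxed ones additionally require the suggested CREATE UNTRUSTED. A toy model of that rule (the permission names mirror the suggestion; this is not Cassandra's actual auth code):

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the proposed UDF permission scheme: sandboxed
// UDFs need only CREATE_FUNCTION; unsandboxed UDFs also need the
// (proposed, not yet existing) CREATE_UNTRUSTED permission.
public class UdfPermissions {
    public enum Permission { CREATE_FUNCTION, CREATE_UNTRUSTED }

    public static boolean mayCreate(Set<Permission> granted, boolean sandboxed) {
        if (!granted.contains(Permission.CREATE_FUNCTION))
            return false; // baseline permission required for any UDF
        return sandboxed || granted.contains(Permission.CREATE_UNTRUSTED);
    }

    public static void main(String[] args) {
        Set<Permission> basic = EnumSet.of(Permission.CREATE_FUNCTION);
        System.out.println(mayCreate(basic, true));  // sandboxed: allowed
        System.out.println(mayCreate(basic, false)); // unsandboxed: denied
    }
}
```

The contrast with PostgreSQL's model is that the extra privilege attaches to creating the function, not to enabling a whole "untrusted language" up front.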
[jira] [Updated] (CASSANDRA-9892) Add support for unsandboxed UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9892: -- Fix Version/s: (was: 3.0.0 rc1) 3.x > Add support for unsandboxed UDF > --- > > Key: CASSANDRA-9892 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9892 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Robert Stupp >Priority: Minor > Fix For: 3.x > > > From discussion on CASSANDRA-9402, > The approach postgresql takes is to distinguish between "trusted" (sandboxed) > and "untrusted" (anything goes) UDF languages. > Creating an untrusted language always requires superuser mode. Once that is > done, creating functions in it requires nothing special. > Personally I would be fine with this approach, but I think it would be more > useful to have the extra permission on creating the function, and also > wouldn't require adding explicit CREATE LANGUAGE. > So I'd suggest just providing different CQL permissions for trusted and > untrusted, i.e. if you have CREATE FUNCTION permission that allows you to > create sandboxed UDF, but you can only create unsandboxed if you have CREATE > UNTRUSTED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10093) Invalid internal query for static compact tables
[ https://issues.apache.org/jira/browse/CASSANDRA-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699808#comment-14699808 ] Aleksey Yeschenko commented on CASSANDRA-10093: --- +1 so long as cassci is happy. > Invalid internal query for static compact tables > > > Key: CASSANDRA-10093 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10093 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne > Fix For: 3.0 beta 1 > > > When dealing with static compact table on the CQL side and we do a {{SELECT * > FROM table;}} query, we generate the wrong clustering filter. More precisely, > we create a name query that selects the {{EMPTY}} clustering, but that's an > invalid clustering since static compact table have 1 clustering column > (internally at least). What we really want to query is the static parts. > This is the reason for the failure of some dtests > ({{bootstrap_test:TestBootstrap.read_from_bootstrapped_node_test}} for > instance). More precisely, the invalid filter created breaks serialization, > which is why this is only really a problem on multi-node tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699803#comment-14699803 ] Benedict commented on CASSANDRA-9749: - I cannot find where the conversation happened, so perhaps it was on IRC, but the consensus had shifted since we last discussed this over a year ago. There was wide support for failing on startup if the commit log is corrupted, and printing an error message for the user to opt into continuing in the face of those errors. iirc, [~aweisberg], [~bdeggleston] and [~jjordan] were participants, amongst others, so perhaps they can corroborate this since I cannot find a reference. > CommitLogReplayer continues startup after encountering errors > - > > Key: CASSANDRA-9749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Branimir Lambov > Fix For: 2.2.x > > Attachments: 9749-coverage.tgz > > > There are a few places where the commit log recovery method either skips > sections or just returns when it encounters errors. > Specifically if it can't read the header here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 > Or if there are compressor problems here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 > and here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 > Whether these are user-fixable or not, I think we should require more direct > user intervention (ie: fix what's wrong, or remove the bad file and restart) > since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9901) Make AbstractType.isByteOrderComparable abstract
[ https://issues.apache.org/jira/browse/CASSANDRA-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699798#comment-14699798 ] Benedict commented on CASSANDRA-9901: -
bq. What I could suggest, on top of
If we were to say "instead of" and we stuck with the enum, I'd be with you. One of the advantages of using an enum that I did not enumerate was the possibility of performing more efficient despatch for "mixed" clustering data (i.e. with some byte comparable, some not), especially given we now always consult the boolean parameter from the class property, since the comparison of each clustering column is performed in a different method call now (so if we have a different class property to consult as cheaply, we may as well do so). Having two virtual invocations instead of one on this path is a pretty significant burden we should avoid, however.
bq. From a quick look, it looks like the patch will log a warning for every internal type that is not byte comparable
{code}
+if (!getClass().getPackage().equals(AbstractType.class.getPackage()))
+    logger.warn("Type " + this + " is not comparable by its unsigned sequence of raw bytes. A future (major) release of Cassandra may remove support for such arbitrary comparisons, however upgrade steps will be provided to ensure a smooth transition.");
{code}
Admittedly I haven't confirmed this, but it looks fine to me, and I'll double check before we commit.
bq. Also, logging in CFMetaData.rebuild is going to be more noisy than necessary
I prefer to log more often than less, since there's more chance of it being spotted. I don't think we rebuild _so_ often - just during schema changes, no?
> Make AbstractType.isByteOrderComparable abstract > > > Key: CASSANDRA-9901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9901 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Fix For: 3.0.0 rc1 > > > I can't recall _precisely_ what was agreed at the NGCC, but I'm reasonably > sure we agreed to make this method abstract, put some javadoc explaining we > may require fields to yield true in the near future, and potentially log a > warning on startup if a user-defined type returns false. > This should make it into 3.0, IMO, so that we can look into migrating to > byte-order comparable types in the post-3.0 world. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699792#comment-14699792 ] Jonathan Ellis commented on CASSANDRA-7066: --- Assuming we apply disk_failure_policy to the corrupt xlog: if we've started up despite that, then either the policy was ignore, or the user manually moved the xlog aside and restarted without it. So IMO we should always move them aside, since the user either explicitly or implicitly wants that. > Simplify (and unify) cleanup of compaction leftovers > > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Stefania >Priority: Minor > Labels: benedict-to-commit, compaction > Fix For: 3.0 alpha 1 > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily cleanup completed files, or conversely not cleanup files > that have been superceded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9749) CommitLogReplayer continues startup after encountering errors
[ https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699778#comment-14699778 ] Jonathan Ellis commented on CASSANDRA-9749: --- bq. Due to CASSANDRA-8515, the effective commit log failure policy in 3.0+ at time of replay is always 'die'. Hmm, was that intended? /cc [~pauloricardomg] [~benedict] > CommitLogReplayer continues startup after encountering errors > - > > Key: CASSANDRA-9749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9749 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Branimir Lambov > Fix For: 2.2.x > > Attachments: 9749-coverage.tgz > > > There are a few places where the commit log recovery method either skips > sections or just returns when it encounters errors. > Specifically if it can't read the header here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298 > Or if there are compressor problems here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314 > and here: > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366 > Whether these are user-fixable or not, I think we should require more direct > user intervention (ie: fix what's wrong, or remove the bad file and restart) > since we're basically losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9974) Improve debuggability
[ https://issues.apache.org/jira/browse/CASSANDRA-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9974: -- Fix Version/s: 3.x > Improve debuggability > - > > Key: CASSANDRA-9974 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9974 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Fix For: 3.x > > > While 8099 has brought a number of improvements, currently it is making > debugging a bit of a nightmare (for me at least). This slows down development > and test resolution, and so we should fix it sooner than later. This ticket > is intended to aggregate tickets that will improve this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9901) Make AbstractType.isByteOrderComparable abstract
[ https://issues.apache.org/jira/browse/CASSANDRA-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699767#comment-14699767 ] Sylvain Lebresne commented on CASSANDRA-9901: - From a quick look, it looks like the patch will log a warning for every internal type that is not byte comparable, which is not what we want. Also, logging in {{CFMetaData.rebuild}} is going to be more noisy than necessary since that's called reasonably often. Ideally we'd want to only warn when the type is used in the first place. On a less important note, I'm not a fan of using an enum. I'm not convinced it'll add clarity for the user, and on top of that, we don't validate that what the enum says is consistent with what the compare method does, which feels error-prone to me. I also find it more clunky (than just making {{isByteOrderComparable}} abstract), but that's probably more a question of personal taste. What I could suggest, on top of making {{isByteOrderComparable}} abstract, is to create some {{compareValue()}} (or some other name) that would be the existing {{compare()}}, and the {{compare()}} we actually use would basically be:
{noformat}
public final int compare(ByteBuffer b1, ByteBuffer b2)
{
    return isByteOrderComparable() ? ByteBufferUtil.compareUnsigned(b1, b2) : compareValue(b1, b2);
}
{noformat}
And {{compareValue}} would throw {{UnsupportedOperationException}} by default, and only the implementations that are not byte comparable would have to provide it. 
> Make AbstractType.isByteOrderComparable abstract > > > Key: CASSANDRA-9901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9901 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Benedict > Fix For: 3.0.0 rc1 > > > I can't recall _precisely_ what was agreed at the NGCC, but I'm reasonably > sure we agreed to make this method abstract, put some javadoc explaining we > may require fields to yield true in the near future, and potentially log a > warning on startup if a user-defined type returns false. > This should make it into 3.0, IMO, so that we can look into migrating to > byte-order comparable types in the post-3.0 world. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
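The {{compareValue()}} suggestion in the comment above amounts to a final template method that short-circuits to an unsigned byte comparison. A minimal compilable sketch, with simplified names rather than the real {{AbstractType}} API (an assumption for illustration):

```java
import java.nio.ByteBuffer;

// Sketch of the proposed pattern: compare() is final; only types that
// are NOT byte-order comparable must override compareValue(), which
// throws UnsupportedOperationException by default.
public abstract class SketchType {
    public abstract boolean isByteOrderComparable();

    protected int compareValue(ByteBuffer b1, ByteBuffer b2) {
        throw new UnsupportedOperationException(getClass().getSimpleName());
    }

    public final int compare(ByteBuffer b1, ByteBuffer b2) {
        return isByteOrderComparable()
             ? compareUnsigned(b1, b2)
             : compareValue(b1, b2);
    }

    // Lexicographic comparison of the raw bytes, treated as unsigned.
    static int compareUnsigned(ByteBuffer b1, ByteBuffer b2) {
        int n = Math.min(b1.remaining(), b2.remaining());
        for (int i = 0; i < n; i++) {
            int d = (b1.get(b1.position() + i) & 0xff) - (b2.get(b2.position() + i) & 0xff);
            if (d != 0) return d;
        }
        return b1.remaining() - b2.remaining();
    }

    // A signed 32-bit int type is NOT byte-order comparable (as raw
    // big-endian two's-complement bytes, negatives sort after positives),
    // so it must supply compareValue().
    public static final class Int32Sketch extends SketchType {
        public boolean isByteOrderComparable() { return false; }
        protected int compareValue(ByteBuffer b1, ByteBuffer b2) {
            return Integer.compare(b1.getInt(b1.position()), b2.getInt(b2.position()));
        }
    }

    // A raw-bytes type compares correctly as unsigned bytes, so the
    // final compare() never reaches compareValue().
    public static final class BytesSketch extends SketchType {
        public boolean isByteOrderComparable() { return true; }
    }

    public static void main(String[] args) {
        ByteBuffer minusOne = ByteBuffer.allocate(4).putInt(0, -1);
        ByteBuffer one = ByteBuffer.allocate(4).putInt(0, 1);
        System.out.println(new Int32Sketch().compare(minusOne, one) < 0); // signed order
        System.out.println(compareUnsigned(minusOne, one) > 0);           // raw-byte order disagrees
    }
}
```

The {{Int32Sketch}}/{{BytesSketch}} pair shows why the flag matters: the same two buffers order differently under signed and raw-byte semantics, which is exactly the case the warning in the patch is trying to surface.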
[jira] [Updated] (CASSANDRA-9414) Windows utest 2.2: org.apache.cassandra.db.CommitLogTest.testDeleteIfNotDirty intermittent failure
[ https://issues.apache.org/jira/browse/CASSANDRA-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-9414: --- Reviewer: Paulo Motta > Windows utest 2.2: org.apache.cassandra.db.CommitLogTest.testDeleteIfNotDirty > intermittent failure > -- > > Key: CASSANDRA-9414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9414 > Project: Cassandra > Issue Type: Bug >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Minor > Labels: Windows > Fix For: 2.2.x > > > Failure is intermittent enough that bisect is proving to be more hassle than > it's worth. Seems pretty consistent in CI. > {noformat} > [junit] Testcase: > testDeleteIfNotDirty(org.apache.cassandra.db.CommitLogTest): Caused an > ERROR > [junit] java.nio.file.AccessDeniedException: > build\test\cassandra\commitlog;0\CommitLog-5-1431965988394.log > [junit] FSWriteError in > build\test\cassandra\commitlog;0\CommitLog-5-1431965988394.log > [junit] at > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) > [junit] at > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:148) > [junit] at > org.apache.cassandra.db.commitlog.CommitLogSegmentManager.recycleSegment(CommitLogSegmentManager.java:360) > [junit] at > org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:166) > [junit] at > org.apache.cassandra.db.commitlog.CommitLog.startUnsafe(CommitLog.java:416) > [junit] at > org.apache.cassandra.db.commitlog.CommitLog.resetUnsafe(CommitLog.java:389) > [junit] at > org.apache.cassandra.db.CommitLogTest.testDeleteIfNotDirty(CommitLogTest.java:178) > [junit] Caused by: java.nio.file.AccessDeniedException: > build\test\cassandra\commitlog;0\CommitLog-5-1431965988394.log > [junit] at > sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) > [junit] at > sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) > [junit] at > 
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) > [junit] at > sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) > [junit] at > sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) > [junit] at java.nio.file.Files.delete(Files.java:1126) > [junit] at > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699744#comment-14699744 ] Benedict commented on CASSANDRA-7066: - bq. So the scenario is, we crash hard AND suffer xlog corruption so we don't know which sstables are in-progress? Right. bq. (Is offline scrub xlog-aware? It probably should be.) It is, but it hard fails on encountering a corrupted txn log; the operator can then manually delete that log if they so desire (or move it aside, stash it, whatever). What about the sstables though? Right now we just leave them all there, but the last "new" file may be partially written, which will end up crashing some read queries. So the question is whether we just fail and alert the user, or try to establish that this is the case and stash those that are corrupted, or just always move them aside. > Simplify (and unify) cleanup of compaction leftovers > > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Stefania >Priority: Minor > Labels: benedict-to-commit, compaction > Fix For: 3.0 alpha 1 > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily cleanup completed files, or conversely not cleanup files > that have been superceded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. 
This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
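The "move them aside" option discussed above, quarantining suspect files so they survive as evidence rather than being deleted, can be sketched in a few lines. The quarantine directory layout and file names here are hypothetical illustrations, not what Cassandra actually does:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: instead of deleting a suspect (possibly partially
// written) file, move it into a sibling "quarantine" directory where an
// operator can inspect it or attach it to a bug report.
public class Quarantine {
    public static Path stash(Path suspect) {
        try {
            Path quarantineDir = suspect.toAbsolutePath().getParent().resolve("quarantine");
            Files.createDirectories(quarantineDir);
            Path target = quarantineDir.resolve(suspect.getFileName());
            // A rename within the same filesystem preserves the bytes
            // exactly; nothing is copied or truncated.
            return Files.move(suspect, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo: create a fake partially-written file in a temp
    // directory, stash it, and verify the move behaved as expected.
    public static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("sstables");
            Path partial = Files.write(dir.resolve("partial-Data.db"), new byte[]{1, 2, 3});
            Path stashed = stash(partial);
            return !Files.exists(partial) && Files.exists(stashed)
                && stashed.getParent().getFileName().toString().equals("quarantine");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stashed ok: " + demo());
    }
}
```

This matches the lean of the discussion: stashing keeps the option of proving (or disproving) real corruption later, at the cost of some disk space, whereas deletion destroys the evidence.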
[jira] [Created] (CASSANDRA-10094) Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM failure
Joshua McKenzie created CASSANDRA-10094: --- Summary: Windows utest 2.2: testCommitLogFailureBeforeInitialization_mustKillJVM failure Key: CASSANDRA-10094 URL: https://issues.apache.org/jira/browse/CASSANDRA-10094 Project: Cassandra Issue Type: Bug Reporter: Joshua McKenzie Assignee: Paulo Motta Fix For: 2.2.x Error: {noformat} junit.framework.AssertionFailedError: at org.apache.cassandra.db.CommitLogFailurePolicyTest.testCommitLogFailureBeforeInitialization_mustKillJVM(CommitLogFailurePolicyTest.java:149) {noformat} [Failure History|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db/CommitLogFailurePolicyTest/testCommitLogFailureBeforeInitialization_mustKillJVM/history/]: Consistent since build #85 Env: CI only. Cannot repro locally -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9683) Get much higher load and latencies after upgrading from 2.1.6 to Cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-9683: -- Fix Version/s: (was: 2.1.x) 2.0.17 2.1.9 > Get much higher load and latencies after upgrading from 2.1.6 to Cassandra > 2.1.7 > -- > > Key: CASSANDRA-9683 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 > Project: Cassandra > Issue Type: Bug > Environment: Ubuntu 12.04 (3.13 Kernel) * 3 > JDK: Oracle JDK 7 > RAM: 32GB > Cores 4 (+4 HT) >Reporter: Loic Lambiel >Assignee: Ariel Weisberg > Fix For: 2.1.9 > > Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, > os_load.png, pending_compactions.png, read_latency.png, schema.txt, > system.log, write_latency.png > > > After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, > the average load grows from 0.1-0.3 to 1.8. > Latencies did increase as well. > We see an increase of pending compactions, probably due to CASSANDRA-9592. > This cluster has almost no workload (staging environment) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9683) Get much higher load and latencies after upgrading from 2.1.6 to Cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-9683: -- Fix Version/s: (was: 2.0.17) > Get much higher load and latencies after upgrading from 2.1.6 to Cassandra > 2.1.7 > -- > > Key: CASSANDRA-9683 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9683 > Project: Cassandra > Issue Type: Bug > Environment: Ubuntu 12.04 (3.13 Kernel) * 3 > JDK: Oracle JDK 7 > RAM: 32GB > Cores 4 (+4 HT) >Reporter: Loic Lambiel >Assignee: Ariel Weisberg > Fix For: 2.1.9 > > Attachments: cassandra-env.sh, cassandra.yaml, cfstats.txt, > os_load.png, pending_compactions.png, read_latency.png, schema.txt, > system.log, write_latency.png > > > After upgrading our cassandra staging cluster version from 2.1.6 to 2.1.7, > the average load grows from 0.1-0.3 to 1.8. > Latencies did increase as well. > We see an increase of pending compactions, probably due to CASSANDRA-9592. > This cluster has almost no workload (staging environment) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10084) Very slow performance streaming a large query from a single CF
[ https://issues.apache.org/jira/browse/CASSANDRA-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699733#comment-14699733 ] Benedict commented on CASSANDRA-10084: -- bq. A whole new column family should work, right? Yes, absolutely. bq. would upgrading to 3.0 fix this? I would be very surprised if it didn't. I won't promise, as there are a lot of unknowns, but given my assumptions about the problem, and the changes to 3.0: yes. bq. how soon is that? Not long, but I'd rather not give you our targets in case we slip :) > Very slow performance streaming a large query from a single CF > -- > > Key: CASSANDRA-10084 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10084 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 2.1.8 > 12GB EC2 instance > 12 node cluster > 32 concurrent reads > 32 concurrent writes > 6GB heap space >Reporter: Brent Haines > Attachments: cassandra.yaml > > > We have a relatively simple column family that we use to track event data > from different providers. We have been utilizing it for some time. Here is > what it looks like: > {code} > CREATE TABLE data.stories_by_text ( > ref_id timeuuid, > second_type text, > second_value text, > object_type text, > field_name text, > value text, > story_id timeuuid, > data map, > PRIMARY KEY ((ref_id, second_type, second_value, object_type, > field_name), value, story_id) > ) WITH CLUSTERING ORDER BY (value ASC, story_id ASC) > AND bloom_filter_fp_chance = 0.01 > AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' > AND comment = 'Searchable fields and actions in a story are indexed by > ref id which corresponds to a brand, app, app instance, or user.' 
> AND compaction = {'min_threshold': '4', 'cold_reads_to_omit': '0.0', > 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32'} > AND compression = {'sstable_compression': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99.0PERCENTILE'; > {code} > We will, on a daily basis pull a query of the complete data for a given > index, it will look like this: > {code} > select * from stories_by_text where ref_id = > f0124740-2f5a-11e5-a113-03cdf3f3c6dc and second_type = 'Day' and second_value > = '20150812' and object_type = 'booshaka:user' and field_name = 'hashedEmail'; > {code} > In the past, we have been able to pull millions of records out of the CF in a > few seconds. We recently added the data column so that we could filter on > event data and provide more detailed analysis of activity for our reports. > The data map, declared with 'data map' is very small; only 2 or 3 > name/value pairs. > Since we have added this column, our streaming query performance has gone > straight to hell. I just ran the above query and it took 46 minutes to read > 86K rows and then it timed out. > I am uncertain what other data you need to see in order to diagnose this. We > are using STCS and are considering a change to Leveled Compaction. The table > is repaired nightly and the updates, which are at a very fast clip will only > impact the partition key for today, while the queries are for previous days > only. > To my knowledge these queries no longer finish ever. They time out, even > though I put a 60 second timeout on the read for the cluster. I can watch it > pause for 30 to 50 seconds many times during the stream. 
> Again, this only started happening when we added the data column. > Please let me know what else you need for this. It is having a very big > impact on our system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10056) Fix AggregationTest post-test error messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699732#comment-14699732 ] Benjamin Lerer commented on CASSANDRA-10056: LGTM > Fix AggregationTest post-test error messages > > > Key: CASSANDRA-10056 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10056 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Trivial > Fix For: 2.2.x > > > AggregationTest prints error messages after test execution since some UDT > cannot be dropped. It's not critical to the tests themselves but makes the > log cleaner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9906) get_slice and multiget_slice failing on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699729#comment-14699729 ] Benjamin Lerer commented on CASSANDRA-9906: --- I will have another look at the patch as this change is probably wrong. The original {{ColumnFilter}} was not working properly and the thrift call was not returning anything. I changed it to make it work but I did not realize that I was returning more data than expected. > get_slice and multiget_slice failing on trunk > - > > Key: CASSANDRA-9906 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9906 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Mike Adamson >Assignee: Benjamin Lerer >Priority: Blocker > Fix For: 3.0.0 rc1 > > Attachments: 9906.txt, dtest-CASSANDRA-9906.txt > > > {{get_slice}} and {{multiget_slice}} are failing on trunk with the following > error: > {noformat} > java.lang.AssertionError: null > at > org.apache.cassandra.db.filter.ClusteringIndexNamesFilter.(ClusteringIndexNamesFilter.java:53) > ~[cassandra-all-3.0.0.592.jar:3.0.0.592] > at > org.apache.cassandra.thrift.CassandraServer.toInternalFilter(CassandraServer.java:405) > ~[cassandra-all-3.0.0.592.jar:5.0.0-SNAPSHOT] > at > org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:547) > ~[cassandra-all-3.0.0.592.jar:5.0.0-SNAPSHOT] > at > org.apache.cassandra.thrift.CassandraServer.multiget_slice(CassandraServer.java:348) > ~[cassandra-all-3.0.0.592.jar:5.0.0-SNAPSHOT] > at > org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Cassandra.java:3716) > ~[cassandra-thrift-3.0.0.592.jar:5.0.0-SNAPSHOT] > at > org.apache.cassandra.thrift.Cassandra$Processor$multiget_slice.getResult(Cassandra.java:3700) > ~[cassandra-thrift-3.0.0.592.jar:5.0.0-SNAPSHOT] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[libthrift-0.9.2.jar:0.9.2] > at 
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[libthrift-0.9.2.jar:0.9.2] > at > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) > ~[cassandra-all-3.0.0.592.jar:5.0.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[na:1.8.0_45] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45] > {noformat} > The schema used for this was > {noformat} > create table test (k int, v int, primary key(k)) with compact storage; > {noformat} > and the code used for the call was > {noformat} > SlicePredicate predicate = new SlicePredicate(); > predicate.column_names = > Collections.singletonList(ByteBufferUtil.bytes("v")); > client.multiget_slice(Collections.singletonList(key), new > ColumnParent("test"), predicate, ConsistencyLevel.ONE); > {noformat} > The error is coming from this line in {{ClusteringIndexNamesFilter}} > {noformat} > assert !clusterings.contains(Clustering.STATIC_CLUSTERING); > {noformat} > which is failing the assertion because column 'v' is static. > Apologies for the line mismatches in {{ClusteringIndexNamesFilter}} I had > some debug statements in the code to help track down what was happening -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699727#comment-14699727 ] Jonathan Ellis commented on CASSANDRA-7066: --- bq. we can (and probably will) leave incomplete sstables So the scenario is, we crash hard AND suffer xlog corruption so we don't know which sstables are in-progress? I don't think any operator will realistically be able to do anything useful with a xlog file that C* can't read. On the other hand, it could help prove or disprove that it was actual corruption and not a C* bug. So on balance I would lean towards stashing it. (Is offline scrub xlog-aware? It probably should be.) > Simplify (and unify) cleanup of compaction leftovers > > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Stefania >Priority: Minor > Labels: benedict-to-commit, compaction > Fix For: 3.0 alpha 1 > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily cleanup completed files, or conversely not cleanup files > that have been superceded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9414) Windows utest 2.2: org.apache.cassandra.db.CommitLogTest.testDeleteIfNotDirty intermittent failure
[ https://issues.apache.org/jira/browse/CASSANDRA-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699717#comment-14699717 ] Joshua McKenzie commented on CASSANDRA-9414:
---
Going to hijack this ticket (since it's quite possibly the descendant of the original flaky test failure). Error:
{noformat}
Error Message
java.nio.file.AccessDeniedException: build\test\cassandra\commitlog;69\CommitLog-5-1439816200722.log
Stacktrace
FSWriteError in build\test\cassandra\commitlog;69\CommitLog-5-1439816200722.log
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:132)
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:149)
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManager.recycleSegment(CommitLogSegmentManager.java:359)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:167)
    at org.apache.cassandra.db.commitlog.CommitLog.startUnsafe(CommitLog.java:439)
    at org.apache.cassandra.db.commitlog.CommitLog.resetUnsafe(CommitLog.java:412)
    at org.apache.cassandra.db.CommitLogTest.testDeleteIfNotDirty(CommitLogTest.java:186)
Caused by: java.nio.file.AccessDeniedException: build\test\cassandra\commitlog;69\CommitLog-5-1439816200722.log
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
    at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
    at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
    at java.nio.file.Files.delete(Files.java:1126)
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:126)
{noformat}
Consistency: [Flaky|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db/CommitLogTest/testDeleteIfNotDirty/history/]
Env: CI only. Cannot repro locally.

> Windows utest 2.2: org.apache.cassandra.db.CommitLogTest.testDeleteIfNotDirty intermittent failure
> ---
>
> Key: CASSANDRA-9414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9414
> Project: Cassandra
> Issue Type: Bug
> Reporter: Joshua McKenzie
> Assignee: Joshua McKenzie
> Priority: Minor
> Labels: Windows
> Fix For: 2.2.x
>
>
> Failure is intermittent enough that bisect is proving to be more hassle than it's worth. Seems pretty consistent in CI.
> {noformat}
> [junit] Testcase: testDeleteIfNotDirty(org.apache.cassandra.db.CommitLogTest): Caused an ERROR
> [junit] java.nio.file.AccessDeniedException: build\test\cassandra\commitlog;0\CommitLog-5-1431965988394.log
> [junit] FSWriteError in build\test\cassandra\commitlog;0\CommitLog-5-1431965988394.log
> [junit]     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131)
> [junit]     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:148)
> [junit]     at org.apache.cassandra.db.commitlog.CommitLogSegmentManager.recycleSegment(CommitLogSegmentManager.java:360)
> [junit]     at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:166)
> [junit]     at org.apache.cassandra.db.commitlog.CommitLog.startUnsafe(CommitLog.java:416)
> [junit]     at org.apache.cassandra.db.commitlog.CommitLog.resetUnsafe(CommitLog.java:389)
> [junit]     at org.apache.cassandra.db.CommitLogTest.testDeleteIfNotDirty(CommitLogTest.java:178)
> [junit] Caused by: java.nio.file.AccessDeniedException: build\test\cassandra\commitlog;0\CommitLog-5-1431965988394.log
> [junit]     at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
> [junit]     at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> [junit]     at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
> [junit]     at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
> [junit]     at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> [junit]     at java.nio.file.Files.delete(Files.java:1126)
> [junit]     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:125)
> {noformat}
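For context, the {{AccessDeniedException}} on delete is the classic Windows transient failure mode: the file cannot be deleted while another handle (or a not-yet-unmapped buffer) is still open on it. One common test-side mitigation is to retry the delete briefly with backoff; the sketch below is a hypothetical illustration of that pattern only, not Cassandra's actual {{FileUtils.deleteWithConfirm}}, and the retry policy is invented.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RetryingDelete
{
    /**
     * Delete the file, retrying a transient AccessDeniedException (as seen on
     * Windows while another handle is briefly open) with linear backoff.
     * Rethrows the exception if the last attempt still fails.
     */
    static void deleteWithRetry(Path path, int attempts) throws IOException, InterruptedException
    {
        for (int i = 1; ; i++)
        {
            try
            {
                Files.deleteIfExists(path);
                return;
            }
            catch (AccessDeniedException e)
            {
                if (i >= attempts)
                    throw e;           // give up after the final attempt
                Thread.sleep(10L * i); // back off and let the other handle close
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        Path p = Files.createTempFile("commitlog-demo", ".log");
        deleteWithRetry(p, 5);
        System.out.println(Files.exists(p)); // false
    }
}
```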
[jira] [Commented] (CASSANDRA-10084) Very slow performance streaming a large query from a single CF
[ https://issues.apache.org/jira/browse/CASSANDRA-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699716#comment-14699716 ] Brent Haines commented on CASSANDRA-10084:
---
Ah. I was afraid of that. We'd probably create a new table with the desired format, direct our processing to the new table, and write a Storm topology to migrate the data over. A whole new column family should work, right?
I'll try to capture the profile today. This is on a large cluster, but if I set the fetch size high enough, I should be able to keep the query on a single box long enough to capture data. Appreciate the help.
If we can mitigate this to a reasonable point, would upgrading to 3.0 fix this? It would be favorable to keep things the way they are, muddle through it, and then upgrade when the time comes (how soon is that?) and live happily ever after. ;)

> Very slow performance streaming a large query from a single CF
> ---
>
> Key: CASSANDRA-10084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10084
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra 2.1.8
> 12GB EC2 instance
> 12 node cluster
> 32 concurrent reads
> 32 concurrent writes
> 6GB heap space
> Reporter: Brent Haines
> Attachments: cassandra.yaml
>
>
> We have a relatively simple column family that we use to track event data from different providers. We have been utilizing it for some time.
> Here is what it looks like:
> {code}
> CREATE TABLE data.stories_by_text (
>     ref_id timeuuid,
>     second_type text,
>     second_value text,
>     object_type text,
>     field_name text,
>     value text,
>     story_id timeuuid,
>     data map<text, text>,
>     PRIMARY KEY ((ref_id, second_type, second_value, object_type, field_name), value, story_id)
> ) WITH CLUSTERING ORDER BY (value ASC, story_id ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = 'Searchable fields and actions in a story are indexed by ref id which corresponds to a brand, app, app instance, or user.'
>     AND compaction = {'min_threshold': '4', 'cold_reads_to_omit': '0.0', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> {code}
> We will, on a daily basis, pull a query of the complete data for a given index; it will look like this:
> {code}
> select * from stories_by_text where ref_id = f0124740-2f5a-11e5-a113-03cdf3f3c6dc and second_type = 'Day' and second_value = '20150812' and object_type = 'booshaka:user' and field_name = 'hashedEmail';
> {code}
> In the past, we have been able to pull millions of records out of the CF in a few seconds. We recently added the data column so that we could filter on event data and provide more detailed analysis of activity for our reports. The data map, declared with 'data map<text, text>', is very small; only 2 or 3 name/value pairs.
> Since we have added this column, our streaming query performance has gone straight to hell.
> I just ran the above query and it took 46 minutes to read 86K rows, and then it timed out.
> I am uncertain what other data you need to see in order to diagnose this. We are using STCS and are considering a change to Leveled Compaction. The table is repaired nightly, and the updates, which come at a very fast clip, will only impact the partition key for today, while the queries are for previous days only.
> To my knowledge, these queries no longer ever finish. They time out, even though I put a 60-second read timeout on the cluster. I can watch it pause for 30 to 50 seconds many times during the stream.
> Again, this only started happening when we added the data column.
> Please let me know what else you need for this. It is having a very big impact on our system.
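On the fetch-size point above: driver-side paging keeps client memory bounded by pulling one page of rows at a time instead of materializing the whole multi-million-row result. The sketch below only simulates that loop; {{fetchPage}} is a stand-in for a real driver call (e.g. a paged query with a configured fetch size), and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class PagedStream
{
    /**
     * Consume a result set page by page. fetchPage(n) returns the rows of the
     * n-th page; a page shorter than fetchSize signals the end of the stream.
     * Returns the total number of rows consumed.
     */
    static long streamAll(IntFunction<List<String>> fetchPage, int fetchSize)
    {
        long total = 0;
        int pageNo = 0;
        while (true)
        {
            List<String> page = fetchPage.apply(pageNo++);
            total += page.size();           // process the page, then discard it
            if (page.size() < fetchSize)    // short (or empty) page: stream is done
                return total;
        }
    }

    public static void main(String[] args)
    {
        // Simulated source of 2500 rows served in pages of 1000.
        int fetchSize = 1000, rows = 2500;
        IntFunction<List<String>> fake = pageNo -> {
            int start = pageNo * fetchSize;
            int end = Math.min(start + fetchSize, rows);
            List<String> page = new ArrayList<>();
            for (int i = start; i < end; i++)
                page.add("row-" + i);
            return page;
        };
        System.out.println(streamAll(fake, fetchSize)); // 2500
    }
}
```

A larger fetch size means fewer round trips (and, as noted in the comment, pins the work to one coordinator longer, which is handy for profiling), at the cost of more rows held in memory per page.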