[jira] [Commented] (CASSANDRA-9265) Add checksum to saved cache files
[ https://issues.apache.org/jira/browse/CASSANDRA-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641451#comment-14641451 ]

Daniel Chia commented on CASSANDRA-9265:

[~aweisberg] I'm interested in taking a stab at this (I've seen developers run into corrupted caches in 2.0 somewhat frequently on their dev boxes), but I'd like some guidance on where you think we should be saving the checksums. Should we modify the saved cache file format, or store the checksums in a separate file? It seems to me that if we're targeting 3.x, we might as well put the checksum in the same file.

Add checksum to saved cache files
---------------------------------
Key: CASSANDRA-9265
URL: https://issues.apache.org/jira/browse/CASSANDRA-9265
Project: Cassandra
Issue Type: Improvement
Reporter: Ariel Weisberg
Fix For: 3.x

Saved caches are not covered by a checksum. We should at least emit a checksum. My suggestion is a large checksum of the whole file (convenient for offline validation), and then smaller per-record checksums after each record is written (possibly a subset of the incrementally maintained larger checksum). I wouldn't go for anything fancy to try to recover from corruption, since it is just a saved cache. If corruption is detected while reading, I would just have it bail out. I would rather have less code to review and test in this instance.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
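The proposed layout (an incrementally maintained whole-file CRC, per-record checkpoints of it, and bail-out on mismatch) can be sketched in Python with zlib.crc32. The record framing below is illustrative only, not Cassandra's actual saved-cache format:

```python
import io
import struct
import zlib

def write_cache(records):
    """Serialize records with a per-record CRC32 plus a trailing whole-file CRC32."""
    buf = io.BytesIO()
    running = 0  # incrementally maintained whole-file checksum
    for payload in records:
        running = zlib.crc32(payload, running)
        buf.write(struct.pack(">I", len(payload)))
        buf.write(payload)
        # per-record checksum: a snapshot of the incremental whole-file CRC
        buf.write(struct.pack(">I", running & 0xFFFFFFFF))
    buf.write(struct.pack(">I", running & 0xFFFFFFFF))  # whole-file checksum
    return buf.getvalue()

def read_cache(data):
    """Bail out at the first sign of corruption, as the ticket suggests."""
    view, records, running = io.BytesIO(data), [], 0
    total = len(data)
    while view.tell() < total - 4:
        (length,) = struct.unpack(">I", view.read(4))
        payload = view.read(length)
        running = zlib.crc32(payload, running)
        (stored,) = struct.unpack(">I", view.read(4))
        if stored != running & 0xFFFFFFFF:
            raise IOError("corrupt saved cache record; discarding cache")
        records.append(payload)
    (file_crc,) = struct.unpack(">I", view.read(4))
    if file_crc != running & 0xFFFFFFFF:
        raise IOError("corrupt saved cache file; discarding cache")
    return records
```

Because each per-record checksum is a prefix of the running file CRC, the trailing whole-file value costs nothing extra to maintain and still allows convenient offline validation.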
[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641466#comment-14641466 ]

Robert Stupp commented on CASSANDRA-9889:

But having CREATE UNTRUSTED/UNFENCED for Java UDFs means bypassing the sandbox (security manager, class/package access control and async execution (to detect things like _while (true) {}_) - plus CASSANDRA-9890 for Java UDFs). For script languages it is at least a basic level of protection against obvious things. Requiring that permission for script UDFs would effectively always disable the sandbox for them. What about another option - a CREATE SCRIPTED permission?

Disable scripted UDFs by default
--------------------------------
Key: CASSANDRA-9889
URL: https://issues.apache.org/jira/browse/CASSANDRA-9889
Project: Cassandra
Issue Type: Improvement
Reporter: Robert Stupp
Assignee: Robert Stupp
Priority: Minor
Fix For: 3.0.0 rc1

(Follow-up to CASSANDRA-9402)

TL;DR: this ticket is about adding another config option to enable scripted UDFs.

Securing Java UDFs is much easier than securing scripted UDFs. The secure execution of scripted UDFs heavily relies on how secure a particular script provider implementation is. Nashorn is probably pretty good at this - but (as discussed offline with [~iamaleksey]) we are not certain. This becomes worse with other JSR-223 providers (which need to be installed by the user anyway). E.g.:

{noformat}
# Enables use of scripted UDFs.
# Java UDFs are always enabled, if enable_user_defined_functions is true.
# Enable this option to be able to use UDFs with language javascript or any custom JSR-223 provider.
enable_scripted_user_defined_functions: false
{noformat}

TBH: I would feel more comfortable having this one. But we should review this along with enable_user_defined_functions for 4.0.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
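The gating that the proposed yaml option implies can be sketched as follows. Only the enable_* option names come from the ticket; the surrounding function and exception are hypothetical illustration:

```python
class ConfigurationException(Exception):
    pass

def validate_udf_language(language, config):
    """Reject scripted UDFs unless explicitly enabled, mirroring the proposed yaml option."""
    if not config.get("enable_user_defined_functions", False):
        raise ConfigurationException("User-defined functions are disabled")
    # Java UDFs are always allowed once UDFs are on; any other JSR-223
    # language additionally requires the scripted-UDF switch.
    if language != "java" and not config.get(
        "enable_scripted_user_defined_functions", False
    ):
        raise ConfigurationException(
            "Scripted UDFs (language '%s') are disabled; "
            "set enable_scripted_user_defined_functions: true" % language
        )
```

The point of the two-flag design is that the riskier scripted path stays off even on clusters that have already opted in to Java UDFs.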
[jira] [Updated] (CASSANDRA-9898) cqlsh crashes if it loads a UTF-8 file.
[ https://issues.apache.org/jira/browse/CASSANDRA-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yasuharu Goto updated CASSANDRA-9898:
Attachment: cassandra-2.2-9898.txt

cqlsh crashes if it loads a UTF-8 file.
---------------------------------------
Key: CASSANDRA-9898
URL: https://issues.apache.org/jira/browse/CASSANDRA-9898
Project: Cassandra
Issue Type: Bug
Components: Tools
Environment: linux, os x yosemite
Reporter: Yasuharu Goto
Assignee: Yasuharu Goto
Priority: Minor
Attachments: cassandra-2.1-9898.txt, cassandra-2.2-9898.txt

cqlsh crashes when it loads a CQL script file encoded in UTF-8. This is a reproduction procedure:

{quote}
$ cat ./test.cql
// 日本語のコメント
use system;
select * from system.peers;

$ cqlsh --version
cqlsh 5.0.1

$ cqlsh -f ./test.cql
Traceback (most recent call last):
  File ./cqlsh, line 2459, in module
    main(*read_options(sys.argv[1:], os.environ))
  File ./cqlsh, line 2451, in main
    shell.cmdloop()
  File ./cqlsh, line 940, in cmdloop
    line = self.get_input_line(self.prompt)
  File ./cqlsh, line 909, in get_input_line
    self.lastcmd = self.stdin.readline()
  File /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py, line 675, in readline
    return self.reader.readline(size)
  File /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py, line 530, in readline
    data = self.read(readsize, firstline=True)
  File /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py, line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
{quote}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
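The traceback bottoms out in the codecs reader, i.e. the script is being decoded with the wrong codec. A minimal illustration of the underlying fix idea (reading the file explicitly as UTF-8); this is not the attached patch:

```python
import codecs
import os
import tempfile

def read_cql_script(path):
    """Read a CQL script as UTF-8 so non-ASCII comments don't crash the reader."""
    with codecs.open(path, "r", encoding="utf-8") as f:
        return f.read()

# Round-trip a script containing a Japanese comment, as in the repro.
fd, path = tempfile.mkstemp(suffix=".cql")
with os.fdopen(fd, "wb") as f:
    f.write(u"// 日本語のコメント\nuse system;\nselect * from system.peers;\n".encode("utf-8"))
script = read_cql_script(path)
os.remove(path)
```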
[jira] [Commented] (CASSANDRA-9258) Range movement causes CPU performance impact
[ https://issues.apache.org/jira/browse/CASSANDRA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641714#comment-14641714 ]

Benedict commented on CASSANDRA-9258:

This looks like it fell through the cracks a little. Assigning to myself so I can keep an eye on it, and provide a patch once I have some free time. This may not be for a few months, so if somebody else wants to take it sooner, please feel 100% free to assign it to yourself.

Range movement causes CPU performance impact
--------------------------------------------
Key: CASSANDRA-9258
URL: https://issues.apache.org/jira/browse/CASSANDRA-9258
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 2.1.4
Reporter: Rick Branson
Assignee: Benedict
Fix For: 2.1.x

Observing big CPU latency regressions when doing range movements on clusters with many tens of thousands of vnodes. See CPU usage increase by ~80% when a single node is being replaced. Top methods are:

1) Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/dht/ComparableObjectToken;.compareTo
2) Lcom/google/common/collect/AbstractMapBasedMultimap;.wrapCollection in Lcom/google/common/collect/AbstractMapBasedMultimap$AsMap$AsMapIterator;.next
3) Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Lorg/apache/cassandra/dht/Range;.contains

Here's a sample stack from a thread dump:

{code}
Thrift:50673 daemon prio=10 tid=0x7f2f20164800 nid=0x3a04af runnable [0x7f2d878d]
   java.lang.Thread.State: RUNNABLE
	at org.apache.cassandra.dht.Range.isWrapAround(Range.java:260)
	at org.apache.cassandra.dht.Range.contains(Range.java:51)
	at org.apache.cassandra.dht.Range.contains(Range.java:110)
	at org.apache.cassandra.locator.TokenMetadata.pendingEndpointsFor(TokenMetadata.java:916)
	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:775)
	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:541)
	at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:616)
	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1101)
	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1083)
	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:976)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3996)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3980)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
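The hot frames above come from linearly testing token containment against many ranges; with tens of thousands of vnodes that is O(n) work per write. A toy Python comparison of that linear scan against a sorted-boundary bisect (both functions are illustrative, not Cassandra code):

```python
import bisect

def owning_range_linear(token, ranges):
    """O(n) per lookup: test every (left, right] range, like the hot Range.contains loop."""
    for left, right in ranges:
        if left < token <= right:
            return (left, right)
    return None

def owning_range_bisect(token, boundaries):
    """O(log n) per lookup: boundaries is the sorted list of range end tokens."""
    i = bisect.bisect_left(boundaries, token)
    if i == len(boundaries):
        return None  # token beyond the last range end
    left = boundaries[i - 1] if i > 0 else -1
    return (left, boundaries[i])

# A non-wrapping ring of many small (left, right] vnode-like ranges.
boundaries = list(range(9, 100000, 10))
ranges = [(b - 10, b) for b in boundaries]
```

The real code additionally has to handle wrap-around ranges and BigInteger tokens, but the asymptotic picture is the same.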
[jira] [Commented] (CASSANDRA-9896) Add ability to disable commitlog recycling
[ https://issues.apache.org/jira/browse/CASSANDRA-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641720#comment-14641720 ]

Benedict commented on CASSANDRA-9896:

I've pushed a trivial patch [here|https://github.com/belliottsmith/cassandra/tree/9896]. This also switches off recycling by default. So, effectively, we've removed it - just with minimal code changes. Since this affects more than just the batch log, this seems to me to be the best course of action.

Add ability to disable commitlog recycling
------------------------------------------
Key: CASSANDRA-9896
URL: https://issues.apache.org/jira/browse/CASSANDRA-9896
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Brandon Williams
Assignee: Benedict
Fix For: 2.1.x

See CASSANDRA-9533 for background, specifically the graphs I linked. Benedict suggests this is due to the commitlog recycling and I agree, so the simplest solution is to be able to disable that.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
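A recycling switch amounts to one branch in segment management: reuse a released segment file, or discard it and allocate fresh. A toy model of that decision (class and method names are hypothetical, not Cassandra's CommitLogSegmentManager API):

```python
class CommitLogSegmentManager:
    """Toy model of commitlog segment reuse gated by a config flag."""

    def __init__(self, recycle_enabled):
        self.recycle_enabled = recycle_enabled
        self.free_segments = []
        self.allocated = 0

    def allocate(self):
        if self.recycle_enabled and self.free_segments:
            return self.free_segments.pop()      # reuse an old on-disk segment
        self.allocated += 1
        return "segment-%d" % self.allocated     # create a fresh segment file

    def release(self, segment):
        if self.recycle_enabled:
            self.free_segments.append(segment)   # keep the file around for reuse
        # else: the segment file would simply be deleted
```

With the flag off, every flushed segment is discarded and a new file is created, which is exactly the behaviour the patch makes the default.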
[jira] [Commented] (CASSANDRA-9888) BTreeBackedRow and ComplexColumnData
[ https://issues.apache.org/jira/browse/CASSANDRA-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641596#comment-14641596 ]

Benedict commented on CASSANDRA-9888:

Updated btree tests to cover new functionality, and fixed a bug with BTree.transformAndFilter.

BTreeBackedRow and ComplexColumnData
------------------------------------
Key: CASSANDRA-9888
URL: https://issues.apache.org/jira/browse/CASSANDRA-9888
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Fix For: 3.0 beta 1

I found ArrayBackedRow a little hard to follow, especially around building, so I've converted it to BTreeBackedRow, along with ComplexColumnData. Both now rely on BTree.Builder, which introduces a little extra functionality to permit these classes to be implemented more declaratively. Transformations of these classes are also now uniform and more declarative, also depending on some new functionality in BTree that permits applying a transformation/filtration to an existing btree (this could be optimised at a later date, but should suffice for now). The result is IMO both clearer and should scale more gracefully to larger numbers of columns and complex cells.

This hasn't taken all of the possible improvements off the back of this change to their natural conclusion, as we are somewhat time pressed and I would prefer to get the ball rolling with this first round.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
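A transform-and-filter operation of the kind referred to here applies a function to each element, drops elements the function maps to null, and ideally returns the original structure unchanged when nothing was modified. A plain-list analogy in Python (this sketches the shape of the operation, not the BTree code itself):

```python
def transform_and_filter(function, items):
    """Apply `function` to each element, dropping elements it maps to None.

    Returns the input object itself when nothing changed - the kind of
    identity-preserving optimisation a persistent btree transformation wants,
    since it avoids rebuilding untouched structure.
    """
    out = []
    changed = False
    for item in items:
        result = function(item)
        if result is None:
            changed = True
            continue
        changed = changed or (result is not item)
        out.append(result)
    return items if not changed else out
```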
[jira] [Assigned] (CASSANDRA-9258) Range movement causes CPU performance impact
[ https://issues.apache.org/jira/browse/CASSANDRA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict reassigned CASSANDRA-9258:
Assignee: Benedict

Range movement causes CPU performance impact
Key: CASSANDRA-9258
URL: https://issues.apache.org/jira/browse/CASSANDRA-9258

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9900) Anticompaction can mix old and new data with DTCS in 2.2+
Marcus Eriksson created CASSANDRA-9900:

Summary: Anticompaction can mix old and new data with DTCS in 2.2+
Key: CASSANDRA-9900
URL: https://issues.apache.org/jira/browse/CASSANDRA-9900
Project: Cassandra
Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
Fix For: 2.2.x
Attachments: 0001-avoid-mixing-new-and-old-data-in-anticompaction-with.patch

Since CASSANDRA-6851 we group sstables before running anticompaction on them, to avoid increasing the sstable count. We should not do this for DTCS, as it can mix new and old data.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
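The fix idea can be illustrated with a toy grouping function: when the strategy is DTCS, anticompact each sstable on its own so no resulting sstable spans the combined time window of a batch. The function, sstable representation, and group size below are illustrative, not the actual patch:

```python
def group_for_anticompaction(sstables, strategy, group_size=2):
    """sstables are (min_timestamp, max_timestamp) pairs.

    For DTCS, return singleton groups so no group mixes data from different
    time windows; other strategies may batch to limit the sstable count.
    """
    if strategy == "DateTieredCompactionStrategy":
        return [[s] for s in sstables]
    return [sstables[i:i + group_size] for i in range(0, len(sstables), group_size)]
```

Batching (0, 10) with (40, 50) would produce one sstable covering timestamps 0-50, defeating DTCS's ability to expire or tier whole time windows; singleton groups avoid that at the cost of more output sstables.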
[jira] [Updated] (CASSANDRA-9898) cqlsh crashes if it loads a UTF-8 file.
[ https://issues.apache.org/jira/browse/CASSANDRA-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-9898:
Assignee: Yasuharu Goto (was: Yuki Morishita)

cqlsh crashes if it loads a UTF-8 file.
Key: CASSANDRA-9898
URL: https://issues.apache.org/jira/browse/CASSANDRA-9898

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9259) Bulk Reading from Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641721#comment-14641721 ]

Benedict commented on CASSANDRA-9259:

FTR, I very much favour the streaming compaction approach. Compaction should be just about our most optimised code path. If we cannot make it fast enough, nothing will be. If it isn't currently fast enough, we should make it faster. CASSANDRA-8630 and CASSANDRA-9500 are both related.

Bulk Reading from Cassandra
---------------------------
Key: CASSANDRA-9259
URL: https://issues.apache.org/jira/browse/CASSANDRA-9259
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Brian Hess
Assignee: Ariel Weisberg

This ticket is following on from the 2015 NGCC. This ticket is designed to be a place for discussing and designing an approach to bulk reading.

The goal is to have a bulk reading path for Cassandra. That is, a path optimized to grab a large portion of the data for a table (potentially all of it). This is a core element in the Spark integration with Cassandra, and the speed at which Cassandra can deliver bulk data to Spark is limiting the performance of Spark-plus-Cassandra operations. This is especially of importance as Cassandra will (likely) leverage Spark for internal operations (for example CASSANDRA-8234).

The core CQL to consider is the following:

SELECT a, b, c FROM myKs.myTable WHERE Token(partitionKey) > X AND Token(partitionKey) <= Y

Here, we choose X and Y to be contained within one token range (perhaps considering the primary range of a node without vnodes, for example). This query pushes 50K-100K rows/sec, which is not very fast if we are doing bulk operations via Spark (or other processing frameworks - ETL, etc). There are a few causes (e.g., inefficient paging).

There are a few approaches that could be considered. First, we consider a new Streaming Compaction approach. The key observation here is that a bulk read from Cassandra is a lot like a major compaction, though instead of outputting a new SSTable we would output CQL rows to a stream/socket/etc. This would be similar to a CompactionTask, but would strip out some unnecessary things in there (e.g., some of the indexing, etc). Predicates and projections could also be encapsulated in this new StreamingCompactionTask, for example.

Another approach would be an alternate storage format. For example, we might employ Parquet (just as an example) to store the same data as in the primary Cassandra storage (aka SSTables). This is akin to Global Indexes (an alternate storage of the same data optimized for a particular query). Then, Cassandra can choose to leverage this alternate storage for particular CQL queries (e.g., range scans).

These are just 2 suggestions to get the conversation going. One thing to note is that it will be useful to have this storage segregated by token range, so that when you extract via these mechanisms you do not get replication-factor copies of the data. That will certainly be an issue for some Spark operations (e.g., counting). Thus, we will want per-token-range storage (even for single disks), so this will likely leverage CASSANDRA-6696 (though we'll want to also consider the single-disk case).

It is also worth discussing what the success criteria are here. It is unlikely to be as fast as EDW or HDFS performance (though that is still a good goal), but being within some percentage of that performance should be set as success. For example, 2x as long as doing bulk operations on HDFS with similar node count/size/etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
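The per-token-range extraction pattern the CQL above implies can be sketched by splitting a sorted token ring into (start, end] slices and emitting one bounded scan per slice. The splitter function is hypothetical; the table and column names follow the ticket's example:

```python
def token_range_queries(boundaries, keyspace="myKs", table="myTable"):
    """Yield one bounded scan per (start, end] slice of a sorted token ring."""
    for start, end in zip(boundaries, boundaries[1:]):
        yield (
            "SELECT a, b, c FROM %s.%s "
            "WHERE Token(partitionKey) > %d AND Token(partitionKey) <= %d"
            % (keyspace, table, start, end)
        )
```

Because each slice is scanned exactly once, a consumer such as Spark reads each partition once rather than replication-factor times.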
[jira] [Commented] (CASSANDRA-9901) Make AbstractType.isByteOrderComparable abstract
[ https://issues.apache.org/jira/browse/CASSANDRA-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641775#comment-14641775 ]

Aleksey Yeschenko commented on CASSANDRA-9901:

Yep, that was the agreement. Log a warning if a custom non-byte-order-comparable type is used as a clustering column.

Make AbstractType.isByteOrderComparable abstract
------------------------------------------------
Key: CASSANDRA-9901
URL: https://issues.apache.org/jira/browse/CASSANDRA-9901
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Fix For: 3.0.0 rc1

I can't recall _precisely_ what was agreed at the NGCC, but I'm reasonably sure we agreed to make this method abstract, put some javadoc explaining we may require fields to yield true in the near future, and potentially log a warning on startup if a user-defined type returns false. This should make it into 3.0, IMO, so that we can look into migrating to byte-order comparable types in the post-3.0 world.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
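In outline, the proposal is two mechanical pieces: make the method abstract so every subtype must answer explicitly, and warn when a type answers false in a clustering position. A rough Python analogy of the Java change (class and function names here are illustrative, not the actual patch):

```python
import abc
import warnings

class AbstractType(abc.ABC):
    @abc.abstractmethod
    def is_byte_order_comparable(self):
        """Subclasses must answer explicitly; a future release may require True."""

class BytesType(AbstractType):
    def is_byte_order_comparable(self):
        return True

class CustomType(AbstractType):
    """Stand-in for a user-defined type that sorts via its own comparator."""
    def is_byte_order_comparable(self):
        return False

def check_clustering_column(column_type):
    """Startup-style check: warn when a non-BOC type is used for clustering."""
    if not column_type.is_byte_order_comparable():
        warnings.warn(
            "%s is not byte-order comparable; future versions may require it "
            "for clustering columns" % type(column_type).__name__
        )
```

Making the method abstract means forgetting to implement it is a compile-time (here, instantiation-time) error rather than a silently inherited default.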
[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641771#comment-14641771 ]

Jonathan Ellis commented on CASSANDRA-9889:

bq. Requiring that permission for script-UDFs would effectively always disable the sandbox for them.

Well, it's acknowledging reality, which is that if you allow users to create scripted UDFs then you need to trust them not to do something dumb.

Disable scripted UDFs by default
Key: CASSANDRA-9889
URL: https://issues.apache.org/jira/browse/CASSANDRA-9889

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9896) Add ability to disable commitlog recycling
[ https://issues.apache.org/jira/browse/CASSANDRA-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641498#comment-14641498 ]

Benedict commented on CASSANDRA-9896:

That workaround is fine, but it necessitates huge segments, else you'll incur more flushes (and as a result eventually more compaction). Sounds like we're all on the same page though, since it is a pretty trivial feature to add.

Add ability to disable commitlog recycling
Key: CASSANDRA-9896
URL: https://issues.apache.org/jira/browse/CASSANDRA-9896

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9720) half open tcp connections to cassandra cluster nodes cause 100% cpu load
[ https://issues.apache.org/jira/browse/CASSANDRA-9720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641499#comment-14641499 ]

Benedict commented on CASSANDRA-9720:

[~piavlo]: any news on this? I'm convinced it's a genuine bug, so if you could help with some follow-up information we can get it squashed.

half open tcp connections to cassandra cluster nodes cause 100% cpu load
------------------------------------------------------------------------
Key: CASSANDRA-9720
URL: https://issues.apache.org/jira/browse/CASSANDRA-9720
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Alexander Piavlo
Assignee: Benedict

cassandra 2.1.5

We spotted that a few of the nodes in our cluster got a sudden 100% CPU spike which never ended. It's not GC, and the nodes did not see increased reads/writes. What we saw is that the nodes with 100% CPU load all have some connections (file descriptors) showing "can't identify protocol", which indicates abrupt connections that were improperly handled by the cassandra process. http://stackoverflow.com/questions/7911840/seeing-too-many-lsof-cant-identify-protocol

We are pretty sure what triggered this is the spark cassandra connector, which suddenly started to get stuck in early discovery of cassandra nodes before running any stages. We had to restart the affected cassandra processes to get the CPU back to normal.

P.S. We had similar issues some time ago with an earlier version of the 2.1.x cassandra branch, and ended up solving the problem by upgrading from spark 1.2.1 to spark 1.3.1 and upgrading the spark datastax connector accordingly. Now it looks like the problem is back with 99.9% the same symptoms.

P.P.S. We have previously observed several java/cassandra-unrelated processes (mainly php-cli) go crazy with CPU; they too had "can't identify protocol" symptoms.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
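One general defence against half-open peers of this kind is enabling TCP keepalive on accepted sockets, so the kernel eventually probes and tears down dead connections instead of leaving them to spin. This is generic socket hygiene, not the fix adopted for this ticket, and the per-connection tunables are platform-specific (hence the hasattr guards):

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe idle peers and drop half-open connections."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-connection tunables below exist on Linux; guard for portability.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With these settings a peer that vanishes without a FIN or RST is detected after roughly idle + interval * count seconds rather than lingering indefinitely.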
[jira] [Updated] (CASSANDRA-9898) cqlsh crashes if it loads a UTF-8 file.
[ https://issues.apache.org/jira/browse/CASSANDRA-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yasuharu Goto updated CASSANDRA-9898:
Assignee: Yuki Morishita

cqlsh crashes if it loads a UTF-8 file.
Key: CASSANDRA-9898
URL: https://issues.apache.org/jira/browse/CASSANDRA-9898

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9459) SecondaryIndex API redesign
[ https://issues.apache.org/jira/browse/CASSANDRA-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641503#comment-14641503 ]

Sam Tunnicliffe commented on CASSANDRA-9459:

I've pushed a branch [here|https://github.com/beobal/cassandra/tree/9459-wip] with some of the proposed api changes for this ticket. This is a fairly large patch, so I'll try to summarise the main changes below, but the key places to look at are in the {{org.apache.cassandra.index}} package, in particular:

* {{o.a.c.index.Index}}
* {{o.a.c.index.SecondaryIndexManager}}
* {{o.a.c.index.internal.CassandraIndexer}}

This patch is most definitely a work in progress, but I'd appreciate some feedback, especially on the general approach and high-level API changes. [~sbtourist], [~adelapena], [~xedin] in particular, I know you likely have opinions on this, which would be good to hear.

h3. Flattened class hierarchy

Instead of:

{noformat}
                     SecondaryIndex
                ___________|___________
               |                       |
   PerRowSecondaryIndex      PerColumnSecondaryIndex
                                       |
                    AbstractSimplePerColumnSecondaryIndex
                            ___________|___________
                           |                       |
                       KeysIndex            CompositesIndex
                                      _____________|_____________
                                     |             |             |
                         CompositesIndexOnX CompositesIndexOnY CompositesIndexOnZ
{noformat}

we just have a single {{Index}} interface, with 2 inner interfaces {{Indexer}} and {{Searcher}}. The specific differences between indexes on different types of columns in composite tables (i.e. all the {{CompositesIndexOnX}} implementations) have been abstracted into a set of stateless functions, defined in the {{ColumnIndexFunctions}} interface with implementations for use with the various column types. As such, there is now just a single {{Index}} implementation for all built-in indexes, {{CassandraIndex}} (I'm not sold on this name, but it follows the precedent set by {{CassandraAuthorizer}} and {{CassandraRoleManager}}). A nice side effect is that {{KEYS}} indexes (for thrift/compact tables and, in CASSANDRA-8103, static column indexes) also fit into this pattern, so no need for another specialisation there. There are still separate searcher implementations for {{KEYS}} and {{COMPOSITES}} indexes, but there's a lot more commonality between them now (not as a result of this patch, that's an artifact of CASSANDRA-8099).

h3. Event driven, partition scoped updates

Instead of delivering updates to an index implementation per-partition (as previously with PRSI) or per-cell (PCSI), the write component of the index api is more closely aligned to a partition update of the underlying base data. More specifically, when a partition is updated (either via a regular write, or during compaction) a series of events are (or may be) fired. An {{Index}} implementation is required to provide an event listener, whose interface is defined in {{Index.Indexer}}, to handle these events. The granularity of these events maps to a PartitionUpdate, so there are events that are fired on:

* partition delete
* range tombstone
* row inserted
* row updated
* row removed

h3. Caveats/Missing/TBD/etc

* A major thing missing in this branch is CASSANDRA-7771 (multiple indexes per column). Along with that, the plan is also to introduce true per-row indexes, where the index is not necessarily linked to *any* specific column. So until we start hashing that out a bit better, the way SIM represents the collection of Indexes is tbd.
* Related to that, once we've settled on how to define an Index's relationship with a Row (moving that out of ColumnDefinition), we can revisit caching lookup optimisation in SIM. Right now, every time we look up an index we do a filter of all the registered indexes for the table. We can definitely improve this and will do so ASAP.
* The mechanism by which we select indexes at query time remains pretty restrictive. The query clauses being represented as a list of {{RowFilter.Expression}} means only AND conjunctions are supported. This limits the scope for query optimisation and makes it difficult to extend search capabilities in the future, like adding support for OR, for example. I'd like to move to something more expressive to give us scope to improve this area in future tickets.
* The validation methods on Index need some work. Basically these were simply copied from the existing implementation, but they ought to be reworked to combine them into a single {{validate(partition_update)}} or at least into {{validate(partitionkey)}} and {{validate(row)}}.
* The index transaction classes in
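The event-driven {{Index.Indexer}} contract summarised above can be sketched as a listener interface whose methods mirror the event list, plus a trivial implementation. This is a Python analogy for illustration; the real API is the Java interface in the linked branch:

```python
class Indexer:
    """Per-partition-update event listener; one method per event granularity."""

    def begin(self, partition_key):
        pass  # a partition update is starting

    def partition_delete(self, deletion_time):
        pass

    def range_tombstone(self, tombstone):
        pass

    def row_inserted(self, row):
        pass

    def row_updated(self, old_row, new_row):
        pass

    def row_removed(self, row):
        pass

class CountingIndexer(Indexer):
    """Example implementation that just records the events it receives."""

    def __init__(self):
        self.events = []

    def row_inserted(self, row):
        self.events.append(("insert", row))

    def row_removed(self, row):
        self.events.append(("remove", row))
```

An index implementation overrides only the events it cares about, and the write path fires the same sequence of events whether the partition update comes from a regular write or from compaction.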
[jira] [Created] (CASSANDRA-9901) Make AbstractType.isByteOrderComparable abstract
Benedict created CASSANDRA-9901:

Summary: Make AbstractType.isByteOrderComparable abstract
Key: CASSANDRA-9901
URL: https://issues.apache.org/jira/browse/CASSANDRA-9901
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Fix For: 3.0.0 rc1

I can't recall _precisely_ what was agreed at the NGCC, but I'm reasonably sure we agreed to make this method abstract, put some javadoc explaining we may require fields to yield true in the near future, and potentially log a warning on startup if a user-defined type returns false. This should make it into 3.0, IMO, so that we can look into migrating to byte-order comparable types in the post-3.0 world.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)