[jira] [Commented] (CASSANDRA-9265) Add checksum to saved cache files

2015-07-25 Thread Daniel Chia (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641451#comment-14641451
 ] 

Daniel Chia commented on CASSANDRA-9265:


[~aweisberg] I'm interested in taking a stab at this (I've seen developers run 
into corrupted caches in 2.0 somewhat frequently on their dev boxes), but I'd 
like some guidance on where you think we should save the checksums. Should we 
modify the saved cache file format, or store the checksums in a separate file?

It seems to me that if we're targeting 3.x, we might as well put the checksum 
in the same file.

 Add checksum to saved cache files
 -

 Key: CASSANDRA-9265
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9265
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ariel Weisberg
 Fix For: 3.x


 Saved caches are not covered by a checksum. We should at least emit a 
 checksum. My suggestion is a large checksum of the whole file (convenient 
 offline validation), and then smaller per record checksums after each record 
 is written (possibly a subset of the incrementally maintained larger 
 checksum).
 I wouldn't go for anything fancy to try to recover from corruption since it 
 is just a saved cache. If corruption is detected while reading I would just 
 have it bail out. I would rather have less code to review and test in this 
 instance.
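As a rough illustration of the scheme the description proposes (a whole-file checksum maintained incrementally, plus a small per-record checksum after each record), here is a hedged Python sketch. The class and method names are invented for this sketch and do not reflect the actual saved-cache code:

```python
import zlib

# Sketch only: one incrementally maintained whole-file CRC, plus a
# standalone CRC emitted after each record. The per-record checksum here
# is independent, not the "subset of the larger checksum" variant.
class SavedCacheChecksummer:
    def __init__(self):
        self.file_crc = 0

    def write_record(self, record: bytes) -> int:
        # Fold the record into the running whole-file checksum...
        self.file_crc = zlib.crc32(record, self.file_crc)
        # ...and return a standalone checksum for just this record.
        return zlib.crc32(record)

    def file_checksum(self) -> int:
        return self.file_crc
```

On read, a mismatch on either checksum would simply abort loading the cache, as the ticket suggests.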



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default

2015-07-25 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641466#comment-14641466
 ] 

Robert Stupp commented on CASSANDRA-9889:
-

But having CREATE UNTRUSTED/UNFENCED for Java UDFs means bypassing the sandbox 
(security manager, class/package access control and async execution (to detect 
things like _while (true) {}_) - plus CASSANDRA-9890 for Java UDFs).
For script languages the sandbox provides at least a basic level of protection 
against obvious things. Requiring that permission for script UDFs would 
effectively always disable the sandbox for them.

What about another option - a CREATE SCRIPTED permission?

 Disable scripted UDFs by default
 

 Key: CASSANDRA-9889
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9889
 Project: Cassandra
  Issue Type: Improvement
Reporter: Robert Stupp
Assignee: Robert Stupp
Priority: Minor
 Fix For: 3.0.0 rc1


 (Follow-up to CASSANDRA-9402)
 TL;DR: this ticket is about adding another config option to enable scripted 
 UDFs.
 Securing Java UDFs is much easier than securing scripted UDFs.
 The secure execution of scripted UDFs heavily relies on how secure a 
 particular script provider implementation is. Nashorn is probably pretty good 
 at this - but (as discussed offline with [~iamaleksey]) we are not certain. 
 This becomes worse with other JSR-223 providers (which need to be installed 
 by the user anyway).
 E.g.:
 {noformat}
 # Enables use of scripted UDFs.
 # Java UDFs are always enabled, if enable_user_defined_functions is true.
 # Enable this option to be able to use UDFs with language javascript or any 
 custom JSR-223 provider.
 enable_scripted_user_defined_functions: false
 {noformat}
 TBH: I would feel more comfortable having this one. But we should review 
 it along with enable_user_defined_functions for 4.0.
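To make the intended gating concrete, here is a hedged Python sketch of the check the option would drive. The function name and wiring are invented for illustration; only the option name comes from the yaml snippet above:

```python
# Hypothetical sketch: reject non-Java UDF languages at CREATE time unless
# the scripted-UDF option is enabled. Java UDFs are governed separately by
# enable_user_defined_functions.
def validate_udf_language(language: str, enable_scripted_udfs: bool) -> None:
    if language.lower() != "java" and not enable_scripted_udfs:
        raise ValueError(
            "scripted UDFs are disabled; set "
            "enable_scripted_user_defined_functions: true to allow them")
```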





[jira] [Updated] (CASSANDRA-9898) cqlsh crashes if it loads a utf-8 file.

2015-07-25 Thread Yasuharu Goto (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yasuharu Goto updated CASSANDRA-9898:
-
Attachment: cassandra-2.2-9898.txt

 cqlsh crashes if it loads a utf-8 file.
 --

 Key: CASSANDRA-9898
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9898
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: linux, os x yosemite.
Reporter: Yasuharu Goto
Assignee: Yasuharu Goto
Priority: Minor
 Attachments: cassandra-2.1-9898.txt, cassandra-2.2-9898.txt


 cqlsh crashes when it loads a CQL script file encoded in UTF-8.
 This is a reproduction procedure.
 {quote}
 $ cat ./test.cql
 // 日本語のコメント
 use system;
 select * from system.peers;
 $ cqlsh --version
 cqlsh 5.0.1
 $ cqlsh -f ./test.cql
 Traceback (most recent call last):
   File "./cqlsh", line 2459, in <module>
     main(*read_options(sys.argv[1:], os.environ))
   File "./cqlsh", line 2451, in main
     shell.cmdloop()
   File "./cqlsh", line 940, in cmdloop
     line = self.get_input_line(self.prompt)
   File "./cqlsh", line 909, in get_input_line
     self.lastcmd = self.stdin.readline()
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 675, in readline
     return self.reader.readline(size)
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 530, in readline
     data = self.read(readsize, firstline=True)
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 477, in read
     newchars, decodedbytes = self.decode(data, self.errors)
 {quote}
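The traceback above is a decode failure on non-ASCII input. A minimal sketch of the direction of a fix (the attached patches may differ in detail) is to open the script file explicitly as UTF-8 rather than relying on the default codec:

```python
import io

# Read a CQL script as UTF-8; a Japanese comment like the one in test.cql
# then decodes cleanly instead of raising in codecs.readline().
def read_cql_script(path):
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]
```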





[jira] [Commented] (CASSANDRA-9258) Range movement causes CPU performance impact

2015-07-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641714#comment-14641714
 ] 

Benedict commented on CASSANDRA-9258:
-

This looks like it fell through the cracks a little. Assigning to myself so I 
can keep an eye on it, and provide a patch once I have some free time. This may 
not be for a few months, so if somebody else wants to take it sooner, please 
feel 100% free to assign it to yourself.

 Range movement causes CPU & performance impact
 --

 Key: CASSANDRA-9258
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9258
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 2.1.4
Reporter: Rick Branson
Assignee: Benedict
 Fix For: 2.1.x


 Observing big CPU & latency regressions when doing range movements on 
 clusters with many tens of thousands of vnodes. CPU usage increases by 
 ~80% when a single node is being replaced.
 Top methods are:
 1) Ljava/math/BigInteger;.compareTo in 
 Lorg/apache/cassandra/dht/ComparableObjectToken;.compareTo 
 2) Lcom/google/common/collect/AbstractMapBasedMultimap;.wrapCollection in 
 Lcom/google/common/collect/AbstractMapBasedMultimap$AsMap$AsMapIterator;.next
 3) Lorg/apache/cassandra/db/DecoratedKey;.compareTo in 
 Lorg/apache/cassandra/dht/Range;.contains
 Here's a sample stack from a thread dump:
 {code}
 Thrift:50673 daemon prio=10 tid=0x7f2f20164800 nid=0x3a04af runnable 
 [0x7f2d878d]
java.lang.Thread.State: RUNNABLE
   at org.apache.cassandra.dht.Range.isWrapAround(Range.java:260)
   at org.apache.cassandra.dht.Range.contains(Range.java:51)
   at org.apache.cassandra.dht.Range.contains(Range.java:110)
   at 
 org.apache.cassandra.locator.TokenMetadata.pendingEndpointsFor(TokenMetadata.java:916)
   at 
 org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:775)
   at 
 org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:541)
   at 
 org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:616)
   at 
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1101)
   at 
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1083)
   at 
 org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:976)
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3996)
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3980)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745){code}
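The hot path above is {{Range.contains}} being evaluated per write across very many vnode ranges. For illustration only (names invented, not Cassandra's actual data structures): sorting the non-wrapping {{(left, right]}} range bounds once and bisecting turns the per-write lookup from O(n) into O(log n):

```python
import bisect

# ranges: sorted, non-overlapping (left, right] pairs, e.g. token ranges.
def build_index(ranges):
    # Precompute the left bounds once so lookups can binary-search them.
    return [left for left, _ in ranges]

def owning_range(lefts, ranges, token):
    # Find the last range whose left bound is strictly below the token.
    i = bisect.bisect_left(lefts, token) - 1
    if i >= 0 and ranges[i][0] < token <= ranges[i][1]:
        return ranges[i]
    return None
```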





[jira] [Commented] (CASSANDRA-9896) Add ability to disable commitlog recycling

2015-07-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641720#comment-14641720
 ] 

Benedict commented on CASSANDRA-9896:
-

I've pushed a trivial patch 
[here|https://github.com/belliottsmith/cassandra/tree/9896]

This also switches off recycling by default. So, effectively, we've removed it, 
just with minimal code changes.

Since this affects more than just the batch log, this seems to me to be the best 
course of action.

 Add ability to disable commitlog recycling
 --

 Key: CASSANDRA-9896
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9896
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Benedict
 Fix For: 2.1.x


 See CASSANDRA-9533 for background, specifically the graphs I linked.  
 Benedict suggests this is due to the commitlog recycling, and I agree, so the 
 simplest solution is to be able to disable it.
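For readers unfamiliar with recycling, the effect of the toggle can be sketched like this (names invented; the real change is in the commitlog segment management code):

```python
# With recycling on, a finished segment goes back on a free list for reuse;
# with it off, the segment is handed back to the caller for deletion and a
# fresh file is allocated instead.
def release_segment(segment, free_list, recycling_enabled):
    if recycling_enabled:
        free_list.append(segment)  # keep for reuse
        return None                # nothing to delete
    return segment                 # caller deletes and allocates anew
```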





[jira] [Commented] (CASSANDRA-9888) BTreeBackedRow and ComplexColumnData

2015-07-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641596#comment-14641596
 ] 

Benedict commented on CASSANDRA-9888:
-

Updated btree tests to cover new functionality, and fixed a bug with 
BTree.transformAndFilter.

 BTreeBackedRow and ComplexColumnData
 

 Key: CASSANDRA-9888
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9888
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 3.0 beta 1


 I found ArrayBackedRow a little hard to follow, especially around building, 
 so I've converted it to BTreeBackedRow, along with ComplexColumnData. Both 
 now rely on BTree.Builder, which introduces a little extra functionality to 
 permit these classes to be implemented more declaratively. Transformations of 
 these classes are also now uniform and more declarative, also depending on 
 some new functionality in BTree that permits applying a 
 transformation/filtration to an existing btree (this could be optimised at a 
 later date, but should suffice for now).
 The result is IMO both clearer and should scale more gracefully to larger 
 numbers of columns and complex cells.
 This hasn't taken all of the possible improvements off the back of this change 
 to their natural conclusion, as we are somewhat time pressed and I would 
 prefer to get the ball rolling with this first round.
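The transform/filter operation described above can be pictured with a tiny analogue (the real code rebuilds a Java BTree; this only shows the declarative shape: apply a function, drop elements it maps to null/None):

```python
# Minimal analogue of a transform-and-filter pass over an ordered collection.
def transform_and_filter(items, fn):
    return [y for y in (fn(x) for x in items) if y is not None]
```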





[jira] [Assigned] (CASSANDRA-9258) Range movement causes CPU performance impact

2015-07-25 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-9258:
---

Assignee: Benedict

 Range movement causes CPU & performance impact
 --

 Key: CASSANDRA-9258
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9258
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 2.1.4
Reporter: Rick Branson
Assignee: Benedict
 Fix For: 2.1.x


 Observing big CPU & latency regressions when doing range movements on 
 clusters with many tens of thousands of vnodes. CPU usage increases by 
 ~80% when a single node is being replaced.
 Top methods are:
 1) Ljava/math/BigInteger;.compareTo in 
 Lorg/apache/cassandra/dht/ComparableObjectToken;.compareTo 
 2) Lcom/google/common/collect/AbstractMapBasedMultimap;.wrapCollection in 
 Lcom/google/common/collect/AbstractMapBasedMultimap$AsMap$AsMapIterator;.next
 3) Lorg/apache/cassandra/db/DecoratedKey;.compareTo in 
 Lorg/apache/cassandra/dht/Range;.contains
 Here's a sample stack from a thread dump:
 {code}
 Thrift:50673 daemon prio=10 tid=0x7f2f20164800 nid=0x3a04af runnable 
 [0x7f2d878d]
java.lang.Thread.State: RUNNABLE
   at org.apache.cassandra.dht.Range.isWrapAround(Range.java:260)
   at org.apache.cassandra.dht.Range.contains(Range.java:51)
   at org.apache.cassandra.dht.Range.contains(Range.java:110)
   at 
 org.apache.cassandra.locator.TokenMetadata.pendingEndpointsFor(TokenMetadata.java:916)
   at 
 org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:775)
   at 
 org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:541)
   at 
 org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:616)
   at 
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1101)
   at 
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1083)
   at 
 org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:976)
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3996)
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3980)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745){code}





[jira] [Created] (CASSANDRA-9900) Anticompaction can mix old and new data with DTCS in 2.2+

2015-07-25 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-9900:
--

 Summary: Anticompaction can mix old and new data with DTCS in 2.2+
 Key: CASSANDRA-9900
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9900
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 2.2.x
 Attachments: 
0001-avoid-mixing-new-and-old-data-in-anticompaction-with.patch

Since CASSANDRA-6851 we group sstables before running anticompaction on them, to 
avoid increasing the sstable count.

We should not do this for DTCS, as it can mix new and old data.
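The shape of the fix can be sketched as follows (function names, group size, and strategy strings are invented for illustration; the attached patch is the authoritative change):

```python
# Keep the CASSANDRA-6851 grouping optimisation for other strategies, but
# give DTCS one group per sstable so sstables from different time windows
# are never anticompacted together.
def anticompaction_groups(sstables, strategy, group_size=2):
    if strategy == "DateTieredCompactionStrategy":
        return [[s] for s in sstables]
    return [sstables[i:i + group_size]
            for i in range(0, len(sstables), group_size)]
```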





[jira] [Updated] (CASSANDRA-9898) cqlsh crashes if it loads a utf-8 file.

2015-07-25 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-9898:
--
Assignee: Yasuharu Goto  (was: Yuki Morishita)

 cqlsh crashes if it loads a utf-8 file.
 --

 Key: CASSANDRA-9898
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9898
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: linux, os x yosemite.
Reporter: Yasuharu Goto
Assignee: Yasuharu Goto
Priority: Minor
 Attachments: cassandra-2.1-9898.txt


 cqlsh crashes when it loads a CQL script file encoded in UTF-8.
 This is a reproduction procedure.
 {quote}
 $ cat ./test.cql
 // 日本語のコメント
 use system;
 select * from system.peers;
 $ cqlsh --version
 cqlsh 5.0.1
 $ cqlsh -f ./test.cql
 Traceback (most recent call last):
   File "./cqlsh", line 2459, in <module>
     main(*read_options(sys.argv[1:], os.environ))
   File "./cqlsh", line 2451, in main
     shell.cmdloop()
   File "./cqlsh", line 940, in cmdloop
     line = self.get_input_line(self.prompt)
   File "./cqlsh", line 909, in get_input_line
     self.lastcmd = self.stdin.readline()
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 675, in readline
     return self.reader.readline(size)
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 530, in readline
     data = self.read(readsize, firstline=True)
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 477, in read
     newchars, decodedbytes = self.decode(data, self.errors)
 {quote}





[jira] [Commented] (CASSANDRA-9259) Bulk Reading from Cassandra

2015-07-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641721#comment-14641721
 ] 

Benedict commented on CASSANDRA-9259:
-

FTR, I very much favour the streaming compaction approach. Compaction should be 
just about our most optimised code path. If we cannot make it fast enough, 
nothing will be. If it isn't currently fast enough, we should make it faster.

CASSANDRA-8630 and CASSANDRA-9500 are both related.

 Bulk Reading from Cassandra
 ---

 Key: CASSANDRA-9259
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9259
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter:  Brian Hess
Assignee: Ariel Weisberg

 This ticket is following on from the 2015 NGCC.  This ticket is designed to 
 be a place for discussing and designing an approach to bulk reading.
 The goal is to have a bulk reading path for Cassandra.  That is, a path 
 optimized to grab a large portion of the data for a table (potentially all of 
 it).  This is a core element in the Spark integration with Cassandra, and the 
 speed at which Cassandra can deliver bulk data to Spark is limiting the 
 performance of Spark-plus-Cassandra operations.  This is especially of 
 importance as Cassandra will (likely) leverage Spark for internal operations 
 (for example CASSANDRA-8234).
 The core CQL to consider is the following:
 SELECT a, b, c FROM myKs.myTable WHERE Token(partitionKey) > X AND 
 Token(partitionKey) <= Y
 Here, we choose X and Y to be contained within one token range (perhaps 
 considering the primary range of a node without vnodes, for example).  This 
 query pushes 50K-100K rows/sec, which is not very fast if we are doing bulk 
 operations via Spark (or other processing frameworks - ETL, etc).  There are 
 a few causes (e.g., inefficient paging).
 There are a few approaches that could be considered.  First, we consider a 
 new Streaming Compaction approach.  The key observation here is that a bulk 
 read from Cassandra is a lot like a major compaction, though instead of 
 outputting a new SSTable we would output CQL rows to a stream/socket/etc.  
 This would be similar to a CompactionTask, but would strip out some 
 unnecessary things in there (e.g., some of the indexing, etc). Predicates and 
 projections could also be encapsulated in this new StreamingCompactionTask, 
 for example.
 Another approach would be an alternate storage format.  For example, we might 
 employ Parquet (just as an example) to store the same data as in the primary 
 Cassandra storage (aka SSTables).  This is akin to Global Indexes (an 
 alternate storage of the same data optimized for a particular query).  Then, 
 Cassandra can choose to leverage this alternate storage for particular CQL 
 queries (e.g., range scans).
 These are just 2 suggestions to get the conversation going.
 One thing to note is that it will be useful to have this storage segregated 
 by token range so that when you extract via these mechanisms you do not get 
 replications-factor numbers of copies of the data.  That will certainly be an 
 issue for some Spark operations (e.g., counting).  Thus, we will want 
 per-token-range storage (even for single disks), so this will likely leverage 
 CASSANDRA-6696 (though, we'll want to also consider the single disk case).
 It is also worth discussing what the success criteria is here.  It is 
 unlikely to be as fast as EDW or HDFS performance (though, that is still a 
 good goal), but being within some percentage of that performance should be 
 set as success.  For example, 2x as long as doing bulk operations on HDFS 
 with similar node count/size/etc.
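The per-token-range query pattern the description centres on can be sketched by generating one such SELECT per split of the Murmur3 ring. This is illustrative only (function name and split logic invented; table and columns follow the example above):

```python
# The Murmur3 token ring spans [-2**63, 2**63 - 1]; generate one
# (lo, hi] range query per split, as a bulk reader might.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def token_range_queries(table, columns, splits):
    step = (MAX_TOKEN - MIN_TOKEN) // splits
    bounds = [MIN_TOKEN + i * step for i in range(splits)] + [MAX_TOKEN]
    cols = ", ".join(columns)
    return [
        f"SELECT {cols} FROM {table} "
        f"WHERE token(partitionKey) > {lo} AND token(partitionKey) <= {hi}"
        for lo, hi in zip(bounds, bounds[1:])
    ]
```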





[jira] [Commented] (CASSANDRA-9901) Make AbstractType.isByteOrderComparable abstract

2015-07-25 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641775#comment-14641775
 ] 

Aleksey Yeschenko commented on CASSANDRA-9901:
--

Yep, that was the agreement. Log a warning if a custom non-boc type is used as 
a clustering column.

 Make AbstractType.isByteOrderComparable abstract
 

 Key: CASSANDRA-9901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 3.0.0 rc1


 I can't recall _precisely_ what was agreed at the NGCC, but I'm reasonably 
 sure we agreed to make this method abstract, put some javadoc explaining we 
 may require fields to yield true in the near future, and potentially log a 
 warning on startup if a user-defined type returns false.
 This should make it into 3.0, IMO, so that we can look into migrating to 
 byte-order comparable types in the post-3.0 world.
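The agreed behaviour (abstract method plus a startup warning for non-byte-order-comparable custom clustering types) can be sketched like this. Class and function names are Python stand-ins, not the actual Java API:

```python
import abc
import warnings

# Stand-in for the Java AbstractType: the method becomes abstract, so every
# subtype must declare whether its serialized form is byte-order comparable.
class AbstractType(abc.ABC):
    @abc.abstractmethod
    def is_byte_order_comparable(self) -> bool: ...

def check_clustering_type(t: AbstractType) -> None:
    # Warn (don't fail) when a custom non-boc type backs a clustering column.
    if not t.is_byte_order_comparable():
        warnings.warn(
            f"{type(t).__name__} is not byte-order comparable; future "
            "versions may require this for clustering column types")
```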





[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default

2015-07-25 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641771#comment-14641771
 ] 

Jonathan Ellis commented on CASSANDRA-9889:
---

bq. Requiring that permission for script-UDFs would effectively always disable 
the sandbox for them.

Well, it's acknowledging reality, which is that if you allow users to create 
scripted UDFs then you need to trust them not to do something dumb.

 Disable scripted UDFs by default
 

 Key: CASSANDRA-9889
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9889
 Project: Cassandra
  Issue Type: Improvement
Reporter: Robert Stupp
Assignee: Robert Stupp
Priority: Minor
 Fix For: 3.0.0 rc1


 (Follow-up to CASSANDRA-9402)
 TL;DR: this ticket is about adding another config option to enable scripted 
 UDFs.
 Securing Java UDFs is much easier than securing scripted UDFs.
 The secure execution of scripted UDFs heavily relies on how secure a 
 particular script provider implementation is. Nashorn is probably pretty good 
 at this - but (as discussed offline with [~iamaleksey]) we are not certain. 
 This becomes worse with other JSR-223 providers (which need to be installed 
 by the user anyway).
 E.g.:
 {noformat}
 # Enables use of scripted UDFs.
 # Java UDFs are always enabled, if enable_user_defined_functions is true.
 # Enable this option to be able to use UDFs with language javascript or any 
 custom JSR-223 provider.
 enable_scripted_user_defined_functions: false
 {noformat}
 TBH: I would feel more comfortable having this one. But we should review 
 it along with enable_user_defined_functions for 4.0.





[jira] [Commented] (CASSANDRA-9896) Add ability to disable commitlog recycling

2015-07-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641498#comment-14641498
 ] 

Benedict commented on CASSANDRA-9896:
-

That workaround is fine, but it necessitates huge segments, else you'll incur 
more flushes (and, as a result, eventually more compaction). Sounds like we're 
all on the same page though, since it is a pretty trivial feature to add.

 Add ability to disable commitlog recycling
 --

 Key: CASSANDRA-9896
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9896
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Benedict
 Fix For: 2.1.x


 See CASSANDRA-9533 for background, specifically the graphs I linked.  
 Benedict suggests this is due to the commitlog recycling, and I agree, so the 
 simplest solution is to be able to disable it.





[jira] [Commented] (CASSANDRA-9720) half open tcp connections to cassandra cluster nodes cause 100% cpu load

2015-07-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641499#comment-14641499
 ] 

Benedict commented on CASSANDRA-9720:
-

[~piavlo]: any news on this? I'm convinced it's a genuine bug, so if you could 
help with some follow up information we can get it squashed.

 half open tcp connections to cassandra cluster nodes cause 100% cpu load
 

 Key: CASSANDRA-9720
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9720
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Alexander Piavlo
Assignee: Benedict

 cassandra 2.1.5
 We spotted that a few of the nodes in our cluster got a sudden 100% CPU spike 
 which never ended. It's not GC, and the nodes had no increase in reads/writes.
 What we saw is that the nodes with 100% CPU load all have some connections 
 (file descriptors) showing "can't identify protocol", which indicates 
 abrupt connections that were handled improperly by the cassandra process.
 http://stackoverflow.com/questions/7911840/seeing-too-many-lsof-cant-identify-protocol
 We are pretty sure what triggered this is the spark cassandra connector, 
 which suddenly started to get stuck in early discovery of cassandra nodes 
 before running any stages.
 We had to restart the affected cassandra processes to get the CPU back to 
 normal.
 ps. we had similar issues some time ago with an earlier version of the 2.1.x 
 cassandra branch, and ended up solving the problem by upgrading from 
 spark 1.2.1 to spark 1.3.1 and upgrading the spark datastax connector 
 accordingly. Now it looks like the problem is back with 99.9% the same 
 symptoms.
 ps2. We have previously observed several java/cassandra-unrelated processes 
 (mainly php-cli) go crazy with CPU; they too had "can't identify protocol" 
 symptoms.
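Not the ticket's diagnosis or fix, but for context: one common mitigation for half-open peers is enabling TCP keepalive on accepted sockets, so the kernel eventually probes and errors out dead connections instead of leaving stuck file descriptors behind. A minimal sketch:

```python
import socket

# Enable SO_KEEPALIVE so idle, half-open connections are eventually
# detected and torn down by the kernel's keepalive probes.
def enable_keepalive(sock):
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```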





[jira] [Updated] (CASSANDRA-9898) cqlsh crashes if it loads a utf-8 file.

2015-07-25 Thread Yasuharu Goto (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yasuharu Goto updated CASSANDRA-9898:
-
Assignee: Yuki Morishita

 cqlsh crashes if it loads a utf-8 file.
 --

 Key: CASSANDRA-9898
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9898
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: linux, os x yosemite.
Reporter: Yasuharu Goto
Assignee: Yuki Morishita
Priority: Minor
 Attachments: cassandra-2.1-9898.txt


 cqlsh crashes when it loads a CQL script file encoded in UTF-8.
 This is a reproduction procedure.
 {quote}
 $ cat ./test.cql
 // 日本語のコメント
 use system;
 select * from system.peers;
 $ cqlsh --version
 cqlsh 5.0.1
 $ cqlsh -f ./test.cql
 Traceback (most recent call last):
   File "./cqlsh", line 2459, in <module>
     main(*read_options(sys.argv[1:], os.environ))
   File "./cqlsh", line 2451, in main
     shell.cmdloop()
   File "./cqlsh", line 940, in cmdloop
     line = self.get_input_line(self.prompt)
   File "./cqlsh", line 909, in get_input_line
     self.lastcmd = self.stdin.readline()
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 675, in readline
     return self.reader.readline(size)
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 530, in readline
     data = self.read(readsize, firstline=True)
   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 477, in read
     newchars, decodedbytes = self.decode(data, self.errors)
 {quote}





[jira] [Commented] (CASSANDRA-9459) SecondaryIndex API redesign

2015-07-25 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641503#comment-14641503
 ] 

Sam Tunnicliffe commented on CASSANDRA-9459:



I've pushed a branch [here|https://github.com/beobal/cassandra/tree/9459-wip] 
with some of the proposed API changes for this ticket.
This is a fairly large patch, so I'll try to summarise the main changes below, 
but the key places to look are in the {{org.apache.cassandra.index}} package, in 
particular:

* {{o.a.c.index.Index}}
* {{o.a.c.index.SecondaryIndexManager}}
* {{o.a.c.index.internal.CassandraIndexer}}

This patch is most definitely a work in progress, but I'd appreciate some 
feedback, especially on the general approach and high-level API changes. 
[~sbtourist], [~adelapena] & [~xedin] in particular, I know you are likely to 
have opinions on this, which would be good to hear.


h3. Flattened class hierarchy

Instead of:

{noformat}
SecondaryIndex
 ├── PerRowSecondaryIndex
 └── PerColumnSecondaryIndex
      └── AbstractSimplePerColumnSecondaryIndex
           ├── KeysIndex
           └── CompositesIndex
                ├── CompositesIndexOnX
                ├── CompositesIndexOnY
                └── CompositesIndexOnZ
{noformat}

We just have a single {{Index}} interface, with 2 inner interfaces, {{Indexer}} 
and {{Searcher}}.
The specific differences between indexes on different types of columns in 
composite tables (i.e. all the {{CompositesIndexOnX}} implementations) have 
been abstracted into a set of stateless functions, defined in the 
{{ColumnIndexFunctions}} interface, with implementations for use with the 
various column types. As such, there is now just a single {{Index}} 
implementation for all built-in indexes, {{CassandraIndex}} (I'm not sold on 
this name, but it follows the precedent set by {{CassandraAuthorizer}} and 
{{CassandraRoleManager}}). 
A nice side effect is that {{KEYS}} indexes (for thrift/compact tables and, in 
CASSANDRA-8103, static column indexes) also fit into this pattern, so there's 
no need for another specialisation there. There are still separate searcher 
implementations for {{KEYS}} and {{COMPOSITES}} indexes, but there's a lot more 
commonality between them now (not as a result of this patch; that's an artifact 
of CASSANDRA-8099).

h3. Event driven, partition scoped updates

Instead of delivering updates to an index implementation per-partition (as 
previously with PRSI) or per-cell (PSCI), the write component of the index api 
is more closely aligned to a partition update of the underlying base data.

More specifically, when a partition is updated (either via a regular write, or 
during compaction) a series of events is (or may be) fired. An {{Index}} 
implementation is required to provide an event listener, whose interface is 
defined in {{Index.Indexer}}, to handle these events. The granularity of these 
events maps to a PartitionUpdate, so there are events fired on:
* partition delete
* range tombstone
* row inserted
* row updated
* row removed
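A rough Python rendering of that event-listener shape (the real interface is the Java {{Index.Indexer}}; the method names below paraphrase the bullet list and are not the actual API). A recording implementation shows the granularity:

```python
# Records each indexing event it receives, in order, so the event stream
# for a partition update can be inspected.
class RecordingIndexer:
    def __init__(self):
        self.events = []

    def partition_delete(self, deletion):
        self.events.append(("partition_delete", deletion))

    def range_tombstone(self, tombstone):
        self.events.append(("range_tombstone", tombstone))

    def insert_row(self, row):
        self.events.append(("insert_row", row))

    def update_row(self, old_row, new_row):
        self.events.append(("update_row", new_row))

    def remove_row(self, row):
        self.events.append(("remove_row", row))
```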

h3. Caveats/Missing/TBD/etc

* A major thing missing in this branch is CASSANDRA-7771 (multiple indexes 
per column). Along with that, the plan is also to introduce true per-row 
indexes, where the index is not necessarily linked to *any* specific column. So 
until we start hashing that out a bit better, the way SIM represents the 
collection of Indexes is TBD.
* Related to that, once we've settled on how to define an Index's relationship 
with a Row (moving that out of ColumnDefinition), we can revisit caching & 
lookup optimisation in SIM. Right now, every time we look up an index we do a 
filter of all the registered indexes for the table. We can definitely improve 
this and will do so ASAP.
* The mechanism by which we select indexes at query time remains pretty 
restrictive. The query clauses being represented as a list of 
{{RowFilter.Expression}} means only AND conjunctions are supported. This limits 
the scope for query optimisation and makes it difficult to extend search 
capabilities in the future, like adding support for OR for example. I'd like to 
move to something more expressive to give us scope to improve this area in 
future tickets.
* The validation methods on Index need some work. Basically these were simply 
copied from the existing implementation, but they ought to be reworked to 
combine them into a single {{validate(partition_update)}} or at least into 
{{validate(partitionkey)}} and {{validate(row)}}.
* The index transaction classes in 

[jira] [Created] (CASSANDRA-9901) Make AbstractType.isByteOrderComparable abstract

2015-07-25 Thread Benedict (JIRA)
Benedict created CASSANDRA-9901:
---

 Summary: Make AbstractType.isByteOrderComparable abstract
 Key: CASSANDRA-9901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 3.0.0 rc1


I can't recall _precisely_ what was agreed at the NGCC, but I'm reasonably sure 
we agreed to make this method abstract, put some javadoc explaining we may 
require fields to yield true in the near future, and potentially log a warning 
on startup if a user-defined type returns false.

This should make it into 3.0, IMO, so that we can look into migrating to 
byte-order comparable types in the post-3.0 world.


