[jira] [Commented] (CASSANDRA-8839) DatabaseDescriptor throws NPE when rpc_interface is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328687#comment-14328687 ] Marcus Olsson commented on CASSANDRA-8839:
--
Caused by: https://issues.apache.org/jira/browse/CASSANDRA-8677

DatabaseDescriptor throws NPE when rpc_interface is used
Key: CASSANDRA-8839
URL: https://issues.apache.org/jira/browse/CASSANDRA-8839
Project: Cassandra
Issue Type: Bug
Components: Config
Environment: 2.1.3
Reporter: Jan Kesten

Copy from mail to the dev mailing list. When using
- listen_interface instead of listen_address
- rpc_interface instead of rpc_address
starting 2.1.3 throws an NPE:

{code}
ERROR [main] 2015-02-20 07:50:09,661 DatabaseDescriptor.java:144 - Fatal error during configuration loading
java.lang.NullPointerException: null
	at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:411) ~[apache-cassandra-2.1.3.jar:2.1.3]
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133) ~[apache-cassandra-2.1.3.jar:2.1.3]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:110) [apache-cassandra-2.1.3.jar:2.1.3]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3]
{code}

Occurs in the Debian package as well as in the tar.gz distribution.
{code}
/* Local IP, hostname or interface to bind RPC server to */
if (conf.rpc_address != null && conf.rpc_interface != null)
{
    throw new ConfigurationException("Set rpc_address OR rpc_interface, not both");
}
else if (conf.rpc_address != null)
{
    try
    {
        rpcAddress = InetAddress.getByName(conf.rpc_address);
    }
    catch (UnknownHostException e)
    {
        throw new ConfigurationException("Unknown host in rpc_address " + conf.rpc_address);
    }
}
else if (conf.rpc_interface != null)
{
    listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, "rpc_interface");
}
else
{
    rpcAddress = FBUtilities.getLocalAddress();
}
{code}

I think that listenAddress in the second else-if block is an error. In my case rpc_interface is eth0, so listenAddress gets set and rpcAddress remains unset. The result is an NPE in line 411:

{code}
if (rpcAddress.isAnyLocalAddress())
{code}

After changing rpc_interface to rpc_address everything works as expected.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
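The one-word nature of the mix-up can be shown with a self-contained sketch (the class and field names below are illustrative stand-ins, not the actual DatabaseDescriptor code): when only the rpc_interface option is set, the buggy branch stores the resolved address in listenAddress, so rpcAddress stays null and the later rpcAddress.isAnyLocalAddress() check throws a NullPointerException.

```java
import java.net.InetAddress;

// Minimal reproduction of the reported copy-paste error; names are
// illustrative, not the actual Cassandra fields.
public class RpcConfigBugDemo {
    static InetAddress listenAddress;
    static InetAddress rpcAddress;

    // 'buggy' selects between the 2.1.3 behaviour and the one-word fix.
    static void applyRpcConfig(String rpcInterfaceConf, boolean buggy) {
        // Stand-in for getNetworkInterfaceAddress(conf.rpc_interface, ...)
        InetAddress resolved = InetAddress.getLoopbackAddress();
        if (rpcInterfaceConf != null) {
            if (buggy)
                listenAddress = resolved; // the reported copy-paste error
            else
                rpcAddress = resolved;    // the fix: assign to rpcAddress
        }
    }

    public static void main(String[] args) {
        applyRpcConfig("eth0", true);
        System.out.println(rpcAddress == null); // buggy path leaves it null
        applyRpcConfig("eth0", false);
        System.out.println(rpcAddress == null); // fixed path sets it
    }
}
```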
[jira] [Created] (CASSANDRA-8840) Classify our Assertions (like com.google.base.Preconditions)
Benedict created CASSANDRA-8840:
--
Summary: Classify our Assertions (like com.google.base.Preconditions)
Key: CASSANDRA-8840
URL: https://issues.apache.org/jira/browse/CASSANDRA-8840
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Priority: Minor

I raised this on IRC, then dropped it due to opposition, but it's possible the opposition was due to my conflation of the act of classification with the disabling of some of the assertions. The two aren't wed, and I think it _would_ improve readability significantly by itself, as Ariel reminded me with his use of google's Preconditions class in CASSANDRA-8692. I would prefer to use our own version of this class, that we can force the {{@Inline}} compiler hint onto, so that we have no negative performance implications. Also, we can then introduce a class of data corruption checks, etc. I think this would aid readability, and also permit easier analysis of the codebase via IDE (right now it's very hard to say what data corruption checks we actually perform, for instance). Thoughts?
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
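As a rough illustration of the classification idea (the class name, method names, and check categories below are assumptions for the sake of the sketch, not an agreed design), a project-local Preconditions-style class might look like:

```java
// Hypothetical sketch of a project-local Preconditions-style class.
// Names and categories are illustrative assumptions, not the actual design.
public final class Assertions
{
    private Assertions() {}

    // Plain argument validation, analogous to Guava's checkArgument.
    public static void checkArgument(boolean expression, String message)
    {
        if (!expression)
            throw new IllegalArgumentException(message);
    }

    // A distinct class of check: flags possible on-disk data corruption,
    // so such checks can be found easily via IDE usages of this method.
    public static void checkDataIntegrity(boolean expression, String message)
    {
        if (!expression)
            throw new IllegalStateException("Possible data corruption: " + message);
    }
}
```

Keeping each category in its own method is what makes the IDE analysis mentioned above possible: "find usages" on checkDataIntegrity lists every data-corruption check in the codebase.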
[jira] [Created] (CASSANDRA-8839) DatabaseDescriptor throws NPE when rpc_interface is used
Jan Kesten created CASSANDRA-8839:
--
Summary: DatabaseDescriptor throws NPE when rpc_interface is used
Key: CASSANDRA-8839
URL: https://issues.apache.org/jira/browse/CASSANDRA-8839
Project: Cassandra
Issue Type: Bug
Components: Config
Environment: 2.1.3
Reporter: Jan Kesten

Copy from mail to the dev mailing list. When using
- listen_interface instead of listen_address
- rpc_interface instead of rpc_address
starting 2.1.3 throws an NPE during configuration loading. Occurs in the Debian package as well as in the tar.gz distribution. After changing rpc_interface to rpc_address everything works as expected.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8692) Coalesce intra-cluster network messages
[ https://issues.apache.org/jira/browse/CASSANDRA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328732#comment-14328732 ] Benedict commented on CASSANDRA-8692:
--
[~aweisberg] thanks for clarifying that method call. And I like the use of Preconditions. But I'm not sure the single precondition check requires a whole extra nested method call? I realised when I woke up that the thing that was bugging me was the inconsistency between the first call to drainTo(), which _did_ assume the list was empty (by not subtracting out.size()), and the following calls, which assumed it had not been empty. So, as long as they are consistent I'm happy. Although I still prefer the way it is currently... I just find the extra method for one precondition check a bit ugly. I'll let you make the final call on this though.

Coalesce intra-cluster network messages
Key: CASSANDRA-8692
URL: https://issues.apache.org/jira/browse/CASSANDRA-8692
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
Fix For: 2.1.4
Attachments: batching-benchmark.png

While researching CASSANDRA-8457 we found that it is effective and can be done without introducing additional latency at low concurrency/throughput. The patch from that ticket was used and found to be useful in a real-life scenario, so I propose we implement this in 2.1 in addition to 3.0. The change set is a single file and is small enough to be reviewable.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8839) DatabaseDescriptor throws NPE when rpc_interface is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328733#comment-14328733 ] Sylvain Lebresne commented on CASSANDRA-8839:
--
Probably just a copy-paste typo from the original patch that is easily fixed, but it does suggest the original patch hasn't been properly tested, so let's make sure we add a test for this.

DatabaseDescriptor throws NPE when rpc_interface is used
Key: CASSANDRA-8839
URL: https://issues.apache.org/jira/browse/CASSANDRA-8839
Project: Cassandra
Issue Type: Bug
Components: Config
Environment: 2.1.3
Reporter: Jan Kesten
Fix For: 2.1.4
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8839) DatabaseDescriptor throws NPE when rpc_interface is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8839:
--
Assignee: Ariel Weisberg

DatabaseDescriptor throws NPE when rpc_interface is used
Key: CASSANDRA-8839
URL: https://issues.apache.org/jira/browse/CASSANDRA-8839
Project: Cassandra
Issue Type: Bug
Components: Config
Environment: 2.1.3
Reporter: Jan Kesten
Assignee: Ariel Weisberg
Fix For: 2.1.4
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8839) DatabaseDescriptor throws NPE when rpc_interface is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-8839:
--
Fix Version/s: 2.1.4

DatabaseDescriptor throws NPE when rpc_interface is used
Key: CASSANDRA-8839
URL: https://issues.apache.org/jira/browse/CASSANDRA-8839
Project: Cassandra
Issue Type: Bug
Components: Config
Environment: 2.1.3
Reporter: Jan Kesten
Fix For: 2.1.4
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8677) rpc_interface and listen_interface generate NPE on startup when specified interface doesn't exist
[ https://issues.apache.org/jira/browse/CASSANDRA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328728#comment-14328728 ] Sylvain Lebresne commented on CASSANDRA-8677:
--
Note: we'll fix that in CASSANDRA-8839.

rpc_interface and listen_interface generate NPE on startup when specified interface doesn't exist
Key: CASSANDRA-8677
URL: https://issues.apache.org/jira/browse/CASSANDRA-8677
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
Fix For: 3.0, 2.1.3
Attachments: 8677-2.1.patch, 8677.patch

This is just a buggy UI bit. Initially the error I got was this, which is redundant and not well formatted.

{noformat}
ERROR 20:12:55 Exception encountered during startup
java.lang.ExceptionInInitializerError: null
Fatal configuration error; unable to start. See log for stacktrace.
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:108) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:122) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:479) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:571) [main/:na]
java.lang.ExceptionInInitializerError: null
Fatal configuration error; unable to start. See log for stacktrace.
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:108)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:122)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:479)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:571)
Exception encountered during startup: null
Fatal configuration error; unable to start. See log for stacktrace.
ERROR 20:12:55 Exception encountered during startup
java.lang.ExceptionInInitializerError: null
Fatal configuration error; unable to start. See log for stacktrace.
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:108) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:122) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:479) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:571) [main/:na]
{noformat}

This has no description of the error that occurred. After logging the exception:

{noformat}
java.lang.NullPointerException: null
	at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:347) ~[main/:na]
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:102) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:122) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:479) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:571) [main/:na]
{noformat}

Exceptions thrown in the DatabaseDescriptor should log in a useful way. This particular error should generate a message without a stack trace since it is easily recognized.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8839) DatabaseDescriptor throws NPE when rpc_interface is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328729#comment-14328729 ] Jan Kesten commented on CASSANDRA-8839:
--
I checked out the 2.1.2 source; it seems to be a copy-paste error made while applying getNetworkInterfaceAddress from CASSANDRA-8677:

{code:title=line387}
listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, "rpc_interface");
{code}

{code:title=line350}
listenAddress = getNetworkInterfaceAddress(conf.listen_interface, "listen_interface");
{code}

DatabaseDescriptor throws NPE when rpc_interface is used
Key: CASSANDRA-8839
URL: https://issues.apache.org/jira/browse/CASSANDRA-8839
Project: Cassandra
Issue Type: Bug
Components: Config
Environment: 2.1.3
Reporter: Jan Kesten
Fix For: 2.1.4
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-6538) Provide a read-time CQL function to display the data size of columns and rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anuja Mandlecha updated CASSANDRA-6538: --- Attachment: CodeSnippet.txt Provide a read-time CQL function to display the data size of columns and rows - Key: CASSANDRA-6538 URL: https://issues.apache.org/jira/browse/CASSANDRA-6538 Project: Cassandra Issue Type: Improvement Reporter: Johnny Miller Priority: Minor Labels: cql Attachments: 6538.patch, CodeSnippet.txt, sizeFzt.PNG It would be extremely useful to be able to work out the size of rows and columns via CQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328898#comment-14328898 ] Alan Boudreault commented on CASSANDRA-8366:
--
[~krummas] The patch looks good to me. I agree that there is still a slight data overhead, but it is a huge improvement. I've written a dtest for this ticket: https://github.com/riptano/cassandra-dtest/pull/171 and here are the results:

Result without the patch (notice the execution time):
{code}
dtest: DEBUG: Total Load size: 10.96GB
- end captured logging -
--
Ran 1 test in 5999.675s

FAILED (failures=1)
{code}

Result with the patch (the test tolerates 25% of data overhead, so a maximum of 5.5GB):
{code}
--
Ran 1 test in 2313.278s

OK
{code}

Running repairs with the patch is a lot faster. Is there already a ticket about the sstables that will not get anticompacted? Thanks for the work!

Repair grows data on nodes, causes load to become unbalanced
Key: CASSANDRA-8366
URL: https://issues.apache.org/jira/browse/CASSANDRA-8366
Project: Cassandra
Issue Type: Bug
Environment: 4 node cluster, 2.1.2 Cassandra; inserts and reads are done with the CQL driver
Reporter: Jan Karlsson
Assignee: Marcus Eriksson
Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh

There seems to be something weird going on when repairing data. I have a program that runs for 2 hours, inserting 250 random numbers and reading 250 times per second. It creates 2 keyspaces with SimpleStrategy and an RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up a lot more.
I saw the load shoot up to 8 times the original size multiple times with incremental repair (from 2G to 16G) with nodes 9, 8, 7 and 6. The repro procedure looked like this (note that running full repair first is not a requirement to reproduce):

{noformat}
After 2 hours of 250 reads + 250 writes per second:
UN  9  583.39 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  584.01 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  583.72 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  583.84 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

Repair -pr -par on all nodes sequentially
UN  9  746.29 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  751.02 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  748.89 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.34 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

repair -inc -par on all nodes sequentially
UN  9  2.41 GB   256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.53 GB   256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.6 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  2.17 GB   256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

after rolling restart
UN  9  1.47 GB   256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  1.5 GB    256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.46 GB   256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.19 GB   256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

compact all nodes sequentially
UN  9  989.99 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  994.75 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  1.46 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.82 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

repair -inc -par on all nodes sequentially
UN  9  1.98 GB   256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.3 GB    256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  3.71 GB   256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.68 GB   256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

restart once more
UN  9  2 GB      256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.05 GB   256  ?
{noformat}
[jira] [Commented] (CASSANDRA-6538) Provide a read-time CQL function to display the data size of columns and rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328895#comment-14328895 ] Anuja Mandlecha commented on CASSANDRA-6538:
--
I looked into the code, and as per your requirement of allowing only blob types as input to sizeof(), we will have to check the datatype of providedArgs against BytesType (which is used for BLOB) using the condition:

{code}
if (fun.name().equalsIgnoreCase("sizeof") && !args.get(0).getType().equals(BytesType.instance))
    throw new InvalidRequestException(String.format("Type error: %s cannot be passed as argument 0 of function %s of type blob", args.get(0), fun.name()));
{code}

But as we can see in the attached CodeSnippet.txt, to apply the same condition in validateTypes() where the other validations are done, we need getType() of providedArgs, which we cannot get there since it is an object of a class that implements the interface AssignementTestable. Hence, to resolve this there are two approaches:
1. Add one more argument of type Function to isAssignable() and check for the function name and valid types, returning a value accordingly.
2. Use the if condition before calling validateTypes() (i.e. in makeSelector() of the Selection class).
Note: Approach 1 can result in a lot of code changes since the function declaration changes. Please let me know your thoughts on this.

Provide a read-time CQL function to display the data size of columns and rows
Key: CASSANDRA-6538
URL: https://issues.apache.org/jira/browse/CASSANDRA-6538
Project: Cassandra
Issue Type: Improvement
Reporter: Johnny Miller
Priority: Minor
Labels: cql
Attachments: 6538.patch, sizeFzt.PNG

It would be extremely useful to be able to work out the size of rows and columns via CQL.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6538) Provide a read-time CQL function to display the data size of columns and rows
[ https://issues.apache.org/jira/browse/CASSANDRA-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328900#comment-14328900 ] Sylvain Lebresne commented on CASSANDRA-6538:
--
bq. we will have to check the datatype of providedArgs against BytesType

Well, no. As long as {{BytesType}} is declared as the type of the argument, the type system ensures it will only be called on that. Really, what I'm suggesting is simply that you remove the change to {{Selection}} from your patch. Actually, that and the fact that you shouldn't use {{bb.array().length}} (as this returns the length of the array backing the ByteBuffer, but 1) that's *not* in general the same as the length of the ByteBuffer and 2) it's assuming that the ByteBuffer is backed by an array, which is not an assumption it should make). Instead, just use {{ByteBuffer.remaining()}}.

Provide a read-time CQL function to display the data size of columns and rows
Key: CASSANDRA-6538
URL: https://issues.apache.org/jira/browse/CASSANDRA-6538
Project: Cassandra
Issue Type: Improvement
Reporter: Johnny Miller
Priority: Minor
Labels: cql
Attachments: 6538.patch, CodeSnippet.txt, sizeFzt.PNG

It would be extremely useful to be able to work out the size of rows and columns via CQL.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
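The {{bb.array().length}} pitfall is easy to demonstrate with a plain sliced buffer (a standalone illustration, not code from the patch): array() returns the whole backing array, while remaining() gives the bytes actually visible through this buffer.

```java
import java.nio.ByteBuffer;

// Shows why bb.array().length is not the size of a buffer's contents.
public class RemainingDemo {
    public static void main(String[] args) {
        byte[] backing = new byte[16];
        // A 4-byte view into the middle of the 16-byte backing array.
        ByteBuffer bb = ByteBuffer.wrap(backing, 4, 4).slice();
        System.out.println(bb.array().length); // 16: the whole backing array
        System.out.println(bb.remaining());    // 4: this buffer's actual length
    }
}
```

And for a direct (off-heap) ByteBuffer, hasArray() is false and array() throws, which is the second problem noted above.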
[jira] [Created] (CASSANDRA-8841) single_file_split_test fails on 2.1
Alan Boudreault created CASSANDRA-8841:
--
Summary: single_file_split_test fails on 2.1
Key: CASSANDRA-8841
URL: https://issues.apache.org/jira/browse/CASSANDRA-8841
Project: Cassandra
Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Marcus Eriksson
Priority: Minor
Fix For: 2.1.4

In CASSANDRA-8623, we fixed an issue about getting "Data component is missing" errors when splitting multiple sstables one at a time. I wrote a dtest for that, which works properly to test that error. However, it seems that the CompactionExecutor is failing. It's not the same error, but it looks related.

Test: https://github.com/riptano/cassandra-dtest/blob/master/sstablesplit_test.py#L68
Output: http://cassci.datastax.com/job/cassandra-2.1_dtest/726/testReport/junit/sstablesplit_test/TestSSTableSplit/single_file_split_test/
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8655) Exception on upgrade to trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329029#comment-14329029 ] Philip Thompson commented on CASSANDRA-8655:
--
I don't know. The test no longer makes it that far because of a different error. I'll have to open another ticket.

Exception on upgrade to trunk
Key: CASSANDRA-8655
URL: https://issues.apache.org/jira/browse/CASSANDRA-8655
Project: Cassandra
Issue Type: Bug
Reporter: Philip Thompson
Assignee: Aleksey Yeschenko
Fix For: 3.0

The dtest upgrade_through_versions_test.TestUpgrade_from_cassandra_2_1_latest_tag_to_trunk_HEAD.upgrade_test_mixed is failing with the following exception:

{code}
ERROR [Thread-10] 2015-01-20 14:12:44,117 CassandraDaemon.java:170 - Exception in thread Thread[Thread-10,5,main]
java.lang.NullPointerException: null
	at org.apache.cassandra.db.SliceFromReadCommandSerializer.deserialize(SliceFromReadCommand.java:153) ~[main/:na]
	at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:157) ~[main/:na]
	at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:131) ~[main/:na]
	at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[main/:na]
	at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[main/:na]
	at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[main/:na]
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[main/:na]
{code}

It is trying to execute a simple SELECT k,v FROM cf WHERE k=X query on a trunk node after upgrading from 2.1-HEAD.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8692) Coalesce intra-cluster network messages
[ https://issues.apache.org/jira/browse/CASSANDRA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329736#comment-14329736 ] Ariel Weisberg commented on CASSANDRA-8692: --- You might want to verify what workloads and cluster configurations you are going to run. Not all are going to benefit as much, or at all, from this change, and the number of configurations that benefit will increase after CASSANDRA-8789. Just as important is that it doesn't cause a regression in workloads that don't benefit. Coalesce intra-cluster network messages --- Key: CASSANDRA-8692 URL: https://issues.apache.org/jira/browse/CASSANDRA-8692 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.4 Attachments: batching-benchmark.png While researching CASSANDRA-8457 we found that coalescing is effective and can be done without introducing additional latency at low concurrency/throughput. The patch from that ticket was used and found to be useful in a real-life scenario, so I propose we implement this in 2.1 in addition to 3.0. The change set is a single file and is small enough to be reviewable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
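The core idea behind the ticket, letting messages that arrive close together share one network write, can be sketched as below. This is an illustrative batching loop under assumed names (`coalesce`, `maxBatch`, `windowMicros`), not Cassandra's actual patch:

```java
import java.util.*;
import java.util.concurrent.*;

public class Coalescer {
    // Block for the first message, then wait up to a short window for more
    // messages to arrive so they can be flushed as a single network write.
    static <T> List<T> coalesce(BlockingQueue<T> q, int maxBatch, long windowMicros)
            throws InterruptedException {
        List<T> batch = new ArrayList<>();
        batch.add(q.take());                         // block for the first message
        long deadline = System.nanoTime() + windowMicros * 1_000;
        while (batch.size() < maxBatch) {
            long waitNanos = deadline - System.nanoTime();
            if (waitNanos <= 0) break;               // window expired
            T next = q.poll(waitNanos, TimeUnit.NANOSECONDS);
            if (next == null) break;                 // nothing more arrived in time
            batch.add(next);
        }
        return batch;                                // flush as one write
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> q = new LinkedBlockingQueue<>(List.of("m1", "m2", "m3"));
        System.out.println(coalesce(q, 2, 100));     // prints [m1, m2]
    }
}
```

The regression risk Ariel mentions falls out of the window: on workloads with sparse traffic, the `poll` timeout is pure added latency with no batching gain.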
[jira] [Commented] (CASSANDRA-8836) Factor out CRC32Ex into a separate maven dependency
[ https://issues.apache.org/jira/browse/CASSANDRA-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329620#comment-14329620 ] T Jake Luciani commented on CASSANDRA-8836: --- Opened https://issues.sonatype.org/browse/OSSRH-13972 Factor out CRC32Ex into a separate maven dependency --- Key: CASSANDRA-8836 URL: https://issues.apache.org/jira/browse/CASSANDRA-8836 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: T Jake Luciani Fix For: 3.0 Attachments: 8836.patch The current arrangement works from the CLI, but is inconvenient for developers using Java 7 from an IDE. They have to configure the override class the way build.xml does when compiling. If we refactored http://pastebin.com/Z5NAEhzr and the interface it needs to compile http://pastebin.com/tCEvuETA into a separate maven dependency and removed CRC32Ex from CRC32Factory, it wouldn't trip up IDEs. They would just add all the jars under lib and move on with life. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8769) Extend cassandra-stress to be slightly more configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329637#comment-14329637 ] Anthony Cozzie commented on CASSANDRA-8769: --- Attached a second patch which includes [~benedict]'s tweaks and requested changes. Extend cassandra-stress to be slightly more configurable Key: CASSANDRA-8769 URL: https://issues.apache.org/jira/browse/CASSANDRA-8769 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Anthony Cozzie Assignee: Anthony Cozzie Priority: Minor Fix For: 2.1.4 Attachments: stress-extensions-patch-v2.txt, stress-extensions-patch.txt Some simple extensions to cassandra stress: * Configurable warm up iterations * Output results by command type for USER (e.g. 5000 ops/sec, 1000 inserts, 1000 reads, 3000 range reads) * Count errors when ignore flag is set * Configurable truncate for more consistent results Patch attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8769) Extend cassandra-stress to be slightly more configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie updated CASSANDRA-8769: -- Attachment: stress-extensions-patch-v2.txt Extend cassandra-stress to be slightly more configurable Key: CASSANDRA-8769 URL: https://issues.apache.org/jira/browse/CASSANDRA-8769 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Anthony Cozzie Assignee: Anthony Cozzie Priority: Minor Fix For: 2.1.4 Attachments: stress-extensions-patch-v2.txt, stress-extensions-patch.txt Some simple extensions to cassandra stress: * Configurable warm up iterations * Output results by command type for USER (e.g. 5000 ops/sec, 1000 inserts, 1000 reads, 3000 range reads) * Count errors when ignore flag is set * Configurable truncate for more consistent results Patch attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8843) STCS getMaximalTask checking wrong set is empty.
Jeremiah Jordan created CASSANDRA-8843: -- Summary: STCS getMaximalTask checking wrong set is empty. Key: CASSANDRA-8843 URL: https://issues.apache.org/jira/browse/CASSANDRA-8843 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jeremiah Jordan Assignee: Marcus Eriksson Priority: Minor Pretty sure this should be isEmpty(filteredSSTables) not isEmpty(sstables). https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java#L333 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
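The suspected one-line fix can be illustrated with a simplified, hypothetical version of the check (the real method is `SizeTieredCompactionStrategy.getMaximalTask`; the names and types below are illustrative, not the actual code):

```java
import java.util.*;
import java.util.stream.*;

public class MaximalTaskCheck {
    // Filter the candidate sstables first, then test the *filtered* set for
    // emptiness. Testing the unfiltered set (the reported bug) lets an empty
    // filtered list slip through when every sstable is already compacting.
    static List<String> maximalCandidates(List<String> sstables, Set<String> compacting) {
        List<String> filtered = sstables.stream()
                .filter(s -> !compacting.contains(s))
                .collect(Collectors.toList());
        if (filtered.isEmpty())   // fix: check filtered, not sstables
            return null;          // nothing eligible to compact
        return filtered;
    }

    public static void main(String[] args) {
        // All sstables are already being compacted: the buggy check
        // (sstables.isEmpty()) would wrongly proceed with an empty list.
        List<String> all = Arrays.asList("a-Data.db", "b-Data.db");
        Set<String> compacting = new HashSet<>(all);
        System.out.println(maximalCandidates(all, compacting)); // prints "null"
    }
}
```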
cassandra git commit: Make = optional in CREATE/ALTER ROLE statements
Repository: cassandra Updated Branches: refs/heads/trunk 9fdbb6559 - ee5e48795 Make = optional in CREATE/ALTER ROLE statements Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ee5e4879 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ee5e4879 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ee5e4879 Branch: refs/heads/trunk Commit: ee5e48795583f525dca50d3535167f26cc0bc8ca Parents: 9fdbb65 Author: Sam Tunnicliffe s...@beobal.com Authored: Fri Feb 6 19:06:03 2015 + Committer: Aleksey Yeschenko alek...@apache.org Committed: Fri Feb 20 14:43:54 2015 -0800 -- CHANGES.txt | 2 +- pylib/cqlshlib/cql3handling.py | 4 ++-- src/java/org/apache/cassandra/cql3/Cql.g | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ee5e4879/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 07f3448..6a3d059 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,5 +1,5 @@ 3.0 - * Add role based access control (CASSANDRA-7653, 8650, 7216) + * Add role based access control (CASSANDRA-7653, 8650, 7216, 8760) * Avoid accessing partitioner through StorageProxy (CASSANDRA-8244, 8268) * Upgrade Metrics library and remove depricated metrics (CASSANDRA-5657) * Serializing Row cache alternative, fully off heap (CASSANDRA-7438) http://git-wip-us.apache.org/repos/asf/cassandra/blob/ee5e4879/pylib/cqlshlib/cql3handling.py -- diff --git a/pylib/cqlshlib/cql3handling.py b/pylib/cqlshlib/cql3handling.py index 9c4e633..6837439 100644 --- a/pylib/cqlshlib/cql3handling.py +++ b/pylib/cqlshlib/cql3handling.py @@ -1191,8 +1191,8 @@ syntax_rules += r''' ( SUPERUSER | NOSUPERUSER )? ( LOGIN | NOLOGIN )? ; -roleProperty ::= PASSWORD stringLiteral - | OPTIONS mapLiteral +roleProperty ::= PASSWORD =? stringLiteral + | OPTIONS =? 
mapLiteral ; dropRoleStatement ::= DROP ROLE rolename http://git-wip-us.apache.org/repos/asf/cassandra/blob/ee5e4879/src/java/org/apache/cassandra/cql3/Cql.g -- diff --git a/src/java/org/apache/cassandra/cql3/Cql.g b/src/java/org/apache/cassandra/cql3/Cql.g index 5d5c868..d941bc6 100644 --- a/src/java/org/apache/cassandra/cql3/Cql.g +++ b/src/java/org/apache/cassandra/cql3/Cql.g @@ -1023,8 +1023,8 @@ roleOptions[RoleOptions opts] ; roleOption[RoleOptions opts] -: k=K_PASSWORD v=STRING_LITERAL { opts.put($k.text, $v.text); } -| k=K_OPTIONS m=mapLiteral { opts.put(IRoleManager.Option.OPTIONS.name(), convertPropertyMap(m)); } +: k=K_PASSWORD '='? v=STRING_LITERAL { opts.put($k.text, $v.text); } +| k=K_OPTIONS '='? m=mapLiteral { opts.put(IRoleManager.Option.OPTIONS.name(), convertPropertyMap(m)); } ; /** DEFINITIONS **/
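With the change applied, both forms below should parse identically. This is a hedged illustration assuming the role syntax on trunk at the time; the role name and passwords are made up:

```sql
-- '=' supplied (the previously required form)
CREATE ROLE alice WITH PASSWORD = 'pw1';
-- '=' omitted (now also accepted by the relaxed grammar)
ALTER ROLE alice WITH PASSWORD 'pw2';
```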
[jira] [Created] (CASSANDRA-8844) Change Data Capture (CDC)
Tupshin Harper created CASSANDRA-8844: - Summary: Change Data Capture (CDC) Key: CASSANDRA-8844 URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tupshin Harper Fix For: 3.1

"In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. Also, Change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources." - Wikipedia

As Cassandra is increasingly being used as the Source of Record (SoR) for mission-critical data in large enterprises, it is increasingly being called upon to act as the central hub of traffic and data flow to other systems. In order to try to address the general need, we (cc [~brianmhess]) propose implementing a simple data logging mechanism to enable per-table CDC patterns.

h2. The goals:
# Use CQL as the primary ingestion mechanism, in order to leverage its Consistency Level semantics, and in order to treat it as the single reliable/durable SoR for the data.
# To provide a mechanism for implementing good and reliable (deliver-at-least-once, with possible mechanisms for deliver-exactly-once) continuous semi-realtime feeds of mutations going into a Cassandra cluster.
# To eliminate the developmental and operational burden on users so that they don't have to do dual writes to other systems.
# For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding.

The mechanism: We propose a durable logging mechanism that functions similarly to a commitlog, with the following nuances:
- Takes place on every node, not just the coordinator, so RF copies are logged.
- Separate log per table.
- Per-table configuration. Only tables that are specified as CDC_LOG would do any logging.
- Per DC. We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs that are being replicated to.
- In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions.
- Be written in a Row-centric manner such that it is easy for consumers to reconstitute rows atomically.
- Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages.

h2. Nice-to-haves
I strongly suspect that the following features will be asked for, but I also believe that they can be deferred for a subsequent release, and to gauge actual interest.
- Multiple logs per table. This would make it easy to have multiple subscribers to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer.
- Log filtering. Being able to apply filters, including UDF-based filters, would make Cassandra a much more versatile feeder into other systems, and again reduce complexity that would otherwise need to be built into the daemons.

h2. Format and Consumption
- Cassandra would only write to the CDC log, and never delete from it.
- Cleaning up consumed logfiles would be the client daemon's responsibility.
- Logfile size should probably be configurable.
- Logfiles should be named with a predictable naming schema, making it trivial to process them in order.
- Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory.
- A sophisticated daemon should be able to be written that could
-- Catch up, in written order, even when it is multiple logfiles behind in processing
-- Continuously tail the most recent logfile and get low-latency (ms?) access to the data as it is written.

h2. Alternate approach
In order to make consuming a change log easy and efficient to do with low latency, the following could supplement the approach outlined above:
- Instead of writing to a logfile, by default, Cassandra could expose a socket for a daemon to connect to, and from which it could pull each row.
- Cassandra would have a limited buffer for storing rows; should the listener become backlogged, it would immediately spill to disk, never incurring large in-memory costs.

h2. Additional consumption possibility
With all of the above, still relevant:
- Instead of (or in addition to) using the other logging mechanisms, use CQL transport itself as a logger.
- Extend the CQL protocol slightly so
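The checkpoint-and-resume behavior the proposal asks of consumer daemons could look something like this. A hedged, illustrative fragment only: the class, file names, and checkpoint format are all made up, and a real daemon would hand each row to a downstream system rather than print it:

```java
import java.io.*;
import java.nio.file.*;

public class CdcTailer {
    // The consumer records its byte offset in a side file inside the CDC log
    // directory, so a restart resumes exactly where the previous run left off.
    static long readCheckpoint(Path ckpt) throws IOException {
        return Files.exists(ckpt) ? Long.parseLong(Files.readString(ckpt).trim()) : 0L;
    }

    static void consume(Path log, Path ckpt) throws IOException {
        long offset = readCheckpoint(ckpt);
        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "r")) {
            raf.seek(offset);                         // skip already-consumed rows
            String line;
            while ((line = raf.readLine()) != null)
                System.out.println("consumed: " + line); // hand off downstream here
            offset = raf.getFilePointer();
        }
        Files.writeString(ckpt, Long.toString(offset)); // checkpoint after hand-off
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cdc");
        Path log = dir.resolve("table-cdc.log"), ckpt = dir.resolve("table-cdc.ckpt");
        Files.writeString(log, "row1\nrow2\n");
        consume(log, ckpt);                             // consumes row1, row2
        Files.writeString(log, "row1\nrow2\nrow3\n");   // the log grows
        consume(log, ckpt);                             // consumes only row3
    }
}
```

Checkpointing after the hand-off, rather than before, is what gives the deliver-at-least-once guarantee the proposal names: a crash between hand-off and checkpoint re-delivers, never drops.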
[jira] [Updated] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-7296: --- Assignee: Jeff Jirsa Add CL.COORDINATOR_ONLY --- Key: CASSANDRA-7296 URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 Project: Cassandra Issue Type: Improvement Reporter: Tupshin Harper Assignee: Jeff Jirsa For reasons such as CASSANDRA-6340 and similar, it would be nice to have a read that never gets distributed, and only works if the coordinator you are talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-5791) A nodetool command to validate all sstables in a node
[ https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328280#comment-14328280 ] Jeff Jirsa edited comment on CASSANDRA-5791 at 2/21/15 4:53 AM: Thanks for the feedback. On whether or not a missing digest indicates corruption: in the case of a missing digest, does it make more sense to imply --extended and verify atoms? Doing that at least verifies the inline checksums for compressed sstables. Most of the remaining nits are 100% valid, and due to me basing this on the scrub path without eliminating all of the obsolete code. Cleaning up to address. Only nit that seems inconsistent: sstableverify.bat's ability to specify CASSANDRA_MAIN is consistent with other similar tools (sstablescrub, sstableupgrade, sstableloader, sstablekeys). Updated for nits: https://github.com/jeffjirsa/cassandra/compare/cassandra-5791 and/or https://github.com/jeffjirsa/cassandra/compare/cassandra-5791.diff A nodetool command to validate all sstables in a node - Key: CASSANDRA-5791 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791 Project: Cassandra Issue Type: New Feature Components: Core Reporter: sankalp kohli Assignee: Jeff Jirsa Priority: Minor Attachments: cassandra-5791.patch-2 Currently there is no nodetool command to validate all sstables on disk.
The only way to do this is to run a repair and see if it succeeds, but we cannot repair the system keyspace. We can also run upgradesstables, but that rewrites all the sstables. This command should check the hash of all sstables and return whether all data is readable or not. This should NOT care about consistency. The compressed sstables do not have a hash, so I'm not sure how it will work there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
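The "check the hash of all sstables" idea boils down to recomputing a file digest and comparing it against the stored one. An illustrative sketch only: Cassandra's real digest and checksum formats vary by version and component, so the algorithm and file names here are assumptions:

```java
import java.io.IOException;
import java.nio.file.*;
import java.security.*;

public class DigestCheck {
    // Hash the data file and render the digest as lowercase hex.
    static String sha256Hex(Path p) throws IOException, NoSuchAlgorithmException {
        byte[] h = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p));
        StringBuilder sb = new StringBuilder();
        for (byte b : h) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Compare the recomputed digest against the one stored alongside the file.
    static boolean verify(Path data, Path digestFile) throws Exception {
        String expected = Files.readString(digestFile).trim();
        return sha256Hex(data).equals(expected);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("verify");
        Path data = dir.resolve("x-Data.db"), digest = dir.resolve("x-Digest.txt");
        Files.write(data, "sstable bytes".getBytes());
        Files.writeString(digest, sha256Hex(data));
        System.out.println(verify(data, digest));  // true: file matches digest
        Files.write(data, "corrupted".getBytes()); // simulate bit rot
        System.out.println(verify(data, digest));  // false: mismatch detected
    }
}
```

Note this checks readability and integrity of what is on disk, nothing more, matching the ticket's "should NOT care about consistency."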
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329910#comment-14329910 ] Aleksey Yeschenko commented on CASSANDRA-8844: -- So, ultimately you need a way to implement per-table replication-event synchronous listeners. Like triggers, but without the ability to make additional changes, and invoked on every receiving node, not just the coordinator. That has been requested many times before and is ultimately doable. Providing anything but the API itself is arguably out of scope for Cassandra-proper, though. Change Data Capture (CDC) - Key: CASSANDRA-8844 URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tupshin Harper Fix For: 3.1
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329916#comment-14329916 ] Jeff Jirsa commented on CASSANDRA-7296: --- Untested patch available at https://github.com/jeffjirsa/cassandra/compare/cassandra-7296.diff CQLSH requires an updated python driver; patch at https://github.com/jeffjirsa/python-driver/compare/coordinator-only.diff I have no idea if there's interest in actually merging this (that is, if the project actually wants CL.COORDINATOR_ONLY). I can see use cases where people might want it. I'm not sure if it's worth the added complexity on the project. If someone confirms there's interest, I'll do more thorough testing. Add CL.COORDINATOR_ONLY --- Key: CASSANDRA-7296 URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 Project: Cassandra Issue Type: Improvement Reporter: Tupshin Harper For reasons such as CASSANDRA-6340 and similar, it would be nice to have a read that never gets distributed, and only works if the coordinator you are talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-8844: -- Description: In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. Also, Change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources. -Wikipedia As Cassandra is increasingly being used as the Source of Record (SoR) for mission critical data in large enterprises, it is increasingly being called upon to act as the central hub of traffic and data flow to other systems. In order to try to address the general need, we (cc [~brianmhess]), propose implementing a simple data logging mechanism to enable per-table CDC patterns. h2. The goals: # Use CQL as the primary ingestion mechanism, in order to leverage its Consistency Level semantics, and in order to treat it as the single reliable/durable SoR for the data. # To provide a mechanism for implementing good and reliable (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) continuous semi-realtime feeds of mutations going into a Cassandra cluster. # To eliminate the developmental and operational burden of users so that they don't have to do dual writes to other systems. # For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding. The mechanism: We propose a durable logging mechanism that functions similar to a commitlog, with the following nuances: - Takes place on every node, not just the coordinator, so RF number of copies are logged. - Separate log per table. - Per-table configuration. Only tables that are specified as CDC_LOG would do any logging. - Per DC. 
We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs that are being replicated to.
- In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions.
- Be written in a Row-centric manner such that it is easy for consumers to reconstitute rows atomically.
- Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages.

h2. Nice-to-haves
I strongly suspect that the following features will be asked for, but I also believe that they can be deferred for a subsequent release, and to gauge actual interest.
- Multiple logs per table. This would make it easy to have multiple subscribers to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer.
- Log filtering. Being able to apply filters, including UDF-based filters, would make Cassandra a much more versatile feeder into other systems, and again reduce complexity that would otherwise need to be built into the daemons.

h2. Format and Consumption
- Cassandra would only write to the CDC log, and never delete from it.
- Cleaning up consumed logfiles would be the client daemon's responsibility.
- Logfile size should probably be configurable.
- Logfiles should be named with a predictable naming schema, making it trivial to process them in order.
- Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory.
- A sophisticated daemon should be able to be written that could
-- Catch up, in written order, even when it is multiple logfiles behind in processing
-- Continuously tail the most recent logfile and get low-latency (ms?) access to the data as it is written.

h2. Alternate approach
In order to make consuming a change log easy and efficient to do with low latency, the following could supplement the approach outlined above:
- Instead of writing to a logfile, by default, Cassandra could expose a socket for a daemon to connect to, and from which it could pull each row.
- Cassandra would have a limited buffer for storing rows; should the listener become backlogged, it would immediately spill to disk, never incurring large in-memory costs.

h2. Additional consumption possibility
With all of the above, still relevant:
- Instead of (or in addition to) using the other logging mechanisms, use CQL transport itself as a logger.
- Extend the CQL protocol slightly so that rows of data can be returned to a listener that didn't explicitly make a query, but instead registered itself with Cassandra as a listener for a particular event type,
[jira] [Commented] (CASSANDRA-8832) SSTableRewriter.abort() should be more robust to failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329877#comment-14329877 ] Alan Boudreault commented on CASSANDRA-8832: With latest branch cassandra-2.1, I was getting error: {code} Aborted cleaning up atleast one column family in keyspace r1, check server logs for more information. ... {code} I confirm that this patch fixes my issue. Thanks! SSTableRewriter.abort() should be more robust to failure Key: CASSANDRA-8832 URL: https://issues.apache.org/jira/browse/CASSANDRA-8832 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.4 This fixes a bug introduced in CASSANDRA-8124 that attempts to open early during abort, introducing a failure risk. This patch further preempts CASSANDRA-8690 to wrap every rollback action in a try/catch block, so that any internal assertion checks do not actually worsen the state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8160) CF level option to call posix_fadvise for sstables on creation and startup
[ https://issues.apache.org/jira/browse/CASSANDRA-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329875#comment-14329875 ] Matt Stump commented on CASSANDRA-8160: --- For several large low-latency deployments the OS wasn't aggressively moving SSTables into the cache despite available memory. In these instances the data set was smaller than available memory and the cluster was sized for the read ops/second. Forcing the SSTables into the buffer cache by cat-ing them out to /dev/null had an immediate impact on the 95% latency metric. This was an effort to force the OS to aggressively pre-cache. I was experimenting with FADV_SEQUENTIAL vs WILLNEED because the documentation and implementation differences between releases are unclear. I'm leaving the patch here in its current state because I get pulled away often and haven't had the time to finish the necessary testing to justify its inclusion. I'll try to finish the work if I can. CF level option to call posix_fadvise for sstables on creation and startup -- Key: CASSANDRA-8160 URL: https://issues.apache.org/jira/browse/CASSANDRA-8160 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Matt Stump Assignee: Branimir Lambov Priority: Minor Fix For: 2.1.4 Attachments: trunk-8160.txt We should have a CF level configuration which will result in posix_fadvise being called for sstables for that CF. It should be called on node startup and for new sstables. This should be configurable per CF to allow for some CFs to be prioritized above others. Not sure if we should use POSIX_FADV_SEQUENTIAL or POSIX_FADV_WILLNEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
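The workaround Matt describes, cat-ing sstables to /dev/null, works because a full sequential read pulls the file into the OS page cache. A minimal sketch of that effect in plain Java (illustrative names; the actual patch would instead call posix_fadvise with POSIX_FADV_WILLNEED through native bindings, which plain Java cannot do directly):

```java
import java.io.*;
import java.nio.file.*;

public class CacheWarmer {
    // Sequentially read a file and discard the bytes, the in-process
    // equivalent of `cat file > /dev/null`. The side effect we want is the
    // OS loading the file into the page cache; the return value just
    // confirms how many bytes were touched.
    static long warm(Path p) throws IOException {
        long total = 0;
        byte[] buf = new byte[1 << 16];
        try (InputStream in = new BufferedInputStream(Files.newInputStream(p))) {
            int n;
            while ((n = in.read(buf)) != -1)
                total += n;                     // bytes discarded, cache warmed
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("sstable", ".db");
        Files.write(f, new byte[4096]);
        System.out.println(warm(f));            // prints 4096
    }
}
```

posix_fadvise(WILLNEED) achieves the same pre-caching without burning a reader thread on the full file, which is why the ticket prefers it over the cat trick.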
[jira] [Commented] (CASSANDRA-8835) 100% CPU spikes from disruptor thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329950#comment-14329950 ] Pavel Yaskevich commented on CASSANDRA-8835: So I did something similar to what Netty does: it detects if the selector returned N times in a row with an empty selection of keys, then replaces the selector with a newly allocated one and migrates the keys there. The work is pushed to https://github.com/xedin/disruptor_thrift_server/tree/100_percent_cpu_epoll (which also has the patch for the complete read operation reverted), so [~rbranson] please give it a try and let me know. 100% CPU spikes from disruptor thrift server Key: CASSANDRA-8835 URL: https://issues.apache.org/jira/browse/CASSANDRA-8835 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.0.11, hsha, JDK7u65, ~6,000 connections per machine Reporter: Rick Branson Assignee: Pavel Yaskevich Seeing big CPU spikes (to 100%) inside of the TDisruptorServer when running hsha. perf top loaded with JVM symbols shows this one floating to the top for anywhere between 15 and 90 seconds and then falling down: Lcom/thinkaurelius/thrift/TDisruptorServer$SelectorThread;.selectorIterationComplete in Lcom/thinkaurelius/thrift/TDisruptorServer$AbstractSelectorThread;.select -- This message was sent by Atlassian JIRA (v6.3.4#6332)
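The rebuild trick Pavel describes, the standard workaround for the JDK epoll spin bug, can be sketched with plain NIO. This is an illustrative fragment with made-up names (`maybeRebuild`, `SPIN_THRESHOLD`), not the code in the linked branch:

```java
import java.io.IOException;
import java.nio.channels.*;

public class SelectorRebuild {
    static final int SPIN_THRESHOLD = 512; // illustrative value for N

    // If select() has returned zero ready keys too many times in a row,
    // assume the selector is spinning: open a fresh selector, migrate all
    // channel registrations to it, and close the broken one.
    static Selector maybeRebuild(Selector selector, int emptySelects) throws IOException {
        if (emptySelects < SPIN_THRESHOLD)
            return selector;                       // behaving normally, keep it
        Selector fresh = Selector.open();
        for (SelectionKey key : selector.keys()) { // migrate registrations
            key.channel().register(fresh, key.interestOps(), key.attachment());
            key.cancel();
        }
        selector.close();
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Selector s = Selector.open();
        Selector same = maybeRebuild(s, 3);        // below threshold: unchanged
        Selector replaced = maybeRebuild(s, 1000); // "spinning": replaced
        System.out.println(same == s);             // true
        System.out.println(replaced == s);         // false
        replaced.close();
    }
}
```

The select loop would reset its empty-select counter whenever a select returns ready keys, so only a sustained run of empty wakeups triggers the rebuild.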
[jira] [Updated] (CASSANDRA-8834) Top partitions reporting wrong cardinality
[ https://issues.apache.org/jira/browse/CASSANDRA-8834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink updated CASSANDRA-8834: - Since Version: 2.1.3 Top partitions reporting wrong cardinality -- Key: CASSANDRA-8834 URL: https://issues.apache.org/jira/browse/CASSANDRA-8834 Project: Cassandra Issue Type: Bug Components: Core Reporter: Chris Lohfink Assignee: Chris Lohfink Attachments: cardinality.patch It always reports a cardinality of 1. Patch also includes a try/catch around the conversion of partition keys that isn't always handled well in thrift cfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-8835) 100% CPU spikes from disruptor thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams reassigned CASSANDRA-8835: --- Assignee: Pavel Yaskevich 100% CPU spikes from disruptor thrift server Key: CASSANDRA-8835 URL: https://issues.apache.org/jira/browse/CASSANDRA-8835 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.0.11, hsha, JDK7u65, ~6,000 connections per machine Reporter: Rick Branson Assignee: Pavel Yaskevich Seeing big CPU spikes (to 100%) inside of the TDisruptorServer when running hsha. perf top loaded with JVM symbols shows this one floating to the top for anywhere between 15 and 90 seconds and then falling down: Lcom/thinkaurelius/thrift/TDisruptorServer$SelectorThread;.selectorIterationComplete in Lcom/thinkaurelius/thrift/TDisruptorServer$AbstractSelectorThread;.select -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8842) Upgrade java-driver used for stres
T Jake Luciani created CASSANDRA-8842: - Summary: Upgrade java-driver used for stres Key: CASSANDRA-8842 URL: https://issues.apache.org/jira/browse/CASSANDRA-8842 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Priority: Minor Fix For: 2.1.4 There are a number of java-driver issues I've been hitting while using stress on large clusters. These issues are fixed in the later driver releases. Mainly race conditions. https://github.com/datastax/java-driver/blob/2.0/driver-core/CHANGELOG.rst#2010 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8842) Upgrade java-driver used for stress
[ https://issues.apache.org/jira/browse/CASSANDRA-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-8842: -- Summary: Upgrade java-driver used for stress (was: Upgrade java-driver used for stres) Upgrade java-driver used for stress --- Key: CASSANDRA-8842 URL: https://issues.apache.org/jira/browse/CASSANDRA-8842 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Priority: Minor Fix For: 2.1.4 There are a number of java-driver issues I've been hitting while using stress on large clusters. These issues are fixed in the later driver releases. Mainly race conditions. https://github.com/datastax/java-driver/blob/2.0/driver-core/CHANGELOG.rst#2010 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8839) DatabaseDescriptor throws NPE when rpc_interface is used
[ https://issues.apache.org/jira/browse/CASSANDRA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329042#comment-14329042 ] Ariel Weisberg commented on CASSANDRA-8839: --- It's broken on trunk as well. I'll add it to DatabaseDescriptorTest. DatabaseDescriptor throws NPE when rpc_interface is used Key: CASSANDRA-8839 URL: https://issues.apache.org/jira/browse/CASSANDRA-8839 Project: Cassandra Issue Type: Bug Components: Config Environment: 2.1.3 Reporter: Jan Kesten Assignee: Ariel Weisberg Fix For: 2.1.4 Copy from mail to dev mailinglist. When using - listen_interface instead of listen_address - rpc_interface instead of rpc_address starting 2.1.3 throws an NPE: {code} ERROR [main] 2015-02-20 07:50:09,661 DatabaseDescriptor.java:144 - Fatal error during configuration loading java.lang.NullPointerException: null at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:411) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.config.DatabaseDescriptor.clinit(DatabaseDescriptor.java:133) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:110) [apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3] {code} Occurs on debian package as well as in tar.gz distribution. 
{code}
/* Local IP, hostname or interface to bind RPC server to */
if (conf.rpc_address != null && conf.rpc_interface != null)
{
    throw new ConfigurationException("Set rpc_address OR rpc_interface, not both");
}
else if (conf.rpc_address != null)
{
    try
    {
        rpcAddress = InetAddress.getByName(conf.rpc_address);
    }
    catch (UnknownHostException e)
    {
        throw new ConfigurationException("Unknown host in rpc_address " + conf.rpc_address);
    }
}
else if (conf.rpc_interface != null)
{
    listenAddress = getNetworkInterfaceAddress(conf.rpc_interface, "rpc_interface");
}
else
{
    rpcAddress = FBUtilities.getLocalAddress();
}
{code}
I think that listenAddress in the second else block is an error. In my case rpc_interface is eth0, so listenAddress gets set, and rpcAddress remains unset. The result is an NPE in line 411:
{code}
if (rpcAddress.isAnyLocalAddress())
{code}
After changing rpc_interface to rpc_address everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
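The fix implied by the report is to assign rpcAddress, not listenAddress, in the rpc_interface branch. A runnable sketch of the corrected selection logic follows; Config, ConfigurationException, and getNetworkInterfaceAddress here are simplified stand-ins for the DatabaseDescriptor internals, not Cassandra's actual classes.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;

public class RpcAddressFix {
    // Minimal stand-ins for Cassandra's Config and ConfigurationException.
    static class Config {
        String rpc_address;
        String rpc_interface;
    }

    static class ConfigurationException extends RuntimeException {
        ConfigurationException(String msg) { super(msg); }
    }

    // Mirrors the branch structure quoted in the report, but the
    // rpc_interface branch feeds rpcAddress (the return value) instead of
    // listenAddress, which is the whole fix.
    static InetAddress resolveRpcAddress(Config conf) {
        if (conf.rpc_address != null && conf.rpc_interface != null)
            throw new ConfigurationException("Set rpc_address OR rpc_interface, not both");
        if (conf.rpc_address != null) {
            try {
                return InetAddress.getByName(conf.rpc_address);
            } catch (UnknownHostException e) {
                throw new ConfigurationException("Unknown host in rpc_address " + conf.rpc_address);
            }
        }
        if (conf.rpc_interface != null)
            return getNetworkInterfaceAddress(conf.rpc_interface);
        // Placeholder for FBUtilities.getLocalAddress().
        return InetAddress.getLoopbackAddress();
    }

    static InetAddress getNetworkInterfaceAddress(String name) {
        try {
            NetworkInterface ni = NetworkInterface.getByName(name);
            if (ni == null || !ni.getInetAddresses().hasMoreElements())
                throw new ConfigurationException("Interface " + name + " has no usable address");
            return Collections.list(ni.getInetAddresses()).get(0);
        } catch (SocketException e) {
            throw new ConfigurationException("Could not inspect interface " + name);
        }
    }
}
```

With this shape, every branch either throws or yields a non-null rpcAddress, so the later rpcAddress.isAnyLocalAddress() check cannot NPE.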
[jira] [Commented] (CASSANDRA-8834) Top partitions reporting wrong cardinality
[ https://issues.apache.org/jira/browse/CASSANDRA-8834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329034#comment-14329034 ] Chris Lohfink commented on CASSANDRA-8834: -- patch is against 2.1 branch Top partitions reporting wrong cardinality -- Key: CASSANDRA-8834 URL: https://issues.apache.org/jira/browse/CASSANDRA-8834 Project: Cassandra Issue Type: Bug Components: Core Reporter: Chris Lohfink Assignee: Chris Lohfink Attachments: cardinality.patch It always reports a cardinality of 1. Patch also includes a try/catch around the conversion of partition keys that isn't always handled well in thrift cfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329104#comment-14329104 ] Brandon Williams commented on CASSANDRA-8838: - Worth noting though that even if we resume streaming, the node won't be fully complete at the end, since it will have missed any writes for it while it was down. Perhaps people with large nodes may not care though depending on their CL. Resumable bootstrap streaming - Key: CASSANDRA-8838 URL: https://issues.apache.org/jira/browse/CASSANDRA-8838 Project: Cassandra Issue Type: Sub-task Reporter: Yuki Morishita Assignee: Yuki Morishita Priority: Minor Fix For: 3.0 Attachments: 0001-Resumable-bootstrap-streaming.patch This allows the bootstrapping node to avoid being streamed data it has already received. The bootstrapping node records received keyspace/ranges as each stream session completes. When sessions with other nodes fail, bootstrapping fails as well, but the next time the node re-bootstraps, already received keyspace/ranges are skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
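The skip-on-retry bookkeeping described in the patch summary can be illustrated with a toy class. This is purely illustrative: the class and method names are not from the actual patch, and the real implementation would have to persist this state across the failed bootstrap attempt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ResumableBootstrap {
    // In a real implementation this bookkeeping would be persisted so it
    // survives the failed bootstrap; an in-memory set is enough to show
    // the filtering. Keys are hypothetical "keyspace:range" markers.
    private final Set<String> received = new HashSet<>();

    // Called as each stream session for a keyspace/range completes.
    public void markReceived(String keyspace, String range) {
        received.add(keyspace + ":" + range);
    }

    // On re-bootstrap, only stream ranges that were not already completed.
    public List<String> rangesToStream(String keyspace, List<String> allRanges) {
        List<String> pending = new ArrayList<>();
        for (String range : allRanges)
            if (!received.contains(keyspace + ":" + range))
                pending.add(range);
        return pending;
    }
}
```

As Brandon notes, this only avoids re-streaming; writes missed while the node was down still have to be repaired separately.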
[jira] [Updated] (CASSANDRA-8842) Upgrade java-driver used for stress
[ https://issues.apache.org/jira/browse/CASSANDRA-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-8842: -- Reviewer: Benedict (was: Aleksey Yeschenko) Upgrade java-driver used for stress --- Key: CASSANDRA-8842 URL: https://issues.apache.org/jira/browse/CASSANDRA-8842 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Priority: Minor Fix For: 2.1.4 There are a number of java-driver issues I've been hitting while using stress on large clusters. These issues are fixed in the later driver releases. Mainly race conditions. https://github.com/datastax/java-driver/blob/2.0/driver-core/CHANGELOG.rst#2010 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-8568) Impose new API on data tracker modifications that makes correct usage obvious and imposes safety
[ https://issues.apache.org/jira/browse/CASSANDRA-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict reassigned CASSANDRA-8568: --- Assignee: Benedict Impose new API on data tracker modifications that makes correct usage obvious and imposes safety Key: CASSANDRA-8568 URL: https://issues.apache.org/jira/browse/CASSANDRA-8568 Project: Cassandra Issue Type: Bug Reporter: Benedict Assignee: Benedict DataTracker has become a bit of a quagmire, and not at all obvious to interface with, with many subtly different modifiers. I suspect it is still subtly broken, especially around error recovery. I propose piggy-backing on CASSANDRA-7705 to offer RAII (and GC-enforced, for those situations where a try/finally block isn't possible) objects that have transactional behaviour, and with few simple declarative methods that can be composed simply to provide all of the functionality we currently need. See CASSANDRA-8399 for context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329134#comment-14329134 ] Benedict edited comment on CASSANDRA-7066 at 2/20/15 5:09 PM: -- So, I'm pretty sure this is still buggy. It also doesn't integrate well with the macro compaction actions. I'd like to propose a slight change to how we write sstables that would integrate better with when we make changes, and that would also eliminate TMP and TMPLINK files: basically write a mini transaction log file prior to any modification of the liveset, that outlines what we are doing, then delete it once we're done. So: * if we're flushing a memtable we write that the new file is in progress * if we're compacting multiple files into one, we write that the new file(s) are in progress, then when they're done, we write a new log file saying we're swapping these files (as a checkpoint), then clear the in progress log file and write that we're deleting the old files, followed by immediately promoting the new ones and deleting our swapping log entry On startup we just read every log file in any CF directory, and take the appropriate action: any in progress or deleting files are deleted; if there is a swapping log entry, we just pick up where we left off, deleting the old. This would dovetail well with the changes I plan for CASSANDRA-8568. [~krummas] [~JoshuaMcKenzie] [~yukim] [~carlyeks] thoughts? was (Author: benedict): So, I'm pretty sure this is still buggy. It also doesn't integrate well with the macro compaction actions. I'd like to propose a slight change to how we write sstables that would integrate better with when we make changes, and that would also eliminate TMP and TMPLINK files: basically write a mini transaction log file prior to any modification of the liveset, that outlines what we are doing, then delete it once we're done. 
So: * if we're flushing a memtable we write that the new file is in progress * if we're compacting multiple files into one, we write that the new file(s) are in progress, then when they're done, we write a new log file saying we're swapping these files (as a checkpoint), then clear the in progress log file and write that we're deleting the old files, followed by immediately promoting the new ones and deleting our swapping log entry On startup we just read every log file in any CF directory, and take the appropriate action: any in progress or deleting files are deleted; if there is a swapping log entry, we just pick up where we left off, deleting the old. This would dovetail well with the changes I plan for CASSANDRA-8568. [~marcuse] [~JoshuaMcKenzie] [~yukim] [~carlyeks] thoughts? Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor Labels: compaction Fix For: 3.0 Currently we manage a list of in-progress compactions in a system table, which we use to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files, or conversely not cleanup files that have been superceded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
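The startup replay rules in the proposal (delete any in-progress or deleting files; finish an interrupted swap by deleting the old files) can be sketched as a small replay function. The entry kinds and names here are hypothetical, an illustration of the proposal rather than committed code.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionLogReplay {
    enum Kind { IN_PROGRESS, SWAPPING, DELETING }

    static class Entry {
        final Kind kind;
        final List<String> files;
        Entry(Kind kind, List<String> files) { this.kind = kind; this.files = files; }
    }

    // Replay the per-CF log entries found on startup and decide which
    // files must be removed before the node comes up.
    static List<String> filesToDelete(List<Entry> log) {
        List<String> doomed = new ArrayList<>();
        for (Entry e : log) {
            switch (e.kind) {
                case IN_PROGRESS: // crashed mid-write: the new files are incomplete
                    doomed.addAll(e.files);
                    break;
                case SWAPPING:    // checkpoint passed: finish by removing the old files it lists
                    doomed.addAll(e.files);
                    break;
                case DELETING:    // crashed mid-delete: finish the deletion
                    doomed.addAll(e.files);
                    break;
            }
        }
        return doomed;
    }
}
```

The interesting property is that every case resolves to "delete the listed files": the log entry kind only records *why*, which is what makes the recovery path easy to reason about.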
[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation
[ https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329186#comment-14329186 ] T Jake Luciani commented on CASSANDRA-8757: --- Looking at this in conjunction with CASSANDRA-8689. There is some issue with this patch; all tests are hanging on the SchemaLoader: {code}
main prio=10 tid=0x7f9ab000f000 nid=0x11d5 waiting on condition [0x7f9ab7447000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0xecb941f8 (a java.util.concurrent.FutureTask)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
    at java.util.concurrent.FutureTask.get(FutureTask.java:187)
    at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:398)
    at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:375)
    at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:231)
    at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:220)
    at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:215)
    at org.apache.cassandra.SchemaLoader.loadSchema(SchemaLoader.java:65)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
    at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
{code} IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation --- Key: CASSANDRA-8757 URL: https://issues.apache.org/jira/browse/CASSANDRA-8757 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early
[ https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329192#comment-14329192 ] Matt Stump commented on CASSANDRA-8833: --- I'm not against refactoring of SSTableReader, I think it's overly complicated, but I also think that the fix being proposed is overly complicated. You can't get away from OS-specific code because the code to manage the file system cache is OS-specific. The trySkipCache logic in CLibrary is currently specific to Linux. I've slightly broadened it to Linux and some BSD variants (excluding OSX) in my patch for CASSANDRA-8160. We can get the desired behavior for Windows through selective use of the SEC_NOCACHE flag when opening the memory-mapped file, but that would require pushing the cache advisory logic one level deeper. I don't think switching to an NIO channel will alleviate us from the need to provide hints to the OS of which IO is or is not appropriate to cache. Stop opening compaction results early - Key: CASSANDRA-8833 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Fix For: 2.1.4 We should simplify the code base by not doing early opening of compaction results. It makes it very hard to reason about sstable life cycles since they can be in many different states, opened early, starts moved, shadowed, final, instead of as before, basically just one (tmp files are not really 'live' yet so I don't count those). The ref counting of shared resources between sstables in these different states is also hard to reason about. This has caused quite a few issues since we released 2.1. I think it all boils down to a performance vs. code complexity issue: is opening compaction results early really 'worth it' wrt the performance gain? 
The results in CASSANDRA-6916 sure look like the benefits are big enough, but the difference should not be as big for people on SSDs (which most people who care about latencies are). WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277925#comment-14277925 ] Alex Liu edited comment on CASSANDRA-8576 at 2/20/15 5:04 PM: -- v1 patch is attached to only support full partition key EQ queries. To test it, e.g. table with key1, and key2 partition columns {code} set where clause as key1 = 'key1' and key2 = 111 and column1=100 in pig url input_cql=select%20*%20from%20compositekeytable%20where%20key1%20%3D%20%27key1%27%20and%20key2%20%3D%20111%20and%20column1%3D100 {code} was (Author: alexliu68): v1 patch is attached to only support full partition key EQ queries. To test it, e.g. table with key1, and key2 partition columns {code} set where clause as key1 = 'key1' and key2 = 111 and column1=100 in pig url {code} Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict reassigned CASSANDRA-7066: --- Assignee: Benedict Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Priority: Minor Labels: compaction Fix For: 3.0 Currently we manage a list of in-progress compactions in a system table, which we use to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files, or conversely not cleanup files that have been superceded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8786) NullPointerException in ColumnDefinition.hasIndexOption
[ https://issues.apache.org/jira/browse/CASSANDRA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329203#comment-14329203 ] Tyler Hobbs commented on CASSANDRA-8786: How difficult would it be to write a dtest or unit test for this? NullPointerException in ColumnDefinition.hasIndexOption --- Key: CASSANDRA-8786 URL: https://issues.apache.org/jira/browse/CASSANDRA-8786 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.1.2 Reporter: Mathijs Vogelzang Assignee: Aleksey Yeschenko Fix For: 2.1.4 Attachments: 8786.txt We have a Cassandra cluster that we've been using through many upgrades, and thus most of our column families have originally been created by Thrift. We are on Cassandra 2.1.2 now. We've now ported most of our code to use CQL, and our code occasionally tries to recreate tables with IF NOT EXISTS to work properly on development / testing environments. When we issue the CQL statement CREATE INDEX IF NOT EXISTS index ON tableName (accountId) (this index does exist on that table already), we get a {{DriverInternalError: An unexpected error occurred server side on cass_host/xx.xxx.xxx.xxx:9042: java.lang.NullPointerException}} The error on the server is: {noformat} java.lang.NullPointerException: null at org.apache.cassandra.config.ColumnDefinition.hasIndexOption(ColumnDefinition.java:489) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.CreateIndexStatement.validate(CreateIndexStatement.java:87) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2] {noformat} This happens every time we run this CQL statement. 
We've tried to reproduce it in a test cassandra cluster by creating the table according to the exact DESCRIBE TABLE specification, but then this NullPointerException doesn't happen upon the CREATE INDEX one. So it seems that the tables on our production cluster (that were originally created through thrift) are still subtly different schema-wise than a freshly created table according to the same creation statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early
[ https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329221#comment-14329221 ] T Jake Luciani commented on CASSANDRA-8833: --- I think the reason pre-opening results has exposed so many issues is we had always assumed in the code that sstables are immutable. Pre-opening basically makes them mutable, so in hindsight it's pretty clearly a major issue. I'm on the fence as to what we should do here. Perhaps we can solve this performance problem differently once CASSANDRA-5863 is in. If we control the page cache we can expunge the old sstables and warm the new one more intelligently than with the FADVISE tricks. Stop opening compaction results early - Key: CASSANDRA-8833 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Fix For: 2.1.4 We should simplify the code base by not doing early opening of compaction results. It makes it very hard to reason about sstable life cycles since they can be in many different states, opened early, starts moved, shadowed, final, instead of as before, basically just one (tmp files are not really 'live' yet so I don't count those). The ref counting of shared resources between sstables in these different states is also hard to reason about. This has caused quite a few issues since we released 2.1. I think it all boils down to a performance vs. code complexity issue: is opening compaction results early really 'worth it' wrt the performance gain? The results in CASSANDRA-6916 sure look like the benefits are big enough, but the difference should not be as big for people on SSDs (which most people who care about latencies are). WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
cassandra git commit: fix bad merge
Repository: cassandra Updated Branches: refs/heads/trunk ea3ff25a0 - 9fdbb6559 fix bad merge Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9fdbb655 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9fdbb655 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9fdbb655 Branch: refs/heads/trunk Commit: 9fdbb6559dc57341cd5b3382748250cfadc06ae5 Parents: ea3ff25 Author: T Jake Luciani j...@apache.org Authored: Fri Feb 20 12:38:28 2015 -0500 Committer: T Jake Luciani j...@apache.org Committed: Fri Feb 20 12:38:28 2015 -0500 -- build.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9fdbb655/build.xml -- diff --git a/build.xml b/build.xml index 416f0e0..0b9d1a2 100644 --- a/build.xml +++ b/build.xml @@ -376,7 +376,7 @@ dependency groupId=io.netty artifactId=netty-all version=4.0.23.Final / dependency groupId=com.google.code.findbugs artifactId=jsr305 version=2.0.2 / dependency groupId=com.clearspring.analytics artifactId=stream version=2.5.2 / - dependency groupId=com.datastax.cassandra artifactId=cassandra-driver-core version=2.0.9.2 / + dependency groupId=com.datastax.cassandra artifactId=cassandra-driver-core version=2.1.2 / dependency groupId=org.javassist artifactId=javassist version=3.18.2-GA / dependency groupId=org.caffinitas.ohc artifactId=ohc-core version=0.3.2 / dependency groupId=net.sf.supercsv artifactId=super-csv version=2.1.0 /
[jira] [Commented] (CASSANDRA-8767) Added column does not sort as the last column when using new python driver
[ https://issues.apache.org/jira/browse/CASSANDRA-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329233#comment-14329233 ] Alan Boudreault commented on CASSANDRA-8767: Tested and LGTM. Thanks! Added column does not sort as the last column when using new python driver Key: CASSANDRA-8767 URL: https://issues.apache.org/jira/browse/CASSANDRA-8767 Project: Cassandra Issue Type: Bug Components: Core, Drivers (now out of tree) Environment: Cassandra 2.0.10, python-driver 2.1.3 Reporter: Russ Garrett Assignee: Tyler Hobbs Fix For: 2.0.13 Attachments: 8767-debug-logging.txt, 8767.patch, describe-table.txt, exception-with-logging.txt, exception.txt We've just upgraded one of our python apps from using the old cql library to the new python-driver. When running one particular query, it produces the attached assertion error in Cassandra. The query is: bq. SELECT buffer, id, type, json FROM events WHERE buffer = %(bid)s AND idkey = %(idkey)s ORDER BY id ASC Where buffer and idkey are integer primary keys, and id is the clustering key (ordered asc). This query, with identical parameters, does not cause this error using the old cql python library, or with the cqlsh client.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8842) Upgrade java-driver used for stress
[ https://issues.apache.org/jira/browse/CASSANDRA-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329127#comment-14329127 ] Benedict commented on CASSANDRA-8842: - +1 Upgrade java-driver used for stress --- Key: CASSANDRA-8842 URL: https://issues.apache.org/jira/browse/CASSANDRA-8842 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Priority: Minor Fix For: 2.1.4 There are a number of java-driver issues I've been hitting while using stress on large clusters. These issues are fixed in the later driver releases. Mainly race conditions. https://github.com/datastax/java-driver/blob/2.0/driver-core/CHANGELOG.rst#2010 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8832) SSTableRewriter.abort() should be more robust to failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Boudreault updated CASSANDRA-8832: --- Tester: Alan Boudreault SSTableRewriter.abort() should be more robust to failure Key: CASSANDRA-8832 URL: https://issues.apache.org/jira/browse/CASSANDRA-8832 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.4 This fixes a bug introduced in CASSANDRA-8124 that attempts to open early during abort, introducing a failure risk. This patch further preempts CASSANDRA-8690 to wrap every rollback action in a try/catch block, so that any internal assertion checks do not actually worsen the state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
cassandra git commit: Upgrade java-driver used by cassandra-stress
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 e4980b3b8 - 407b5de67 Upgrade java-driver used by cassandra-stress patch by tjake; reviewed by benedict for CASSANDRA-8842 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/407b5de6 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/407b5de6 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/407b5de6 Branch: refs/heads/cassandra-2.1 Commit: 407b5de67cd06ff9e58b3288b8db243e67bc037d Parents: e4980b3 Author: T Jake Luciani j...@apache.org Authored: Fri Feb 20 11:24:43 2015 -0500 Committer: T Jake Luciani j...@apache.org Committed: Fri Feb 20 11:32:59 2015 -0500 -- CHANGES.txt | 1 + build.xml | 2 +- tools/lib/cassandra-driver-core-2.0.5.jar | Bin 544552 - 0 bytes tools/lib/cassandra-driver-core-2.0.9.2.jar | Bin 0 - 1847276 bytes 4 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 3b6ce5d..d9560fc 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.4 + * Upgrade java-driver used for cassandra-stress (CASSANDRA-8842) * Fix CommitLog.forceRecycleAllSegments() memory access error (CASSANDRA-8812) * Improve assertions in Memory (CASSANDRA-8792) * Fix SSTableRewriter cleanup (CASSANDRA-8802) http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/build.xml -- diff --git a/build.xml b/build.xml index eaef534..e969085 100644 --- a/build.xml +++ b/build.xml @@ -401,7 +401,7 @@ dependency groupId=io.netty artifactId=netty-all version=4.0.23.Final / dependency groupId=com.google.code.findbugs artifactId=jsr305 version=2.0.2 / dependency groupId=com.clearspring.analytics artifactId=stream version=2.5.2 / - dependency groupId=com.datastax.cassandra artifactId=cassandra-driver-core version=2.0.5 / + dependency groupId=com.datastax.cassandra artifactId=cassandra-driver-core 
version=2.0.9.2 / dependency groupId=net.sf.supercsv artifactId=super-csv version=2.1.0 / dependency groupId=net.ju-n.compile-command-annotations artifactId=compile-command-annotations version=1.2.0 / /dependencyManagement http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/tools/lib/cassandra-driver-core-2.0.5.jar -- diff --git a/tools/lib/cassandra-driver-core-2.0.5.jar b/tools/lib/cassandra-driver-core-2.0.5.jar deleted file mode 100644 index 260183e..000 Binary files a/tools/lib/cassandra-driver-core-2.0.5.jar and /dev/null differ http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/tools/lib/cassandra-driver-core-2.0.9.2.jar -- diff --git a/tools/lib/cassandra-driver-core-2.0.9.2.jar b/tools/lib/cassandra-driver-core-2.0.9.2.jar new file mode 100644 index 000..3f82e77 Binary files /dev/null and b/tools/lib/cassandra-driver-core-2.0.9.2.jar differ
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329134#comment-14329134 ] Benedict commented on CASSANDRA-7066: - So, I'm pretty sure this is still buggy. It also doesn't integrate well with the macro compaction actions. I'd like to propose a slight change to how we write sstables that would integrate better with when we make changes, and that would also eliminate TMP and TMPLINK files: basically write a mini transaction log file prior to any modification of the liveset, that outlines what we are doing, then delete it once we're done. So: * if we're flushing a memtable we write that the new file is in progress * if we're compacting multiple files into one, we write that the new file(s) are in progress, then when they're done, we write a new log file saying we're swapping these files (as a checkpoint), then clear the in progress log file and write that we're deleting the old files, followed by immediately promoting the new ones and deleting our swapping log entry On startup we just read every log file in any CF directory, and take the appropriate action: any in progress or deleting files are deleted; if there is a swapping log entry, we just pick up where we left off, deleting the old. This would dovetail well with the changes I plan for CASSANDRA-8568. [~marcuse] [~JoshuaMcKenzie] [~yukim] [~carlyeks] thoughts? Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor Labels: compaction Fix For: 3.0 Currently we manage a list of in-progress compactions in a system table, which we use to cleanup incomplete compactions when we're done. 
The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily clean up completed files, or conversely not clean up files that have been superseded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
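The log-and-replay scheme proposed in the comment above (record the operation before touching the liveset, clear the record on commit, and on startup either undo or finish it) can be sketched as follows. This is an illustrative Python sketch under hypothetical names (`TxnLog`, `replay` are not Cassandra's actual classes); the real implementation would operate on log files and sstables on disk.

```python
# States a log entry can record for the files of one operation.
IN_PROGRESS = "in_progress"   # new files being written, not yet live
SWAPPING = "swapping"         # checkpoint: new files done, old inputs doomed
DELETING = "deleting"         # old files being removed

class TxnLog:
    """One log record per operation; cleared once the operation commits."""
    def __init__(self):
        self.record = None

    def write(self, state, files):
        self.record = (state, list(files))

    def clear(self):
        self.record = None

def replay(log, live_files):
    """Startup recovery: undo or finish whatever the log says was happening."""
    if log.record is None:
        return set(live_files)
    state, files = log.record
    if state == IN_PROGRESS:
        # Writer died mid-operation: the partial new files are garbage.
        live_files = set(live_files) - set(files)
    elif state in (SWAPPING, DELETING):
        # Checkpoint was reached: finish by deleting the doomed old inputs.
        live_files = set(live_files) - set(files)
    log.clear()
    return live_files
```

Either way the recovery action is a simple set subtraction, which is what makes the scheme easy to reason about.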
[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Conflicts: build.xml Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ea3ff25a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ea3ff25a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ea3ff25a Branch: refs/heads/trunk Commit: ea3ff25a0cc204b2c77b4c43da25011b8f62a845 Parents: a709821 407b5de Author: T Jake Luciani j...@apache.org Authored: Fri Feb 20 11:37:55 2015 -0500 Committer: T Jake Luciani j...@apache.org Committed: Fri Feb 20 11:37:55 2015 -0500 -- CHANGES.txt | 1 + build.xml | 2 +- tools/lib/cassandra-driver-core-2.0.9.2.jar | Bin 0 - 1847276 bytes 3 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ea3ff25a/CHANGES.txt -- diff --cc CHANGES.txt index 85ca87d,d9560fc..07f3448 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,65 -1,5 +1,66 @@@ +3.0 + * Add role based access control (CASSANDRA-7653, 8650, 7216) + * Avoid accessing partitioner through StorageProxy (CASSANDRA-8244, 8268) + * Upgrade Metrics library and remove depricated metrics (CASSANDRA-5657) + * Serializing Row cache alternative, fully off heap (CASSANDRA-7438) + * Duplicate rows returned when in clause has repeated values (CASSANDRA-6707) + * Make CassandraException unchecked, extend RuntimeException (CASSANDRA-8560) + * Support direct buffer decompression for reads (CASSANDRA-8464) + * DirectByteBuffer compatible LZ4 methods (CASSANDRA-7039) + * Group sstables for anticompaction correctly (CASSANDRA-8578) + * Add ReadFailureException to native protocol, respond + immediately when replicas encounter errors while handling + a read request (CASSANDRA-7886) + * Switch CommitLogSegment from RandomAccessFile to nio (CASSANDRA-8308) + * Allow mixing token and partition key restrictions (CASSANDRA-7016) + * Support index key/value entries on map collections (CASSANDRA-8473) + * Modernize schema tables 
(CASSANDRA-8261) + * Support for user-defined aggregation functions (CASSANDRA-8053) + * Fix NPE in SelectStatement with empty IN values (CASSANDRA-8419) + * Refactor SelectStatement, return IN results in natural order instead + of IN value list order and ignore duplicate values in partition key IN restrictions (CASSANDRA-7981) + * Support UDTs, tuples, and collections in user-defined + functions (CASSANDRA-7563) + * Fix aggregate fn results on empty selection, result column name, + and cqlsh parsing (CASSANDRA-8229) + * Mark sstables as repaired after full repair (CASSANDRA-7586) + * Extend Descriptor to include a format value and refactor reader/writer + APIs (CASSANDRA-7443) + * Integrate JMH for microbenchmarks (CASSANDRA-8151) + * Keep sstable levels when bootstrapping (CASSANDRA-7460) + * Add Sigar library and perform basic OS settings check on startup (CASSANDRA-7838) + * Support for aggregation functions (CASSANDRA-4914) + * Remove cassandra-cli (CASSANDRA-7920) + * Accept dollar quoted strings in CQL (CASSANDRA-7769) + * Make assassinate a first class command (CASSANDRA-7935) + * Support IN clause on any partition key column (CASSANDRA-7855) + * Support IN clause on any clustering column (CASSANDRA-4762) + * Improve compaction logging (CASSANDRA-7818) + * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917) + * Do anticompaction in groups (CASSANDRA-6851) + * Support user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 7781, 7929, + 7924, 7812, 8063, 7813, 7708) + * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416) + * Move sstable RandomAccessReader to nio2, which allows using the + FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050) + * Remove CQL2 (CASSANDRA-5918) + * Add Thrift get_multi_slice call (CASSANDRA-6757) + * Optimize fetching multiple cells by name (CASSANDRA-6933) + * Allow compilation in java 8 (CASSANDRA-7028) + * Make incremental repair default (CASSANDRA-7250) + * Enable code coverage thru JaCoCo 
(CASSANDRA-7226) + * Switch external naming of 'column families' to 'tables' (CASSANDRA-4369) + * Shorten SSTable path (CASSANDRA-6962) + * Use unsafe mutations for most unit tests (CASSANDRA-6969) + * Fix race condition during calculation of pending ranges (CASSANDRA-7390) + * Fail on very large batch sizes (CASSANDRA-8011) + * Improve concurrency of repair (CASSANDRA-6455, 8208) + * Select optimal CRC32 implementation at runtime (CASSANDRA-8614) + * Evaluate MurmurHash of Token once per query (CASSANDRA-7096) + + 2.1.4 + * Upgrade java-driver used for cassandra-stress (CASSANDRA-8842) * Fix CommitLog.forceRecycleAllSegments() memory
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8576: --- Tester: Philip Thompson Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for Hadoop (Hive in particular) based services. Example use case: Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with the number of customers since the Input Format can't push down the primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329158#comment-14329158 ] Joshua McKenzie commented on CASSANDRA-7066: I much prefer the idea of us building a simple custom abstraction that handles our view of the state of sstable deletion rather than an amalgamation of system tables + file names or parsing ancestor sets in sstable metadata - both of those options sound overly complex to address the question of "are we done with this sstable?". From a Windows perspective, I'm immediately in favor of limiting any renaming we do on grounds of NTFS being picky about that. What you're proposing, [~benedict], sounds easier to reason about, simpler, and more elegant than the other options. I'm in favor. Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Priority: Minor Labels: compaction Fix For: 3.0 Currently we manage a list of in-progress compactions in a system table, which we use to clean up incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily clean up completed files, or conversely not clean up files that have been superseded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8767) Added column does not sort as the last column when using new python driver
[ https://issues.apache.org/jira/browse/CASSANDRA-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Boudreault updated CASSANDRA-8767: --- Reproduced In: 2.0.12, 2.0.10 (was: 2.0.10, 2.0.12) Labels: qa-resolved (was: ) Added column does not sort as the last column when using new python driver Key: CASSANDRA-8767 URL: https://issues.apache.org/jira/browse/CASSANDRA-8767 Project: Cassandra Issue Type: Bug Components: Core, Drivers (now out of tree) Environment: Cassandra 2.0.10, python-driver 2.1.3 Reporter: Russ Garrett Assignee: Tyler Hobbs Labels: qa-resolved Fix For: 2.0.13 Attachments: 8767-debug-logging.txt, 8767.patch, describe-table.txt, exception-with-logging.txt, exception.txt We've just upgraded one of our python apps from using the old cql library to the new python-driver. When running one particular query, it produces the attached assertion error in Cassandra. The query is: bq. SELECT buffer, id, type, json FROM events WHERE buffer = %(bid)s AND idkey = %(idkey)s ORDER BY id ASC Where buffer and idkey are integer primary keys, and id is the clustering key (ordered asc). This query, with identical parameters, does not cause this error using the old cql python library, or with the cqlsh client.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[1/2] cassandra git commit: Upgrade java-driver used by cassandra-stress
Repository: cassandra Updated Branches: refs/heads/trunk a70982162 - ea3ff25a0 Upgrade java-driver used by cassandra-stress patch by tjake; reviewed by benedict for CASSANDRA-8842 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/407b5de6 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/407b5de6 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/407b5de6 Branch: refs/heads/trunk Commit: 407b5de67cd06ff9e58b3288b8db243e67bc037d Parents: e4980b3 Author: T Jake Luciani j...@apache.org Authored: Fri Feb 20 11:24:43 2015 -0500 Committer: T Jake Luciani j...@apache.org Committed: Fri Feb 20 11:32:59 2015 -0500 -- CHANGES.txt | 1 + build.xml | 2 +- tools/lib/cassandra-driver-core-2.0.5.jar | Bin 544552 - 0 bytes tools/lib/cassandra-driver-core-2.0.9.2.jar | Bin 0 - 1847276 bytes 4 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 3b6ce5d..d9560fc 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.1.4 + * Upgrade java-driver used for cassandra-stress (CASSANDRA-8842) * Fix CommitLog.forceRecycleAllSegments() memory access error (CASSANDRA-8812) * Improve assertions in Memory (CASSANDRA-8792) * Fix SSTableRewriter cleanup (CASSANDRA-8802) http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/build.xml -- diff --git a/build.xml b/build.xml index eaef534..e969085 100644 --- a/build.xml +++ b/build.xml @@ -401,7 +401,7 @@ dependency groupId=io.netty artifactId=netty-all version=4.0.23.Final / dependency groupId=com.google.code.findbugs artifactId=jsr305 version=2.0.2 / dependency groupId=com.clearspring.analytics artifactId=stream version=2.5.2 / - dependency groupId=com.datastax.cassandra artifactId=cassandra-driver-core version=2.0.5 / + dependency groupId=com.datastax.cassandra artifactId=cassandra-driver-core version=2.0.9.2 / dependency 
groupId=net.sf.supercsv artifactId=super-csv version=2.1.0 / dependency groupId=net.ju-n.compile-command-annotations artifactId=compile-command-annotations version=1.2.0 / /dependencyManagement http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/tools/lib/cassandra-driver-core-2.0.5.jar -- diff --git a/tools/lib/cassandra-driver-core-2.0.5.jar b/tools/lib/cassandra-driver-core-2.0.5.jar deleted file mode 100644 index 260183e..000 Binary files a/tools/lib/cassandra-driver-core-2.0.5.jar and /dev/null differ http://git-wip-us.apache.org/repos/asf/cassandra/blob/407b5de6/tools/lib/cassandra-driver-core-2.0.9.2.jar -- diff --git a/tools/lib/cassandra-driver-core-2.0.9.2.jar b/tools/lib/cassandra-driver-core-2.0.9.2.jar new file mode 100644 index 000..3f82e77 Binary files /dev/null and b/tools/lib/cassandra-driver-core-2.0.9.2.jar differ
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329142#comment-14329142 ] Alex Liu commented on CASSANDRA-8576: - [~brandon.williams] Do you have time to review this ticket? Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for Hadoop (Hive in particular) based services. Example use case: Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with the number of customers since the Input Format can't push down the primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277925#comment-14277925 ] Alex Liu edited comment on CASSANDRA-8576 at 2/20/15 5:03 PM: -- v1 patch is attached to only support full partition key EQ queries. To test it, e.g. table with key1, and key2 partition columns {code} set where clause as key1 = 'key1' and key2 = 111 and column1=100 in pig url {code} was (Author: alexliu68): v1 patch is attached to only support full partition key EQ queries. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
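The pushdown the v1 patch supports - a single-partition query when every partition key column is constrained by an EQ predicate, a full scan otherwise - can be sketched roughly like this. Illustrative Python with hypothetical names; the actual patch works inside the Hadoop/Pig input format, not a helper like this.

```python
def build_scan_cql(table, partition_key, predicates):
    """predicates: list of (column, op, value) pushed down from the query layer.
    Returns (cql, bind_params)."""
    eq = {col: val for col, op, val in predicates if op == "="}
    if all(col in eq for col in partition_key):
        # Full partition key pinned by EQ: query a single partition.
        where = " AND ".join("%s = ?" % col for col in partition_key)
        return ("SELECT * FROM %s WHERE %s" % (table, where),
                [eq[col] for col in partition_key])
    # Predicates don't pin a single partition: fall back to a full scan.
    return ("SELECT * FROM %s" % table, [])
```

With the example where clause above (`key1 = 'key1' and key2 = 111 and column1=100`), the two partition key EQs are pushed down and the remaining predicate is left to be filtered client-side.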
[jira] [Commented] (CASSANDRA-8786) NullPointerException in ColumnDefinition.hasIndexOption
[ https://issues.apache.org/jira/browse/CASSANDRA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329213#comment-14329213 ] Philip Thompson commented on CASSANDRA-8786: I'll defer to [~iamaleksey] here. I wasn't able to reproduce. NullPointerException in ColumnDefinition.hasIndexOption --- Key: CASSANDRA-8786 URL: https://issues.apache.org/jira/browse/CASSANDRA-8786 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.1.2 Reporter: Mathijs Vogelzang Assignee: Aleksey Yeschenko Fix For: 2.1.4 Attachments: 8786.txt We have a Cassandra cluster that we've been using through many upgrades, and thus most of our column families have originally been created by Thrift. We are on Cassandra 2.1.2 now. We've now ported most of our code to use CQL, and our code occasionally tries to recreate tables with IF NOT EXISTS to work properly on development / testing environments. When we issue the CQL statement CREATE INDEX IF NOT EXISTS index ON tableName (accountId) (this index does exist on that table already), we get a {{DriverInternalError: An unexpected error occurred server side on cass_host/xx.xxx.xxx.xxx:9042: java.lang.NullPointerException}} The error on the server is: {noformat} java.lang.NullPointerException: null at org.apache.cassandra.config.ColumnDefinition.hasIndexOption(ColumnDefinition.java:489) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.CreateIndexStatement.validate(CreateIndexStatement.java:87) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2] {noformat} This happens every time we run this CQL statement. 
We've tried to reproduce it in a test Cassandra cluster by creating the table according to the exact DESCRIBE TABLE specification, but then this NullPointerException doesn't happen upon the CREATE INDEX one. So it seems that the tables on our production cluster (that were originally created through Thrift) are still subtly different schema-wise than a freshly created table according to the same creation statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
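A minimal sketch of the kind of defensive check that avoids this class of NPE, assuming (as the stack trace suggests, but the report does not confirm) that hasIndexOption dereferences an index-options map that Thrift-era schemas may have left unset. Illustrative Python, not the actual ColumnDefinition code:

```python
def has_index_option(index_options, option_name):
    """Guard against an absent (None) options map before looking the name up;
    a naive index_options[...] lookup would throw on legacy schemas."""
    return index_options is not None and option_name in index_options
```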
[jira] [Commented] (CASSANDRA-8819) LOCAL_QUORUM writes returns wrong message
[ https://issues.apache.org/jira/browse/CASSANDRA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329239#comment-14329239 ] Tyler Hobbs commented on CASSANDRA-8819: Hmm, the changes from CASSANDRA-8058 already fixed the {{totalBlockFor}} calculation for DatacenterWriteResponder (although the commit is a bit different/cleaner than the original patch). Since this was seen in 2.0.8, I think this ticket is just a dupe. Regardless, we could still use a dtest for this. LOCAL_QUORUM writes returns wrong message - Key: CASSANDRA-8819 URL: https://issues.apache.org/jira/browse/CASSANDRA-8819 Project: Cassandra Issue Type: Bug Components: Core Environment: CentOS 6.6 Reporter: Wei Zhu Assignee: Sylvain Lebresne Fix For: 2.0.13 Attachments: 8819-2.0.patch We have two DCs, each with 7 nodes. Here is the keyspace setup: create keyspace test with placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {DC2 : 3, DC1 : 3} and durable_writes = true; We brought down two nodes in DC2 for maintenance. We only write to DC1 using LOCAL_QUORUM (using the DataStax Java client), but we see these errors in the log: Cassandra timeout during write query at consistency LOCAL_QUORUM (4 replica were required but only 3 acknowledged the write). Why does it say 4 replicas were required? And why would it return an error to the client, since LOCAL_QUORUM should succeed? Here is the output from nodetool status Note: Ownership information does not include topology; for complete information, specify a keyspace Datacenter: DC2 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.2.0.1 10.92 GB 256 7.9% RAC206 UN 10.2.0.2 6.17 GB256 8.0% RAC106 UN 10.2.0.3 6.63 GB256 7.3% RAC107 DL 10.2.0.4 1.54 GB256 7.7% RAC107 UN 10.2.0.5 6.02 GB256 6.6% RAC106 UJ 10.2.0.6 3.68 GB256 ?
RAC205 UN 10.2.0.7 7.22 GB256 7.7% RAC205 Datacenter: DC1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.1.0.1 6.04 GB256 8.6% RAC10 UN 10.1.0.2 7.55 GB256 7.4% RAC8 UN 10.1.0.3 5.83 GB256 7.0% RAC9 UN 10.1.0.47.34 GB256 7.9% RAC6 UN 10.1.0.5 7.57 GB256 8.0% RAC7 UN 10.1.0.6 5.31 GB256 7.3% RAC10 UN 10.1.0.7 5.47 GB256 8.6% RAC9 I did a cql trace on the query and here is the trace, and it does say Write timeout; received 3 of 4 required replies | 17:27:52,831 | 10.1.0.1 |2002873 at the end. I guess that is where the client gets the error from. But the rows was inserted to Cassandra correctly. And I traced read with local_quorum and it behaves correctly and the reads don't go to DC2. The problem is only with writes on local_quorum. {code} Tracing session: 5a789fb0-b70d-11e4-8fca-99bff9c19890 activity | timestamp | source | source_elapsed -+--+-+ execute_cql3_query | 17:27:50,828 | 10.1.0.1 | 0 Parsing insert into test (user_id, created, event_data, event_id)values ( 123456789 , 9eab8950-b70c-11e4-8fca-99bff9c19891, 'test', '16'); | 17:27:50,828 | 10.1.0.1 | 39 Preparing statement | 17:27:50,828 | 10.1.0.1 |135 Message received from /10.1.0.1 | 17:27:50,829 | 10.1.0.5 | 25 Sending message
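The "4 replica were required" figure in the report above is consistent with a blockFor computed over the total replication factor (3 + 3 = 6, quorum 4) instead of the local DC's RF (3, quorum 2). A rough sketch of that arithmetic in illustrative Python - not Cassandra's actual responder code:

```python
def block_for(consistency, rf_by_dc, local_dc):
    """Number of replica acks a write must wait for."""
    if consistency == "QUORUM":
        # Quorum over the sum of all DCs' replication factors.
        return sum(rf_by_dc.values()) // 2 + 1
    if consistency == "LOCAL_QUORUM":
        # Quorum over the local DC's replication factor only.
        return rf_by_dc[local_dc] // 2 + 1
    raise ValueError("unhandled consistency: %s" % consistency)
```

With RF {DC1: 3, DC2: 3}, LOCAL_QUORUM should block for 2; a responder that mistakenly used the global quorum would block for 4, matching the error message and timing out once two DC2 nodes are down.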
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329250#comment-14329250 ] Jonathan Ellis commented on CASSANDRA-8838: --- Do we not hint for bootstrapping nodes? I imagine that should be an easy fix. Resumable bootstrap streaming - Key: CASSANDRA-8838 URL: https://issues.apache.org/jira/browse/CASSANDRA-8838 Project: Cassandra Issue Type: Sub-task Reporter: Yuki Morishita Assignee: Yuki Morishita Priority: Minor Fix For: 3.0 Attachments: 0001-Resumable-bootstrap-streaming.patch This allows the bootstrapping node to avoid re-streaming data it has already received. The bootstrapping node records received keyspace/ranges as each stream session completes. When some sessions with other nodes fail, bootstrap fails as well, but the next time the node re-bootstraps, keyspace/ranges that were already received are skipped instead of being streamed again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
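The skip logic described in the patch summary amounts to filtering the required (keyspace, range) pairs against those already recorded as complete. A minimal sketch in illustrative Python with hypothetical names; the real patch records completions persistently and plugs into the streaming layer:

```python
def ranges_to_stream(required, received):
    """required: ordered (keyspace, token_range) pairs for this bootstrap;
    received: set of pairs recorded as complete by earlier attempts.
    Returns only the pairs that still need streaming."""
    return [kr for kr in required if kr not in received]
```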
[jira] [Commented] (CASSANDRA-8786) NullPointerException in ColumnDefinition.hasIndexOption
[ https://issues.apache.org/jira/browse/CASSANDRA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329495#comment-14329495 ] Aleksey Yeschenko commented on CASSANDRA-8786: -- Absolutely. NullPointerException in ColumnDefinition.hasIndexOption --- Key: CASSANDRA-8786 URL: https://issues.apache.org/jira/browse/CASSANDRA-8786 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.1.2 Reporter: Mathijs Vogelzang Assignee: Aleksey Yeschenko Fix For: 2.1.4 Attachments: 8786.txt We have a Cassandra cluster that we've been using through many upgrades, and thus most of our column families have originally been created by Thrift. We are on Cassandra 2.1.2 now. We've now ported most of our code to use CQL, and our code occasionally tries to recreate tables with IF NOT EXISTS to work properly on development / testing environments. When we issue the CQL statement CREATE INDEX IF NOT EXISTS index ON tableName (accountId) (this index does exist on that table already), we get a {{DriverInternalError: An unexpected error occurred server side on cass_host/xx.xxx.xxx.xxx:9042: java.lang.NullPointerException}} The error on the server is: {noformat} java.lang.NullPointerException: null at org.apache.cassandra.config.ColumnDefinition.hasIndexOption(ColumnDefinition.java:489) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.CreateIndexStatement.validate(CreateIndexStatement.java:87) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2] {noformat} This happens every time we run this CQL statement. 
We've tried to reproduce it in a test Cassandra cluster by creating the table according to the exact DESCRIBE TABLE specification, but then this NullPointerException doesn't happen upon the CREATE INDEX one. So it seems that the tables on our production cluster (that were originally created through Thrift) are still subtly different schema-wise than a freshly created table according to the same creation statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7653) Add role based access control to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329667#comment-14329667 ] Jonathan Ellis commented on CASSANDRA-7653: --- Two questions: - Why is superuser a flag on a role instead of a permission? - {{CREATE ROLE manager WITH LOGIN PASSWORD ’foo’}} is more natural than {{CREATE ROLE manager WITH PASSWORD ’foo’ LOGIN}}. Can we make WITH the global option delimiter the way it is for CREATE TABLE, rather than tied to password? Add role based access control to Cassandra -- Key: CASSANDRA-7653 URL: https://issues.apache.org/jira/browse/CASSANDRA-7653 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Mike Adamson Assignee: Sam Tunnicliffe Labels: docs-impacting, security Fix For: 3.0 Attachments: 7653.patch, CQLSmokeTest.java, cql_smoke_test.py The current authentication model supports granting permissions to individual users. While this is OK for small or medium organizations wanting to implement authorization, it does not work well in large organizations because of the overhead of having to maintain the permissions for each user. Introducing roles into the authentication model would allow sets of permissions to be controlled in one place as a role and then the role granted to users. Roles should also be able to be granted to other roles to allow hierarchical sets of permissions to be built up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
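The hierarchical grants described above (roles granted to other roles) imply that a login's effective permissions are the union over the transitive closure of its granted roles. A minimal sketch of that resolution in illustrative Python (hypothetical names, not the actual implementation):

```python
def effective_permissions(granted_roles, role_grants, role_permissions):
    """Walk the role graph from the directly granted roles and union the
    permissions of every reachable role (cycles are handled via `seen`)."""
    seen, stack, perms = set(), list(granted_roles), set()
    while stack:
        role = stack.pop()
        if role in seen:
            continue
        seen.add(role)
        perms |= role_permissions.get(role, set())
        stack.extend(role_grants.get(role, ()))
    return perms
```

Maintaining permissions on the role and granting the role to users is exactly the maintenance win the ticket describes for large organizations.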
[jira] [Updated] (CASSANDRA-8779) Add type code to binary query parameters in QUERY messages
[ https://issues.apache.org/jira/browse/CASSANDRA-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-8779: --- Component/s: API Fix Version/s: 3.0 Labels: client-impacting protocolv4 (was: ) Issue Type: Improvement (was: Bug) Summary: Add type code to binary query parameters in QUERY messages (was: Able to unintentionally nest tuples during insert) Add type code to binary query parameters in QUERY messages -- Key: CASSANDRA-8779 URL: https://issues.apache.org/jira/browse/CASSANDRA-8779 Project: Cassandra Issue Type: Improvement Components: API Environment: Linux Mint 64-bit | ruby-driver 2.1 | java-driver 2.1 | C* 2.1.2 Reporter: Kishan Karunaratne Assignee: Tyler Hobbs Labels: client-impacting, protocolv4 Fix For: 3.0 If I insert a tuple using an extra pair of ()'s, C* will let me do the insert, but (incorrectly) creates a nested tuple as the first tuple value. Upon doing a select statement, the result is jumbled and has weird binary in it (which I wasn't able to copy into here). Example using ruby-driver: {noformat} session.execute("CREATE TABLE mytable (a int PRIMARY KEY, b frozen<tuple<ascii, bigint, boolean>>)") complete = Cassandra::Tuple.new('foo', 123, true) session.execute("INSERT INTO mytable (a, b) VALUES (0, (?))", arguments: [complete]) # extra ()'s here result = session.execute("SELECT b FROM mytable WHERE a=0").first p result['b'] {noformat} Output: {noformat} #Cassandra::Tuple:0x97b328 (fo{, , ) {noformat} Bug also confirmed using java-driver.
Example using java-driver: {noformat} session.execute("CREATE TABLE mytable (a int PRIMARY KEY, b frozen<tuple<ascii, int, boolean>>)"); TupleType t = TupleType.of(DataType.ascii(), DataType.cint(), DataType.cboolean()); TupleValue complete = t.newValue("foo", 123, true); session.execute("INSERT INTO mytable (a, b) VALUES (0, (?))", complete); // extra ()'s here TupleValue r = session.execute("SELECT b FROM mytable WHERE a=0").one().getTupleValue("b"); System.out.println(r); {noformat} Output: {noformat} ('foo{', null, null) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early
[ https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329253#comment-14329253 ] Benedict commented on CASSANDRA-8833: - I'm not sure I follow. Is your comment intended for this ticket? Stop opening compaction results early - Key: CASSANDRA-8833 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Fix For: 2.1.4 We should simplify the code base by not doing early opening of compaction results. It makes it very hard to reason about sstable life cycles since they can be in many different states, opened early, starts moved, shadowed, final, instead of as before, basically just one (tmp files are not really 'live' yet so I don't count those). The ref counting of shared resources between sstables in these different states is also hard to reason about. This has caused quite a few issues since we released 2.1 I think it all boils down to a performance vs code complexity issue, is opening compaction results early really 'worth it' wrt the performance gain? The results in CASSANDRA-6916 sure look like the benefits are big enough, but the difference should not be as big for people on SSDs (which most people who care about latencies are) WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early
[ https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329257#comment-14329257 ] Benedict commented on CASSANDRA-8833: - bq. pre-opening basically makes them mutable so in hindsight it's pretty clearly a major issue. Pre-opening did _not_ make them mutable. Index summary redistribution did this, and I followed suit in the way it dealt with this. We _will_ have to eliminate index summary redistribution as well if we want to restore immutability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8336) Quarantine nodes after receiving the gossip shutdown message
[ https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329286#comment-14329286 ] Brandon Williams commented on CASSANDRA-8336: - bq. The shutting down node might as well set the version of the shutdown state to Integer.MAX_VALUE since receiving nodes will blindly use that. Well, as I explained in an earlier comment, this isn't really much of an optimization, and if the nodes receive the RPC first, we have to modify it on the receiver anyway, so it seems cleaner to reuse markAsShutdown for both. bq. Why does it increment the generation number? We call Gossiper.instance.start with a new generation number set to the current time so it would make sense to use that. Because start calls maybeInitializeLocalState which won't actually add the current time heartbeat, since as the method says, it will only add the new state if the gossiper has never been started before (meaning we don't know our own state.) bq. If we hit 'Unable to gossip with any seeds' on replace, it shuts down the gossiper. This throws an AssertionError in addLocalApplicationState since the local epState is null. Hmm, probably the best thing to do there is change it from stop to stopForLeaving (though that method needs a better name now) since there's no point in sending shutdown notifications for a node that isn't a member. Quarantine nodes after receiving the gossip shutdown message Key: CASSANDRA-8336 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 2.0.13 Attachments: 8336-v2.txt, 8336-v3.txt, 8336.txt In CASSANDRA-3936 we added a gossip shutdown announcement. The problem here is that this isn't sufficient; you can still get TOEs and have to wait on the FD to figure things out. 
This happens due to gossip propagation time and variance; if node X shuts down and sends the message to Y, but Z has a greater gossip version than Y for X and has not yet received the message, it can initiate gossip with Y and thus mark X alive again. I propose quarantining to solve this; however, I feel it should be a -D parameter you have to specify, so as not to destroy current dev and test practices, since this will mean a node that shuts down will not be able to restart until the quarantine expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
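The race above can be sketched as a toy model (class and method names invented here, not Cassandra's actual Gossiper API): because higher gossip versions win, Z's stale "alive" state for X at a higher version resurrects X on Y, unless Y quarantines X after seeing the shutdown message.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the shutdown/resurrection race and the proposed quarantine fix.
// Names are illustrative; this is not the actual Gossiper implementation.
public class GossipShutdownRace {
    static final class State {
        final long version; final boolean alive;
        State(long version, boolean alive) { this.version = version; this.alive = alive; }
    }

    final Map<String, State> view = new HashMap<>();
    final Map<String, Long> quarantinedUntil = new HashMap<>();

    // Higher-versioned gossip about a node wins, unless the node is quarantined.
    void apply(String node, State incoming, long now) {
        Long until = quarantinedUntil.get(node);
        if (until != null && now < until) return;            // drop gossip about quarantined node
        State cur = view.get(node);
        if (cur == null || incoming.version > cur.version) view.put(node, incoming);
    }

    // Proposed fix: on a shutdown message, also quarantine the node for a while.
    void onShutdown(String node, long version, long now, long quarantineMs) {
        view.put(node, new State(version, false));
        quarantinedUntil.put(node, now + quarantineMs);
    }

    public static void main(String[] args) {
        // Without quarantine: Y applies the shutdown state (version 5), then Z's
        // stale higher-versioned 'alive' state (version 7) resurrects X.
        GossipShutdownRace without = new GossipShutdownRace();
        without.apply("X", new State(5, false), 0);
        without.apply("X", new State(7, true), 1);

        // With quarantine: the stale state is ignored and X stays down.
        GossipShutdownRace with = new GossipShutdownRace();
        with.onShutdown("X", 5, 0, 30_000);
        with.apply("X", new State(7, true), 1);

        System.out.println("without=" + without.view.get("X").alive
                         + " with=" + with.view.get("X").alive);   // without=true with=false
    }
}
```

The quarantine window is what prevents a restart until it expires, which is why the comment suggests gating it behind a -D flag.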
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329323#comment-14329323 ] Brandon Williams commented on CASSANDRA-8838: - Well, then the question becomes for how long? If they try again reasonably soon, the fat client timeout gives them RING_DELAY, which I guess there's no harm in tweaking as long as you know the consequence. Resumable bootstrap streaming - Key: CASSANDRA-8838 URL: https://issues.apache.org/jira/browse/CASSANDRA-8838 Project: Cassandra Issue Type: Sub-task Reporter: Yuki Morishita Assignee: Yuki Morishita Priority: Minor Fix For: 3.0 Attachments: 0001-Resumable-bootstrap-streaming.patch This allows the bootstrapping node not to be streamed already received data. The bootstrapping node records received keyspace/ranges as one stream session completes. When some sessions with other nodes fail, bootstrapping fails also, though next time it re-bootstraps, already received keyspace/ranges are skipped to be streamed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
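The resume logic described in the ticket can be sketched as a toy (class and method names are invented, not the patch's actual API): completed (keyspace, range) pairs are recorded as each stream session finishes and filtered out of the next bootstrap attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of resumable bootstrap bookkeeping, not Cassandra's code.
public class ResumableBootstrap {
    private final Set<String> received = new HashSet<>();   // "keyspace:range" keys

    // Called when one stream session completes successfully.
    void onSessionComplete(String keyspace, List<String> ranges) {
        for (String r : ranges) received.add(keyspace + ":" + r);
    }

    // Ranges still needed when re-bootstrapping after a partial failure.
    List<String> toRequest(String keyspace, List<String> allRanges) {
        List<String> pending = new ArrayList<>();
        for (String r : allRanges)
            if (!received.contains(keyspace + ":" + r)) pending.add(r);
        return pending;
    }

    public static void main(String[] args) {
        ResumableBootstrap b = new ResumableBootstrap();
        b.onSessionComplete("ks1", Arrays.asList("(0,100]", "(100,200]"));
        // A session with another node failed; on re-bootstrap, only the
        // still-missing range needs to be streamed again.
        System.out.println(b.toRequest("ks1",
                Arrays.asList("(0,100]", "(100,200]", "(200,300]")));  // [(200,300]]
    }
}
```

In the actual patch the received set would have to be persisted across the restart, since the failed bootstrap process exits.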
[jira] [Commented] (CASSANDRA-8160) CF level option to call posix_fadvise for sstables on creation and startup
[ https://issues.apache.org/jira/browse/CASSANDRA-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329267#comment-14329267 ] Benedict commented on CASSANDRA-8160: - What's the rationale here? This only seems to make sense if the dataset is smaller than memory? And even then, using FADV_SEQUENTIAL seems unhelpful. All that offers the OS is the opportunity to _evict_ pages since it thinks we're done with them. WILLNEED asks it _not_ to, but it may do so at the expense of pages it really should retain. CF level option to call posix_fadvise for sstables on creation and startup -- Key: CASSANDRA-8160 URL: https://issues.apache.org/jira/browse/CASSANDRA-8160 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Matt Stump Assignee: Branimir Lambov Priority: Minor Fix For: 2.1.4 Attachments: trunk-8160.txt We should have a CF level configuration which will result in posix_fadvise being called for sstables for that CF. It should be called on node startup and for new sstables. This should be configurable per CF to allow for some CFs to be prioritized above others. Not sure if we should use POSIX_FADV_SEQUENTIAL or POSIX_FADV_WILLNEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8655) Exception on upgrade to trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329029#comment-14329029 ] Philip Thompson edited comment on CASSANDRA-8655 at 2/20/15 6:32 PM: - Yes, this is still occurring, and can still be reproduced with the same test. was (Author: philipthompson): I don't know. The test no longer makes it that far because of a different error. I'll have to open another ticket. Exception on upgrade to trunk - Key: CASSANDRA-8655 URL: https://issues.apache.org/jira/browse/CASSANDRA-8655 Project: Cassandra Issue Type: Bug Reporter: Philip Thompson Assignee: Aleksey Yeschenko Fix For: 3.0 The dtest upgrade_through_versions_test.TestUpgrade_from_cassandra_2_1_latest_tag_to_trunk_HEAD.upgrade_test_mixed is failing with the following exception: {code} ERROR [Thread-10] 2015-01-20 14:12:44,117 CassandraDaemon.java:170 - Exception in thread Thread[Thread-10,5,main] java.lang.NullPointerException: null at org.apache.cassandra.db.SliceFromReadCommandSerializer.deserialize(SliceFromReadCommand.java:153) ~[main/:na] at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:157) ~[main/:na] at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:131) ~[main/:na] at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[main/:na] at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[main/:na] {code} It is trying to execute a simple SELECT k,v FROM cf WHERE k=X query on a trunk node after upgrading from 2.1-HEAD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8692) Coalesce intra-cluster network messages
[ https://issues.apache.org/jira/browse/CASSANDRA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329361#comment-14329361 ] Russ Hatch commented on CASSANDRA-8692: --- I'm currently working on benchmarks comparing trunk unpatched with trunk patched, hoping to have some results before the day is out. Coalesce intra-cluster network messages --- Key: CASSANDRA-8692 URL: https://issues.apache.org/jira/browse/CASSANDRA-8692 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.4 Attachments: batching-benchmark.png While researching CASSANDRA-8457 we found that it is effective and can be done without introducing additional latency at low concurrency/throughput. The patch from that was used and found to be useful in a real life scenario so I propose we implement this in 2.1 in addition to 3.0. The change set is a single file and is small enough to be reviewable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
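For intuition, here is a minimal sketch of the coalescing idea (a toy, not the actual patch): messages arriving within a small time window are flushed together as one batch, trading a small amount of latency for fewer network operations.

```java
import java.util.ArrayList;
import java.util.List;

// Toy message coalescer: groups messages that arrive within `windowNanos`
// of the first message in the current batch. Illustrative only.
public class Coalescer {
    private final long windowNanos;
    private final List<String> buffer = new ArrayList<>();
    private long windowStart = -1;
    final List<List<String>> batches = new ArrayList<>();   // "sends" to the peer

    Coalescer(long windowNanos) { this.windowNanos = windowNanos; }

    void offer(String msg, long nowNanos) {
        // If the current window has expired, send what we have as one batch.
        if (windowStart >= 0 && nowNanos - windowStart > windowNanos) flush();
        if (buffer.isEmpty()) windowStart = nowNanos;
        buffer.add(msg);
    }

    void flush() {
        if (!buffer.isEmpty()) {
            batches.add(new ArrayList<>(buffer));
            buffer.clear();
            windowStart = -1;
        }
    }

    public static void main(String[] args) {
        Coalescer c = new Coalescer(200);
        c.offer("m1", 0); c.offer("m2", 50); c.offer("m3", 150);  // same window
        c.offer("m4", 500);                                       // new window
        c.flush();
        System.out.println(c.batches);  // [[m1, m2, m3], [m4]]: 2 sends instead of 4
    }
}
```

The real trade-off the benchmarks probe is whether this batching hurts latency at low concurrency, which is why the ticket stresses it "can be done without introducing additional latency".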
[jira] [Comment Edited] (CASSANDRA-8832) SSTableRewriter.abort() should be more robust to failure
[ https://issues.apache.org/jira/browse/CASSANDRA-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329877#comment-14329877 ] Alan Boudreault edited comment on CASSANDRA-8832 at 2/21/15 1:04 AM: - With latest branch cassandra-2.1, I was getting this error: {code} Aborted cleaning up atleast one column family in keyspace r1, check server logs for more information. ... {code} I confirm that this patch fixes my issue. Thanks! was (Author: aboudreault): With latest branch cassandra-2.1, I was getting error: {code} Aborted cleaning up atleast one column family in keyspace r1, check server logs for more information. ... {code} I confirm that this patch fixes my issue. Thanks! SSTableRewriter.abort() should be more robust to failure Key: CASSANDRA-8832 URL: https://issues.apache.org/jira/browse/CASSANDRA-8832 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.4 This fixes a bug introduced in CASSANDRA-8124 that attempts to open early during abort, introducing a failure risk. This patch further preempts CASSANDRA-8690 to wrap every rollback action in a try/catch block, so that any internal assertion checks do not actually worsen the state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
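The "wrap every rollback action in a try/catch block" idea can be sketched generically (this is an illustration, not SSTableRewriter's actual code): every cleanup step runs even if an earlier one throws, and failures are collected as suppressed exceptions rather than aborting the abort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Generic sketch of a failure-tolerant rollback: run all actions, remember
// the first failure, suppress the rest, rethrow at the end.
public class RobustAbort {
    static void abort(List<Runnable> rollbackActions) {
        Throwable first = null;
        for (Runnable action : rollbackActions) {
            try {
                action.run();
            } catch (Throwable t) {
                if (first == null) first = t;
                else first.addSuppressed(t);                // keep later failures too
            }
        }
        if (first != null) throw new RuntimeException("abort partially failed", first);
    }

    public static void main(String[] args) {
        List<String> done = new ArrayList<>();
        try {
            abort(Arrays.asList(
                () -> done.add("close early-opened reader"),
                () -> { throw new IllegalStateException("cleanup failed"); },
                () -> done.add("delete tmp files")));       // still runs despite the failure
        } catch (RuntimeException e) {
            System.out.println(done + " / " + e.getCause().getMessage());
        }
    }
}
```

This matches the ticket's goal that internal assertion checks during rollback "do not actually worsen the state": a failing step no longer prevents the remaining cleanup from running.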
[jira] [Commented] (CASSANDRA-8835) 100% CPU spikes from disruptor thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329430#comment-14329430 ] Pavel Yaskevich commented on CASSANDRA-8835: [~rbranson] I think it might actually be related to the epoll behavior that Netty observed some time ago, when they put a back-off mechanism into the select(..) code. I will try to do the same thing and give you a patch on top of 0.3.8 so you can test it out. 100% CPU spikes from disruptor thrift server Key: CASSANDRA-8835 URL: https://issues.apache.org/jira/browse/CASSANDRA-8835 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.0.11, hsha, JDK7u65, ~6,000 connections per machine Reporter: Rick Branson Assignee: Pavel Yaskevich Seeing big CPU spikes (to 100%) inside of the TDisruptorServer when running hsha. perf top loaded with JVM symbols shows this one floating to the top for anywhere between 15 and 90 seconds and then falling down: Lcom/thinkaurelius/thrift/TDisruptorServer$SelectorThread;.selectorIterationComplete in Lcom/thinkaurelius/thrift/TDisruptorServer$AbstractSelectorThread;.select -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329459#comment-14329459 ] T Jake Luciani edited comment on CASSANDRA-8838 at 2/20/15 8:02 PM: This is a major win! Since bootstraps on dense nodes can take many hours. If anything goes wrong you need to wipe and restart. was (Author: tjake): This is a major win! Since failed bootstraps on dense nodes can take many hours. If anything goes wrong you need to wipe and restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8786) NullPointerException in ColumnDefinition.hasIndexOption
[ https://issues.apache.org/jira/browse/CASSANDRA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329480#comment-14329480 ] Aleksey Yeschenko commented on CASSANDRA-8786: -- Non-trivial to create a meaningful one, most definitely not worth the effort to add a test for just this particular case. That said, having a new set of upgrade tests focused on schema manipulations after each upgrade would probably be a good idea. NullPointerException in ColumnDefinition.hasIndexOption --- Key: CASSANDRA-8786 URL: https://issues.apache.org/jira/browse/CASSANDRA-8786 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.1.2 Reporter: Mathijs Vogelzang Assignee: Aleksey Yeschenko Fix For: 2.1.4 Attachments: 8786.txt We have a Cassandra cluster that we've been using through many upgrades, and thus most of our column families have originally been created by Thrift. We are on Cassandra 2.1.2 now. We've now ported most of our code to use CQL, and our code occasionally tries to recreate tables with IF NOT EXISTS to work properly on development / testing environments. 
When we issue the CQL statement CREATE INDEX IF NOT EXISTS index ON tableName (accountId) (this index does exist on that table already), we get a {{DriverInternalError: An unexpected error occurred server side on cass_host/xx.xxx.xxx.xxx:9042: java.lang.NullPointerException}} The error on the server is: {noformat} java.lang.NullPointerException: null at org.apache.cassandra.config.ColumnDefinition.hasIndexOption(ColumnDefinition.java:489) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.CreateIndexStatement.validate(CreateIndexStatement.java:87) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2] {noformat} This happens every time we run this CQL statement. We've tried to reproduce it in a test cassandra cluster by creating the table according to the exact DESCRIBE TABLE specification, but then this NullPointerException doesn't happen upon the CREATE INDEX one. So it seems that the tables on our production cluster (that were originally created through Thrift) are still subtly different schema-wise than a freshly created table according to the same creation statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
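A guess at the shape of the null-safe check involved (illustrative names, not the actual ColumnDefinition API): if Thrift-created schemas can leave the index options map null, hasIndexOption must treat a null map as "no options" instead of dereferencing it.

```java
import java.util.Map;

// Illustrative null-safe option lookup. The real ColumnDefinition.hasIndexOption
// signature and the exact cause of the null map are assumptions, not confirmed.
public class IndexOptionCheck {
    static boolean hasIndexOption(Map<String, String> indexOptions, String name) {
        // A Thrift-era column with no index options may carry a null map,
        // which a bare indexOptions.containsKey(name) would NPE on.
        return indexOptions != null && indexOptions.containsKey(name);
    }

    public static void main(String[] args) {
        System.out.println(hasIndexOption(null, "class_name"));                  // false, not NPE
        System.out.println(hasIndexOption(Map.of("class_name", "x"), "class_name")); // true
    }
}
```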
[jira] [Commented] (CASSANDRA-8786) NullPointerException in ColumnDefinition.hasIndexOption
[ https://issues.apache.org/jira/browse/CASSANDRA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329491#comment-14329491 ] Tyler Hobbs commented on CASSANDRA-8786: Yeah, I'm not too concerned with testing this particular case, but testing lots of cases similar to this. Can you help [~philipthompson] reproduce and we can start another ticket for working on the test cases? -- This message was sent by Atlassian JIRA (v6.3.4#6332)