[jira] [Commented] (CASSANDRA-11106) Experiment with strategies for picking compaction candidates in LCS

2016-03-25 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212803#comment-15212803
 ] 

Dikang Gu commented on CASSANDRA-11106:
---

[~krummas] cool, is anyone working on this now?

> Experiment with strategies for picking compaction candidates in LCS
> ---
>
> Key: CASSANDRA-11106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11106
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: lcs
> Fix For: 3.x
>
>
> Ideas taken here: http://rocksdb.org/blog/2921/compaction_pri/
> The current strategy in LCS is that we keep track of the token that was last 
> compacted and then start a compaction with the sstable containing the next 
> token (kOldestSmallestSeqFirst in the blog post above).
> The rocksdb blog post above introduces a few ideas on how this could be improved:
> * pick the 'coldest' sstable (sstable with the oldest max timestamp) - we 
> want to keep the hot data (recently updated) in the lower levels to avoid 
> write amplification
> * pick the sstable with the highest tombstone ratio - we want to get 
> tombstones to the top level as quickly as possible.
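For illustration, a minimal sketch of the two alternative pickers, assuming a 
hypothetical SSTableInfo holder with maxTimestamp and tombstoneRatio fields rather 
than Cassandra's actual SSTableReader API:

{code}
// Hypothetical sketch only; types and fields are illustrative.
import java.util.Comparator;
import java.util.List;

class SSTableInfo
{
    final String name;
    final long maxTimestamp;      // newest write timestamp in the sstable
    final double tombstoneRatio;  // estimated droppable tombstone ratio

    SSTableInfo(String name, long maxTimestamp, double tombstoneRatio)
    {
        this.name = name;
        this.maxTimestamp = maxTimestamp;
        this.tombstoneRatio = tombstoneRatio;
    }
}

class CandidatePickers
{
    // "Coldest" sstable: oldest max timestamp, so hot (recently updated) data stays in lower levels.
    static SSTableInfo pickColdest(List<SSTableInfo> level)
    {
        return level.stream()
                    .min(Comparator.comparingLong((SSTableInfo s) -> s.maxTimestamp))
                    .orElse(null);
    }

    // Highest tombstone ratio: push tombstones towards the top level as quickly as possible.
    static SSTableInfo pickMostTombstones(List<SSTableInfo> level)
    {
        return level.stream()
                    .max(Comparator.comparingDouble((SSTableInfo s) -> s.tombstoneRatio))
                    .orElse(null);
    }
}
{code}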



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11441) Filtering based on partition key feature in bulkLoader utility

2016-03-25 Thread Varun Barala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Barala updated CASSANDRA-11441:
-
Reviewer: Varun Barala

> Filtering based on partition key feature in bulkLoader utility 
> ---
>
> Key: CASSANDRA-11441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11441
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Varun Barala
>
> This feature will allow the user to transfer only the required part of an sstable 
> instead of the entire sstable. 
> Usage:
> Suppose someone has a CF with a composite partition key, say 
> [user(text),id(uuid)],
> and only wants to transfer the data for a given 'user A' rather than the entire 
> SSTable. 
> We can add one more parameter to our BulkLoader program for 'filtering on the 
> partition key'. 
> The command would look like: 
> bin/sstableLoader -h  --filtering  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11370) Display sstable count per level according to repair status on nodetool tablestats

2016-03-25 Thread Varun Barala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Barala updated CASSANDRA-11370:
-
Status: Ready to Commit  (was: Patch Available)

> Display sstable count per level according to repair status on nodetool 
> tablestats 
> --
>
> Key: CASSANDRA-11370
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11370
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Minor
>  Labels: lhf
>
> After CASSANDRA-8004 we still display sstables in each level on nodetool 
> tablestats as if we had a single compaction strategy, while we have one 
> strategy for repaired and another for unrepaired data. 
> We should split the display into repaired and unrepaired sets, so this:
> SSTables in each level: [2, 20/10, 15, 0, 0, 0, 0, 0, 0]
> Would become:
> SSTables in each level (repaired): [1, 10, 0, 0, 0, 0, 0, 0, 0]
> SSTables in each level (unrepaired): [1, 10, 15, 0, 0, 0, 0, 0, 0]
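For illustration, a minimal sketch of rendering the split display, assuming the 
per-level counts for the repaired and unrepaired sets are already available 
(hypothetical helper, not the actual nodetool code):

{code}
// Hypothetical formatting sketch only; the real change would live in nodetool tablestats.
static String formatLevels(String label, int[] countsPerLevel)
{
    StringBuilder sb = new StringBuilder("SSTables in each level (" + label + "): [");
    for (int i = 0; i < countsPerLevel.length; i++)
    {
        if (i > 0)
            sb.append(", ");
        sb.append(countsPerLevel[i]);
    }
    return sb.append(']').toString();
}

// formatLevels("repaired",   new int[]{1, 10, 0, 0, 0, 0, 0, 0, 0})
// formatLevels("unrepaired", new int[]{1, 10, 15, 0, 0, 0, 0, 0, 0})
{code}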



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11441) Filtering based on partition key feature in bulkLoader utility

2016-03-25 Thread Varun Barala (JIRA)
Varun Barala created CASSANDRA-11441:


 Summary: Filtering based on partition key feature in bulkLoader 
utility 
 Key: CASSANDRA-11441
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11441
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Varun Barala


This feature will allow the user to transfer only the required part of an sstable 
instead of the entire sstable. 

Usage:
Suppose someone has a CF with a composite partition key, say [user(text),id(uuid)],
and only wants to transfer the data for a given 'user A' rather than the entire 
SSTable. 

We can add one more parameter to our BulkLoader program for 'filtering on the 
partition key'. 

The command would look like: 
bin/sstableLoader -h  --filtering  
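For illustration only, a sketch of the filtering idea with hypothetical types (not 
the actual BulkLoader/sstableloader code): keep only partitions whose first 
partition-key component matches the user-supplied value.

{code}
// Hypothetical sketch of the proposed --filtering option; names and types are illustrative.
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

class Partition
{
    final String user; // first component of the composite partition key (text)
    final UUID id;     // second component (uuid)

    Partition(String user, UUID id)
    {
        this.user = user;
        this.id = id;
    }
}

class PartitionKeyFilter
{
    // Only partitions for the requested user would be streamed; everything else is skipped.
    static List<Partition> filterByUser(List<Partition> partitions, String requestedUser)
    {
        return partitions.stream()
                         .filter(p -> p.user.equals(requestedUser))
                         .collect(Collectors.toList());
    }
}
{code}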





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11395) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test

2016-03-25 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212765#comment-15212765
 ] 

Russ Hatch commented on CASSANDRA-11395:


A single failure out of 300; still need to investigate: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/46/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/whole_map_conditional_test/

> dtest failure in 
> upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test
> ---
>
> Key: CASSANDRA-11395
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11395
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Philip Thompson
>Assignee: Russ Hatch
>  Labels: dtest
>
> {code}
> Expected [[0, ['foo', 'bar'], 'foobar']] from SELECT * FROM test, but got 
> [[0, [u'foi', u'bar'], u'foobar']]
> {code}
> example failure:
> http://cassci.datastax.com/job/upgrade_tests-all/24/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/cas_and_list_index_test
> Failed on CassCI build upgrade_tests-all #24
> Probably a consistency issue in the test code, but I haven't looked into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11395) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test

2016-03-25 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212765#comment-15212765
 ] 

Russ Hatch edited comment on CASSANDRA-11395 at 3/26/16 3:33 AM:
-

A single failure out of 100; still need to investigate: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/46/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/whole_map_conditional_test/


was (Author: rhatch):
A single failure out of 300; still need to investigate: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/46/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/whole_map_conditional_test/

> dtest failure in 
> upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test
> ---
>
> Key: CASSANDRA-11395
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11395
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Philip Thompson
>Assignee: Russ Hatch
>  Labels: dtest
>
> {code}
> Expected [[0, ['foo', 'bar'], 'foobar']] from SELECT * FROM test, but got 
> [[0, [u'foi', u'bar'], u'foobar']]
> {code}
> example failure:
> http://cassci.datastax.com/job/upgrade_tests-all/24/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/cas_and_list_index_test
> Failed on CassCI build upgrade_tests-all #24
> Probably a consistency issue in the test code, but I haven't looked into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax

2016-03-25 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212733#comment-15212733
 ] 

Pavel Yaskevich commented on CASSANDRA-11067:
-

[~omichallat] I think it might be either, depending on what you are trying to do, 
because the "%" characters are required but have to be user-provided.

> Improve SASI syntax
> ---
>
> Key: CASSANDRA-11067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11067
> Project: Cassandra
>  Issue Type: Task
>  Components: CQL
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
>  Labels: client-impacting
> Fix For: 3.4
>
>
> I think everyone agrees that a LIKE operator would be ideal, but that's 
> probably not in scope for an initial 3.4 release.
> Still, I'm uncomfortable with the initial approach of overloading = to mean 
> "satisfies index expression."  The problem is that it will be very difficult 
> to back out of this behavior once people are using it.
> I propose adding a new operator in the interim instead.  Call it MATCHES, 
> maybe.  With the exact same behavior that SASI currently exposes, just with a 
> separate operator rather than being rolled into =.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-6246) EPaxos

2016-03-25 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-6246:
---
Status: In Progress  (was: Ready to Commit)

> EPaxos
> --
>
> Key: CASSANDRA-6246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Ellis
>Assignee: Blake Eggleston
>  Labels: messaging-service-bump-required
> Fix For: 3.x
>
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is 
> that Multi-paxos requires leader election and hence, a period of 
> unavailability when the leader dies.
> EPaxos is a Paxos variant that requires (1) fewer messages than multi-paxos, 
> (2) is particularly useful across multiple datacenters, and (3) allows any 
> node to act as coordinator: 
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to 
> implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11365) Recovering failed from a single disk failure using JBOD

2016-03-25 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212700#comment-15212700
 ] 

Paulo Motta commented on CASSANDRA-11365:
-

yes, make sure to run repair, since your node might be missing data from the 
lost disk.

> Recovering failed from a single disk failure using JBOD
> ---
>
> Key: CASSANDRA-11365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: cassandra 2.1.11
> jdk 1.7
>Reporter: zhaoyan
>Assignee: Paulo Motta
>  Labels: docs-impacting
>
> One disk on one Cassandra node failed, so the node is down.
> I tried recovering the node following:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRecoverUsingJBOD.html
> but I get the following error when restarting the node:
> ERROR 02:58:00 Exception encountered during startup
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
> Exception encountered during startup: A node with address /192.168.xx. 
> already exists, cancelling join. Use cassandra.replace_address if you want to 
> replace this node.
> INFO  02:58:00 Announcing shutdown
> INFO  02:58:02 Waiting for messaging service to quiesce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11365) Recovering failed from a single disk failure using JBOD

2016-03-25 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-11365:

Labels: docs-impacting  (was: )

> Recovering failed from a single disk failure using JBOD
> ---
>
> Key: CASSANDRA-11365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: cassandra 2.1.11
> jdk 1.7
>Reporter: zhaoyan
>Assignee: Paulo Motta
>  Labels: docs-impacting
>
> One disk on one Cassandra node failed, so the node is down.
> I tried recovering the node following:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRecoverUsingJBOD.html
> but I get the following error when restarting the node:
> ERROR 02:58:00 Exception encountered during startup
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
> Exception encountered during startup: A node with address /192.168.xx. 
> already exists, cancelling join. Use cassandra.replace_address if you want to 
> replace this node.
> INFO  02:58:00 Announcing shutdown
> INFO  02:58:02 Waiting for messaging service to quiesce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11365) Recovering failed from a single disk failure using JBOD

2016-03-25 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta resolved CASSANDRA-11365.
-
Resolution: Not A Problem

> Recovering failed from a single disk failure using JBOD
> ---
>
> Key: CASSANDRA-11365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: cassandra 2.1.11
> jdk 1.7
>Reporter: zhaoyan
>Assignee: Paulo Motta
>  Labels: docs-impacting
>
> One disk on one Cassandra node failed, so the node is down.
> I tried recovering the node following:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRecoverUsingJBOD.html
> but I get the following error when restarting the node:
> ERROR 02:58:00 Exception encountered during startup
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
> Exception encountered during startup: A node with address /192.168.xx. 
> already exists, cancelling join. Use cassandra.replace_address if you want to 
> replace this node.
> INFO  02:58:00 Announcing shutdown
> INFO  02:58:02 Waiting for messaging service to quiesce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11365) Recovering failed from a single disk failure using JBOD

2016-03-25 Thread zhaoyan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212699#comment-15212699
 ] 

zhaoyan commented on CASSANDRA-11365:
-

Thanks for your help.
"auto_bootstrap: false" is working.

I have another question: 

If there is no new disk to replace the failed one, can I delete the failed disk 
from cassandra.yaml?

> Recovering failed from a single disk failure using JBOD
> ---
>
> Key: CASSANDRA-11365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: cassandra 2.1.11
> jdk 1.7
>Reporter: zhaoyan
>Assignee: Paulo Motta
>
> One disk on one Cassandra node failed, so the node is down.
> I tried recovering the node following:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRecoverUsingJBOD.html
> but I get the following error when restarting the node:
> ERROR 02:58:00 Exception encountered during startup
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) 
> [apache-cassandra-2.1.11.jar:2.1.11]
> java.lang.RuntimeException: A node with address /192.168.xx.xx already 
> exists, cancelling join. Use cassandra.replace_address if you want to replace 
> this node.
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:788)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
> Exception encountered during startup: A node with address /192.168.xx. 
> already exists, cancelling join. Use cassandra.replace_address if you want to 
> replace this node.
> INFO  02:58:00 Announcing shutdown
> INFO  02:58:02 Waiting for messaging service to quiesce



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-6246) EPaxos

2016-03-25 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated CASSANDRA-6246:
-
Status: Ready to Commit  (was: Patch Available)

> EPaxos
> --
>
> Key: CASSANDRA-6246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Ellis
>Assignee: Blake Eggleston
>  Labels: messaging-service-bump-required
> Fix For: 3.x
>
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is 
> that Multi-paxos requires leader election and hence, a period of 
> unavailability when the leader dies.
> EPaxos is a Paxos variant that requires (1) fewer messages than multi-paxos, 
> (2) is particularly useful across multiple datacenters, and (3) allows any 
> node to act as coordinator: 
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to 
> implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11380) Client visible backpressure mechanism

2016-03-25 Thread Wei Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212642#comment-15212642
 ] 

Wei Deng commented on CASSANDRA-11380:
--

bq. but one simple client mechanism, especially in bulk loading scenarios, is 
to set a slightly higher consistency level.

That's exactly the load-shedding approach mentioned in the first paragraph, and it 
is not always effective.

> Client visible backpressure mechanism
> -
>
> Key: CASSANDRA-11380
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11380
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination
>Reporter: Wei Deng
>
> Cassandra currently lacks a sophisticated back pressure mechanism to prevent 
> clients from ingesting data at too high a throughput. One of the reasons why it 
> hasn't done so is because of its SEDA (Staged Event Driven Architecture) 
> design. With SEDA, an overloaded thread pool can drop those droppable 
> messages (in this case, MutationStage can drop mutation or counter mutation 
> messages) when they exceed the 2-second timeout. This can save the JVM from 
> running out of memory and crashing. However, one downside of this kind of 
> load-shedding-based backpressure approach is that an increased number of dropped 
> mutations will increase the chance of inconsistency among replicas and will 
> likely require more repair (hints can help to some extent, but it's not 
> designed to cover all inconsistencies); another downside is that excessive 
> writes will also introduce much more pressure on compaction (especially LCS), 
>  and backlogged compaction will increase read latency and cause more frequent 
> GC pauses, and depending on the type of compaction, some backlog can take a 
> long time to clear up even after the write is removed. It seems that the 
> current load-shedding mechanism is not adequate to address a common bulk 
> loading scenario, where clients are trying to ingest data at highest 
> throughput possible. We need a more direct way to tell the client drivers to 
> slow down.
> It appears that HBase had suffered similar situation as discussed in 
> HBASE-5162, and they introduced some special exception type to tell the 
> client to slow down when a certain "overloaded" criteria is met. If we can 
> leverage a similar mechanism, our dropped mutation event can be used to 
> trigger such exceptions to push back on the client; at the same time, 
> backlogged compaction (when the number of pending compactions exceeds a 
> certain threshold) can also be used for the push-back, and this can prevent the 
> vicious cycle mentioned in 
> https://issues.apache.org/jira/browse/CASSANDRA-11366?focusedCommentId=15198786=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15198786.
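For illustration, a self-contained sketch of the proposed push-back (the exception 
type, thresholds, and method names are all hypothetical, not Cassandra's actual 
API): once dropped mutations or pending compactions cross a threshold, the 
coordinator surfaces an error to the client instead of silently shedding load.

{code}
// Illustrative only: a threshold-based overload check; names and thresholds are hypothetical.
class ClientOverloadedException extends RuntimeException
{
    ClientOverloadedException(String message)
    {
        super(message);
    }
}

class BackpressureSketch
{
    static final long MAX_DROPPED_MUTATIONS_PER_WINDOW = 1000;
    static final int MAX_PENDING_COMPACTIONS = 100;

    // Called before accepting a write; the client driver would see the error and back off.
    static void checkOverloaded(long droppedMutationsInWindow, int pendingCompactions)
    {
        if (droppedMutationsInWindow > MAX_DROPPED_MUTATIONS_PER_WINDOW)
            throw new ClientOverloadedException("dropped mutations exceeded threshold, slow down");
        if (pendingCompactions > MAX_PENDING_COMPACTIONS)
            throw new ClientOverloadedException("compaction backlog too large, slow down");
    }
}
{code}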



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11364) dtest failure in deletion_test.TestDeletion.gc_test

2016-03-25 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch resolved CASSANDRA-11364.

Resolution: Cannot Reproduce

> dtest failure in deletion_test.TestDeletion.gc_test
> ---
>
> Key: CASSANDRA-11364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11364
> Project: Cassandra
>  Issue Type: Test
>Reporter: Jim Witschey
>Assignee: Russ Hatch
>  Labels: dtest
>
> This is one of those "Unable to connect" flaps:
> http://cassci.datastax.com/job/cassandra-3.0_dtest/606/testReport/deletion_test/TestDeletion/gc_test
> It's the only failure that I've seen on this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11364) dtest failure in deletion_test.TestDeletion.gc_test

2016-03-25 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212585#comment-15212585
 ] 

Russ Hatch commented on CASSANDRA-11364:


doesn't repro on 100 iterations, closing.

> dtest failure in deletion_test.TestDeletion.gc_test
> ---
>
> Key: CASSANDRA-11364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11364
> Project: Cassandra
>  Issue Type: Test
>Reporter: Jim Witschey
>Assignee: Russ Hatch
>  Labels: dtest
>
> This is one of those "Unable to connect" flaps:
> http://cassci.datastax.com/job/cassandra-3.0_dtest/606/testReport/deletion_test/TestDeletion/gc_test
> It's the only failure that I've seen on this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11395) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test

2016-03-25 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212546#comment-15212546
 ] 

Russ Hatch commented on CASSANDRA-11395:


trying out a fix on bulk run here: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/46/

> dtest failure in 
> upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test
> ---
>
> Key: CASSANDRA-11395
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11395
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Philip Thompson
>Assignee: Russ Hatch
>  Labels: dtest
>
> {code}
> Expected [[0, ['foo', 'bar'], 'foobar']] from SELECT * FROM test, but got 
> [[0, [u'foi', u'bar'], u'foobar']]
> {code}
> example failure:
> http://cassci.datastax.com/job/upgrade_tests-all/24/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/cas_and_list_index_test
> Failed on CassCI build upgrade_tests-all #24
> Probably a consistency issue in the test code, but I haven't looked into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-25 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212527#comment-15212527
 ] 

Robbie Strickland edited comment on CASSANDRA-9666 at 3/25/16 10:46 PM:


We run TWCS at a sustained 2M writes/sec on just shy of 30TB that rolls through 
the cluster every few days. It does a great job keeping up after 6ish months of 
heavy pounding.





was (Author: rstrickland):
We run TWCS at a sustained 2M writes/sec on just shy of 30TB that rolls
through the cluster every few days. It does a great job keeping up after
6ish months of heavy pounding.




> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated 

[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-25 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212527#comment-15212527
 ] 

Robbie Strickland commented on CASSANDRA-9666:
--

We run TWCS at a sustained 2M writes/sec on just shy of 30TB that rolls
through the cluster every few days. It does a great job keeping up after
6ish months of heavy pounding.




> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated pending 
> compactions and potential starvation if more than min_threshold tables 
> existed in current window but STCS did not consider them viable candidates
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882
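For a back-of-the-envelope illustration of the windowing described above (not the 
actual TWCS code), a write timestamp can be bucketed by rounding down to a multiple 
of unit x size; with unit=HOURS and size=6 that yields roughly 4 windows per day:

{code}
// Illustrative bucketing sketch; the real strategy works on sstable maxTimestamp values.
import java.util.concurrent.TimeUnit;

class TimeWindowSketch
{
    // Round a write timestamp (millis) down to the start of its window.
    static long windowStartMillis(long timestampMillis, TimeUnit unit, long size)
    {
        long windowMillis = unit.toMillis(size); // e.g. HOURS, 6 -> 21,600,000 ms
        return (timestampMillis / windowMillis) * windowMillis;
    }

    public static void main(String[] args)
    {
        long now = System.currentTimeMillis();
        // Two sstables whose max timestamps fall in the same 6-hour block land in the same
        // window and are eventually compacted together into a single sstable.
        System.out.println(windowStartMillis(now, TimeUnit.HOURS, 6));
        System.out.println(windowStartMillis(now - TimeUnit.HOURS.toMillis(1), TimeUnit.HOURS, 6));
    }
}
{code}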



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11395) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test

2016-03-25 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212524#comment-15212524
 ] 

Russ Hatch commented on CASSANDRA-11395:


Does look like a consistency issue:

In all three tests we're reading back at a low consistency level, which may not 
always return the inserted/updated data.
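The dtests themselves are Python, but as a hedged sketch of the fix idea using the 
DataStax Java driver (the contact point, keyspace, and table are placeholders), the 
read-back can be issued at a consistency level that touches all replicas:

{code}
// Sketch only: read back at ALL so the check sees the value just written (RF=3 cluster assumed).
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

class ReadBackAtAll
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks"))
        {
            SimpleStatement select = new SimpleStatement("SELECT * FROM test WHERE k = 0");
            select.setConsistencyLevel(ConsistencyLevel.ALL);
            System.out.println(session.execute(select).all());
        }
    }
}
{code}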


> dtest failure in 
> upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test
> ---
>
> Key: CASSANDRA-11395
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11395
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Philip Thompson
>Assignee: Russ Hatch
>  Labels: dtest
>
> {code}
> Expected [[0, ['foo', 'bar'], 'foobar']] from SELECT * FROM test, but got 
> [[0, [u'foi', u'bar'], u'foobar']]
> {code}
> example failure:
> http://cassci.datastax.com/job/upgrade_tests-all/24/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/cas_and_list_index_test
> Failed on CassCI build upgrade_tests-all #24
> Probably a consistency issue in the test code, but I haven't looked into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-25 Thread Robert Coli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212451#comment-15212451
 ] 

Robert Coli commented on CASSANDRA-9666:


My personal preferences, as someone who supports people confused by the 
configuration of DTCS, in descending order of preference:

1) Deprecate DTCS and merge TWCS, eventually removing DTCS.
2) Keep both DTCS and TWCS, but recommend the use of TWCS.
3) Keep both DTCS and TWCS, but make no statement regarding which one to use.
4) Only keep DTCS.

I have a meaningful preference for 1), and a meaningful anti-preference for 4).

I have heard no reports of cases where DTCS outperformed TWCS, or where an end 
user found TWCS to be more confusing to configure. But of course absence of 
evidence is not evidence of absence.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: 

[jira] [Comment Edited] (CASSANDRA-10528) Proposal: Integrate RxJava

2016-03-25 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212391#comment-15212391
 ] 

T Jake Luciani edited comment on CASSANDRA-10528 at 3/25/16 9:04 PM:
-

I've done some work trying out some ideas and here are my results.

Still only focused on RF=1 in memory reads. Just want to see how much faster 
they can be in existing codebase.

Test Setup:
  * 1 c3.8xlarge (16 "cores", 32g ram, tsc clock)
  * 6 m3.2xlarge stress nodes

I checked with perf and verified the max TX of the c* nodes was ~952Mb

Next for a baseline I hacked a version of trunk that simply returns a cached 
response from netty.
*With this cached netty response version I was able to max out the network of 
the C* node at 600k/sec.* 

*With trunk version I was able to get 240k/sec with a p99 latency of 37ms and a 
p999 of 67ms.*  

Next I tried a couple approaches to RxJava schedulers.

  * Wrote a busy spin event loop with thread affinity and the ability to route 
to a single core per token.  This approach required pinning the netty threads 
to specific cores and the rest of the cores to the other scheduler.  I think 
this approach worked best with 1/4 of the cores focused on netty work and the rest 
on other work.
  * Used netty event loop as the RxJava scheduler loop with/without affinity.  
The loop is setup to process all the events for a given request on the same 
event loop.
  
*The Netty event loop ended up working best with -Dio.netty.eventLoopThread=32 
and I was able to see 288k/sec with p99 at 25ms and p999 at 49ms.*

*An increase of ~15% in throughput and tail latency > 25% improved. * 

I'm going to try running this on a beefier instance like an i2.8xlarge that has 
32 cores and 10g network



was (Author: tjake):
I've done some work trying out some ideas and here are my results.

Still only focused on RF=1 in memory reads. Just want to see how much faster 
they can be in existing codebase.

Test Setup:
  * 1 c3.8xlarge (16 "cores", 32g ram, tsc clock)
  * 6 m3.2xlarge stress nodes

I checked with perf and verified the max TX of the c* nodes was ~952Mb

Next for a baseline I hacked a version of trunk that simply returns a cached 
response from netty.
*With this cached netty response version I was able to max out the network of 
the C* node at 600k/sec.* 

*With trunk version I was able to get 240k/sec with a p99 latency of 37ms and a 
p999 of 67ms.*  

Next I tried a couple approaches to RxJava schedulers.

  * Wrote a busy spin event loop with thread affinity and the ability to route 
to a single core per token.  This approach required pinning the netty threads 
to specific cores and the rest of the cores to the other nodes.  I think this 
approach worked best with 1/4 of the cores focused on netty work and the rest on 
other work.
  * Used netty event loop as the RxJava scheduler loop with/without affinity.  
The loop is setup to process all the events for a given request on the same 
event loop.
  
*The Netty event loop ended up working best with -Dio.netty.eventLoopThread=32 
and I was able to see 288k/sec with p99 at 25ms and p999 at 49ms.*

*An increase of ~15% in throughput and tail latency > 25% improved. * 

I'm going to try running this on a beefier instance like an i2.8xlarge that has 
32 cores and 10g network


> Proposal: Integrate RxJava
> --
>
> Key: CASSANDRA-10528
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10528
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 3.x
>
> Attachments: rxjava-stress.png
>
>
> The purpose of this ticket is to discuss the merits of integrating the 
> [RxJava|https://github.com/ReactiveX/RxJava] framework into C*.  Enabling us 
> to incrementally make the internals of C* async and move away from SEDA to a 
> more modern thread per core architecture. 
> Related tickets:
>* CASSANDRA-8520
>* CASSANDRA-8457
>* CASSANDRA-5239
>* CASSANDRA-7040
>* CASSANDRA-5863
>* CASSANDRA-6696
>* CASSANDRA-7392
> My *primary* goals in raising this issue are to provide a way of:
> *  *Incrementally* making the backend async
> *  Avoiding code complexity/readability issues
> *  Avoiding NIH where possible
> *  Building on an extendable library
> My *non*-goals in raising this issue are:
> 
>* Rewrite the entire database in one big bang
>* Write our own async api/framework
> 
> -
> I've attempted to integrate RxJava a while back and found it not ready mainly 
> due to our lack of lambda support.  Now with Java 8 I've found it very 
> enjoyable and have not hit any performance issues. A gentle introduction to 
> RxJava is 

[jira] [Comment Edited] (CASSANDRA-10528) Proposal: Integrate RxJava

2016-03-25 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212391#comment-15212391
 ] 

T Jake Luciani edited comment on CASSANDRA-10528 at 3/25/16 8:58 PM:
-

I've done some work trying out some ideas and here are my results.

Still only focused on RF=1 in memory reads. Just want to see how much faster 
they can be in existing codebase.

Test Setup:
  * 1 c3.8xlarge (16 "cores", 32g ram, tsc clock)
  * 6 m3.2xlarge stress nodes

I checked with perf and verified the max TX of the c* nodes was ~952Mb

Next for a baseline I hacked a version of trunk that simply returns a cached 
response from netty.
*With this cached netty response version I was able to max out the network of 
the C* node at 600k/sec.* 

*With trunk version I was able to get 240k/sec with a p99 latency of 37ms and a 
p999 of 67ms.*  

Next I tried a couple approaches to RxJava schedulers.

  * Wrote a busy spin event loop with thread affinity and the ability to route 
to a single core per token.  This approach required pinning the netty threads 
to specific cores and the rest of the cores to the other nodes.  I think this 
approach worked best with 1/4 of the cores focused on netty work and the rest on 
other work.
  * Used netty event loop as the RxJava scheduler loop with/without affinity.  
The loop is setup to process all the events for a given request on the same 
event loop.
  
*The Netty event loop ended up working best with -Dio.netty.eventLoopThread=32 
and I was able to see 288k/sec with p99 at 25ms and p999 at 49ms.*

*An increase of ~15% in throughput and tail latency > 25% improved. * 

I'm going to try running this on a beefier instance like an i2.8xlarge that has 
32 cores and 10g network



was (Author: tjake):
I've done some work trying out some ideas and here are my results.

Still only focused on RF=1 in memory reads. Just want to see how much faster 
they can be in existing codebase.

Test Setup:
  * 1 c3.8xlarge (16 "cores", 32g ram, tsc clock)
  * 6 m3.2xlarge stress nodes

I checked with perf and verified the max TX of the c* nodes was ~952Mb

Next for a baseline I hacked a version of trunk that simply returns a cached 
response from netty.
*With this cached netty response version I was able to max out the network of 
the C* node at 600k/sec.* 

*With trunk version I was able to get 240k/sec with a p99 latency of 37ms and a 
p999 of 67ms.*  

Next I tried a couple approaches to RxJava schedulers.

  * Wrote a busy spin event loop with thread affinity and the ability to route 
to a single core per token.  This approach required pinning the netty threads 
to specific cores and the rest of the cores to the other nodes.  I think this 
approach worked best with 1/4 of the cores focused on netty work and the rest on 
other work.
  * Used netty event loop as the RxJava scheduler loop with/without affinity.  
The loop is setup to process all the events for a given request on the same 
event loop.
  
The Netty event loop ended up working best with -Dio.netty.eventLoopThread=32 
and I was able to see
288k/sec with p99 at 25ms and p999 at 49ms. 

* An increase of ~15% in throughput and tail latency > 25% improved. * 

I'm going to try running this on a beefier instance like an i2.8xlarge that has 
32 cores and 10g network


> Proposal: Integrate RxJava
> --
>
> Key: CASSANDRA-10528
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10528
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 3.x
>
> Attachments: rxjava-stress.png
>
>
> The purpose of this ticket is to discuss the merits of integrating the 
> [RxJava|https://github.com/ReactiveX/RxJava] framework into C*.  Enabling us 
> to incrementally make the internals of C* async and move away from SEDA to a 
> more modern thread per core architecture. 
> Related tickets:
>* CASSANDRA-8520
>* CASSANDRA-8457
>* CASSANDRA-5239
>* CASSANDRA-7040
>* CASSANDRA-5863
>* CASSANDRA-6696
>* CASSANDRA-7392
> My *primary* goals in raising this issue are to provide a way of:
> *  *Incrementally* making the backend async
> *  Avoiding code complexity/readability issues
> *  Avoiding NIH where possible
> *  Building on an extendable library
> My *non*-goals in raising this issue are:
> 
>* Rewrite the entire database in one big bang
>* Write our own async api/framework
> 
> -
> I've attempted to integrate RxJava a while back and found it not ready mainly 
> due to our lack of lambda support.  Now with Java 8 I've found it very 
> enjoyable and have not hit any performance issues. A gentle introduction to 
> RxJava is 

[jira] [Commented] (CASSANDRA-10528) Proposal: Integrate RxJava

2016-03-25 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212391#comment-15212391
 ] 

T Jake Luciani commented on CASSANDRA-10528:


I've done some work trying out some ideas and here are my results.

Still only focused on RF=1 in memory reads. Just want to see how much faster 
they can be in existing codebase.

Test Setup:
  * 1 c3.8xlarge (16 "cores", 32g ram, tsc clock)
  * 6 m3.2xlarge stress nodes

I checked with perf and verified the max TX of the c* nodes was ~952Mb

Next for a baseline I hacked a version of trunk that simply returns a cached 
response from netty.
*With this cached netty response version I was able to max out the network of 
the C* node at 600k/sec.* 

*With trunk version I was able to get 240k/sec with a p99 latency of 37ms and a 
p999 of 67ms.*  

Next I tried a couple approaches to RxJava schedulers.

  * Wrote a busy-spin event loop with thread affinity and the ability to route 
work to a single core per token.  This approach required pinning the netty 
threads to specific cores and the per-token event loops to the rest of the 
cores.  I think this approach worked best with 1/4 of the cores focused on 
netty work and the rest on other work.
  * Used the netty event loop as the RxJava scheduler loop, with and without 
affinity.  The loop is set up to process all the events for a given request on 
the same event loop (a rough sketch of this wiring follows the results below).
  
The Netty event loop ended up working best with -Dio.netty.eventLoopThread=32, 
and I was able to see 288k/sec with p99 at 25ms and p999 at 49ms. 

*An increase of ~15% in throughput, with tail latency improved by more than 25%.* 

I'm going to try running this on a beefier instance like an i2.8xlarge, which has 
32 cores and a 10g network.
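
For reference, here is a minimal sketch of the second scheduler approach above: 
using a Netty event loop as the RxJava scheduler so that all events for a given 
request stay on one loop. This is an illustration only, not the actual patch; it 
assumes RxJava 1.x and Netty 4, and the class/method names are made up.

{code}
import io.netty.channel.EventLoop;
import io.netty.channel.nio.NioEventLoopGroup;
import rx.Observable;
import rx.Scheduler;
import rx.schedulers.Schedulers;

public class NettyRxScheduling
{
    // Sized roughly like the -Dio.netty.eventLoopThread=32 run described above.
    private static final NioEventLoopGroup LOOPS = new NioEventLoopGroup(32);

    /** Pick one loop per request; an EventLoop is an Executor, so it can back an Rx Scheduler. */
    public static Scheduler forRequest()
    {
        EventLoop loop = LOOPS.next();
        return Schedulers.from(loop);
    }

    /** Stand-in for a read: every stage of the chain observes on the same per-request loop. */
    public static Observable<String> read(String key)
    {
        Scheduler requestLoop = forRequest();
        return Observable.just(key)
                         .observeOn(requestLoop)
                         .map(k -> "row-for-" + k);
    }
}
{code}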


> Proposal: Integrate RxJava
> --
>
> Key: CASSANDRA-10528
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10528
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 3.x
>
> Attachments: rxjava-stress.png
>
>
> The purpose of this ticket is to discuss the merits of integrating the 
> [RxJava|https://github.com/ReactiveX/RxJava] framework into C*.  Enabling us 
> to incrementally make the internals of C* async and move away from SEDA to a 
> more modern thread per core architecture. 
> Related tickets:
>* CASSANDRA-8520
>* CASSANDRA-8457
>* CASSANDRA-5239
>* CASSANDRA-7040
>* CASSANDRA-5863
>* CASSANDRA-6696
>* CASSANDRA-7392
> My *primary* goals in raising this issue are to provide a way of:
> *  *Incrementally* making the backend async
> *  Avoiding code complexity/readability issues
> *  Avoiding NIH where possible
> *  Building on an extendable library
> My *non*-goals in raising this issue are:
> 
>* Rewrite the entire database in one big bang
>* Write our own async api/framework
> 
> -
> I attempted to integrate RxJava a while back and found it not ready, mainly 
> due to our lack of lambda support.  Now with Java 8 I've found it very 
> enjoyable and have not hit any performance issues. A gentle introduction to 
> RxJava is [here|http://blog.danlew.net/2014/09/15/grokking-rxjava-part-1/] as 
> well as their 
> [wiki|https://github.com/ReactiveX/RxJava/wiki/Additional-Reading].  The 
> primary concept of Rx is the 
> [Observable|http://reactivex.io/documentation/observable.html], which is 
> essentially a stream of stuff you can subscribe to and act on, chain, etc. 
> This is quite similar to [Java 8 streams 
> api|http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html]
>  (or I should say the streams api is similar to it).  The difference is that 
> Java 8 streams can't be used for asynchronous events, while RxJava can.
> Another improvement since I last tried integrating RxJava is the completion 
> of CASSANDRA-8099, which provides a very iterable/incremental approach to 
> our storage engine.  *Iterators and Observables are well paired conceptually, 
> so morphing our current storage engine to be async is much simpler now.*
> In an effort to show how one can incrementally change our backend, I've done a 
> quick POC with RxJava and made our non-paging read requests 
> non-blocking.
> https://github.com/apache/cassandra/compare/trunk...tjake:rxjava-3.0
> As you can probably see, the code is straightforward and sometimes quite nice!
> *Old*
> {code}
> private static PartitionIterator 
> fetchRows(List<SinglePartitionReadCommand> commands, ConsistencyLevel 
> consistencyLevel)
> throws UnavailableException, ReadFailureException, ReadTimeoutException
> {
> int cmdCount = commands.size();
> SinglePartitionReadLifecycle[] reads = new 
> 

[jira] [Created] (CASSANDRA-11440) consider LWT staleness warning when reading LWT back below serial consistency

2016-03-25 Thread Russ Hatch (JIRA)
Russ Hatch created CASSANDRA-11440:
--

 Summary: consider LWT staleness warning when reading LWT back 
below serial consistency
 Key: CASSANDRA-11440
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11440
 Project: Cassandra
  Issue Type: New Feature
Reporter: Russ Hatch
Priority: Minor


When reading a recently written LWT at cl.one, there is a potential for 
staleness. Users are encouraged to read LWT values back at SERIAL consistency 
level for events to remain linearizable.

If there's a low-impact way to warn about staleness, this could prevent people 
from shooting themselves in the foot.
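
For context, a minimal sketch of the SERIAL read-back that the description refers 
to, assuming the DataStax Java driver 3.x; the session, keyspace and table names 
are illustrative, not from this ticket.

{code}
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class SerialReadBack
{
    public static ResultSet readAfterLwt(Session session, String id)
    {
        SimpleStatement read = new SimpleStatement("SELECT * FROM ks.users WHERE id = ?", id);
        // Reading at SERIAL instead of ONE keeps the read linearizable with respect
        // to the preceding LWT, avoiding the staleness described above.
        read.setConsistencyLevel(ConsistencyLevel.SERIAL);
        return session.execute(read);
    }
}
{code}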



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11380) Client visible backpressure mechanism

2016-03-25 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212353#comment-15212353
 ] 

Jeremy Hanna commented on CASSANDRA-11380:
--

Not to discourage work on this ticket at all, but one simple client mechanism, 
especially in bulk loading scenarios, is to set a slightly higher consistency 
level.  See also https://datastax-oss.atlassian.net/browse/SPARKC-262

> Client visible backpressure mechanism
> -
>
> Key: CASSANDRA-11380
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11380
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination
>Reporter: Wei Deng
>
> Cassandra currently lacks a sophisticated back pressure mechanism to prevent 
> clients ingesting data at too high throughput. One of the reasons why it 
> hasn't done so is because of its SEDA (Staged Event Driven Architecture) 
> design. With SEDA, an overloaded thread pool can drop those droppable 
> messages (in this case, MutationStage can drop mutation or counter mutation 
> messages) when they exceed the 2-second timeout. This can save the JVM from 
> running out of memory and crashing. However, one downside of this kind of 
> load-shedding based backpressure approach is that an increased number of dropped 
> mutations will increase the chance of inconsistency among replicas and will 
> likely require more repair (hints can help to some extent, but it's not 
> designed to cover all inconsistencies); another downside is that excessive 
> writes will also introduce much more pressure on compaction (especially LCS), 
>  and backlogged compaction will increase read latency and cause more frequent 
> GC pauses, and depending on the type of compaction, some backlog can take a 
> long time to clear up even after the write is removed. It seems that the 
> current load-shedding mechanism is not adequate to address a common bulk 
> loading scenario, where clients are trying to ingest data at the highest 
> throughput possible. We need a more direct way to tell the client drivers to 
> slow down.
> It appears that HBase had suffered similar situation as discussed in 
> HBASE-5162, and they introduced some special exception type to tell the 
> client to slow down when a certain "overloaded" criteria is met. If we can 
> leverage a similar mechanism, our dropped mutation event can be used to 
> trigger such exceptions to push back on the client; at the same time, 
> backlogged compaction (when the number of pending compactions exceeds a 
> certain threshold) can also be used for the push back, and this can prevent 
> the vicious cycle mentioned in 
> https://issues.apache.org/jira/browse/CASSANDRA-11366?focusedCommentId=15198786=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15198786.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11439) [windows] dtest failure in replication_test.SnitchConfigurationUpdateTest

2016-03-25 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212309#comment-15212309
 ] 

Philip Thompson commented on CASSANDRA-11439:
-

The tests in hintedhandoff_test.TestHintedHandoffConfig are failing for the 
same reason (line ending changes)

> [windows] dtest failure in replication_test.SnitchConfigurationUpdateTest
> -
>
> Key: CASSANDRA-11439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11439
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: DS Test Eng
>  Labels: dtest, windows
>
> Almost all of the SnitchConfigurationUpdateTests have begun failing on 
> windows, across all versions.
> I suspect this is due to the recent CCM changes to use universal_newlines in 
> nodetool output, which was needed for Python3 functionality. I assume the 
> tests' expected output of the CCM functions just needs to be updated.
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/213/testReport/replication_test/SnitchConfigurationUpdateTest/test_failed_snitch_update_gossiping_property_file_snitch
> Failed on CassCI build cassandra-2.2_dtest_win32 #213



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11436) dtest failure in repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test

2016-03-25 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212292#comment-15212292
 ] 

Jim Witschey commented on CASSANDRA-11436:
--

Kickin' it to you [~jkni]; I don't see any problems in the test logic. It could 
be a range allocation problem, but I don't have much experience debugging 
those, so if you have a hunch, go for it.

> dtest failure in 
> repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test
> 
>
> Key: CASSANDRA-11436
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11436
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Joel Knighton
>  Labels: dtest
> Fix For: 2.1.x, 2.2.x
>
>
> sstable_repairedset_test is failing on 2.1 and 2.2, but not on trunk.
> In the final assertion, after running sstablemetadata on both nodes, we see 
> unrepaired sstables, when we expect all sstables to be repaired.
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/220/testReport/repair_tests.incremental_repair_test/TestIncRepair/sstable_repairedset_test
> Failed on CassCI build cassandra-2.1_novnode_dtest #220



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11436) dtest failure in repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test

2016-03-25 Thread Jim Witschey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Witschey updated CASSANDRA-11436:
-
Assignee: Joel Knighton  (was: Jim Witschey)

> dtest failure in 
> repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test
> 
>
> Key: CASSANDRA-11436
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11436
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Joel Knighton
>  Labels: dtest
> Fix For: 2.1.x, 2.2.x
>
>
> sstable_repairedset_test is failing on 2.1 and 2.2, but not on trunk.
> In the final assertion, after running sstablemetadata on both nodes, we see 
> unrepaired sstables, when we expect all sstables to be repaired.
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/220/testReport/repair_tests.incremental_repair_test/TestIncRepair/sstable_repairedset_test
> Failed on CassCI build cassandra-2.1_novnode_dtest #220



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11439) [windows] dtest failure in replication_test.SnitchConfigurationUpdateTest

2016-03-25 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11439:
---

 Summary: [windows] dtest failure in 
replication_test.SnitchConfigurationUpdateTest
 Key: CASSANDRA-11439
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11439
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


Almost all of the SnitchConfigurationUpdateTests have begun failing on windows, 
across all versions.

I suspect this is due to the recent CCM changes to use universal_newlines in 
nodetool output, which was needed for Python3 functionality. I assume the 
tests' expected output of the CCM functions just needs to be updated.

example failure:

http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/213/testReport/replication_test/SnitchConfigurationUpdateTest/test_failed_snitch_update_gossiping_property_file_snitch

Failed on CassCI build cassandra-2.2_dtest_win32 #213



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11438) dtest failure in consistency_test.TestAccuracy.test_network_topology_strategy_users

2016-03-25 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11438:
---

 Summary: dtest failure in 
consistency_test.TestAccuracy.test_network_topology_strategy_users
 Key: CASSANDRA-11438
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11438
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


This test and consistency_test.TestAvailability.test_network_topology_strategy 
have begun failing now that we dropped the instance size we run CI with. The 
tests should be altered to reflect the constrained resources. They are 
ambitious for dtests, regardless.

example failure:

http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/221/testReport/consistency_test/TestAccuracy/test_network_topology_strategy_users

Failed on CassCI build cassandra-2.1_novnode_dtest #221



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11438) dtest failure in consistency_test.TestAccuracy.test_network_topology_strategy_users

2016-03-25 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson reassigned CASSANDRA-11438:
---

Assignee: Philip Thompson  (was: DS Test Eng)

> dtest failure in 
> consistency_test.TestAccuracy.test_network_topology_strategy_users
> ---
>
> Key: CASSANDRA-11438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11438
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Philip Thompson
>  Labels: dtest
>
> This test and 
> consistency_test.TestAvailability.test_network_topology_strategy have begun 
> failing now that we dropped the instance size we run CI with. The tests 
> should be altered to reflect the constrained resources. They are ambitious 
> for dtests, regardless.
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/221/testReport/consistency_test/TestAccuracy/test_network_topology_strategy_users
> Failed on CassCI build cassandra-2.1_novnode_dtest #221



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11432) Counter values become under-counted when running repair.

2016-03-25 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu reassigned CASSANDRA-11432:
-

Assignee: Aleksey Yeschenko

> Counter values become under-counted when running repair.
> 
>
> Key: CASSANDRA-11432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11432
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dikang Gu
>Assignee: Aleksey Yeschenko
>
> We are experimenting with Counters in Cassandra 2.2.5. Our setup is that we have 6 
> nodes across three different regions, and in each region the replication 
> factor is 2. Basically, each node holds a full copy of the data.
> We are writing to the cluster with CL = 2, and reading with CL = 1. 
> We are doing 30k/s counter increments/decrements per node, and in the 
> meanwhile we are double-writing to our mysql tier, so that we can measure 
> the accuracy of the C* counters compared to mysql.
> The experiment results were great at the beginning: the counter values in C* and 
> mysql were very close, with a difference of less than 0.1%. 
> But when we started to run the repair on one node, the counter values in C* 
> became much less than the values in mysql, and the difference became larger than 
> 1%.
> My question is: is it a known problem that counter values will become 
> under-counted if repair is running? Should we avoid running repair for 
> counter tables?
> Thanks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8777) Streaming operations should log both endpoint and port associated with the operation

2016-03-25 Thread Kaide Mu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212149#comment-15212149
 ] 

Kaide Mu commented on CASSANDRA-8777:
-

Patch available for review

> Streaming operations should log both endpoint and port associated with the 
> operation
> 
>
> Key: CASSANDRA-8777
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8777
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeremy Hanna
>  Labels: lhf
> Fix For: 2.1.x
>
> Attachments: 8777-2.2.txt
>
>
> Currently we log the endpoint for a streaming operation.  If the port has 
> been overridden, it would be valuable to know that that setting is getting 
> picked up.  Therefore, when logging the endpoint address, it would be nice to 
> also log the port it's trying to use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8777) Streaming operations should log both endpoint and port associated with the operation

2016-03-25 Thread Kaide Mu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaide Mu updated CASSANDRA-8777:

Status: Patch Available  (was: Open)

> Streaming operations should log both endpoint and port associated with the 
> operation
> 
>
> Key: CASSANDRA-8777
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8777
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeremy Hanna
>  Labels: lhf
> Fix For: 2.1.x
>
> Attachments: 8777-2.2.txt
>
>
> Currently we log the endpoint for a streaming operation.  If the port has 
> been overridden, it would be valuable to know that that setting is getting 
> picked up.  Therefore, when logging the endpoint address, it would be nice to 
> also log the port it's trying to use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8777) Streaming operations should log both endpoint and port associated with the operation

2016-03-25 Thread Kaide Mu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaide Mu updated CASSANDRA-8777:

Attachment: 8777-2.2.txt

> Streaming operations should log both endpoint and port associated with the 
> operation
> 
>
> Key: CASSANDRA-8777
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8777
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeremy Hanna
>  Labels: lhf
> Fix For: 2.1.x
>
> Attachments: 8777-2.2.txt
>
>
> Currently we log the endpoint for a streaming operation.  If the port has 
> been overridden, it would be valuable to know that that setting is getting 
> picked up.  Therefore, when logging the endpoint address, it would be nice to 
> also log the port it's trying to use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11437) Make number of cores used for copy tasks visible

2016-03-25 Thread Jim Witschey (JIRA)
Jim Witschey created CASSANDRA-11437:


 Summary: Make number of cores used for copy tasks visible
 Key: CASSANDRA-11437
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11437
 Project: Cassandra
  Issue Type: Bug
Reporter: Jim Witschey
Priority: Minor


As per this conversation with [~Stefania]:

https://github.com/riptano/cassandra-dtest/pull/869#issuecomment-200597829

we don't currently have a way to verify that the test environment variable 
{{CQLSH_COPY_TEST_NUM_CORES}} actually affects the behavior of {{COPY}} in the 
intended way. If this were added, we could make our tests of the one-core edge 
case a little stricter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11395) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test

2016-03-25 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch reassigned CASSANDRA-11395:
--

Assignee: Russ Hatch  (was: DS Test Eng)

> dtest failure in 
> upgrade_tests.cql_tests.TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD.cas_and_list_index_test
> ---
>
> Key: CASSANDRA-11395
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11395
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Philip Thompson
>Assignee: Russ Hatch
>  Labels: dtest
>
> {code}
> Expected [[0, ['foo', 'bar'], 'foobar']] from SELECT * FROM test, but got 
> [[0, [u'foi', u'bar'], u'foobar']]
> {code}
> example failure:
> http://cassci.datastax.com/job/upgrade_tests-all/24/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_2_1_UpTo_2_2_HEAD/cas_and_list_index_test
> Failed on CassCI build upgrade_tests-all #24
> Probably a consistency issue in the test code, but I haven't looked into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11364) dtest failure in deletion_test.TestDeletion.gc_test

2016-03-25 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212036#comment-15212036
 ] 

Russ Hatch commented on CASSANDRA-11364:


bulk run here: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/45/

> dtest failure in deletion_test.TestDeletion.gc_test
> ---
>
> Key: CASSANDRA-11364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11364
> Project: Cassandra
>  Issue Type: Test
>Reporter: Jim Witschey
>Assignee: Russ Hatch
>  Labels: dtest
>
> This is one of those "Unable to connect" flaps:
> http://cassci.datastax.com/job/cassandra-3.0_dtest/606/testReport/deletion_test/TestDeletion/gc_test
> It's the only failure that I've seen on this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11364) dtest failure in deletion_test.TestDeletion.gc_test

2016-03-25 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch reassigned CASSANDRA-11364:
--

Assignee: Russ Hatch  (was: DS Test Eng)

> dtest failure in deletion_test.TestDeletion.gc_test
> ---
>
> Key: CASSANDRA-11364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11364
> Project: Cassandra
>  Issue Type: Test
>Reporter: Jim Witschey
>Assignee: Russ Hatch
>  Labels: dtest
>
> This is one of those "Unable to connect" flaps:
> http://cassci.datastax.com/job/cassandra-3.0_dtest/606/testReport/deletion_test/TestDeletion/gc_test
> It's the only failure that I've seen on this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11225) dtest failure in consistency_test.TestAccuracy.test_simple_strategy_counters

2016-03-25 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212032#comment-15212032
 ] 

Russ Hatch commented on CASSANDRA-11225:


[~stefania_alborghetti] Any idea what could be going on with this test?

> dtest failure in consistency_test.TestAccuracy.test_simple_strategy_counters
> 
>
> Key: CASSANDRA-11225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11225
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/209/testReport/consistency_test/TestAccuracy/test_simple_strategy_counters
> Failed on CassCI build cassandra-2.1_novnode_dtest #209
> error: "AssertionError: Failed to read value from sufficient number of nodes, 
> required 2 but got 1 - [574, 2]"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11436) dtest failure in repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test

2016-03-25 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212016#comment-15212016
 ] 

Joel Knighton commented on CASSANDRA-11436:
---

To clarify, these are failing only on the novnode jobs for 2.1 and 2.2, it 
looks like.

I've been looking at sstablerepairedset lately, so feel free to assign to me if 
you can't find a test-based reason for this to fail only without vnodes.

> dtest failure in 
> repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test
> 
>
> Key: CASSANDRA-11436
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11436
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
> Fix For: 2.1.x, 2.2.x
>
>
> sstable_repairedset_test is failing on 2.1 and 2.2, but not on trunk.
> In the final assertion, after running sstablemetadata on both nodes, we see 
> unrepaired sstables, when we expect all sstables to be repaired.
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/220/testReport/repair_tests.incremental_repair_test/TestIncRepair/sstable_repairedset_test
> Failed on CassCI build cassandra-2.1_novnode_dtest #220



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11436) dtest failure in repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test

2016-03-25 Thread Jim Witschey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Witschey reassigned CASSANDRA-11436:


Assignee: Jim Witschey  (was: DS Test Eng)

> dtest failure in 
> repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test
> 
>
> Key: CASSANDRA-11436
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11436
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
> Fix For: 2.1.x, 2.2.x
>
>
> sstable_repairedset_test is failing on 2.1 and 2.2, but not on trunk.
> In the final assertion, after running sstablemetadata on both nodes, we see 
> unrepaired sstables, when we expect all sstables to be repaired.
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/220/testReport/repair_tests.incremental_repair_test/TestIncRepair/sstable_repairedset_test
> Failed on CassCI build cassandra-2.1_novnode_dtest #220



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11417) dtest failure in replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc

2016-03-25 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211980#comment-15211980
 ] 

Jim Witschey commented on CASSANDRA-11417:
--

Trying to reproduce on dtest master here: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/43/

Trying to verify fix from PR here: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/44/

> dtest failure in 
> replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc
> --
>
> Key: CASSANDRA-11417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11417
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
>
> Error is 
> {code}
> Unknown table 'rf_test' in keyspace 'testing'
> {code}
> Just seems like a schema disagreement problem. Presumably we just need to 
> have the driver block until schema agreement.
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/90/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_expand_gossiping_property_file_snitch_multi_dc
> Failed on CassCI build trunk_offheap_dtest #90



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11435) PrepareCallback#response should not have DEBUG output

2016-03-25 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan reassigned CASSANDRA-11435:
---

Assignee: Jeremiah Jordan

> PrepareCallback#response should not have DEBUG output
> -
>
> Key: CASSANDRA-11435
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11435
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Jeremiah Jordan
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: CASSANDRA-11435-2.2.txt
>
>
> With the new debug logging from 
> https://issues.apache.org/jira/browse/CASSANDRA-10241?focusedCommentId=14934310=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14934310
>  I think the following should probably be at TRACE not DEBUG.
> https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/paxos/PrepareCallback.java#L61-L61



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11435) PrepareCallback#response should not have DEBUG output

2016-03-25 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11435:

Status: Patch Available  (was: Open)

> PrepareCallback#response should not have DEBUG output
> -
>
> Key: CASSANDRA-11435
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11435
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Jeremiah Jordan
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: CASSANDRA-11435-2.2.txt
>
>
> With the new debug logging from 
> https://issues.apache.org/jira/browse/CASSANDRA-10241?focusedCommentId=14934310=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14934310
>  I think the following should probably be at TRACE not DEBUG.
> https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/paxos/PrepareCallback.java#L61-L61



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11435) PrepareCallback#response should not have DEBUG output

2016-03-25 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11435:

Attachment: CASSANDRA-11435-2.2.txt

> PrepareCallback#response should not have DEBUG output
> -
>
> Key: CASSANDRA-11435
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11435
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: CASSANDRA-11435-2.2.txt
>
>
> With the new debug logging from 
> https://issues.apache.org/jira/browse/CASSANDRA-10241?focusedCommentId=14934310=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14934310
>  I think the following should probably be at TRACE not DEBUG.
> https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/paxos/PrepareCallback.java#L61-L61



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11368) Lists inserts are not truly idempotent

2016-03-25 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11368:

Since Version: 1.2.0

> Lists inserts are not truly idempotent
> --
>
> Key: CASSANDRA-11368
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11368
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thanh
>
> A list of UDTs can't be updated properly when using USING TIMESTAMP
> Observe:
> {code}
> cqlsh:t360> CREATE TYPE fullname ( 
> ... fname text, 
> ... lname text 
> ... );
> cqlsh:t360> CREATE TABLE users ( 
> ... id text PRIMARY KEY, 
> ... names list<frozen<fullname>>, 
> ... phone text 
> ... ); 
> cqlsh:t360> UPDATE users USING TIMESTAMP 1458019725701 SET names = [{ fname: 
> 'fname1', lname: 'lname1'},{ fname: 'fname2', lname: 'lname2'},{ fname: 
> 'fname3', lname: 'lname3'}] WHERE id='a'; 
> cqlsh:t360> select * from users;
> id | names | phone 
> +--+---
>  
> a | [{lname: 'lname1', fname: 'fname1'}, {lname: 'lname2', fname: 'fname2'}, 
> {lname: 'lname3', fname: 'fname3'}] | null
> (1 rows) 
> cqlsh:t360> UPDATE users USING TIMESTAMP 1458019725701 SET names = [{ fname: 
> 'fname1', lname: 'lname1'},{ fname: 'fname2', lname: 'lname2'},{ fname: 
> 'fname3', lname: 'lname3'}] WHERE id='a'; 
> cqlsh:t360> select * from users;
> id | names | phone 
> +--+---
>  
> a | [{lname: 'lname1', fname: 'fname1'}, {lname: 'lname2', fname: 'fname2'}, 
> {lname: 'lname3', fname: 'fname3'}, {lname: 'lname1', fname: 'fname1'}, 
> {lname: 'lname2', fname: 'fname2'}, {lname: 'lname3', fname: 'fname3'}] | null
> (1 rows)
> {code}
> => the list doesn't get replaced, it gets appended, which is not the 
> expected/desired result



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11394) dtest failure in upgrade_tests.upgrade_through_versions_test.ProtoV1Upgrade_2_1_UpTo_2_2_HEAD.bootstrap_multidc_test

2016-03-25 Thread Jim Witschey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Witschey reassigned CASSANDRA-11394:


Assignee: Jim Witschey  (was: DS Test Eng)

> dtest failure in 
> upgrade_tests.upgrade_through_versions_test.ProtoV1Upgrade_2_1_UpTo_2_2_HEAD.bootstrap_multidc_test
> 
>
> Key: CASSANDRA-11394
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11394
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
> Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, 
> node3.log, node3_debug.log, node4.log, node4_debug.log, node5.log, 
> node5_debug.log
>
>
> The other nodes fail to notice node5 come up. If we check the logs, they're 
> complaining about long interval times for that node. Might just be an issue 
> with how many nodes we're starting? I don't actually see any errors.
> Logs for this failure attached. This is flaky, and has happened on a few 
> bootstrap_multidc_tests
> example failure:
> http://cassci.datastax.com/job/upgrade_tests-all/24/testReport/upgrade_tests.upgrade_through_versions_test/ProtoV1Upgrade_2_1_UpTo_2_2_HEAD/bootstrap_multidc_test
> Failed on CassCI build upgrade_tests-all #24



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11417) dtest failure in replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc

2016-03-25 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211957#comment-15211957
 ] 

Jim Witschey commented on CASSANDRA-11417:
--

[filed a dtest PR|https://github.com/riptano/cassandra-dtest/pull/891].

> dtest failure in 
> replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc
> --
>
> Key: CASSANDRA-11417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11417
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: DS Test Eng
>  Labels: dtest
>
> Error is 
> {code}
> Unknown table 'rf_test' in keyspace 'testing'
> {code}
> Just seems like a schema disagreement problem. Presumably we just need to 
> have the driver block until schema agreement.
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/90/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_expand_gossiping_property_file_snitch_multi_dc
> Failed on CassCI build trunk_offheap_dtest #90



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11417) dtest failure in replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc

2016-03-25 Thread Jim Witschey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Witschey reassigned CASSANDRA-11417:


Assignee: Jim Witschey  (was: DS Test Eng)

> dtest failure in 
> replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc
> --
>
> Key: CASSANDRA-11417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11417
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
>
> Error is 
> {code}
> Unknown table 'rf_test' in keyspace 'testing'
> {code}
> Just seems like a schema disagreement problem. Presumably we just need to 
> have the driver block until schema agreement.
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/90/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_expand_gossiping_property_file_snitch_multi_dc
> Failed on CassCI build trunk_offheap_dtest #90



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11436) dtest failure in repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test

2016-03-25 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11436:
---

 Summary: dtest failure in 
repair_tests.incremental_repair_test.TestIncRepair.sstable_repairedset_test
 Key: CASSANDRA-11436
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11436
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng
 Fix For: 2.1.x, 2.2.x


sstable_repairedset_test is failing on 2.1 and 2.2, but not on trunk.

In the final assertion, after running sstablemetadata on both nodes, we see 
unrepaired sstables, when we expect all sstables to be repaired.

example failure:

http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/220/testReport/repair_tests.incremental_repair_test/TestIncRepair/sstable_repairedset_test

Failed on CassCI build cassandra-2.1_novnode_dtest #220



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11428) Eliminate Allocations

2016-03-25 Thread Nitsan Wakart (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitsan Wakart updated CASSANDRA-11428:
--
Attachment: pom.xml
benchmarks.tar.gz

Some before/after benchmarks for yer pleasure.
Patched versions of classes are hackishly bundled in and named *Patch.

> Eliminate Allocations
> -
>
> Key: CASSANDRA-11428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11428
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: Nitsan Wakart
>Priority: Minor
> Fix For: 3.0.x
>
> Attachments: benchmarks.tar.gz, pom.xml
>
>
> Linking relevant issues under this master ticket.  For small changes I'd like 
> to test and commit these in bulk 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11435) PrepareCallback#response should not have DEBUG output

2016-03-25 Thread Jeremiah Jordan (JIRA)
Jeremiah Jordan created CASSANDRA-11435:
---

 Summary: PrepareCallback#response should not have DEBUG output
 Key: CASSANDRA-11435
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11435
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jeremiah Jordan
 Fix For: 2.2.x, 3.0.x, 3.x


With the new debug logging from 
https://issues.apache.org/jira/browse/CASSANDRA-10241?focusedCommentId=14934310=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14934310
 I think the following should probably be at TRACE not DEBUG.

https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/paxos/PrepareCallback.java#L61-L61



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11279) dtest failure in disk_balance_test.TestDiskBalance.disk_balance_bootstrap_test

2016-03-25 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211855#comment-15211855
 ] 

Philip Thompson commented on CASSANDRA-11279:
-

[~krummas], even bumping it up to 1 million keys does not solve the problem. 
None of the other disk balance tests flake at that level though:

http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/41/

> dtest failure in disk_balance_test.TestDiskBalance.disk_balance_bootstrap_test
> --
>
> Key: CASSANDRA-11279
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11279
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1011/testReport/disk_balance_test/TestDiskBalance/disk_balance_bootstrap_test
> Failed on CassCI build trunk_dtest #1011
> This looks likely to be a test issue, perhaps we need to relax the assertion 
> here a bit:
> {noformat}
> values not within 20.00% of the max: (474650, 382235, 513385) (node1)
> {noformat}
> This is flaky with several flaps in the last few weeks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax

2016-03-25 Thread Olivier Michallat (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211854#comment-15211854
 ] 

Olivier Michallat commented on CASSANDRA-11067:
---

It's not possible to bind LIKE's argument in a prepared statement (the grammar 
requires a string literal). Is this an oversight or do we have any reason not 
to allow it?

> Improve SASI syntax
> ---
>
> Key: CASSANDRA-11067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11067
> Project: Cassandra
>  Issue Type: Task
>  Components: CQL
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
>  Labels: client-impacting
> Fix For: 3.4
>
>
> I think everyone agrees that a LIKE operator would be ideal, but that's 
> probably not in scope for an initial 3.4 release.
> Still, I'm uncomfortable with the initial approach of overloading = to mean 
> "satisfies index expression."  The problem is that it will be very difficult 
> to back out of this behavior once people are using it.
> I propose adding a new operator in the interim instead.  Call it MATCHES, 
> maybe.  With the exact same behavior that SASI currently exposes, just with a 
> separate operator rather than being rolled into =.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8888) Disable internode compression by default

2016-03-25 Thread Kaide Mu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaide Mu updated CASSANDRA-:

Status: Patch Available  (was: Open)

Changed internode compression from all to dc

> Disable internode compression by default
> 
>
> Key: CASSANDRA-
> URL: https://issues.apache.org/jira/browse/CASSANDRA-
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: Matt Stump
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: -3.0.txt
>
>
> Internode compression increases GC load, and can cause high CPU utilization 
> for high throughput use cases. Very rarely are customers restricted by 
> intra-DC or cross-DC network bandwidth. I'd rather we optimize for the 75% 
> of cases where internode compression isn't needed and then selectively enable 
> it for customers where it would provide a benefit. Currently I'm advising all 
> field consultants to disable compression by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10134) Always require replace_address to replace existing address

2016-03-25 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211797#comment-15211797
 ] 

Sam Tunnicliffe commented on CASSANDRA-10134:
-

bq. If we expose "in a shadow round" in some form 
I think this is pretty simple to do. Instead of always dropping syns 
immediately whenever gossip is disabled, a node currently in a shadow round 
could respond with a minimal ack. The node receiving the syn can infer that the 
sender is in a shadow round itself by inspecting the syn, as a "real" syn will 
never have an empty digest list. So, the syn-receiving node can preserve 
current behaviour when the sender is not in a shadow round, but respond with 
the minimal ack when it is. When in a shadow round, a node can keep track of 
which seeds have replied to its syns with such a minimal ack, then the decision 
about exiting the round becomes whether any "genuine" ack was received (only 
one is required, as current behaviour) or whether a "shadow" ack was received 
from every seed. 

A brief experiment with this approach seems to suggest it's viable, Tyler's 
dtest passes and startup time for fresh clusters is minimally impacted. This 
doesn't add a great deal of complexity, so unless I overlooked something it 
seems like a reasonable idea.
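
A minimal sketch of that exit condition, just to make the decision concrete; the 
class and field names here are illustrative, not the actual patch.

{code}
import java.util.HashSet;
import java.util.Set;

public class ShadowRoundState
{
    private final Set<String> seeds;                         // configured seed addresses
    private final Set<String> shadowAcks = new HashSet<>();  // seeds that replied with a minimal ack
    private boolean genuineAckReceived = false;

    public ShadowRoundState(Set<String> seeds)
    {
        this.seeds = seeds;
    }

    public void onAck(String from, boolean isMinimalShadowAck)
    {
        if (isMinimalShadowAck)
            shadowAcks.add(from);        // the sender is itself still in a shadow round
        else
            genuineAckReceived = true;   // a real ack carries gossip state
    }

    /** Exit once any genuine ack arrives, or every seed has answered with a shadow ack. */
    public boolean canExitShadowRound()
    {
        return genuineAckReceived || shadowAcks.containsAll(seeds);
    }
}
{code}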


> Always require replace_address to replace existing address
> --
>
> Key: CASSANDRA-10134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10134
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Tyler Hobbs
>Assignee: Sam Tunnicliffe
> Fix For: 3.x
>
>
> Normally, when a node is started from a clean state with the same address as 
> an existing down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception 
> encountered during startup
> java.lang.RuntimeException: A node with address /127.0.0.3 already exists, 
> cancelling join. Use cassandra.replace_address if you want to replace this 
> node.
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) 
> [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed 
> list, it will not throw this error and will start normally.  The new node 
> then takes over the host ID of the old node (even if the tokens are 
> different), and the only message you will see is a warning in the other 
> nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, 
> hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information 
> for a down node without replacing it.  To fix this, we should check for an 
> endpoint collision even if {{auto_bootstrap}} is false or the node is a seed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8888) Disable internode compression by default

2016-03-25 Thread Kaide Mu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaide Mu updated CASSANDRA-:

Attachment: -3.0.txt

> Disable internode compression by default
> 
>
> Key: CASSANDRA-
> URL: https://issues.apache.org/jira/browse/CASSANDRA-
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: Matt Stump
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: -3.0.txt
>
>
> Internode compression increases GC load, and can cause high CPU utilization 
> for high throughput use cases. Very rarely are customers restricted by 
> intra-DC or cross-DC network bandwidth. I'd rather we optimize for the 75% 
> of cases where internode compression isn't needed and then selectively enable 
> it for customers where it would provide a benefit. Currently I'm advising all 
> field consultants to disable compression by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11421) Eliminate allocations of byte array for UTF8 String serializations

2016-03-25 Thread Nitsan Wakart (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211651#comment-15211651
 ] 

Nitsan Wakart commented on CASSANDRA-11421:
---

See the last commit on the branch, which brings over the missing Netty util method.

> Eliminate allocations of byte array for UTF8 String serializations
> --
>
> Key: CASSANDRA-11421
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11421
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Nitsan Wakart
>Assignee: Nitsan Wakart
>
> When profiling a read workload (YCSB workload c) on Cassandra 3.2.1, I noticed 
> that a large part of the allocation profile was generated from String.getBytes() 
> calls in CBUtil::writeString.
> I have fixed up the code to use a thread-local cached ByteBuffer and 
> CharsetEncoder to eliminate the allocations. This results in an improved 
> allocation profile, and a mild improvement in performance.
> The fix is available here:
> https://github.com/nitsanw/cassandra/tree/fix-write-string-allocation
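
A minimal sketch of the thread-local encoder idea described above, using only JDK 
APIs; this is not the actual branch, and CBUtil.writeString's real signature differs.

{code}
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public final class Utf8Scratch
{
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);
    private static final ThreadLocal<ByteBuffer> SCRATCH =
        ThreadLocal.withInitial(() -> ByteBuffer.allocate(4096));

    /** Encode without the byte[] that String.getBytes() would allocate per call. */
    public static ByteBuffer encode(String s)
    {
        CharsetEncoder encoder = ENCODER.get().reset();
        ByteBuffer out = SCRATCH.get();
        out.clear();
        CoderResult result = encoder.encode(CharBuffer.wrap(s), out, true);
        if (result.isOverflow())                    // fall back for strings larger than the scratch buffer
            return StandardCharsets.UTF_8.encode(s);
        encoder.flush(out);
        out.flip();
        return out;                                 // only valid until this thread encodes again
    }
}
{code}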



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11434) Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting SA metadata per term

2016-03-25 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211624#comment-15211624
 ] 

Pavel Yaskevich commented on CASSANDRA-11434:
-

/cc [~jbellis] [~beobal]

> Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting 
> SA metadata per term
> --
>
> Key: CASSANDRA-11434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11434
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Pavel Yaskevich
>Assignee: Jordan West
> Fix For: 3.6
>
>
> We can support EQ/PREFIX requests to CONTAINS indexes by tracking 
> "partiality" of the data stored in the OnDiskIndex and IndexMemtable. If we 
> know exactly whether the current match represents part of the term or its 
> original form, it would be trivial to support EQ/PREFIX, since PREFIX is a 
> subset of SUFFIX matches.
> Since we attach a uint16 size to each term stored, we can take advantage of the 
> sign bit, so the size of the index is not impacted at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11434) Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting SA metadata per term

2016-03-25 Thread Pavel Yaskevich (JIRA)
Pavel Yaskevich created CASSANDRA-11434:
---

 Summary: Support EQ/PREFIX queries in CONTAINS mode without 
tokenization by augmenting SA metadata per term
 Key: CASSANDRA-11434
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11434
 Project: Cassandra
  Issue Type: Improvement
  Components: sasi
Reporter: Pavel Yaskevich
Assignee: Jordan West
 Fix For: 3.6


We can support EQ/PREFIX requests to CONTAINS indexes by tracking "partiality" 
of the data stored in the OnDiskIndex and IndexMemtable. If we know exactly 
whether the current match represents part of the term or its original form, it 
would be trivial to support EQ/PREFIX, since PREFIX is a subset of SUFFIX matches.

Since we attach a uint16 size to each term stored, we can take advantage of the 
sign bit, so the size of the index is not impacted at all.
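
To make the sign-bit trick concrete, a small sketch with illustrative names (not 
SASI's actual on-disk format):

{code}
public final class TermSize
{
    private static final int PARTIAL_BIT = 0x8000;   // the sign bit of the stored uint16 size

    /** Pack a 15-bit term size together with an "is partial" flag. */
    public static short pack(int size, boolean isPartial)
    {
        assert size >= 0 && size < PARTIAL_BIT : "size must fit in 15 bits";
        return (short) (isPartial ? (size | PARTIAL_BIT) : size);
    }

    public static int size(short packed)
    {
        return packed & 0x7FFF;
    }

    public static boolean isPartial(short packed)
    {
        return (packed & PARTIAL_BIT) != 0;
    }
}
{code}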



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11433) Allow to use different algorithms to choose compaction candidates.

2016-03-25 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-11433.
-
Resolution: Duplicate

> Allow to use different algorithms to choose compaction candidates.
> --
>
> Key: CASSANDRA-11433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11433
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>  Labels: performance
>
> I'm experimenting with different algorithms for choosing the compaction 
> candidates for leveled compaction. 
> Instead of the default round-robin selection, one of our experiments is to 
> choose the candidates with the least overlapping sstable size. And I see a 
> significant drop in cpu usage, and a reduction in the compacted bytes count, in 
> one of our write-heavy systems.
> I think it's useful to make it configurable how the compaction candidates are 
> chosen.
> The idea is from rocksdb (http://rocksdb.org/blog/2921/compaction_pri/).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11433) Allow to use different algorithms to choose compaction candidates.

2016-03-25 Thread Dikang Gu (JIRA)
Dikang Gu created CASSANDRA-11433:
-

 Summary: Allow to use different algorithms to choose compaction 
candidates.
 Key: CASSANDRA-11433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11433
 Project: Cassandra
  Issue Type: Improvement
  Components: Compaction
Reporter: Dikang Gu
Assignee: Dikang Gu


I'm experimenting with different algorithms for choosing the compaction 
candidates for leveled compaction. 

Instead of the default round-robin selection, one of our experiments is to 
choose the candidates with the least overlapping sstable size. And I see a 
significant drop in cpu usage, and a reduction in the compacted bytes count, in 
one of our write-heavy systems.

I think it's useful to make it configurable how the compaction candidates are 
chosen.

The idea is from rocksdb (http://rocksdb.org/blog/2921/compaction_pri/).
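
A rough sketch of the kind of selection described above: pick the level-N candidate 
whose overlapping sstables in the next level are smallest in total size. The types 
and names are illustrative, not Cassandra's actual compaction code.

{code}
import java.util.Comparator;
import java.util.List;

public class LeastOverlapPicker
{
    /** Minimal stand-in for an sstable: a token range plus an on-disk size. */
    public static class Table
    {
        final long minToken, maxToken, sizeBytes;

        public Table(long minToken, long maxToken, long sizeBytes)
        {
            this.minToken = minToken;
            this.maxToken = maxToken;
            this.sizeBytes = sizeBytes;
        }

        boolean overlaps(Table other)
        {
            return minToken <= other.maxToken && other.minToken <= maxToken;
        }
    }

    /** Total size of the next-level sstables a candidate overlaps with. */
    static long overlapSize(Table candidate, List<Table> nextLevel)
    {
        return nextLevel.stream()
                        .filter(candidate::overlaps)
                        .mapToLong(t -> t.sizeBytes)
                        .sum();
    }

    /** Pick the level-N sstable with the least overlapping size in level N+1. */
    public static Table pick(List<Table> levelN, List<Table> nextLevel)
    {
        return levelN.stream()
                     .min(Comparator.comparingLong((Table c) -> overlapSize(c, nextLevel)))
                     .orElse(null);
    }
}
{code}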




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)