[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292882#comment-15292882
 ] 

Alex Petrov commented on CASSANDRA-10786:
-

Looks like I misunderstood the problem. I was thinking more about cluster 
upgrades, but this would certainly help with schema upgrades.

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.
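
A minimal, self-contained sketch of the proposed scheme (illustrative only, not 
the actual Cassandra or driver code): the statement id is derived from the query 
string *and* the result set metadata, so the schema change in step 2 changes the 
id and any client still holding the old id gets UNPREPARED. All names below are 
made up.

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class PreparedIdSketch
{
    // Derive the prepared statement id from the query string *and* the result
    // set metadata, so a schema change that alters the result columns also
    // changes the id.
    static byte[] preparedId(String query, List<String> resultColumns) throws Exception
    {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(query.getBytes(StandardCharsets.UTF_8));
        for (String column : resultColumns)
            md5.update(column.getBytes(StandardCharsets.UTF_8));
        return md5.digest();
    }

    public static void main(String[] args) throws Exception
    {
        String query = "SELECT * FROM ks.t";
        byte[] before = preparedId(query, Arrays.asList("b", "c"));      // metadata (b, c)
        byte[] after  = preparedId(query, Arrays.asList("a", "b", "c")); // column a added
        // The ids differ, so clientB's EXECUTE with the old id gets UNPREPARED
        // and is forced to re-prepare and refresh its metadata.
        System.out.println(Arrays.equals(before, after)); // false
    }
}
{code}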



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-20 Thread vin01 (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vin01 updated CASSANDRA-11845:
--
Attachment: cassandra-2.2.4.error.log

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
> Attachments: cassandra-2.2.4.error.log
>
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, I was able 
> to avoid the SocketTimeout errors I was getting earlier 
> (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is that the repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And it's 10:46:25 now, almost 5 hours since it got stuck right there.
> Earlier I could see the repair session going on in system.log, but no 
> logs are coming in right now; all I get is the regular index summary 
> redistribution logs.
> Last repair logs I saw :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> It's an incremental repair, and in the "nodetool netstats" output I can see 
> entries like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
>  161895/161895 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
>  399865/399865 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
>  149066/149066 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
>  126000/126000 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE

[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-20 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292934#comment-15292934
 ] 

vin01 commented on CASSANDRA-11845:
---

Thanks Paulo, I restarted all 3 nodes, started the repair again, and got the 
errors which I have attached (cassandra-2.2.4.error.log).

Nodetool output for repair session :-

[2016-05-20 02:37:59,168] Repair session cffbadd3-1e55-11e6-bd05-b717b380ffdd 
for range (-8184117312116560831,-8171918810495776305] failed with error 
Endpoint /Node-3 died (progress: 100%)

.. still running 

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
> Attachments: cassandra-2.2.4.error.log
>
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, I was able 
> to avoid the SocketTimeout errors I was getting earlier 
> (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is that the repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And it's 10:46:25 now, almost 5 hours since it got stuck right there.
> Earlier I could see the repair session going on in system.log, but no 
> logs are coming in right now; all I get is the regular index summary 
> redistribution logs.
> Last repair logs I saw :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> It's an incremental repair, and in the "nodetool netstats" output I can see 
> entries like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
>  161895/161895 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
>  399865/399865 bytes(100%) s

[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

2016-05-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292991#comment-15292991
 ] 

Sylvain Lebresne commented on CASSANDRA-9669:
-

[~blambov] or [~benedict], would one of you have some time to look at [~beobal]'s 
patch above?

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> ---
>
> Key: CASSANDRA-9669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benedict
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: correctness
> Fix For: 2.2.7, 3.7, 3.0.7
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.
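
A toy illustration of the restart logic described above (made-up positions, not 
Cassandra code): taking only the maximum flushed replay position silently skips 
the records of the flush that never completed, whereas tracking flushed 
intervals still replays the hole.

{code}
import java.util.Arrays;
import java.util.List;

public class ReplayPositionSketch
{
    public static void main(String[] args)
    {
        // Two flushes were triggered; only the second (covering commit log
        // positions 100..200) made it to disk before the crash.
        List<long[]> flushedIntervals = Arrays.asList(new long[]{ 100, 200 });

        // Old restart logic: replay only after the maximum flushed position.
        long maxFlushed = flushedIntervals.stream().mapToLong(i -> i[1]).max().orElse(0);
        System.out.println("replay from " + maxFlushed); // 200 -> records 0..100 are lost

        // Interval-based logic: start replay at the first position not covered
        // by a flushed interval, so the hole left by the unfinished flush is replayed.
        long replayStart = 0;
        for (long[] interval : flushedIntervals)
            if (interval[0] <= replayStart)
                replayStart = Math.max(replayStart, interval[1]);
        System.out.println("replay from " + replayStart); // 0 -> unflushed data is recovered
    }
}
{code}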



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11824) If repair fails no way to run repair again

2016-05-20 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-11824:

Fix Version/s: 3.x
   2.2.x
   2.1.x

> If repair fails no way to run repair again
> --
>
> Key: CASSANDRA-11824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11824
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
>  Labels: fallout
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> I have a test that disables gossip and runs repair at the same time. 
> {quote}
> WARN  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> StorageService.java:384 - Stopping gossip by operator request
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> Gossiper.java:1463 - Announcing shutdown
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,776 
> StorageService.java:1999 - Node /172.31.31.1 state jump to shutdown
> INFO  [HANDSHAKE-/172.31.17.32] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32
> INFO  [HANDSHAKE-/172.31.24.76] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.24.76
> INFO  [Thread-25] 2016-05-17 16:57:21,925 RepairRunnable.java:125 - Starting 
> repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-26] 2016-05-17 16:57:21,953 RepairRunnable.java:125 - Starting 
> repair command #2, repairing keyspace stresscql with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-27] 2016-05-17 16:57:21,967 RepairRunnable.java:125 - Starting 
> repair command #3, repairing keyspace system_traces with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2)
> {quote}
> This ends up failing:
> {quote}
> 16:54:44.844 INFO  serverGroup-node-1-574 - STDOUT: [2016-05-17 16:57:21,933] 
> Starting repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> [2016-05-17 16:57:21,943] Did not get positive replies from all endpoints. 
> List of failed endpoint(s): [172.31.24.76, 172.31.17.32]
> [2016-05-17 16:57:21,945] null
> {quote}
> Subsequent calls to repair with all nodes up still fail:
> {quote}
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 
> CompactionManager.java:1193 - Cannot start multiple repair sessions over the 
> same sstables
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 Validator.java:261 - 
> Failed creating a merkle tree for [repair 
> #66425f10-1c61-11e6-83b2-0b1fff7a067d on keyspace1/standard1, 
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-20 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291211#comment-15291211
 ] 

Stefan Podkowinski edited comment on CASSANDRA-11349 at 5/20/16 8:26 AM:
-

I've been debugging the latest mentioned error case using the following cql/ccm 
statements and a local 2-node cluster.

{code}
create keyspace ks WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 2};
use ks;
CREATE TABLE IF NOT EXISTS table1 ( c1 text, c2 text, c3 text, c4 float,
 PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 
'false'};
DELETE FROM table1 USING TIMESTAMP 1463656272791 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'c';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272792 WHERE c1 = 'a' AND c2 = 'b';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272793 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'd';
ccm node1 flush
{code}

Timestamps have been added for easier tracking of the specific tombstone in the 
debugger.

ColumnIndex.Builder.buildForCompaction() will add tombstones to the tracker in 
the following order:

*Node1*

{{1463656272792: c1 = 'a' AND c2 = 'b'}}
First RT, added to unwritten + opened tombstones

{{1463656272791: c1 = 'a' AND c2 = 'b' AND c3 = 'c'}}
Overshadowed by the previously added RT while also being older. Will not be 
added and is simply ignored.

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}}
Overshadowed by the first and only RT added to opened so far, but newer, and 
will thus be added to unwritten + opened.

We end up with 2 unwritten tombstones (..92+..93) passed to the serializer for 
message digest.


*Node2*

{{1463656272792: c1 = 'a' AND c2 = 'b'}} (EOC.START)
First RT, added to unwritten + opened tombstones

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} (EOC.END)
Comparison of the EOC flag (Tracker:251) of the previously added RT will cause 
it to be removed from the opened list (Tracker:258). Afterwards the current RT 
will be added to unwritten + opened.

{{1463656272792: c1 = 'a' AND c2 = 'b'}} ({color:red}again!{color})
Gets compared with prev. added RT, which supersedes the current one and thus 
stays in the list. Will again be added to unwritten + opened list.

We end up with 3 unwritten RTs, including 1463656272792 twice.

-I still haven't been able to exactly pinpoint why the reducer will be called 
twice with the same TS, but since [~blambov] explicitly mentioned that 
possibility, I guess it's intended behavior (but why? :)).-

Running sstable2json makes it more obvious how node2 flushes the RTs:

{noformat}
[
{"key": "a",
 "cells": [["b:_","b:d:_",1463656272792,"t",1463731877],
   ["b:d:_","b:d:!",1463656272793,"t",1463731886],
   ["b:d:!","b:!",1463656272792,"t",1463731877]]}
]
{noformat}


was (Author: spo...@gmail.com):
I've been debugging the latest mentioned error case using the following cql/ccm 
statements and a local 2-node cluster.

{code}
create keyspace ks WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 2};
use ks;
CREATE TABLE IF NOT EXISTS table1 ( c1 text, c2 text, c3 text, c4 float,
 PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 
'false'};
DELETE FROM table1 USING TIMESTAMP 1463656272791 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'c';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272792 WHERE c1 = 'a' AND c2 = 'b';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272793 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'd';
ccm node1 flush
{code}

Timestamps have been added for easier tracking of the specific tombstone in the 
debugger.

ColumnIndex.Builder.buildForCompaction() will add tombstones to the tracker in 
the following order:

*Node1*

{{1463656272792: c1 = 'a' AND c2 = 'b'}}
First RT, added to unwritten + opened tombstones

{{1463656272791: c1 = 'a' AND c2 = 'b' AND c3 = 'c'}}
Overshadowed by the previously added RT while also being older. Will not be 
added and is simply ignored.

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}}
Overshadowed by the first and only RT added to opened so far, but newer, and 
will thus be added to unwritten + opened.

We end up with 2 unwritten tombstones (..92+..93) passed to the serializer for 
message digest.


*Node2*

{{1463656272792: c1 = 'a' AND c2 = 'b'}} (EOC.START)
First RT, added to unwritten + opened tombstones

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} (EOC.END)
Comparison of the EOC flag (Tracker:251) of the previously added RT will cause 
it to be removed from the opened list (Tracker:258). Afterwards the current RT 
will be added to unwritten + opened.

{{1463656272792: c1 = 'a' AND c2 = 'b'}} ({color:red}again!{color})
Gets compared with prev. added RT, which supersedes the current one and thus 
stays in the list. Will again be added to unwritten + opened list.

We end up with 3 un

[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-20 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292934#comment-15292934
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/20/16 8:27 AM:


Thanks Paulo, I restarted all 3 nodes, started the repair again, and got the 
errors which I have attached (cassandra-2.2.4.error.log).

Nodetool output for repair session :-

[2016-05-20 02:37:59,168] Repair session cffbadd3-1e55-11e6-bd05-b717b380ffdd 
for range (-8184117312116560831,-8171918810495776305] failed with error 
Endpoint /Node-3 died (progress: 100%)

.. still running  (compaction is going on)


was (Author: vin01):
Thanks Paulo, I restarted all 3 nodes, started the repair again, and got the 
errors which I have attached (cassandra-2.2.4.error.log).

Nodetool output for repair session :-

[2016-05-20 02:37:59,168] Repair session cffbadd3-1e55-11e6-bd05-b717b380ffdd 
for range (-8184117312116560831,-8171918810495776305] failed with error 
Endpoint /Node-3 died (progress: 100%)

.. still running 

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
> Attachments: cassandra-2.2.4.error.log
>
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, I was able 
> to avoid the SocketTimeout errors I was getting earlier 
> (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is that the repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And it's 10:46:25 now, almost 5 hours since it got stuck right there.
> Earlier I could see the repair session going on in system.log, but no 
> logs are coming in right now; all I get is the regular index summary 
> redistribution logs.
> Last repair logs I saw :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> It's an incremental repair, and in the "nodetool netstats" output I can see 
> entries like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp

[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests

2016-05-20 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293001#comment-15293001
 ] 

Stefania commented on CASSANDRA-11521:
--

Thank you for your input [~snazy] and [~benedict].

bq. However these streams can be arbitrarily large, so certainly we don't want 
to evaluate the entire query to permit releasing the sstables.

Noted, I will keep only a small number of buffers in memory. I kind of came to 
this conclusion after reading Robert's comment; the time-bound lifespan is an 
excellent idea, thank you for suggesting it.

bq. Note, that the OpOrder should not be used by these queries - actual 
references should be taken so that long lifespans have no impact.

You mean actual references to the sstables via something like 
CFS.selectAndReference? I don't understand how we can read from off-heap 
memtables without using OpOrder, since their memory would be invalidated once 
they are flushed. Above I mentioned releasing sstables, but I actually meant 
both sstables and memtables; do you think it's a bad idea to block memtable 
flushing?

bq. The code that takes these references really needs to be fixed, also, so 
that the races to update the data tracker don't cause temporary "infinite" 
loops - like we see for range queries today.

Sorry, but I really cannot understand how PartitionRangeReadCommand.queryStorage 
may cause races when updating the data tracker view atomic reference (I presume 
this is what you meant).





> Implement streaming for bulk read requests
> --
>
> Key: CASSANDRA-11521
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer 
> and eliminating the need to query individual pages one by one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-20 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293014#comment-15293014
 ] 

Sam Tunnicliffe commented on CASSANDRA-11731:
-

I think there are a couple of issues regarding the coverage for CASSANDRA-11038. 

Firstly, {{NEW_NODE}} notifications are not always delivered when they should 
be; sometimes when a node is added, only the {{UP}} event is fired. I suspect 
this might be caused by some raciness in setting the {{hostId}} for the new 
node, which results in it not being null when checked in {{handleStateNormal}}. 
In fact, when replacing, there will always be an existing endpoint for the 
{{hostId}}, so at the minimum we'd need to check that it didn't match 
{{endpoint}}. In my patch for 11038 I used {{TokenMetadata::isMember}} for the 
same purpose and that seems to give consistent behaviour. 
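
A sketch of the kind of guard being discussed (not the actual 
{{handleStateNormal}} code); it only assumes the {{TokenMetadata}} methods 
mentioned above, and the wrapper class is hypothetical:

{code}
import java.net.InetAddress;
import java.util.UUID;
import org.apache.cassandra.locator.TokenMetadata;

public class NewNodeCheckSketch
{
    // Decide whether an endpoint reaching NORMAL should be announced as NEW_NODE.
    static boolean looksLikeNewNode(TokenMetadata tokenMetadata, InetAddress endpoint, UUID hostId)
    {
        InetAddress existing = tokenMetadata.getEndpointForHostId(hostId);
        if (existing != null && !existing.equals(endpoint))
            return true;                              // replacement: hostId already bound elsewhere
        return !tokenMetadata.isMember(endpoint);     // the check used in the 11038 patch
    }
}
{code}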

Second, my reading of CASSANDRA-8236 and the [native protocol 
spec|https://github.com/apache/cassandra/blob/cassandra-3.5/doc/native_protocol_v4.spec#L760-L765]
 suggests that {{NEW_NODE}} notifications *do* need to be delayed until the new 
node is in an RPC-ready state, as clients are permitted to interpret them as a 
signal that the node is up and ready for connections. This is somewhat 
redundant, as it means that the new node and first up notifications will always 
be delivered together, but we shouldn't arbitrarily change the behaviour that 
CASSANDRA-8236 introduced (aside from fixing the 11038 bug).

I'm planning to move 11038 to review today; I was just checking whether 
{{LatestEvent}} was still necessary before doing that, but what I have so far is 
[here|https://github.com/beobal/cassandra/tree/11038-3.0].


> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions

2016-05-20 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293016#comment-15293016
 ] 

Branimir Lambov commented on CASSANDRA-11828:
-

Patch uploaded here:
|[2.2|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-2.2-rebased]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-2.2-rebased-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-2.2-rebased-dtest/]|
|[3.0|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-3.0-rebased]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-3.0-rebased-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-3.0-rebased-dtest/]|
|[trunk|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-rebased]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-rebased-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-rebased-dtest/]|

Changes the commit log segment dirty and clean tracking to intervals: one dirty 
interval per cf that covers the span of writes to that cf, and a set of clean 
intervals (which would normally be a single contiguous one). The segment is 
only discarded if the clean set completely covers the dirty interval; if a 
failed flush left a hole the segment will remain.

Sstables are also changed to track covered replay intervals so that compaction 
that includes a table flushed after a failed one doesn't obscure the unflushed 
region from the commit log replayer.
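
A simplified model of the discard rule just described (not the patch itself; 
interval bounds and names are made up): the segment can only go away once the 
clean intervals fully cover the dirty interval, so a failed flush leaves a hole 
that keeps the segment alive.

{code}
import java.util.ArrayList;
import java.util.List;

public class SegmentTrackingSketch
{
    static class Interval { final long start, end; Interval(long s, long e) { start = s; end = e; } }

    // Dirty span of writes for one table within this segment.
    static final Interval dirty = new Interval(100, 500);
    // Intervals reported clean by successful flushes (normally contiguous).
    static final List<Interval> clean = new ArrayList<>();

    // The segment may be discarded only if the clean intervals cover the dirty one.
    static boolean canDiscard()
    {
        long covered = dirty.start;
        boolean progress = true;
        while (progress)
        {
            progress = false;
            for (Interval c : clean)
                if (c.start <= covered && c.end > covered) { covered = c.end; progress = true; }
        }
        return covered >= dirty.end;
    }

    public static void main(String[] args)
    {
        clean.add(new Interval(100, 300));   // first flush succeeded
        clean.add(new Interval(400, 500));   // a later flush succeeded; 300..400 failed
        System.out.println(canDiscard());    // false: the hole keeps the segment around
        clean.add(new Interval(300, 400));   // the failed range is eventually re-flushed
        System.out.println(canDiscard());    // true
    }
}
{code}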

> Commit log needs to track unflushed intervals rather than positions
> ---
>
> Key: CASSANDRA-11828
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11828
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In CASSANDRA-11448 in an effort to give a more thorough handling of flush 
> errors I have introduced a possible correctness bug with disk failure policy 
> ignore if a flush fails with an error:
> - we report the error but continue
> - we correctly do not update the commit log with the flush position
> - but we allow the post-flush executor to resume
> - a successful later flush can thus move the log's clear position beyond the 
> data from the failed flush
> - the log will then delete segment(s) that contain unflushed data.
> After CASSANDRA-9669 it is relatively easy to fix this problem by making the 
> commit log track sets of intervals of unflushed data (as described in 
> CASSANDRA-8496).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions

2016-05-20 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-11828:

Status: Patch Available  (was: In Progress)

> Commit log needs to track unflushed intervals rather than positions
> ---
>
> Key: CASSANDRA-11828
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11828
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In CASSANDRA-11448 in an effort to give a more thorough handling of flush 
> errors I have introduced a possible correctness bug with disk failure policy 
> ignore if a flush fails with an error:
> - we report the error but continue
> - we correctly do not update the commit log with the flush position
> - but we allow the post-flush executor to resume
> - a successful later flush can thus move the log's clear position beyond the 
> data from the failed flush
> - the log will then delete segment(s) that contain unflushed data.
> After CASSANDRA-9669 it is relatively easy to fix this problem by making the 
> commit log track sets of intervals of unflushed data (as described in 
> CASSANDRA-8496).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

2016-05-20 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293021#comment-15293021
 ] 

Sam Tunnicliffe commented on CASSANDRA-9669:


To be clear, my patch addresses the deadlock during flush, but it seems that 
the regressions in {{CassandraIndexTest}} started when the initial patch was 
committed (at least I don't see any before then and they happen pretty 
regularly since) - 
http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_utest/747/

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> ---
>
> Key: CASSANDRA-9669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benedict
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: correctness
> Fix For: 2.2.7, 3.7, 3.0.7
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions

2016-05-20 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293016#comment-15293016
 ] 

Branimir Lambov edited comment on CASSANDRA-11828 at 5/20/16 8:53 AM:
--

Patch uploaded here:
|[2.2|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-2.2]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-2.2-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-2.2-dtest/]|
|[3.0|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-3.0]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-3.0-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-3.0-dtest/]|
|[trunk|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-dtest/]|

Changes the commit log segment dirty and clean tracking to intervals: one dirty 
interval per cf that covers the span of writes to that cf, and a set of clean 
intervals (which would normally be a single contiguous one). The segment is 
only discarded if the clean set completely covers the dirty interval; if a 
failed flush left a hole the segment will remain.

Sstables are also changed to track covered replay intervals so that compaction 
that includes a table flushed after a failed one doesn't obscure the unflushed 
region from the commit log replayer.


was (Author: blambov):
Patch uploaded here:
|[2.2|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-2.2-rebased]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-2.2-rebased-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-2.2-rebased-dtest/]|
|[3.0|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-3.0-rebased]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-3.0-rebased-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-3.0-rebased-dtest/]|
|[trunk|https://github.com/blambov/cassandra/tree/11828-cl-ss-intervals-rebased]|[utest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-rebased-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11828-cl-ss-intervals-rebased-dtest/]|

Changes the commit log segment dirty and clean tracking to intervals: one dirty 
interval per cf that covers the span of writes to that cf, and a set of clean 
intervals (which would normally be a single contiguous one). The segment is 
only discarded if the clean set completely covers the dirty interval; if a 
failed flush left a hole the segment will remain.

Sstables are also changed to track covered replay intervals so that compaction 
that includes a table flushed after a failed one doesn't obscure the unflushed 
region from the commit log replayer.

> Commit log needs to track unflushed intervals rather than positions
> ---
>
> Key: CASSANDRA-11828
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11828
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In CASSANDRA-11448 in an effort to give a more thorough handling of flush 
> errors I have introduced a possible correctness bug with disk failure policy 
> ignore if a flush fails with an error:
> - we report the error but continue
> - we correctly do not update the commit log with the flush position
> - but we allow the post-flush executor to resume
> - a successful later flush can thus move the log's clear position beyond the 
> data from the failed flush
> - the log will then delete segment(s) that contain unflushed data.
> After CASSANDRA-9669 it is relatively easy to fix this problem by making the 
> commit log track sets of intervals of unflushed data (as described in 
> CASSANDRA-8496).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-20 Thread Fabien Rousseau (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293041#comment-15293041
 ] 

Fabien Rousseau commented on CASSANDRA-11349:
-

[~blambov] We have 4 clusters impacted by this bug, and for 3 out of 4, what 
you have in mind works.
I still need to verify for the 4th one. I'll try to verify this today.
Regarding 3.0, migrating 60 nodes is not something that can be done easily.

[~spo...@gmail.com] Yes, there are 3 RTs on node2 because, in memory, RTs are 
stored in a RangeTombstoneList (and then serialized). The RangeTombstoneList 
automatically splits tombstones that overlap.
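
A simplified 1-D model of that splitting (not the real RangeTombstoneList; the 
ranges and names are made up): the newer, narrower tombstone punches a hole in 
the older, wider one, so three ranges get serialized, with the older timestamp 
appearing twice, matching the sstable2json output quoted in the previous 
comment.

{code}
import java.util.ArrayList;
import java.util.List;

public class RangeTombstoneSplitSketch
{
    static class RT
    {
        final double start, end; final long ts;
        RT(double start, double end, long ts) { this.start = start; this.end = end; this.ts = ts; }
        public String toString() { return "[" + start + "," + end + ")@" + ts; }
    }

    // Insert a newer tombstone that lies strictly inside a wider, older one,
    // splitting the older one around it (the only case needed for this example).
    static List<RT> insert(RT wider, RT narrower)
    {
        List<RT> out = new ArrayList<>();
        out.add(new RT(wider.start, narrower.start, wider.ts));
        out.add(narrower);
        out.add(new RT(narrower.end, wider.end, wider.ts));
        return out;
    }

    public static void main(String[] args)
    {
        RT wideOld   = new RT(0.0, 1.0, 1463656272792L); // DELETE ... WHERE c1='a' AND c2='b'
        RT narrowNew = new RT(0.4, 0.6, 1463656272793L); // DELETE ... WHERE c1='a' AND c2='b' AND c3='d'
        insert(wideOld, narrowNew).forEach(System.out::println); // three ranges, ...792 twice
    }
}
{code}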

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that some inconsistencies were detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8496) Remove MemtablePostFlusher

2016-05-20 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293060#comment-15293060
 ] 

Branimir Lambov commented on CASSANDRA-8496:


I did not fully appreciate how much correctness relies on {{PostFlush}}. There 
are several aspects to letting flushes continue after a stuck/failed one:
- Commit log must track unflushed intervals to be aware if there's a flushing 
"hole", to avoid discarding the relevant segments. (1)
- Replay must be able to see such unflushed intervals, even after later flushes 
and compaction. (2)
- Snapshotting should be able to tell if there are unflushed intervals, i.e. 
holes in the snapshot. Should we error out if this is the case? (3)
- Truncation should be able to deal with a flush of old data completing after 
truncation. (4)
- We should be able to re-attempt flushes to solve transient failures (e.g. 
space ran out but was then made available). (5)

Bonus complication:
- There is a potential replay-order problem involving table metadata: if an 
incompatible change to a table's metadata was flushed, but older data remained 
unflushed, commit log replay will attempt to apply the old data on the new 
format, which could fail, resulting in data loss on the node. (6)

Anything I'm missing?

CASSANDRA-9669 made sstables track intervals of replay positions and added 
replay machinery to replay unflushed intervals that may be earlier than the 
latest flush position. CASSANDRA-11828 expands on this to track sets of flushed 
intervals in commit log segments and sstables, which finishes (1) and (2). I am 
currently looking to address (5), which would enable (3) and (4) to treat a 
successful flush round as a guarantee that there are no holes.

I am a little worried about (6). Is there anything I may be missing that could 
help with this scenario?


> Remove MemtablePostFlusher
> --
>
> Key: CASSANDRA-8496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8496
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>Priority: Minor
>
> To improve clearing of the CL, prevent infinite growth, and ensure the prompt 
> completion of tasks waiting on flush in the case of transient errors, large 
> flushes or slow disks, in 2.1 we could eliminate the post flusher altogether. 
> Since we now enforce that Memtables track contiguous ranges, a relatively 
> small change would permit Memtables to know the exact minimum as well as the 
> currently known exact maximum. The CL could easily track the total dirty 
> range, knowing that it must be contiguous, by using an AtomicLong instead of 
> an AtomicInteger, and tracking both the min/max seen, not just the max. The 
> only slight complexity will come in for tracking the _clean_ range as this 
> can now be non-contiguous, if there are 3 memtable flushes covering the same 
> CL segment, and one of them completes later. To solve this we can use an 
> interval tree since these operations are infrequent, so the extra overhead is 
> nominal. Once the interval tree completely overlaps the dirty range, we mark 
> the entire dirty range clean.
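
A minimal sketch of the min/max tracking idea described above, assuming 
illustrative 32-bit segment offsets (this is not the Cassandra implementation): 
both bounds of the dirty range are packed into one AtomicLong so they can be 
updated together with a single CAS.

{code}
import java.util.concurrent.atomic.AtomicLong;

public class DirtyRangeSketch
{
    // Pack the (min, max) dirty positions for a segment into one AtomicLong.
    private final AtomicLong dirtyRange = new AtomicLong(pack(Integer.MAX_VALUE, 0));

    private static long pack(int min, int max) { return ((long) min << 32) | (max & 0xFFFFFFFFL); }
    private static int min(long packed) { return (int) (packed >>> 32); }
    private static int max(long packed) { return (int) packed; }

    void markDirty(int position)
    {
        while (true)
        {
            long current = dirtyRange.get();
            long updated = pack(Math.min(min(current), position), Math.max(max(current), position));
            if (updated == current || dirtyRange.compareAndSet(current, updated))
                return;
        }
    }

    public static void main(String[] args)
    {
        DirtyRangeSketch sketch = new DirtyRangeSketch();
        sketch.markDirty(250);
        sketch.markDirty(120);
        long packed = sketch.dirtyRange.get();
        System.out.println("dirty range: [" + min(packed) + ", " + max(packed) + "]"); // [120, 250]
    }
}
{code}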



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions

2016-05-20 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-11828:

Reviewer: Sylvain Lebresne
  Status: Patch Available  (was: Open)

> Commit log needs to track unflushed intervals rather than positions
> ---
>
> Key: CASSANDRA-11828
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11828
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In CASSANDRA-11448 in an effort to give a more thorough handling of flush 
> errors I have introduced a possible correctness bug with disk failure policy 
> ignore if a flush fails with an error:
> - we report the error but continue
> - we correctly do not update the commit log with the flush position
> - but we allow the post-flush executor to resume
> - a successful later flush can thus move the log's clear position beyond the 
> data from the failed flush
> - the log will then delete segment(s) that contain unflushed data.
> After CASSANDRA-9669 it is relatively easy to fix this problem by making the 
> commit log track sets of intervals of unflushed data (as described in 
> CASSANDRA-8496).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11828) Commit log needs to track unflushed intervals rather than positions

2016-05-20 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-11828:

Status: Open  (was: Patch Available)

> Commit log needs to track unflushed intervals rather than positions
> ---
>
> Key: CASSANDRA-11828
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11828
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In CASSANDRA-11448 in an effort to give a more thorough handling of flush 
> errors I have introduced a possible correctness bug with disk failure policy 
> ignore if a flush fails with an error:
> - we report the error but continue
> - we correctly do not update the commit log with the flush position
> - but we allow the post-flush executor to resume
> - a successful later flush can thus move the log's clear position beyond the 
> data from the failed flush
> - the log will then delete segment(s) that contain unflushed data.
> After CASSANDRA-9669 it is relatively easy to fix this problem by making the 
> commit log track sets of intervals of unflushed data (as described in 
> CASSANDRA-8496).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11829) Cassandra 2.0.12 is not compatible with any 2.1 version

2016-05-20 Thread Shanmugaapriyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanmugaapriyan updated CASSANDRA-11829:

Environment: CentOS release 6.4 , C* 2.0.12  (was: Cassandra 2.0.12)

> Cassandra 2.0.12 is not compatible with any  2.1 version 
> -
>
> Key: CASSANDRA-11829
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11829
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS release 6.4 , C* 2.0.12
>Reporter: Shanmugaapriyan
>
> I have 8 C* 2.0.12 nodes (DC1:4 and DC2:4) with a replication factor of 
> DC1:3 and DC2:3.
> In the process of upgrading my Cassandra cluster from 2.0.12 to 2.1.13, one 
> of the nodes in DC2 has been upgraded to 2.1.13 (and upgradesstables was run). 
> Data inserted using C* 2.0.12 nodes (as coordinator) is not replicated 
> to the 2.1.13 nodes, and vice versa. 
> Testing was also done using C* 2.1.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-20 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293088#comment-15293088
 ] 

Stefania commented on CASSANDRA-11731:
--

Thank you for the patch, Sam; it definitely looks better than what I had. I've 
tested it with my [dtest 
branch|https://github.com/stef1927/cassandra-dtest/commits/11731] and, at least 
locally, the tests passed. 

Let's wait until the patch is finalized and then run the repetition tests.

It would be good to upgrade the trace log messages in {{onTopologyChange}} and 
{{onStatusChange}} to debug level, at least temporarily whilst we run the 
repetition tests.



> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted

2016-05-20 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293104#comment-15293104
 ] 

Sam Tunnicliffe commented on CASSANDRA-11742:
-

bq.  Can you think of any reason we can't just call 
SystemKeyspace.persistLocalMetadata immediately after snapshotting the system 
keyspace in CassandraDaemon?
That sounds entirely reasonable to me.

> Failed bootstrap results in exception when node is restarted
> 
>
> Key: CASSANDRA-11742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: 11742-2.txt, 11742.txt
>
>
> Since 2.2, a failed bootstrap results in an 
> {{org.apache.cassandra.exceptions.ConfigurationException: Found system 
> keyspace files, but they couldn't be loaded!}} exception when the node is 
> restarted. This did not happen in 2.1; it just tried to bootstrap again. I 
> know that the workaround is relatively easy (just delete the system keyspace 
> in the data folder on disk and try again), but it's a bit annoying that you 
> have to do that.
> The problem seems to be that the creation of the {{system.local}} table has 
> been moved to just before the bootstrap begins (in 2.1 it was done much 
> earlier), and as a result it is still in the memtable and commit log if the 
> bootstrap fails. Still, a few values are inserted into the {{system.local}} 
> table at an earlier point in the startup, and they have been flushed from the 
> memtable to an sstable. When the node is restarted, 
> {{SystemKeyspace.checkHealth()}} is executed before the commit log is replayed 
> and therefore only sees the sstable with an incomplete {{system.local}} table 
> and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace in 
> the {{StorageServiceShutdownHook}}; I have included a patch that does this. 
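
A sketch of the fix just described (the attached patch is authoritative; API 
names are assumed from the 2.2/3.0 code base): flush the system keyspace on 
shutdown so {{system.local}} is complete on disk when {{checkHealth()}} runs at 
the next start.

{code}
import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.db.Keyspace;

public class FlushSystemKeyspaceSketch
{
    // Called from the shutdown hook: flush every table in the system keyspace.
    static void flushSystemKeyspace()
    {
        Keyspace system = Keyspace.open("system");
        for (ColumnFamilyStore cfs : system.getColumnFamilyStores())
            cfs.forceBlockingFlush();   // blocks until the memtable is on disk
    }
}
{code}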



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11719) Add bind variables to trace

2016-05-20 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293159#comment-15293159
 ] 

Robert Stupp commented on CASSANDRA-11719:
--

Yes, this should go into trunk (it's a new feature).

> Add bind variables to trace
> ---
>
> Key: CASSANDRA-11719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11719
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Mahdi Mohammadi
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 11719-2.1.patch
>
>
> {{org.apache.cassandra.transport.messages.ExecuteMessage#execute}} mentions a 
> _TODO_ saying "we don't have [typed] access to CQL bind variables here".
> In fact, we now have typed access to CQL bind variables there. So, it 
> is now possible to show the bind variables in the trace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11569) Track message latency across DCs

2016-05-20 Thread stone (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293227#comment-15293227
 ] 

stone commented on CASSANDRA-11569:
---

I understand the latency calculation method now. What still confuses me is that 
crossNodeLatency should be calculated when a message is written, not as time 
goes by.
For example, if I have no workload, the metric's count should not increase, and 
the mean should stay the same since there is no new data.

But actually, when the cross-DC latency is large (about 45 ms), the count never 
increases.

> Track message latency across DCs
> 
>
> Key: CASSANDRA-11569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11569
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: CASSANDRA-11569.patch, CASSANDRA-11569v2.txt, 
> nodeLatency.PNG
>
>
> Since we have the timestamp a message is created at and when it arrives, we can 
> get an approximate time it took relatively easily; this would remove the need 
> for more complex hacks to determine latency between DCs.
> Although it is not going to be very meaningful when NTP is not set up, it is 
> pretty common to have NTP set up, and even with clock drift nothing is really 
> hurt except the metric becoming whacky.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11831) Ability to disable purgeable tombstone check via startup flag

2016-05-20 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-11831:

Fix Version/s: 3.x
   3.0.x
   2.2.x
   2.1.x
   Status: Patch Available  (was: Open)

Patch to add a {{-Dcassandra.never_purge_tombstones}} startup parameter (usage sketch after the CI links below)

||branch||testall||dtest||
|[marcuse/11831|https://github.com/krummas/cassandra/tree/marcuse/11831]|[testall|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-dtest]|
|[marcuse/11831-2.2|https://github.com/krummas/cassandra/tree/marcuse/11831-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-2.2-dtest]|
|[marcuse/11831-3.0|https://github.com/krummas/cassandra/tree/marcuse/11831-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-3.0-dtest]|
|[marcuse/11831-3.7|https://github.com/krummas/cassandra/tree/marcuse/11831-3.7]|[testall|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-3.7-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-3.7-dtest]|
|[marcuse/11831-trunk|https://github.com/krummas/cassandra/tree/marcuse/11831-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-trunk-dtest]|
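
For reference, a minimal sketch of how a startup flag of this shape is typically consumed; the surrounding class is purely illustrative, only the property name comes from the description above:
{code}
// Sketch only: how a -D startup flag like this is typically read.
public class NeverPurgeFlagSketch
{
    // -Dcassandra.never_purge_tombstones=true on the command line makes this true
    private static final boolean NEVER_PURGE_TOMBSTONES =
        Boolean.getBoolean("cassandra.never_purge_tombstones");

    public static void main(String[] args)
    {
        if (NEVER_PURGE_TOMBSTONES)
            System.out.println("Tombstone purge checks disabled; skipping maxPurgeableTimestamp lookups.");
        else
            System.out.println("Normal tombstone purging behaviour.");
    }
}
{code}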

> Ability to disable purgeable tombstone check via startup flag
> -
>
> Key: CASSANDRA-11831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11831
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Ryan Svihla
>Assignee: Marcus Eriksson
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> On Cassandra 2.1.14, when a node gets way behind and has tens of thousands of 
> sstables, it appears a lot of the CPU time is spent doing checks like this on 
> a call to getMaxPurgeableTimestamp: 
>  org.apache.cassandra.utils.Murmur3BloomFilter.hash(java.nio.ByteBuffer, 
> int, int, long, long[]) @bci=13, line=57 (Compiled frame; information may be 
> imprecise)
> - org.apache.cassandra.utils.BloomFilter.indexes(java.nio.ByteBuffer) 
> @bci=22, line=82 (Compiled frame)
> - org.apache.cassandra.utils.BloomFilter.isPresent(java.nio.ByteBuffer) 
> @bci=2, line=107 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionController.maxPurgeableTimestamp(org.apache.cassandra.db.DecoratedKey)
>  @bci=89, line=186 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.getMaxPurgeableTimestamp()
>  @bci=21, line=99 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.access$300(org.apache.cassandra.db.compaction.LazilyCompactedRow)
>  @bci=1, line=49 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=241, line=296 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=1, line=206 (Compiled frame)
> - org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext() 
> @bci=44, line=206 (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=645 
> (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(java.util.Iterator)
>  @bci=1, line=166 (Compiled frame)
> - org.apache.cassandra.db.compaction.LazilyCompactedRow.write(long, 
> org.apache.cassandra.io.util.DataOutputPlus) @bci=52, line=121 (Compiled 
> frame)
> - 
> org.apache.cassandra.io.sstable.SSTableWriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=18, line=193 (Compiled frame)
> - 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=13, line=127 (Compiled frame)
> - org.apache.cassandra.db.compaction.CompactionTask.runMayThrow() 
> @bci=666, line=197 (Compiled frame)
> - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Compiled frame)
> - 
> org.apach

[jira] [Comment Edited] (CASSANDRA-11569) Track message latency across DCs

2016-05-20 Thread stone (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293227#comment-15293227
 ] 

stone edited comment on CASSANDRA-11569 at 5/20/16 11:58 AM:
-

I understand the latency calculation method now. What still confuses me is that 
crossNodeLatency should be calculated when a message is written, not as time 
goes by.
For example, if I have no workload, the metric's count should not increase, and 
the mean should stay the same since there is no new data.

But actually, when the cross-DC latency is large (about 45 ms), the count never 
increases. I know this may be caused by my misunderstanding of the node status, 
and I would need to increase the number of connection attempts used to check the 
cross-DC node, which may have an impact on performance.


was (Author: stone):
I understand the latency calculation method now. What still confuses me is that 
crossNodeLatency should be calculated when a message is written, not as time 
goes by.
For example, if I have no workload, the metric's count should not increase, and 
the mean should stay the same since there is no new data.

But actually, when the cross-DC latency is large (about 45 ms), the count never 
increases.

> Track message latency across DCs
> 
>
> Key: CASSANDRA-11569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11569
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: CASSANDRA-11569.patch, CASSANDRA-11569v2.txt, 
> nodeLatency.PNG
>
>
> Since we have the timestamp a message is created at and when it arrives, we can 
> get an approximate time it took relatively easily; this would remove the need 
> for more complex hacks to determine latency between DCs.
> Although it is not going to be very meaningful when NTP is not set up, it is 
> pretty common to have NTP set up, and even with clock drift nothing is really 
> hurt except the metric becoming whacky.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11852) Feature request: Cluster metrics should include keyspace and tablename where possible.

2016-05-20 Thread Sucwinder Bassi (JIRA)
Sucwinder Bassi created CASSANDRA-11852:
---

 Summary: Feature request: Cluster metrics should include keyspace 
and tablename where possible.
 Key: CASSANDRA-11852
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11852
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sucwinder Bassi


For example, for read repair metrics

Metric name (from cluster): 
llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.RepairedBlocking.count

Proposed (added <Keyspace Name>.<Table Name>): 
llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.<Keyspace Name>.<Table Name>.RepairedBlocking.count

It would be beneficial to add <Keyspace Name>.<Table Name> where possible in 
the metrics as this will help narrow down the investigation and remove guess 
work as to which object is being impacted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11824) If repair fails no way to run repair again

2016-05-20 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-11824:

Status: Patch Available  (was: Open)

> If repair fails no way to run repair again
> --
>
> Key: CASSANDRA-11824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11824
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
>  Labels: fallout
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> I have a test that disables gossip and runs repair at the same time. 
> {quote}
> WARN  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> StorageService.java:384 - Stopping gossip by operator request
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> Gossiper.java:1463 - Announcing shutdown
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,776 
> StorageService.java:1999 - Node /172.31.31.1 state jump to shutdown
> INFO  [HANDSHAKE-/172.31.17.32] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32
> INFO  [HANDSHAKE-/172.31.24.76] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.24.76
> INFO  [Thread-25] 2016-05-17 16:57:21,925 RepairRunnable.java:125 - Starting 
> repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-26] 2016-05-17 16:57:21,953 RepairRunnable.java:125 - Starting 
> repair command #2, repairing keyspace stresscql with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-27] 2016-05-17 16:57:21,967 RepairRunnable.java:125 - Starting 
> repair command #3, repairing keyspace system_traces with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2)
> {quote}
> This ends up failing:
> {quote}
> 16:54:44.844 INFO  serverGroup-node-1-574 - STDOUT: [2016-05-17 16:57:21,933] 
> Starting repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> [2016-05-17 16:57:21,943] Did not get positive replies from all endpoints. 
> List of failed endpoint(s): [172.31.24.76, 172.31.17.32]
> [2016-05-17 16:57:21,945] null
> {quote}
> Subsequent calls to repair with all nodes up still fails:
> {quote}
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 
> CompactionManager.java:1193 - Cannot start multiple repair sessions over the 
> same sstables
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 Validator.java:261 - 
> Failed creating a merkle tree for [repair 
> #66425f10-1c61-11e6-83b2-0b1fff7a067d on keyspace1/standard1, 
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11853) Improve Cassandra-Stress latency measurement

2016-05-20 Thread Nitsan Wakart (JIRA)
Nitsan Wakart created CASSANDRA-11853:
-

 Summary: Improve Cassandra-Stress latency measurement
 Key: CASSANDRA-11853
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11853
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Nitsan Wakart


Currently CS reports latency using a sampling latency container and reporting 
service time (as opposed to response time from intended schedule) leading to 
coordinated omission.
Fixed here:
https://github.com/nitsanw/cassandra/tree/co-correction
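
For readers unfamiliar with the term, here is a small self-contained illustration of the difference being described (service time measured from the actual start versus response time measured from the intended schedule). The interval, workload and the {{slowOperation}} stall below are made up for the example; this is not the linked fix:
{code}
import java.util.concurrent.TimeUnit;

// Illustration of coordinated omission: with a fixed intended schedule,
// "service time" (end - actualStart) hides the backlog that a stall creates,
// while "response time" (end - intendedStart) accounts for it.
public class CoordinatedOmissionSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        final long intervalNanos = TimeUnit.MILLISECONDS.toNanos(10); // intended rate: 1 op / 10 ms
        final int ops = 50;
        long base = System.nanoTime();

        for (int i = 0; i < ops; i++)
        {
            long intendedStart = base + i * intervalNanos;
            // wait for the intended start time (never skip scheduled operations)
            while (System.nanoTime() < intendedStart)
                Thread.yield();

            long actualStart = System.nanoTime();
            slowOperation(i);
            long end = System.nanoTime();

            long serviceMs = TimeUnit.NANOSECONDS.toMillis(end - actualStart);
            long responseMs = TimeUnit.NANOSECONDS.toMillis(end - intendedStart);
            if (responseMs > serviceMs + 5)
                System.out.printf("op %2d: service=%3d ms, response=%3d ms (backlog hidden by service time)%n",
                                  i, serviceMs, responseMs);
        }
    }

    // Made-up workload: op 10 stalls for 200 ms, everything else takes ~1 ms.
    private static void slowOperation(int i) throws InterruptedException
    {
        Thread.sleep(i == 10 ? 200 : 1);
    }
}
{code}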




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11853) Improve Cassandra-Stress latency measurement

2016-05-20 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-11853:
---
Fix Version/s: 3.x

> Improve Cassandra-Stress latency measurement
> 
>
> Key: CASSANDRA-11853
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11853
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Nitsan Wakart
>Assignee: Nitsan Wakart
> Fix For: 3.x
>
>
> Currently CS reports latency using a sampling latency container and reporting 
> service time (as opposed to response time from intended schedule) leading to 
> coordinated omission.
> Fixed here:
> https://github.com/nitsanw/cassandra/tree/co-correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11853) Improve Cassandra-Stress latency measurement

2016-05-20 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-11853:
---
Assignee: Nitsan Wakart

> Improve Cassandra-Stress latency measurement
> 
>
> Key: CASSANDRA-11853
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11853
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Nitsan Wakart
>Assignee: Nitsan Wakart
> Fix For: 3.x
>
>
> Currently CS reports latency using a sampling latency container and reporting 
> service time (as opposed to response time from intended schedule) leading to 
> coordinated omission.
> Fixed here:
> https://github.com/nitsanw/cassandra/tree/co-correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11853) Improve Cassandra-Stress latency measurement

2016-05-20 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-11853:
---
Reviewer: T Jake Luciani
  Status: Patch Available  (was: Open)

> Improve Cassandra-Stress latency measurement
> 
>
> Key: CASSANDRA-11853
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11853
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Nitsan Wakart
>Assignee: Nitsan Wakart
> Fix For: 3.x
>
>
> Currently CS reports latency using a sampling latency container and reporting 
> service time (as opposed to response time from intended schedule) leading to 
> coordinated omission.
> Fixed here:
> https://github.com/nitsanw/cassandra/tree/co-correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7190) Add schema to snapshot manifest

2016-05-20 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293466#comment-15293466
 ] 

Aleksey Yeschenko commented on CASSANDRA-7190:
--

bq.  With the caveat that we can't express the drop information with CQL 
statements, and as we've noted, that's important for a snapshot. Unless we do 
add some kind of WITH TIMESTAMP to ALTER DROP, that is (which I don't love-love 
but might be the most pragmatic answer in practice).

Yeah. Can't say I liked it when I proposed it first, or that I like it still, 
either. So either we do that, or, alternatively, we add a way to load the 
schema snapshot via other means (load JSON/YAML encoded full schema - including 
dropped columns - via JMX?, get it all applied in batch, then proceed with 
restoring the backups).

> Add schema to snapshot manifest
> ---
>
> Key: CASSANDRA-7190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
>
> followup from CASSANDRA-6326



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11852) Feature request: Cluster metrics should include keyspace and tablename where possible.

2016-05-20 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293509#comment-15293509
 ] 

Chris Lohfink edited comment on CASSANDRA-11852 at 5/20/16 3:08 PM:


The table metrics do this in 
{{org.apache.cassandra.metrics:type=ColumnFamily,name=...}} or 
{{org.apache.cassandra.metrics:type=Table,name=...}} (newer versions) for 
metric that can be tracked per table easily. Read repair is one that could 
probably be tracked per table but is tracked globally. but for most part the 
global ones are that way due to some complication or oversight in the 
implementation.


was (Author: cnlwsu):
The table metrics do this in 
{{org.apache.cassandra.metrics:type=ColumnFamily,name=*}} or 
{{org.apache.cassandra.metrics:type=Table,name=*}} (newer versions) for metric 
that can be tracked per table easily. Read repair is one that could probably be 
tracked per table but is tracked globally. but for most part the global ones 
are that way due to some complication or oversight in the implementation.

> Feature request: Cluster metrics should include keyspace and tablename where 
> possible.
> --
>
> Key: CASSANDRA-11852
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11852
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sucwinder Bassi
>
> For example, for read repair metrics
> Metric name (from cluster): 
> llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.RepairedBlocking.count
> Proposed (added <Keyspace Name>.<Table Name>): 
> llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.<Keyspace Name>.<Table Name>.RepairedBlocking.count
> It would be beneficial to add <Keyspace Name>.<Table Name> where possible in 
> the metrics as this will help narrow down the investigation and remove guess 
> work as to which object is being impacted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11806) sstableloader fails with "Remote peer failed stream session" on small table

2016-05-20 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-11806:

Assignee: Paulo Motta

> sstableloader fails with "Remote peer  failed stream session" on 
> small table
> 
>
> Key: CASSANDRA-11806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11806
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Linux ny2-proda-app01 3.13.0-86-generic #130-Ubuntu SMP 
> Mon Apr 18 18:27:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Don Branson
>Assignee: Paulo Motta
> Fix For: 3.5
>
> Attachments: roles_table.txt, roles_table_error.txt
>
>
> This error is with sstableloader loading a 2-column table with 20 rows. All 
> other tables in the keyspace load clean. The database dump is from cassandra 
> 2.1.9, the target is cassandra 3.5. 14 of the 20 rows load successfully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11852) Feature request: Cluster metrics should include keyspace and tablename where possible.

2016-05-20 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293509#comment-15293509
 ] 

Chris Lohfink edited comment on CASSANDRA-11852 at 5/20/16 3:09 PM:


The table metrics already do this in 
{{org.apache.cassandra.metrics:type=ColumnFamily,name=...}} or 
{{org.apache.cassandra.metrics:type=Table,name=...}} (newer versions) for 
metrics that can easily be tracked per table, 
i.e.:
{code}org.apache.cassandra.metrics:type=ColumnFamily,keyspace=system,scope=peers,name=ReadLatency
{code}

Read repair is one that could probably be tracked per table but is tracked 
globally. For the most part, though, the global ones are that way due to some 
complication or oversight in the implementation.


was (Author: cnlwsu):
The table metrics do this in 
{{org.apache.cassandra.metrics:type=ColumnFamily,name=...}} or 
{{org.apache.cassandra.metrics:type=Table,name=...}} (newer versions) for 
metric that can be tracked per table easily. Read repair is one that could 
probably be tracked per table but is tracked globally. but for most part the 
global ones are that way due to some complication or oversight in the 
implementation.

> Feature request: Cluster metrics should include keyspace and tablename where 
> possible.
> --
>
> Key: CASSANDRA-11852
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11852
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sucwinder Bassi
>
> For example, for read repair metrics
> Metric name (from cluster): 
> llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.RepairedBlocking.count
> Proposed (added <Keyspace Name>.<Table Name>): 
> llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.<Keyspace Name>.<Table Name>.RepairedBlocking.count
> It would be beneficial to add <Keyspace Name>.<Table Name> where possible in 
> the metrics as this will help narrow down the investigation and remove guess 
> work as to which object is being impacted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11852) Feature request: Cluster metrics should include keyspace and tablename where possible.

2016-05-20 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293509#comment-15293509
 ] 

Chris Lohfink commented on CASSANDRA-11852:
---

The table metrics already do this in 
{{org.apache.cassandra.metrics:type=ColumnFamily,name=*}} or 
{{org.apache.cassandra.metrics:type=Table,name=*}} (newer versions) for metrics 
that can easily be tracked per table. Read repair is one that could probably be 
tracked per table but is tracked globally. For the most part, though, the global 
ones are that way due to some complication or oversight in the implementation.
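
As a side note for anyone exploring what is already exposed per table, here is a generic JMX query sketch. It assumes local, unauthenticated JMX on the default port 7199, and uses the {{type=Table}} naming shown above (older versions use {{type=ColumnFamily}}); it is not part of this ticket:
{code}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: list per-table ReadLatency metric MBeans, whose object names already
// include the keyspace and the table (the "scope" key).
public class ListTableMetrics
{
    public static void main(String[] args) throws Exception
    {
        // Assumes a local node with unauthenticated JMX on 7199.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName pattern =
                new ObjectName("org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=ReadLatency");
            Set<ObjectName> names = mbs.queryNames(pattern, null);
            for (ObjectName name : names)
                System.out.println(name.getKeyProperty("keyspace") + "." + name.getKeyProperty("scope"));
        }
    }
}
{code}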

> Feature request: Cluster metrics should include keyspace and tablename where 
> possible.
> --
>
> Key: CASSANDRA-11852
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11852
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sucwinder Bassi
>
> For example, for read repair metrics
> Metric name (from cluster): 
> llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.RepairedBlocking.count
> Proposed (added <Keyspace Name>.<Table Name>): 
> llds.cassandra.llds_1_acc.cluster.dcr.lrv140gh.org.apache.cassandra.metrics.ReadRepair.<Keyspace Name>.<Table Name>.RepairedBlocking.count
> It would be beneficial to add <Keyspace Name>.<Table Name> where possible in 
> the metrics as this will help narrow down the investigation and remove guess 
> work as to which object is being impacted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-20 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293529#comment-15293529
 ] 

Joshua McKenzie commented on CASSANDRA-11818:
-

[~snazy]: what is the state of the node when you repro? It dies and we just 
don't catch / print out on the OOM?

> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349)
>  ~[main/:na]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:314)
>  ~[main/:na]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:619)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:676)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:612)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher$Flusher.run(Message.java:445)
>  ~[main/:na]
>   at 
> io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> {code}
> The situation does not get better when the load driver is stopped.
> I can reproduce this scenario at will. Managed to get histogram, stack traces 
> and heap dump. Already increased {{-XX:MaxDirectMemorySize}} to {{2g}}.
> A {{nodetool flush}} causes the daemon to exit (as that direct-memory OOM is 
> caught by {{JVMStabilityInspector}}).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11820) Altering a column's type causes EOF

2016-05-20 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293538#comment-15293538
 ] 

Joshua McKenzie commented on CASSANDRA-11820:
-

What's the context of the EOF? Is the node left in a bad state in perpetuity 
by this operation, or does it just cause problems with the current session?

> Altering a column's type causes EOF
> ---
>
> Key: CASSANDRA-11820
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11820
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Carl Yeksigian
> Fix For: 3.0.x, 3.x
>
>
> While working on CASSANDRA-10309, I was testing altering columns' types. This 
> series of operations fails:
> {code}
> CREATE TABLE test (a int PRIMARY KEY, b int)
> INSERT INTO test (a, b) VALUES (1, 1)
> ALTER TABLE test ALTER b TYPE BLOB
> SELECT * FROM test WHERE a = 1
> {code}
> Tried this on 3.0 and trunk, both fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11823) Creating a table leads to a race with GraphiteReporter

2016-05-20 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-11823:
-
Labels: lhf  (was: )

> Creating a table leads to a race with GraphiteReporter
> --
>
> Key: CASSANDRA-11823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11823
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefano Ortolani
>Priority: Minor
>  Labels: lhf
>
> Happened only on 3/4 nodes out of 13.
> {code:xml}
> INFO  [MigrationStage:1] 2016-05-18 00:34:11,566 ColumnFamilyStore.java:381 - 
> Initializing schema.table
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-18 00:34:11,569 
> ScheduledReporter.java:119 - RuntimeException thrown from 
> GraphiteReporter#report. Exception was suppressed.
> java.util.ConcurrentModificationException: null
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) 
> ~[na:1.8.0_91]
>   at java.util.HashMap$KeyIterator.next(HashMap.java:1453) ~[na:1.8.0_91]
>   at 
> org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:690) 
> ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at 
> org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) 
> ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at 
> com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>   at 
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>   at 
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_91]
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_91]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_91]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}
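
Not the Cassandra code itself, but a minimal standalone reproduction of the failure mode in the trace above: one thread iterates a plain {{HashMap}} (as the reporter's gauge does) while another thread keeps adding entries (standing in for table initialization):
{code}
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Minimal reproduction of the race: iterating a HashMap while another thread
// mutates it typically ends in ConcurrentModificationException.
public class GaugeRaceSketch
{
    private static final Map<String, Long> perTableValues = new HashMap<>();

    public static void main(String[] args)
    {
        Thread writer = new Thread(() -> {
            for (int i = 0; ; i++)
                perTableValues.put("table_" + i, (long) i); // stands in for schema changes
        });
        writer.setDaemon(true);
        writer.start();

        try
        {
            while (true)
            {
                long sum = 0;
                for (Long v : perTableValues.values())  // gauge-style read over the map
                    if (v != null)
                        sum += v;
            }
        }
        catch (ConcurrentModificationException e)
        {
            System.out.println("Reporter thread hit: " + e);
        }
    }
}
{code}
The usual fixes for this pattern are iterating over a snapshot of the keys or switching to a concurrent collection.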



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11689) dtest failures in internode_ssl_test tests

2016-05-20 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-11689:

Issue Type: Bug  (was: Test)

> dtest failures in internode_ssl_test tests
> --
>
> Key: CASSANDRA-11689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11689
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Russ Hatch
>Assignee: Joel Knighton
>  Labels: dtest
> Fix For: 3.x
>
>
> has happened a few times on trunk, two different tests:
> http://cassci.datastax.com/job/trunk_dtest/1179/testReport/internode_ssl_test/TestInternodeSSL/putget_with_internode_ssl_without_compression_test
> http://cassci.datastax.com/job/trunk_dtest/1169/testReport/internode_ssl_test/TestInternodeSSL/putget_with_internode_ssl_test/
> Failed on CassCI build trunk_dtest #1179



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9509) Streams throughput control

2016-05-20 Thread Alain RODRIGUEZ (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alain RODRIGUEZ updated CASSANDRA-9509:
---
Description: 
Currently, I have to keep tuning stream throughput manually all the time 
(through nodetool setstreamthroughput) since the same value applies, for example, 
to both a decommission and a removenode. The point is that in the first case data 
goes from 1 to N nodes (and is obviously limited by the sending node), while in 
the second it goes from ALL to N nodes (N being the number of nodes - 1). While 
removing a node with 'nodetool removenode', the throughput limit will not be 
reached in most cases, and all the nodes will be under heavy load. So with the 
same stream throughput value, the nodes receiving the data are sent to N times 
faster during a removenode than during a decommission. 

Another example is running repair. We have 20 nodes, it takes 2+ days to repair 
the data, repairs have to run within 10 days and can't be done one node at a 
time, so stream throughput needs to be adjusted accordingly.

Is there a way to:

- limit incoming streaming throughput on a node?
- limit outgoing streaming speed, making sure no node ever sends more than 
x Mbps to any other node?
- make streaming a background task (using remaining resources only, handling 
priority)?

If none of those ideas are doable, can we imagine dissociating stream 
throughputs depending on the operation type, '1 to many' and 'many to 1' 
(decommission, rebuild, bootstrap) versus 'N to N' (repairs, removenode), so 
they can be configured individually in cassandra.yaml?

  was:
Currently, I have to keep tuning stream throughput all the time manually 
(through nodetool setstreamthroughput) since the same value stands for example 
for a decommission or a removenode (for exemple). The point is in first case 
Network goes from 1 --> N nodes (and is obviously limited by the node sending), 
in the second it is a N --> N nodes (N being number of remaining nodes). 
Removing node, throughput limit will not be reached in most cases, and all the 
nodes will be under heavy load. So with the same value of stream throughput, we 
send N times faster on a removenode than using decommission. 

An other exemple is repair is also faster as  more nodes start repairing (we 
have 20 nodes, taking 2+ days to repair data, and repair have to run within 10 
days, can't be one at the time, and stream throughput needs to be adjusted 
accordingly.

Is there a way to:

- limit incoming network on a node ?
- limit cluster wide sent network ?
- make streaming processes background task (using remaining resources) ? This 
looks harder to me since the bottleneck depends on the node hardware and the 
workload. It can be either the CPU, the network, the disk throughput or even 
the memory...  

If none of those ideas are doable, can we imagine to dissociate stream 
throughputs depending on the operation, to configure them individually in 
cassandra.yaml ?


> Streams throughput control
> --
>
> Key: CASSANDRA-9509
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9509
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Alain RODRIGUEZ
>Priority: Minor
>
> Currently, I have to keep tuning stream throughput manually all the time 
> (through nodetool setstreamthroughput) since the same value applies, for 
> example, to both a decommission and a removenode. The point is that in the first 
> case data goes from 1 to N nodes (and is obviously limited by the sending node), 
> while in the second it goes from ALL to N nodes (N being the number of nodes - 
> 1). While removing a node with 'nodetool removenode', the throughput limit will 
> not be reached in most cases, and all the nodes will be under heavy load. So 
> with the same stream throughput value, the nodes receiving the data are sent to 
> N times faster during a removenode than during a decommission. 
> Another example is running repair. We have 20 nodes, it takes 2+ days to repair 
> the data, repairs have to run within 10 days and can't be done one node at a 
> time, so stream throughput needs to be adjusted accordingly.
> Is there a way to:
> - limit incoming streaming throughput on a node?
> - limit outgoing streaming speed, making sure no node ever sends more than 
> x Mbps to any other node?
> - make streaming a background task (using remaining resources only, handling 
> priority)?
> If none of those ideas are doable, can we imagine dissociating stream 
> throughputs depending on the operation type, '1 to many' and 'many to 1' 
> (decommission, rebuild, bootstrap) versus 'N to N' (repairs, removenode), so 
> they can be configured individually in cassandra.yaml?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11689) dtest failures in internode_ssl_test tests

2016-05-20 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-11689:

Fix Version/s: 3.x

> dtest failures in internode_ssl_test tests
> --
>
> Key: CASSANDRA-11689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11689
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Russ Hatch
>Assignee: Joel Knighton
>  Labels: dtest
> Fix For: 3.x
>
>
> has happened a few times on trunk, two different tests:
> http://cassci.datastax.com/job/trunk_dtest/1179/testReport/internode_ssl_test/TestInternodeSSL/putget_with_internode_ssl_without_compression_test
> http://cassci.datastax.com/job/trunk_dtest/1169/testReport/internode_ssl_test/TestInternodeSSL/putget_with_internode_ssl_test/
> Failed on CassCI build trunk_dtest #1179



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-20 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293563#comment-15293563
 ] 

Robert Stupp commented on CASSANDRA-11818:
--

The node no longer responds to any native protocol message. Restarting the 
native protocol listener doesn't help (as expected). JMX requests are still 
working.
I assume it just hangs in a loop trying to allocate more native memory, which 
fails (as in the jstack output).
It does not immediately die. But it does if C* itself requests more off heap 
memory (since that OOM is passed to JVMStabilityInspector).
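
For illustration only (this is not the actual {{JVMStabilityInspector}} logic), here is a sketch of the kind of check that distinguishes the direct-buffer OOM discussed here, plus a way to trigger it deliberately with a small {{-XX:MaxDirectMemorySize}}:
{code}
// Sketch: decide whether a caught throwable is the "Direct buffer memory" OOM
// shown in the stack trace above, so the process can fail fast instead of
// limping along without native memory.
public class DirectOomCheckSketch
{
    static boolean isDirectBufferOom(Throwable t)
    {
        return t instanceof OutOfMemoryError
               && t.getMessage() != null
               && t.getMessage().contains("Direct buffer memory");
    }

    public static void main(String[] args)
    {
        try
        {
            // Allocate more direct memory than the JVM allows,
            // e.g. run with -XX:MaxDirectMemorySize=16m to trigger the error.
            java.nio.ByteBuffer.allocateDirect(64 * 1024 * 1024);
            System.out.println("No OOM with the current MaxDirectMemorySize setting.");
        }
        catch (OutOfMemoryError oom)
        {
            System.out.println("Direct buffer OOM detected: " + isDirectBufferOom(oom));
        }
    }
}
{code}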


> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349)
>  ~[main/:na]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:314)
>  ~[main/:na]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:619)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:676)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:612)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher$Flusher.run(Message.java:445)
>  ~[main/:na]
>   at 
> io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> {code}
> The situation does not get better when the load driver is stopped.
> I can reproduce this scenario at will. Managed to get histogram, stack traces 
> and heap dump. Already increased {{-XX:MaxDirectMemorySize}} to {{2g}}.
> A {{nodetool flush}} causes the daemon to exit (as that direct-m

[jira] [Commented] (CASSANDRA-11760) dtest failure in TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test

2016-05-20 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293578#comment-15293578
 ] 

Philip Thompson commented on CASSANDRA-11760:
-

Based on the run here, I believe this is fixed by your patch, yes.
http://cassci.datastax.com/view/Parameterized/job/upgrade_tests-all-custom_branch_runs/18/testReport/

> dtest failure in 
> TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test
> 
>
> Key: CASSANDRA-11760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Tyler Hobbs
>  Labels: dtest
> Fix For: 3.6
>
> Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, 
> node3.log, node3_debug.log
>
>
> example failure:
> http://cassci.datastax.com/view/Parameterized/job/upgrade_tests-all-custom_branch_runs/12/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_2_x_To_next_3_x/user_types_test/
> I've attached the logs. The test upgrades from 2.2.5 to 3.6. The relevant 
> failure stack trace extracted here:
> {code}
> ERROR [MessagingService-Incoming-/127.0.0.1] 2016-05-11 17:08:31,33
> 4 CassandraDaemon.java:185 - Exception in thread Thread[MessagingSe
> rvice-Incoming-/127.0.0.1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:99)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) 
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:200)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:177)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11760) dtest failure in TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test

2016-05-20 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson resolved CASSANDRA-11760.
-
Resolution: Duplicate

> dtest failure in 
> TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test
> 
>
> Key: CASSANDRA-11760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Tyler Hobbs
>  Labels: dtest
> Fix For: 3.6
>
> Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, 
> node3.log, node3_debug.log
>
>
> example failure:
> http://cassci.datastax.com/view/Parameterized/job/upgrade_tests-all-custom_branch_runs/12/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_2_x_To_next_3_x/user_types_test/
> I've attached the logs. The test upgrades from 2.2.5 to 3.6. The relevant 
> failure stack trace extracted here:
> {code}
> ERROR [MessagingService-Incoming-/127.0.0.1] 2016-05-11 17:08:31,33
> 4 CassandraDaemon.java:185 - Exception in thread Thread[MessagingSe
> rvice-Incoming-/127.0.0.1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:99)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) 
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:200)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:177)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11850) cannot use cql since upgrading python to 2.7.11+

2016-05-20 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-11850:
-
Labels: cqlsh  (was: )

> cannot use cql since upgrading python to 2.7.11+
> 
>
> Key: CASSANDRA-11850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11850
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Development
>Reporter: Andrew Madison
>  Labels: cqlsh
> Fix For: 3.5
>
>
> OS: Debian GNU/Linux stretch/sid 
> Kernel: 4.5.0-2-amd64 #1 SMP Debian 4.5.4-1 (2016-05-16) x86_64 GNU/Linux
> Python version: 2.7.11+ (default, May  9 2016, 15:54:33)
> [GCC 5.3.1 20160429]
> cqlsh --version: cqlsh 5.0.1
> cassandra -v: 3.5 (also occurs with 3.0.6)
> Issue:
> when running cqlsh, it returns the following error:
> cqlsh -u dbarpt_usr01
> Password: *
> Connection error: ('Unable to connect to any servers', {'odbasandbox1': 
> TypeError('ref() does not take keyword arguments',)})
> I cleared PYTHONPATH:
> python -c "import json; print dir(json); print json.__version__"
> ['JSONDecoder', 'JSONEncoder', '__all__', '__author__', '__builtins__', 
> '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', 
> '_default_decoder', '_default_encoder', 'decoder', 'dump', 'dumps', 
> 'encoder', 'load', 'loads', 'scanner']
> 2.0.9
> Java based clients can connect to Cassandra with no issue. Just CQLSH and 
> Python clients cannot.
> nodetool status also works.
> Thank you for your help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11850) cannot use cql since upgrading python to 2.7.11+

2016-05-20 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293591#comment-15293591
 ] 

Aleksey Yeschenko commented on CASSANDRA-11850:
---

[~aholmber] Are there any known issues with Python driver on 2.7.11+?

> cannot use cql since upgrading python to 2.7.11+
> 
>
> Key: CASSANDRA-11850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11850
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Development
>Reporter: Andrew Madison
>  Labels: cqlsh
> Fix For: 3.5
>
>
> OS: Debian GNU/Linux stretch/sid 
> Kernel: 4.5.0-2-amd64 #1 SMP Debian 4.5.4-1 (2016-05-16) x86_64 GNU/Linux
> Python version: 2.7.11+ (default, May  9 2016, 15:54:33)
> [GCC 5.3.1 20160429]
> cqlsh --version: cqlsh 5.0.1
> cassandra -v: 3.5 (also occurs with 3.0.6)
> Issue:
> when running cqlsh, it returns the following error:
> cqlsh -u dbarpt_usr01
> Password: *
> Connection error: ('Unable to connect to any servers', {'odbasandbox1': 
> TypeError('ref() does not take keyword arguments',)})
> I cleared PYTHONPATH:
> python -c "import json; print dir(json); print json.__version__"
> ['JSONDecoder', 'JSONEncoder', '__all__', '__author__', '__builtins__', 
> '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', 
> '_default_decoder', '_default_encoder', 'decoder', 'dump', 'dumps', 
> 'encoder', 'load', 'loads', 'scanner']
> 2.0.9
> Java based clients can connect to Cassandra with no issue. Just CQLSH and 
> Python clients cannot.
> nodetool status also works.
> Thank you for your help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7190) Add schema to snapshot manifest

2016-05-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293595#comment-15293595
 ] 

Sylvain Lebresne commented on CASSANDRA-7190:
-

I have to say that I'm warming up to that idea of a {{WITH TIMESTAMP}} on 
{{ALTER DROP}}. Or at least, I really prefer it over loading the schema via JMX 
from some ad-hoc format. And who knows, this could prove to be useful in 
situations other than snapshots: for instance, it would give a way to 
incrementally get rid of all the values of a column before a given date (by 
dropping with the timestamp and recreating), which is a tad hackish but could 
come in handy. 

> Add schema to snapshot manifest
> ---
>
> Key: CASSANDRA-7190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
>
> followup from CASSANDRA-6326



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11569) Track message latency across DCs

2016-05-20 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293608#comment-15293608
 ] 

Chris Lohfink commented on CASSANDRA-11569:
---

It's not just mutations that will use this. Any message sent between nodes (e.g. 
gossip) already carries a timestamp; this just reuses that.

> Track message latency across DCs
> 
>
> Key: CASSANDRA-11569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11569
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: CASSANDRA-11569.patch, CASSANDRA-11569v2.txt, 
> nodeLatency.PNG
>
>
> Since we have the timestamp at which a message is created and when it arrives, we 
> can get an approximate time it took relatively easily, and this would remove the 
> need for more complex hacks to determine latency between DCs.
> Although it is not going to be very meaningful when NTP is not set up, it is 
> pretty common to have NTP set up, and even with clock drift nothing is really 
> hurt except the metric becoming whacky.
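
For illustration only, here is a minimal sketch of the kind of bookkeeping described
above: subtract the construction timestamp carried by an incoming message from the
local arrival time and fold the result into a per-peer average. All class and method
names are hypothetical; this is not the actual CASSANDRA-11569 patch.

{code}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: approximate one-way latency per peer, derived from the
// construction timestamp every internode message already carries.
public class CrossNodeLatencyTracker
{
    private static final class Stats
    {
        final AtomicLong count = new AtomicLong();
        final AtomicLong totalMillis = new AtomicLong();
    }

    private final Map<InetAddress, Stats> perPeer = new ConcurrentHashMap<>();

    // Called when a message arrives; constructionTimeMillis comes from the sender's
    // clock, so the result is only meaningful when NTP keeps the clocks close.
    public void onMessageReceived(InetAddress from, long constructionTimeMillis)
    {
        long latency = Math.max(0, System.currentTimeMillis() - constructionTimeMillis);
        Stats stats = perPeer.computeIfAbsent(from, k -> new Stats());
        stats.count.incrementAndGet();
        stats.totalMillis.addAndGet(latency);
    }

    public double averageMillis(InetAddress peer)
    {
        Stats stats = perPeer.get(peer);
        return stats == null || stats.count.get() == 0
             ? Double.NaN
             : (double) stats.totalMillis.get() / stats.count.get();
    }
}
{code}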



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-9530) SSTable corruption can trigger OOM

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293614#comment-15293614
 ] 

Alex Petrov edited comment on CASSANDRA-9530 at 5/20/16 4:09 PM:
-

bq. It also failed with an OOM here and here
Great :) that means it worked! 

We're catching {{EOF}} Exceptions now in several places (after changes):
{code}
org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:246)
{code}
and
{code}
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeOne(ClusteringPrefix.java:513)
{code}

Although all of these EOFs would also be caught when the value is sufficiently 
large to blow up the heap (thanks to the new {{max_value_size_in_mb}}). 
While working on improvements, I also discovered that yet another code 
path, {{SSTableScanner}} (used for partition range reads), was not tested. 
This code path also hit out-of-bounds exceptions and threw similar 
EOFs. I wrote tests to cover both cases. 

I also wrote a test (not included in the patch) for corrupting an early-opened 
compaction result, although I'm not sure how useful it is in general. 
The early-opened file does get corrupted, but there's no way to catch this 
corruption until the next compaction or sstable read/scan occurs. Testing 
this is also non-trivial: if we trigger a compaction, it runs asynchronously, 
so the only reasonable way is to create a rewriter manually and catch the 
moment when we have a new sstable in the {{EARLY}} state. Since existing cases 
already cover the situation, I decided to leave it out. I might have missed 
something, so please let me know.

cassandra.yaml changes are made according to your comments. 
BlackListingCompactionTest.java is adjusted to print the seed and write 
randomised values.
All invocations of {{AbstractType.readValue()}} are now capped to protect against 
OOM. A single instance of {{ByteBufferUtil.readWithShortLength}} is converted 
to {{skip}} in order to avoid allocating a throwaway buffer. I thought it was 
also worth a separate commit.

I'll be putting the tests on CI for a couple of dozen runs (even though I ran 
them locally a couple of hundred times already) to verify that nothing was 
accidentally forgotten.
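
As a rough illustration of the capping described above (a sketch, not the actual
patch), a length read from a potentially corrupt sstable can be validated against a
configured maximum before any buffer is allocated; the class and parameter names
here are assumptions.

{code}
import java.io.DataInput;
import java.io.IOException;

public final class BoundedValueReader
{
    private BoundedValueReader() {}

    // Sketch: read a length-prefixed value, refusing to allocate a buffer larger
    // than maxValueSizeBytes (e.g. derived from a max_value_size_in_mb setting).
    public static byte[] readValueWithCap(DataInput in, int maxValueSizeBytes) throws IOException
    {
        int length = in.readInt();
        if (length < 0 || length > maxValueSizeBytes)
            throw new IOException("Suspect value length " + length + " (cap " + maxValueSizeBytes
                                  + "); treating sstable as corrupt");
        byte[] value = new byte[length];
        in.readFully(value);
        return value;
    }
}
{code}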


was (Author: ifesdjeen):
> It also failed with an OOM here and here
Great :) that means it worked! 

We're catching {{EOF}} Exceptions now in several places (after changes):

{code}
org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:246)
{code}
and
{code}
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeOne(ClusteringPrefix.java:513)
{code}

Although all of these EOFs would also be caught when the value is sufficiently 
large to blow up the heap (thanks to the new {{max_value_size_in_mb}}). 
While working on improvements, I also discovered that yet another code 
path, {{SSTableScanner}} (used for partition range reads), was not 
tested. This code path also hit out-of-bounds exceptions and threw 
similar EOFs. I wrote tests to cover both cases. 

I also wrote a test (not included in the patch) for corrupting an early-opened 
compaction result, although I'm not sure how useful it is in general. 
The early-opened file does get corrupted, but there's no way to catch this 
corruption until the next compaction or sstable read/scan occurs. Testing 
this is also non-trivial: if we trigger a compaction, it runs asynchronously, 
so the only reasonable way is to create a rewriter manually and catch the 
moment when we have a new sstable in the {{EARLY}} state. Since existing cases 
already cover the situation, I decided to leave it out. I might have missed 
something, so please let me know.

cassandra.yaml changes are made according to your comments. 
BlackListingCompactionTest.java is adjusted to print the seed and write 
randomised values.
All invocations of {{AbstractType.readValue()}} are now capped to protect against 
OOM. A single instance of {{ByteBufferUtil.readWithShortLength}} is converted 
to {{skip}} in order to avoid allocating a throwaway buffer. I thought it was 
also worth a separate commit.

I'll be putting the tests on CI for a couple of dozen runs (even though I ran 
them locally a couple of hundred times already) to verify that nothing was 
accidentally forgotten.

> SSTable corruption can trigger OOM
> --
>
> Key: CASSANDRA-9530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9530
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sylvain Lebresne
>Assignee: Alex Petrov
>
> If a sstable is corrupted so that the length of a given value is bogus, we'll still 
> happily try to allocate a buffer of that bogus size to read the value, which 
> can easily lead to an OOM.
> We should probably protect against this. In practice, a given value can

[jira] [Commented] (CASSANDRA-9530) SSTable corruption can trigger OOM

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293614#comment-15293614
 ] 

Alex Petrov commented on CASSANDRA-9530:


> It also failed with an OOM here and here
Great :) that means it worked! 

We're catching {{EOF}} Exceptions now in several places (after changes):

{code}
org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:246)
{code}
and
{code}
org.apache.cassandra.db.ClusteringPrefix$Deserializer.deserializeOne(ClusteringPrefix.java:513)
{code}

Although all of these EOFs would also be caught when the value is sufficiently 
large to blow up the heap (thanks to the new {{max_value_size_in_mb}}). 
While working on improvements, I also discovered that yet another code 
path, {{SSTableScanner}} (used for partition range reads), was not 
tested. This code path also hit out-of-bounds exceptions and threw 
similar EOFs. I wrote tests to cover both cases. 

I also wrote a test (not included in the patch) for corrupting an early-opened 
compaction result, although I'm not sure how useful it is in general. 
The early-opened file does get corrupted, but there's no way to catch this 
corruption until the next compaction or sstable read/scan occurs. Testing 
this is also non-trivial: if we trigger a compaction, it runs asynchronously, 
so the only reasonable way is to create a rewriter manually and catch the 
moment when we have a new sstable in the {{EARLY}} state. Since existing cases 
already cover the situation, I decided to leave it out. I might have missed 
something, so please let me know.

cassandra.yaml changes are made according to your comments. 
BlackListingCompactionTest.java is adjusted to print the seed and write 
randomised values.
All invocations of {{AbstractType.readValue()}} are now capped to protect against 
OOM. A single instance of {{ByteBufferUtil.readWithShortLength}} is converted 
to {{skip}} in order to avoid allocating a throwaway buffer. I thought it was 
also worth a separate commit.

I'll be putting the tests on CI for a couple of dozen runs (even though I ran 
them locally a couple of hundred times already) to verify that nothing was 
accidentally forgotten.

> SSTable corruption can trigger OOM
> --
>
> Key: CASSANDRA-9530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9530
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sylvain Lebresne
>Assignee: Alex Petrov
>
> If a sstable is corrupted so that the length of a given value is bogus, we'll still 
> happily try to allocate a buffer of that bogus size to read the value, which 
> can easily lead to an OOM.
> We should probably protect against this. In practice, a given value can only be 
> so big, since it's limited by the protocol frame size in the first place. Maybe 
> we could add a max_value_size_in_mb setting and consider a sstable corrupted 
> if it contains a value bigger than that.
> I'll note that this ticket would be a good occasion to improve 
> {{BlacklistingCompactionsTest}}. Typically, it currently generates empty 
> values, which makes it pretty much impossible to get the problem described 
> here. And as described in CASSANDRA-9478, it also doesn't test properly for 
> things like early opening of compaction results. We could try to randomize as 
> many of the parameters of this test as possible to make it more likely to 
> catch any type of corruption that could happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11831) Ability to disable purgeable tombstone check via startup flag

2016-05-20 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-11831:

Reviewer: Paulo Motta

> Ability to disable purgeable tombstone check via startup flag
> -
>
> Key: CASSANDRA-11831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11831
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Ryan Svihla
>Assignee: Marcus Eriksson
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> On Cassandra 2.1.14, when a node gets way behind and has tens of thousands of 
> sstables, it appears a lot of the CPU time is spent doing checks like this on 
> a call to getMaxPurgeableTimestamp 
>  org.apache.cassandra.utils.Murmur3BloomFilter.hash(java.nio.ByteBuffer, 
> int, int, long, long[]) @bci=13, line=57 (Compiled frame; information may be 
> imprecise)
> - org.apache.cassandra.utils.BloomFilter.indexes(java.nio.ByteBuffer) 
> @bci=22, line=82 (Compiled frame)
> - org.apache.cassandra.utils.BloomFilter.isPresent(java.nio.ByteBuffer) 
> @bci=2, line=107 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionController.maxPurgeableTimestamp(org.apache.cassandra.db.DecoratedKey)
>  @bci=89, line=186 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.getMaxPurgeableTimestamp()
>  @bci=21, line=99 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.access$300(org.apache.cassandra.db.compaction.LazilyCompactedRow)
>  @bci=1, line=49 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=241, line=296 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=1, line=206 (Compiled frame)
> - org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext() 
> @bci=44, line=206 (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=645 
> (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(java.util.Iterator)
>  @bci=1, line=166 (Compiled frame)
> - org.apache.cassandra.db.compaction.LazilyCompactedRow.write(long, 
> org.apache.cassandra.io.util.DataOutputPlus) @bci=52, line=121 (Compiled 
> frame)
> - 
> org.apache.cassandra.io.sstable.SSTableWriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=18, line=193 (Compiled frame)
> - 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=13, line=127 (Compiled frame)
> - org.apache.cassandra.db.compaction.CompactionTask.runMayThrow() 
> @bci=666, line=197 (Compiled frame)
> - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=6, line=73 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=2, line=59 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
>  @bci=125, line=264 (Compiled frame)
> - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
> - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
> - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Compiled frame)
> - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
> If we could at least pass a startup flag like -DskipTombstonePurgeCheck, we 
> could, in these particularly bad cases, just avoid the calculation and merge 
> tables until we have less to worry about, then restart the node without that 
> flag once we're down to a more manageable number of sstables. 
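
A minimal sketch of how such a startup flag could short-circuit the purge check.
The property name follows the {{cassandra.never_purge_tombstones}} flag discussed
later in this digest, and returning {{Long.MIN_VALUE}} (i.e. "nothing is purgeable")
is an assumption about the semantics, not the actual patch.

{code}
// Sketch only: bypass the per-key bloom-filter scan over all overlapping sstables
// when the operator has explicitly asked never to purge tombstones.
public class CompactionControllerSketch
{
    private static final boolean NEVER_PURGE_TOMBSTONES =
        Boolean.getBoolean("cassandra.never_purge_tombstones");

    public long maxPurgeableTimestamp(Object key)
    {
        if (NEVER_PURGE_TOMBSTONES)
            return Long.MIN_VALUE; // no tombstone is older than this, so none are purged

        return computeFromOverlappingSSTables(key); // the expensive path in the stack trace above
    }

    private long computeFromOverlappingSSTables(Object key)
    {
        // placeholder for the real bloom-filter / min-timestamp walk
        return Long.MAX_VALUE;
    }
}
{code}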



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11686) dtest failure in replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc

2016-05-20 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch resolved CASSANDRA-11686.

Resolution: Fixed

> dtest failure in 
> replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc
> --
>
> Key: CASSANDRA-11686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11686
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> intermittent failure. this test also fails on windows but looks to be for 
> another reason (CASSANDRA-11439)
> http://cassci.datastax.com/job/cassandra-3.0_dtest/682/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_expand_gossiping_property_file_snitch_multi_dc/
> {noformat}
> Nodetool command '/home/automaton/cassandra/bin/nodetool -h localhost -p 7400 
> getendpoints testing rf_test dummy' failed; exit status: 1; stderr: nodetool: 
> Failed to connect to 'localhost:7400' - ConnectException: 'Connection 
> refused'.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11686) dtest failure in replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc

2016-05-20 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293627#comment-15293627
 ] 

Russ Hatch commented on CASSANDRA-11686:


Tests are set to be moved, resolving this.

> dtest failure in 
> replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc
> --
>
> Key: CASSANDRA-11686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11686
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> intermittent failure. this test also fails on windows but looks to be for 
> another reason (CASSANDRA-11439)
> http://cassci.datastax.com/job/cassandra-3.0_dtest/682/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_expand_gossiping_property_file_snitch_multi_dc/
> {noformat}
> Nodetool command '/home/automaton/cassandra/bin/nodetool -h localhost -p 7400 
> getendpoints testing rf_test dummy' failed; exit status: 1; stderr: nodetool: 
> Failed to connect to 'localhost:7400' - ConnectException: 'Connection 
> refused'.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11854) Remove finished streaming connections from MessagingService

2016-05-20 Thread Paulo Motta (JIRA)
Paulo Motta created CASSANDRA-11854:
---

 Summary: Remove finished streaming connections from 
MessagingService
 Key: CASSANDRA-11854
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11854
 Project: Cassandra
  Issue Type: Bug
Reporter: Paulo Motta
Assignee: Paulo Motta
 Attachments: oom.png

When a new {{IncomingStreamingConnection}} is created, [we register it in the 
connections 
map|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1109]
 of {{MessagingService}}, but we [only remove it if there is an 
exception|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/IncomingStreamingConnection.java#L83]
 while attaching the socket to the stream session.

On nodes with SSL and a large number of vnodes, after many repair sessions these 
old connections can accumulate and cause an OOM (heap dump attached).

The connection should be removed from the connections map after it's finished, so 
that it can be garbage collected.
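
A rough sketch of the fix direction described above, with hypothetical names: keep
the registration keyed by the remote address and make sure the entry is also dropped
on the normal completion path, not just on exceptions. This is an illustration, not
the MessagingService code itself.

{code}
import java.net.InetAddress;
import java.net.Socket;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: connection registry that removes entries on both error and normal completion.
public class StreamingConnectionRegistry
{
    private final ConcurrentMap<InetAddress, Socket> connections = new ConcurrentHashMap<>();

    public void register(InetAddress peer, Socket socket)
    {
        connections.put(peer, socket);
    }

    // Called from both the exception handler and the end of a successful session,
    // so finished connections no longer pin memory until an error happens.
    public void unregister(InetAddress peer)
    {
        Socket socket = connections.remove(peer);
        if (socket != null && !socket.isClosed())
        {
            try { socket.close(); } catch (java.io.IOException ignored) {}
        }
    }
}
{code}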



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11855) MessagingService#getCommandDroppedTasks should be displayed in netstats

2016-05-20 Thread Jeremiah Jordan (JIRA)
Jeremiah Jordan created CASSANDRA-11855:
---

 Summary: MessagingService#getCommandDroppedTasks should be 
displayed in netstats
 Key: CASSANDRA-11855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11855
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Jeremiah Jordan


MessagingService#getCommandDroppedTasks should be displayed in netstats along 
with the pending and completed.
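
For context, a sketch of what the requested netstats output could look like, given
the three per-endpoint maps the ticket refers to (pending, completed, dropped command
tasks). The map value types and getter names are assumptions based on the ticket
text, not a definitive rendering of the MBean API.

{code}
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: print pending/completed/dropped command tasks per endpoint, netstats-style.
// The three maps stand in for the per-endpoint task counters on MessagingServiceMBean.
public class NetstatsSketch
{
    public static void printCommandTasks(Map<String, ?> pending, Map<String, ?> completed, Map<String, ?> dropped)
    {
        Set<String> endpoints = new TreeSet<>(pending.keySet());
        endpoints.addAll(completed.keySet());
        endpoints.addAll(dropped.keySet());

        System.out.printf("%-25s%12s%12s%12s%n", "Endpoint", "Pending", "Completed", "Dropped");
        for (String ep : endpoints)
        {
            System.out.printf("%-25s%12s%12s%12s%n",
                              ep,
                              String.valueOf(pending.get(ep)),
                              String.valueOf(completed.get(ep)),
                              String.valueOf(dropped.get(ep)));
        }
    }
}
{code}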



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11038) Is node being restarted treated as node joining?

2016-05-20 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-11038:

Fix Version/s: 3.x
   3.0.x
   2.2.x
   Status: Patch Available  (was: Open)


Pushed branches with fixes for 2.2/3.0/3.7/trunk - though the fix merges 
forward cleanly except for conflicts where I've cleaned up imports. Basically, 
these preserve the existing behaviour of delivering both {{NEW_NODE}} and 
{{UP}} events when a node first joins the cluster & of delaying both until 
after the node becomes available for clients. The erroneous {{NEW_NODE}} when a 
known node is restarted has been removed. The tracking of pushed notifications 
in {{EventNotifier}} is still necessary at the moment (because 
[reasons|https://issues.apache.org/jira/browse/CASSANDRA-7816?focusedCommentId=14346387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14346387]),
 but they will go away with CASSANDRA-9156. See CASSANDRA-11731 for some 
related discussion.

dtest branch [here|https://github.com/beobal/cassandra-dtest/tree/11038]

||branch||testall||dtest||
|[11038-2.2|https://github.com/beobal/cassandra/tree/11038-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-dtest]|
|[11038-3.0|https://github.com/beobal/cassandra/tree/11038-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-dtest]|
|[11038-3.7|https://github.com/beobal/cassandra/tree/11038-3.7]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-dtest]|
|[11038-trunk|https://github.com/beobal/cassandra/tree/11038-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-dtest]|

(so far I've only kicked off CI for the 2.2 branch, just in case there's some 
problem I didn't run into locally, will kick off the other jobs when that 
finishes).
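
A simplified sketch of the behavioural change described here: suppress the
{{NEW_NODE}}/{{onJoin}} notification when the endpoint is already known (i.e. a
restart) and only treat genuinely unknown endpoints as joins. It borrows names from
the {{handleMajorStateChange}} snippet quoted in the issue description below and is
an illustration, not the actual patch.

{code}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration: distinguish "known node restarted" from "new node joined" before notifying.
public class MajorStateChangeSketch
{
    private final Map<InetAddress, Object> endpointStateMap = new ConcurrentHashMap<>();

    public void handleMajorStateChange(InetAddress ep, Object epState)
    {
        boolean isRestart = endpointStateMap.containsKey(ep);
        endpointStateMap.put(ep, epState);

        if (isRestart)
            notifyRestart(ep);   // no NEW_NODE event pushed to clients
        else
            notifyJoin(ep);      // NEW_NODE (and later UP) delivered once the node is client-facing
    }

    private void notifyRestart(InetAddress ep) { System.out.println("Node " + ep + " has restarted, now UP"); }
    private void notifyJoin(InetAddress ep)    { System.out.println("Node " + ep + " is now part of the cluster"); }
}
{code}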

> Is node being restarted treated as node joining?
> 
>
> Key: CASSANDRA-11038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: cheng ren
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> Hi, 
> What we found recently is that every time we restart a node, all other nodes 
> in the cluster treat the restarted node as a new node joining and issue node 
> joining notification to clients. We have traced the code path being hit when 
> a peer node detected a restarted node:
> src/java/org/apache/cassandra/gms/Gossiper.java
> {code}
> private void handleMajorStateChange(InetAddress ep, EndpointState epState)
> {
> if (!isDeadState(epState))
> {
> if (endpointStateMap.get(ep) != null)
> logger.info("Node {} has restarted, now UP", ep);
> else
> logger.info("Node {} is now part of the cluster", ep);
> }
> if (logger.isTraceEnabled())
> logger.trace("Adding endpoint state for " + ep);
> endpointStateMap.put(ep, epState);
> // the node restarted: it is up to the subscriber to take whatever 
> action is necessary
> for (IEndpointStateChangeSubscriber subscriber : subscribers)
> subscriber.onRestart(ep, epState);
> if (!isDeadState(epState))
> markAlive(ep, epState);
> else
> {
> logger.debug("Not marking " + ep + " alive due to dead state");
> markDead(ep, epState);
> }
> for (IEndpointStateChangeSubscriber subscriber : subscribers)
> subscriber.onJoin(ep, epState);
> }
> {code}
> subscriber.onJoin(ep, epState) ends up with calling onJoinCluster in 
> Server.java
> {code}
> src/java/org/apache/cassandra/transport/Server.java
> public void onJoinCluster(InetAddress endpoint)
> {
> server.connectionTracker.send(Event.TopologyChange.newNode(getRpcAddress(endpoint),
>  server.socket.getPort()));
> }
> {code}
> We have a full trace of the code path and skip some intermediate function calls 
> here for brevity. 
> Upon receiving the node-joining notification, clients go and scan the 
> system peers table to fetch the latest topology information. Since we have 
> tens of thousands of client connections, scans from all of them put an 
> enormous load on our cluster. 
> Although in 

[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-20 Thread Fabien Rousseau (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293709#comment-15293709
 ] 

Fabien Rousseau commented on CASSANDRA-11349:
-

Ok, it appears that the initial idea by [~blambov] is sufficient (after having 
done some basic testing for our 4th cluster).

Nevertheless, I'm surprised that we seem to be the only ones affected by this 
issue. Maybe it's because it took us some time to realize it and investigate 
it, and there was no clear sign apart from big streams during repairs plus the 
data set size increasing too fast. That may explain why not many people have 
reported it, but there may be others affected out in the wild. That's why it's 
probably best to try to fix most of it (if it's not possible to fix it entirely), 
but I also understand that the fewer changes there are, the less risky it is...

So I'm good with it either partially fixed or mostly fixed. 

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"doesn't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-20 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293719#comment-15293719
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


I've now created a new patch version 
[here|https://github.com/spodkowinski/cassandra/commit/c8601f8cd3921e754bcbe8c9362cf3d2e7072e1e]
  that basically combines both of your ideas of doing the digest updates in the 
serializer and using {{RangeTombstonesList}} to normalize RT intervals. Tests 
look good, feel free to add your own. [~blambov], can you think of any further 
cases that would not be covered by this approach? 
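
To make the "normalize before digesting" idea concrete, here is a generic sketch
(not the actual {{RangeTombstoneList}}-based patch, and deletion timestamps are
omitted for brevity): overlapping or duplicate deletion intervals are merged first,
so that two un-compacted tombstones and their compacted equivalent hash to the same
bytes.

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Generic illustration: digest deletion intervals only after merging overlaps,
// so the digest no longer depends on how far compaction has progressed.
public class NormalizedDigestSketch
{
    static final class Interval
    {
        final String start, end;
        Interval(String start, String end) { this.start = start; this.end = end; }
    }

    public static byte[] digest(List<Interval> tombstones) throws NoSuchAlgorithmException
    {
        List<Interval> sorted = new ArrayList<>(tombstones);
        sorted.sort(Comparator.comparing((Interval i) -> i.start));

        List<Interval> merged = new ArrayList<>();
        for (Interval i : sorted)
        {
            Interval last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && i.start.compareTo(last.end) <= 0)
                merged.set(merged.size() - 1,
                           new Interval(last.start, last.end.compareTo(i.end) >= 0 ? last.end : i.end));
            else
                merged.add(i);
        }

        MessageDigest md = MessageDigest.getInstance("MD5");
        for (Interval i : merged)
        {
            md.update(i.start.getBytes(StandardCharsets.UTF_8));
            md.update(i.end.getBytes(StandardCharsets.UTF_8));
        }
        return md.digest();
    }
}
{code}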

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"doesn't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there were some inconsistencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11038) Is node being restarted treated as node joining?

2016-05-20 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293680#comment-15293680
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-11038 at 5/20/16 5:08 PM:
--

Pushed branches with fixes for 2.2/3.0/3.7/trunk - though the fix merges 
forward cleanly except for conflicts where I've cleaned up imports. Basically, 
these preserve the existing behaviour of delivering both {{NEW_NODE}} and 
{{UP}} events when a node first joins the cluster & of delaying both until 
after the node becomes available for clients. The erroneous {{NEW_NODE}} when a 
known node is restarted has been removed. The tracking of pushed notifications 
in {{EventNotifier}} is still necessary at the moment (because 
[reasons|https://issues.apache.org/jira/browse/CASSANDRA-7816?focusedCommentId=14346387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14346387]),
 but they will go away with CASSANDRA-9156. See CASSANDRA-11731 for some 
related discussion.

dtest branch [here|https://github.com/beobal/cassandra-dtest/tree/11038]

||branch||testall||dtest||
|[11038-2.2|https://github.com/beobal/cassandra/tree/11038-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-dtest]|
|[11038-3.0|https://github.com/beobal/cassandra/tree/11038-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-dtest]|
|[11038-3.7|https://github.com/beobal/cassandra/tree/11038-3.7]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-dtest]|
|[11038-trunk|https://github.com/beobal/cassandra/tree/11038-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-dtest]|

(so far I've only kicked off CI for the 2.2 branch, just in case there's some 
problem I didn't run into locally, will kick off the other jobs when that 
finishes).

edit: pushed an additional commit to the 2.2 branch as I forgot to switch to 
java 7 during dev and accidentally included an 8ism.


was (Author: beobal):

Pushed branches with fixes for 2.2/3.0/3.7/trunk - though the fix merges 
forward cleanly except for conflicts where I've cleaned up imports. Basically, 
these preserve the existing behaviour of delivering both {{NEW_NODE}} and 
{{UP}} events when a node first joins the cluster & of delaying both until 
after the node becomes available for clients. The erroneous {{NEW_NODE}} when a 
known node is restarted has been removed. The tracking of pushed notifications 
in {{EventNotifier}} is still necessary at the moment (because 
[reasons|https://issues.apache.org/jira/browse/CASSANDRA-7816?focusedCommentId=14346387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14346387]),
 but they will go away with CASSANDRA-9156. See CASSANDRA-11731 for some 
related discussion.

dtest branch [here|https://github.com/beobal/cassandra-dtest/tree/11038]

||branch||testall||dtest||
|[11038-2.2|https://github.com/beobal/cassandra/tree/11038-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-dtest]|
|[11038-3.0|https://github.com/beobal/cassandra/tree/11038-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-dtest]|
|[11038-3.7|https://github.com/beobal/cassandra/tree/11038-3.7]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-dtest]|
|[11038-trunk|https://github.com/beobal/cassandra/tree/11038-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-dtest]|

(so far I've only kicked off CI for the 2.2 branch, just in case there's some 
problem I didn't run into locally, will kick off the other jobs when that 
finishes).

> Is node being restarted treated as node joining?
> 
>
> Key: CASSANDRA-11038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: cheng ren
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> Hi, 
> What

[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-20 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293734#comment-15293734
 ] 

Philip Thompson commented on CASSANDRA-11731:
-

Testing with Sam's branch here: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/111/

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293748#comment-15293748
 ] 

Alex Petrov commented on CASSANDRA-11114:
-

I've added a little [unit 
test|https://github.com/ifesdjeen/cassandra/tree/4-trunk] to brute-force 
through all primitive types. It's a little bit more lightweight since it 
doesn't go through the whole pipeline, but it verifies and yields the same 
exact results as the ones you listed.

Should I combine our patches [~giampaolo], WDYT?
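
For reference, a brute-force check along those lines might look roughly like the
following. It assumes {{CQL3Type.Native}} and {{AbstractType#isCompatibleWith}} /
{{#isValueCompatibleWith}} behave as on current trunk, so treat it as a sketch of
the approach rather than the actual attached test.

{code}
import org.apache.cassandra.cql3.CQL3Type;
import org.apache.cassandra.db.marshal.AbstractType;

// Sketch: enumerate which primitive-type conversions Cassandra considers
// compatible (ALTER-safe) or value-compatible, to feed the documentation.
public class TypeConversionMatrix
{
    public static void main(String[] args)
    {
        for (CQL3Type.Native from : CQL3Type.Native.values())
        {
            for (CQL3Type.Native to : CQL3Type.Native.values())
            {
                AbstractType<?> fromType = from.getType();
                AbstractType<?> toType = to.getType();
                boolean sortCompatible  = toType.isCompatibleWith(fromType);
                boolean valueCompatible = toType.isValueCompatibleWith(fromType);
                if (from != to && (sortCompatible || valueCompatible))
                    System.out.printf("%s -> %s (compatible=%s, valueCompatible=%s)%n",
                                      from, to, sortCompatible, valueCompatible);
            }
        }
    }
}
{code}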

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-11114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11114
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-4-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11850) cannot use cql since upgrading python to 2.7.11+

2016-05-20 Thread Adam Holmberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293737#comment-15293737
 ] 

Adam Holmberg commented on CASSANDRA-11850:
---

No issues with released versions of Python. This is a known change with dev 
versions:
https://github.com/datastax/python-driver/pull/585


> cannot use cql since upgrading python to 2.7.11+
> 
>
> Key: CASSANDRA-11850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11850
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Development
>Reporter: Andrew Madison
>  Labels: cqlsh
> Fix For: 3.5
>
>
> OS: Debian GNU/Linux stretch/sid 
> Kernel: 4.5.0-2-amd64 #1 SMP Debian 4.5.4-1 (2016-05-16) x86_64 GNU/Linux
> Python version: 2.7.11+ (default, May  9 2016, 15:54:33)
> [GCC 5.3.1 20160429]
> cqlsh --version: cqlsh 5.0.1
> cassandra -v: 3.5 (also occurs with 3.0.6)
> Issue:
> when running cqlsh, it returns the following error:
> cqlsh -u dbarpt_usr01
> Password: *
> Connection error: ('Unable to connect to any servers', {'odbasandbox1': 
> TypeError('ref() does not take keyword arguments',)})
> I cleared PYTHONPATH:
> python -c "import json; print dir(json); print json.__version__"
> ['JSONDecoder', 'JSONEncoder', '__all__', '__author__', '__builtins__', 
> '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', 
> '_default_decoder', '_default_encoder', 'decoder', 'dump', 'dumps', 
> 'encoder', 'load', 'loads', 'scanner']
> 2.0.9
> Java based clients can connect to Cassandra with no issue. Just CQLSH and 
> Python clients cannot.
> nodetool status also works.
> Thank you for your help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11831) Ability to disable purgeable tombstone check via startup flag

2016-05-20 Thread Wei Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293756#comment-15293756
 ] 

Wei Deng commented on CASSANDRA-11831:
--

I tried to reproduce this problem using an artificially created "excessive 
number of SSTables" environment. It is a one-node cluster on GCE with a modest 
amount of hardware resources (8GB RAM + 2 cores) running C* 2.1.14. To allow 
tiny SSTables to accumulate I used "nodetool disableautocompaction" to 
temporarily disable compaction while inserting my test data; from the same 
Python code doing the inserts I also perform a system call to "nodetool flush" 
after every insert, so that the number of sstables quickly grows to a level 
that's usually hard to reach on a tiny node. The table I'm inserting into has the 
following schema, so we know it will create some range tombstone markers for 
every insert of the collection cells.

{noformat}
CREATE TABLE amazon.metadata (
asin text PRIMARY KEY,
also_bought set,
buy_after_viewing set,
categories set,
imurl text,
price double,
title text
);
{noformat}

Once the number of SSTables grew to 7000+, I started JFR to capture a 
JVM recording and started the compaction by running "nodetool 
enableautocompaction". By analyzing the JFR recording I was able to find the 
getMaxPurgeableTimestamp() call among the hottest methods for the compaction 
thread.

What's interesting is that I also noticed another compaction eating a full CPU 
core even after the LCS compaction on the amazon.metadata table finished. It turns 
out system.sstable_activity is also going through a lot of compaction activity on 
its own, because it is supposed to be flushed every time a compaction finishes, 
so it too generated a lot of tiny SSTables (after enableautocompaction gets 
called). Even though system.sstable_activity uses STCS, it still suffered 
from the same expensive getMaxPurgeableTimestamp() call and took a 
while to finish. This seems to indicate the problem discussed in this JIRA is 
not LCS-only, but generally applicable to situations where you have thousands 
of tiny SSTables.

IMHO this exercise also suggests that we may want to consider adding a 
dtest to cover the scenario where we have thousands of tiny SSTables, whether 
LCS or STCS is used. This situation can also be caused by streaming activities 
like repair or bootstrapping.

> Ability to disable purgeable tombstone check via startup flag
> -
>
> Key: CASSANDRA-11831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11831
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Ryan Svihla
>Assignee: Marcus Eriksson
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> On Cassandra 2.1.14, when a node gets way behind and has tens of thousands of 
> sstables, it appears a lot of the CPU time is spent doing checks like this on 
> a call to getMaxPurgeableTimestamp 
>  org.apache.cassandra.utils.Murmur3BloomFilter.hash(java.nio.ByteBuffer, 
> int, int, long, long[]) @bci=13, line=57 (Compiled frame; information may be 
> imprecise)
> - org.apache.cassandra.utils.BloomFilter.indexes(java.nio.ByteBuffer) 
> @bci=22, line=82 (Compiled frame)
> - org.apache.cassandra.utils.BloomFilter.isPresent(java.nio.ByteBuffer) 
> @bci=2, line=107 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionController.maxPurgeableTimestamp(org.apache.cassandra.db.DecoratedKey)
>  @bci=89, line=186 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.getMaxPurgeableTimestamp()
>  @bci=21, line=99 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.access$300(org.apache.cassandra.db.compaction.LazilyCompactedRow)
>  @bci=1, line=49 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=241, line=296 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=1, line=206 (Compiled frame)
> - org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext() 
> @bci=44, line=206 (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=645 
> (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(java.util.Iterator)
>  @bci=1, line=166 (Compiled frame)
> - org.apache.cas

[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

2016-05-20 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293760#comment-15293760
 ] 

Branimir Lambov commented on CASSANDRA-9669:


+1 on Sam's patch

I will look into the test failures next week.

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> ---
>
> Key: CASSANDRA-9669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benedict
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: correctness
> Fix For: 2.2.7, 3.7, 3.0.7
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.
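
A small worked example of the failure mode, with illustrative numbers only: if the
commit log replay point is taken as the maximum replay position across sstables on
disk, a later-started but earlier-finishing flush can hide records belonging to an
unfinished flush; replaying from the earliest position not yet durably flushed is
the safe choice. This is an illustration of the reasoning, not the eventual fix.

{code}
// Illustration only. Flush A covers commit log positions [100, 200),
// flush B covers [200, 250). B finishes first and its sstable hits disk;
// the node crashes before A's sstable is written.
public class ReplayPointSketch
{
    public static void main(String[] args)
    {
        long maxFlushedPosition = 250;     // derived from sstables found on disk (B only)
        long lowestInFlightPosition = 100; // A never completed, so its data lives only in the log

        // Unsafe: skips [100, 200) and silently drops A's data.
        long unsafeReplayFrom = maxFlushedPosition;

        // Safe: replay from the earliest position not yet durably flushed.
        long safeReplayFrom = Math.min(maxFlushedPosition, lowestInFlightPosition);

        System.out.println("unsafe=" + unsafeReplayFrom + ", safe=" + safeReplayFrom);
    }
}
{code}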



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11856) dtest failure in upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.list_item_conditional_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11856:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.list_item_conditional_test
 Key: CASSANDRA-11856
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11856
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{}}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x/list_item_conditional_test

Failed on CassCI build upgrade_tests-all #46

On first glance, I believe this is a test issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11856) dtest failure in upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.list_item_conditional_test

2016-05-20 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293776#comment-15293776
 ] 

Philip Thompson commented on CASSANDRA-11856:
-

Scanning the test, I believe it's not being properly version gated, definitely 
not a C* issue.

> dtest failure in 
> upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.list_item_conditional_test
> -
>
> Key: CASSANDRA-11856
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11856
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: DS Test Eng
>  Labels: dtest
>
> {{}}
> example failure:
> http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x/list_item_conditional_test
> Failed on CassCI build upgrade_tests-all #46
> On first glance, I believe this is a test issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11831) Ability to disable purgeable tombstone check via startup flag

2016-05-20 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293784#comment-15293784
 ] 

Paulo Motta commented on CASSANDRA-11831:
-

LGTM, just two minor nits:
* On the unit tests, it seems you're setting 
{{cassandra.never_purge_tombstones}} globally via system properties, won't this 
propagate to other tests? Perhaps it's better to set the property directly 
during these tests on {{CompactionController}}? Maybe that's why some tests 
are failing on 
[trunk|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-trunk-testall/lastCompletedBuild/testReport/]
 ?
* To avoid doing useless tombstone compactions that won't drop tombstones, 
should we maybe also disable tombstone compactions on 
{{AbstractCompactionStrategy}} if {{cassandra.never_purge_tombstones=true}} ?

> Ability to disable purgeable tombstone check via startup flag
> -
>
> Key: CASSANDRA-11831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11831
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Ryan Svihla
>Assignee: Marcus Eriksson
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> On Cassandra 2.1.14, when a node gets way behind and has tens of thousands of 
> sstables, it appears a lot of the CPU time is spent doing checks like this on 
> a call to getMaxPurgeableTimestamp 
>  org.apache.cassandra.utils.Murmur3BloomFilter.hash(java.nio.ByteBuffer, 
> int, int, long, long[]) @bci=13, line=57 (Compiled frame; information may be 
> imprecise)
> - org.apache.cassandra.utils.BloomFilter.indexes(java.nio.ByteBuffer) 
> @bci=22, line=82 (Compiled frame)
> - org.apache.cassandra.utils.BloomFilter.isPresent(java.nio.ByteBuffer) 
> @bci=2, line=107 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionController.maxPurgeableTimestamp(org.apache.cassandra.db.DecoratedKey)
>  @bci=89, line=186 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.getMaxPurgeableTimestamp()
>  @bci=21, line=99 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.access$300(org.apache.cassandra.db.compaction.LazilyCompactedRow)
>  @bci=1, line=49 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=241, line=296 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=1, line=206 (Compiled frame)
> - org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext() 
> @bci=44, line=206 (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=645 
> (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(java.util.Iterator)
>  @bci=1, line=166 (Compiled frame)
> - org.apache.cassandra.db.compaction.LazilyCompactedRow.write(long, 
> org.apache.cassandra.io.util.DataOutputPlus) @bci=52, line=121 (Compiled 
> frame)
> - 
> org.apache.cassandra.io.sstable.SSTableWriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=18, line=193 (Compiled frame)
> - 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=13, line=127 (Compiled frame)
> - org.apache.cassandra.db.compaction.CompactionTask.runMayThrow() 
> @bci=666, line=197 (Compiled frame)
> - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=6, line=73 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=2, line=59 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
>  @bci=125, line=264 (Compiled frame)
> - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
> - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
> - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Compiled frame)
> - java.util.concur

[jira] [Created] (CASSANDRA-11857) dtest failure in upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.map_item_conditional_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11857:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.map_item_conditional_test
 Key: CASSANDRA-11857
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11857
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{}}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x/map_item_conditional_test

Failed on CassCI build upgrade_tests-all #46

Again, just a test issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11831) Ability to disable purgeable tombstone check via startup flag

2016-05-20 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293784#comment-15293784
 ] 

Paulo Motta edited comment on CASSANDRA-11831 at 5/20/16 5:33 PM:
--

LGTM, just two minor nits:
* On the unit tests, it seems you're setting 
{{cassandra.never_purge_tombstones}} globally via system properties, won't this 
propagate to other tests? Perhaps it's better to set the property directly 
during these tests on {{CompactionController}}? Maybe that's why some tests 
are failing on 
[trunk|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-trunk-testall/lastCompletedBuild/testReport/]
 ?
* To avoid doing useless tombstone compactions that won't drop tombstones, 
should we maybe also disable tombstone compactions on 
{{AbstractCompactionStrategy}} if {{cassandra.never_purge_tombstones=true}} ?

edit: this is just a review of the {{cassandra.never_purge_tombstones}} flag; we 
should probably move discussion of a longer-term solution to the problem of 
improving compaction throughput with many sstables to another ticket.


was (Author: pauloricardomg):
LGTM, just two minor nits:
* On the unit tests, it seems you're setting 
{{cassandra.never_purge_tombstones}} globally via system properties, won't this 
propagate to other tests? Perhaps it's better to set the property directly 
during these tests on {{CompactionController}}? Maybe that's why some tests 
are failing on 
[trunk|http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-11831-trunk-testall/lastCompletedBuild/testReport/]
 ?
* To avoid doing useless tombstone compactions that won't drop tombstones, 
should we maybe also disable tombstone compactions on 
{{AbstractCompactionStrategy}} if {{cassandra.never_purge_tombstones=true}} ?

> Ability to disable purgeable tombstone check via startup flag
> -
>
> Key: CASSANDRA-11831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11831
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Ryan Svihla
>Assignee: Marcus Eriksson
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> On Cassandra 2.1.14, when a node gets way behind and has tens of thousands of 
> sstables, it appears a lot of the CPU time is spent doing checks like this on 
> a call to getMaxPurgeableTimestamp 
>  org.apache.cassandra.utils.Murmur3BloomFilter.hash(java.nio.ByteBuffer, 
> int, int, long, long[]) @bci=13, line=57 (Compiled frame; information may be 
> imprecise)
> - org.apache.cassandra.utils.BloomFilter.indexes(java.nio.ByteBuffer) 
> @bci=22, line=82 (Compiled frame)
> - org.apache.cassandra.utils.BloomFilter.isPresent(java.nio.ByteBuffer) 
> @bci=2, line=107 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionController.maxPurgeableTimestamp(org.apache.cassandra.db.DecoratedKey)
>  @bci=89, line=186 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.getMaxPurgeableTimestamp()
>  @bci=21, line=99 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.access$300(org.apache.cassandra.db.compaction.LazilyCompactedRow)
>  @bci=1, line=49 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=241, line=296 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=1, line=206 (Compiled frame)
> - org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext() 
> @bci=44, line=206 (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=645 
> (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(java.util.Iterator)
>  @bci=1, line=166 (Compiled frame)
> - org.apache.cassandra.db.compaction.LazilyCompactedRow.write(long, 
> org.apache.cassandra.io.util.DataOutputPlus) @bci=52, line=121 (Compiled 
> frame)
> - 
> org.apache.cassandra.io.sstable.SSTableWriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=18, line=193 (Compiled frame)
> - 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=13, line=127 (Compiled frame)
> - org.apache.cassandra.db.compaction.CompactionTask.runMayThrow() 
> @bci=666, line=197 (Compiled frame)
> - 

[jira] [Created] (CASSANDRA-11858) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x.static_columns_cas_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11858:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x.static_columns_cas_test
 Key: CASSANDRA-11858
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11858
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{code=2200 [Invalid query] message="Duplicate and incompatible conditions for 
column v"}}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x/static_columns_cas_test

Failed on CassCI build upgrade_tests-all #46

Another problem with the tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11820) Altering a column's type causes EOF

2016-05-20 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293810#comment-15293810
 ] 

Carl Yeksigian commented on CASSANDRA-11820:


The EOF is because we are reading more data from the file than is there, since 
we interpret the data as part of the length (and start reading the next 
column). When I add another column, it will continue to read and not be able to 
interpret the next column's data (I get an ArrayIndexOutOfBoundsException).

After killing C*, it was in a bad state reading the mutation that was written 
to the commitlog, so I had to remove the data; if I had left it long enough to 
reclaim the commitlog, it could have restarted, even if we could not recover 
that data.
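
A toy illustration of the failure mode described above (this is not Cassandra's 
serialization code; it only uses plain JDK streams): bytes written as a 
fixed-width int and later re-read as a length-prefixed value make the reader 
consume the old data as a length and run off the end of the stream.

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class LengthMisreadDemo
{
    public static void main(String[] args) throws IOException
    {
        // "Old schema": the value 1 stored as a fixed-width 4-byte int.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(1);

        // "New schema": the same bytes re-read as a length-prefixed value.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        try
        {
            int length = in.readInt();      // consumes all four bytes as the "length"
            byte[] value = new byte[length];
            in.readFully(value);            // nothing left to read -> EOFException
        }
        catch (EOFException e)
        {
            System.out.println("EOF: read past the end of the value, as described above");
        }
    }
}
{code}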

> Altering a column's type causes EOF
> ---
>
> Key: CASSANDRA-11820
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11820
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Carl Yeksigian
> Fix For: 3.0.x, 3.x
>
>
> While working on CASSANDRA-10309, I was testing altering columns' types. This 
> series of operations fails:
> {code}
> CREATE TABLE test (a int PRIMARY KEY, b int)
> INSERT INTO test (a, b) VALUES (1, 1)
> ALTER TABLE test ALTER b TYPE BLOB
> SELECT * FROM test WHERE a = 1
> {code}
> Tried this on 3.0 and trunk, both fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11859) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x.in_order_by_without_selecting_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11859:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x.in_order_by_without_selecting_test
 Key: CASSANDRA-11859
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11859
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{code=2200 [Invalid query] message="ORDER BY can only be performed on columns 
in the select clause (got c1)"}}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x/in_order_by_without_selecting_test

Failed on CassCI build upgrade_tests-all #46

Most likely just a test issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11860) dtest failure in upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.invalid_custom_timestamp_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11860:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.invalid_custom_timestamp_test
 Key: CASSANDRA-11860
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11860
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{Expecting query to be invalid: got }}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x/invalid_custom_timestamp_test

Failed on CassCI build upgrade_tests-all #46



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11861) dtest failure in upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.map_keys_indexing_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11861:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.map_keys_indexing_test
 Key: CASSANDRA-11861
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11861
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{}}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x/map_keys_indexing_test

Failed on CassCI build upgrade_tests-all #46



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11862) dtest failure in upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.collection_indexing_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11862:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x.collection_indexing_test
 Key: CASSANDRA-11862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11862
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{code=2200 [Invalid query] message="Indexes on collections are no yet 
supported"}}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_0_x_To_indev_2_1_x/collection_indexing_test

Failed on CassCI build upgrade_tests-all #46

Test issue. There is a typo in that error message, but I bet it's from 2.0, so 
no one will care.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11863) dtest failure in upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x.static_columns_with_distinct_test

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11863:
---

 Summary: dtest failure in 
upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x.static_columns_with_distinct_test
 Key: CASSANDRA-11863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11863
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


{{Expected [[1, None]] from SELECT DISTINCT k, s FROM test, but got [[1, None], 
[1, None]]}}

example failure:

http://cassci.datastax.com/job/upgrade_tests-all/46/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_Upgrade_current_2_0_x_To_indev_2_1_x/static_columns_with_distinct_test

Failed on CassCI build upgrade_tests-all #46

This might actually be a bug, but it needs to be looked at first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11038) Is node being restarted treated as node joining?

2016-05-20 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-11038:

Reviewer: Joel Knighton

> Is node being restarted treated as node joining?
> 
>
> Key: CASSANDRA-11038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: cheng ren
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> Hi, 
> What we found recently is that every time we restart a node, all other nodes 
> in the cluster treat the restarted node as a new node joining and issue a 
> node-joining notification to clients. We have traced the code path that is 
> hit when a peer node detects a restarted node:
> src/java/org/apache/cassandra/gms/Gossiper.java
> {code}
> private void handleMajorStateChange(InetAddress ep, EndpointState epState)
> {
> if (!isDeadState(epState))
> {
> if (endpointStateMap.get(ep) != null)
> logger.info("Node {} has restarted, now UP", ep);
> else
> logger.info("Node {} is now part of the cluster", ep);
> }
> if (logger.isTraceEnabled())
> logger.trace("Adding endpoint state for " + ep);
> endpointStateMap.put(ep, epState);
> // the node restarted: it is up to the subscriber to take whatever 
> action is necessary
> for (IEndpointStateChangeSubscriber subscriber : subscribers)
> subscriber.onRestart(ep, epState);
> if (!isDeadState(epState))
> markAlive(ep, epState);
> else
> {
> logger.debug("Not marking " + ep + " alive due to dead state");
> markDead(ep, epState);
> }
> for (IEndpointStateChangeSubscriber subscriber : subscribers)
> subscriber.onJoin(ep, epState);
> }
> {code}
> subscriber.onJoin(ep, epState) ends up with calling onJoinCluster in 
> Server.java
> {code}
> src/java/org/apache/cassandra/transport/Server.java
> public void onJoinCluster(InetAddress endpoint)
> {
> server.connectionTracker.send(Event.TopologyChange.newNode(getRpcAddress(endpoint),
>  server.socket.getPort()));
> }
> {code}
> We have a full trace of the code path and skip some intermediate function 
> calls here for brevity. 
> Upon receiving the node-joining notification, clients go and scan the system 
> peers table to fetch the latest topology information. Since we have tens of 
> thousands of client connections, scans from all of them put an enormous load 
> on our cluster. 
> Although in newer versions of the driver the client skips fetching the peers 
> table if the node already exists in its local metadata, we are still curious 
> why a restarted node is handled as a joining node on the server side. Did we 
> hit a bug, or is this the way it is supposed to be? Our old java driver 
> version is 1.0.4 and our cassandra version is 2.0.12.
> Thanks!
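
For illustration, a hypothetical sketch of the client-side guard mentioned in 
the report (the class and method names are invented; this is not the actual 
java-driver code): only scan the peers table when the NEW_NODE event refers to 
a host the client does not already know about.

{code}
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TopologyEventHandler
{
    private final Map<InetSocketAddress, String> knownHosts = new ConcurrentHashMap<>();

    public void onNewNodeEvent(InetSocketAddress address)
    {
        if (knownHosts.containsKey(address))
            return;                          // restarted node we already know: skip the scan
        refreshFromPeersTable(address);      // genuinely new node: fetch topology info
    }

    private void refreshFromPeersTable(InetSocketAddress address)
    {
        // placeholder for a "SELECT * FROM system.peers" round trip and a metadata update
        knownHosts.put(address, "UP");
    }
}
{code}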



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption

2016-05-20 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-11750:

Reviewer: Paulo Motta

> Offline scrub should not abort when it hits corruption
> --
>
> Key: CASSANDRA-11750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Hattrell
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: Tools
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Hit a failure on startup due to corruption of some sstables in the system 
> keyspace.  Deleted the listed file and restarted - it came down again with 
> another file.
> Figured that I may as well run scrub to clean up all the files.  Got the 
> following error:
> {noformat}
> sstablescrub system compaction_history 
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, 
> disk failure policy "stop" 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>  
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> [na:1.7.0_79] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79] 
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] 
> Caused by: java.io.EOFException: null 
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
> ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> ... 14 common frames omitted 
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the 
> option to continue and let it do its thing.  I'd prefer that sstablescrub 
> ignored the disk failure policy.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted

2016-05-20 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293873#comment-15293873
 ] 

Joel Knighton commented on CASSANDRA-11742:
---

[~tommy_s] Any reason you're opposed to switching to the approach I described 
above? It would guarantee that a crash during start up wouldn't leave the 
system keyspace in a failing health check state, as opposed to only shrinking 
the window. If that's fine with you, I can switch myself to assignee and push 
that branch for CI.

> Failed bootstrap results in exception when node is restarted
> 
>
> Key: CASSANDRA-11742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: 11742-2.txt, 11742.txt
>
>
> Since 2.2 a failed bootstrap results in an 
> {{org.apache.cassandra.exceptions.ConfigurationException: Found system 
> keyspace files, but they couldn't be loaded!}} exception when the node is 
> restarted. This did not happen in 2.1; it just tried to bootstrap again. I 
> know that the workaround is relatively easy - just delete the system keyspace 
> in the data folder on disk and try again - but it's a bit annoying that you 
> have to do that.
> The problem seems to be that the creation of the {{system.local}} table has 
> been moved to just before the bootstrap begins (in 2.1 it was done much 
> earlier), and as a result it is still in the memtable and commitlog if the 
> bootstrap fails. Still, a few values are inserted into the {{system.local}} 
> table at an earlier point in the startup, and those have been flushed from 
> the memtable to an sstable. When the node is restarted, 
> {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed 
> and therefore only sees the sstable with an incomplete {{system.local}} table 
> and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace 
> in the {{StorageServiceShutdownHook}}; I have included a patch that does 
> this. 
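
A rough sketch of the proposed force flush (illustrative only: the attached 
patch may differ, and the wrapping class here is hypothetical):

{code}
import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.db.Keyspace;

public final class SystemKeyspaceFlush
{
    private SystemKeyspaceFlush() {}

    // Intended to run from the shutdown hook: push the partially written system.local
    // data from the memtable into an sstable, so that checkHealth() sees a complete
    // table on the next start even though the commitlog has not been replayed yet.
    public static void flushSystemKeyspace()
    {
        for (ColumnFamilyStore cfs : Keyspace.open("system").getColumnFamilyStores())
            cfs.forceBlockingFlush();
    }
}
{code}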



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment

2016-05-20 Thread Rajesh Babu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293886#comment-15293886
 ] 

Rajesh Babu commented on CASSANDRA-11847:
-

Thanks, Jeff, for your input. I'll ask my customer to replace their hardware.

Rajesh

> Cassandra dies on a specific node in a multi-DC environment
> ---
>
> Key: CASSANDRA-11847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11847
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Core
> Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15
>Reporter: Rajesh Babu
> Attachments: java_error19030.log, java_error2912.log, 
> java_error4571.log, java_error7539.log, java_error9552.log
>
>
> We have a customer who runs a 16-node, 2-DC (8 nodes each) environment where 
> the Cassandra pid dies randomly, but always on one specific node.
> Whenever Cassandra dies, the admin has to manually restart Cassandra, but 
> only on that node.
> I tried upgrading their environment from java 1.7 (patch 60) to java 1.7 
> (patch 79) but it still seems to be an issue. 
> Is this a known hardware-related bug, or is this issue fixed in later 
> Cassandra versions? 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libjava.so+0xe027f]  _fini+0xbd5f7
> #
> # Core dump written. Default location: /tmp/core or core.19030
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f453c89f000):  JavaThread "COMMIT-LOG-WRITER" 
> [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f4542d5a27f
> Registers:
> RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, 
> RDX=0x0020
> RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, 
> RDI=0x0001
> R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, 
> R11=0x0006fae57068
> R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, 
> R15=0x7f453c89f000
> RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0014
>   TRAPNO=0x000e
> -
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  0x7f28e08787a4
> #
> # Core dump written. Default location: /tmp/core or core.2912
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f2640008000):  JavaThread "ValidationExecutor:15" 
> daemon [_thread_in_Java, id=7393, 
> stack(0x7f256fdf8000,0x7f256fe39000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f28e08787a4
> Registers:
> RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, 
> RDX=0xc770409a
> RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, 
> RDI=0x
> R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, 
> R11=0x3f8bb878
> R12=0x, R13=0x00063b820530, R14=0x000b, 
> R15=0x7f2640008000
> RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0015
>   TRAPNO=0x000e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11864) dtest failure in thrift_tests.TestMutations.test_dynamic_indexes_with_system_update_cf

2016-05-20 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-11864:
---

 Summary: dtest failure in 
thrift_tests.TestMutations.test_dynamic_indexes_with_system_update_cf
 Key: CASSANDRA-11864
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11864
 Project: Cassandra
  Issue Type: Test
Reporter: Philip Thompson
Assignee: DS Test Eng


This has started failing pretty consistently on 2.2+ on CI. I am not 
reproducing it locally though, so it is flaky. The failures are all related to 
failed queries: either they don't return data when they're expected to, or 
they time out.

example failure:

http://cassci.datastax.com/job/cassandra-3.0_dtest/710/testReport/thrift_tests/TestMutations/test_dynamic_indexes_with_system_update_cf

Failed on CassCI build cassandra-3.0_dtest #710

I expect this may be a bug, but someone should go in and bisect, or get 
slightly more useful debugging data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11272) NullPointerException (NPE) during bootstrap startup in StorageService.java

2016-05-20 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293923#comment-15293923
 ] 

Joel Knighton commented on CASSANDRA-11272:
---

+1. Note to committer: patch is at the end of 
[ifesdjeen/11272-2.2|https://github.com/ifesdjeen/cassandra/tree/11272-2.2]. It 
should apply cleanly on 2.2 and merge cleanly into 3.0/3.7/trunk.

Although this just fixes the logging NPE from some other issue, we should close 
this after commit. Once this is committed, logging output will be fixed, and 
there will be nothing to lead a new reporter here.

> NullPointerException (NPE) during bootstrap startup in StorageService.java
> --
>
> Key: CASSANDRA-11272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11272
> Project: Cassandra
>  Issue Type: Bug
>  Components: Lifecycle
> Environment: debian jesse up to date
>Reporter: Jason Kania
>Assignee: Alex Petrov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> After bootstrapping fails due to stream closed error, the following error 
> results:
> {code}
> Feb 27, 2016 8:06:38 PM com.google.common.util.concurrent.ExecutionList 
> executeListener
> SEVERE: RuntimeException while executing runnable 
> com.google.common.util.concurrent.Futures$6@3d61813b with executor INSTANCE
> java.lang.NullPointerException
> at 
> org.apache.cassandra.service.StorageService$2.onFailure(StorageService.java:1284)
> at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
> at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
> at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
> at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
> at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:525)
> at 
> org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:645)
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:70)
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39)
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59)
> at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11272) NullPointerException (NPE) during bootstrap startup in StorageService.java

2016-05-20 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-11272:
--
Status: Ready to Commit  (was: Patch Available)

> NullPointerException (NPE) during bootstrap startup in StorageService.java
> --
>
> Key: CASSANDRA-11272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11272
> Project: Cassandra
>  Issue Type: Bug
>  Components: Lifecycle
> Environment: debian jesse up to date
>Reporter: Jason Kania
>Assignee: Alex Petrov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> After bootstrapping fails due to stream closed error, the following error 
> results:
> {code}
> Feb 27, 2016 8:06:38 PM com.google.common.util.concurrent.ExecutionList 
> executeListener
> SEVERE: RuntimeException while executing runnable 
> com.google.common.util.concurrent.Futures$6@3d61813b with executor INSTANCE
> java.lang.NullPointerException
> at 
> org.apache.cassandra.service.StorageService$2.onFailure(StorageService.java:1284)
> at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
> at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
> at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
> at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
> at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:525)
> at 
> org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:645)
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:70)
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39)
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59)
> at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293927#comment-15293927
 ] 

Alex Petrov edited comment on CASSANDRA-4 at 5/20/16 6:30 PM:
--

I've also changed the wording a bit:

bq.  Other columns are free from those restrictions (no validation of existing 
data is performed), but it is usually a bad idea to change the type to a 
non-compatible one, unless no data have been inserted for that column yet, as 
this could confuse CQL drivers/tools.

is now:

bq. To change the type of a column, the column must already exist in type 
definition and its type should be compatible with the new type. The 
compatibility table is available below.

Previously it sounded as if the {{ALTER TABLE}} statement would actually 
succeed for those types. Since it is only going to succeed when the types are 
compatible (or the same), a reference to the compatibility table has been 
added.

I've triggered a CI just to make sure that the newly written tests are working:

|[trunk|https://github.com/ifesdjeen/cassandra/commits/4-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-4-trunk-testall/]|


was (Author: ifesdjeen):
I've also changed the wording a bit:

.bq  Other columns are free from those restrictions (no validation of existing 
data is performed), but it is usually a bad idea to change the type to a 
non-compatible one, unless no data have been inserted for that column yet, as 
this could confuse CQL drivers/tools.

is now:

.bq To change the type of a column, the column must already exist in type 
definition and its type should be compatible with the new type. The 
compatibility table is available below.

As previously it sounded as if the {{ALTER TABLE}} statement would actually 
succeed for those types. Although since it's going to succeed only in case when 
types are compatible (or same), the reference to the compatibility table is 
added.

I've triggered a CI just to make sure that the newly written tests are working:

|[trunk|https://github.com/ifesdjeen/cassandra/commits/4-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-4-trunk-testall/]|

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-4
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-4-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293927#comment-15293927
 ] 

Alex Petrov commented on CASSANDRA-4:
-

I've also changed the wording a bit:

.bq  Other columns are free from those restrictions (no validation of existing 
data is performed), but it is usually a bad idea to change the type to a 
non-compatible one, unless no data have been inserted for that column yet, as 
this could confuse CQL drivers/tools.

is now:

.bq To change the type of a column, the column must already exist in type 
definition and its type should be compatible with the new type. The 
compatibility table is available below.

Previously it sounded as if the {{ALTER TABLE}} statement would actually 
succeed for those types. Since it is only going to succeed when the types are 
compatible (or the same), a reference to the compatibility table has been 
added.

I've triggered a CI just to make sure that the newly written tests are working:

|[trunk|https://github.com/ifesdjeen/cassandra/commits/4-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-4-trunk-testall/]|

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-4
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-4-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7190) Add schema to snapshot manifest

2016-05-20 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293929#comment-15293929
 ] 

Robert Stupp commented on CASSANDRA-7190:
-

Could this be done as a {{CREATE OR REPLACE TABLE ...}} statement? So it would 
then effectively replace an existing table definition (or just create the 
table).
But thinking further (considering UDTs, maybe UDFs in the future for functional 
indexes/MVs using functions), it would require something like a {{CREATE OR 
REPLACE KEYSPACE}} functionality. But that feels dirty and essentially complex.
I *think* there was an idea for a DDL statements batch - not sure whether it 
was a ticket, some discussion or just in my thoughts. Anyway, such a {{BEGIN 
DDL BATCH}} could contain a bunch of {{CREATE}}/{{ALTER}}/{{DROP}} statements 
and would be applied as a single schema mutation. Not sure, if it's worth the 
effort - just an idea.

Another thing that just came to mind is: would it be a good idea to also 
include the {{cfId}}?

> Add schema to snapshot manifest
> ---
>
> Key: CASSANDRA-7190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
>
> followup from CASSANDRA-6326



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293927#comment-15293927
 ] 

Alex Petrov edited comment on CASSANDRA-4 at 5/20/16 6:31 PM:
--

I've also changed the wording a bit:

bq.  Other columns are free from those restrictions (no validation of existing 
data is performed), but it is usually a bad idea to change the type to a 
non-compatible one, unless no data have been inserted for that column yet, as 
this could confuse CQL drivers/tools.

is now:

bq. To change the type of any other column, the column must already exist in 
type definition and its type should be compatible with the new type. The 
compatibility table is available below.

Previously it sounded as if the {{ALTER TABLE}} statement would actually 
succeed for those types. Since it is only going to succeed when the types are 
compatible (or the same), a reference to the compatibility table has been 
added.

I've triggered a CI just to make sure that the newly written tests are working:

|[trunk|https://github.com/ifesdjeen/cassandra/commits/4-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-4-trunk-testall/]|


was (Author: ifesdjeen):
I've also changed the wording a bit:

bq.  Other columns are free from those restrictions (no validation of existing 
data is performed), but it is usually a bad idea to change the type to a 
non-compatible one, unless no data have been inserted for that column yet, as 
this could confuse CQL drivers/tools.

is now:

bq. To change the type of a column, the column must already exist in type 
definition and its type should be compatible with the new type. The 
compatibility table is available below.

As previously it sounded as if the {{ALTER TABLE}} statement would actually 
succeed for those types. Although since it's going to succeed only in case when 
types are compatible (or same), the reference to the compatibility table is 
added.

I've triggered a CI just to make sure that the newly written tests are working:

|[trunk|https://github.com/ifesdjeen/cassandra/commits/4-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-4-trunk-testall/]|

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-4
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-4-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11831) Ability to disable purgeable tombstone check via startup flag

2016-05-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293993#comment-15293993
 ] 

Marcus Eriksson commented on CASSANDRA-11831:
-

bq. this exercise also prompts us that we may want to consider adding some 
dtest to cover the scenario when we have thousands of tiny SSTables
yeah, [~tjake] is working on this in CASSANDRA-11844

[~pauloricardomg] I think it should be ok with the tests since we have 
{{}} - we fork a new jvm for 
every test, but I'll investigate it

And yes, I'll disable the tombstone compactions, and log a big warning for 
every compaction where {{never_purge_tombstones}} is enabled.

> Ability to disable purgeable tombstone check via startup flag
> -
>
> Key: CASSANDRA-11831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11831
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Ryan Svihla
>Assignee: Marcus Eriksson
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> On Cassandra 2.1.14, when a node gets way behind and has tens of thousands of 
> sstables, it appears a lot of the CPU time is spent doing checks like this on 
> a call to getMaxPurgeableTimestamp 
>  org.apache.cassandra.utils.Murmur3BloomFilter.hash(java.nio.ByteBuffer, 
> int, int, long, long[]) @bci=13, line=57 (Compiled frame; information may be 
> imprecise)
> - org.apache.cassandra.utils.BloomFilter.indexes(java.nio.ByteBuffer) 
> @bci=22, line=82 (Compiled frame)
> - org.apache.cassandra.utils.BloomFilter.isPresent(java.nio.ByteBuffer) 
> @bci=2, line=107 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionController.maxPurgeableTimestamp(org.apache.cassandra.db.DecoratedKey)
>  @bci=89, line=186 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.getMaxPurgeableTimestamp()
>  @bci=21, line=99 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.access$300(org.apache.cassandra.db.compaction.LazilyCompactedRow)
>  @bci=1, line=49 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=241, line=296 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.LazilyCompactedRow$Reducer.getReduced() 
> @bci=1, line=206 (Compiled frame)
> - org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext() 
> @bci=44, line=206 (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=645 
> (Compiled frame)
> - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, 
> line=143 (Compiled frame)
> - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 
> (Compiled frame)
> - 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(java.util.Iterator)
>  @bci=1, line=166 (Compiled frame)
> - org.apache.cassandra.db.compaction.LazilyCompactedRow.write(long, 
> org.apache.cassandra.io.util.DataOutputPlus) @bci=52, line=121 (Compiled 
> frame)
> - 
> org.apache.cassandra.io.sstable.SSTableWriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=18, line=193 (Compiled frame)
> - 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(org.apache.cassandra.db.compaction.AbstractCompactedRow)
>  @bci=13, line=127 (Compiled frame)
> - org.apache.cassandra.db.compaction.CompactionTask.runMayThrow() 
> @bci=666, line=197 (Compiled frame)
> - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=6, line=73 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=2, line=59 (Compiled frame)
> - 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
>  @bci=125, line=264 (Compiled frame)
> - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
> - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
> - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Compiled frame)
> - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
> If we could at least on startup pass a fl

[jira] [Commented] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Giampaolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294019#comment-15294019
 ] 

Giampaolo commented on CASSANDRA-4:
---

Yes, please. I didn't think about a unit test to enforce docs. It's a good idea.

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-4
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-4-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Giampaolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294023#comment-15294023
 ] 

Giampaolo commented on CASSANDRA-4:
---

This sentence was taken directly from current documentation. I just removed the 
sentence about limits on clustering column since it's not true. I think yours 
is better, but what about removing the part on existing data?

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-4
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-4-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11844) Create compaction-stress

2016-05-20 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294107#comment-15294107
 ] 

T Jake Luciani edited comment on CASSANDRA-11844 at 5/20/16 8:16 PM:
-

pushed https://github.com/tjake/cassandra/tree/compaction-stress

[testall|http://cassci.datastax.com/job/tjake-compaction-stress-testall/]
[dtest|http://cassci.datastax.com/job/tjake-compaction-stress-dtest/]

Example use (see help for all options):
{code}
#write 5g of sstables using 4 writers
./tools/bin/compaction-stress write -d /tmp/compaction -g 5 -p 
https://gist.githubusercontent.com/tjake/8995058fed11d9921e31/raw/a9334d1090017bf546d003e271747351a40692ea/blogpost.yaml
 -t 4

#Compact the data using 4 compactors
./bin/compaction-stress compact -d /tmp/compaction -p 
https://gist.githubusercontent.com/tjake/8995058fed11d9921e31/raw/a9334d1090017bf546d003e271747351a40692ea/blogpost.yaml
 -t 4
{code}

The output of the compact command, besides stdout, is the compaction.log from 
CASSANDRA-10805.  I think we should extend the compaction log to include more 
information like row/partition data.

/cc for input [~krummas] [~pauloricardomg]




was (Author: tjake):
pushed https://github.com/tjake/cassandra/tree/compaction-stress

[testall|http://cassci.datastax.com/job/tjake-compaction-stress-testall/]
[dtest|http://cassci.datastax.com/job/tjake-compaction-stress-dtest/]

Example use (see help for all options):
{quote}
#write 5g of sstables using 4 writers
./tools/bin/compaction-stress write -d /tmp/compaction -g 5 -p 
https://gist.githubusercontent.com/tjake/8995058fed11d9921e31/raw/a9334d1090017bf546d003e271747351a40692ea/blogpost.yaml
 -t 4

#Compact the data using 4 compactors
./bin/compaction-stress compact -d /tmp/compaction -p 
https://gist.githubusercontent.com/tjake/8995058fed11d9921e31/raw/a9334d1090017bf546d003e271747351a40692ea/blogpost.yaml
 -t 4
{quote}

The output of the compact command, besides stdout, is the compaction.log from 
CASSANDRA-10805.  I think we should extend the compaction log to include more 
information like row/partition data.

/cc for input [~krummas] [~pauloricardomg]



> Create compaction-stress
> 
>
> Key: CASSANDRA-11844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11844
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Compaction
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 3.x
>
>
> A tool like cassandra-stress that works with stress yaml but:
>   * writes directly to a specified dir using CQLSSTableWriter. 
>   * lets you run just compaction on that directory and generates a report on 
> compaction throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11844) Create compaction-stress

2016-05-20 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-11844:
---
 Reviewer: Carl Yeksigian
Fix Version/s: 3.x
   Status: Patch Available  (was: Open)

pushed https://github.com/tjake/cassandra/tree/compaction-stress

[testall|http://cassci.datastax.com/job/tjake-compaction-stress-testall/]
[dtest|http://cassci.datastax.com/job/tjake-compaction-stress-dtest/]

Example use (see help for all options):
{quote}
#write 5g of sstables using 4 writers
./tools/bin/compaction-stress write -d /tmp/compaction -g 5 -p 
https://gist.githubusercontent.com/tjake/8995058fed11d9921e31/raw/a9334d1090017bf546d003e271747351a40692ea/blogpost.yaml
 -t 4

#Compact the data using 4 compactors
./bin/compaction-stress compact -d /tmp/compaction -p 
https://gist.githubusercontent.com/tjake/8995058fed11d9921e31/raw/a9334d1090017bf546d003e271747351a40692ea/blogpost.yaml
 -t 4
{quote}

The output of the compact command, besides stdout, is the compaction.log from 
CASSANDRA-10805.  I think we should extend the compaction log to include more 
information like row/partition data.

/cc for input [~krummas] [~pauloricardomg]



> Create compaction-stress
> 
>
> Key: CASSANDRA-11844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11844
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Compaction
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
> Fix For: 3.x
>
>
> A tool like cassandra-stress that works with stress yaml but:
>   * writes directly to a specified dir using CQLSSTableWriter. 
>   * lets you run just compaction on that directory and generates a report on 
> compaction throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294120#comment-15294120
 ] 

Alex Petrov commented on CASSANDRA-4:
-

Yes, I know it was there. I just tried to reword it (I actually took the other 
wording from the DataStax documentation on the {{ALTER}} statement) to reflect 
the current situation.

You're right about the validation of existing data part. I've changed it to:

bq. To change the type of any other column, the column must already exist in 
type definition and its type should be compatible with the new type. No 
validation of existing data is performed. The compatibility table is available 
below.

|[trunk|https://github.com/ifesdjeen/cassandra/commits/4-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-4-trunk-testall/]|

CI has passed the added tests. So if you +1, we can move it to ready to commit 
state.

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-4
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-4-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11799) dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error

2016-05-20 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294133#comment-15294133
 ] 

Michael Shuler commented on CASSANDRA-11799:


We finally figured out how to reproduce this. The problem is exposed when we 
use {{--with-xunit}} and pipe the output through {{| tee -a logfile}}.

All looks fine locally:
{noformat}
$ nosetests --with-xunit -vs 
cqlsh_tests/cqlsh_tests.py:TestCqlsh.test_unicode_invalid_request_error 
cqlsh_tests/cqlsh_tests.py:TestCqlsh.test_unicode_syntax_error
test_unicode_invalid_request_error (cqlsh_tests.cqlsh_tests.TestCqlsh) ... (EE) 
 :2:InvalidRequest: code=2200 [Invalid query] message=""ä" is not a 
valid keyspace name"(EE)  ok
test_unicode_syntax_error (cqlsh_tests.cqlsh_tests.TestCqlsh) ... (EE)  
:2:Invalid syntax at char 1(EE)  :2:  ä;(EE)  :2:  ^(EE)  
ok

--
XML: /home/mshuler/git/cassandra-dtest/nosetests.xml
--
Ran 2 tests in 24.650s

OK
{noformat}

Pipe to tee and we repro the problem in CI:
{noformat}
$ nosetests --with-xunit -vs 
cqlsh_tests/cqlsh_tests.py:TestCqlsh.test_unicode_invalid_request_error 
cqlsh_tests/cqlsh_tests.py:TestCqlsh.test_unicode_syntax_error | tee -a logfile
test_unicode_invalid_request_error (cqlsh_tests.cqlsh_tests.TestCqlsh) ... (EE) 
 ERROR
test_unicode_syntax_error (cqlsh_tests.cqlsh_tests.TestCqlsh) ... (EE)  
:2:Invalid syntax at char 1(EE)  ERROR

==
ERROR: test_unicode_invalid_request_error (cqlsh_tests.cqlsh_tests.TestCqlsh)
--
Traceback (most recent call last):
  File "/home/mshuler/git/cassandra-dtest/cqlsh_tests/cqlsh_tests.py", line 
476, in test_unicode_invalid_request_error
output, err = node1.run_cqlsh(cmds=cmd, return_output=True)
  File "/home/mshuler/git/ccm/ccmlib/node.py", line 816, in run_cqlsh
print_("(EE) ", err, end='')
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 
62: ordinal not in range(128)
 >> begin captured logging << 
dtest: DEBUG: cluster ccm directory: /tmp/dtest-Wvh3PT
dtest: DEBUG: Custom init_config not found. Setting defaults.
dtest: DEBUG: Done setting configuration options:
{   'initial_token': None,
'num_tokens': '256',
'phi_convict_threshold': 5,
'range_request_timeout_in_ms': 1,
'read_request_timeout_in_ms': 1,
'request_timeout_in_ms': 1,
'truncate_request_timeout_in_ms': 1,
'write_request_timeout_in_ms': 1}
- >> end captured logging << -

==
ERROR: test_unicode_syntax_error (cqlsh_tests.cqlsh_tests.TestCqlsh)
--
Traceback (most recent call last):
  File "/home/mshuler/git/cassandra-dtest/cqlsh_tests/cqlsh_tests.py", line 
456, in test_unicode_syntax_error
output, err = node1.run_cqlsh(cmds=u"ä;".encode('utf8'), return_output=True)
  File "/home/mshuler/git/ccm/ccmlib/node.py", line 816, in run_cqlsh
print_("(EE) ", err, end='')
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
  File "/usr/lib/python2.7/dist-packages/nose/plugins/xunit.py", line 126, in 
write
s.write(data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 
12: ordinal not in range(128)
 >> begin captured logging << 
dtest: DEBUG: cluster ccm directory: /tmp/dtest-lh6EEU
dtest: DEBUG: Custom init_config not found. Setting defaults.
dtest: DEBUG: Done setting configuration options:
{   'initial_token': None,
'num_tokens': '256',
'phi_convict_threshold': 5,
'range_request_timeout_in_ms': 1,
'read_request_timeout_in_ms': 1,
'request_timeout_in_ms': 1,
'truncate_request_timeout_in_ms': 1,
'write_request_timeout_in_ms': 1}
- >> end captured logging << -

--
XML:

[jira] [Commented] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Giampaolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294137#comment-15294137
 ] 

Giampaolo commented on CASSANDRA-11114:
---

Yes, thanks for your review. :)

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-11114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11114
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-11114-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11114) Document which type conversions are allowed

2016-05-20 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-11114:

Status: Ready to Commit  (was: Patch Available)

> Document which type conversions are allowed
> ---
>
> Key: CASSANDRA-11114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11114
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Documentation and Website
>Reporter: Sylvain Lebresne
>Assignee: Giampaolo
>Priority: Minor
>  Labels: doc-impacting, lhf
> Attachments: cassandra-11114-trunk.patch
>
>
> We allow only some type conversion through {{ALTER TABLE}} and type casts, 
> the ones that don't break stuff, but we don't currently document which ones 
> those are. We should add it to 
> http://cassandra.apache.org/doc/cql3/CQL-3.0.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[cassandra] Git Push Summary

2016-05-20 Thread jake
Repository: cassandra
Updated Tags:  refs/tags/3.6-tentative [deleted] c17cbe187


[cassandra] Git Push Summary

2016-05-20 Thread jake
Repository: cassandra
Updated Tags:  refs/tags/3.6-tentative [created] dee84ccc5


[jira] [Commented] (CASSANDRA-11824) If repair fails no way to run repair again

2016-05-20 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294182#comment-15294182
 ] 

Paulo Motta commented on CASSANDRA-11824:
-

Overall +1, just some minor nits:
* I think always registering with the FD/Gossiper at {{ActiveRepairService}} 
construction, instead of registering only when the first session is submitted 
and keeping the {{registeredForEndpointChanges}} variable, will make the code 
simpler. The penalty will be negligible when no repair is running, and it 
keeps less state in {{ActiveRepairService}}.
* To make the dtest more deterministic, can you 
{{watch_log_for("Requesting merkle trees for")}} instead of sleeping 
{{3 seconds}}? We could maybe also check for 
{{"Removing .* in parent repair sessions"}} in the log of the participant 
nodes, to make sure the FD is killing the repair session (see the sketch 
below). On my box, for example, the coordinator was killed before any message 
was sent to the participants.
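
As a sketch of what that dtest change could look like (hypothetical helper, 
not the actual test code; it assumes ccm's {{Node.mark_log()}} / 
{{Node.watch_log_for()}} API, and {{coordinator}} / {{participants}} are 
placeholder names for the test's nodes, with arbitrary timeouts):
{noformat}
def wait_for_repair_to_be_killed(coordinator, participants, timeout=60):
    # Hypothetical dtest helper, not part of the patch under review.
    mark = coordinator.mark_log()
    # Block until validation has actually been requested, instead of
    # sleeping a fixed 3 seconds.
    coordinator.watch_log_for("Requesting merkle trees for",
                              from_mark=mark, timeout=timeout)
    # Then confirm the failure detector tore the session down on every
    # participant.
    for node in participants:
        node.watch_log_for("Removing .* in parent repair sessions",
                           timeout=timeout)
{noformat}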

> If repair fails no way to run repair again
> --
>
> Key: CASSANDRA-11824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11824
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
>  Labels: fallout
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> I have a test that disables gossip and runs repair at the same time. 
> {quote}
> WARN  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> StorageService.java:384 - Stopping gossip by operator request
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> Gossiper.java:1463 - Announcing shutdown
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,776 
> StorageService.java:1999 - Node /172.31.31.1 state jump to shutdown
> INFO  [HANDSHAKE-/172.31.17.32] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32
> INFO  [HANDSHAKE-/172.31.24.76] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.24.76
> INFO  [Thread-25] 2016-05-17 16:57:21,925 RepairRunnable.java:125 - Starting 
> repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-26] 2016-05-17 16:57:21,953 RepairRunnable.java:125 - Starting 
> repair command #2, repairing keyspace stresscql with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-27] 2016-05-17 16:57:21,967 RepairRunnable.java:125 - Starting 
> repair command #3, repairing keyspace system_traces with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2)
> {quote}
> This ends up failing:
> {quote}
> 16:54:44.844 INFO  serverGroup-node-1-574 - STDOUT: [2016-05-17 16:57:21,933] 
> Starting repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> [2016-05-17 16:57:21,943] Did not get positive replies from all endpoints. 
> List of failed endpoint(s): [172.31.24.76, 172.31.17.32]
> [2016-05-17 16:57:21,945] null
> {quote}
> Subsequent calls to repair with all nodes up still fails:
> {quote}
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 
> CompactionManager.java:1193 - Cannot start multiple repair sessions over the 
> same sstables
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 Validator.java:261 - 
> Failed creating a merkle tree for [repair 
> #66425f10-1c61-11e6-83b2-0b1fff7a067d on keyspace1/standard1, 
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11865) Improve compaction logging details

2016-05-20 Thread T Jake Luciani (JIRA)
T Jake Luciani created CASSANDRA-11865:
--

 Summary: Improve compaction logging details
 Key: CASSANDRA-11865
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11865
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani
Assignee: Carl Yeksigian


I'd like to see, per compaction log entry:

  * Partitions processed
  * Rows processed
  * Partition merge stats
  * If a wide row was detected
  * The partition min/max/avg size
  * The min/max/avg row count per partition

Anything else [~krummas]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

