[jira] [Updated] (CASSANDRA-11623) Compactions w/ Short Rows Spending Time in getOnDiskFilePointer

2016-04-22 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-11623:
-
Reviewer: Marcus Eriksson

> Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
> ---
>
> Key: CASSANDRA-11623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11623
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tom Petracca
>Priority: Minor
> Attachments: compactiontask_profile.png
>
>
> Been doing some performance tuning and profiling of my Cassandra cluster and 
> noticed that compactions for the tables I know to have very short rows were 
> running particularly slowly.  Profiling shows a ton of time being spent in 
> BigTableWriter.getOnDiskFilePointer(), and attaching strace to a 
> CompactionTask shows that the majority of time is being spent in lseek 
> (called by getOnDiskFilePointer), not in read or write.
> Going deeper, it looks like we call getOnDiskFilePointer for each row 
> (sometimes multiple times per row) in order to see if we've reached our 
> expected sstable size and should start a new writer.  This is pretty 
> unnecessary.
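A minimal sketch of the direction the report suggests (illustrative names only, 
not the actual BigTableWriter code): keep a cheap in-memory running count of 
bytes appended and test that against the target sstable size, instead of issuing 
an lseek-backed getOnDiskFilePointer() call per row.

{code}
// Illustrative sketch only; names and structure are assumptions, not Cassandra's.
class SizeSwitchingWriter
{
    private final long targetSize;  // size at which to roll over to a new sstable writer
    private long bytesAppended;     // maintained in memory -- no syscall required

    SizeSwitchingWriter(long targetSize)
    {
        this.targetSize = targetSize;
    }

    void onRowAppended(long serializedRowSize)
    {
        bytesAppended += serializedRowSize;
    }

    boolean shouldSwitchWriter()
    {
        // Called once per row: a pure in-memory comparison. The estimate may
        // drift slightly from the true on-disk pointer (buffering, compression),
        // but switching a writer a little early or late is harmless.
        return bytesAppended >= targetSize;
    }
}
{code}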



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (CASSANDRA-11024) Unexpected exception during request; java.lang.StackOverflowError: null

2016-04-22 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov reopened CASSANDRA-11024:
-

I may have interpreted the description with a bit of a bias towards the insert 
statements.

I've tried to reproduce this once again with 2.2.4: adding and deleting rows and 
values, inserting many thousands of rows and then reading them all back 
together, all of it in a very wide table (over 1000 columns).

{{INSERT}} statements fail when too many rows are inserted (as fixed in 
#11621), although {{SELECT}} statements work without issues no matter how many 
tombstones (judging from the last log statement) or columns there are in the 
table, with or without nulls.

[~depend] could you try running {{SELECT *}} from cqlsh and show what cqlsh 
says?

> Unexpected exception during request; java.lang.StackOverflowError: null
> ---
>
> Key: CASSANDRA-11024
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11024
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS 7, Java x64 1.8.0_65
>Reporter: Kai Wang
>Priority: Minor
>
> This happened when I ran a "SELECT *" query on a very wide table. The table 
> has over 1000 columns and a lot of nulls. If I run "SELECT * ... LIMIT 10" or 
> "SELECT a,b,c FROM ...", then it's fine. The data was being actively inserted 
> while I ran the query. I will try again later, once compaction (LCS) catches up.
> {noformat}
> ERROR [SharedPool-Worker-5] 2016-01-15 20:49:08,212 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x8e11d570, 
> /192.168.0.3:50332 => /192.168.0.11:9042]
> java.lang.StackOverflowError: null
>   at 
> com.google.common.base.Preconditions.checkPositionIndex(Preconditions.java:339)
>  ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.AbstractIndexedListIterator.<init>(AbstractIndexedListIterator.java:69)
>  ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$11.<init>(Iterators.java:1048) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators.forArray(Iterators.java:1048) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.RegularImmutableList.listIterator(RegularImmutableList.java:106)
>  ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.listIterator(ImmutableList.java:344) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.iterator(ImmutableList.java:340) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.iterator(ImmutableList.java:61) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables.iterators(Iterables.java:504) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables.access$100(Iterables.java:60) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$2.iterator(Iterables.java:494) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$3.transform(Iterables.java:508) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$3.transform(Iterables.java:505) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
>  ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:543) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0
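The trace is cut off above, but the repeating {{Iterators$5.hasNext}} frames are 
the classic signature of deeply nested, lazily concatenated Guava iterators: each 
wrapper's {{hasNext()}} recurses into the iterator below it. A standalone sketch 
(assuming only Guava on the classpath; this is not the Cassandra code path) that 
reproduces the same failure mode:

{code}
import java.util.Collections;
import java.util.Iterator;

import com.google.common.collect.Iterators;

public class ConcatOverflow
{
    public static void main(String[] args)
    {
        Iterator<Integer> it = Collections.emptyIterator();
        // Each concat() wraps the previous iterator one level deeper,
        // building a left-deep chain of lazy iterators.
        for (int i = 0; i < 100_000; i++)
            it = Iterators.concat(it, Collections.singleton(i).iterator());
        // hasNext() recurses through every layer of the chain; once the
        // nesting is deep enough this throws java.lang.StackOverflowError,
        // matching the trace above.
        it.hasNext();
    }
}
{code}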

[jira] [Assigned] (CASSANDRA-8911) Consider Mutation-based Repairs

2016-04-22 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reassigned CASSANDRA-8911:
--

Assignee: Marcus Eriksson

> Consider Mutation-based Repairs
> ---
>
> Key: CASSANDRA-8911
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Assignee: Marcus Eriksson
> Fix For: 3.x
>
>
> We should consider a mutation-based repair to replace the existing streaming 
> repair.  While we're at it, we could do away with a lot of the complexity 
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
>  * Instead of building an entire merkle tree up front, just send the "leaves" 
> one-by-one.  Instead of dealing with token ranges, make the leaves primary 
> key ranges.  The PK ranges would need to be contiguous, so that the start of 
> each range would match the end of the previous range. (The first and last 
> leaves would need to be open-ended on one end of the PK range.) This would be 
> similar to doing a read with paging.
>  * Once one page of data is read, compute a hash of it and send it to the 
> other replicas along with the PK range that it covers and a row count.
>  * When the replicas receive the hash, they perform a read over the same PK 
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row 
> counts don't match, in which case this can be skipped).
>  * If there is a mismatch, the replica will send a mutation covering that 
> page's worth of data (ignoring the row count this time) to the source node.
> Here are the advantages that I can think of:
>  * With the current repair behavior of streaming, vnode-enabled clusters may 
> need to stream hundreds of small SSTables.  This results in increased 
> compaction load on the receiving node.  With the mutation-based approach, 
> memtables would naturally merge these.
>  * It's simple to throttle.  For example, you could give a number of rows/sec 
> that should be repaired.
>  * It's easy to see what PK range has been repaired so far.  This could make 
> it simpler to resume a repair that fails midway.
>  * Inconsistencies start to be repaired almost right away.
>  * Less special code \(?\)
>  * Wide partitions are no longer a problem.
> There are a few problems I can think of:
>  * Counters.  I don't know if this can be made safe, or if they need to be 
> skipped.
>  * To support incremental repair, we need to be able to read from only 
> repaired sstables.  Probably not too difficult to do.
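A toy, single-process simulation of the flow proposed above (assumptions: String 
primary keys, map hash codes standing in for real page hashes; none of these 
names exist in Cassandra):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MutationRepairSketch
{
    static final int PAGE_SIZE = 3;

    public static void main(String[] args)
    {
        NavigableMap<String, String> repairing = new TreeMap<>();
        NavigableMap<String, String> replica = new TreeMap<>();
        replica.put("a", "1"); replica.put("b", "2"); replica.put("c", "3"); replica.put("d", "4");
        repairing.putAll(replica);
        repairing.put("c", "stale");   // an inconsistency for the repair to find

        List<String> keys = new ArrayList<>(repairing.keySet());
        for (int i = 0; i < keys.size(); i += PAGE_SIZE)
        {
            List<String> page = keys.subList(i, Math.min(i + PAGE_SIZE, keys.size()));
            String start = page.get(0), end = page.get(page.size() - 1);
            // "Send the hash" of the page; the replica re-reads the same PK range.
            int localHash = repairing.subMap(start, true, end, true).hashCode();
            int remoteHash = replica.subMap(start, true, end, true).hashCode();
            if (localHash != remoteHash)
            {
                // Mismatch: the replica sends a mutation covering that page's
                // worth of data, which the repairing node simply applies.
                repairing.putAll(replica.subMap(start, true, end, true));
                System.out.println("repaired page [" + start + ", " + end + "]");
            }
        }
    }
}
{code}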



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253514#comment-15253514
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


Can we keep the conversation going to get this patch into the next 2.x release?

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exist for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that some inconsistencies were detected 
> between nodes (while none should have been detected)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> The consequences are a costly repair and the accumulation of many small 
> SSTables (up to thousands for a rather short period of time when using 
> vnodes, until compaction absorbs those small files), but also an increased 
> size on disk.
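Not the attached patch, just a self-contained illustration of the invariance the 
fix needs: a digest computed over range tombstones should not change when 
duplicate/shadowed markers for the same interval are collapsed by compaction, 
which means collapsing them before they are fed into the digest. All types below 
are simplified stand-ins.

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

public class TombstoneDigestSketch
{
    // Simplified stand-in for a range tombstone: an interval plus deletion time.
    static class RT
    {
        final String start, end;
        final long deletedAt;
        RT(String start, String end, long deletedAt)
        {
            this.start = start; this.end = end; this.deletedAt = deletedAt;
        }
    }

    // Assumes the list is sorted by interval, newest deletion first within equal
    // intervals. Duplicates of the same interval are skipped, so "two identical
    // markers" and "one compacted marker" produce the same digest.
    static byte[] digest(List<RT> sorted) throws Exception
    {
        MessageDigest d = MessageDigest.getInstance("MD5");
        RT prev = null;
        for (RT rt : sorted)
        {
            if (prev != null && prev.start.equals(rt.start) && prev.end.equals(rt.end))
                continue; // shadowed duplicate: must not contribute to the digest
            d.update((rt.start + '\0' + rt.end + '\0' + rt.deletedAt)
                     .getBytes(StandardCharsets.UTF_8));
            prev = rt;
        }
        return d.digest();
    }
}
{code}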



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11427) Range slice queries CL > ONE trigger read-repair of purgeable tombstones

2016-04-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253550#comment-15253550
 ] 

Stefan Podkowinski commented on CASSANDRA-11427:


bq. this only concerns partition and range tombstones, so this is imo a fairly 
minor efficiency problem, not a correctness issue.

Although the performance overhead implied by this bug should be fairly low in 
most cases, it leaves people confused about why these read repairs happen in the 
first place. If you have monitoring in place telling you the system constantly 
has to read-repair data, you are bound to investigate. That's what I did.

bq. So that I wonder if it's worth taking any risk just for 2.2. 

Are there any tests I can add to reduce the risk of possible side effects?


> Range slice queries CL > ONE trigger read-repair of purgeable tombstones
> 
>
> Key: CASSANDRA-11427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11427
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11427-2.1.patch, 11427-2.2_v2.patch
>
>
> Range queries will trigger read repairs for purgeable tombstones on hosts 
> that have already compacted those tombstones away. Clusters with periodic 
> jobs for scanning data ranges will likely see tombstones resurrected through 
> RRs just to have them compacted away again later at the destination host.
> Executing range queries (e.g. for reading token ranges) will compare the 
> actual data instead of using digests when executed with CL > ONE. Responses 
> will be consolidated by {{RangeSliceResponseResolver.Reducer}}, where the 
> result of {{RowDataResolver.resolveSuperset}} is used as the reference 
> version for the results. {{RowDataResolver.scheduleRepairs}} will then send 
> the superset to all nodes that returned a different result before. 
> Unfortunately, this also covers cases where the superset is just made up 
> of purgeable tombstone(s) that have already been compacted away on the other 
> nodes. In this case a read repair will be triggered to transfer the 
> purgeable tombstones to all other nodes that returned an empty result.
> The issue can be reproduced with the provided dtest or manually using the 
> following steps:
> {noformat}
> create keyspace test1 with replication = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 2 };
> use test1;
> create table test1 ( a text, b text, primary key(a, b) ) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'} AND 
> dclocal_read_repair_chance = 0 AND gc_grace_seconds = 0;
> delete from test1 where a = 'a';
> {noformat}
> {noformat}
> ccm flush;
> ccm node2 compact;
> {noformat}
> {noformat}
> use test1;
> consistency all;
> tracing on;
> select * from test1;
> {noformat}
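To illustrate the fix direction (not the attached patch): a tombstone is safe to 
exclude from the read-repair diff once every replica is allowed to purge it, 
i.e. once its deletion is older than gc_grace_seconds, so a marker that one 
replica has compacted away and another still holds no longer counts as a 
difference.

{code}
// Minimal sketch, assuming seconds-resolution times as with gc_grace_seconds.
static boolean isPurgeable(int localDeletionTime, int gcGraceSeconds, int nowInSec)
{
    // Mirrors the usual gcBefore test: deletions older than now - gc_grace
    // may already have been compacted away on any replica.
    return localDeletionTime < nowInSec - gcGraceSeconds;
}
{code}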



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11594) Too many open files on directories

2016-04-22 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253569#comment-15253569
 ] 

Alex Petrov commented on CASSANDRA-11594:
-

I've noticed there are many handles open against directories (24K, vs. 2.5K open 
against files in the same directory; the situation is similar for another 
directory). Stressing the node has so far revealed no obvious leaks. Are 
{{emails_by_user_id}} and {{email_logs_query}} MVs?

> Too many open files on directories
> --
>
> Key: CASSANDRA-11594
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11594
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: n0rad
>Priority: Critical
> Attachments: openfiles.zip, screenshot.png
>
>
> I have a 6-node cluster in prod, in 3 racks.
> Each node has:
> - 4GB of commitlogs in 343 files
> - 275GB of data in 504 files
> On Saturday, 1 node in each rack crashed with too many open files (it seems 
> to be the same node in each rack).
> {code}
> lsof -n -p $PID gives me 66899 out of a 65826 max
> {code}
> It contains 64527 open directories (2371 unique).
> A part of the list:
> {code}
> java19076 root 2140r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2141r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2142r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2143r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2144r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2145r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2146r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2147r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2148r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2149r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2150r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2151r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2152r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2153r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2154r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2155r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> {code}
> The 3 other nodes crashed 4 hours later



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11594) Too many open files on directories

2016-04-22 Thread n0rad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253573#comment-15253573
 ] 

n0rad commented on CASSANDRA-11594:
---

emails_by_user_id is an MV.

email_logs_query is the keyspace's name.



> Too many open files on directories
> --
>
> Key: CASSANDRA-11594
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11594
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: n0rad
>Priority: Critical
> Attachments: openfiles.zip, screenshot.png
>
>
> I have a 6-node cluster in prod, in 3 racks.
> Each node has:
> - 4GB of commitlogs in 343 files
> - 275GB of data in 504 files
> On Saturday, 1 node in each rack crashed with too many open files (it seems 
> to be the same node in each rack).
> {code}
> lsof -n -p $PID gives me 66899 out of a 65826 max
> {code}
> It contains 64527 open directories (2371 unique).
> A part of the list:
> {code}
> java19076 root 2140r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2141r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2142r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2143r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2144r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2145r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2146r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2147r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2148r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2149r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2150r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2151r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2152r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2153r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2154r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2155r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> {code}
> The 3 other nodes crashed 4 hours later



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8911) Consider Mutation-based Repairs

2016-04-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253574#comment-15253574
 ] 

Marcus Eriksson commented on CASSANDRA-8911:


WIP branch for this here: 
https://github.com/krummas/cassandra/commits/marcuse/8911

It mostly follows what [~thobbs] outlined above:
* The repairing node pages through its local data with page size = 
{{WINDOW_SIZE}}, [calculating a 
hash|https://github.com/krummas/cassandra/blob/marcuse/8911/src/java/org/apache/cassandra/repair/mutation/MBRService.java#L149-L155]
 for a page and sends the hash to its replicas. We figure out the {{start}} and 
{{end}} "keys" (partition key + clustering) that the remote nodes should 
compare the hash for. The repairing node sends a 
[RepairPage|https://github.com/krummas/cassandra/blob/marcuse/8911/src/java/org/apache/cassandra/repair/mutation/MBRRepairPage.java#L47-L52]
 containing the needed information.
* The replicas [read 
up|https://github.com/krummas/cassandra/blob/marcuse/8911/src/java/org/apache/cassandra/repair/mutation/MBRVerbHandler.java#L58]
 the local data between the {{start}}/{{end}} keys above with a limit of 
{{WINDOW_SIZE * 2}}
** If the hashes match, we reply that the data matched
** If we hit the limit when reading within {{start}}/{{end}}, we consider 
this a "huge" response and handle it separately: we reply to the 
repairing node that we have many rows between {{start}}/{{end}}, and the 
repairing node will page back the data from that node. This can happen if, 
for example, the repairing node has lost an sstable.
** If the hashes don't match, we reply with the data, and the repairing node 
will diff its data within the window against the remote data and only write 
the differences to the memtable (a condensed sketch of this replica-side 
handling follows below)
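A condensed, illustrative sketch of the three replica-side outcomes (toy types 
and names, not the branch's actual classes):

{code}
import java.util.NavigableMap;
import java.util.SortedMap;

public class RepairPageHandlerSketch
{
    enum Kind { MATCHED, MISMATCH, HUGE }

    // window: the replica's rows between start/end; limit = WINDOW_SIZE * 2.
    static Kind handle(NavigableMap<String, String> local, String start, String end,
                       int limit, int remoteHash)
    {
        SortedMap<String, String> window = local.subMap(start, end + "\0"); // end-inclusive
        if (window.size() >= limit)
            return Kind.HUGE;     // too many rows: the repairing node pages the data back
        if (window.hashCode() == remoteHash)
            return Kind.MATCHED;  // hashes agree, nothing to repair for this window
        return Kind.MISMATCH;     // reply with the data; the repairing node diffs it
    }
}
{code}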

Regarding page cache pollution, I think we should handle this like normal reads: 
first, the intention is to read through the data slowly, so we won't blow out 
all the 'real' pages in a short time; and second, if there is a mismatch we will 
read the data twice within a very short time span, so the page cache should 
make the impact of this smaller.

*Discussion points:*
* How do we make repair invisible to the operator? (continuous process, 
calculate how many rows/s we need to repair etc)
* Can we handle gcgs in a better way with this?
* Can we avoid anticompaction? Do we really need it to be incremental? 
(probably, but having a single thread page through the entire dataset should be 
something the cluster can handle)

TODO:
* Properly handle the "huge" responses above - need to have a way for the 
remote paging to return {{UnfilteredPartitionIterator}}s
* Make it incremental - currently reads all data and puts the differences in 
the regular memtable. We could probably have a separate memtable that is 
flushed to a repaired sstable.
* Avoid breaking DTCS etc.: since all mutations go into the same memtable, the 
flushed sstable will cover a big time window. The best solution to this would 
probably be to make flushing write to several sstables, as that would help with 
other DTCS issues as well (read repair, USING TIMESTAMP)
* Write tests / benchmarks / ...
* More metrics - we can get very accurate metrics on how much data actually 
differed during the repair

If anyone wants to try it out, there is a JMX method 
{{enableMutationBasedRepair}} on the {{ColumnFamilyStoreMBean}} to enable it 
(it pages through and repairs all the data once). 
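For reference, a hedged example of invoking that JMX method from a standalone 
client. The {{ObjectName}} pattern below is an assumption (the usual 
{{type=ColumnFamilies}} naming); check the branch for the exact registration.

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class EnableMutationBasedRepair
{
    public static void main(String[] args) throws Exception
    {
        // 7199 is Cassandra's default JMX port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection conn = jmxc.getMBeanServerConnection();
            // Assumed MBean name pattern for the table's ColumnFamilyStoreMBean.
            ObjectName cfs = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,keyspace=ks1,columnfamily=table1");
            // Pages through and repairs all the data once, per the comment above.
            conn.invoke(cfs, "enableMutationBasedRepair", new Object[0], new String[0]);
        }
    }
}
{code}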

> Consider Mutation-based Repairs
> ---
>
> Key: CASSANDRA-8911
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Assignee: Marcus Eriksson
> Fix For: 3.x
>
>
> We should consider a mutation-based repair to replace the existing streaming 
> repair.  While we're at it, we could do away with a lot of the complexity 
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
>  * Instead of building an entire merkle tree up front, just send the "leaves" 
> one-by-one.  Instead of dealing with token ranges, make the leaves primary 
> key ranges.  The PK ranges would need to be contiguous, so that the start of 
> each range would match the end of the previous range. (The first and last 
> leaves would need to be open-ended on one end of the PK range.) This would be 
> similar to doing a read with paging.
>  * Once one page of data is read, compute a hash of it and send it to the 
> other replicas along with the PK range that it covers and a row count.
>  * When the replicas receive the hash, they perform a read over the same PK 
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row 
> counts don't match, in which case this can be skipped).
>  * If there is a mismatch, the replica will send a mutation

[jira] [Comment Edited] (CASSANDRA-11594) Too many open files on directories

2016-04-22 Thread n0rad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253573#comment-15253573
 ] 

n0rad edited comment on CASSANDRA-11594 at 4/22/16 9:01 AM:


emails_by_user_id is an MV from emails.

email_logs_query is the keyspace's name.




was (Author: n0rad):
emails_by_user_id is is a MV.

email_logs_query is the keyspace's name.



> Too many open files on directories
> --
>
> Key: CASSANDRA-11594
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11594
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: n0rad
>Priority: Critical
> Attachments: openfiles.zip, screenshot.png
>
>
> I have a 6-node cluster in prod, in 3 racks.
> Each node has:
> - 4GB of commitlogs in 343 files
> - 275GB of data in 504 files
> On Saturday, 1 node in each rack crashed with too many open files (it seems 
> to be the same node in each rack).
> {code}
> lsof -n -p $PID gives me 66899 out of a 65826 max
> {code}
> It contains 64527 open directories (2371 unique).
> A part of the list:
> {code}
> java19076 root 2140r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2141r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2142r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2143r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2144r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2145r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2146r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2147r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2148r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2149r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2150r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2151r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2152r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2153r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2154r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2155r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> {code}
> The 3 other nodes crashed 4 hours later



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11594) Too many open files on directories

2016-04-22 Thread n0rad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253573#comment-15253573
 ] 

n0rad edited comment on CASSANDRA-11594 at 4/22/16 9:02 AM:


`emails_by_user_id` is an MV from `emails`.

`email_logs_query` is the keyspace's name.




was (Author: n0rad):
emails_by_user_id is is a MV from emails.

email_logs_query is the keyspace's name.



> Too many open files on directories
> --
>
> Key: CASSANDRA-11594
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11594
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: n0rad
>Priority: Critical
> Attachments: openfiles.zip, screenshot.png
>
>
> I have a 6-node cluster in prod, in 3 racks.
> Each node has:
> - 4GB of commitlogs in 343 files
> - 275GB of data in 504 files
> On Saturday, 1 node in each rack crashed with too many open files (it seems 
> to be the same node in each rack).
> {code}
> lsof -n -p $PID gives me 66899 out of a 65826 max
> {code}
> It contains 64527 open directories (2371 unique).
> A part of the list:
> {code}
> java19076 root 2140r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2141r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2142r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2143r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2144r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2145r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2146r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2147r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2148r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2149r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2150r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2151r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2152r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2153r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2154r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2155r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> {code}
> The 3 other nodes crashed 4 hours later



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11594) Too many open files on directories

2016-04-22 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253579#comment-15253579
 ] 

Alex Petrov commented on CASSANDRA-11594:
-

Thanks, I meant {{emails}}, not {{email_logs_query}}; I typed it in wrong.

> Too many open files on directories
> --
>
> Key: CASSANDRA-11594
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11594
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: n0rad
>Priority: Critical
> Attachments: openfiles.zip, screenshot.png
>
>
> I have a 6-node cluster in prod, in 3 racks.
> Each node has:
> - 4GB of commitlogs in 343 files
> - 275GB of data in 504 files
> On Saturday, 1 node in each rack crashed with too many open files (it seems 
> to be the same node in each rack).
> {code}
> lsof -n -p $PID gives me 66899 out of a 65826 max
> {code}
> It contains 64527 open directories (2371 unique).
> A part of the list:
> {code}
> java19076 root 2140r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2141r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2142r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2143r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2144r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2145r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2146r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2147r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2148r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2149r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2150r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2151r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2152r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2153r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2154r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java19076 root 2155r  DIR   8,17  143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> {code}
> The 3 other nodes crashed 4 hours later



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11194) materialized views - support explode() on collections

2016-04-22 Thread Keith Wansbrough (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251727#comment-15251727
 ] 

Keith Wansbrough edited comment on CASSANDRA-11194 at 4/22/16 9:11 AM:
---

We would also find it very useful to be able to create a materialized view on a 
set. The {{explode}} syntax looks good for this:
{code}
CREATE TABLE customers (
  id text PRIMARY KEY,
  data text,
  phones frozen<set<text>>
);
  
CREATE MATERIALIZED VIEW customers_by_phone AS
  SELECT explode(phones), id
  FROM customers
  WHERE phones IS NOT NULL;
{code}

We have a database of customers with an ID as primary key. Each customer has 
zero or more phone numbers. We would like to be able to create a materialized 
view so we can look up by phone number.

Our current schema uses a frozen set for this, but either frozen or unfrozen 
would be fine.

Edit: PostgreSQL calls this function 
[{{unnest}}|http://www.postgresql.org/docs/current/static/functions-array.html#ARRAY-FUNCTIONS-TABLE].


was (Author: kw217):
We would also find it very useful to be able to create a materialized view on a 
set. The {{explode}} syntax looks good for this:
{code}
CREATE TABLE customers (
  id text PRIMARY KEY,
  data text,
  phones frozen<set<text>>
);
  
CREATE MATERIALIZED VIEW customers_by_phone AS
  SELECT explode(phones), id
  FROM customers
  WHERE phones IS NOT NULL;
{code}

We have a database of customers with an ID as primary key. Each customer has 
zero or more phone numbers. We would like to be able to create a materialized 
view so we can look up by phone number.

Our current schema uses a frozen set for this, but either frozen or unfrozen 
would be fine.
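To make the intended semantics concrete, here is a plain-Java analogue of what 
{{explode}} would produce from the base rows (toy types for illustration; the 
real feature would live in the materialized-view machinery):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class ExplodeSketch
{
    static class Category
    {
        UUID id; String name; Set<UUID> parents;
    }

    static class ByParent
    {
        final UUID parentId, categoryId; final String name;
        ByParent(UUID parentId, UUID categoryId, String name)
        {
            this.parentId = parentId; this.categoryId = categoryId; this.name = name;
        }
    }

    // One output row per element of the collection column, mirroring
    // SELECT explode(parents), id FROM ... in the proposal.
    static List<ByParent> explode(List<Category> rows)
    {
        List<ByParent> out = new ArrayList<>();
        for (Category r : rows)
            for (UUID p : r.parents)
                out.add(new ByParent(p, r.id, r.name));
        return out;
    }
}
{code}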

> materialized views - support explode() on collections
> -
>
> Key: CASSANDRA-11194
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11194
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jon Haddad
>
> I'm working on a database design to model a product catalog.  Products can 
> belong to categories.  Categories can belong to multiple parent categories 
> (think about Amazon's complex taxonomies).
> My category table would look like this, giving me individual categories & 
> their parents:
> {code}
> CREATE TABLE category (
> category_id uuid primary key,
> name text,
> parents set<uuid>
> );
> {code}
> To get a list of all the children of a particular category, I need a table 
> that looks like the following:
> {code}
> CREATE TABLE categories_by_parent (
> parent_id uuid,
> category_id uuid,
> name text,
> primary key (parent_id, category_id)
> );
> {code}
> The important thing to note here is that a single category can have multiple 
> parents.
> I'd like to propose support for collections in materialized views via an 
> explode() function that would create 1 row per item in the collection.  For 
> instance, I'll insert the following 3 rows (2 parents, 1 child) into the 
> category table:
> {code}
> insert into category (category_id, name, parents) values 
> (009fe0e1-5b09-4efc-a92d-c03720324a4f, 'Parent', null);
> insert into category (category_id, name, parents) values 
> (1f2914de-0adf-4afc-b7ad-ddd8dc876ab1, 'Parent2', null);
> insert into category (category_id, name, parents) values 
> (1f93bc07-9874-42a5-a7d1-b741dc9c509c, 'Child', 
> {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 
> });
> cqlsh:test> select * from category;
>  category_id                          | name    | parents
> --------------------------------------+---------+-------------------------------------------------------------------------------
>  009fe0e1-5b09-4efc-a92d-c03720324a4f |  Parent |                                                                          null
>  1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 | Parent2 |                                                                          null
>  1f93bc07-9874-42a5-a7d1-b741dc9c509c |   Child | {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1}
> (3 rows)
> {code}
> Given the following CQL to select the child category, utilizing an explode 
> function, I would expect to get back 2 rows, 1 for each parent:
> {code}
> select category_id, name, explode(parents) as parent_id from category where 
> category_id = 1f93bc07-9874-42a5-a7d1-b741dc9c509c;
>  category_id                          | name  | parent_id
> --------------------------------------+-------+--------------------------------------
>  1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 009fe0e1-5b09-4efc-a92d-c03720324a4f
>  1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1
> (2 rows)
> {code}
> This functionality would ideally apply to materialized views, since the 
> ability to control partitioning here would allow us to efficiently qu

[jira] [Created] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Robert Stupp (JIRA)
Robert Stupp created CASSANDRA-11631:


 Summary: cqlsh COPY FROM fails for null values with non-prepared 
statements
 Key: CASSANDRA-11631
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Stupp
Assignee: Robert Stupp
Priority: Minor


cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
contains null values. The reason is that the {{','.join(r)}} in 
{{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
results in this error message.
{code}
Failed to import 1 rows: TypeError - sequence item 2: expected string, NoneType 
found,  given up without retries
{code}

Attached patch should fix the problem.
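The underlying fix shape is simple (the real code is Python inside cqlsh; the 
Java rendering below is only an illustration, not the attached patch): map 
{{None}}/null to the CQL {{NULL}} literal before joining the formatted values.

{code}
import java.util.List;

public class NullJoinSketch
{
    // Joining formatted values for a non-prepared statement must render
    // missing values as the CQL NULL literal instead of failing on null.
    static String joinValues(List<String> formattedValues)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < formattedValues.size(); i++)
        {
            if (i > 0)
                sb.append(',');
            String v = formattedValues.get(i);
            sb.append(v == null ? "NULL" : v);
        }
        return sb.toString();
    }
}
{code}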



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11631:
-
Fix Version/s: 3.x
   Status: Patch Available  (was: Open)

[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11631-copy-from-None-trunk]
[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11631-copy-from-None-trunk-dtest/lastBuild/]


> cqlsh COPY FROM fails for null values with non-prepared statements
> --
>
> Key: CASSANDRA-11631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.x
>
>
> cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
> contains null values. Reason is that the {{','.join(r)}} in 
> {{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
> results in this error message.
> {code}
> Failed to import 1 rows: TypeError - sequence item 2: expected string, 
> NoneType found,  given up without retries
> {code}
> Attached patch should fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania reassigned CASSANDRA-11631:


Assignee: Stefania  (was: Robert Stupp)

> cqlsh COPY FROM fails for null values with non-prepared statements
> --
>
> Key: CASSANDRA-11631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Stefania
>Priority: Minor
> Fix For: 3.x
>
>
> cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
> contains null values. Reason is that the {{','.join(r)}} in 
> {{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
> results in this error message.
> {code}
> Failed to import 1 rows: TypeError - sequence item 2: expected string, 
> NoneType found,  given up without retries
> {code}
> Attached patch should fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-11631:
-
Assignee: Robert Stupp  (was: Stefania)

> cqlsh COPY FROM fails for null values with non-prepared statements
> --
>
> Key: CASSANDRA-11631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.x
>
>
> cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
> contains null values. Reason is that the {{','.join(r)}} in 
> {{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
> results in this error message.
> {code}
> Failed to import 1 rows: TypeError - sequence item 2: expected string, 
> NoneType found,  given up without retries
> {code}
> Attached patch should fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-11631:
-
Reviewer: Stefania

> cqlsh COPY FROM fails for null values with non-prepared statements
> --
>
> Key: CASSANDRA-11631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.x
>
>
> cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
> contains null values. Reason is that the {{','.join(r)}} in 
> {{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
> results in this error message.
> {code}
> Failed to import 1 rows: TypeError - sequence item 2: expected string, 
> NoneType found,  given up without retries
> {code}
> Attached patch should fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11629) java.lang.UnsupportedOperationException when selecting rows with counters

2016-04-22 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-11629:
-
Assignee: Alex Petrov

> java.lang.UnsupportedOperationException when selecting rows with counters
> -
>
> Key: CASSANDRA-11629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11629
> Project: Cassandra
>  Issue Type: Bug
> Environment: Ubuntu 16.04 LTS
> Cassandra 3.0.5 Community Edition
>Reporter: Arnd Hannemann
>Assignee: Alex Petrov
>  Labels: 3.0.5
>
> When selecting a non-empty set of rows with counters, an exception occurs:
> {code}
> WARN  [SharedPool-Worker-2] 2016-04-21 23:47:47,542 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: java.lang.UnsupportedOperationException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_45]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>  [apache-cassandra-3.0.5.jar:3.0.5]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.5.jar:3.0.5]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.lang.UnsupportedOperationException: null
> at 
> org.apache.cassandra.db.marshal.AbstractType.compareCustom(AbstractType.java:172)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.marshal.AbstractType.compare(AbstractType.java:158) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.marshal.AbstractType.compareForCQL(AbstractType.java:202)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.cql3.Operator.isSatisfiedBy(Operator.java:169) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$SimpleExpression.isSatisfiedBy(RowFilter.java:619)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$CQLFilter$1IsSatisfiedFilter.applyToRow(RowFilter.java:258)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.BaseRows.applyOne(BaseRows.java:95) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:86) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:21) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.Transformation.add(Transformation.java:136) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:102)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$CQLFilter$1IsSatisfiedFilter.applyToPartition(RowFilter.java:246)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$CQLFilter$1IsSatisfiedFilter.applyToPartition(RowFilter.java:236)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:76)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1792)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2445)
>  ~[apache-cassandra-

[jira] [Commented] (CASSANDRA-11629) java.lang.UnsupportedOperationException when selecting rows with counters

2016-04-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253631#comment-15253631
 ] 

Sylvain Lebresne commented on CASSANDRA-11629:
--

I haven't checked, but I believe we used to not allow that kind of query, and 
now we do. We need to special-case counters in 
{{RowFilter.SimpleExpression.isSatisfiedBy}} for it to work: we need to 
extract the actual value from the counter's internal representation (using 
{{CounterContext.total()}}) and probably use {{LongType}} for the comparison 
itself.
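A condensed sketch of that special case (simplified; the exact signatures are my 
assumption from the 3.0 code, so treat this as a sketch rather than the patch):

{code}
import java.nio.ByteBuffer;

import org.apache.cassandra.cql3.Operator;
import org.apache.cassandra.db.context.CounterContext;
import org.apache.cassandra.db.marshal.LongType;
import org.apache.cassandra.utils.ByteBufferUtil;

public class CounterFilterSketch
{
    // A counter cell's value is an internal counter context, so it must not be
    // handed to AbstractType.compare(); decode the total and compare as a long.
    static boolean counterSatisfies(Operator op, ByteBuffer counterCell, ByteBuffer requested)
    {
        long total = CounterContext.instance().total(counterCell);
        return op.isSatisfiedBy(LongType.instance, ByteBufferUtil.bytes(total), requested);
    }
}
{code}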

> java.lang.UnsupportedOperationException when selecting rows with counters
> -
>
> Key: CASSANDRA-11629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11629
> Project: Cassandra
>  Issue Type: Bug
> Environment: Ubuntu 16.04 LTS
> Cassandra 3.0.5 Community Edition
>Reporter: Arnd Hannemann
>Assignee: Alex Petrov
>  Labels: 3.0.5
>
> When selecting a non-empty set of rows with counters, an exception occurs:
> {code}
> WARN  [SharedPool-Worker-2] 2016-04-21 23:47:47,542 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: java.lang.UnsupportedOperationException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_45]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>  [apache-cassandra-3.0.5.jar:3.0.5]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.5.jar:3.0.5]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.lang.UnsupportedOperationException: null
> at 
> org.apache.cassandra.db.marshal.AbstractType.compareCustom(AbstractType.java:172)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.marshal.AbstractType.compare(AbstractType.java:158) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.marshal.AbstractType.compareForCQL(AbstractType.java:202)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.cql3.Operator.isSatisfiedBy(Operator.java:169) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$SimpleExpression.isSatisfiedBy(RowFilter.java:619)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$CQLFilter$1IsSatisfiedFilter.applyToRow(RowFilter.java:258)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.BaseRows.applyOne(BaseRows.java:95) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:86) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:21) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.Transformation.add(Transformation.java:136) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:102)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$CQLFilter$1IsSatisfiedFilter.applyToPartition(RowFilter.java:246)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.filter.RowFilter$CQLFilter$1IsSatisfiedFilter.applyToPartition(RowFilter.java:236)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:76)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> 

[jira] [Commented] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253662#comment-15253662
 ] 

Stefania commented on CASSANDRA-11631:
--

The patch looks good.

We need to extend the dtests introduced for CASSANDRA-11549 to cover 
non-prepared statements and nested types with null sub-fields as well. I can 
work on the tests next week.

> cqlsh COPY FROM fails for null values with non-prepared statements
> --
>
> Key: CASSANDRA-11631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.x
>
>
> cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
> contains null values. The reason is that the {{','.join(r)}} in 
> {{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
> results in this error message.
> {code}
> Failed to import 1 rows: TypeError - sequence item 2: expected string, 
> NoneType found,  given up without retries
> {code}
> Attached patch should fix the problem.
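
For illustration only (the code in question is cqlsh's Python; this Java 
analogue uses made-up values): the failure mode is joining a sequence that 
contains a null, and one plausible fix is mapping nulls to a literal first.

{code}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoinWithNulls
{
    public static void main(String[] args)
    {
        List<String> row = Arrays.asList("1", null, "'x'");
        // Python's ','.join(row) raises TypeError on the None element;
        // mapping nulls to a literal before joining avoids that.
        String joined = row.stream()
                           .map(v -> v == null ? "NULL" : v) // hypothetical null handling
                           .collect(Collectors.joining(","));
        System.out.println(joined); // 1,NULL,'x'
    }
}
{code}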



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253677#comment-15253677
 ] 

Robert Stupp commented on CASSANDRA-11631:
--

Is the fix version 3.6 correct? Or would it be 3.0?

> cqlsh COPY FROM fails for null values with non-prepared statements
> --
>
> Key: CASSANDRA-11631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.x
>
>
> cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
> contains null values. The reason is that the {{','.join(r)}} in 
> {{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
> results in this error message.
> {code}
> Failed to import 1 rows: TypeError - sequence item 2: expected string, 
> NoneType found,  given up without retries
> {code}
> Attached patch should fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8911) Consider Mutation-based Repairs

2016-04-22 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253684#comment-15253684
 ] 

Aleksey Yeschenko commented on CASSANDRA-8911:
--

bq. Avoid breaking DTCS etc, since all mutations go into the same memtable

They are now, but they don't have to be with this ticket. Repair should 
probably have its own set of memtables, so that we avoid messing up compaction 
strategies and, in general, isolate the regular write path from the 
repair-mutation write path for load-control purposes. For the same reason, we 
should not reuse the {{MUTATION}} verb.
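
For reference, a rough sketch of the paged hash exchange described in the 
ticket below (all names here are hypothetical, not actual code):

{code}
// Hypothetical coordinator-side loop: page through the primary-key space,
// hash each page ("leaf"), and hand (range, rowCount, hash) to replicas.
String prevKey = null; // first leaf is open-ended at the start
while (true)
{
    List<Row> page = readPageAfter(prevKey, PAGE_SIZE); // like a paged read
    if (page.isEmpty())
        break;
    String lastKey = page.get(page.size() - 1).key();
    byte[] hash = hashRows(page); // one "leaf"
    // Each replica re-reads (prevKey, lastKey] with LIMIT page.size() + 1,
    // compares hashes (skipping the hash if row counts differ), and sends
    // back a mutation covering the page on mismatch.
    sendLeafToReplicas(prevKey, lastKey, page.size(), hash);
    prevKey = lastKey; // keeps leaf ranges contiguous
}
{code}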

> Consider Mutation-based Repairs
> ---
>
> Key: CASSANDRA-8911
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Assignee: Marcus Eriksson
> Fix For: 3.x
>
>
> We should consider a mutation-based repair to replace the existing streaming 
> repair.  While we're at it, we could do away with a lot of the complexity 
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
>  * Instead of building an entire merkle tree up front, just send the "leaves" 
> one-by-one.  Instead of dealing with token ranges, make the leaves primary 
> key ranges.  The PK ranges would need to be contiguous, so that the start of 
> each range would match the end of the previous range. (The first and last 
> leaves would need to be open-ended on one end of the PK range.) This would be 
> similar to doing a read with paging.
>  * Once one page of data is read, compute a hash of it and send it to the 
> other replicas along with the PK range that it covers and a row count.
> * When the replicas receive the hash, they perform a read over the same PK 
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row 
> counts don't match, in which case this can be skipped).
>  * If there is a mismatch, the replica will send a mutation covering that 
> page's worth of data (ignoring the row count this time) to the source node.
> Here are the advantages that I can think of:
> * With the current repair behavior of streaming, vnode-enabled clusters may 
> need to stream hundreds of small SSTables.  This results in increased 
> compaction load on the receiving node.  With the mutation-based approach, 
> memtables would naturally merge these.
>  * It's simple to throttle.  For example, you could give a number of rows/sec 
> that should be repaired.
>  * It's easy to see what PK range has been repaired so far.  This could make 
> it simpler to resume a repair that fails midway.
>  * Inconsistencies start to be repaired almost right away.
>  * Less special code \(?\)
>  * Wide partitions are no longer a problem.
> There are a few problems I can think of:
>  * Counters.  I don't know if this can be made safe, or if they need to be 
> skipped.
>  * To support incremental repair, we need to be able to read from only 
> repaired sstables.  Probably not too difficult to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11632) Commitlog corruption

2016-04-22 Thread DOAN DuyHai (JIRA)
DOAN DuyHai created CASSANDRA-11632:
---

 Summary: Commitlog corruption
 Key: CASSANDRA-11632
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11632
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 3.6 SNAPSHOT
Reporter: DOAN DuyHai


{noformat}
INFO  10:00:08 Replaying 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
ERROR 10:00:08 Exiting due to error while processing commit log during 
initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat. 
 This may be caused by replaying a mutation against a table with the same name 
but incompatible schema.  Exception follows: 
org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 0th 
field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
[main/:na]
{noformat}

The commitlog files are downloadable here: 
https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11632) Commitlog corruption

2016-04-22 Thread DOAN DuyHai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOAN DuyHai updated CASSANDRA-11632:

Description: 
{noformat}
INFO  10:00:08 Replaying 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
ERROR 10:00:08 Exiting due to error while processing commit log during 
initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat. 
 This may be caused by replaying a mutation against a table with the same name 
but incompatible schema.  Exception follows: 
org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 0th 
field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
[main/:na]
{noformat}

I'm using the 3.6_SNAPSHOT at `ae063e8 JSON datetime formatting needs timezone`

The commitlog files are downloadable here: 
https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing


  was:
{noformat}
INFO  10:00:08 Replaying 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
ERROR 10:00:08 Exiting due to error while processing commit log during 
initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat. 
 This may be caused by replaying a mutation against a table with the same name 
but incompatible schema.  Exception follows: 
org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 0th 
field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
[main/:na]
{noformat}

The commitlog files are downloadable here: 
https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing



> Commitlog corruption
> 
>
> Key: CASSANDRA-11632
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11632
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.6 SNAPSHOT
>Reporter: DOAN DuyHai
>
> {noformat}
> INFO  10:00:08 Replaying 
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
>  
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
> ERROR 10:00:08 Exiting due to error while processing commit log during 
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Unexpected error deserializing mu

[jira] [Updated] (CASSANDRA-11632) Commitlog corruption

2016-04-22 Thread DOAN DuyHai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOAN DuyHai updated CASSANDRA-11632:

Description: 
{noformat}
INFO  10:00:08 Replaying 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
ERROR 10:00:08 Exiting due to error while processing commit log during 
initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat. 
 This may be caused by replaying a mutation against a table with the same name 
but incompatible schema.  Exception follows: 
org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 0th 
field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
[main/:na]
{noformat}

I'm using the 3.6_SNAPSHOT at _ae063e8 JSON datetime formatting needs timezone_

The commitlog files are downloadable here: 
https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing


  was:
{noformat}
INFO  10:00:08 Replaying 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
 
/Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
ERROR 10:00:08 Exiting due to error while processing commit log during 
initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
Unexpected error deserializing mutation; saved to 
/var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat. 
 This may be caused by replaying a mutation against a table with the same name 
but incompatible schema.  Exception follows: 
org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 0th 
field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
 [main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
[main/:na]
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
[main/:na]
{noformat}

I'm using the 3.6_SNAPSHOT at `ae063e8 JSON datetime formatting needs timezone`

The commitlog files are downloadable here: 
https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing



> Commitlog corruption
> 
>
> Key: CASSANDRA-11632
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11632
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.6 SNAPSHOT
>Reporter: DOAN DuyHai
>
> {noformat}
> INFO  10:00:08 Replaying 
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
>  
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
> ERROR 10:00:08 Exiting due to error while processing commit log during 
> initialization.
> org.apache.cassandra.db.commitlog

[jira] [Updated] (CASSANDRA-11460) memory leak

2016-04-22 Thread stone (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stone updated CASSANDRA-11460:
--
Tester: Benedict
Status: Awaiting Feedback  (was: In Progress)

> memory leak
> ---
>
> Key: CASSANDRA-11460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11460
> Project: Cassandra
>  Issue Type: Bug
>Reporter: stone
>Priority: Critical
> Attachments: aaa.jpg
>
>
> env:
> Cassandra 3.3
> JDK 8
> 8 GB RAM
> so we set
> MAX_HEAP_SIZE="2G"
> HEAP_NEWSIZE="400M"
> 1. We met the same problem as this one:
> https://issues.apache.org/jira/browse/CASSANDRA-9549
> I am confused because that was fixed in release 3.3 according to this page:
> https://github.com/apache/cassandra/blob/trunk/CHANGES.txt
> yet after changing to 3.4 I found the problem again.
> I think this fix should be included in 3.3/3.4. Can you explain?
> 2. Our write rate exceeds what our Cassandra environment can support, but I 
> think Cassandra should decrease the write rate or block: consume the written 
> data, keep memory down, then go on writing, instead of running out of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11615) cassandra-stress blocks when connecting to a big cluster

2016-04-22 Thread Eduard Tudenhoefner (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduard Tudenhoefner updated CASSANDRA-11615:

Attachment: 11615-3.0-2nd.patch

[~tjake] attached a second patch, where it can be controlled with an option. 
I'll also go ahead and talk to the C* Java driver guys about it.
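
For context, the blocking itself comes from a single monitor guarding the 
statement cache; an untested sketch of a lock-free alternative (hypothetical 
names, not the attached patch):

{code}
// Let ConcurrentHashMap do per-key synchronization instead of a
// synchronized prepare() method: threads preparing different queries
// no longer queue behind one monitor.
private final ConcurrentMap<String, PreparedStatement> stmts = new ConcurrentHashMap<>();

public PreparedStatement prepare(String query)
{
    // computeIfAbsent blocks only callers preparing the *same* query.
    return stmts.computeIfAbsent(query, q -> getSession().prepare(q));
}
{code}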

> cassandra-stress blocks when connecting to a big cluster
> 
>
> Key: CASSANDRA-11615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11615
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.x
>
> Attachments: 11615-3.0-2nd.patch, 11615-3.0.patch
>
>
> I had a *100* node cluster and was running 
> {code}
> cassandra-stress read n=100 no-warmup cl=LOCAL_QUORUM -rate 'threads=20' 
> 'limit=1000/s'
> {code}
> Based on the thread dump it looks like it's been blocked at 
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L96
> {code}
> "Thread-20" #245 prio=5 os_prio=0 tid=0x7f3781822000 nid=0x46c4 waiting 
> for monitor entry [0x7f36cc788000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> "Thread-19" #244 prio=5 os_prio=0 tid=0x7f378182 nid=0x46c3 waiting 
> for monitor entry [0x7f36cc889000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> {code}
> I was trying the same with a smaller cluster (50 nodes) and it was 
> working fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10134) Always require replace_address to replace existing address

2016-04-22 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253719#comment-15253719
 ] 

Sam Tunnicliffe commented on CASSANDRA-10134:
-

Thanks [~jkni]

On reflection, the concern about MV builds when operators stop gossip is a fair 
one. Reviewing SS, it seems to me that {{isInitialized}} is poorly named and 
only really used to check the status of gossip, so I've renamed it to 
{{gossipActive}} and only set it when gossip is actually started, rather than 
on entry to {{initServer}}. With that renaming, I've re-purposed 
{{isInitialized}} to indicate that {{initServer}} has completed and, while I 
was at it, renamed {{isSetupCompleted}} to {{isDaemonSetupCompleted}}, which I 
think is more accurate. With those changes, MV builds will not be submitted 
before SS is initialized, but they will continue to run as expected when gossip 
is disabled. I think the only contentious issue is that 
{{isInitialized()}} is part of the {{StorageServiceMBean}} interface and this 
changes its semantics slightly. In mitigation, I would argue that its 
semantics were fairly unclear before, with the flag being set so early. The 
bean also has redundant methods to obtain the gossip status already, so there's 
no loss of information for clients here.

A side effect of setting {{isInitialized}} is that it also gives us what's 
suggested [in this 
comment|https://issues.apache.org/jira/browse/CASSANDRA-11537?focusedCommentId=15245412]
 for CASSANDRA-11537.

I've also fixed all the other nits, and kicked off CI runs.
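
For clarity, roughly the shape of the renamed flags (illustrative only, not 
the patch itself):

{code}
// StorageService (illustrative):
private volatile boolean gossipActive; // was isInitialized; set only when gossip actually starts
private volatile boolean initialized;  // re-purposed: set once initServer() has completed
// CassandraDaemon (illustrative): isSetupCompleted renamed to isDaemonSetupCompleted
{code}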


> Always require replace_address to replace existing address
> --
>
> Key: CASSANDRA-10134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10134
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Tyler Hobbs
>Assignee: Sam Tunnicliffe
>  Labels: docs-impacting
> Fix For: 3.x
>
>
> Normally, when a node is started from a clean state with the same address as 
> an existing down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception 
> encountered during startup
> java.lang.RuntimeException: A node with address /127.0.0.3 already exists, 
> cancelling join. Use cassandra.replace_address if you want to replace this 
> node.
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) 
> [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed 
> list, it will not throw this error and will start normally.  The new node 
> then takes over the host ID of the old node (even if the tokens are 
> different), and the only message you will see is a warning in the other 
> nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, 
> hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information 
> for a down node without replacing it.  To fix this, we should check for an 
> endpoint collision even if {{auto_bootstrap}} is false or the node is a seed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11615) cassandra-stress blocks when connecting to a big cluster

2016-04-22 Thread Eduard Tudenhoefner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253718#comment-15253718
 ] 

Eduard Tudenhoefner edited comment on CASSANDRA-11615 at 4/22/16 10:43 AM:
---

[~tjake] attached a second patch, where it can be controlled with an option. I 
will also go ahead and talk to the C* Java driver guys about it


was (Author: eduard.tudenhoefner):
[~tjake] attached a second patch, where it can be controlled with an option. 
I'm also go ahead and talk to the C* Java driver guys about it

> cassandra-stress blocks when connecting to a big cluster
> 
>
> Key: CASSANDRA-11615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11615
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.x
>
> Attachments: 11615-3.0-2nd.patch, 11615-3.0.patch
>
>
> I had a *100* node cluster and was running 
> {code}
> cassandra-stress read n=100 no-warmup cl=LOCAL_QUORUM -rate 'threads=20' 
> 'limit=1000/s'
> {code}
> Based on the thread dump it looks like it's been blocked at 
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L96
> {code}
> "Thread-20" #245 prio=5 os_prio=0 tid=0x7f3781822000 nid=0x46c4 waiting 
> for monitor entry [0x7f36cc788000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> "Thread-19" #244 prio=5 os_prio=0 tid=0x7f378182 nid=0x46c3 waiting 
> for monitor entry [0x7f36cc889000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> {code}
> I was trying the same with a smaller cluster (50 nodes) and it was 
> working fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11470) dtest failure in materialized_views_test.TestMaterializedViews.base_replica_repair_test

2016-04-22 Thread Alexander Nechiporuk (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253732#comment-15253732
 ] 

Alexander Nechiporuk commented on CASSANDRA-11470:
--

Hi, the issue probably still exists in 3.0.5, but it appears in a different place.

{code}
nodetool cfstats XXX.YYY
Keyspace: XXX
Read Count: 75443752
Read Latency: 27.975226176251148 ms.
Write Count: 66394239
Write Latency: 0.03455783252519846 ms.
Pending Flushes: 0
Table: 
SSTable count: 778
SSTables in each level: [20/4, 109/10, 105/100, 542, 0, 0, 0, 
0, 0]
Space used (live): 152332003329
Space used (total): 152332003329
error: Failed to list directory files in /place/cassandra/data// 
inconsistent disk state for transaction 
[/place/cassandra/data///ma_txn_compaction_2695f850-0874-11e6-b5e6-1335f8eceb06.log]
-- StackTrace --
java.lang.RuntimeException: Failed to list directory files in 
/place/cassandra/data/XXX/YYY, inconsistent disk state for transaction 
[/place/cassandra/data/XXX/YYY/ma_txn_compaction_2695f850-0874-11e6-b5e6-1335f8eceb06.log]
at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:149)
at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:103)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:71)
at 
org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:49)
at 
org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
at 
org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691)
at 
org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662)
at 
org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981)
at 
org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893)
at 
org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883)
at 
org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332)
at 
org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637)
at 
org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634)
at 
org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxGauge.getValue(CassandraMetricsRegistry.java:235)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnect

[jira] [Created] (CASSANDRA-11633) cqlsh COPY FROM fails with []{} chars in UDT/tuple fields/values

2016-04-22 Thread Robert Stupp (JIRA)
Robert Stupp created CASSANDRA-11633:


 Summary: cqlsh COPY FROM fails with []{} chars in UDT/tuple 
fields/values
 Key: CASSANDRA-11633
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11633
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Stupp
Assignee: Robert Stupp
Priority: Minor


Assuming you have a table with a UDT and the import CSV line looks like this:

{code}
ff92ee2f-2aa1-4008-bba7-5600112233b9,"{udt_field: 'N[24-26', other_field: '24', 
more_data: '}",,some,more,data,follows
{code}

cqlsh COPY FROM raises {{IndexError: tuple index out of range}} with a 
traceback like this:
{code}
  File "/Users/snazy/devel/cassandra/trunk/bin/../pylib/cqlshlib/copyutil.py", 
line 2271, in make_prepared_batch_statement
statement._statements_and_parameters = [(True, query.query_id, 
query.bind(r).values) for r in batch['rows']]
  File 
"/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/query.py",
 line 411, in bind
return BoundStatement(self).bind(values)
  File 
"/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/query.py",
 line 531, in bind
self.values.append(col_spec.type.serialize(value, proto_version))
  File 
"/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cqltypes.py",
 line 686, in serialize
return cls.serialize_safe(val, protocol_version)
  File 
"/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cqltypes.py",
 line 906, in serialize_safe
item = val[i]
{code}

The reason is that {{ImportConversion._get_converter.split}} accidentally 
recognizes square and curly brackets inside quoted strings. The attached patch 
should fix this issue.
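
The general quote-aware scan looks roughly like this (illustrated in Java; the 
actual fix is in cqlsh's Python, and these names are made up):

{code}
import java.util.ArrayList;
import java.util.List;

// Split a UDT/tuple literal on top-level commas: brackets inside
// single-quoted strings must not be counted as nesting, which is
// exactly what the buggy split got wrong.
static List<String> splitTopLevel(String s)
{
    List<String> parts = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    int depth = 0;
    boolean quoted = false;
    for (char c : s.toCharArray())
    {
        if (c == '\'')
            quoted = !quoted;                      // enter/leave quoted string
        else if (!quoted && (c == '[' || c == '{'))
            depth++;
        else if (!quoted && (c == ']' || c == '}'))
            depth--;
        if (c == ',' && depth == 0 && !quoted)
        {
            parts.add(cur.toString());             // top-level separator
            cur.setLength(0);
        }
        else
        {
            cur.append(c);
        }
    }
    parts.add(cur.toString());
    return parts;
}
{code}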



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11633) cqlsh COPY FROM fails with []{} chars in UDT/tuple fields/values

2016-04-22 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11633:
-
Status: Patch Available  (was: Open)

[branch|https://github.com/apache/cassandra/compare/trunk...snazy:11633-cqlsh-split-trunk]
[dtest|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-11633-cqlsh-split-trunk-dtest/lastBuild/]

Fix version 3.x or 3.0.x?

> cqlsh COPY FROM fails with []{} chars in UDT/tuple fields/values
> 
>
> Key: CASSANDRA-11633
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11633
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
>
> Assuming you have a table with a UDT and the import CSV line looks like this:
> {code}
> ff92ee2f-2aa1-4008-bba7-5600112233b9,"{udt_field: 'N[24-26', other_field: 
> '24', more_data: '}",,some,more,data,follows
> {code}
> cqlsh COPY FROM raises {{IndexError: tuple index out of range}} with a 
> traceback like this:
> {code}
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../pylib/cqlshlib/copyutil.py", line 
> 2271, in make_prepared_batch_statement
> statement._statements_and_parameters = [(True, query.query_id, 
> query.bind(r).values) for r in batch['rows']]
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/query.py",
>  line 411, in bind
> return BoundStatement(self).bind(values)
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/query.py",
>  line 531, in bind
> self.values.append(col_spec.type.serialize(value, proto_version))
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cqltypes.py",
>  line 686, in serialize
> return cls.serialize_safe(val, protocol_version)
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cqltypes.py",
>  line 906, in serialize_safe
> item = val[i]
> {code}
> The reason is that {{ImportConversion._get_converter.split}} accidentally 
> recognizes square and curly brackets inside quoted strings. The attached 
> patch should fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11631) cqlsh COPY FROM fails for null values with non-prepared statements

2016-04-22 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11631:
-
Fix Version/s: (was: 3.x)
   2.1.x
   Status: Open  (was: Patch Available)

(canceling patch, requires back port to 2.1)

> cqlsh COPY FROM fails for null values with non-prepared statements
> --
>
> Key: CASSANDRA-11631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11631
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 2.1.x
>
>
> cqlsh's {{COPY FROM ... WITH PREPAREDSTATEMENTS = False}} fails if the row 
> contains null values. The reason is that the {{','.join(r)}} in 
> {{make_non_prepared_batch_statement}} doesn't seem to handle {{None}}, which 
> results in this error message.
> {code}
> Failed to import 1 rows: TypeError - sequence item 2: expected string, 
> NoneType found,  given up without retries
> {code}
> Attached patch should fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11633) cqlsh COPY FROM fails with []{} chars in UDT/tuple fields/values

2016-04-22 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11633:
-
Fix Version/s: 2.1.x
   Status: Open  (was: Patch Available)

(canceling patch, requires back port to 2.1)

> cqlsh COPY FROM fails with []{} chars in UDT/tuple fields/values
> 
>
> Key: CASSANDRA-11633
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11633
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 2.1.x
>
>
> Assuming you have a table with a UDT and the import CSV line looks like this:
> {code}
> ff92ee2f-2aa1-4008-bba7-5600112233b9,"{udt_field: 'N[24-26', other_field: 
> '24', more_data: '}",,some,more,data,follows
> {code}
> cqlsh COPY FROM raises {{IndexError: tuple index out of range}} with a 
> traceback like this:
> {code}
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../pylib/cqlshlib/copyutil.py", line 
> 2271, in make_prepared_batch_statement
> statement._statements_and_parameters = [(True, query.query_id, 
> query.bind(r).values) for r in batch['rows']]
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/query.py",
>  line 411, in bind
> return BoundStatement(self).bind(values)
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/query.py",
>  line 531, in bind
> self.values.append(col_spec.type.serialize(value, proto_version))
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cqltypes.py",
>  line 686, in serialize
> return cls.serialize_safe(val, protocol_version)
>   File 
> "/Users/snazy/devel/cassandra/trunk/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cqltypes.py",
>  line 906, in serialize_safe
> item = val[i]
> {code}
> The reason is that {{ImportConversion._get_converter.split}} accidentally 
> recognizes square and curly brackets inside quoted strings. The attached 
> patch should fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253812#comment-15253812
 ] 

Robert Stupp commented on CASSANDRA-11547:
--

I'm in favor of having this patch. Bad things happen in the wild, and I agree 
with the argument: something that gives an operator an indication that the 
wall clock went too fast or too slow can explain behavior that would otherwise 
be written off as "magic" or "strange".
Making it a bit more configurable along the points mentioned by Sylvain would 
be a good thing.
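
An untested, self-contained sketch of the checker described in the ticket 
below (made-up names and thresholds, not the actual patch):

{code}
public class ClockDriftChecker implements Runnable
{
    private static final long INTERVAL_MS = 10_000; // "n seconds"
    private static final long TOLERANCE_MS = 1_000; // scheduling/GC slack

    public void run()
    {
        long last = System.currentTimeMillis();
        while (!Thread.currentThread().isInterrupted())
        {
            try { Thread.sleep(INTERVAL_MS); }
            catch (InterruptedException e) { return; }
            long now = System.currentTimeMillis();
            long elapsed = now - last;
            // In the unhappy cases, log and bump a metric; stderr stands in here.
            if (elapsed < 0)
                System.err.println("clock jumped backward by " + (-elapsed) + "ms");
            else if (elapsed < INTERVAL_MS)
                System.err.println("clock slow or moved backward: only " + elapsed + "ms elapsed");
            else if (elapsed > INTERVAL_MS + TOLERANCE_MS)
                System.err.println("clock jumped forward: " + elapsed + "ms elapsed");
            last = now;
        }
    }
}
{code}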

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> that the clock-checking thread might not have been scheduled on time, garbage 
> collection pauses, and so on.
> * If more than (n + a small offset) seconds have elapsed, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: NullPointerException if metrics reporter config file doesn't exist

2016-04-22 Thread snazy
Repository: cassandra
Updated Branches:
  refs/heads/trunk a18402ca0 -> 7afc1571b


NullPointerException if metrics reporter config file doesn't exist

patch by Christopher Batey; reviewed by Robert Stupp for CASSANDRA-11544


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7afc1571
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7afc1571
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7afc1571

Branch: refs/heads/trunk
Commit: 7afc1571bb465ba45572f849e6f285182b6c1e7d
Parents: a18402c
Author: Christopher Batey 
Authored: Fri Apr 22 14:21:23 2016 +0200
Committer: Robert Stupp 
Committed: Fri Apr 22 14:21:23 2016 +0200

--
 .../org/apache/cassandra/service/CassandraDaemon.java | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/7afc1571/src/java/org/apache/cassandra/service/CassandraDaemon.java
--
diff --git a/src/java/org/apache/cassandra/service/CassandraDaemon.java 
b/src/java/org/apache/cassandra/service/CassandraDaemon.java
index 160235a..0c233a7 100644
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@ -22,6 +22,7 @@ import java.io.IOException;
 import java.lang.management.ManagementFactory;
 import java.lang.management.MemoryPoolMXBean;
 import java.net.InetAddress;
+import java.net.URL;
 import java.net.UnknownHostException;
 import java.rmi.registry.LocateRegistry;
 import java.rmi.server.RMIServerSocketFactory;
@@ -327,8 +328,13 @@ public class CassandraDaemon
 logger.info("Trying to load metrics-reporter-config from file: 
{}", metricsReporterConfigFile);
 try
 {
-String reportFileLocation = 
CassandraDaemon.class.getClassLoader().getResource(metricsReporterConfigFile).getFile();
-
ReporterConfig.loadFromFile(reportFileLocation).enableAll(CassandraMetricsRegistry.Metrics);
+URL resource = 
CassandraDaemon.class.getClassLoader().getResource(metricsReporterConfigFile);
+if (resource == null) {
+logger.warn("Failed to load metrics-reporter-config, file 
does not exist: {}", metricsReporterConfigFile);
+} else {
+String reportFileLocation = resource.getFile();
+
ReporterConfig.loadFromFile(reportFileLocation).enableAll(CassandraMetricsRegistry.Metrics);
+}
 }
 catch (Exception e)
 {



[jira] [Updated] (CASSANDRA-11544) NullPointerException if metrics reporter config file doesn't exist

2016-04-22 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11544:
-
Reviewer: Robert Stupp

> NullPointerException if metrics reporter config file doesn't exist
> --
>
> Key: CASSANDRA-11544
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11544
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Christopher Batey
>Assignee: Christopher Batey
>Priority: Minor
> Fix For: 3.6
>
> Attachments: 
> 0001-Avoid-NPE-exception-when-metrics-reporter-config-doe.patch
>
>
> Patch attached or at 
> https://github.com/chbatey/cassandra-1/tree/npe-when-metrics-file-not-exist



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11544) NullPointerException if metrics reporter config file doesn't exist

2016-04-22 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11544:
-
   Resolution: Fixed
Fix Version/s: 3.6
   Status: Resolved  (was: Patch Available)

+1

committed as 7afc1571bb465ba45572f849e6f285182b6c1e7d to trunk

> NullPointerException if metrics reporter config file doesn't exist
> --
>
> Key: CASSANDRA-11544
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11544
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Christopher Batey
>Assignee: Christopher Batey
>Priority: Minor
> Fix For: 3.6
>
> Attachments: 
> 0001-Avoid-NPE-exception-when-metrics-reporter-config-doe.patch
>
>
> Patch attached or at 
> https://github.com/chbatey/cassandra-1/tree/npe-when-metrics-file-not-exist



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253840#comment-15253840
 ] 

Alex Petrov commented on CASSANDRA-11547:
-

There are two other issues, possibly related to this one: 
[6680|https://issues.apache.org/jira/browse/CASSANDRA-6680] and 
[9655|https://issues.apache.org/jira/browse/CASSANDRA-9655], which talk more 
about node clocks being out of sync and how that plays together with LWT. 

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> that the clock-checking thread might not have been scheduled on time, garbage 
> collection pauses, and so on.
> * If more than (n + a small offset) seconds have elapsed, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11632) Commitlog corruption

2016-04-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253844#comment-15253844
 ] 

Sylvain Lebresne commented on CASSANDRA-11632:
--

Additional information would be pretty useful here: whether this is easily 
reproducible (and if so, the reproduction steps), and maybe some info on the 
schema of the table involved (the latter, in particular, is needed to make any 
sense of those commit log files).

The error message also says "This may be caused by replaying a mutation against 
a table with the same name but incompatible schema": are you absolutely sure 
you didn't mess up the schema in any way (dropping a table and recreating one 
with the same name but a slightly different schema)?

> Commitlog corruption
> 
>
> Key: CASSANDRA-11632
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11632
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.6 SNAPSHOT
>Reporter: DOAN DuyHai
>
> {noformat}
> INFO  10:00:08 Replaying 
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
>  
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
> ERROR 10:00:08 Exiting due to error while processing commit log during 
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Unexpected error deserializing mutation; saved to 
> /var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat.
>   This may be caused by replaying a mutation against a table with the same 
> name but incompatible schema.  Exception follows: 
> org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 
> 0th field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
> [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
> [main/:na]
> {noformat}
> I'm using the 3.6_SNAPSHOT at _ae063e8 JSON datetime formatting needs 
> timezone_
> The commilog files are downloadable here: 
> https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11537) Give clear error when certain nodetool commands are issued before server is ready

2016-04-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253848#comment-15253848
 ] 

Sylvain Lebresne commented on CASSANDRA-11537:
--

bq. I think the user wants the stack trace. IE if my job is to script a cluster 
wide compaction using nodetool, the 'compact' command should let me know that 
this failed. The best is likely the non 0 result and message bubble all the way 
out to the user.

I completely agree that we want a non-zero return status for the command on such 
an error, but we can totally have that without a stack trace.
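
For illustration, a minimal sketch of that behaviour: print a clean one-line 
error and exit with a failing status, no stack trace. The class and method 
names here are hypothetical, not the actual NodeTool code.

{code:java}
// Hypothetical sketch: report a clean error plus a non-zero exit status,
// without dumping a stack trace on the operator.
public final class NodeToolErrorHandling
{
    public static void main(String[] args)
    {
        try
        {
            runCommand(args); // stand-in for the real command dispatch
        }
        catch (IllegalStateException e)
        {
            // Message only -- no stack trace -- but still a failing exit
            // status that scripts can detect.
            System.err.println("error: " + e.getMessage());
            System.exit(1);
        }
    }

    private static void runCommand(String[] args)
    {
        throw new IllegalStateException("server is still starting up; retry later");
    }
}
{code}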

> Give clear error when certain nodetool commands are issued before server is 
> ready
> -
>
> Key: CASSANDRA-11537
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11537
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
>  Labels: lhf
>
> As an ops person upgrading and servicing Cassandra servers, I require a more 
> clear message when I issue a nodetool command that the server is not ready 
> for it so that I am not confused.
> Technical description:
> If you deploy a new binary, restart, and issue nodetool 
> scrub/compact/updatess etc you get unfriendly assertion. An exception would 
> be easier to understand. Also if a user has turned assertions off it is 
> unclear what might happen. 
> {noformat}
> EC1: Throw exception to make it clear server is still in start up process. 
> :~# nodetool upgradesstables
> error: null
> -- StackTrace --
> java.lang.AssertionError
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:97)
> at 
> org.apache.cassandra.service.StorageService.getValidKeyspace(StorageService.java:2573)
> at 
> org.apache.cassandra.service.StorageService.getValidColumnFamilies(StorageService.java:2661)
> at 
> org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2421)
> {noformat}
> EC1: 
> Patch against 2.1 (branch)
> https://github.com/apache/cassandra/compare/trunk...edwardcapriolo:exception-on-startup?expand=1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11537) Give clear error when certain nodetool commands are issued before server is ready

2016-04-22 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253854#comment-15253854
 ] 

Robert Stupp commented on CASSANDRA-11537:
--

I think the better indicator of whether C* startup has completed is 
{{o.a.c.service.CassandraDaemon#setupCompleted}}.

I'd rather not see a new, custom exception being thrown and serialized to 
nodetool or another management tool. Just stick with the standard Java ones, 
probably IllegalStateException. Java serialization issues could otherwise hide 
the real cause (so, instead of NotInitializedException you'd maybe get a 
strange ClassNotFoundException).

You could also expose {{o.a.c.service.StorageService#isSetupCompleted}} via 
JMX, which would be a nice LHF IMO. This would also allow tools to check this 
status _before_ they actually try to execute a command.
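
As an illustration of the tool-side check, here is a rough sketch of polling 
such a JMX attribute before issuing a command. The object name follows 
Cassandra's usual JMX naming, but the {{SetupCompleted}} attribute is only the 
proposal above, so treat it as an assumption.

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: a management tool could poll the (proposed) SetupCompleted
// attribute before issuing a command, instead of parsing an exception.
public final class SetupCheck
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            boolean ready = (Boolean) conn.getAttribute(ss, "SetupCompleted");
            if (!ready)
            {
                System.err.println("error: server is still starting up");
                System.exit(1);
            }
            // ... safe to execute the actual command here ...
        }
    }
}
{code}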

> Give clear error when certain nodetool commands are issued before server is 
> ready
> -
>
> Key: CASSANDRA-11537
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11537
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
>  Labels: lhf
>
> As an ops person upgrading and servicing Cassandra servers, I require a more 
> clear message when I issue a nodetool command that the server is not ready 
> for it so that I am not confused.
> Technical description:
> If you deploy a new binary, restart, and issue nodetool 
> scrub/compact/updatess etc you get unfriendly assertion. An exception would 
> be easier to understand. Also if a user has turned assertions off it is 
> unclear what might happen. 
> {noformat}
> EC1: Throw exception to make it clear server is still in start up process. 
> :~# nodetool upgradesstables
> error: null
> -- StackTrace --
> java.lang.AssertionError
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:97)
> at 
> org.apache.cassandra.service.StorageService.getValidKeyspace(StorageService.java:2573)
> at 
> org.apache.cassandra.service.StorageService.getValidColumnFamilies(StorageService.java:2661)
> at 
> org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2421)
> {noformat}
> EC1: 
> Patch against 2.1 (branch)
> https://github.com/apache/cassandra/compare/trunk...edwardcapriolo:exception-on-startup?expand=1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11623) Compactions w/ Short Rows Spending Time in getOnDiskFilePointer

2016-04-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253910#comment-15253910
 ] 

Marcus Eriksson commented on CASSANDRA-11623:
-

Nice catch!

It looks like we could just use the 
[chunkOffset|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java#L143]
 directly instead, though? The old getOnDiskFilePointer() should always be 
[exactly the 
same|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java#L104]
 as the chunkOffset.
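
To make that concrete, a toy sketch (not the actual CompressedSequentialWriter) 
of why the chunk offset is cheap: the writer advances a plain long field 
whenever it flushes a chunk, so asking for the on-disk position is a field read 
instead of an lseek() per row.

{code:java}
import java.io.IOException;
import java.io.OutputStream;

// Illustrative only: track the flushed-chunk offset in a field so querying
// the on-disk position never has to touch the file descriptor.
final class OffsetTrackingWriter
{
    private final OutputStream out;
    private long chunkOffset; // advanced whenever a compressed chunk is flushed

    OffsetTrackingWriter(OutputStream out)
    {
        this.out = out;
    }

    void flushChunk(byte[] compressed) throws IOException
    {
        out.write(compressed);
        chunkOffset += compressed.length;
    }

    // Cheap replacement for an lseek-based getOnDiskFilePointer().
    long onDiskFilePointer()
    {
        return chunkOffset;
    }
}
{code}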

> Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
> ---
>
> Key: CASSANDRA-11623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11623
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tom Petracca
>Priority: Minor
> Attachments: compactiontask_profile.png
>
>
> Been doing some performance tuning and profiling of my cassandra cluster and 
> noticed that compaction speeds for my tables that I know to have very short 
> rows were going particularly slowly.  Profiling shows a ton of time being 
> spent in BigTableWriter.getOnDiskFilePointer(), and attaching strace to a 
> CompactionTask shows that a majority of time is being spent lseek (called by 
> getOnDiskFilePointer), and not read or write.
> Going deeper it looks like we call getOnDiskFilePointer each row (sometimes 
> multiple times per row) in order to see if we've reached our expected sstable 
> size and should start a new writer.  This is pretty unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11623) Compactions w/ Short Rows Spending Time in getOnDiskFilePointer

2016-04-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253919#comment-15253919
 ] 

Marcus Eriksson commented on CASSANDRA-11623:
-

Btw, if {{getOnDiskFilePointer}} is expensive, we should probably also 
investigate whether we can remove that {{seekToChunkStart}} call.

> Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
> ---
>
> Key: CASSANDRA-11623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11623
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tom Petracca
>Priority: Minor
> Attachments: compactiontask_profile.png
>
>
> Been doing some performance tuning and profiling of my cassandra cluster and 
> noticed that compaction speeds for my tables that I know to have very short 
> rows were going particularly slowly.  Profiling shows a ton of time being 
> spent in BigTableWriter.getOnDiskFilePointer(), and attaching strace to a 
> CompactionTask shows that a majority of time is being spent lseek (called by 
> getOnDiskFilePointer), and not read or write.
> Going deeper it looks like we call getOnDiskFilePointer each row (sometimes 
> multiple times per row) in order to see if we've reached our expected sstable 
> size and should start a new writer.  This is pretty unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253933#comment-15253933
 ] 

T Jake Luciani commented on CASSANDRA-11547:


I don't see how sleeping and waking up can possibly work in conjunction with 
GC. The idea of jHiccup is to detect latency hits, not clock drift.
Stress, for example, has this notion of [timing | 
https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java]
 which does account for GC, but only in terms of data collected for an interval, 
not actual clock values.

I like the approach of 6680 to do this.  It's about the relative clock drift 
compared to other nodes.  Riemann, for example, uses [this approach| 
http://riemann.io/api/riemann.streams.html#var-clock-skew], which worked well 
for me in cases of ntpd going out on particular hosts.

I don't think we can reliably detect minor differences (10s of millis), but we 
can detect large ones (100s of millis to seconds).
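
A rough sketch of the Riemann-style idea, assuming each node's reported 
wall-clock time is available to the checker: compare every node against the 
median of all reports and only flag large offsets. All names and thresholds 
here are illustrative, not Cassandra APIs.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Flag nodes whose clocks sit far from the cluster-wide median.
final class ClockSkewCheck
{
    static Map<String, Long> skewFromMedian(Map<String, Long> reportedMillis)
    {
        long[] times = reportedMillis.values().stream().mapToLong(Long::longValue).toArray();
        Arrays.sort(times);
        long median = times[times.length / 2];

        Map<String, Long> skew = new HashMap<>();
        reportedMillis.forEach((node, t) -> skew.put(node, t - median));
        return skew;
    }

    public static void main(String[] args)
    {
        Map<String, Long> reported = new HashMap<>();
        reported.put("node1", 1_461_330_000_000L);
        reported.put("node2", 1_461_330_000_120L);
        reported.put("node3", 1_461_329_999_400L); // 600 ms behind the others

        skewFromMedian(reported).forEach((node, s) -> {
            if (Math.abs(s) > 500) // per the comment above: only large skews are detectable
                System.out.println(node + " skewed by " + s + " ms");
        });
    }
}
{code}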

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253933#comment-15253933
 ] 

T Jake Luciani edited comment on CASSANDRA-11547 at 4/22/16 1:35 PM:
-

I don't see how sleeping and waking up can possibly work reliably in 
conjunction with GC. The idea od jHiccup is to detect latency hits not clock 
drift.
Stress for example has this notion of [timing | 
https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java]
 which does account for GC but only in terms of data collected for an interval 
not actual clock values.

I like the approach of 6680 to do this.  It's about the relative clock drift 
compared to other nodes.  Riemann for example uses [this approach| 
http://riemann.io/api/riemann.streams.html#var-clock-skew] which worked well 
for me in cases of ntpd going out on particular hosts.

I don't think we can reliably detect minor differences (10s of millis) but 
large ones (100s of millis - seconds)


was (Author: tjake):
I don't see how sleeping and waking up can possibly work in conjunction with 
GC. The idea id jHiccup is to detect latency hits not clock drift.
Stress for example has this notion of [timing | 
https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java]
 which does account for GC but only in terms of data collected for an interval 
not actual clock values.

I like the approach of 6680 to do this.  It's about the relative clock drift 
compared to other nodes.  Riemann for example uses [this approach| 
http://riemann.io/api/riemann.streams.html#var-clock-skew] which worked well 
for me in cases of ntpd going out on particular hosts.

I don't think we can reliably detect minor differences (10s of millis) but 
large ones (100s of millis - seconds)

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253933#comment-15253933
 ] 

T Jake Luciani edited comment on CASSANDRA-11547 at 4/22/16 1:36 PM:
-

I don't see how sleeping and waking up can possibly work reliably in 
conjunction with GC. The idea of jHiccup is to detect latency hits, not clock 
drift.
Stress, for example, has this notion of [timing | 
https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java]
 which does account for GC, but only in terms of data collected for an interval, 
not actual clock values.

I like the approach of 6680 to do this.  It's about the relative clock drift 
compared to other nodes.  Riemann, for example, uses [this approach| 
http://riemann.io/api/riemann.streams.html#var-clock-skew], which worked well 
for me in cases of ntpd going out on particular hosts.

I don't think we can reliably detect minor differences (10s of millis), but we 
can detect large ones (100s of millis to seconds).


was (Author: tjake):
I don't see how sleeping and waking up can possibly work reliably in 
conjunction with GC. The idea od jHiccup is to detect latency hits not clock 
drift.
Stress for example has this notion of [timing | 
https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java]
 which does account for GC but only in terms of data collected for an interval 
not actual clock values.

I like the approach of 6680 to do this.  It's about the relative clock drift 
compared to other nodes.  Riemann for example uses [this approach| 
http://riemann.io/api/riemann.streams.html#var-clock-skew] which worked well 
for me in cases of ntpd going out on particular hosts.

I don't think we can reliably detect minor differences (10s of millis) but 
large ones (100s of millis - seconds)

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

2016-04-22 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253951#comment-15253951
 ] 

Branimir Lambov commented on CASSANDRA-9669:


Rebased patches with a couple of extra tests:

|[2.2|https://github.com/blambov/cassandra/tree/belliottsmith-9669-2.2-rebased-2]|[utest|http://cassci.datastax.com/job/blambov-belliottsmith-9669-2.2-rebased-2-testall/]|[dtest|http://cassci.datastax.com/job/blambov-belliottsmith-9669-2.2-rebased-2-dtest/]|
|[3.0|https://github.com/blambov/cassandra/tree/belliottsmith-9669-3.0-rebased-2]|[utest|http://cassci.datastax.com/job/blambov-belliottsmith-9669-3.0-rebased-2-testall/]|[dtest|http://cassci.datastax.com/job/blambov-belliottsmith-9669-3.0-rebased-2-dtest/]|

The code looks good to me in both; tests are still running.

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> ---
>
> Key: CASSANDRA-9669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benedict
>Priority: Critical
>  Labels: correctness
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253959#comment-15253959
 ] 

Jack Krupansky commented on CASSANDRA-11547:


It would be nice to have three distinct layers of defense for clock drift:

1. An external monitoring service to alert users when the clocks on a cluster 
may be drifting, and a super-alert when any clock in the cluster gets too far 
out of range. Hopefully this catches and corrects clock drift before the 
cluster gets into trouble.

2. A warning from Cassandra itself if a node's clock gets more than a minor 
threshold out of sync with the majority of the cluster.

3. A strong warning, or even a freeze, if a node's clock is more than a major 
threshold out of sync with the majority of the cluster (see the threshold 
sketch after this list).
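
A minimal sketch of the thresholding behind layers 2 and 3, with made-up names 
and thresholds:

{code:java}
// Map a measured skew against a minor and a major threshold.
enum SkewSeverity { OK, WARN, CRITICAL }

final class SkewPolicy
{
    private final long minorMs;
    private final long majorMs;

    SkewPolicy(long minorMs, long majorMs)
    {
        this.minorMs = minorMs;
        this.majorMs = majorMs;
    }

    SkewSeverity classify(long skewMs)
    {
        long abs = Math.abs(skewMs);
        if (abs > majorMs) return SkewSeverity.CRITICAL; // strong warning (or freeze)
        if (abs > minorMs) return SkewSeverity.WARN;     // minor-threshold warning
        return SkewSeverity.OK;
    }
}
{code}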

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253981#comment-15253981
 ] 

Jason Brown commented on CASSANDRA-11547:
-

bq. [~snazy] Making it a bit more configurable

I thought it was reasonably configurable :). I'd be happy to add more if think 
it's reasonable.

bq. [~tjake] I don't think we can reliably detect minor differences

I agree, and that's why the patch wakes up every five minutes, iirc; the wake 
period is configurable. With this patch, we're not trying to catch things at 
the smallest size, a la jHiccup, but really just want to catch things after 
large enough time distances. The defaults were intended to work around/with 
"large" GC pauses, and we can change the wording for log messages to include 
references to that; but at the end of the day, if you've got 5 minute GC 
pauses, you've got problems anyway.
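
For reference, a minimal sketch of the checker described in this ticket; the 
interval, offset, and names are illustrative rather than the patch's actual 
values.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Wake every n seconds and compare the observed elapsed wall-clock time
// against the expected interval plus a small offset.
final class ClockJumpChecker
{
    private static final long INTERVAL_MS = TimeUnit.MINUTES.toMillis(5); // "wakes up every five minutes"
    private static final long OFFSET_MS = 1_000; // slack for scheduling delay and GC

    private volatile long lastCheckMillis = System.currentTimeMillis();

    void start()
    {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::check, INTERVAL_MS, INTERVAL_MS, TimeUnit.MILLISECONDS);
    }

    private void check()
    {
        long now = System.currentTimeMillis();
        long elapsed = now - lastCheckMillis;
        lastCheckMillis = now;

        if (elapsed < 0)
            System.err.println("WARN: clock jumped backward by " + (-elapsed) + " ms");
        else if (elapsed < INTERVAL_MS)
            System.err.println("WARN: clock running slow or moved backward; only " + elapsed + " ms elapsed");
        else if (elapsed > INTERVAL_MS + OFFSET_MS)
            System.err.println("WARN: clock jumped forward; " + elapsed + " ms elapsed");
        // otherwise: within the acceptable window
    }
}
{code}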

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253982#comment-15253982
 ] 

Robert Stupp commented on CASSANDRA-11547:
--

bq. strong warning or even freeze

I'm not excited about freezing a node if some {{if (clockDrift > X)}} check 
triggers. This can (and in most installations will) lead to a complete outage 
of the cluster.

bq. warning ... out of sync with the majority of the cluster

Is it the majority (quorum?) of all nodes, of all live nodes, or of all 
reachable nodes? I think that is way too complicated.

Issuing a warning, as in this patch, is absolutely fine IMO. If someone wants 
to freeze a node when such a warning is issued, that's still possible by 
monitoring the log file. It's also possible to send an alert by monitoring the 
log file (as many people already do: monitoring the log file for errors & 
warnings).

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253981#comment-15253981
 ] 

Jason Brown edited comment on CASSANDRA-11547 at 4/22/16 2:02 PM:
--

bq. [~snazy] Making it a bit more configurable

I thought it was reasonably configurable :). I'd be happy to add more if we 
think it's reasonable.

bq. [~tjake] I don't think we can reliably detect minor differences

I agree, and that's why the patch wakes up every five minutes, iirc; the wake 
period is configurable. With this patch, we're not trying to catch things at 
the smallest size, a la jHiccup, but really just want to catch things after 
large enough time distances. The defaults were intended to work around/with 
"large" GC pauses, and we can change the wording for log messages to include 
references to that; but at the end of the day, if you've got 5 minute GC 
pauses, you've got problems anyway.


was (Author: jasobrown):
bq. [~snazy] Making it a bit more configurable

I thought it was reasonably configurable :). I'd be happy to add more if think 
it's reasonable.

bq. [~tjake] I don't think we can reliably detect minor differences

I agree, and that's why the patch wakes up every five minutes, iirc; the wake 
period is configurable. With this patch, we're not trying to catch things at 
the smallest size, a la jHiccup, but really just want to catch things after 
large enough time distances. The defaults were intended to work around/with 
"large" GC pauses, and we can change the wording for log messages to include 
references to that; but at the end of the day, if you've got 5 minute GC 
pauses, you've got problems anyway.

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253981#comment-15253981
 ] 

Jason Brown edited comment on CASSANDRA-11547 at 4/22/16 2:03 PM:
--

bq. [~snazy] Making it a bit more configurable

I thought it was reasonably configurable :). I'd be happy to add more if we 
think it's reasonable.

bq. [~tjake] I don't think we can reliably detect minor differences

I agree, and that's why the patch wakes up every five minutes, iirc; the wake 
period is configurable. With this patch, we're not trying to catch things at 
the smallest size, a la jHiccup, but really just want to catch things after 
large enough time distances. The defaults were intended to work around/with 
"large" GC pauses, and we can change the wording for log messages to include 
references to that; but at the end of the day, if you've got 5 minute GC 
pauses, you've got problems anyway.


was (Author: jasobrown):
bq. [~snazy] Making it a bit more configurable

I thought it was reasonably configurable :). I'd be happy to add more if we 
think it's reasonable.

bq. [~tjake] I don't think we can reliably detect minor differences

I agree, and that's why the patch wakes up every five minutes, iirc; the wake 
period is configurable. With this patch, we're not trying to catch things at 
the smallest size, a la jHiccup, but really just want to catch things after 
large enough time distances. The defaults were intended to work around/with 
"large" GC pauses, and we can change the wording for log messages to include 
references to that; but at the end of the day, if you've got 5 minute GC 
pauses, you've got problems anyway.

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253988#comment-15253988
 ] 

Jason Brown commented on CASSANDRA-11547:
-

bq. by monitoring the log file

There are also metrics included with this patch, so you can monitor those more 
easily.
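
For example, a sketch of what such a metric could look like with the Codahale 
metrics library Cassandra already ships; the registry usage and metric name 
here are made up for illustration, not the patch's actual metrics.

{code:java}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

// Increment a counter in the unhappy cases so monitoring systems can alert
// on it instead of (or in addition to) scraping the log.
final class ClockSkewMetrics
{
    private static final MetricRegistry registry = new MetricRegistry();
    private static final Counter clockJumps = registry.counter("ClockSkew.Jumps");

    static void recordJump()
    {
        clockJumps.inc(); // exported via JMX or a reporter, so no log scraping needed
    }
}
{code}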

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253991#comment-15253991
 ] 

Jason Brown commented on CASSANDRA-11547:
-

bq. out of sync with the majority of the cluster

As [~snazy] says, this is a very hard problem. I know of some research going on 
in this area, but it's far from trivial (and far from complete!).

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254015#comment-15254015
 ] 

Jason Brown commented on CASSANDRA-11547:
-

[~tjake] Sorry, I missed your earlier comment about Riemann and "It's about 
the relative clock drift compared to other nodes". I'll think about that a bit 
and comment.

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254030#comment-15254030
 ] 

Stefan Podkowinski commented on CASSANDRA-11547:


bq. With this patch, we're not trying catch things at the smallest size, a la 
jHiccup, but really just want to catch things after large enough time distances.

Is this how clock drift actually happens? I was assuming clocks on different 
systems drift apart slowly over time, rather than just jumping seconds forward 
or back.


> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254037#comment-15254037
 ] 

Jason Brown commented on CASSANDRA-11547:
-

[~spo...@gmail.com] clock drift can happen in many ways: either slowly losing 
syntonization or massive jumps (by hours or centuries). This article was a 
good read, even if not wholly relevant: http://queue.acm.org/detail.cfm?id=2878574

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254037#comment-15254037
 ] 

Jason Brown edited comment on CASSANDRA-11547 at 4/22/16 2:44 PM:
--

[~spo...@gmail.com] clock skew can happen in many ways: either slowly losing 
syntonization or massive jumps (by hours or centuries). This article was a 
good read, even if not wholly relevant: http://queue.acm.org/detail.cfm?id=2878574


was (Author: jasobrown):
[~spo...@gmail.com] clock drift can happen in many ways: either slowly losing 
syntonization or massive jumps (by hours or centuries). This article was a 
good read, even if not wholly relevant: http://queue.acm.org/detail.cfm?id=2878574

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254043#comment-15254043
 ] 

Jason Brown edited comment on CASSANDRA-11547 at 4/22/16 2:47 PM:
--

Hmm, starting to think I should have titled this ticket referencing "clock 
skew" rather than "clock drift". "drift" seems to indicate capturing the 
minuscule differences that arise over time, rather than the more generic 
"skew", which would cover all clock differences, regardless of size. It is the 
larger differences, measured in minutes, that this ticket was intended to 
catch.


was (Author: jasobrown):
Hmm, starting to think I should have titled this ticket referencing "clock 
skew" rather than "clock drift". "drift" seems to indicate capturing the 
minuscule differences that arise over time, rather than the more generic 
"skew", which would cover all clock differences, regardless of size. (It is the 
larger differences, measured in minutes, that this ticket was intended to 
catch.)

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11547) Add background thread to check for clock drift

2016-04-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254043#comment-15254043
 ] 

Jason Brown commented on CASSANDRA-11547:
-

Hmm, starting to think I should have titled this ticket referencing "clock 
skew" rather than "clock drift". "drift" seems to indicate capturing the 
minuscule differences that arise over time, rather than the more generic 
"skew", which would cover all clock differences, regardless of size. (It is the 
larger differences, measured in minutes, that this ticket was intended to 
catch.)

> Add background thread to check for clock drift
> --
>
> Key: CASSANDRA-11547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If n seconds have not elapsed, we know the system clock is running slow or 
> has moved backward (by a value less than n)
> * If (n + a small offset) seconds have elapsed, we can assume we are within 
> an acceptable window of clock movement. Reasons for including an offset are 
> the clock checking thread might not have been scheduled on time, or garbage 
> collection, and so on.
> * If the clock is greater than (n + a small offset) seconds, we can assume 
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11634) Add write timestamp to trace

2016-04-22 Thread Christopher Batey (JIRA)
Christopher Batey created CASSANDRA-11634:
-

 Summary: Add write timestamp to trace
 Key: CASSANDRA-11634
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11634
 Project: Cassandra
  Issue Type: Improvement
  Components: Observability
Reporter: Christopher Batey
Assignee: Christopher Batey
Priority: Minor


Diagnosing issues with clock drift would be easier if the trace included the 
mutation timestamp. I'll add a patch for this soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11635) test-clientutil-jar unit test fails

2016-04-22 Thread Michael Shuler (JIRA)
Michael Shuler created CASSANDRA-11635:
--

 Summary: test-clientutil-jar unit test fails
 Key: CASSANDRA-11635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11635
 Project: Cassandra
  Issue Type: Test
  Components: Testing
Reporter: Michael Shuler
 Fix For: 3.x


{noformat}
test-clientutil-jar:
[junit] Testsuite: org.apache.cassandra.serializers.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
0.314 sec
[junit] 
[junit] Testcase: test(org.apache.cassandra.serializers.ClientUtilsTest):   
Caused an ERROR
[junit] org/apache/cassandra/utils/SigarLibrary
[junit] java.lang.NoClassDefFoundError: 
org/apache/cassandra/utils/SigarLibrary
[junit] at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:328)
[junit] at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:307)
[junit] at 
org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:256)
[junit] at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:39)
[junit] at 
org.apache.cassandra.serializers.ClientUtilsTest.test(ClientUtilsTest.java:56)
[junit] Caused by: java.lang.ClassNotFoundException: 
org.apache.cassandra.utils.SigarLibrary
[junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
[junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[junit] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
[junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.serializers.ClientUtilsTest FAILED

BUILD FAILED
{noformat}

I'll see if I can find a spot where this passes, but it appears to have been 
failing for a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11632) Commitlog corruption

2016-04-22 Thread DOAN DuyHai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOAN DuyHai resolved CASSANDRA-11632.
-
Resolution: Cannot Reproduce

> Commitlog corruption
> 
>
> Key: CASSANDRA-11632
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11632
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.6 SNAPSHOT
>Reporter: DOAN DuyHai
>
> {noformat}
> INFO  10:00:08 Replaying 
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
>  
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
> ERROR 10:00:08 Exiting due to error while processing commit log during 
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Unexpected error deserializing mutation; saved to 
> /var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat.
>   This may be caused by replaying a mutation against a table with the same 
> name but incompatible schema.  Exception follows: 
> org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 
> 0th field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
> [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
> [main/:na]
> {noformat}
> I'm using the 3.6_SNAPSHOT at _ae063e8 JSON datetime formatting needs 
> timezone_
> The commilog files are downloadable here: 
> https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11632) Commitlog corruption

2016-04-22 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254113#comment-15254113
 ] 

DOAN DuyHai commented on CASSANDRA-11632:
-

Ok, I can't reproduce it anymore; it's probably an issue with 
dropping/recreating the schema. Closing it now as not reproducible.

> Commitlog corruption
> 
>
> Key: CASSANDRA-11632
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11632
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.6 SNAPSHOT
>Reporter: DOAN DuyHai
>
> {noformat}
> INFO  10:00:08 Replaying 
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041762.log,
>  
> /Users/archinnovinfo/perso/cassandra/data/commitlog/CommitLog-6-1461260041763.log
> ERROR 10:00:08 Exiting due to error while processing commit log during 
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Unexpected error deserializing mutation; saved to 
> /var/folders/9s/gkchxg6x7qb0vh0k6_cmdl54gn/T/mutation916280897052665587dat.
>   This may be caused by replaying a mutation against a table with the same 
> name but incompatible schema.  Exception follows: 
> org.apache.cassandra.serializers.MarshalException: Not enough bytes to read 
> 0th field java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:611)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:568)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:521)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:407)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:236)
>  [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:193) 
> [main/:na]
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:558)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:687) 
> [main/:na]
> {noformat}
> I'm using the 3.6-SNAPSHOT at ae063e8 (_JSON datetime formatting needs 
> timezone_).
> The commitlog files are downloadable here: 
> https://drive.google.com/file/d/0B6wR2aj4Cb6wUXpZc1dQcmZvb1U/view?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10756) Timeout failures in NativeTransportService.testConcurrentDestroys unit test

2016-04-22 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254114#comment-15254114
 ] 

Michael Shuler commented on CASSANDRA-10756:


I patched trunk at commit 7afc157 and looped over {{ant test 
-Dtest.name=NativeTransportServiceTest}} 80+ times and {{ant test-compression 
-Dtest.name=NativeTransportServiceTest}} 110+ times with no failures. LGTM from 
a test perspective.

> Timeout failures in NativeTransportService.testConcurrentDestroys unit test
> ---
>
> Key: CASSANDRA-10756
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10756
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Alex Petrov
> Fix For: 3.x
>
>
> History of test on trunk 
> [here|http://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/NativeTransportServiceTest/testConcurrentDestroys/history/].
> I've seen these failures across 3.0/trunk for a while. I ran the test looping 
> locally for a while and the timeout is fairly easy to reproduce. The timeout 
> appears to be an indefinite hang and not a timing issue.
> When the timeout occurs, the following stack trace is at the end of the logs 
> for the unit test.
> {code}
> ERROR [ForkJoinPool.commonPool-worker-1] 2015-11-22 21:30:53,635 Failed to 
> submit a listener notification task. Event loop shut down?
> java.util.concurrent.RejectedExecutionException: event executor terminated
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:745)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:322)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:728)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.execute(DefaultPromise.java:671) 
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:641)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138) 
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroupFuture.(DefaultChannelGroupFuture.java:116)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroup.close(DefaultChannelGroup.java:275)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroup.close(DefaultChannelGroup.java:167)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> org.apache.cassandra.transport.Server$ConnectionTracker.closeAll(Server.java:277)
>  [main/:na]
>   at org.apache.cassandra.transport.Server.close(Server.java:180) 
> [main/:na]
>   at org.apache.cassandra.transport.Server.stop(Server.java:116) 
> [main/:na]
>   at java.util.Collections$SingletonSet.forEach(Collections.java:4767) 
> ~[na:1.8.0_60]
>   at 
> org.apache.cassandra.service.NativeTransportService.stop(NativeTransportService.java:136)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.NativeTransportService.destroy(NativeTransportService.java:144)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.NativeTransportServiceTest.lambda$withService$102(NativeTransportServiceTest.java:201)
>  ~[classes/:na]
>   at java.util.stream.IntPipeline$3$1.accept(IntPipeline.java:233) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>  ~[na:1.8.0_60]
>   at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) 
> ~[na:1.8.0_60]
>   at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) 
> ~[na:1.8.0_60]
>   at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) 
> ~[na:1.8.0_60]
>   at java.util.stream.AbstractTask.compute(AbstractTask.java:316) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) 
> ~[na:1.8.0_60]
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.jav

[jira] [Commented] (CASSANDRA-11634) Add write timestamp to trace

2016-04-22 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254115#comment-15254115
 ] 

Aleksey Yeschenko commented on CASSANDRA-11634:
---

Just beware that there is no such thing as a 'mutation timestamp'. You can 
have multiple rows/cells in the same {{Mutation}}, all with different 
timestamps.
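
For anyone wondering what that looks like from the client side, here is a 
minimal sketch (hypothetical keyspace/table, DataStax Java driver style) of a 
single write whose cells end up with different timestamps:

{code}
// Hypothetical sketch: one batch producing a single partition update whose
// two cells carry different write timestamps -- so there is no single
// "mutation timestamp" a trace could report.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class PerCellTimestamps
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            session.execute(
                "BEGIN BATCH " +
                "INSERT INTO ks.t (id, a) VALUES (1, 'x') USING TIMESTAMP 1000; " +
                "INSERT INTO ks.t (id, b) VALUES (1, 'y') USING TIMESTAMP 2000; " +
                "APPLY BATCH");
        }
    }
}
{code}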

> Add write timestamp to trace
> 
>
> Key: CASSANDRA-11634
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11634
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Christopher Batey
>Assignee: Christopher Batey
>Priority: Minor
>
> Diagnosing issues with clock drift would be easier if trace had the mutation 
> timestamp. I'll add a patch for this soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10756) Timeout failures in NativeTransportService.testConcurrentDestroys unit test

2016-04-22 Thread Michael Shuler (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Shuler updated CASSANDRA-10756:
---
Tester: Michael Shuler

> Timeout failures in NativeTransportService.testConcurrentDestroys unit test
> ---
>
> Key: CASSANDRA-10756
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10756
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Alex Petrov
> Fix For: 3.x
>
>
> History of test on trunk 
> [here|http://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/NativeTransportServiceTest/testConcurrentDestroys/history/].
> I've seen these failures across 3.0/trunk for a while. I ran the test looping 
> locally for a while and the timeout is fairly easy to reproduce. The timeout 
> appears to be an indefinite hang and not a timing issue.
> When the timeout occurs, the following stack trace is at the end of the logs 
> for the unit test.
> {code}
> ERROR [ForkJoinPool.commonPool-worker-1] 2015-11-22 21:30:53,635 Failed to 
> submit a listener notification task. Event loop shut down?
> java.util.concurrent.RejectedExecutionException: event executor terminated
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:745)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:322)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:728)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.execute(DefaultPromise.java:671) 
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:641)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138) 
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroupFuture.(DefaultChannelGroupFuture.java:116)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroup.close(DefaultChannelGroup.java:275)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.group.DefaultChannelGroup.close(DefaultChannelGroup.java:167)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> org.apache.cassandra.transport.Server$ConnectionTracker.closeAll(Server.java:277)
>  [main/:na]
>   at org.apache.cassandra.transport.Server.close(Server.java:180) 
> [main/:na]
>   at org.apache.cassandra.transport.Server.stop(Server.java:116) 
> [main/:na]
>   at java.util.Collections$SingletonSet.forEach(Collections.java:4767) 
> ~[na:1.8.0_60]
>   at 
> org.apache.cassandra.service.NativeTransportService.stop(NativeTransportService.java:136)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.NativeTransportService.destroy(NativeTransportService.java:144)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.NativeTransportServiceTest.lambda$withService$102(NativeTransportServiceTest.java:201)
>  ~[classes/:na]
>   at java.util.stream.IntPipeline$3$1.accept(IntPipeline.java:233) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>  ~[na:1.8.0_60]
>   at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) 
> ~[na:1.8.0_60]
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) 
> ~[na:1.8.0_60]
>   at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) 
> ~[na:1.8.0_60]
>   at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) 
> ~[na:1.8.0_60]
>   at java.util.stream.AbstractTask.compute(AbstractTask.java:316) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) 
> ~[na:1.8.0_60]
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) 
> ~[na:1.8.0_60]
>   at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ForkJoin

[jira] [Updated] (CASSANDRA-11574) cqlsh: COPY FROM throws TypeError with Cython extensions enabled

2016-04-22 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-11574:

Fix Version/s: (was: 2.1.x)
   2.1.15

> cqlsh: COPY FROM throws TypeError with Cython extensions enabled
> 
>
> Key: CASSANDRA-11574
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11574
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Operating System: Ubuntu Server 14.04
> JDK: Oracle JDK 8 update 77
> Python: 2.7.6
>Reporter: Mahafuzur Rahman
>Assignee: Stefania
>  Labels: cqlsh
> Fix For: 2.1.15, 2.2.6, 3.6, 3.0.6
>
>
> Any COPY FROM command in cqlsh is throwing the following error:
> "get_num_processes() takes no keyword arguments"
> Example command: 
> COPY inboxdata 
> (to_user_id,to_user_network,created_time,attachments,from_user_id,from_user_name,from_user_network,id,message,to_user_name,updated_time)
>  FROM 'inbox.csv';
> Similar commands worked perfectly in previous versions such as 3.0.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11623) Compactions w/ Short Rows Spending Time in getOnDiskFilePointer

2016-04-22 Thread Tom Petracca (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254135#comment-15254135
 ] 

Tom Petracca commented on CASSANDRA-11623:
--

Agreed on chunkOffset always equaling getOnDiskFilePointer().  I went the 
route I did because the whole reason getOnDiskFilePointer was being called in 
the first place was to know how big the file would be if we stopped writing 
(in other words, should I start a new sstable?), so I included the impact the 
buffer would have on the eventual on-disk size.  I'm happy to switch it to 
just chunkOffset though, because at the end of the day none of it needs to be 
exact.  Either way it would stay in that getEffectiveOnDiskBytes method, 
because for the reasons stated in the next paragraph I think the 
getOnDiskFilePointer method itself still needs to hit lseek directly for 
CompressedSequentialWriters.

And a while back I tried removing the seekToChunkStart call on 1.2 (because 
presumably it doesn't need to exist), but it ended up causing weirdness where 
truncate calls would corrupt sstables.  I didn't dig into it any further or 
try it on later versions, though.  It gets called far less frequently (only on 
every flush, as opposed to every row write), so for now I'd say it's not 
important.
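
To make the idea concrete, here is a minimal sketch of what such an estimate 
could look like. The field names are assumptions for illustration only, not 
the actual CompressedSequentialWriter internals or the attached patch:

{code}
// A sketch of approximating eventual on-disk size without an lseek per row:
// track the file offset of the last flushed compressed chunk and add an
// estimate for the data still sitting in the buffer.
class ApproximateOnDiskSize
{
    long chunkOffset;        // file position after the last flushed compressed chunk
    long bufferedBytes;      // uncompressed bytes not yet flushed
    double compressionRatio; // running estimate from the chunks written so far

    // Cheap per-row check: only flushes move chunkOffset, so no syscall is
    // needed to decide whether to start a new sstable.
    long getEffectiveOnDiskBytes()
    {
        return chunkOffset + (long) (bufferedBytes * compressionRatio);
    }
}
{code}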

> Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
> ---
>
> Key: CASSANDRA-11623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11623
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tom Petracca
>Priority: Minor
> Attachments: compactiontask_profile.png
>
>
> Been doing some performance tuning and profiling of my cassandra cluster and 
> noticed that compaction speeds for my tables that I know to have very short 
> rows were going particularly slowly.  Profiling shows a ton of time being 
> spent in BigTableWriter.getOnDiskFilePointer(), and attaching strace to a 
> CompactionTask shows that a majority of time is being spent in lseek (called by 
> getOnDiskFilePointer), and not read or write.
> Going deeper it looks like we call getOnDiskFilePointer each row (sometimes 
> multiple times per row) in order to see if we've reached our expected sstable 
> size and should start a new writer.  This is pretty unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11635) test-clientutil-jar unit test fails

2016-04-22 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254176#comment-15254176
 ] 

Michael Shuler commented on CASSANDRA-11635:


{noformat}
47e8ef9e9ce70c54115681f854f483a53992c988 is the first bad commit
commit 47e8ef9e9ce70c54115681f854f483a53992c988
Author: Sylvain Lebresne 
Date:   Fri Jan 15 15:25:03 2016 +0100

Make UUID LSB unique per-process

patch by slebresne; reviewed by benedict for CASSANDRA-7925

:100644 100644 2bfba80011c5b6c5db1508772ef54de7a75fbcc2 
f571c295cccb0be22bf46526cc93d890811550ac M  CHANGES.txt
:04 04 4b557fdd87eccbc61931d6ec52ba7c511adb47a0 
e807214b140475a51f628deac65b8f8c60b77541 M  src
bisect run success
{noformat}

> test-clientutil-jar unit test fails
> ---
>
> Key: CASSANDRA-11635
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11635
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Michael Shuler
>  Labels: unittest
> Fix For: 3.x
>
>
> {noformat}
> test-clientutil-jar:
> [junit] Testsuite: org.apache.cassandra.serializers.ClientUtilsTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 0.314 sec
> [junit] 
> [junit] Testcase: test(org.apache.cassandra.serializers.ClientUtilsTest): 
>   Caused an ERROR
> [junit] org/apache/cassandra/utils/SigarLibrary
> [junit] java.lang.NoClassDefFoundError: 
> org/apache/cassandra/utils/SigarLibrary
> [junit] at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:328)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:307)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:256)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:39)
> [junit] at 
> org.apache.cassandra.serializers.ClientUtilsTest.test(ClientUtilsTest.java:56)
> [junit] Caused by: java.lang.ClassNotFoundException: 
> org.apache.cassandra.utils.SigarLibrary
> [junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> [junit] at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.serializers.ClientUtilsTest FAILED
> BUILD FAILED
> {noformat}
> I'll see if I can find a spot where this passes, but it appears to have been 
> failing for a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11615) cassandra-stress blocks when connecting to a big cluster

2016-04-22 Thread Andy Tolbert (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254175#comment-15254175
 ] 

Andy Tolbert commented on CASSANDRA-11615:
--

Was digging into this with [~eduard.tudenhoefner], and I suspect this is being 
caused by [JAVA-1002|https://datastax-oss.atlassian.net/browse/JAVA-1002], 
which will be fixed in 3.0.1.  I tried this out with a 100 node simulated 
cluster (not using stress in this case) and a single-threaded netty event loop 
group in the driver (to amplify the impact), and timed how long a 
session.prepare takes when the keyspace is set on the connection.  It took 
203ms with the fix for JAVA-1002; without it, it takes a very long time.  I 
think this is the source of the issue, but we'll need to confirm when we get a 
large cluster up again in the next week or so.

{noformat}
3.0.0 - 100 nodes, no keyspace set on session - 206ms

42776  [main] INFO  OneHundredNodeSimulation - Done Initing Cluster..Preparing 
Statement
42982  [main] INFO  OneHundredNodeSimulation - Done Preparing 
Statement...making query

3.0.0 - 100 nodes, keyspace set on session - too long..ms

46276  [main] INFO  OneHundredNodeSimulation - Done Initing Cluster..Preparing 
Statement
58429  [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.1:9042-3, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
70510  [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.3:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
82609  [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.4:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
94725  [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.5:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
106818 [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.6:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
118908 [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.7:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
131008 [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.8:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
143109 [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.9:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
155207 [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.10:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
167308 [cluster1-nio-worker-0] WARN  com.datastax.driver.core.Connection - 
Timeout while setting keyspace on Connection[/127.0.1.11:9042-1, inFlight=1, 
closed=false]. This should not happen but is not critical (it will be retried)
...

3.0.1rc (has JAVA-1002 fix) - 100 nodes, keyspace set on session - 203ms

46000  [main] INFO  OneHundredNodeSimulation - Done Initing Cluster..Preparing 
Statement
46203  [main] INFO  OneHundredNodeSimulation - Done Preparing 
Statement...making query
{noformat}
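
For anyone wanting to reproduce the measurement, a rough sketch (assumptions: 
a reachable node, an existing keyspace {{ks}}, and a hypothetical table {{t}}):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class PrepareTiming
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks")) // keyspace set on the session
        {
            long start = System.nanoTime();
            session.prepare("SELECT * FROM t WHERE id = ?");
            System.out.printf("prepare took %d ms%n",
                              (System.nanoTime() - start) / 1_000_000);
        }
    }
}
{code}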

> cassandra-stress blocks when connecting to a big cluster
> 
>
> Key: CASSANDRA-11615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11615
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.x
>
> Attachments: 11615-3.0-2nd.patch, 11615-3.0.patch
>
>
> I had a *100* node cluster and was running 
> {code}
> cassandra-stress read n=100 no-warmup cl=LOCAL_QUORUM -rate 'threads=20' 
> 'limit=1000/s'
> {code}
> Based on the thread dump it looks like it's been blocked at 
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L96
> {code}
> "Thread-20" #245 prio=5 os_prio=0 tid=0x7f3781822000 nid=0x46c4 waiting 
> for monitor entry [0x7f36cc788000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.

[jira] [Commented] (CASSANDRA-11635) test-clientutil-jar unit test fails

2016-04-22 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254179#comment-15254179
 ] 

Michael Shuler commented on CASSANDRA-11635:


Looking at the commit that bisect shows, this makes sense to me.

> test-clientutil-jar unit test fails
> ---
>
> Key: CASSANDRA-11635
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11635
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Michael Shuler
>  Labels: unittest
> Fix For: 3.x
>
>
> {noformat}
> test-clientutil-jar:
> [junit] Testsuite: org.apache.cassandra.serializers.ClientUtilsTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 0.314 sec
> [junit] 
> [junit] Testcase: test(org.apache.cassandra.serializers.ClientUtilsTest): 
>   Caused an ERROR
> [junit] org/apache/cassandra/utils/SigarLibrary
> [junit] java.lang.NoClassDefFoundError: 
> org/apache/cassandra/utils/SigarLibrary
> [junit] at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:328)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:307)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:256)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:39)
> [junit] at 
> org.apache.cassandra.serializers.ClientUtilsTest.test(ClientUtilsTest.java:56)
> [junit] Caused by: java.lang.ClassNotFoundException: 
> org.apache.cassandra.utils.SigarLibrary
> [junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> [junit] at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.serializers.ClientUtilsTest FAILED
> BUILD FAILED
> {noformat}
> I'll see if I can find a spot where this passes, but it appears to have been 
> failing for a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11621) Stack Overflow inserting value with many columns

2016-04-22 Thread Andrew Jefferson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254180#comment-15254180
 ] 

Andrew Jefferson commented on CASSANDRA-11621:
--

I know you already closed this, but I thought I would shout out that I grabbed 
the latest 2.2.6-SNAPSHOT from Jenkins and the bug no longer occurs for me.

Amazing turnaround!
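
For anyone digging into the trace on this ticket: the repeated 
{{Iterators$5.hasNext}} frames look like deeply nested Guava iterator 
concatenation. A standalone illustration of the failure mode (a hypothetical 
demo, not the Cassandra code path itself) is below; concatenating all the 
pieces in a single call keeps the chain flat:

{code}
import com.google.common.collect.Iterators;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class ConcatDepthDemo
{
    public static void main(String[] args)
    {
        // Pairwise wrapping builds a ~4000-deep chain of hasNext() frames,
        // one per column, matching the repeated line in the stack trace.
        Iterator<String> nested = Collections.<String>emptyIterator();
        for (int i = 0; i < 4000; i++)
            nested = Iterators.concat(nested, Iterators.singletonIterator("c" + i));
        // nested.hasNext() can now throw StackOverflowError on default stacks.

        // Concatenating all pieces in one call keeps the chain flat:
        List<Iterator<String>> parts = new ArrayList<>();
        for (int i = 0; i < 4000; i++)
            parts.add(Iterators.singletonIterator("c" + i));
        Iterator<String> flat = Iterators.concat(parts.iterator());
        System.out.println(Iterators.size(flat)); // 4000
    }
}
{code}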

> Stack Overflow inserting value with many columns
> 
>
> Key: CASSANDRA-11621
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11621
> Project: Cassandra
>  Issue Type: Bug
> Environment: CQL 3
> C* 2.2.5
>Reporter: Andrew Jefferson
>Assignee: Alex Petrov
> Fix For: 2.2.7
>
>
> I am using CQL to insert into a table that has ~4000 columns
> {code}
>   TABLE_DEFINITION = "
>   id uuid,
>   "dimension_n" for n in _.range(N_DIMENSIONS)
>   ETAG timeuuid,
>   PRIMARY KEY (id)
> "
> {code}
> I am using the node.js library from Datastax to execute CQL. This creates a 
> prepared statement and then uses it to perform an insert. It works fine on C* 
> 2.1 but after upgrading to C* 2.2.5 I get the stack overflow below.
> I know enough Java to think that recursing an iterator is bad form and should 
> be easy to fix.
> {code}
> ERROR 14:59:01 Unexpected exception during request; channel = [id: 
> 0xaac42a5d, /10.0.7.182:58736 => /10.0.0.87:9042]
> java.lang.StackOverflowError: null
>   at 
> com.google.common.base.Preconditions.checkPositionIndex(Preconditions.java:339)
>  ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.AbstractIndexedListIterator.(AbstractIndexedListIterator.java:69)
>  ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$11.(Iterators.java:1048) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators.forArray(Iterators.java:1048) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.RegularImmutableList.listIterator(RegularImmutableList.java:106)
>  ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.listIterator(ImmutableList.java:344) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.iterator(ImmutableList.java:340) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.iterator(ImmutableList.java:61) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables.iterators(Iterables.java:504) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables.access$100(Iterables.java:60) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$2.iterator(Iterables.java:494) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$3.transform(Iterables.java:508) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$3.transform(Iterables.java:505) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
>  ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:543) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
> ...
> at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
> at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
> at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
> at 
> org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:168)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:223)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:257) 
> ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:242) 
> ~[main/:na]
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:123)
>  ~[main/:na]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507)
>  [main/:na]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401)
>  [main/:na]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.ja

[jira] [Commented] (CASSANDRA-11635) test-clientutil-jar unit test fails

2016-04-22 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254185#comment-15254185
 ] 

Michael Shuler commented on CASSANDRA-11635:


Setting fixver from git without testing them all.

{noformat}
$ git branch -r --contains 47e8ef9
  origin/HEAD -> origin/trunk
  origin/cassandra-2.2
  origin/cassandra-3.0
  origin/cassandra-3.5
  origin/trunk
{noformat}

> test-clientutil-jar unit test fails
> ---
>
> Key: CASSANDRA-11635
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11635
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Michael Shuler
>  Labels: unittest
> Fix For: 3.x
>
>
> {noformat}
> test-clientutil-jar:
> [junit] Testsuite: org.apache.cassandra.serializers.ClientUtilsTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 0.314 sec
> [junit] 
> [junit] Testcase: test(org.apache.cassandra.serializers.ClientUtilsTest): 
>   Caused an ERROR
> [junit] org/apache/cassandra/utils/SigarLibrary
> [junit] java.lang.NoClassDefFoundError: 
> org/apache/cassandra/utils/SigarLibrary
> [junit] at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:328)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:307)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:256)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:39)
> [junit] at 
> org.apache.cassandra.serializers.ClientUtilsTest.test(ClientUtilsTest.java:56)
> [junit] Caused by: java.lang.ClassNotFoundException: 
> org.apache.cassandra.utils.SigarLibrary
> [junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> [junit] at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.serializers.ClientUtilsTest FAILED
> BUILD FAILED
> {noformat}
> I'll see if I can find a spot where this passes, but it appears to have been 
> failing for a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11635) test-clientutil-jar unit test fails

2016-04-22 Thread Michael Shuler (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Shuler updated CASSANDRA-11635:
---
Fix Version/s: 3.0.x
   2.2.x

> test-clientutil-jar unit test fails
> ---
>
> Key: CASSANDRA-11635
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11635
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Michael Shuler
>  Labels: unittest
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> {noformat}
> test-clientutil-jar:
> [junit] Testsuite: org.apache.cassandra.serializers.ClientUtilsTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 0.314 sec
> [junit] 
> [junit] Testcase: test(org.apache.cassandra.serializers.ClientUtilsTest): 
>   Caused an ERROR
> [junit] org/apache/cassandra/utils/SigarLibrary
> [junit] java.lang.NoClassDefFoundError: 
> org/apache/cassandra/utils/SigarLibrary
> [junit] at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:328)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:307)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:256)
> [junit] at 
> org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:39)
> [junit] at 
> org.apache.cassandra.serializers.ClientUtilsTest.test(ClientUtilsTest.java:56)
> [junit] Caused by: java.lang.ClassNotFoundException: 
> org.apache.cassandra.utils.SigarLibrary
> [junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> [junit] at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.serializers.ClientUtilsTest FAILED
> BUILD FAILED
> {noformat}
> I'll see if I can find a spot where this passes, but it appears to have been 
> failing for a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11628) Fix the regression to CASSANDRA-3983 that got introduced by CASSANDRA-10679

2016-04-22 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-11628:
---
 Reviewer: Yuki Morishita
Fix Version/s: 3.x
   3.0.x
   2.2.x
   2.1.x

> Fix the regression to CASSANDRA-3983 that got introduced by CASSANDRA-10679
> ---
>
> Key: CASSANDRA-11628
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11628
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Wei Deng
>Assignee: Wei Deng
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> It appears that the commit from CASSANDRA-10679 accidentally cancelled out 
> the effect that was originally intended by CASSANDRA-3983. In this case, we 
> would like to address the following situation:
> When you already have a C* package installed (which will deploy a file as 
> /usr/share/cassandra/cassandra.in.sh), but also attempt to run from a binary 
> download from http://cassandra.apache.org/download/, many tools like 
> cassandra-stress, sstablescrub, et al. will search the packaged dir 
> (/usr/share/cassandra/cassandra.in.sh) for 'cassandra.in.sh' before searching 
> the dir in your binary download or source build. We should reverse the order 
> of that search so it checks locally first. Otherwise you will encounter some 
> error like the following:
> {noformat}
> root@node0:~/apache-cassandra-3.6-SNAPSHOT# tools/bin/cassandra-stress -h
> Error: Could not find or load main class org.apache.cassandra.stress.Stress
> {noformat}
> {noformat}
> root@node0:~/apache-cassandra-3.6-SNAPSHOT# bin/sstableverify -h
> Error: Could not find or load main class 
> org.apache.cassandra.tools.StandaloneVerifier
> {noformat}
> The goal for CASSANDRA-10679 is still a good one: "For the most part all of 
> our shell scripts do the same thing, load the cassandra.in.sh and then call 
> something out of a jar. They should all look the same." But in this case, we 
> should correct them all to look the same while making them check the local 
> dir first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11615) cassandra-stress blocks when connecting to a big cluster

2016-04-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254220#comment-15254220
 ] 

T Jake Luciani commented on CASSANDRA-11615:


I'd prefer the upstream fix to adding this workaround option. Agreed?


> cassandra-stress blocks when connecting to a big cluster
> 
>
> Key: CASSANDRA-11615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11615
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.x
>
> Attachments: 11615-3.0-2nd.patch, 11615-3.0.patch
>
>
> I had a *100* node cluster and was running 
> {code}
> cassandra-stress read n=100 no-warmup cl=LOCAL_QUORUM -rate 'threads=20' 
> 'limit=1000/s'
> {code}
> Based on the thread dump it looks like it's been blocked at 
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L96
> {code}
> "Thread-20" #245 prio=5 os_prio=0 tid=0x7f3781822000 nid=0x46c4 waiting 
> for monitor entry [0x7f36cc788000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> "Thread-19" #244 prio=5 os_prio=0 tid=0x7f378182 nid=0x46c3 waiting 
> for monitor entry [0x7f36cc889000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> {code}
> I was trying the same with a smaller cluster (50 nodes) and it was 
> working fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11199) rolling_upgrade_with_internode_ssl_test flaps, timing out waiting for node to start

2016-04-22 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch resolved CASSANDRA-11199.

Resolution: Not A Problem

Looks likely this cleared up around the same time as CASSANDRA-11162. Closing 
for now; we can reopen if there's a recurrence.

> rolling_upgrade_with_internode_ssl_test flaps, timing out waiting for node to 
> start
> ---
>
> Key: CASSANDRA-11199
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11199
> Project: Cassandra
>  Issue Type: Test
>Reporter: Jim Witschey
>Assignee: Russ Hatch
>  Labels: dtest
>
> Here's an example of this failure:
> http://cassci.datastax.com/job/upgrade_tests-all/9/testReport/junit/upgrade_tests.upgrade_through_versions_test/ProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/rolling_upgrade_with_internode_ssl_test/
> And here are the two particular test I've seen flap:
> http://cassci.datastax.com/job/upgrade_tests-all/9/testReport/upgrade_tests.upgrade_through_versions_test/ProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/rolling_upgrade_with_internode_ssl_test/history/
> http://cassci.datastax.com/job/upgrade_tests-all/9/testReport/upgrade_tests.upgrade_through_versions_test/ProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/rolling_upgrade_with_internode_ssl_test/history/
> I haven't reproduced this locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11615) cassandra-stress blocks when connecting to a big cluster

2016-04-22 Thread Andy Tolbert (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254245#comment-15254245
 ] 

Andy Tolbert edited comment on CASSANDRA-11615 at 4/22/16 5:18 PM:
---

I agree. I proposed using setPrepareOnAllHosts(false) as it is a good 
workaround for testing, but now that we understand the issue better and know 
it is on the driver side, it would be better to incorporate java-driver 3.0.1 
when it is released (no definitive date on that yet, but it should be in the 
coming weeks).


was (Author: andrew.tolbert):
I agree, I proposed using setPrepareOnAllHosts as it is a good work around for 
testing, but now that I think we understand the issue more and that it is on 
the driver side, it would be better to incorporate java-driver 3.0.1 when it is 
released (no definitive date on that yet, but should be in coming weeks).

> cassandra-stress blocks when connecting to a big cluster
> 
>
> Key: CASSANDRA-11615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11615
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.x
>
> Attachments: 11615-3.0-2nd.patch, 11615-3.0.patch
>
>
> I had a *100* node cluster and was running 
> {code}
> cassandra-stress read n=100 no-warmup cl=LOCAL_QUORUM -rate 'threads=20' 
> 'limit=1000/s'
> {code}
> Based on the thread dump it looks like it's been blocked at 
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L96
> {code}
> "Thread-20" #245 prio=5 os_prio=0 tid=0x7f3781822000 nid=0x46c4 waiting 
> for monitor entry [0x7f36cc788000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> "Thread-19" #244 prio=5 os_prio=0 tid=0x7f378182 nid=0x46c3 waiting 
> for monitor entry [0x7f36cc889000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> {code}
> I was trying the same with a smaller cluster (50 nodes) and it was 
> working fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11615) cassandra-stress blocks when connecting to a big cluster

2016-04-22 Thread Andy Tolbert (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254245#comment-15254245
 ] 

Andy Tolbert commented on CASSANDRA-11615:
--

I agree, I proposed using setPrepareOnAllHosts as it is a good work around for 
testing, but now that I think we understand the issue more and that it is on 
the driver side, it would be better to incorporate java-driver 3.0.1 when it is 
released (no definitive date on that yet, but should be in coming weeks).
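
For completeness, a driver-side sketch of the setPrepareOnAllHosts workaround 
mentioned above (contact point invented; this is the testing workaround, not 
the eventual fix):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;

public class PrepareWorkaround
{
    public static void main(String[] args)
    {
        // Skip re-preparing each statement on every host up front;
        // hosts will instead prepare lazily on first use.
        Cluster cluster = Cluster.builder()
                                 .addContactPoint("127.0.0.1")
                                 .withQueryOptions(new QueryOptions().setPrepareOnAllHosts(false))
                                 .build();
        cluster.close();
    }
}
{code}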

> cassandra-stress blocks when connecting to a big cluster
> 
>
> Key: CASSANDRA-11615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11615
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.x
>
> Attachments: 11615-3.0-2nd.patch, 11615-3.0.patch
>
>
> I had a *100* node cluster and was running 
> {code}
> cassandra-stress read n=100 no-warmup cl=LOCAL_QUORUM -rate 'threads=20' 
> 'limit=1000/s'
> {code}
> Based on the thread dump it looks like it's been blocked at 
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L96
> {code}
> "Thread-20" #245 prio=5 os_prio=0 tid=0x7f3781822000 nid=0x46c4 waiting 
> for monitor entry [0x7f36cc788000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> "Thread-19" #244 prio=5 os_prio=0 tid=0x7f378182 nid=0x46c3 waiting 
> for monitor entry [0x7f36cc889000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
> - waiting to lock <0x0005c003d920> (a 
> java.util.concurrent.ConcurrentHashMap)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
> at 
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
> at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> {code}
> I was trying the same with a smaller cluster (50 nodes) and it was 
> working fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11499) dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup

2016-04-22 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch reassigned CASSANDRA-11499:
--

Assignee: Russ Hatch  (was: DS Test Eng)

> dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup
> --
>
> Key: CASSANDRA-11499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11499
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: Russ Hatch
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/341/testReport/commitlog_test/TestCommitLog/test_commitlog_replay_on_startup
> Failed on CassCI build trunk_novnode_dtest #341
> {noformat}
> Error Message
> 03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:
> INFO  [main] 2016-04-03 16:22:39,826 YamlConfigura.
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-1UTelU
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}
> dtest: DEBUG: Insert data
> dtest: DEBUG: Verify data is present
> dtest: DEBUG: Stop node abruptly
> dtest: DEBUG: Verify commitlog was written before abrupt stop
> dtest: DEBUG: Verify no SSTables were flushed before abrupt stop
> dtest: DEBUG: Verify commit log was replayed on startup
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/commitlog_test.py", line 193, in 
> test_commitlog_replay_on_startup
> node1.watch_log_for("Log replay complete")
>   File "/home/automaton/ccm/ccmlib/node.py", line 425, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:\nINFO  [main] 
> 2016-04-03 16:22:39,826 YamlConfigura.\nSee system.log for 
> remainder\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-1UTelU\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}\ndtest: 
> DEBUG: Insert data\ndtest: DEBUG: Verify data is present\ndtest: DEBUG: Stop 
> node abruptly\ndtest: DEBUG: Verify commitlog was written before abrupt 
> stop\ndtest: DEBUG: Verify no SSTables were flushed before abrupt 
> stop\ndtest: DEBUG: Verify commit log was replayed on 
> startup\n- >> end captured logging << 
> -"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11499) dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup

2016-04-22 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254317#comment-15254317
 ] 

Russ Hatch commented on CASSANDRA-11499:


Running a bulk job here: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/81/

> dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup
> --
>
> Key: CASSANDRA-11499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11499
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: Russ Hatch
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/341/testReport/commitlog_test/TestCommitLog/test_commitlog_replay_on_startup
> Failed on CassCI build trunk_novnode_dtest #341
> {noformat}
> Error Message
> 03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:
> INFO  [main] 2016-04-03 16:22:39,826 YamlConfigura.
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-1UTelU
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}
> dtest: DEBUG: Insert data
> dtest: DEBUG: Verify data is present
> dtest: DEBUG: Stop node abruptly
> dtest: DEBUG: Verify commitlog was written before abrupt stop
> dtest: DEBUG: Verify no SSTables were flushed before abrupt stop
> dtest: DEBUG: Verify commit log was replayed on startup
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/commitlog_test.py", line 193, in 
> test_commitlog_replay_on_startup
> node1.watch_log_for("Log replay complete")
>   File "/home/automaton/ccm/ccmlib/node.py", line 425, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:\nINFO  [main] 
> 2016-04-03 16:22:39,826 YamlConfigura.\nSee system.log for 
> remainder\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-1UTelU\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}\ndtest: 
> DEBUG: Insert data\ndtest: DEBUG: Verify data is present\ndtest: DEBUG: Stop 
> node abruptly\ndtest: DEBUG: Verify commitlog was written before abrupt 
> stop\ndtest: DEBUG: Verify no SSTables were flushed before abrupt 
> stop\ndtest: DEBUG: Verify commit log was replayed on 
> startup\n- >> end captured logging << 
> -"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10845) jmxmetrics_test.TestJMXMetrics.begin_test is failing

2016-04-22 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-10845:

Assignee: DS Test Eng  (was: Philip Thompson)

> jmxmetrics_test.TestJMXMetrics.begin_test is failing
> 
>
> Key: CASSANDRA-10845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10845
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Philip Thompson
>Assignee: DS Test Eng
>  Labels: dtest
>
> This test is failing on 2.1-head. There appear to be structural issues with 
> the test, and no C* bug to be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11636) dtest failure in auth_test.TestAuth.restart_node_doesnt_lose_auth_data_test

2016-04-22 Thread Michael Shuler (JIRA)
Michael Shuler created CASSANDRA-11636:
--

 Summary: dtest failure in 
auth_test.TestAuth.restart_node_doesnt_lose_auth_data_test
 Key: CASSANDRA-11636
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11636
 Project: Cassandra
  Issue Type: Test
Reporter: Michael Shuler
Assignee: DS Test Eng


example failure:

http://cassci.datastax.com/job/cassandra-2.1_dtest/448/testReport/auth_test/TestAuth/restart_node_doesnt_lose_auth_data_test

Failed on CassCI build cassandra-2.1_dtest #448 - 2.1.14-tentative

{noformat}
Error Message

Problem stopping node node1
 >> begin captured logging << 
dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-sLlSHx
dtest: DEBUG: Custom init_config not found. Setting defaults.
dtest: DEBUG: Done setting configuration options:
{   'initial_token': None,
'num_tokens': '32',
'phi_convict_threshold': 5,
'range_request_timeout_in_ms': 1,
'read_request_timeout_in_ms': 1,
'request_timeout_in_ms': 1,
'truncate_request_timeout_in_ms': 1,
'write_request_timeout_in_ms': 1}
dtest: DEBUG: Default role created by node1
- >> end captured logging << -
Stacktrace

  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
testMethod()
  File "/home/automaton/cassandra-dtest/auth_test.py", line 910, in 
restart_node_doesnt_lose_auth_data_test
self.cluster.stop()
  File "/home/automaton/ccm/ccmlib/cluster.py", line 376, in stop
if not node.stop(wait, gently=gently):
  File "/home/automaton/ccm/ccmlib/node.py", line 677, in stop
raise NodeError("Problem stopping node %s" % self.name)
"Problem stopping node node1\n >> begin captured logging << 
\ndtest: DEBUG: cluster ccm directory: 
/mnt/tmp/dtest-sLlSHx\ndtest: DEBUG: Custom init_config not found. Setting 
defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
5,\n'range_request_timeout_in_ms': 1,\n
'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n
'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
1}\ndtest: DEBUG: Default role created by node1\n- >> 
end captured logging << -"
{noformat}

This test was successful in the next build on a commit that does not appear to 
be auth-related, and the test does not appear to be flappy. Looping over the 
test, I have not gotten a failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11024) Unexpected exception during request; java.lang.StackOverflowError: null

2016-04-22 Thread Kai Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254408#comment-15254408
 ] 

Kai Wang commented on CASSANDRA-11024:
--

Alex, sorry, but I wasn't able to reproduce this error anymore.

> Unexpected exception during request; java.lang.StackOverflowError: null
> ---
>
> Key: CASSANDRA-11024
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11024
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS 7, Java x64 1.8.0_65
>Reporter: Kai Wang
>Priority: Minor
>
> This happened when I run a "SELECT *" query on a very wide table. The table 
> has over 1000 columns and a lot of nulls. If I run "SELECT * ... LIMIT 10" or 
> "SELECT a,b,c FROM ...", then it's fine. The data is being actively inserted 
> when I run the query. Will try later when compaction (LCS) catches up.
> {noformat}
> ERROR [SharedPool-Worker-5] 2016-01-15 20:49:08,212 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x8e11d570, 
> /192.168.0.3:50332 => /192.168.0.11:9042]
> java.lang.StackOverflowError: null
>   at 
> com.google.common.base.Preconditions.checkPositionIndex(Preconditions.java:339)
>  ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.AbstractIndexedListIterator.(AbstractIndexedListIterator.java:69)
>  ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$11.(Iterators.java:1048) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators.forArray(Iterators.java:1048) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.RegularImmutableList.listIterator(RegularImmutableList.java:106)
>  ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.listIterator(ImmutableList.java:344) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.iterator(ImmutableList.java:340) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.ImmutableList.iterator(ImmutableList.java:61) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables.iterators(Iterables.java:504) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables.access$100(Iterables.java:60) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$2.iterator(Iterables.java:494) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$3.transform(Iterables.java:508) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterables$3.transform(Iterables.java:505) 
> ~[guava-16.0.jar:na]
>   at 
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
>  ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:543) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) 
> ~[guava-16.0.jar:na]
>   at com.google.common.

[jira] [Assigned] (CASSANDRA-11539) dtest failure in topology_test.TestTopology.movement_test

2016-04-22 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch reassigned CASSANDRA-11539:
--

Assignee: Russ Hatch  (was: DS Test Eng)

> dtest failure in topology_test.TestTopology.movement_test
> -
>
> Key: CASSANDRA-11539
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11539
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Michael Shuler
>Assignee: Russ Hatch
>  Labels: dtest
> Fix For: 3.x
>
>
> example failure:
> {noformat}
> Error Message
> values not within 16.00% of the max: (335.88, 404.31) ()
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-XGOyDd
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/topology_test.py", line 93, in 
> movement_test
> assert_almost_equal(sizes[1], sizes[2])
>   File "/home/automaton/cassandra-dtest/assertions.py", line 75, in 
> assert_almost_equal
> assert vmin > vmax * (1.0 - error) or vmin == vmax, "values not within 
> %.2f%% of the max: %s (%s)" % (error * 100, args, error_message)
> "values not within 16.00% of the max: (335.88, 404.31) 
> ()\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-XGOyDd\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None,\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"
> {noformat}
> http://cassci.datastax.com/job/cassandra-3.5_novnode_dtest/22/testReport/topology_test/TestTopology/movement_test
> 
> I dug through this test's history on the trunk, 3.5, 3.0, and 2.2 branches. 
> It appears this test is stable and passing on 3.0 & 2.2 (which could be just 
> luck). On trunk & 3.5, however, this test has flapped a small number of times.
> The test's threshold is 16% and I found test failures in the 3.5 branch of 
> 16.2%, 16.9%, and 18.3%. In trunk I found 17.4% and 23.5% diff failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11499) dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup

2016-04-22 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254440#comment-15254440
 ] 

Russ Hatch commented on CASSANDRA-11499:


All passed. Resolving.

> dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup
> --
>
> Key: CASSANDRA-11499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11499
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: Russ Hatch
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/341/testReport/commitlog_test/TestCommitLog/test_commitlog_replay_on_startup
> Failed on CassCI build trunk_novnode_dtest #341
> {noformat}
> Error Message
> 03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:
> INFO  [main] 2016-04-03 16:22:39,826 YamlConfigura.
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-1UTelU
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}
> dtest: DEBUG: Insert data
> dtest: DEBUG: Verify data is present
> dtest: DEBUG: Stop node abruptly
> dtest: DEBUG: Verify commitlog was written before abrupt stop
> dtest: DEBUG: Verify no SSTables were flushed before abrupt stop
> dtest: DEBUG: Verify commit log was replayed on startup
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/commitlog_test.py", line 193, in 
> test_commitlog_replay_on_startup
> node1.watch_log_for("Log replay complete")
>   File "/home/automaton/ccm/ccmlib/node.py", line 425, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:\nINFO  [main] 
> 2016-04-03 16:22:39,826 YamlConfigura.\nSee system.log for 
> remainder\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-1UTelU\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}\ndtest: 
> DEBUG: Insert data\ndtest: DEBUG: Verify data is present\ndtest: DEBUG: Stop 
> node abruptly\ndtest: DEBUG: Verify commitlog was written before abrupt 
> stop\ndtest: DEBUG: Verify no SSTables were flushed before abrupt 
> stop\ndtest: DEBUG: Verify commit log was replayed on 
> startup\n- >> end captured logging << 
> -"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11499) dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup

2016-04-22 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch resolved CASSANDRA-11499.

Resolution: Cannot Reproduce

> dtest failure in commitlog_test.TestCommitLog.test_commitlog_replay_on_startup
> --
>
> Key: CASSANDRA-11499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11499
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: Russ Hatch
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/341/testReport/commitlog_test/TestCommitLog/test_commitlog_replay_on_startup
> Failed on CassCI build trunk_novnode_dtest #341
> {noformat}
> Error Message
> 03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:
> INFO  [main] 2016-04-03 16:22:39,826 YamlConfigura.
> See system.log for remainder
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-1UTelU
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}
> dtest: DEBUG: Insert data
> dtest: DEBUG: Verify data is present
> dtest: DEBUG: Stop node abruptly
> dtest: DEBUG: Verify commitlog was written before abrupt stop
> dtest: DEBUG: Verify no SSTables were flushed before abrupt stop
> dtest: DEBUG: Verify commit log was replayed on startup
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/commitlog_test.py", line 193, in 
> test_commitlog_replay_on_startup
> node1.watch_log_for("Log replay complete")
>   File "/home/automaton/ccm/ccmlib/node.py", line 425, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "03 Apr 2016 16:32:49 [node1] Missing: ['Log replay complete']:\nINFO  [main] 
> 2016-04-03 16:22:39,826 YamlConfigura.\nSee system.log for 
> remainder\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-1UTelU\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None, 'phi_convict_threshold': 5, 'start_rpc': 'true'}\ndtest: 
> DEBUG: Insert data\ndtest: DEBUG: Verify data is present\ndtest: DEBUG: Stop 
> node abruptly\ndtest: DEBUG: Verify commitlog was written before abrupt 
> stop\ndtest: DEBUG: Verify no SSTables were flushed before abrupt 
> stop\ndtest: DEBUG: Verify commit log was replayed on 
> startup\n- >> end captured logging << 
> -"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11539) dtest failure in topology_test.TestTopology.movement_test

2016-04-22 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254454#comment-15254454
 ] 

Russ Hatch commented on CASSANDRA-11539:


[~philipthompson]'s comment on this: "my first guess there is just key count. It's only writing 10k keys" (see the sketch below).
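
For context, assert_almost_equal only passes when the smallest value is within the given fraction of the largest. A minimal Java rendering of that check (a hedged sketch: the 16% tolerance and the failing pair come from the report below, everything else is illustrative) shows how close this failure sat to the threshold:

{code}
// Hedged sketch: a Java rendering of the dtest tolerance check.
import java.util.Arrays;

public final class AlmostEqual
{
    static void assertAlmostEqual(double error, double... values)
    {
        double vmax = Double.NEGATIVE_INFINITY, vmin = Double.POSITIVE_INFINITY;
        for (double v : values)
        {
            vmax = Math.max(vmax, v);
            vmin = Math.min(vmin, v);
        }
        // Passes only if the smallest value is within `error` of the largest.
        if (!(vmin > vmax * (1.0 - error) || vmin == vmax))
            throw new AssertionError(String.format("values not within %.2f%% of the max: %s",
                                                   error * 100, Arrays.toString(values)));
    }

    public static void main(String[] args)
    {
        assertAlmostEqual(0.16, 340.0, 360.0);   // passes: 340 > 360 * 0.84 = 302.4
        assertAlmostEqual(0.16, 335.88, 404.31); // fails as in the report: 335.88 < 404.31 * 0.84 ~= 339.6
    }
}
{code}

With only 10k keys written, ordinary placement variance could plausibly push two nodes' data sizes past a 16% spread, which is consistent with the key-count guess above.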

> dtest failure in topology_test.TestTopology.movement_test
> -
>
> Key: CASSANDRA-11539
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11539
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
>Reporter: Michael Shuler
>Assignee: Russ Hatch
>  Labels: dtest
> Fix For: 3.x
>
>
> example failure:
> {noformat}
> Error Message
> values not within 16.00% of the max: (335.88, 404.31) ()
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-XGOyDd
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/topology_test.py", line 93, in 
> movement_test
> assert_almost_equal(sizes[1], sizes[2])
>   File "/home/automaton/cassandra-dtest/assertions.py", line 75, in 
> assert_almost_equal
> assert vmin > vmax * (1.0 - error) or vmin == vmax, "values not within 
> %.2f%% of the max: %s (%s)" % (error * 100, args, error_message)
> "values not within 16.00% of the max: (335.88, 404.31) 
> ()\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-XGOyDd\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None,\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"
> {noformat}
> http://cassci.datastax.com/job/cassandra-3.5_novnode_dtest/22/testReport/topology_test/TestTopology/movement_test
> 
> I dug through this test's history on the trunk, 3.5, 3.0, and 2.2 branches. 
> It appears this test is stable and passing on 3.0 & 2.2 (which could be just 
> luck). On trunk & 3.5, however, this test has flapped a small number of times.
> The test's threshold is 16% and I found test failures in the 3.5 branch of 
> 16.2%, 16.9%, and 18.3%. In trunk I found 17.4% and 23.5% diff failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11636) dtest failure in auth_test.TestAuth.restart_node_doesnt_lose_auth_data_test

2016-04-22 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254457#comment-15254457
 ] 

Philip Thompson commented on CASSANDRA-11636:
-

The log for that node is just this message repeated over and over:

{code}
WARN  [Thread-2] 2016-04-19 14:52:05,149 CustomTThreadPoolServer.java:122 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: No underlying server socket.
        at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:96) ~[main/:na]
        at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[main/:na]
        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
        at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:110) ~[main/:na]
        at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:137) [main/:na]
{code}

Literally nothing else is in there.
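
That repeating warning is what an accept loop that logs transport errors and keeps going looks like once the underlying server socket is gone. A minimal sketch of the pattern (illustrative shapes only, not the real CustomTThreadPoolServer code):

{code}
// Hedged sketch: why a dead server socket yields the same WARN on every iteration.
import java.io.IOException;

class AcceptLoopSketch
{
    interface ServerSocketish { Object accept() throws IOException; }

    private volatile boolean stopped = false;

    void serve(ServerSocketish server)
    {
        while (!stopped)
        {
            try
            {
                Object client = server.accept(); // throws while the socket is missing
                // ... hand `client` off to a worker pool ...
            }
            catch (IOException e)
            {
                // Log-and-continue: if the socket never comes back, this branch
                // fires on every iteration, producing the repeated warning above.
                System.err.println("Transport error occurred during acceptance of message. " + e);
            }
        }
    }
}
{code}

If node shutdown raced with Thrift teardown, a loop like this spinning on a closed socket might also help explain the "Problem stopping node node1" failure.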

> dtest failure in auth_test.TestAuth.restart_node_doesnt_lose_auth_data_test
> ---
>
> Key: CASSANDRA-11636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11636
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: DS Test Eng
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_dtest/448/testReport/auth_test/TestAuth/restart_node_doesnt_lose_auth_data_test
> Failed on CassCI build cassandra-2.1_dtest #448 - 2.1.14-tentative
> {noformat}
> Error Message
> Problem stopping node node1
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-sLlSHx
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Default role created by node1
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/auth_test.py", line 910, in 
> restart_node_doesnt_lose_auth_data_test
> self.cluster.stop()
>   File "/home/automaton/ccm/ccmlib/cluster.py", line 376, in stop
> if not node.stop(wait, gently=gently):
>   File "/home/automaton/ccm/ccmlib/node.py", line 677, in stop
> raise NodeError("Problem stopping node %s" % self.name)
> "Problem stopping node node1\n >> begin captured logging 
> << \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-sLlSHx\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\ndtest: DEBUG: Default role created by node1\n- >> 
> end captured logging << -"
> {noformat}
> This test was successful in the next build on a commit that does not appear 
> to be auth-related, and the test does not appear to be flappy. Looping over 
> the test, I have not gotten a failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11605) dtest failure in repair_tests.repair_test.TestRepair.dc_parallel_repair_test

2016-04-22 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254468#comment-15254468
 ] 

Philip Thompson commented on CASSANDRA-11605:
-

[~JoshuaMcKenzie], it was my understanding that we shouldn't see errors like 
this on 2.2+ on Windows anymore?

{code}
ERROR [NonPeriodicTasks:1] 2016-03-31 23:44:07,371 SSTableDeletingTask.java:83 
- Unable to delete 
d:\temp\dtest-mr7s9s\test\node2\data2\ks\cf-a61921b0f79911e581805f908c518710\la-1-big-Data.db
 (it will be removed on server restart; we'll also retry after GC)
{code}

> dtest failure in repair_tests.repair_test.TestRepair.dc_parallel_repair_test
> 
>
> Key: CASSANDRA-11605
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11605
> Project: Cassandra
>  Issue Type: Test
>Reporter: Jim Witschey
>Assignee: DS Test Eng
>  Labels: dtest, windows
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/217/testReport/repair_tests.repair_test/TestRepair/dc_parallel_repair_test
> Failed on CassCI build cassandra-2.2_dtest_win32 #217
> [~philipthompson] may be the person to look -- did you do the most recent 
> stuff with the repair tests?
> Here's the output:
> {code}
> Error Message
> Unexpected error in log, see stdout
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: d:\temp\dtest-mr7s9s
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Starting cluster..
> dtest: DEBUG: Inserting data...
> dtest: DEBUG: Checking data...
> dtest: DEBUG: starting repair...
> dtest: DEBUG: removing ccm cluster test at: d:\temp\dtest-mr7s9s
> dtest: DEBUG: clearing ssl stores from [d:\temp\dtest-mr7s9s] directory
> - >> end captured logging << -
> Stacktrace
>   File "C:\tools\python2\lib\unittest\case.py", line 358, in run
> self.tearDown()
>   File 
> "D:\jenkins\workspace\cassandra-2.2_dtest_win32\cassandra-dtest\dtest.py", 
> line 667, in tearDown
> raise AssertionError('Unexpected error in log, see stdout')
> "Unexpected error in log, see stdout\n >> begin captured 
> logging << \ndtest: DEBUG: cluster ccm directory: 
> d:\\temp\\dtest-mr7s9s\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\ndtest: DEBUG: Starting cluster..\ndtest: DEBUG: Inserting 
> data...\ndtest: DEBUG: Checking data...\ndtest: DEBUG: starting 
> repair...\ndtest: DEBUG: removing ccm cluster test at: 
> d:\\temp\\dtest-mr7s9s\ndtest: DEBUG: clearing ssl stores from 
> [d:\\temp\\dtest-mr7s9s] directory\n- >> end captured 
> logging << -"
> Standard Output
> Unexpected error in node2 log, error: 
> ERROR [NonPeriodicTasks:1] 2016-03-31 23:44:07,365 
> SSTableDeletingTask.java:83 - Unable to delete 
> d:\temp\dtest-mr7s9s\test\node2\data2\ks\cf-a61921b0f79911e581805f908c518710\la-2-big-Data.db
>  (it will be removed on server restart; we'll also retry after GC)
> ERROR [NonPeriodicTasks:1] 2016-03-31 23:44:07,371 
> SSTableDeletingTask.java:83 - Unable to delete 
> d:\temp\dtest-mr7s9s\test\node2\data2\ks\cf-a61921b0f79911e581805f908c518710\la-1-big-Data.db
>  (it will be removed on server restart; we'll also retry after GC)
> Standard Error
> Started: node1 with pid: 1344
> Started: node3 with pid: 4788
> Started: node2 with pid: 5552
> Started: node4 with pid: 7532
> Started: node2 with pid: 7828
> Started: node1 with pid: 5424
> Started: node3 with pid: 3448
> Started: node4 with pid: 3316
> Started: node3 with pid: 6464
> Started: node2 with pid: 2824
> Started: node4 with pid: 4692
> Started: node1 with pid: 7868
> Started: node2 with pid: 3824
> Started: node4 with pid: 6340
> Started: node1 with pid: 6412
> Started: node3 with pid: 6172
> Started: node2 with pid: 7280
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11612) dtest failure in materialized_views_test.TestMaterializedViews.interrupt_build_process_test

2016-04-22 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-11612:

Labels: dtest windows  (was: dtest)

> dtest failure in 
> materialized_views_test.TestMaterializedViews.interrupt_build_process_test
> ---
>
> Key: CASSANDRA-11612
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11612
> Project: Cassandra
>  Issue Type: Test
>Reporter: Jim Witschey
>Assignee: DS Test Eng
>  Labels: dtest, windows
>
> This has flapped a couple times so far. Example failure:
> http://cassci.datastax.com/job/trunk_dtest_win32/385/testReport/materialized_views_test/TestMaterializedViews/interrupt_build_process_test
> Failed on CassCI build trunk_dtest_win32 #385
> {code}
> Error Message
> 9847 != 1
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: d:\temp\dtest-kozpni
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Inserting initial data
> dtest: DEBUG: Create a MV
> dtest: DEBUG: Stop the cluster. Interrupt the MV build process.
> dtest: DEBUG: Restart the cluster
> dtest: DEBUG: MV shouldn't be built yet.
> dtest: DEBUG: Wait and ensure the MV build resumed. Waiting up to 2 minutes.
> dtest: DEBUG: Verify all data
> - >> end captured logging << -
> Stacktrace
>   File "C:\tools\python2\lib\unittest\case.py", line 329, in run
> testMethod()
>   File 
> "D:\jenkins\workspace\trunk_dtest_win32\cassandra-dtest\materialized_views_test.py",
>  line 700, in interrupt_build_process_test
> self.assertEqual(result[0].count, 1)
>   File "C:\tools\python2\lib\unittest\case.py", line 513, in assertEqual
> assertion_func(first, second, msg=msg)
>   File "C:\tools\python2\lib\unittest\case.py", line 506, in _baseAssertEqual
> raise self.failureException(msg)
> "9847 != 1\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> d:\\temp\\dtest-kozpni\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'num_tokens': '32',\n'phi_convict_threshold': 
> 5,\n'range_request_timeout_in_ms': 1,\n
> 'read_request_timeout_in_ms': 1,\n'request_timeout_in_ms': 1,\n   
>  'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\ndtest: DEBUG: Inserting initial data\ndtest: DEBUG: Create a 
> MV\ndtest: DEBUG: Stop the cluster. Interrupt the MV build process.\ndtest: 
> DEBUG: Restart the cluster\ndtest: DEBUG: MV shouldn't be built yet.\ndtest: 
> DEBUG: Wait and ensure the MV build resumed. Waiting up to 2 minutes.\ndtest: 
> DEBUG: Verify all data\n- >> end captured logging << 
> -"
> Standard Error
> Started: node1 with pid: 6056
> Started: node3 with pid: 7728
> Started: node2 with pid: 6428
> Started: node1 with pid: 6088
> Started: node3 with pid: 6824
> Started: node2 with pid: 4024
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10134) Always require replace_address to replace existing address

2016-04-22 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-10134:
--
Status: Ready to Commit  (was: Patch Available)

> Always require replace_address to replace existing address
> --
>
> Key: CASSANDRA-10134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10134
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Tyler Hobbs
>Assignee: Sam Tunnicliffe
>  Labels: docs-impacting
> Fix For: 3.x
>
>
> Normally, when a node is started from a clean state with the same address as 
> an existing down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception 
> encountered during startup
> java.lang.RuntimeException: A node with address /127.0.0.3 already exists, 
> cancelling join. Use cassandra.replace_address if you want to replace this 
> node.
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) 
> [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed 
> list, it will not throw this error and will start normally.  The new node 
> then takes over the host ID of the old node (even if the tokens are 
> different), and the only message you will see is a warning in the other 
> nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, 
> hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information 
> for a down node without replacing it.  To fix this, we should check for an 
> endpoint collision even if {{auto_bootstrap}} is false or the node is a seed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-04-22 Thread Fabien Rousseau (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254670#comment-15254670
 ] 

Fabien Rousseau commented on CASSANDRA-11349:
-

Sorry for not being responsive lately, I'm rather busy atm...

I'd be more than happy[1] to see this patch in the next release.
I haven't tested it yet, but I can probably find some time next week to test it on 
a dev cluster if that helps.
Nevertheless, I won't be able to tell whether it really worked, because there will 
still be some mismatches (due to CASSANDRA-11477).

I have started working on a patch which should be able to handle both 
CASSANDRA-11477 and the last edge case.

What it basically does (a rough sketch follows below):
 - Tracker is now an interface
 - there are two implementations: one called RegularCompactionTracker and the 
other ValidationCompactionTracker
 - the ColumnIndexer.Builder has one more optional parameter: a boolean 
indicating whether it is built for validation
 - the RegularCompactionTracker is identical to the existing Tracker plus one 
empty method
 - the ValidationCompactionTracker is similar to the existing Tracker but 
retains only open tombstones (most methods are thus empty)
 - the Reducer changed slightly, but its behaviour for regular compactions is 
the same

I can share it if you're interested (the code compiles, but I haven't tested it 
at all yet; I plan to do that soon and share it afterwards).
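
A rough sketch of the split described above (a hedged sketch only: the names follow the comment, while member signatures and bodies are guesses, not the actual patch):

{code}
// Hedged sketch of the Tracker split; RangeTombstone is stubbed in for
// org.apache.cassandra.db.RangeTombstone to keep the sketch self-contained.
class RangeTombstone { }

interface Tracker
{
    void update(RangeTombstone tombstone); // called as tombstones are encountered
    void close(RangeTombstone tombstone);  // called when a tombstone's interval ends
}

class RegularCompactionTracker implements Tracker
{
    // Identical to the existing Tracker behaviour, plus one empty method.
    public void update(RangeTombstone tombstone) { /* existing logic */ }
    public void close(RangeTombstone tombstone) { }
}

class ValidationCompactionTracker implements Tracker
{
    // Retains only the tombstones that are still open, so that two
    // uncompacted tombstones covering the same interval digest the same
    // way as their already-compacted equivalent on another node.
    public void update(RangeTombstone tombstone) { /* track open tombstones only */ }
    public void close(RangeTombstone tombstone) { /* drop the tombstone */ }
}

// Per the comment, ColumnIndexer.Builder would then take an extra optional
// boolean and pick ValidationCompactionTracker when built for validation.
{code}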

[1] Just to share more information: those issues are important to us. A few of 
our clusters are impacted, and a few days after filing the bug we decided to 
temporarily stop repairing some of the heavily impacted tables (knowing that we 
could live with inconsistencies on them), because each repair increased disk 
occupancy by a few percent, and we ran a major compaction instead. This resulted 
in two to three times less disk occupancy (one table shrank from 243GB to 79GB). 
Note that this was not due to tombstones reclaiming old data: it's been nearly a 
month now, the big SSTable resulting from the major compaction is still there, 
and disk usage has not grown much since.

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there was some inconstencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences of this are a costly repair, accumulating many small SSTables 
> (up to thousands for a rather short period of time when using VNodes, the 
> time for compaction to absorb those small files), but also an increased size 
> on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11546) Stress doesn't respect case-sensitive column names when building insert queries

2016-04-22 Thread Giampaolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giampaolo updated CASSANDRA-11546:
--
Attachment: example.yaml
cassandra-11546-trunk-giampaolo-trapasso.patch

Patch is available also at 
[https://github.com/radicalbit/cassandra/tree/CASSANDRA-11546-trunk].

I've tested with `stress user profile=example.yaml ops(insert=1,simple=9) -node 
127.0.0.1 -log` and it solves the problem.

I had no problems with ant test, but I could not run the full dtest suite on my 
limited working machine.

Note: the static method `quoteIdentifier(String identifier)` might be useful in 
other places, but I think a wider refactoring is beyond the scope of this issue.
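
For reference, the core of such a fix is quoting identifiers so CQL preserves their case; a minimal sketch of what a {{quoteIdentifier}} helper typically does (an assumption about the patch's intent, not its actual code):

{code}
// Hedged sketch of a CQL identifier quoter; not the actual patch.
public final class IdentifierSketch
{
    /** Wraps an identifier in double quotes, doubling embedded quotes,
     *  so case-sensitive column names survive in generated statements. */
    static String quoteIdentifier(String identifier)
    {
        return '"' + identifier.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args)
    {
        // Unquoted, CQL would fold MyColumn to mycolumn and the insert would fail.
        System.out.println("INSERT INTO ks.tbl (" + quoteIdentifier("MyColumn") + ") VALUES (?);");
    }
}
{code}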


> Stress doesn't respect case-sensitive column names when building insert 
> queries
> ---
>
> Key: CASSANDRA-11546
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11546
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Joel Knighton
>Assignee: Giampaolo
>Priority: Trivial
>  Labels: lhf
> Attachments: cassandra-11546-trunk-giampaolo-trapasso.patch, 
> example.yaml
>
>
> When using a custom stress profile, if the schema uses case sensitive column 
> names, stress doesn't respect case sensitivity when building insert/update 
> statements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11432) Counter values become under-counted when running repair.

2016-04-22 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254686#comment-15254686
 ] 

Dikang Gu commented on CASSANDRA-11432:
---

[~iamaleksey] Here are the answers, please let me know if you want more info. 
Thanks!

1. Is you cluster a fresh 2.2 one? More specifically, does it by any chance 
have 2.0 or older generated counters?
>> it's a fresh 2.2.5 cluster, no legacy counters there.
2. How large is larger than 1%?
>> One example, for counter value 50K, the difference could be as large as 20K.
3. Can you observe the same thing without repair running?
>> If repair is not running, the difference is very small, only 10s.
4. Have you observed any timeouts? What do you do in case of a timeout? Ignore 
or retry? Counter updates are not idempotent, so if you retry a timed-out 
increment, you have a real risk of overcounting (in case the update made it, 
but the client timed out). If you ignore instead, then a missed increment would 
undercount. Another case that would cause an undercount is a retried decrement, 
of course.
>> Yes, we do see a large number of timeouts during repair.
5. What's your commit log policy? If sync, what the sync period? Have you 
observed any node failures during the experiment that would cause any commit 
log loss?
>> we use commitlog_sync: periodic and commitlog_sync_period_in_ms: 1. I 
>> think the nodes are still alive during the experiment; they just time out 
>> during the repair.
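
Point 4 above is the crux: because counter increments are not idempotent, the timeout-handling policy alone predicts the direction of the drift. A small simulation (a hedged sketch in pure Java, no driver APIs; the 5% timeout rate and the 50/50 "did it land" odds are made-up numbers) illustrates it:

{code}
// Hedged simulation of counter drift under timeouts.
import java.util.concurrent.ThreadLocalRandom;

public class CounterDriftSketch
{
    public static void main(String[] args)
    {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long intended = 100_000; // what an external mirror (e.g. MySQL) would record
        long ignorePolicy = 0, retryPolicy = 0;

        for (long i = 0; i < intended; i++)
        {
            boolean timedOut = rnd.nextDouble() < 0.05; // client saw a timeout
            // On a timeout, the increment may or may not have landed server-side.
            boolean landed = !timedOut || rnd.nextBoolean();

            if (landed) // policy A: ignore timeouts -> lost increments undercount
                ignorePolicy++;

            retryPolicy += landed ? 1 : 0; // policy B: retry timeouts...
            if (timedOut)
                retryPolicy++; // ...so a landed-but-timed-out increment applies twice
                               // (assumes the retry itself always succeeds)
        }

        System.out.printf("intended=%d  ignore=%d (under)  retry=%d (over)%n",
                          intended, ignorePolicy, retryPolicy);
    }
}
{code}

With repair adding load and driving timeouts, a client that drops timed-out increments undercounts in exactly the way described in this issue.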

> Counter values become under-counted when running repair.
> 
>
> Key: CASSANDRA-11432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11432
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dikang Gu
>Assignee: Aleksey Yeschenko
>
> We are experimenting Counters in Cassandra 2.2.5. Our setup is that we have 6 
> nodes, across three different regions, and in each region, the replication 
> factor is 2. Basically, each nodes holds a full copy of the data.
> We are writing to cluster with CL = 2, and reading with CL = 1. 
> When are doing 30k/s counter increment/decrement per node, and at the 
> meanwhile, we are double writing to our mysql tier, so that we can measure 
> the accuracy of C* counter, compared to mysql.
> The experiment result was great at the beginning, the counter value in C* and 
> mysql are very close. The difference is less than 0.1%. 
> But when we start to run the repair on one node, the counter value in C* 
> become much less than the value in mysql,  the difference becomes larger than 
> 1%.
> My question is that is it a known problem that the counter value will become 
> under-counted if repair is running? Should we avoid running repair for 
> counter tables?
> Thanks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11546) Stress doesn't respect case-sensitive column names when building insert queries

2016-04-22 Thread Giampaolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254682#comment-15254682
 ] 

Giampaolo edited comment on CASSANDRA-11546 at 4/22/16 9:04 PM:


Patch is available also at 
[https://github.com/radicalbit/cassandra/tree/CASSANDRA-11546-trunk].

I've tested with {{stress user profile=example.yaml ops(insert=1,simple=9) 
-node 127.0.0.1 -log}}.

I had not problem with {{ant test}}, while I could not run full dtest due to 
limitation of my working machine.

Note: Maybe the static method  {{quoteIdentifier(String identifier)}} could be 
useful in some other place, but I think a wider refactoring is beyond the goal 
of this issue.



was (Author: giampaolo):
Patch is available also at 
[https://github.com/radicalbit/cassandra/tree/CASSANDRA-11546-trunk].

I've tested with `stress user profile=example.yaml ops(insert=1,simple=9) -node 
127.0.0.1 -log` and solves the problem.

I had not problem with ant test, while I could not run full dtest on my limited 
working machine.

Note: Maybe the static method  `quoteIdentifier(String identifier)` could be 
useful in some other place, but I think a wider refactoring is beyond the goal 
of this issue.


> Stress doesn't respect case-sensitive column names when building insert 
> queries
> ---
>
> Key: CASSANDRA-11546
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11546
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Joel Knighton
>Assignee: Giampaolo
>Priority: Trivial
>  Labels: lhf
> Attachments: cassandra-11546-trunk-giampaolo-trapasso.patch, 
> example.yaml
>
>
> When using a custom stress profile, if the schema uses case sensitive column 
> names, stress doesn't respect case sensitivity when building insert/update 
> statements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10134) Always require replace_address to replace existing address

2016-04-22 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254689#comment-15254689
 ] 

Joel Knighton commented on CASSANDRA-10134:
---

CI looks good.

I really like the route you took here - simple and separates things that have 
no business being conflated. I'm not worried about the slight change in 
semantics of isInitialized() - as you mentioned, there are existing methods to 
obtain gossip status.

There's a bit of an interesting situation, that if anything, reinforces your 
point that the semantics were unclear. In 
{{DynamicEndpointSnitch.updateScores()}}, we checked the initialization status 
of StorageService before proceeding. I was curious whether this was intended to 
be coupled to gossip or StorageService initialization; history suggests that 
the intent was to couple it to StorageService initialization to accommodate 
weird singleton initialization patterns which I believe are no longer relevant 
([CASSANDRA-1756]). That said, your patch coupling it to gossip preserves 
existing behavior, so that change in semantics doesn't particularly concern me. 
On the other hand, if you want to change that check to 
{{StorageService.isInitialized()}} and rerun CI, that would also work for me. 
Your call [~beobal]. In either case, I think a follow-up investigating whether 
the check is safe to remove is worthwhile.

If you opt to keep the check coupled to {{isGossipActive()}}, feel free to 
commit.
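
A minimal sketch of the decoupling discussed here (field and method shapes are illustrative, not the actual patch):

{code}
// Hedged sketch: separate "server initialized" from "gossip active" state.
class StorageServiceSketch
{
    private volatile boolean initialized = false;  // initServer() completed
    private volatile boolean gossipActive = false; // gossip started and not stopped

    boolean isInitialized() { return initialized; }
    boolean isGossipActive() { return gossipActive; }
}

class DynamicEndpointSnitchSketch
{
    private final StorageServiceSketch ss = new StorageServiceSketch();

    void updateScores()
    {
        // Gating on gossip preserves the pre-patch behaviour, per the review;
        // the alternative raised above is to gate on ss.isInitialized() instead.
        if (!ss.isGossipActive())
            return;
        // ... recompute endpoint scores ...
    }
}
{code}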

> Always require replace_address to replace existing address
> --
>
> Key: CASSANDRA-10134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10134
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Tyler Hobbs
>Assignee: Sam Tunnicliffe
>  Labels: docs-impacting
> Fix For: 3.x
>
>
> Normally, when a node is started from a clean state with the same address as 
> an existing down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception 
> encountered during startup
> java.lang.RuntimeException: A node with address /127.0.0.3 already exists, 
> cancelling join. Use cassandra.replace_address if you want to replace this 
> node.
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) 
> [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed 
> list, it will not throw this error and will start normally.  The new node 
> then takes over the host ID of the old node (even if the tokens are 
> different), and the only message you will see is a warning in the other 
> nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, 
> hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information 
> for a down node without replacing it.  To fix this, we should check for an 
> endpoint collision even if {{auto_bootstrap}} is false or the node is a seed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10134) Always require replace_address to replace existing address

2016-04-22 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-10134:
--
Status: In Progress  (was: Ready to Commit)

> Always require replace_address to replace existing address
> --
>
> Key: CASSANDRA-10134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10134
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Tyler Hobbs
>Assignee: Sam Tunnicliffe
>  Labels: docs-impacting
> Fix For: 3.x
>
>
> Normally, when a node is started from a clean state with the same address as 
> an existing down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception 
> encountered during startup
> java.lang.RuntimeException: A node with address /127.0.0.3 already exists, 
> cancelling join. Use cassandra.replace_address if you want to replace this 
> node.
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) 
> [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed 
> list, it will not throw this error and will start normally.  The new node 
> then takes over the host ID of the old node (even if the tokens are 
> different), and the only message you will see is a warning in the other 
> nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, 
> hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information 
> for a down node without replacing it.  To fix this, we should check for an 
> endpoint collision even if {{auto_bootstrap}} is false or the node is a seed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >