[jira] [Updated] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2018-06-17 Thread Jaydeepkumar Chovatia (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaydeepkumar Chovatia updated CASSANDRA-14526:
--
Status: Patch Available  (was: Open)

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-06-17 Thread Jaydeepkumar Chovatia (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515354#comment-16515354
 ] 

Jaydeepkumar Chovatia commented on CASSANDRA-14525:
---

[~jasobrown] I've added a dtest for this which passes with this fix and fails 
on current trunk; please find the dtest here:
||dtest||
|[patch 
|https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|

 

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for a newly joining node (most commonly due to a 
> streaming failure), then the node's state remains {{joining}}, which is 
> fine, but Cassandra also enables the native transport, which makes the overall 
> state inconsistent. This further causes a NullPointerException if auth is 
> enabled on the new node. Please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then the variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}}, it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.j
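The fix direction described in this thread can be sketched as a simple startup guard: a half-joined node (bootstrap failed, {{dataAvailable}} false) should not bring up the client-facing native transport. The names below ({{Mode}}, {{shouldStartNativeTransport}}) are hypothetical stand-ins, not the real {{StorageService}} API:

```java
// Toy model of the ordering guard implied by CASSANDRA-14525: only serve
// clients once bootstrap has completed and the node actually has its data.
// These names are illustrative, not the actual Cassandra code.
public class StartupOrder {
    enum Mode { JOINING, NORMAL }

    static boolean shouldStartNativeTransport(Mode mode, boolean dataAvailable) {
        // The reported bug: the transport was started even when bootstrap
        // failed (mode still JOINING, dataAvailable false). This guard
        // refuses to serve clients from such a half-joined node.
        return mode == Mode.NORMAL && dataAvailable;
    }

    public static void main(String[] args) {
        System.out.println(shouldStartNativeTransport(Mode.NORMAL, true));   // true
        System.out.println(shouldStartNativeTransport(Mode.JOINING, false)); // false
    }
}
```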

[jira] [Updated] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2018-06-17 Thread Jaydeepkumar Chovatia (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaydeepkumar Chovatia updated CASSANDRA-14526:
--
Description: 
Please find dtest here:

|| dtest ||
| [patch 
|https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|






[jira] [Commented] (CASSANDRA-14423) SSTables stop being compacted

2018-06-17 Thread Kurt Greaves (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515286#comment-16515286
 ] 

Kurt Greaves commented on CASSANDRA-14423:
--

[~spo...@gmail.com] That change was to ensure that we didn't send repair status 
change notifications when the SSTable was already marked as repaired. My 
change doesn't revert that, as we no longer pass the repaired SSTables through 
to {{performAntiCompaction}}, so there is no need to filter them out. Granted, 
it does change the behaviour of {{performAntiCompaction}}: if someone were to 
call it and pass in repaired SSTables, they would get the old behaviour. But 
arguably you should never be passing already-repaired SSTables to 
{{performAntiCompaction}}. At the moment {{performAntiCompaction}} is only ever 
used by {{submitAntiCompaction}} in the codebase, so it's only a problem if 
third-party tools are using it. If we're worried, maybe adding a check at the 
start of {{performAntiCompaction}} to verify the SSTables aren't already 
repaired would be the way to go?
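The check floated above could look something like the following. This is a minimal sketch with a toy {{SSTable}} stand-in (the class names and fields are assumptions, not the real Cassandra types); it only illustrates the precondition idea of rejecting already-repaired SSTables up front:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Cassandra's SSTableReader; repairedAt == 0 means unrepaired,
// mirroring Cassandra's convention. Purely illustrative.
class SSTable {
    final String name;
    final long repairedAt;
    SSTable(String name, long repairedAt) { this.name = name; this.repairedAt = repairedAt; }
    boolean isRepaired() { return repairedAt > 0; }
}

public class AntiCompactionGuard {
    // The suggested "test at the start of performAntiCompaction": fail fast
    // if any caller passes in an SSTable that is already marked repaired.
    static void validateUnrepaired(List<SSTable> sstables) {
        for (SSTable s : sstables)
            if (s.isRepaired())
                throw new IllegalArgumentException("already repaired: " + s.name);
    }

    public static void main(String[] args) {
        List<SSTable> ok = new ArrayList<>();
        ok.add(new SSTable("a-1", 0));
        validateUnrepaired(ok); // unrepaired input passes the guard

        List<SSTable> bad = new ArrayList<>();
        bad.add(new SSTable("b-1", 1529190000000L));
        try {
            validateUnrepaired(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```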

utests:
[3.11|https://circleci.com/gh/kgreav/cassandra/163]
[3.0|https://circleci.com/gh/kgreav/cassandra/167]
[2.2|https://circleci.com/gh/kgreav/cassandra/165]

> SSTables stop being compacted
> -
>
> Key: CASSANDRA-14423
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14423
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
> Fix For: 2.2.13, 3.0.17, 3.11.3
>
>
> We're seeing a problem in 3.11.0 where SSTables are being lost from the view 
> and not being included in compactions or as candidates for compaction. It 
> seems to get progressively worse until there are only 1-2 SSTables in the 
> view, which happen to be the most recent SSTables, and thus compactions 
> completely stop for that table.
> The SSTables seem to still be included in reads, just not compactions.
> The issue can be worked around by restarting C*, as it will reload all 
> SSTables into the view, but this is only a temporary fix. User-defined/major 
> compactions still work; it's not clear whether they include the result back 
> in the view, but either way it is not a good workaround.
> This also results in a discrepancy between SSTable count and SSTables in 
> levels for any table using LCS.
> {code:java}
> Keyspace : xxx
> Read Count: 57761088
> Read Latency: 0.10527088681224288 ms.
> Write Count: 2513164
> Write Latency: 0.018211106398149903 ms.
> Pending Flushes: 0
> Table: xxx
> SSTable count: 10
> SSTables in each level: [2, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 894498746
> Space used (total): 894498746
> Space used by snapshots (total): 0
> Off heap memory used (total): 11576197
> SSTable Compression Ratio: 0.6956629530569777
> Number of keys (estimate): 3562207
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 87
> Local read count: 57761088
> Local read latency: 0.108 ms
> Local write count: 2513164
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 86.33
> Bloom filter false positives: 43
> Bloom filter false ratio: 0.0
> Bloom filter space used: 8046104
> Bloom filter off heap memory used: 8046024
> Index summary off heap memory used: 3449005
> Compression metadata off heap memory used: 81168
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 5722
> Compacted partition mean bytes: 175
> Average live cells per slice (last five minutes): 1.0
> Maximum live cells per slice (last five minutes): 1
> Average tombstones per slice (last five minutes): 1.0
> Maximum tombstones per slice (last five minutes): 1
> Dropped Mutations: 0
> {code}
> Also, for STCS we've confirmed that the SSTable count will differ from the 
> number of SSTables reported in the compaction buckets. In the example below 
> there are only 3 SSTables in a single bucket, and no more are listed for this 
> table. Compaction thresholds haven't been modified for this table and it's a 
> very basic KV schema.
> {code:java}
> Keyspace : yyy
> Read Count: 30485
> Read Latency: 0.06708991307200263 ms.
> Write Count: 57044
> Write Latency: 0.02204061776873992 ms.
> Pending Flushes: 0
> Table: yyy
> SSTable count: 19
> Space used (live): 18195482
> Space used (total): 18195482
> Space used by snapshots (total): 0
> Off heap memory used (total): 747376
> SSTable Compression Ratio: 0.7607394576769735
> Number of keys (estimate): 116074
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 39
> Local read count: 30485
> Local read latency: NaN ms
> Local write count:

[jira] [Commented] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor

2018-06-17 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515244#comment-16515244
 ] 

mck commented on CASSANDRA-14356:
-

[~iamaleksey], any objections if I commit this? It makes sense, looks good to 
me, and has been tested and verified against Reaper.

> LWTs keep failing in trunk after immutable refactor
> ---
>
> Key: CASSANDRA-14356
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14356
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), 
> Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 
> 14:01)
>Reporter: Michael Burman
>Assignee: Michael Burman
>Priority: Major
>  Labels: LWT
> Fix For: 4.0
>
> Attachments: CASSANDRA-14356.diff
>
>
> In {{PaxosState}}, the original assert check is of the form:
> assert promised.update.metadata() == accepted.update.metadata() && 
> accepted.update.metadata() == mostRecentCommit.update.metadata();
> However, after the change that made TableMetadata immutable, this no longer 
> works, as these instances are not necessarily (or ever) the same. This causes 
> LWTs to fail although they are still correctly targeting the same table.
> From IRC:
>  It's a bug alright. Though really, the assertion should be on the 
> metadata ids, because TableMetadata#equals does more than what we want.
>  That is, replacing it with .equals() is not ok. That would reject/throw 
> on any change to a table metadata, while the spirit of the assertion was to 
> sanity-check that both updates were on the same table.
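The distinction drawn on IRC (compare metadata *ids*, not references or full equality) can be illustrated with a toy immutable metadata class. {{TableMetadata}}, {{id}}, and {{version}} here are simplified stand-ins, not the real Cassandra schema types:

```java
import java.util.Objects;
import java.util.UUID;

// Toy immutable metadata: two logically-identical instances are distinct
// objects, so == fails, and full equals() would also reject harmless
// schema changes. Comparing the stable table id is the middle ground.
final class TableMetadata {
    final UUID id;      // stable across schema changes to the same table
    final int version;  // differs between instances after an ALTER
    TableMetadata(UUID id, int version) { this.id = id; this.version = version; }
}

public class PaxosMetadataCheck {
    // The IRC suggestion: assert on metadata ids, i.e. "same table",
    // regardless of which immutable snapshot each update happens to hold.
    static boolean sameTable(TableMetadata a, TableMetadata b) {
        return Objects.equals(a.id, b.id);
    }

    public static void main(String[] args) {
        UUID table = UUID.randomUUID();
        TableMetadata promised = new TableMetadata(table, 1);
        TableMetadata accepted = new TableMetadata(table, 2); // post-ALTER snapshot

        // Reference equality fails for distinct immutable instances...
        System.out.println(promised == accepted);           // false
        // ...but id comparison still recognises the same table.
        System.out.println(sameTable(promised, accepted));  // true
    }
}
```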






[jira] [Updated] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor

2018-06-17 Thread mck (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-14356:

Labels: LWT  (was: )

> LWTs keep failing in trunk after immutable refactor
> ---
>
> Key: CASSANDRA-14356
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14356
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), 
> Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 
> 14:01)
>Reporter: Michael Burman
>Assignee: Michael Burman
>Priority: Major
>  Labels: LWT
> Fix For: 4.0
>
> Attachments: CASSANDRA-14356.diff
>
>
> In {{PaxosState}}, the original assert check is of the form:
> assert promised.update.metadata() == accepted.update.metadata() && 
> accepted.update.metadata() == mostRecentCommit.update.metadata();
> However, after the change that made TableMetadata immutable, this no longer 
> works, as these instances are not necessarily (or ever) the same. This causes 
> LWTs to fail although they are still correctly targeting the same table.
> From IRC:
>  It's a bug alright. Though really, the assertion should be on the 
> metadata ids, because TableMetadata#equals does more than what we want.
>  That is, replacing it with .equals() is not ok. That would reject/throw 
> on any change to a table metadata, while the spirit of the assertion was to 
> sanity-check that both updates were on the same table.






[jira] [Commented] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor

2018-06-17 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515186#comment-16515186
 ] 

Jeff Jirsa commented on CASSANDRA-14356:


cc [~iamaleksey] for visibility.

> LWTs keep failing in trunk after immutable refactor
> ---
>
> Key: CASSANDRA-14356
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14356
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), 
> Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 
> 14:01)
>Reporter: Michael Burman
>Assignee: Michael Burman
>Priority: Major
> Fix For: 4.0
>
> Attachments: CASSANDRA-14356.diff
>
>
> In {{PaxosState}}, the original assert check is of the form:
> assert promised.update.metadata() == accepted.update.metadata() && 
> accepted.update.metadata() == mostRecentCommit.update.metadata();
> However, after the change that made TableMetadata immutable, this no longer 
> works, as these instances are not necessarily (or ever) the same. This causes 
> LWTs to fail although they are still correctly targeting the same table.
> From IRC:
>  It's a bug alright. Though really, the assertion should be on the 
> metadata ids, because TableMetadata#equals does more than what we want.
>  That is, replacing it with .equals() is not ok. That would reject/throw 
> on any change to a table metadata, while the spirit of the assertion was to 
> sanity-check that both updates were on the same table.


