[jira] [Updated] (CASSANDRA-14516) filter sstables by min/max clustering bounds during reads

2019-05-20 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14516:

Resolution: Fixed
Status: Resolved  (was: Open)

Thanks for looking into this [~n.v.harikrishna]. Sorry for the false alarm.

> filter sstables by min/max clustering bounds during reads
> -
>
> Key: CASSANDRA-14516
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14516
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Blake Eggleston
>Assignee: Venkata Harikrishna Nukala
>Priority: Normal
> Fix For: 4.0
>
>
> In SinglePartitionReadCommand, we don't filter out sstables whose min/max 
> clustering bounds don't intersect with the clustering bounds being queried. 
> This causes us to do extra work on the read path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes

2019-05-15 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840757#comment-16840757
 ] 

Blake Eggleston commented on CASSANDRA-14459:
-

I finished my first round of review of the implementation. I still need to 
spend some time looking at the tests and docs.

Some general notes:
 * I don't think we need a dynamicsnitch package, the 2 implementations could 
just live in the locator package.
 * I can't find anywhere that we validate the various dynamic snitch values? I 
didn't look super hard, but if we're not, it would be good to check people are 
using sane values on startup and jmx update.
 * It's probably a good time to rebase onto current trunk.

ScheduledExecutors:
 * {{getOrCreateSharedExecutor}} could be private
 * shutdown times accumulate in the wait section. ie: if we have 2 executors 
with a shutdown time of 2 seconds, we'll wait up to 4 seconds for them to stop.
 * we should validate shutdown hasn't been called in 
{{getOrCreateSharedExecutor}}
 * I don't see any benefit to making this class lock free. Making the 
{{getOrCreateSharedExecutor}} and {{shutdownAndWait}} methods synchronized and 
just using a normal hash map for {{executors}} would make this class easier to 
reason about. As it stands, you could argue there's some raciness with creating 
and shutting down at the same time, although it's unlikely to be a problem in 
practice. I do think I might be borderline nit picking here though.

StorageService
 * {{doLocalReadTest}} method is unused

MessagingService
 * Fix class declaration indentation
 * remove unused import

DynamicEndpointSnitch
 * I don't think we need to check {{logger.isTraceEnabled}} before calling 
{{logger.trace()}}? At least I don't see any toString computation that would 
happen in the calls.
 * The class hierarchy could be improved a bit. There's code in 
{{DynamicEndpointSnitchHistogram}} that's also used in the legacy snitch, and 
code in {{DynamicEndpointSnitch}} that's only used in 
{{DynamicEndpointSnitchHistogram}}. The boundary between DynamicEndpointSnitch 
and DynamicEndpointSnitchHistogram in particular feels kind of arbitrary.

DynamicEndpointLegacySnitch
 * If we're going to keep the old behavior around as a failsafe (and we 
should), I think we should avoid improving it by changing the reset behavior. 
Only resetting under some situations intuitively feels like the right thing to 
do, but it would suck if there were unforeseen problems with it that made it a 
regression from the 3.x behavior.

> DynamicEndpointSnitch should never prefer latent nodes
> --
>
> Key: CASSANDRA-14459
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14459
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Low
>  Labels: 4.0-feature-freeze-review-requested, 
> pull-request-available
> Fix For: 4.x
>
>  Time Spent: 25.5h
>  Remaining Estimate: 0h
>
> The DynamicEndpointSnitch has two unfortunate behaviors that allow it to 
> provide latent hosts as replicas:
>  # Loses all latency information when Cassandra restarts
>  # Clears latency information entirely every ten minutes (by default), 
> allowing global queries to be routed to _other datacenters_ (and local 
> queries cross racks/azs)
> This means that the first few queries after restart/reset could be quite slow 
> compared to average latencies. I propose we solve this by resetting to the 
> minimum observed latency instead of completely clearing the samples and 
> extending the {{isLatencyForSnitch}} idea to a three state variable instead 
> of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows 
> {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the 
> DS should use those measurements if it only has one or fewer samples for a 
> host. This fixes both problems because on process restart we send out 
> {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to 
> effectively the RTT of the hosts (also at that point normal gossip 
> {{EchoMessages}} have an opportunity to add an additional latency 
> measurement).
> This strategy also nicely deals with the "a host got slow but now it's fine" 
> problem that the DS resets were (afaik) designed to stop because the 
> {{EchoMessage}} ping latency will count only after the reset for that host. 
> Ping latency is a more reasonable lower bound on host latency (as opposed to 
> status quo of zero).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 

[jira] [Updated] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-10 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15108:

Fix Version/s: 4.0
   Status: Resolved  (was: Ready to Commit)
   Resolution: Fixed

Thanks, committed to trunk as 
[aa762c6d5253e0cc2947d3bf2b6149197e106036|https://github.com/apache/cassandra/commit/aa762c6d5253e0cc2947d3bf2b6149197e106036],
 and dtests master as 
[b1167bef169d657dbd7d1eb09c4d4a6fa8ecf6a9|https://github.com/apache/cassandra-dtest/commit/b1167bef169d657dbd7d1eb09c4d4a6fa8ecf6a9]

> Support building Cassandra with JDK 11
> --
>
> Key: CASSANDRA-15108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> With the changes in java 8 support and licensing, we should be able to build 
> and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-10 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837436#comment-16837436
 ] 

Blake Eggleston commented on CASSANDRA-15108:
-

[~djoshi3] {{computeIfAbsent}} doesn't work for this situation. Since JMX 
metrics register themselves in their ctor, we need to create the metric exactly 
once, otherwise we'll get duplicate name exceptions (the problem the 
synchronization is solving). Although computeIfAbsent is thread safe in the 
context of the map, it uses compare and swap to add the computed value to the 
map. This means it eagerly allocates new metric instances, which can cause the 
jmx name collision we're trying to avoid if multiple calls interleave.

> Support building Cassandra with JDK 11
> --
>
> Key: CASSANDRA-15108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> With the changes in java 8 support and licensing, we should be able to build 
> and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-09 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836725#comment-16836725
 ] 

Blake Eggleston commented on CASSANDRA-15108:
-

That would make sense. I half reverted that 
[here|https://github.com/bdeggleston/cassandra/commit/52d0903eed4c7eac8ab1a487c0110032aba9b81b].
 Now we do realclean on the initial build, and do no cleans on the test runs. 
According to the comments, clean was only added to prevent trying to run java 
11 compiled tests in a java 8, which should no longer be an issue with the 
split workspace in this patch.

|[trunk|https://github.com/bdeggleston/cassandra/tree/15108]|[j8 
circle|https://circleci.com/workflow-run/f983d849-3678-4490-9029-45c7af3283ac]|[j11
 
circle|https://circleci.com/workflow-run/b7ebddba-b528-4e7e-a723-591a9159ca48]| 

 

> Support building Cassandra with JDK 11
> --
>
> Key: CASSANDRA-15108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> With the changes in java 8 support and licensing, we should be able to build 
> and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-09 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836692#comment-16836692
 ] 

Blake Eggleston commented on CASSANDRA-15108:
-

What kind of failures was realclean causing? I'd added that because random test 
runs were using the wrong byteman version and failing.

> Support building Cassandra with JDK 11
> --
>
> Key: CASSANDRA-15108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> With the changes in java 8 support and licensing, we should be able to build 
> and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14883) Let Cassandra support the new JVM, Eclipse Openj9.

2019-05-08 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston reassigned CASSANDRA-14883:
---

Assignee: (was: Blake Eggleston)

> Let Cassandra support the new JVM, Eclipse Openj9.
> --
>
> Key: CASSANDRA-14883
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14883
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Packaging
> Environment: jdk8u192-b12_openj9-0.11.0
> cassandra 4.0.0_beta_20181109_build
>Reporter: Lee Sangboo
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: jamm-0.3.2.jar, jamm.zip
>
>
> Cassandra does not currently support the new JVM, Eclipse Openj9. In internal 
> testing, Openj9 outperforms Hotspot. I have deployed a modified jamm library 
> that has a problem with the current startup, but when I started Cassandra, I 
> got a log message saying "Non-Oracle JVM detected." Some features, such as 
> unimported compact SSTables, may not work as intended "If there is no 
> problem, I would also like to delete the above message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14883) Let Cassandra support the new JVM, Eclipse Openj9.

2019-05-08 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston reassigned CASSANDRA-14883:
---

Assignee: Blake Eggleston

> Let Cassandra support the new JVM, Eclipse Openj9.
> --
>
> Key: CASSANDRA-14883
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14883
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Packaging
> Environment: jdk8u192-b12_openj9-0.11.0
> cassandra 4.0.0_beta_20181109_build
>Reporter: Lee Sangboo
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: jamm-0.3.2.jar, jamm.zip
>
>
> Cassandra does not currently support the new JVM, Eclipse Openj9. In internal 
> testing, Openj9 outperforms Hotspot. I have deployed a modified jamm library 
> that has a problem with the current startup, but when I started Cassandra, I 
> got a log message saying "Non-Oracle JVM detected." Some features, such as 
> unimported compact SSTables, may not work as intended "If there is no 
> problem, I would also like to delete the above message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14883) Let Cassandra support the new JVM, Eclipse Openj9.

2019-05-08 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835874#comment-16835874
 ] 

Blake Eggleston edited comment on CASSANDRA-14883 at 5/8/19 8:12 PM:
-

[~avermeerbergen] do you see this log message (logged at INFO) when starting 
your Cassandra 3.x install with OpenJ9?

 
{quote}Cannot initialize un-mmaper. (Are you using a non-Oracle JVM?) Compacted 
data files will not be removed promptly. Consider using an Oracle JVM or using 
standard disk access mode
{quote}


was (Author: bdeggleston):
[~avermeerbergen] do you see this message when starting your Cassandra 3.x 
install with OpenJ9?

 
{quote}Cannot initialize un-mmaper. (Are you using a non-Oracle JVM?) Compacted 
data files will not be removed promptly. Consider using an Oracle JVM or using 
standard disk access mode{quote}

> Let Cassandra support the new JVM, Eclipse Openj9.
> --
>
> Key: CASSANDRA-14883
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14883
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Packaging
> Environment: jdk8u192-b12_openj9-0.11.0
> cassandra 4.0.0_beta_20181109_build
>Reporter: Lee Sangboo
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: jamm-0.3.2.jar, jamm.zip
>
>
> Cassandra does not currently support the new JVM, Eclipse Openj9. In internal 
> testing, Openj9 outperforms Hotspot. I have deployed a modified jamm library 
> that has a problem with the current startup, but when I started Cassandra, I 
> got a log message saying "Non-Oracle JVM detected." Some features, such as 
> unimported compact SSTables, may not work as intended "If there is no 
> problem, I would also like to delete the above message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14883) Let Cassandra support the new JVM, Eclipse Openj9.

2019-05-08 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835874#comment-16835874
 ] 

Blake Eggleston commented on CASSANDRA-14883:
-

[~avermeerbergen] do you see this message when starting your Cassandra 3.x 
install with OpenJ9?

 
{quote}Cannot initialize un-mmaper. (Are you using a non-Oracle JVM?) Compacted 
data files will not be removed promptly. Consider using an Oracle JVM or using 
standard disk access mode{quote}

> Let Cassandra support the new JVM, Eclipse Openj9.
> --
>
> Key: CASSANDRA-14883
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14883
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Packaging
> Environment: jdk8u192-b12_openj9-0.11.0
> cassandra 4.0.0_beta_20181109_build
>Reporter: Lee Sangboo
>Priority: Normal
> Fix For: 4.0.x
>
> Attachments: jamm-0.3.2.jar, jamm.zip
>
>
> Cassandra does not currently support the new JVM, Eclipse Openj9. In internal 
> testing, Openj9 outperforms Hotspot. I have deployed a modified jamm library 
> that has a problem with the current startup, but when I started Cassandra, I 
> got a log message saying "Non-Oracle JVM detected." Some features, such as 
> unimported compact SSTables, may not work as intended "If there is no 
> problem, I would also like to delete the above message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14609) Update circleci builds/env to support java 11

2019-05-02 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14609:

Resolution: Duplicate
Status: Resolved  (was: Open)

This was done as part of CASSANDRA-14806

> Update circleci builds/env to support java 11
> -
>
> Key: CASSANDRA-14609
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14609
> Project: Cassandra
>  Issue Type: Task
>  Components: Legacy/Testing
>Reporter: Jason Brown
>Assignee: Sumanth Pasupuleti
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> CASSANDRA-9608 introduced java 11 support, and it needs to be added to the 
> circleci testing environment. This is a place marker for that work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-02 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15108:

Test and Documentation Plan: I've just been testing with circle. Once it's 
committed, some more detailed performance testing would be useful.
 Status: Patch Available  (was: Open)

|[trunk|https://github.com/bdeggleston/cassandra/tree/15108]|[dtests|https://github.com/bdeggleston/cassandra-dtest/tree/jdk11]|[j8
 
circle|https://circleci.com/workflow-run/4fd48a55-24cc-4920-bdb4-6e8a1c8ff90e]|[j11
 circle|https://circleci.com/workflow-run/0655865d-ff5b-4bc8-a1ff-b705c63a40ed]|

This changes around how build.xml handles different jdks. The JDK you want to 
build with is selected with $JAVA_HOME. JDK8 will work as before, but 
attempting to build with JDK11 will fail unless you signal it’s intentional by 
either setting the flag -Duse.jdk=11 and/or setting the env var 
$CASSANDRA_USE_JDK11. Attempting to build with JDK8 will fail with these flags 
set.

The commits are divided into about 3 parts. First is the actual build.xml 
changes to support jdk11 build (and update intellij files). Second is various 
adjustments to libraries and command flags to make C* and it’s tests work 
properly with jdk11. Third is changes to circle ci configs to support building 
and testing jdk8 and jdk11.

This also fixes several dtests that were failing since CASSANDRA-9608


> Support building Cassandra with JDK 11
> --
>
> Key: CASSANDRA-15108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> With the changes in java 8 support and licensing, we should be able to build 
> and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-02 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15108:

 Complexity: Normal
Change Category: Operability
 Status: Open  (was: Triage Needed)

> Support building Cassandra with JDK 11
> --
>
> Key: CASSANDRA-15108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> With the changes in java 8 support and licensing, we should be able to build 
> and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-02 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15108:

Reviewers: Dinesh Joshi, Sam Tunnicliffe

> Support building Cassandra with JDK 11
> --
>
> Key: CASSANDRA-15108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> With the changes in java 8 support and licensing, we should be able to build 
> and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15108) Support building Cassandra with JDK 11

2019-05-02 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-15108:
---

 Summary: Support building Cassandra with JDK 11
 Key: CASSANDRA-15108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15108
 Project: Cassandra
  Issue Type: Improvement
  Components: Build
Reporter: Blake Eggleston
Assignee: Blake Eggleston


With the changes in java 8 support and licensing, we should be able to build 
and run Cassandra with java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-25 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15059:

Fix Version/s: 4.0
   3.11.5
   3.0.19
Since Version: 3.0.0
   Status: Resolved  (was: Ready to Commit)
   Resolution: Fixed

Committed to 3.0 as 
[c3ce32e239b1ba41faf1d58a942465b9bf45b986|https://github.com/apache/cassandra/commit/c3ce32e239b1ba41faf1d58a942465b9bf45b986]
 and merged up. Thanks!

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 3.0.19, 3.11.5, 4.0
>
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-25 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15059:

Reviewers: Ariel Weisberg  (was: Ariel Weisberg, Jordan West)
   Status: Review In Progress  (was: Patch Available)

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-25 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15059:

Status: Ready to Commit  (was: Review In Progress)

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-24 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15072:

Fix Version/s: 3.11.5
   3.0.19
Since Version: 3.0.0
   Status: Resolved  (was: Ready to Commit)
   Resolution: Fixed

Committed to 3.0 as 
[d27c3ad0d2d006a5f156f0a2f2a24286d31c5069|https://github.com/apache/cassandra/commit/d27c3ad0d2d006a5f156f0a2f2a24286d31c5069]
 and merged up. Thanks!

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Fix For: 3.0.19, 3.11.5
>
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-24 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15072:

Status: Ready to Commit  (was: Review In Progress)

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-24 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15078:

Status: Resolved  (was: Ready to Commit)
Resolution: Fixed

Committed as [7d2c3c215f65ee41f86886304257647fc24b1f70 
|https://github.com/apache/cassandra/commit/7d2c3c215f65ee41f86886304257647fc24b1f70],
 thanks.

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-22 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823466#comment-16823466
 ] 

Blake Eggleston commented on CASSANDRA-15059:
-

bq. quarantineEndpoint and replacementQuarantine are private, but maybe check 
there as well? I don't feel strongly about it, but it's slightly safer when 
they are called indirectly.

{{quarantineEndpoint}} (and by extension, {{replacementQuarantine}}) modifies 
the {{justRemovedEndpoints}} map. This map is also modified in the GossipTask 
thread, and is done correctly as far as I can tell. That's why I didn't add the 
assertion there.

bq. assassinateEndpoint asserts on the thread inside the lambda for run in 
Gossip stage. Harmless, but is it too much?

Probably overkill, removed.

bq. notifyFailureDetector seems like it could tolerate having this assertion 
since it is called from VerbHandlers in the gossip stage?

Nothing happens in that method that we'd need to assert the thread for. I could 
see adding an assertion with the interface refactor, but I wouldn't want make 
noise in people's logs if we're not actually doing anything worth complaining 
about.

bq. applyNewStates is also private, but maybe check the assertion there?

Same as above

I also switched to using the NoSpamLogger for the assertion.

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-22 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823290#comment-16823290
 ] 

Blake Eggleston commented on CASSANDRA-15059:
-

I opened CASSANDRA-15095 as a follow on JIRA.

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15095) Split Gossiper into separate interfaces

2019-04-22 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-15095:
---

 Summary: Split Gossiper into separate interfaces
 Key: CASSANDRA-15095
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15095
 Project: Cassandra
  Issue Type: Improvement
Reporter: Blake Eggleston


As [~aweisberg] suggests 
[here|https://issues.apache.org/jira/browse/CASSANDRA-15059?focusedCommentId=16802986=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16802986],
 a better way to encourage Gossiper threadsafety would be to split it into 
interfaces depending on which methods are safe to be called from where. At 
minimum, one for outside the gossiper stage, on for inside.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-18 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821569#comment-16821569
 ] 

Blake Eggleston commented on CASSANDRA-15059:
-

I've updated my branches with the runtime checking, could you take a look? The 
checks discovered a few more places where we were mutating Gossip state from 
the wrong thread, which I fixed.

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15089) CassandraNetworkAuthorizer::authorize should get role details from Roles, not directly from IRoleManager

2019-04-17 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15089:

Status: Ready to Commit  (was: Review In Progress)

> CassandraNetworkAuthorizer::authorize should get role details from Roles, not 
> directly from IRoleManager
> 
>
> Key: CASSANDRA-15089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15089
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Authorization
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0
>
>
> If the network permissions cache doesn't contain any entry for a role, the 
> authorize method is invoked on the configured INetworkAuthorizer. In the case 
> of CassandraNetworkAuthorizer, this immediately checks whether the role in 
> question has the LOGIN privilege set. It does this using the configured 
> IRoleManager directly, which causes a read from the underlying table in 
> system_auth. It should fetch the flag from Roles::canLogin, which uses the 
> RolesCache, falling back to the IRoleManager if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15089) CassandraNetworkAuthorizer::authorize should get role details from Roles, not directly from IRoleManager

2019-04-17 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15089:

Status: Review In Progress  (was: Patch Available)

> CassandraNetworkAuthorizer::authorize should get role details from Roles, not 
> directly from IRoleManager
> 
>
> Key: CASSANDRA-15089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15089
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Authorization
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0
>
>
> If the network permissions cache doesn't contain any entry for a role, the 
> authorize method is invoked on the configured INetworkAuthorizer. In the case 
> of CassandraNetworkAuthorizer, this immediately checks whether the role in 
> question has the LOGIN privilege set. It does this using the configured 
> IRoleManager directly, which causes a read from the underlying table in 
> system_auth. It should fetch the flag from Roles::canLogin, which uses the 
> RolesCache, falling back to the IRoleManager if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15089) CassandraNetworkAuthorizer::authorize should get role details from Roles, not directly from IRoleManager

2019-04-17 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820518#comment-16820518
 ] 

Blake Eggleston commented on CASSANDRA-15089:
-

+1

> CassandraNetworkAuthorizer::authorize should get role details from Roles, not 
> directly from IRoleManager
> 
>
> Key: CASSANDRA-15089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15089
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Authorization
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0
>
>
> If the network permissions cache doesn't contain any entry for a role, the 
> authorize method is invoked on the configured INetworkAuthorizer. In the case 
> of CassandraNetworkAuthorizer, this immediately checks whether the role in 
> question has the LOGIN privilege set. It does this using the configured 
> IRoleManager directly, which causes a read from the underlying table in 
> system_auth. It should fetch the flag from Roles::canLogin, which uses the 
> RolesCache, falling back to the IRoleManager if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-15 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818427#comment-16818427
 ] 

Blake Eggleston commented on CASSANDRA-15078:
-

Thanks [~ifesdjeen]. Pushed up requested changes. I ended up using 
{{callsOnInstance}} for the version getter instead of {{appliesOnInstance}}

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-11 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815537#comment-16815537
 ] 

Blake Eggleston edited comment on CASSANDRA-15059 at 4/11/19 4:03 PM:
--

Maybe we could compromise a bit here. There’s definitely value in rooting out 
places where we violate the assumptions gossiper makes about concurrency, but I 
worry that if we don’t catch all of them we'll turn a race that _may_ cause a 
problem into a hard failure. What if we logged an error by default, but threw 
an exception if a system property was set? This way we can get feedback and 
scary messages in the logs, but we haven't made things any less stable, and our 
tests will fail if we’re doing something we shouldn’t be. I have no problem 
putting that in 3.x and up.

I would like to punt the refactor to 4.next though. I do think it’s valuable, 
but I think it’s a bit late in the game to add it to 4.0.

WDYT?


was (Author: bdeggleston):
Maybe we could compromise a bit here. There’s definitely value in rooting out 
places where we violate the assumptions gossiper makes about concurrency, but I 
worry that if we don’t catch all of them we'll turn a relatively low impact 
race (low enough that we haven't caught it) into a hard failure. What if we 
logged an error by default, but threw an exception if a system property was 
set? This way we can get feedback and scary messages in the logs, but we 
haven't made things any less stable, and our tests will fail if we’re doing 
something we shouldn’t be. I have no problem putting that in 3.x and up.

I would like to punt the refactor to 4.next though. I do think it’s valuable, 
but I think it’s a bit late in the game to add it to 4.0.

WDYT?

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-11 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815537#comment-16815537
 ] 

Blake Eggleston commented on CASSANDRA-15059:
-

Maybe we could compromise a bit here. There’s definitely value in rooting out 
places where we violate the assumptions gossiper makes about concurrency, but I 
worry that if we don’t catch all of them we'll turn a relatively low impact 
race (low enough that we haven't caught it) into a hard failure. What if we 
logged an error by default, but threw an exception if a system property was 
set? This way we can get feedback and scary messages in the logs, but we 
haven't made things any less stable, and our tests will fail if we’re doing 
something we shouldn’t be. I have no problem putting that in 3.x and up.

I would like to punt the refactor to 4.next though. I do think it’s valuable, 
but I think it’s a bit late in the game to add it to 4.0.

WDYT?

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-09 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813615#comment-16813615
 ] 

Blake Eggleston commented on CASSANDRA-15059:
-

I think both of these changes are worth considering, but I don’t think would be 
appropriate to include them as part of this ticket. They are both out of scope 
and too risky to be putting in 3.x, and I’d argue the same for 4.0 at this 
point as well.

Gossiper is very brittle, critical to the operation of a cluster, and has 
little to no test coverage. Any bugs that aren't caught by a manual review or 
raise any red flags in the dtests will become production issues at some point.

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-04-08 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812769#comment-16812769
 ] 

Blake Eggleston commented on CASSANDRA-15059:
-

{quote}
Future wise this doesn't do anything to address the underlying fragility in how 
Gossiper doesn't document what is safe to call from outside the Gossip thread 
and what isn't. It also doesn't validate the correct thread is running a given 
method.
{quote}

I’ve been thinking about this a lot, and I think it would be safer if we didn’t 
do this.

Adding some preconditions isn’t going to fix the underlying fragility of 
Gossiper. Given the “realities” of the Gossiper class, I think it would end up 
causing more harm that good. Just starting to pull on that thread reveals at 
least one situation where we modify gossip state out of the gossip stage that 
makes sense (on startup). There are probably one or two more (at least), and 
I’d hate to break a nodetool command or something.

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-05 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810407#comment-16810407
 ] 

Blake Eggleston edited comment on CASSANDRA-15072 at 4/5/19 4:48 PM:
-

|[3.0|https://github.com/bdeggleston/cassandra/tree/15072-3.0]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15072-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15072-3.11]|[tests|https://circleci.com/workflow-run/4567dbed-be97-49e5-8c82-66e320e074ca]|

[~beobal] do you have time to review this? It seems to be related to 
CASSANDRA-11087.

A few notes:
 * From what I can tell, returning a row per cell is the right thing to do in 
this case, so I'm using a modified result counter to only going entire 
partitions in this specific case. However, I'm not familiar enough with all the 
dark corners of the 2.1 storage engine to be sure that's appropriate, or won't 
break something else.
 * Doing a point read with the partition key also returns a row per cell, but 
works correctly because the 2.2 coordinator seems to just discard the limit in 
that case.
 * If you're not familiar with the in-jvm dtests yet, and want to run the one 
in this patch, you'll want to run {{ant dtest-jar}} on this branch and [this 
2.2 branch|https://github.com/bdeggleston/cassandra/tree/15078-2.2], and put 
the 2.2 dtest jar in the 3.0 build directory.
 * -CircleCI seems to be behind picking up new branches to test, but I'll 
update this with links to the workflows once it catches up.-


was (Author: bdeggleston):
|[3.0|https://github.com/bdeggleston/cassandra/tree/15072-3.0]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15072-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15072-3.11]|[tests|https://circleci.com/workflow-run/4567dbed-be97-49e5-8c82-66e320e074ca]|

[~beobal] do you have time to review this? It seems to be related to 
CASSANDRA-11087.

A few notes:
 * From what I can tell, returning a row per cell is the right thing to do in 
this case, so I'm using a modified result counter to only going entire 
partitions in this specific case. However, I'm not familiar enough with all the 
dark corners of the 2.1 storage engine to be sure that's appropriate, or won't 
break something else.
 * Doing a point read with the partition key also returns a row per cell, but 
works correctly because the 2.2 coordinator seems to just discard the limit in 
that case.
 * If you're not familiar with the in-jvm dtests yet, and want to run the one 
in this patch, you'll want to run {{ant dtest-jar}} on this branch and [this 
2.2 branch|https://github.com/bdeggleston/cassandra/tree/15078-2.2], and put 
the 2.2 dtest jar in the 3.0 build directory.
 * CircleCI seems to be behind picking up new branches to test, but I'll update 
this with links to the workflows once it catches up.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;

[jira] [Comment Edited] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-05 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810373#comment-16810373
 ] 

Blake Eggleston edited comment on CASSANDRA-15078 at 4/5/19 4:47 PM:
-

|[2.2|https://github.com/bdeggleston/cassandra/tree/15078-2.2]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-2.2]|
|[3.0|https://github.com/bdeggleston/cassandra/tree/15078-3.0]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15078-3.11]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15072-3.11]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/15078-trunk]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-trunk]|

[~ifesdjeen] / [~benedict] would one of you mind taking a look at this? It's 
just adding and using hooks for getting/setting messaging versions. It also 
fixes a minor issue where the {{setup}} method wasn't getting called. These 
changes are in support of a test case for CASSANDRA-15072, feel free to take a 
look and give feedback on that as well. -Circle is behind picking up new 
branches, but I'll update with the workflow links once they're available.-


was (Author: bdeggleston):
|[2.2|https://github.com/bdeggleston/cassandra/tree/15078-2.2]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-2.2]|
|[3.0|https://github.com/bdeggleston/cassandra/tree/15078-3.0]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15078-3.11]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15072-3.11]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/15078-trunk]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-trunk]|

[~ifesdjeen] / [~benedict] would one of you mind taking a look at this? It's 
just adding and using hooks for getting/setting messaging versions. It also 
fixes a minor issue where the {{setup}} method wasn't getting called. These 
changes are in support of a test case for CASSANDRA-15072, feel free to take a 
look and give feedback on that as well. Circle is behind picking up new 
branches, but I'll update with the workflow links once they're available.

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-05 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810373#comment-16810373
 ] 

Blake Eggleston edited comment on CASSANDRA-15078 at 4/5/19 4:47 PM:
-

|[2.2|https://github.com/bdeggleston/cassandra/tree/15078-2.2]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-2.2]|
|[3.0|https://github.com/bdeggleston/cassandra/tree/15078-3.0]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15078-3.11]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15072-3.11]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/15078-trunk]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15078-trunk]|

[~ifesdjeen] / [~benedict] would one of you mind taking a look at this? It's 
just adding and using hooks for getting/setting messaging versions. It also 
fixes a minor issue where the {{setup}} method wasn't getting called. These 
changes are in support of a test case for CASSANDRA-15072, feel free to take a 
look and give feedback on that as well. Circle is behind picking up new 
branches, but I'll update with the workflow links once they're available.


was (Author: bdeggleston):
|[2.2|https://github.com/bdeggleston/cassandra/tree/15078-2.2]|
|[3.0|https://github.com/bdeggleston/cassandra/tree/15078-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15078-3.11]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/15078-trunk]|

[~ifesdjeen] / [~benedict] would one of you mind taking a look at this? It's 
just adding and using hooks for getting/setting messaging versions. It also 
fixes a minor issue where the {{setup}} method wasn't getting called. These 
changes are in support of a test case for CASSANDRA-15072, feel free to take a 
look and give feedback on that as well. Circle is behind picking up new 
branches, but I'll update with the workflow links once they're available.

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-05 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810407#comment-16810407
 ] 

Blake Eggleston edited comment on CASSANDRA-15072 at 4/5/19 4:46 PM:
-

|[3.0|https://github.com/bdeggleston/cassandra/tree/15072-3.0]|[tests|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15072-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15072-3.11]|[tests|https://circleci.com/workflow-run/4567dbed-be97-49e5-8c82-66e320e074ca]|

[~beobal] do you have time to review this? It seems to be related to 
CASSANDRA-11087.

A few notes:
 * From what I can tell, returning a row per cell is the right thing to do in 
this case, so I'm using a modified result counter to only going entire 
partitions in this specific case. However, I'm not familiar enough with all the 
dark corners of the 2.1 storage engine to be sure that's appropriate, or won't 
break something else.
 * Doing a point read with the partition key also returns a row per cell, but 
works correctly because the 2.2 coordinator seems to just discard the limit in 
that case.
 * If you're not familiar with the in-jvm dtests yet, and want to run the one 
in this patch, you'll want to run {{ant dtest-jar}} on this branch and [this 
2.2 branch|https://github.com/bdeggleston/cassandra/tree/15078-2.2], and put 
the 2.2 dtest jar in the 3.0 build directory.
 * CircleCI seems to be behind picking up new branches to test, but I'll update 
this with links to the workflows once it catches up.


was (Author: bdeggleston):
|[3.0|https://github.com/bdeggleston/cassandra/tree/15072-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15072-3.11]|

[~beobal] do you have time to review this? It seems to be related to 
CASSANDRA-11087.

A few notes:
 * From what I can tell, returning a row per cell is the right thing to do in 
this case, so I'm using a modified result counter to only going entire 
partitions in this specific case. However, I'm not familiar enough with all the 
dark corners of the 2.1 storage engine to be sure that's appropriate, or won't 
break something else.
 * Doing a point read with the partition key also returns a row per cell, but 
works correctly because the 2.2 coordinator seems to just discard the limit in 
that case.
 * If you're not familiar with the in-jvm dtests yet, and want to run the one 
in this patch, you'll want to run {{ant dtest-jar}} on this branch and [this 
2.2 branch|https://github.com/bdeggleston/cassandra/tree/15078-2.2], and put 
the 2.2 dtest jar in the 3.0 build directory.
 * CircleCI seems to be behind picking up new branches to test, but I'll update 
this with links to the workflows once it catches up.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | 

[jira] [Updated] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-04 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15072:

Test and Documentation Plan: circleci / in jvm upgrade dtests
 Status: Patch Available  (was: Open)

|[3.0|https://github.com/bdeggleston/cassandra/tree/15072-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15072-3.11]|

[~beobal] do you have time to review this? It seems to be related to 
CASSANDRA-11087.

A few notes:
 * From what I can tell, returning a row per cell is the right thing to do in 
this case, so I'm using a modified result counter to only going entire 
partitions in this specific case. However, I'm not familiar enough with all the 
dark corners of the 2.1 storage engine to be sure that's appropriate, or won't 
break something else.
 * Doing a point read with the partition key also returns a row per cell, but 
works correctly because the 2.2 coordinator seems to just discard the limit in 
that case.
 * If you're not familiar with the in-jvm dtests yet, and want to run the one 
in this patch, you'll want to run {{ant dtest-jar}} on this branch and [this 
2.2 branch|https://github.com/bdeggleston/cassandra/tree/15078-2.2], and put 
the 2.2 dtest jar in the 3.0 build directory.
 * CircleCI seems to be behind picking up new branches to test, but I'll update 
this with links to the workflows once it catches up.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-04 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15078:

Test and Documentation Plan: circle ci once available
 Status: Patch Available  (was: Open)

|[2.2|https://github.com/bdeggleston/cassandra/tree/15078-2.2]|
|[3.0|https://github.com/bdeggleston/cassandra/tree/15078-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15078-3.11]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/15078-trunk]|

[~ifesdjeen] / [~benedict] would one of you mind taking a look at this? It's 
just adding and using hooks for getting/setting messaging versions. It also 
fixes a minor issue where the {{setup}} method wasn't getting called. These 
changes are in support of a test case for CASSANDRA-15072, feel free to take a 
look and give feedback on that as well. Circle is behind picking up new 
branches, but I'll update with the workflow links once they're available.

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-04 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15078:

 Complexity: Normal
Change Category: Quality Assurance
 Status: Open  (was: Triage Needed)

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-04 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15072:

   Severity: Normal
 Complexity: Normal
Component/s: Legacy/Coordination
 Status: Open  (was: Triage Needed)

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-04 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston reassigned CASSANDRA-15078:
---

Assignee: Blake Eggleston

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-04 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15078:

Fix Version/s: 4.0
   3.11.5
   3.0.19
   2.2.15

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Priority: Normal
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-04 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15078:

Component/s: Test/dtest

> Support cross version messaging in in-jvm upgrade dtests
> 
>
> Key: CASSANDRA-15078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Blake Eggleston
>Priority: Normal
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15078) Support cross version messaging in in-jvm upgrade dtests

2019-04-04 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-15078:
---

 Summary: Support cross version messaging in in-jvm upgrade dtests
 Key: CASSANDRA-15078
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15078
 Project: Cassandra
  Issue Type: Improvement
Reporter: Blake Eggleston






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-02 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808193#comment-16808193
 ] 

Blake Eggleston commented on CASSANDRA-15072:
-

No problem. Yes mixed mode just means you're upgrading your cluster.

I don't know the exact cause, but you've summarized what I think is probably 
happening. Specifically the legacy read path on the 3.0 nodes is probably 
always interpreting single cells as rows for compact storage tables, even ones 
without clustering columns.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-02 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808114#comment-16808114
 ] 

Blake Eggleston commented on CASSANDRA-15072:
-

Huh, I did not know that. I guess that makes sense though. So then this is just 
an upgrade bug.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-02 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808019#comment-16808019
 ] 

Blake Eggleston commented on CASSANDRA-15072:
-

This is a great repro script, thanks. 

A couple of observations:
 * test.test has 2 columns, and uses compact storage, which shouldn’t be 
possible
 * node1 & node3 are the replicas of the missing partition (we’re querying from 
the un-upgraded node2, for those following along).
 * doing a point read ({{select * from test.test where id=‘1’;}}) returns the 
expected partition
 * using LIMIT 2 instead of PAGING 2 has the same problem
 * LIMIT 3 returns a partial row: {{1 | there | null}}
 * LIMIT 4 returns the entire row: {{1 | there |  hi}}

Tables with compact storage can only have a single column, so you shouldn’t be 
able to create a compact storage table with 2 columns. Instead of throwing an 
error though, it seems like it just silently treats the table as a normal 
table. This might be why no one has noticed that our ddl validation is broken.

It looks like the mixed mode read path is treating the table as a proper 
compact storage table though, and treating each cell as a row, which is why you 
see partial rows start to appear as you increase the limit. If you remove 
compact storage from the ddl, or only use a single column, everything works 
normally.

I'll think on the best way to address this.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-01 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807327#comment-16807327
 ] 

Blake Eggleston commented on CASSANDRA-15072:
-

Ok, I can repro your issue with the updated script. It looks like you’re 
hitting a commit log bug that was introduced in 2.1 and fixed in 3.0 
(CASSANDRA-13987) If you drain node 1 & 2 before shutting them down, this 
should stop happening.

I’d also expected putting a sleep larger than the commit log sync interval 
before shutting down node 1 would fix the problem, but it didn’t. I’m still 
looking at why that is.

When you say:
{quote}When all nodes were upgraded (before upgrading sstables), we stopped 
getting incomplete results
{quote}
do you mean data you'd inserted before the upgrade reappeared?

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Muir Manders
>Priority: High
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-01 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston reassigned CASSANDRA-15072:
---

Assignee: Blake Eggleston

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Muir Manders
>Assignee: Blake Eggleston
>Priority: High
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old 
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 < CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade

2019-04-01 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807185#comment-16807185
 ] 

Blake Eggleston commented on CASSANDRA-15072:
-

Are you seeing incomplete results like this in a real cluster? If so, what 
consistency level are you reading and writing at?

The ccm script you have here _does_ return incomplete results, but it’s also 
writing and reading at CL ONE (the cqlsh default), so that’s not unexpected. I 
modified the script here to read and write at QUORUM, and haven't gotten any 
incomplete results.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Muir Manders
>Priority: High
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application starting getting 
> back incomplete results for range queries. When all nodes were upgraded 
> (before upgrading sstables), we stopped getting incomplete results. I was 
> able to reproduce it and listed steps below. It seems to require the random 
> partitioner and compact storage to reproduce reliably. It also reproduces 
> coming from 2.1.21 and 2.2.14.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh < CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> # need to use new cqlsh so we can configure page size
> cqlsh 127.0.0.2 < PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> +---+-
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking

2019-03-28 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804130#comment-16804130
 ] 

Blake Eggleston commented on CASSANDRA-10726:
-

4.0 only I'm afraid. It's built on a refactor that's too invasive to backport 
to 3.11

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination
>Reporter: Richard Low
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-03-27 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803112#comment-16803112
 ] 

Blake Eggleston commented on CASSANDRA-15059:
-

{quote}
Future wise this doesn't do anything to address the underlying fragility in how 
Gossiper doesn't document what is safe to call from outside the Gossip thread 
and what isn't. It also doesn't validate the correct thread is running a given 
method.
I think we want to add two (or three?) interfaces that Gossiper can implement. 
One is for methods that are non-mutating and safe to call from any thread. The 
other is for methods for that should only be called from the Gossip stage 
thread. And maybe a third which contains methods that mutate, but block on the 
Gossip stage. So Gossiper.instance would go away and you would have a reference 
to the Gossiper via one of the 2-3 interfaces.
Additionally any method that should only run in Gossip stage interface should 
have a Preconditions.checkState checking that it's actually running in the 
Gossip stage.
{quote}

Do you mean as part of this ticket, or as future improvements to gossip?

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-25 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14607:

Status: Resolved  (was: Ready to Commit)

Committed to trunk as {{9063ceabc9a41825a09db811f74821537dfe726b}}, thanks!

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Blake Eggleston
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-25 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14607:

Status: Review In Progress  (was: Patch Available)

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Blake Eggleston
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-25 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14607:

Status: Ready to Commit  (was: Review In Progress)

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Blake Eggleston
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-03-22 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15059:

Test and Documentation Plan: circleci runs looks good
 Status: Patch Available  (was: Open)

|[3.0|https://github.com/bdeggleston/cassandra/tree/15059-3.0]|[circle|https://circleci.com/workflow-run/a863b2af-db01-4f7a-b059-a0ed2496f286]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/15059-3.11]|[circle|https://circleci.com/workflow-run/c271d33b-92c4-4579-80ce-cab20f4aac6c]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/15059-trunk]|[circle|https://circleci.com/workflow-run/d476de41-c374-40f1-bdae-b464149b703a]|

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-03-22 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15059:

 Severity: Low
   Complexity: Normal
Discovered By: User Report
 Bug Category: Parent values: Degradation(12984)Level 1 values: Other 
Exception(12998)
  Component/s: Cluster/Gossip
   Status: Open  (was: Triage)

> Gossiper#markAlive can race with Gossiper#markDead
> --
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes

2019-03-21 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14459:

Reviewers: Ariel Weisberg, Blake Eggleston  (was: Ariel Weisberg)

> DynamicEndpointSnitch should never prefer latent nodes
> --
>
> Key: CASSANDRA-14459
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14459
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Low
>  Labels: 4.0-feature-freeze-review-requested, 
> pull-request-available
> Fix For: 4.x
>
>  Time Spent: 23h
>  Remaining Estimate: 0h
>
> The DynamicEndpointSnitch has two unfortunate behaviors that allow it to 
> provide latent hosts as replicas:
>  # Loses all latency information when Cassandra restarts
>  # Clears latency information entirely every ten minutes (by default), 
> allowing global queries to be routed to _other datacenters_ (and local 
> queries cross racks/azs)
> This means that the first few queries after restart/reset could be quite slow 
> compared to average latencies. I propose we solve this by resetting to the 
> minimum observed latency instead of completely clearing the samples and 
> extending the {{isLatencyForSnitch}} idea to a three state variable instead 
> of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows 
> {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the 
> DS should use those measurements if it only has one or fewer samples for a 
> host. This fixes both problems because on process restart we send out 
> {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to 
> effectively the RTT of the hosts (also at that point normal gossip 
> {{EchoMessages}} have an opportunity to add an additional latency 
> measurement).
> This strategy also nicely deals with the "a host got slow but now it's fine" 
> problem that the DS resets were (afaik) designed to stop because the 
> {{EchoMessage}} ping latency will count only after the reset for that host. 
> Ping latency is a more reasonable lower bound on host latency (as opposed to 
> status quo of zero).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15059) Gossiper#markAlive can race with Gossiper#markDead

2019-03-21 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-15059:
---

 Summary: Gossiper#markAlive can race with Gossiper#markDead
 Key: CASSANDRA-15059
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
 Project: Cassandra
  Issue Type: Bug
Reporter: Blake Eggleston
Assignee: Blake Eggleston


The Gossiper class is not threadsafe and assumes all state changes happen in a 
single thread (the gossip stage). Gossiper#convict, however, can be called from 
the GossipTasks thread. This creates a race where calls to Gossiper#markAlive 
and Gossiper#markDead can interleave, corrupting gossip state. 
Gossiper#assassinateEndpoint has a similar problem, being called from the mbean 
server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-20 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14607:

Reviewers: Benedict

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Blake Eggleston
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-20 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14607:

Test and Documentation Plan: 
CIrcleCI run looks good: 
https://circleci.com/workflow-run/2680b5b1-e7a3-4ef8-92ae-ba238ea6a1bc

all j11 failures are also failing in trunk
 Status: Patch Available  (was: Open)

|[trunk|https://github.com/bdeggleston/cassandra/tree/14607-trunk-2]|[circle| 
https://circleci.com/workflow-run/2680b5b1-e7a3-4ef8-92ae-ba238ea6a1bc]|

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Blake Eggleston
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15053) Fix handling FS errors on writing and reading flat files - LogTransaction and hints

2019-03-15 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793750#comment-16793750
 ] 

Blake Eggleston commented on CASSANDRA-15053:
-

+1

> Fix handling FS errors on writing and reading flat files - LogTransaction and 
> hints
> ---
>
> Key: CASSANDRA-15053
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15053
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> We currently fail to handle and propagate IO errors when dealing with 
> transaction log and hints.  It's trivial to fix this behaviour to ensure that 
> disk failure policy is properly invoked in error scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-13 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston reassigned CASSANDRA-14607:
---

Assignee: Blake Eggleston

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Assignee: Blake Eggleston
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-12 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791278#comment-16791278
 ] 

Blake Eggleston edited comment on CASSANDRA-14607 at 3/13/19 4:21 AM:
--

I think we could get the java 8 behavior in java 11 by moving the update logic 
of {{addAllWithSizeDelta}} into a private method, shuffling the 'shouldLock' 
logic around, and just using a {{synchronized}} block. This would work in both 
java 8 and java 11, so we wouldn't need the java8 and java11 source directories.

I pushed up a branch showing how this would work here: 
[https://github.com/bdeggleston/cassandra/tree/14607]

I may be missing some nuance in the usage of {{monitorEnter/Exit}} (I couldn't 
find a lot of documentation on the method, or explanations for why we were 
using it), but I think this would be functionally identical to what we were 
doing with {{monitorEnter/Exit}}.

WDYT [~benedict] & [~jasobrown]?

Edit: That branch is just for illustration, it's not a finished patch. At a 
minimum it still needs to fix the handling of {{inputDeletionInfoCopy}}, 
condense the {{AbstractBTreePartition}} class structure, and remove the java8 
and java11 source directories.


was (Author: bdeggleston):
I think we could get the java 8 behavior in java 11 by moving the update logic 
of {{addAllWithSizeDelta}} into a private method, shuffling the 'shouldLock' 
logic around, and just using a {{synchronized}} block. This would work in both 
java 8 and java 11, so we wouldn't need the java8 and java11 source directories.

I pushed up a branch showing how this would work here: 
[https://github.com/bdeggleston/cassandra/tree/14607]

I may be missing some nuance in the usage of {{monitorEnter/Exit}} (I couldn't 
find a lot of documentation on the method, or explanations for why we were 
using it), but I think this would be functionally identical to what we were 
doing with {{monitorEnter/Exit}}.

WDYT [~benedict] & [~jasobrown]?

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14607) Explore optimizations in AbstractBtreePartiton, java 11 variant

2019-03-12 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791278#comment-16791278
 ] 

Blake Eggleston commented on CASSANDRA-14607:
-

I think we could get the java 8 behavior in java 11 by moving the update logic 
of {{addAllWithSizeDelta}} into a private method, shuffling the 'shouldLock' 
logic around, and just using a {{synchronized}} block. This would work in both 
java 8 and java 11, so we wouldn't need the java8 and java11 source directories.

I pushed up a branch showing how this would work here: 
[https://github.com/bdeggleston/cassandra/tree/14607]

I may be missing some nuance in the usage of {{monitorEnter/Exit}} (I couldn't 
find a lot of documentation on the method, or explanations for why we were 
using it), but I think this would be functionally identical to what we were 
doing with {{monitorEnter/Exit}}.

WDYT [~benedict] & [~jasobrown]?

> Explore optimizations in AbstractBtreePartiton, java 11 variant
> ---
>
> Key: CASSANDRA-14607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14607
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Jason Brown
>Priority: Normal
>  Labels: Java11
> Fix For: 4.0
>
>
> In CASSANDRA-9608, we discussed some way to optimize the java 11 
> implementation of {{AbstractBTreePartition}}. This ticket serves that 
> purpose, as well as a "note to selves" to ensure the java 11 version does not 
> have a performance regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14482) ZSTD Compressor support in Cassandra

2019-02-22 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14482:

Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

committed to trunk as 
[dccf53061a61e7c632669c60cd94626e405518e9|https://github.com/apache/cassandra/commit/dccf53061a61e7c632669c60cd94626e405518e9],
 thanks!

> ZSTD Compressor support in Cassandra
> 
>
> Key: CASSANDRA-14482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14482
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Dependencies, Feature/Compression
>Reporter: Sushma A Devendrappa
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.x
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15027) Handle IR prepare phase failures less race prone by waiting for all results

2019-02-22 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15027:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk as 9bde713ee8883f70d130efb6290ec0e6daea524f, thanks

> Handle IR prepare phase failures less race prone by waiting for all results
> ---
>
> Key: CASSANDRA-15027
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Local/Compaction
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
> Fix For: 4.x
>
>
> Handling incremental repairs as a coordinator begins by sending a 
> {{PrepareConsistentRequest}} message to all participants, which may also 
> include the coordinator itself. Participants will run anti-compactions upon 
> receiving such a message and report the result of the operation back to the 
> coordinator.
> Once we receive a failure response from any of the participants, we fail-fast 
> in {{CoordinatorSession.handlePrepareResponse()}}, which will in turn 
> completes the {{prepareFuture}} that {{RepairRunnable}} is blocking on. Then 
> the repair command will terminate with an error status, as expected.
> The issue is that in case the node will both be coordinator and participant, 
> we may end up with a local session and submitted anti-compactions, which will 
> be executed without any coordination with the coordinator session (on same 
> node). This may result in situations where running repair commands right 
> after another, may cause overlapping execution of anti-compactions that will 
> cause the following (misleading) message to show up in the logs and will 
> cause the repair to fail again:
>  "Prepare phase for incremental repair session %s has failed because it 
> encountered intersecting sstables belonging to another incremental repair 
> session (%s). This is by starting an incremental repair session before a 
> previous one has completed. Check nodetool repair_admin for hung sessions and 
> fix them."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15027) Handle IR prepare phase failures less race prone by waiting for all results

2019-02-22 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775361#comment-16775361
 ] 

Blake Eggleston edited comment on CASSANDRA-15027 at 2/22/19 5:34 PM:
--

Nice. Your follow on changes look good to me, I have 2 nits, but those can just 
be fixed on commit.

* We should log the session id in compaction manager when an anti-compaction is 
cancelled (and probably when there's an error as well)
* Some error handling should be added to the commit fixing the race between 
proposeFuture and hasFailure so nodetool doesn't hang if there's an error in 
the callback

edit: proposed fixes 
[here|https://github.com/bdeggleston/cassandra/commit/02d7d9e09983db0d4661486b17adc375e17be24f]


was (Author: bdeggleston):
Nice. Your follow on changes look good to me, I have 2 nits, but those can just 
be fixed on commit.
 
* We should log the session id in compaction manager when an anti-compaction is 
cancelled (and probably when there's an error as well)
* Some error handling should be added to the commit fixing the race between 
proposeFuture and hasFailure so nodetool doesn't hang if there's an error in 
the callback

> Handle IR prepare phase failures less race prone by waiting for all results
> ---
>
> Key: CASSANDRA-15027
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Local/Compaction
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
> Fix For: 4.x
>
>
> Handling incremental repairs as a coordinator begins by sending a 
> {{PrepareConsistentRequest}} message to all participants, which may also 
> include the coordinator itself. Participants will run anti-compactions upon 
> receiving such a message and report the result of the operation back to the 
> coordinator.
> Once we receive a failure response from any of the participants, we fail-fast 
> in {{CoordinatorSession.handlePrepareResponse()}}, which will in turn 
> completes the {{prepareFuture}} that {{RepairRunnable}} is blocking on. Then 
> the repair command will terminate with an error status, as expected.
> The issue is that in case the node will both be coordinator and participant, 
> we may end up with a local session and submitted anti-compactions, which will 
> be executed without any coordination with the coordinator session (on same 
> node). This may result in situations where running repair commands right 
> after another, may cause overlapping execution of anti-compactions that will 
> cause the following (misleading) message to show up in the logs and will 
> cause the repair to fail again:
>  "Prepare phase for incremental repair session %s has failed because it 
> encountered intersecting sstables belonging to another incremental repair 
> session (%s). This is by starting an incremental repair session before a 
> previous one has completed. Check nodetool repair_admin for hung sessions and 
> fix them."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15027) Handle IR prepare phase failures less race prone by waiting for all results

2019-02-22 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775361#comment-16775361
 ] 

Blake Eggleston commented on CASSANDRA-15027:
-

Nice. Your follow on changes look good to me, I have 2 nits, but those can just 
be fixed on commit.
 
* We should log the session id in compaction manager when an anti-compaction is 
cancelled (and probably when there's an error as well)
* Some error handling should be added to the commit fixing the race between 
proposeFuture and hasFailure so nodetool doesn't hang if there's an error in 
the callback

> Handle IR prepare phase failures less race prone by waiting for all results
> ---
>
> Key: CASSANDRA-15027
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Local/Compaction
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
> Fix For: 4.x
>
>
> Handling incremental repairs as a coordinator begins by sending a 
> {{PrepareConsistentRequest}} message to all participants, which may also 
> include the coordinator itself. Participants will run anti-compactions upon 
> receiving such a message and report the result of the operation back to the 
> coordinator.
> Once we receive a failure response from any of the participants, we fail-fast 
> in {{CoordinatorSession.handlePrepareResponse()}}, which will in turn 
> completes the {{prepareFuture}} that {{RepairRunnable}} is blocking on. Then 
> the repair command will terminate with an error status, as expected.
> The issue is that in case the node will both be coordinator and participant, 
> we may end up with a local session and submitted anti-compactions, which will 
> be executed without any coordination with the coordinator session (on same 
> node). This may result in situations where running repair commands right 
> after another, may cause overlapping execution of anti-compactions that will 
> cause the following (misleading) message to show up in the logs and will 
> cause the repair to fail again:
>  "Prepare phase for incremental repair session %s has failed because it 
> encountered intersecting sstables belonging to another incremental repair 
> session (%s). This is by starting an incremental repair session before a 
> previous one has completed. Check nodetool repair_admin for hung sessions and 
> fix them."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15027) Handle IR prepare phase failures less race prone by waiting for all results

2019-02-21 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774515#comment-16774515
 ] 

Blake Eggleston commented on CASSANDRA-15027:
-

Thanks [~spo...@gmail.com]. I’ve extended your code so that in addition to 
waiting for other anti-compactions to complete, the coordinator also 
pro-actively cancels ongoing anti-compactions on the other participants. This 
avoids wasting time waiting for anti-compactions on other machines. The code 
does 3 things:
 * Adds a session state check to the {{isStopRequested}} method in the 
anti-compaction iterator.
 * The coordinator now sends failure messages to all participants when it 
receives a failure message from one of them in the prepare phase. It does not 
mark these participants as having failed internally though, since that would 
cause the nodetool session to immediately complete. Instead, it waits until 
it’s received messages from all the other nodes.
 * The participants will now respond with a failed prepare message if the 
anti-compaction completes, but the session was failed in the mean time. This 
prevents a dead lock on the coordinator in the case where the participant 
received a failure message between the time the anti-compaction completes and 
the callback fires.

Let me know what you think. If everything looks ok to you, I’m +1 on committing.

[trunk|https://github.com/bdeggleston/cassandra/tree/15027-trunk]
 
[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/15027-trunk]

> Handle IR prepare phase failures less race prone by waiting for all results
> ---
>
> Key: CASSANDRA-15027
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Local/Compaction
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
> Fix For: 4.x
>
>
> Handling incremental repairs as a coordinator begins by sending a 
> {{PrepareConsistentRequest}} message to all participants, which may also 
> include the coordinator itself. Participants will run anti-compactions upon 
> receiving such a message and report the result of the operation back to the 
> coordinator.
> Once we receive a failure response from any of the participants, we fail-fast 
> in {{CoordinatorSession.handlePrepareResponse()}}, which will in turn 
> completes the {{prepareFuture}} that {{RepairRunnable}} is blocking on. Then 
> the repair command will terminate with an error status, as expected.
> The issue is that in case the node will both be coordinator and participant, 
> we may end up with a local session and submitted anti-compactions, which will 
> be executed without any coordination with the coordinator session (on same 
> node). This may result in situations where running repair commands right 
> after another, may cause overlapping execution of anti-compactions that will 
> cause the following (misleading) message to show up in the logs and will 
> cause the repair to fail again:
>  "Prepare phase for incremental repair session %s has failed because it 
> encountered intersecting sstables belonging to another incremental repair 
> session (%s). This is by starting an incremental repair session before a 
> previous one has completed. Check nodetool repair_admin for hung sessions and 
> fix them."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra

2019-02-20 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773524#comment-16773524
 ] 

Blake Eggleston commented on CASSANDRA-14482:
-

[~djoshi3], I pushed up a commit 
[here|https://github.com/bdeggleston/cassandra/commit/716a6b631d68532773bb804e8ce33ab3dd23946c]
 that makes the following changes:

1. simplifies {{ZstdCompressor#compress}} method to just call 
{{Zstd#compress}}. The compressor doesn't support on heap buffers so there's no 
need to do the stream stuff. This is basically how the snappy compressor works.
 2. updates test support method visibility/annotation in ZstdCompressor
 3. adds ratio to CompressorPerformance output

If you're ok with these changes and I haven't broken the tests, I think this is 
ready to commit.

> ZSTD Compressor support in Cassandra
> 
>
> Key: CASSANDRA-14482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14482
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Dependencies, Feature/Compression
>Reporter: Sushma A Devendrappa
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.x
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15027) Handle IR prepare phase failures less race prone by waiting for all results

2019-02-20 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15027:

Reviewer: Blake Eggleston

> Handle IR prepare phase failures less race prone by waiting for all results
> ---
>
> Key: CASSANDRA-15027
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Local/Compaction
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
> Fix For: 4.x
>
>
> Handling incremental repairs as a coordinator begins by sending a 
> {{PrepareConsistentRequest}} message to all participants, which may also 
> include the coordinator itself. Participants will run anti-compactions upon 
> receiving such a message and report the result of the operation back to the 
> coordinator.
> Once we receive a failure response from any of the participants, we fail-fast 
> in {{CoordinatorSession.handlePrepareResponse()}}, which will in turn 
> completes the {{prepareFuture}} that {{RepairRunnable}} is blocking on. Then 
> the repair command will terminate with an error status, as expected.
> The issue is that in case the node will both be coordinator and participant, 
> we may end up with a local session and submitted anti-compactions, which will 
> be executed without any coordination with the coordinator session (on same 
> node). This may result in situations where running repair commands right 
> after another, may cause overlapping execution of anti-compactions that will 
> cause the following (misleading) message to show up in the logs and will 
> cause the repair to fail again:
>  "Prepare phase for incremental repair session %s has failed because it 
> encountered intersecting sstables belonging to another incremental repair 
> session (%s). This is by starting an incremental repair session before a 
> previous one has completed. Check nodetool repair_admin for hung sessions and 
> fix them."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14482) ZSTD Compressor support in Cassandra

2019-02-19 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14482:

Reviewers:   (was: Dinesh Joshi)
 Reviewer: Blake Eggleston

> ZSTD Compressor support in Cassandra
> 
>
> Key: CASSANDRA-14482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14482
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Dependencies, Feature/Compression
>Reporter: Sushma A Devendrappa
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.x
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15007) Incorrect rf validation in SimpleStrategy

2019-02-18 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15007:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Looks good to me. Committed to trunk as 
{{47d4971b56d97ba8a528f7c17bfd6b11f1ababa3}}

> Incorrect rf validation in SimpleStrategy
> -
>
> Key: CASSANDRA-15007
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15007
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Semantics
>Reporter: Michael
>Assignee: Dinesh Joshi
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 15007.patch
>
>
> Getting uninformative ConfigurationException when trying to create a keyspace 
> with SimpleStrategy and no replication factor.
> {{cqlsh> create keyspace test with replication = \{'class': 
> 'SimpleStrategy'};}}
> {{ConfigurationException:}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15007) Incorrect rf validation in SimpleStrategy

2019-02-18 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15007:

Reviewers:   (was: Ariel Weisberg)
 Reviewer: Blake Eggleston  (was: Ariel Weisberg)

> Incorrect rf validation in SimpleStrategy
> -
>
> Key: CASSANDRA-15007
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15007
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Semantics
>Reporter: Michael
>Assignee: Dinesh Joshi
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 15007.patch
>
>
> Getting uninformative ConfigurationException when trying to create a keyspace 
> with SimpleStrategy and no replication factor.
> {{cqlsh> create keyspace test with replication = \{'class': 
> 'SimpleStrategy'};}}
> {{ConfigurationException:}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15025) Avoid NPE in RepairRunnable.recordFailure

2019-02-15 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769578#comment-16769578
 ] 

Blake Eggleston commented on CASSANDRA-15025:
-

+1

> Avoid NPE in RepairRunnable.recordFailure
> -
>
> Key: CASSANDRA-15025
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15025
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> failureMessage parameter in {{RepairRunnable.recordFailure}} can be null, 
> avoid this happening and make sure we log the actual exception



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15025) Avoid NPE in RepairRunnable.recordFailure

2019-02-15 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15025:

Status: Ready to Commit  (was: Patch Available)

> Avoid NPE in RepairRunnable.recordFailure
> -
>
> Key: CASSANDRA-15025
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15025
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> failureMessage parameter in {{RepairRunnable.recordFailure}} can be null, 
> avoid this happening and make sure we log the actual exception



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15024) Don't try to cancel 2i compactions when starting anticompaction

2019-02-15 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15024:

Status: Ready to Commit  (was: Patch Available)

> Don't try to cancel 2i compactions when starting anticompaction
> ---
>
> Key: CASSANDRA-15024
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15024
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> When we start an anticompaction we cancel ongoing compactions, 
> {{runWithCompactionsDisabled}} always cancels compactions for secondary index 
> cfs:es, this causes problem since CASSANDRA-14935 since we check for range 
> intersection which will fail since 2i sstables are LocalPartitioner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15024) Don't try to cancel 2i compactions when starting anticompaction

2019-02-15 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769570#comment-16769570
 ] 

Blake Eggleston commented on CASSANDRA-15024:
-

+1

> Don't try to cancel 2i compactions when starting anticompaction
> ---
>
> Key: CASSANDRA-15024
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15024
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> When we start an anticompaction we cancel ongoing compactions, 
> {{runWithCompactionsDisabled}} always cancels compactions for secondary index 
> cfs:es, this causes problem since CASSANDRA-14935 since we check for range 
> intersection which will fail since 2i sstables are LocalPartitioner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2019-02-13 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767466#comment-16767466
 ] 

Blake Eggleston commented on CASSANDRA-10446:
-

[~laxmikant99] because it's a new feature. 3.11.x is for bugfixes only.

> Run repair with down replicas
> -
>
> Key: CASSANDRA-10446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Blake Eggleston
>Priority: Minor
> Fix For: 4.0
>
>
> We should have an option of running repair when replicas are down. We can 
> call it -force.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15004) Anti-compaction briefly removes sstables from the read path

2019-01-31 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15004:

Status: Patch Available  (was: Open)

|[3.0|https://github.com/bdeggleston/cassandra/tree/15004-3.0]|[3.11|https://github.com/bdeggleston/cassandra/tree/15004-3.11]|[trunk|https://github.com/bdeggleston/cassandra/tree/15004-trunk]|
|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F15004-3.0]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F15004-3.11]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F15004-trunk]|

> Anti-compaction briefly removes sstables from the read path
> ---
>
> Key: CASSANDRA-15004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15004
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Since we use multiple sstable rewriters in anticompaction, the first call to 
> prepareToCommit will remove the original sstables from the tracker view 
> before the other rewriters add their sstables. This creates a brief window 
> where reads can miss data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15004) Anti-compaction briefly removes sstables from the read path

2019-01-31 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15004:

Reviewers: Benedict, Marcus Eriksson

> Anti-compaction briefly removes sstables from the read path
> ---
>
> Key: CASSANDRA-15004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15004
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Since we use multiple sstable rewriters in anticompaction, the first call to 
> prepareToCommit will remove the original sstables from the tracker view 
> before the other rewriters add their sstables. This creates a brief window 
> where reads can miss data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15004) Anti-compaction briefly removes sstables from the read path

2019-01-31 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-15004:
---

 Summary: Anti-compaction briefly removes sstables from the read 
path
 Key: CASSANDRA-15004
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15004
 Project: Cassandra
  Issue Type: Bug
Reporter: Blake Eggleston
Assignee: Blake Eggleston
 Fix For: 4.0, 3.0.x, 3.11.x


Since we use multiple sstable rewriters in anticompaction, the first call to 
prepareToCommit will remove the original sstables from the tracker view before 
the other rewriters add their sstables. This creates a brief window where reads 
can miss data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15002) Avoid leaking threads when anticompaction fails

2019-01-30 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15002:

Status: Ready to Commit  (was: Patch Available)

> Avoid leaking threads when anticompaction fails
> ---
>
> Key: CASSANDRA-15002
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15002
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> If anticompaction fails on a node, a message is sent to all repair 
> participants that this session is failed. If the other participants 
> successfully finish their anticompactions we will not shut down the executor 
> in `LocalSessions` since we can't change the state from "FAILED" to 
> "PREPARED" and throw exception before calling shutdown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15002) Avoid leaking threads when anticompaction fails

2019-01-30 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756718#comment-16756718
 ] 

Blake Eggleston commented on CASSANDRA-15002:
-

+1

> Avoid leaking threads when anticompaction fails
> ---
>
> Key: CASSANDRA-15002
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15002
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> If anticompaction fails on a node, a message is sent to all repair 
> participants that this session is failed. If the other participants 
> successfully finish their anticompactions we will not shut down the executor 
> in `LocalSessions` since we can't change the state from "FAILED" to 
> "PREPARED" and throw exception before calling shutdown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14871) Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser

2019-01-30 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756703#comment-16756703
 ] 

Blake Eggleston commented on CASSANDRA-14871:
-

[~snazy] go for it

> Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
> ---
>
> Key: CASSANDRA-14871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14871
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Critical
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
>    There are a couple of places in the code base that do not respect that 
> j.u.HashMap + related classes are not thread safe and some parts rely on 
> internals of the implementation of HM, which can change.
> We have observed failures like {{NullPointerException}} and  
> {{ConcurrentModificationException}} as well as wrong behavior.
> Affected areas in the code base:
>  * {{SizeTieredCompactionStrategy}}
>  * {{DateTieredCompactionStrategy}}
>  * {{TimeWindowCompactionStrategy}}
>  * {{TokenMetadata.Topology}}
>  * {{TypeParser}}
>  * streaming / concurrent access to {{LifecycleTransaction}} (handled in 
> CASSANDRA-14554)
> While the patches for the compaction strategies + {{TypeParser}} are pretty 
> straight forward, the patch for {{TokenMetadata.Topology}} requires it to be 
> made immutable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14939) fix some operational holes in incremental repair

2019-01-29 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14939:

 Reviewer: Marcus Eriksson
Fix Version/s: 4.0
   Status: Patch Available  (was: Open)

|[trunk|https://github.com/bdeggleston/cassandra/tree/14939-trunk]|[dtests|https://github.com/bdeggleston/cassandra-dtest/tree/14939]|[circle|https://circleci.com/workflow-run/587857d3-8d7a-4710-906f-010f499d4910]|

This adds {{\-\-cleanup}}, {{\-\-summarize-pending}}, and 
{{\-\-summarize-repaired}} flags to {{nodetool repair_admin}}. Each of which 
also accept keyspace/table and range args. To support reporting the min/max 
repairedAt time for a given token range across restarts, I had to alter the 
logic for purging old sessions. Finalized sessions will only be deleted if 
there is a newer session covering it's ranges.

> fix some operational holes in incremental repair
> 
>
> Key: CASSANDRA-14939
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14939
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0
>
>
> Incremental repair has a few operational rough spots that make it more 
> difficult to fully automate and operate at scale than it should be.
> * Visibility into whether pending repair data exists for a given token range.
> * Ability to force promotion/demotion of data for completed sessions instead 
> of waiting for compaction.
> * Get the most recent repairedAt timestamp for a given token range.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14871) Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser

2019-01-28 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754461#comment-16754461
 ] 

Blake Eggleston commented on CASSANDRA-14871:
-

[~snazy] ping

> Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
> ---
>
> Key: CASSANDRA-14871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14871
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Critical
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
>    There are a couple of places in the code base that do not respect that 
> j.u.HashMap + related classes are not thread safe and some parts rely on 
> internals of the implementation of HM, which can change.
> We have observed failures like {{NullPointerException}} and  
> {{ConcurrentModificationException}} as well as wrong behavior.
> Affected areas in the code base:
>  * {{SizeTieredCompactionStrategy}}
>  * {{DateTieredCompactionStrategy}}
>  * {{TimeWindowCompactionStrategy}}
>  * {{TokenMetadata.Topology}}
>  * {{TypeParser}}
>  * streaming / concurrent access to {{LifecycleTransaction}} (handled in 
> CASSANDRA-14554)
> While the patches for the compaction strategies + {{TypeParser}} are pretty 
> straight forward, the patch for {{TokenMetadata.Topology}} requires it to be 
> made immutable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14996) Incremental backups can fill up the disk if they are not being uploaded

2019-01-28 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754459#comment-16754459
 ] 

Blake Eggleston commented on CASSANDRA-14996:
-

This doesn’t seem like the right solution to the problem you’re describing. If 
I understand incremental backup correctly, occasionally skipping hard link 
creation defeats the purpose and will leave you with an incomplete backup and 
lost data on recovery. 

The real problem seems to be that you don’t notice that your backup system has 
failed or your nodes are out of disk space until nodes start falling over. I'd 
probably start by putting some monitoring and alerts around your cluster and 
backup system.

> Incremental backups can fill up the disk if they are not being uploaded
> ---
>
> Key: CASSANDRA-14996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14996
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Shaurya Gupta
>Priority: Major
> Attachments: patch_CASSANDRA-14996, patch_CASSANDRA-14996.html
>
>
> This creates a major problem if the script which uploads the snapshots is 
> triggered via some API and the application triggering it is somehow down and 
> is not able to hit the API. This affects Cassandra and slowly one or more 
> Cassandra nodes go down.
> There could be a check in Cassandra that before creating a snapshot it checks 
> that the disk has some sufficient (configurable via cassandra.yaml and jmx) 
> disk space.
> This could be achieved by making change in 
> public void maybeIncrementallyBackup(final Iterable sstables)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable

2019-01-18 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746698#comment-16746698
 ] 

Blake Eggleston commented on CASSANDRA-13508:
-

Keyspace level lwts would be cool, but I think they've always been more of a 
pony than a feature anyone was seriously considering implementing. Implicitly 
expanding paxos granularity from table/primary_key into keyspace/primary_key 
isn't a good idea imo, since you can introduce cross table contention that 
users don't necessarily need or want. Also, assuming keyspace lwts does become 
a thing, splitting system.paxos up per table wouldn't necessarily preclude 
them. 

> Make system.paxos table compaction strategy configurable
> 
>
> Key: CASSANDRA-13508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13508
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Major
>  Labels: LWT, core, paxos
> Fix For: 4.x
>
> Attachments: test11.png, test2.png
>
>
> The default compaction strategy for {{system.paxos}} table is LCS for 
> performance reason: CASSANDRA-7753. But for CAS heavily used cluster, the 
> system is busy with {{system.paxos}} compaction.
> As the data in {{paxos}} table are TTL'ed, TWCS might be a better fit. In our 
> test, it significantly reduced the number of compaction without impacting the 
> latency too much:
> !test11.png!
> The time window for TWCS is set to 2 minutes for the test.
> Here is the p99 latency impact:
> !test2.png!
> the yellow one is LCS, the purple one is TWCS. Average p99 has about 10% 
> increase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14935) PendingAntiCompaction should be more judicious in the compactions it cancels

2019-01-16 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14935:

Status: Ready to Commit  (was: Patch Available)

> PendingAntiCompaction should be more judicious in the compactions it cancels
> 
>
> Key: CASSANDRA-14935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14935
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> PendingAntiCompaction currently cancels all ongoing compactions to be able to 
> grab the sstables it requires - it should;
> * avoid stopping ongoing anticompactions - for example, if we have an sstable 
> [0, 100] and are anticompacting it on [0, 50] - now if we start a new 
> incremental repair on [51, 100] we will (try to) stop the first one. Instead 
> we should fail the new incremental repair request.
> * avoid stopping unrelated regular compactions - we should only stop regular 
> compactions for the sstables the new anticompaction needs
> This requires us to keep track of which sstables are being compacted by a 
> given compaction id.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14935) PendingAntiCompaction should be more judicious in the compactions it cancels

2019-01-16 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744237#comment-16744237
 ] 

Blake Eggleston commented on CASSANDRA-14935:
-

+1

> PendingAntiCompaction should be more judicious in the compactions it cancels
> 
>
> Key: CASSANDRA-14935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14935
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> PendingAntiCompaction currently cancels all ongoing compactions to be able to 
> grab the sstables it requires - it should;
> * avoid stopping ongoing anticompactions - for example, if we have an sstable 
> [0, 100] and are anticompacting it on [0, 50] - now if we start a new 
> incremental repair on [51, 100] we will (try to) stop the first one. Instead 
> we should fail the new incremental repair request.
> * avoid stopping unrelated regular compactions - we should only stop regular 
> compactions for the sstables the new anticompaction needs
> This requires us to keep track of which sstables are being compacted by a 
> given compaction id.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14935) PendingAntiCompaction should be more judicious in the compactions it cancels

2019-01-15 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743256#comment-16743256
 ] 

Blake Eggleston commented on CASSANDRA-14935:
-

Looks functionally solid, I like how you collapsed the 2 acquisition attempts 
in {{PendingAntiCompaction}}. There are a few minor / stylistic things though:
 * There’s an unused import in {{AutoSavingCache}}
 * I think {{CancellationType}} can be removed from {{CompactionInfo}}. 
Anywhere it’s set to {{ALL}}, {{sstables.isEmpty()}} should always be true, 
making it redundant.
 * We shouldn't prompt users to check repair_admin in the exception message if 
there are conflicting anti-compactions. Those won’t be caused by hung sessions, 
but users starting multiple repairs at the same time.
 * I think we should use ImmutableSet inside CompactionInfo, but I don’t think 
we should make callers make the immutable copy. If we just accepted a Set and 
made an immutable copy in the ctor, the places we construct CompactionInfo 
would be a bit cleaner.
 * Nit: when quickly reading code, referring to the ActiveCompactionTracker 
instance as {{tracker}} clashes (in my head) with the cfs data tracker. Not a 
huge deal, but calling it something like {{compactions}} or 
{{activeCompactions}} would make it a bit clearer at a glance.

> PendingAntiCompaction should be more judicious in the compactions it cancels
> 
>
> Key: CASSANDRA-14935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14935
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> PendingAntiCompaction currently cancels all ongoing compactions to be able to 
> grab the sstables it requires - it should;
> * avoid stopping ongoing anticompactions - for example, if we have an sstable 
> [0, 100] and are anticompacting it on [0, 50] - now if we start a new 
> incremental repair on [51, 100] we will (try to) stop the first one. Instead 
> we should fail the new incremental repair request.
> * avoid stopping unrelated regular compactions - we should only stop regular 
> compactions for the sstables the new anticompaction needs
> This requires us to keep track of which sstables are being compacted by a 
> given compaction id.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14871) Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser

2019-01-07 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736454#comment-16736454
 ] 

Blake Eggleston commented on CASSANDRA-14871:
-

[~snazy] is this ready to commit?

> Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
> ---
>
> Key: CASSANDRA-14871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14871
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Critical
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
>    There are a couple of places in the code base that do not respect that 
> j.u.HashMap + related classes are not thread safe and some parts rely on 
> internals of the implementation of HM, which can change.
> We have observed failures like {{NullPointerException}} and  
> {{ConcurrentModificationException}} as well as wrong behavior.
> Affected areas in the code base:
>  * {{SizeTieredCompactionStrategy}}
>  * {{DateTieredCompactionStrategy}}
>  * {{TimeWindowCompactionStrategy}}
>  * {{TokenMetadata.Topology}}
>  * {{TypeParser}}
>  * streaming / concurrent access to {{LifecycleTransaction}} (handled in 
> CASSANDRA-14554)
> While the patches for the compaction strategies + {{TypeParser}} are pretty 
> straight forward, the patch for {{TokenMetadata.Topology}} requires it to be 
> made immutable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14936) Anticompaction should throw exceptions on errors, not just log them

2018-12-19 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725146#comment-16725146
 ] 

Blake Eggleston commented on CASSANDRA-14936:
-

looks good to me

> Anticompaction should throw exceptions on errors, not just log them
> ---
>
> Key: CASSANDRA-14936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14936
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> Anticompaction currently catches any exceptions and just logs them instead of 
> rethrowing, this can cause us to overstream and leave sstables unrepaired.
> This was made more likely to happen with CASSANDRA-14397 (before that 
> anticompactions could not be stopped at all).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14939) fix some operational holes in incremental repair

2018-12-18 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724481#comment-16724481
 ] 

Blake Eggleston commented on CASSANDRA-14939:
-

/cc [~mbyrd], [~spo...@gmail.com]

> fix some operational holes in incremental repair
> 
>
> Key: CASSANDRA-14939
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14939
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
>
> Incremental repair has a few operational rough spots that make it more 
> difficult to fully automate and operate at scale than it should be.
> * Visibility into whether pending repair data exists for a given token range.
> * Ability to force promotion/demotion of data for completed sessions instead 
> of waiting for compaction.
> * Get the most recent repairedAt timestamp for a given token range.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14936) Anticompaction should throw exceptions on errors, not just log them

2018-12-18 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724180#comment-16724180
 ] 

Blake Eggleston commented on CASSANDRA-14936:
-

Ah ok I guess that part is complete. +1 then

> Anticompaction should throw exceptions on errors, not just log them
> ---
>
> Key: CASSANDRA-14936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14936
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> Anticompaction currently catches any exceptions and just logs them instead of 
> rethrowing, this can cause us to overstream and leave sstables unrepaired.
> This was made more likely to happen with CASSANDRA-14397 (before that 
> anticompactions could not be stopped at all).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14936) Anticompaction should throw exceptions on errors, not just log them

2018-12-17 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723464#comment-16723464
 ] 

Blake Eggleston commented on CASSANDRA-14936:
-

This addresses this ticket fine, but it seems like there's some stuff related 
to CASSANDRA-14935 in here as well. Could you put that in a separate branch?

> Anticompaction should throw exceptions on errors, not just log them
> ---
>
> Key: CASSANDRA-14936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14936
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> Anticompaction currently catches any exceptions and just logs them instead of 
> rethrowing, this can cause us to overstream and leave sstables unrepaired.
> This was made more likely to happen with CASSANDRA-14397 (before that 
> anticompactions could not be stopped at all).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14939) fix some operational holes in incremental repair

2018-12-17 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-14939:
---

 Summary: fix some operational holes in incremental repair
 Key: CASSANDRA-14939
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14939
 Project: Cassandra
  Issue Type: Bug
Reporter: Blake Eggleston
Assignee: Blake Eggleston


Incremental repair has a few operational rough spots that make it more 
difficult to fully automate and operate at scale than it should be.

* Visibility into whether pending repair data exists for a given token range.
* Ability to force promotion/demotion of data for completed sessions instead of 
waiting for compaction.
* Get the most recent repairedAt timestamp for a given token range.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



<    1   2   3   4   5   6   7   8   9   10   >