[jira] [Created] (CASSANDRA-16252) centos 7
Karim Chowdhury created CASSANDRA-16252:
---

Summary: centos 7
Key: CASSANDRA-16252
URL: https://issues.apache.org/jira/browse/CASSANDRA-16252
Project: Cassandra
Issue Type: Bug
Reporter: Karim Chowdhury

We are using CentOS 7. We have a 10-node Cassandra cluster, and one of the nodes currently fails with one of the following errors:

Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.IOError: java.io.EOFException

or

java.lang.AssertionError: Added column does not sort as the last column

I have already tried removing the data folder and running scrub and repair. The error happens again after 24 hours. Need help.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227604#comment-17227604 ]

David Capwell commented on CASSANDRA-15214:
---

+1 from me with a small comment; see the PR.

I tested this patch by breaking byte buffer allocation so that it runs out of direct memory. In doing so I found an edge case in the client (.transport package) code; once that is fixed, both client and internode shut down on OOM.

> OOMs caught and not rethrown
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client, Messaging/Internode
> Reporter: Benedict Elliott Smith
> Assignee: Yifan Cai
> Priority: Normal
> Fix For: 4.0, 4.0-rc
>
> Attachments: oom-experiments.zip
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, so presently there is no way to ensure that an OOM reaches the JVM handler to trigger a crash/heapdump.
> It may be that the simplest most consistent way to do this would be to have a single thread spawned at startup that waits for any exceptions we must propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed future proof approach, it may be worth paying the cost of a single thread.
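The "single thread spawned at startup" idea in the quoted description could look roughly like the sketch below. This is only an illustration under assumptions, not the CASSANDRA-15214 patch: the class `FatalErrorPropagator` and its `report` and `demo` methods are hypothetical names, and the queue used here allocates on `offer`, which a real implementation facing an actual OOM would probably want to avoid.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a dedicated thread that rethrows reported errors so that
// they reach an uncaught exception handler instead of being swallowed by a
// framework-level catch-all.
final class FatalErrorPropagator
{
    private final BlockingQueue<Throwable> pending = new LinkedBlockingQueue<>();

    FatalErrorPropagator(Thread.UncaughtExceptionHandler handler)
    {
        Thread thread = new Thread(this::awaitAndRethrow, "fatal-error-propagator");
        thread.setDaemon(true);
        thread.setUncaughtExceptionHandler(handler);
        thread.start();
    }

    // Called from code that caught a Throwable it must not swallow.
    void report(Throwable t)
    {
        pending.offer(t);
    }

    private void awaitAndRethrow()
    {
        Throwable t;
        try
        {
            t = pending.take(); // blocks until something is reported
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return;
        }
        // Rethrowing on this thread hands the error to the handler set above.
        if (t instanceof Error)
            throw (Error) t;
        throw new RuntimeException(t);
    }

    // Simulates a catch-all that would otherwise hide an OutOfMemoryError and
    // returns whatever the handler eventually observed.
    static Throwable demo()
    {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        FatalErrorPropagator propagator = new FatalErrorPropagator((thread, error) -> {
            seen.set(error);
            done.countDown();
        });
        try
        {
            throw new OutOfMemoryError("simulated");
        }
        catch (Throwable caught)
        {
            propagator.report(caught);
        }
        try
        {
            done.await(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }
}
```

In this sketch the handler passed to the constructor stands in for whatever JVM-level handling (crash, heap dump) the deployment relies on.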
[jira] [Updated] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.
[ https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-16248:
-
Resolution: Fixed
Status: Resolved (was: Open)

Works for me too, that commit got it.

> GossipTest hangs until timeout, then fails.
> ---
>
> Key: CASSANDRA-16248
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16248
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Startup and Shutdown, Messaging/Internode, Test/dtest/java
> Reporter: Sam Tunnicliffe
> Priority: Normal
> Fix For: 4.0-beta4
>
>
> A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}}
> * There seems to have been a merge/commit race between CASSANDRA-16146 ([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e]) and CASSANDRA-15935 ([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]). The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, but the latter changed the method signature, so this never actually gets injected. This causes a latch in the test not to be counted down and it hangs until timeout.
> * After fixing the test code, it still hangs due to changes to {{server_encryption_options}} initialization in CASSANDRA-16144 ([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]). It appears to be causing an incorrect keystore location to be specified, which causes instance startup to fail, again leading to the test hanging until it times out. I don't have the cycles to dig into this further right now, but reverting that commit (and making the test fix above) restores the green bar.
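The first bullet of the quoted description describes an interception that matches a method by name and argument count, and silently matches nothing once the signature changes. The sketch below reproduces that failure mode with plain reflection as a stand-in for the ByteBuddy matcher; `OldService`, `NewService` and `countMatches` are hypothetical names used only for illustration, and the parameter types are assumptions.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch: selecting a method by name and parameter count, which
// yields zero matches (and no error) after the method gains a parameter.
final class ArityMatchSketch
{
    // Signature before the change: one argument.
    static final class OldService
    {
        public boolean bootstrap(Collection<String> tokens) { return true; }
    }

    // Signature after the change: a timeout argument was added.
    static final class NewService
    {
        public boolean bootstrap(Collection<String> tokens, long bootstrapTimeoutMillis) { return true; }
    }

    // Counts declared methods with the given name and number of parameters.
    static long countMatches(Class<?> type, String name, int parameterCount)
    {
        return Arrays.stream(type.getDeclaredMethods())
                     .filter(m -> m.getName().equals(name))
                     .filter(m -> m.getParameterCount() == parameterCount)
                     .count();
    }
}
```

A matcher written for one argument finds the old method but nothing on the new one, so no interceptor would be installed and any latch it was meant to count down would never move.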
[jira] [Commented] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.
[ https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227583#comment-17227583 ]

David Capwell commented on CASSANDRA-16248:
---

{code}
./ci-test org/apache/cassandra/distributed/test/GossipTest
...
testclasslist:
[echo] Number of test runners: 1
[mkdir] Created dir: /Users/davidcapwell/src/github/apache/cassandra-trunk/build/test/cassandra
[mkdir] Created dir: /Users/davidcapwell/src/github/apache/cassandra-trunk/build/test/output
[junit-timeout] Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
[junit-timeout] Testsuite: org.apache.cassandra.distributed.test.GossipTest
[junit-timeout] Testsuite: org.apache.cassandra.distributed.test.GossipTest Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 56.095 sec
[junit-timeout]
BUILD SUCCESSFUL
Total time: 1 minute 37 seconds
{code}

trunk is working for me again, thanks for the commit [~yifanc] and [~brandon.williams]
[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227575#comment-17227575 ]

Sam Tunnicliffe commented on CASSANDRA-15299:
-

I just pushed a commit which renames {{o.a.c.t.Frame}} to {{Envelope}}, which IMO greatly reduces the cognitive friction here. There are no client-facing changes involved; the renaming is purely internal (aside from docs: I've updated the WIP asciidoc on V5 framing, but will get to the main protocol spec in CASSANDRA-14688 asap).

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
> Issue Type: Improvement
> Components: Messaging/Client
> Reporter: Aleksey Yeschenko
> Assignee: Sam Tunnicliffe
> Priority: Normal
> Labels: protocolv5
> Fix For: 4.0-alpha
>
> Attachments: Process CQL Frame.png, V5 Flow Chart.png
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it introduced checksumming/CRC32 to request and response bodies. It’s an important step forward, but it doesn’t cover the entire stream. In particular, the message header is not covered by a checksum or a crc, which poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of computing the CRC32 on the bytes written on the wire - losing the properties of the CRC32. In some cases, due to this sequencing, attempting to decompress a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since *way* before CASSANDRA-13304. Importantly, both checksumming and compression operate on individual message bodies rather than frames of multiple complete messages. In reality, this has several important additional downsides. To name a couple:
> # For compression, we are getting poor compression ratios for smaller messages - when operating on tiny sequences of bytes. In reality, for most small requests and responses we are discarding the compressed value as it’d be smaller than the uncompressed one - incurring both redundant allocations and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, and that we do it before native protocol v5 graduates from beta. See https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
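Point 1 of the quoted description (checksum the bytes written on the wire, not the decompressed value) can be sketched as below. This is a minimal illustration under stated assumptions, not the v5 framing format: the layout (compressed payload followed by a 4-byte CRC) is made up, `WireChecksumSketch` is a hypothetical name, and `java.util.zip.Deflater`/`Inflater` stand in for LZ4, which is not part of the JDK. The sketch also does not address the byte-order concern from point 2.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: the CRC covers the compressed bytes, so corruption is
// detected before the decompressor ever sees the payload.
final class WireChecksumSketch
{
    // Compress first, then checksum exactly what will go on the wire.
    static byte[] encode(byte[] message)
    {
        byte[] compressed = deflate(message);
        CRC32 crc = new CRC32();
        crc.update(compressed, 0, compressed.length);
        return ByteBuffer.allocate(compressed.length + 4)
                         .put(compressed)
                         .putInt((int) crc.getValue())
                         .array();
    }

    // Verify the checksum over the received bytes, and only then decompress.
    static byte[] decode(byte[] frame)
    {
        if (frame.length < 4)
            throw new IllegalArgumentException("frame too short");
        int payloadLength = frame.length - 4;
        CRC32 crc = new CRC32();
        crc.update(frame, 0, payloadLength);
        int expected = ByteBuffer.wrap(frame, payloadLength, 4).getInt();
        if ((int) crc.getValue() != expected)
            throw new IllegalStateException("CRC mismatch; refusing to decompress");
        return inflate(frame, payloadLength);
    }

    private static byte[] deflate(byte[] input)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished())
            out.write(chunk, 0, deflater.deflate(chunk));
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] input, int length)
    {
        Inflater inflater = new Inflater();
        inflater.setInput(input, 0, length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        try
        {
            while (!inflater.finished())
            {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("truncated or invalid compressed payload");
                out.write(chunk, 0, n);
            }
        }
        catch (DataFormatException e)
        {
            throw new IllegalStateException("invalid compressed payload", e);
        }
        finally
        {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

With this ordering, a frame whose payload was altered in transit is rejected by `decode` before any decompression is attempted.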
[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227573#comment-17227573 ]

Andres de la Peña commented on CASSANDRA-16183:
---

Great, thanks. Overall the approach looks good to me. I have added a few initial minor comments; I'll finish my review early next week.

> Add tests to cover ClientRequest metrics
> -
>
> Key: CASSANDRA-16183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16183
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/dtest/java
> Reporter: Benjamin Lerer
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta
>
>
> We do not have tests that cover the ClientRequest metrics.
> * ClientRequestMetrics
> * CASClientRequestMetrics
> * CASClientWriteRequestMetrics
> * ClientWriteRequestMetrics
> * ViewWriteMetrics
[jira] [Commented] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.
[ https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227552#comment-17227552 ]

Brandon Williams commented on CASSANDRA-16248:
-

I fixed the tests in e5ab8c1951, but cc [~yifanc]
[cassandra] branch cassandra-3.0 updated: Fix tests broken by CASSANDRA-16146
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/cassandra-3.0 by this push:
     new e5ab8c1  Fix tests broken by CASSANDRA-16146
e5ab8c1 is described below

commit e5ab8c1951384b9ddf0df9f1d4d49b4c9dfc188f
Author: yifan-c
AuthorDate: Tue Nov 3 15:30:30 2020 -0800

    Fix tests broken by CASSANDRA-16146

    Patch by Yifan Cai, reviewed by brandonwilliams for CASSANDRA-16146
---
 src/java/org/apache/cassandra/service/StorageService.java | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java | 2 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index 7645091..c4309f8 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -1225,10 +1225,21 @@ public class StorageService extends NotificationBroadcasterSupport implements IE
     }
 
     @VisibleForTesting // only used by test
-    public void setMovingModeUnsafe() {
+    public void setMovingModeUnsafe()
+    {
         setMode(Mode.MOVING, true);
     }
 
+    /**
+     * Only used in jvm dtest when not using GOSSIP.
+     * See org.apache.cassandra.distributed.impl.Instance#initializeRing(org.apache.cassandra.distributed.api.ICluster)
+     */
+    @VisibleForTesting
+    public void setNormalModeUnsafe()
+    {
+        setMode(Mode.NORMAL, true);
+    }
+
     private void setMode(Mode m, boolean log)
     {
         setMode(m, null, log);
diff --git a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
index 4f799ee..f72661d 100644
--- a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
@@ -690,6 +690,8 @@ public class Instance extends IsolatedExecutor implements IInvokableInstance
                 // check that all nodes are in token metadata
                 for (int i = 0; i < tokens.size(); ++i)
                     assert storageService.getTokenMetadata().isMember(hosts.get(i).getAddress());
+
+                storageService.setNormalModeUnsafe();
             }
             catch (Throwable e) // UnknownHostException
             {
[cassandra] 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 94f940cc50c72bcd819098e97548ab28d576bfac
Merge: 3200bcf e5ab8c1
Author: Brandon Williams
AuthorDate: Fri Nov 6 11:42:01 2020 -0600

    Merge branch 'cassandra-3.0' into cassandra-3.11

 src/java/org/apache/cassandra/service/StorageService.java | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java | 2 ++
 .../distributed/test/ClientNetworkStopStartTest.java | 1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --cc test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
index da0731e,da0731e..0aabc8c
--- a/test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
+++ b/test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
@@@ -51,6 -51,6 +51,7 @@@ public class ClientNetworkStopStartTes
      @Test
      public void stopStartThrift() throws IOException, TException
      {
++        // GOSSIP is needed in order to initServer correctly.
          try (Cluster cluster = init(Cluster.build(1).withConfig(c -> c.with(Feature.NATIVE_PROTOCOL)).start()))
          {
              IInvokableInstance node = cluster.get(1);
[cassandra] branch trunk updated (0700d79 -> 9ac9a93)
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 0700d79  Circleci should run cqlshlib tests as well
     new e5ab8c1  Fix tests broken by CASSANDRA-16146
     new 94f940c  Merge branch 'cassandra-3.0' into cassandra-3.11
     new 9ac9a93  Merge branch 'cassandra-3.11' into trunk

The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 src/java/org/apache/cassandra/service/StorageService.java | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java | 1 +
 .../org/apache/cassandra/distributed/test/GossipTest.java | 12 ++--
 3 files changed, 15 insertions(+), 11 deletions(-)
[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227544#comment-17227544 ]

Brandon Williams commented on CASSANDRA-16146:
-

Committed, thanks!

> Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> At a high level, {{StorageService#setGossipTokens}} sets the gossip state to {{NORMAL}} blindly. Therefore, re-enabling gossip (stopping and starting gossip) overrides the actual gossip state.
>
> This can happen in the scenario below.
> # Bootstrap fails. The gossip state remains {{BOOT}} / {{JOINING}} and code execution exits StorageService#initServer.
> # The operator runs nodetool to stop and restart gossip. The gossip state gets flipped to {{NORMAL}}.
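The scenario in the quoted description can be modelled with a toy state holder. This is only a sketch of the bug shape and of one possible guard; it is not Cassandra code and not necessarily how the committed fix works. `Node`, `startGossipBlind` and `startGossipGuarded` are hypothetical names.

```java
// Hypothetical sketch: re-enabling gossip must not advertise NORMAL for a node
// whose bootstrap never completed.
final class GossipStateSketch
{
    enum Mode { JOINING, NORMAL }

    static final class Node
    {
        Mode mode = Mode.JOINING;        // actual state; bootstrap has not completed
        Mode advertised = Mode.JOINING;  // state announced to peers over gossip

        // Bug shape: always announce NORMAL when gossip is (re)started.
        void startGossipBlind()
        {
            advertised = Mode.NORMAL;
        }

        // One possible guard: announce whatever state the node is really in.
        void startGossipGuarded()
        {
            advertised = mode;
        }
    }
}
```

After a failed bootstrap, the blind variant makes peers see a NORMAL node, while the guarded variant keeps advertising JOINING.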
[cassandra] branch cassandra-3.11 updated (3200bcf -> 94f940c)
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 3200bcf  Merge branch 'cassandra-3.0' into cassandra-3.11
     new e5ab8c1  Fix tests broken by CASSANDRA-16146
     new 94f940c  Merge branch 'cassandra-3.0' into cassandra-3.11

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 src/java/org/apache/cassandra/service/StorageService.java | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java | 2 ++
 .../distributed/test/ClientNetworkStopStartTest.java | 1 +
 3 files changed, 15 insertions(+), 1 deletion(-)
[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 9ac9a9343540e67f4609f75dd5199b2a66624488
Merge: 0700d79 94f940c
Author: Brandon Williams
AuthorDate: Fri Nov 6 11:43:23 2020 -0600

    Merge branch 'cassandra-3.11' into trunk

 src/java/org/apache/cassandra/service/StorageService.java | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java | 1 +
 .../org/apache/cassandra/distributed/test/GossipTest.java | 12 ++--
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --cc src/java/org/apache/cassandra/service/StorageService.java
index d7d3ebe,7dea7a0..47f82b8
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@@ -1464,6 -1488,16 +1465,16 @@@ public class StorageService extends Not
          setMode(Mode.MOVING, true);
      }
  
+     /**
+      * Only used in jvm dtest when not using GOSSIP.
 -     * See org.apache.cassandra.distributed.impl.Instance#initializeRing(org.apache.cassandra.distributed.api.ICluster)
++     * See org.apache.cassandra.distributed.impl.Instance#startup(org.apache.cassandra.distributed.api.ICluster)
+      */
+     @VisibleForTesting
+     public void setNormalModeUnsafe()
+     {
+         setMode(Mode.NORMAL, true);
+     }
+ 
      private void setMode(Mode m, boolean log)
      {
          setMode(m, null, log);
diff --cc test/distributed/org/apache/cassandra/distributed/impl/Instance.java
index 4c778f1,50aea0b..2fc7044
--- a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
@@@ -481,13 -554,7 +481,14 @@@ public class Instance extends IsolatedE
                  }
                  else
                  {
 -                    initializeRing(cluster);
 +                    cluster.stream().forEach(peer -> {
 +                        if (cluster instanceof Cluster)
 +                            GossipHelper.statusToNormal((IInvokableInstance) peer).accept(this);
 +                        else
 +                            GossipHelper.unsafeStatusToNormal(this, (IInstance) peer);
 +                    });
 +
++                    StorageService.instance.setNormalModeUnsafe();
                  }
  
                  StorageService.instance.ensureTraceKeyspace();
diff --cc test/distributed/org/apache/cassandra/distributed/test/GossipTest.java
index a162ebf,32ecb95..1b6a004
--- a/test/distributed/org/apache/cassandra/distributed/test/GossipTest.java
+++ b/test/distributed/org/apache/cassandra/distributed/test/GossipTest.java
@@@ -19,17 -19,17 +19,13 @@@
  package org.apache.cassandra.distributed.test;
  
  import java.io.Closeable;
--import java.net.InetAddress;
  import java.util.Collection;
  import java.util.concurrent.CountDownLatch;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;
  import java.util.concurrent.TimeUnit;
--import java.util.concurrent.locks.LockSupport;
--import java.util.stream.Collectors;
  
--import com.google.common.collect.Iterables;
  import com.google.common.util.concurrent.Uninterruptibles;
  import org.junit.Assert;
  import org.junit.Test;
@@@ -39,11 -39,11 +35,7 @@@ import net.bytebuddy.dynamic.loading.Cl
  import net.bytebuddy.implementation.MethodDelegation;
  import org.apache.cassandra.dht.Token;
  import org.apache.cassandra.distributed.Cluster;
--import org.apache.cassandra.gms.ApplicationState;
--import org.apache.cassandra.gms.EndpointState;
--import org.apache.cassandra.gms.Gossiper;
  import org.apache.cassandra.service.StorageService;
--import org.apache.cassandra.utils.FBUtilities;
  
  import static net.bytebuddy.matcher.ElementMatchers.named;
  import static net.bytebuddy.matcher.ElementMatchers.takesArguments;
@@@ -61,13 -132,13 +53,13 @@@ public class GossipTest extends TestBas
              if (nodeNumber != 2)
                  return;
              new ByteBuddy().rebase(StorageService.class)
--                           .method(named("bootstrap").and(takesArguments(1)))
++                           .method(named("bootstrap").and(takesArguments(2)))
                             .intercept(MethodDelegation.to(BBBootstrapInterceptor.class))
                             .make()
                             .load(cl, ClassLoadingStrategy.Default.INJECTION);
          }
  
--        public static boolean bootstrap(Collection tokens) throws Exception
++        public static boolean bootstrap(Collection tokens, long bootstrapTimeoutMillis)
          {
              bootstrapStart.countDown();
              Uninterruptibles.awaitUninterruptibly(bootstrapReady);
[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227536#comment-17227536 ]

Yifan Cai commented on CASSANDRA-16146:
---

Sure thing. Comment was just added for the unsafe method in each branch.
[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227533#comment-17227533 ]

Adam Holmberg commented on CASSANDRA-16183:
---

I also appreciate using PRs for review. Here's one against my fork: https://github.com/aholmberg/cassandra-dtest/pull/1
[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227474#comment-17227474 ]

Andres de la Peña commented on CASSANDRA-16183:
---

We neither merge PRs nor require them, but I find them useful for the review comments, and in my experience it's usual to have them for that purpose (see [here|https://github.com/apache/cassandra/pulls] and [here|https://github.com/apache/cassandra-dtest/pulls]), although tickets without a PR are not unusual either.

I think that without a PR we can only add comments on each of the nine individual commits, but not on the diff, so it's difficult to get a global view. Also, I'm not sure whether those comments would survive a squash, as they do with PRs. WDYT? I can create the PR if you don't disagree, or perhaps we can squash the changes and comment on a single commit.
[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
[ https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-16249:
---
Fix Version/s: (was: 4.0.x) 4.0-beta

> ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
> ---
>
> Key: CASSANDRA-16249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16249
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Jacek Lewandowski
> Assignee: Jacek Lewandowski
> Priority: Normal
> Fix For: 4.0-beta
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer position when calculating the final offset for reading data
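The quoted description can be illustrated with the JDK's `ByteBuffer` alone: the absolute `get(int)` accessor indexes from the start of the buffer and ignores `position()`, so an offset meant to be relative to the current position has to be added to it explicitly. The sketch below is an assumption-laden illustration, not the Cassandra accessor: both method names are hypothetical and big-endian byte order is assumed.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the bug shape and of a position-aware read.
final class UnsignedShortSketch
{
    // Bug shape: treats the offset as an absolute index into the buffer.
    static int getUnsignedShortIgnoringPosition(ByteBuffer buffer, int offset)
    {
        return ((buffer.get(offset) & 0xFF) << 8) | (buffer.get(offset + 1) & 0xFF);
    }

    // Position-aware: the offset is relative to the buffer's current position.
    // Absolute gets are still used, so the position itself is not advanced.
    static int getUnsignedShort(ByteBuffer buffer, int offset)
    {
        int start = buffer.position() + offset;
        return ((buffer.get(start) & 0xFF) << 8) | (buffer.get(start + 1) & 0xFF);
    }
}
```

For a buffer whose position is greater than zero, the two methods read different bytes for the same offset, which is the discrepancy the ticket describes.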
[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
[ https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-16249:
---
Reviewers: Benjamin Lerer

> ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
> ---
>
> Key: CASSANDRA-16249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16249
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Jacek Lewandowski
> Assignee: Jacek Lewandowski
> Priority: Normal
> Fix For: 4.0.x
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer position when calculating the final offset for reading data
[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227463#comment-17227463 ] Adam Holmberg commented on CASSANDRA-16183: --- I just linked a diff in the previous comment. Here: https://github.com/apache/cassandra-dtest/compare/trunk...aholmberg:CASSANDRA-16183 Do you want me to create an actual PR? I was under the impression we don't actually take those on the mirrors. > Add tests to cover ClientRequest metrics > - > > Key: CASSANDRA-16183 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16183 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: Benjamin Lerer >Assignee: Adam Holmberg >Priority: Normal > Fix For: 4.0-beta > > > We do not have test that covers the ClientRequest metrics. > * ClientRequestMetrics > * CASClientRequestMetrics > * CASClientWriteRequestMetrics > * ClientWriteRequestMetrics > * ViewWriteMetrics -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227455#comment-17227455 ] Andres de la Peña commented on CASSANDRA-16183: --- [~aholmber] is there a PR for the dtest patch? > Add tests to cover ClientRequest metrics > - > > Key: CASSANDRA-16183 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16183 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: Benjamin Lerer >Assignee: Adam Holmberg >Priority: Normal > Fix For: 4.0-beta > > > We do not have test that covers the ClientRequest metrics. > * ClientRequestMetrics > * CASClientRequestMetrics > * CASClientWriteRequestMetrics > * ClientWriteRequestMetrics > * ViewWriteMetrics -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227449#comment-17227449 ] Benedict Elliott Smith commented on CASSANDRA-12126: Yes, that sounds like a great idea, and I really appreciate you offering to take that to the list. I'll chime in with any necessary details to help inform the decision, but will try not to influence it otherwise. I don't have a strong opinion about which of those four options we select, except that my experiments do suggest (3) is perhaps dangerous for some of our users. It's probably a trade-off that should be made with careful business consideration and experimentation by each end user. As far as delaying 4.0 is concerned, that's probably also a matter of community decision-making. We could quite quickly have a patch, that has been reviewed by multiple committers, posted in fairly short order - perhaps before we exit beta. This work will have had much greater validation than the current implementation, but publishing all of this validation work will take longer - likely also achievable before GA, but we might have to invert our process a little. Perhaps this is acceptable, given the balance of correctness and regression we're considering as an alternative, but given my proximity to the work (and that I also don't have a strong position either way), I would prefer to let others make that call. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Sankalp Kohli >Assignee: Sylvain Lebresne >Priority: Normal > Labels: LWT, pull-request-available > Fix For: 3.0.x, 3.11.x, 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. 
Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
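Steps 1 to 3 of the description above can be modelled with a toy simulation. This is only a sketch under simplifying assumptions (no ballots, no networking, three in-memory acceptors) and is not Cassandra's Paxos code; all names are hypothetical. It shows that whether a serial read observes the step 1 value depends on which quorum it happens to contact.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the RF=3 scenario; hypothetical names, not Cassandra code.
class CasReadScenario {
    static final class Acceptor {
        final String name;
        String inFlight;   // accepted but not committed value, or null
        String committed;  // committed value, or null
        Acceptor(String name) { this.name = name; }
    }

    // Serial read over a quorum: if any contacted acceptor holds an in-flight
    // proposal, finish it on the contacted replicas and return its value.
    // Otherwise return the committed value (simplified: taken from the first replica).
    static String serialRead(List<Acceptor> quorum) {
        for (Acceptor a : quorum) {
            if (a.inFlight != null) {
                String value = a.inFlight;
                for (Acceptor b : quorum) {
                    b.committed = value;
                    b.inFlight = null;
                }
                return value;
            }
        }
        return quorum.get(0).committed;
    }

    public static void main(String[] args) {
        Acceptor a = new Acceptor("A"), b = new Acceptor("B"), c = new Acceptor("C");
        a.inFlight = "v1";                                   // step 1: only A accepted
        System.out.println(serialRead(Arrays.asList(b, c))); // step 2: nothing in flight
        System.out.println(serialRead(Arrays.asList(a, b))); // step 3: v1 is completed and read
    }
}
```

The model deliberately leaves out the proposed fix (proposing an empty commit in step 2); it only reproduces the inconsistency between the two reads.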
[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227448#comment-17227448 ] Brandon Williams commented on CASSANDRA-16146: -- LGTM, supernit: I think it's a good idea to have some comment around unsafe methods, but that's easy enough to add on commit. > Node state incorrectly set to NORMAL after nodetool disablegossip and > enablegossip during bootstrap > --- > > Key: CASSANDRA-16146 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16146 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0-beta3 > > Time Spent: 20m > Remaining Estimate: 0h > > At high level, {{StorageService#setGossipTokens}} set the gossip state to > {{NORMAL}} blindly. Therefore, re-enabling gossip (stop and start gossip) > overrides the actual gossip state. > > It could happen in the below scenario. > # Bootstrap failed. The gossip state remains in {{BOOT}} / {{JOINING}} and > code execution exits StorageService#initServer. > # Operator runs nodetool to stop and re-start gossip. The gossip state gets > flipped to {{NORMAL}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
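The scenario in the description can be reduced to a small state model. The sketch below is not StorageService code and not the committed patch; the Node class, its fields and the guard condition are assumptions made purely for illustration. It contrasts re-enabling gossip by always advertising NORMAL with a variant that only does so once bootstrap has actually completed.

```java
// Hypothetical model of the reported behaviour; not Cassandra's StorageService.
class GossipStateSketch {
    enum State { BOOT, JOINING, NORMAL }

    static final class Node {
        State advertised = State.BOOT;
        boolean bootstrapped = false;

        // Bootstrap fails part-way: the node stays in JOINING.
        void failBootstrap() {
            advertised = State.JOINING;
        }

        // Reported behaviour: re-enabling gossip overwrites the state with NORMAL.
        void enableGossipBlindly() {
            advertised = State.NORMAL;
        }

        // One possible guard (an assumption): advertise NORMAL only after a
        // successful bootstrap, otherwise keep the current state.
        void enableGossipGuarded() {
            if (bootstrapped) {
                advertised = State.NORMAL;
            }
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.failBootstrap();
        node.enableGossipGuarded();
        System.out.println(node.advertised); // still JOINING
        node.enableGossipBlindly();
        System.out.println(node.advertised); // flipped to NORMAL
    }
}
```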
[jira] [Updated] (CASSANDRA-16251) SSTableLoader documentation needs improvement
[ https://issues.apache.org/jira/browse/CASSANDRA-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16251: Change Category: Semantic Complexity: Normal Component/s: Documentation/Website Priority: Low (was: Normal) Status: Open (was: Triage Needed) > SSTableLoader documentation needs improvement > - > > Key: CASSANDRA-16251 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16251 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Ekaterina Dimitrova >Priority: Low > > SSTableLoader documentation is unclear. > Offline/online usage; directories; steps to use it - It is unclear and > sometimes for a new user. > /CC [~lor...@datastax.com] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16251) SSTableLoader documentation needs improvement
Ekaterina Dimitrova created CASSANDRA-16251: --- Summary: SSTableLoader documentation needs improvement Key: CASSANDRA-16251 URL: https://issues.apache.org/jira/browse/CASSANDRA-16251 Project: Cassandra Issue Type: Improvement Reporter: Ekaterina Dimitrova SSTableLoader documentation is unclear. Offline/online usage; directories; steps to use it - It is unclear and sometimes for a new user. /CC [~lor...@datastax.com] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227423#comment-17227423 ]

Ekaterina Dimitrova commented on CASSANDRA-14013:
-

??To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above.??

To support this: the first time I was reading about the SSTableLoader and trying to use it, I did exactly what you said and got really frustrated :-)
I will open a ticket for [~lor...@datastax.com] to do her magic :-)

> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Gregor Uhlenheuer
> Assignee: Stefan Miklosovic
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I am posting this bug in hope to discover the stupid mistake I am doing
> because I can't imagine a reasonable answer for the behavior I see right now
> :-)
> In short words, I do observe data loss in a keyspace called *snapshots* after
> restarting the Cassandra service. Say I do have 1000 records in a table
> called *snapshots.test_idx* then after restart the table has less entries or
> is even empty.
> My kind of "mysterious" observation is that it happens only in a keyspace
> called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not
> every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am doing :-)
> This happened to me using both Cassandra 3.9 and 3.11.0
[jira] [Assigned] (CASSANDRA-16250) LongSharedExecutorPoolTest.testPromptnessOfExecution burn test is flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-16250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer reassigned CASSANDRA-16250:
--
Assignee: Benjamin Lerer

> LongSharedExecutorPoolTest.testPromptnessOfExecution burn test is flaky
> ---
>
> Key: CASSANDRA-16250
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16250
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/burn
> Reporter: Benjamin Lerer
> Assignee: Benjamin Lerer
> Priority: Normal
>
> Within the burn tests, {{LongSharedExecutorPoolTest.testPromptnessOfExecution}}
> fails regularly with the following stack trace:
> {code}
> junit.framework.AssertionFailedError
>   at org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:213)
>   at org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:102)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
[jira] [Created] (CASSANDRA-16250) LongSharedExecutorPoolTest.testPromptnessOfExecution burn test is flaky
Benjamin Lerer created CASSANDRA-16250:
--
Summary: LongSharedExecutorPoolTest.testPromptnessOfExecution burn test is flaky
Key: CASSANDRA-16250
URL: https://issues.apache.org/jira/browse/CASSANDRA-16250
Project: Cassandra
Issue Type: Improvement
Components: Test/burn
Reporter: Benjamin Lerer

Within the burn tests, {{LongSharedExecutorPoolTest.testPromptnessOfExecution}} fails regularly with the following stack trace:
{code}
junit.framework.AssertionFailedError
  at org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:213)
  at org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:102)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}
[jira] [Commented] (CASSANDRA-16171) Remove Windows scripts
[ https://issues.apache.org/jira/browse/CASSANDRA-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227386#comment-17227386 ]

Yuki Morishita commented on CASSANDRA-16171:
---

Sorry for the delay. Updated my PR with:
* removing install instructions from README
* removing .bat/.ps1 references from build.xml / rpm spec

I think we don't have to touch CHANGES/NEWS.
I left make.bat for doc; it is for development and does not go into release artifacts.

> Remove Windows scripts
> --
>
> Key: CASSANDRA-16171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16171
> Project: Cassandra
> Issue Type: Task
> Components: Packaging
> Reporter: Yuki Morishita
> Assignee: Yuki Morishita
> Priority: Normal
> Fix For: 4.0-rc
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> As per the email thread in cassandra-dev mailing list[1], remove windows
> scripts from Cassandra 4.0 onwards, due to the lack of maintenance and tests.
> 1: https://www.mail-archive.com/dev@cassandra.apache.org/msg15583.html
[jira] [Updated] (CASSANDRA-16171) Remove Windows scripts
[ https://issues.apache.org/jira/browse/CASSANDRA-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-16171: --- Status: Patch Available (was: In Progress) > Remove Windows scripts > -- > > Key: CASSANDRA-16171 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16171 > Project: Cassandra > Issue Type: Task > Components: Packaging >Reporter: Yuki Morishita >Assignee: Yuki Morishita >Priority: Normal > Fix For: 4.0-rc > > Time Spent: 10m > Remaining Estimate: 0h > > As per the email thread in cassandra-dev mailing list[1], remove windows > scripts from Cassandra 4.0 onwards, due to the lack of maintenance and tests. > 1: https://www.mail-archive.com/dev@cassandra.apache.org/msg15583.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16171) Remove Windows scripts
[ https://issues.apache.org/jira/browse/CASSANDRA-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-16171: --- Status: In Progress (was: Changes Suggested) > Remove Windows scripts > -- > > Key: CASSANDRA-16171 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16171 > Project: Cassandra > Issue Type: Task > Components: Packaging >Reporter: Yuki Morishita >Assignee: Yuki Morishita >Priority: Normal > Fix For: 4.0-rc > > Time Spent: 10m > Remaining Estimate: 0h > > As per the email thread in cassandra-dev mailing list[1], remove windows > scripts from Cassandra 4.0 onwards, due to the lack of maintenance and tests. > 1: https://www.mail-archive.com/dev@cassandra.apache.org/msg15583.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227379#comment-17227379 ]

Benjamin Lerer edited comment on CASSANDRA-12126 at 11/6/20, 12:44 PM:
---

It seems to me that there are several options here:
# Try to use your proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
# Do nothing for 4.0, meaning do not commit the patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
# Commit the patch as such, fixing the correctness but potentially introducing some performance issues until we release a better solution.
# Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.

Maybe we should trigger a discussion on the mailing list and see what other people's opinions are. I can take care of that next week if you think it is a good idea.

was (Author: blerer):
It seems to me that there are several options here:
# Try to use your proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
# Do nothing for 4.0, meaning do not commit the patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
# Commit the patch as such, fixing the correctness but potentially introducing some performance issues until we release a better solution.
# Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.

Maybe we should trigger a discussion on the mailing list and see what other people's opinions are.
> CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Sankalp Kohli >Assignee: Sylvain Lebresne >Priority: Normal > Labels: LWT, pull-request-available > Fix For: 3.0.x, 3.11.x, 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. 
When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227379#comment-17227379 ]

Benjamin Lerer commented on CASSANDRA-12126:

It seems to me that there are several options here:
# Try to use your proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
# Do nothing for 4.0, meaning do not commit the patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
# Commit the patch as such, fixing the correctness but potentially introducing some performance issues until we release a better solution.
# Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.

Maybe we should trigger a discussion on the mailing list and see what other people's opinions are.

> CAS Reads Inconsistencies
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Lightweight Transactions, Legacy/Coordination
> Reporter: Sankalp Kohli
> Assignee: Sylvain Lebresne
> Priority: Normal
> Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies
> true to a propose and saves the commit in accepted field. The other two
> machines B and C do not get to the accept phase.
> Current state is that machine A has this commit in paxos table as accepted
> but not committed and B and C do not.
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the
> value written in step 1. This step is as if nothing is inflight.
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16183) Add tests to cover ClientRequest metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andres de la Peña updated CASSANDRA-16183: -- Reviewers: Andres de la Peña > Add tests to cover ClientRequest metrics > - > > Key: CASSANDRA-16183 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16183 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: Benjamin Lerer >Assignee: Adam Holmberg >Priority: Normal > Fix For: 4.0-beta > > > We do not have test that covers the ClientRequest metrics. > * ClientRequestMetrics > * CASClientRequestMetrics > * CASClientWriteRequestMetrics > * ClientWriteRequestMetrics > * ViewWriteMetrics -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
[ https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Lewandowski updated CASSANDRA-16249: -- Test and Documentation Plan: Run regression tests Status: Patch Available (was: In Progress) > ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position > --- > > Key: CASSANDRA-16249 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16249 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 4.0.x > > Time Spent: 10m > Remaining Estimate: 0h > > {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer > position when calculating the final offset for reading data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
[ https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Lewandowski updated CASSANDRA-16249: -- Fix Version/s: 4.0.x Source Control Link: https://github.com/apache/cassandra/pull/811 > ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position > --- > > Key: CASSANDRA-16249 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16249 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 4.0.x > > Time Spent: 10m > Remaining Estimate: 0h > > {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer > position when calculating the final offset for reading data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
[ https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Lewandowski updated CASSANDRA-16249: -- Bug Category: Parent values: Correctness(12982)Level 1 values: Unrecoverable Corruption / Loss(13161) Complexity: Low Hanging Fruit Component/s: Legacy/Core Discovered By: Code Inspection Severity: Critical Status: Open (was: Triage Needed) > ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position > --- > > Key: CASSANDRA-16249 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16249 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > > {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer > position when calculating the final offset for reading data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
Jacek Lewandowski created CASSANDRA-16249: - Summary: ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position Key: CASSANDRA-16249 URL: https://issues.apache.org/jira/browse/CASSANDRA-16249 Project: Cassandra Issue Type: Bug Reporter: Jacek Lewandowski Assignee: Jacek Lewandowski {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer position when calculating the final offset for reading data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.
[ https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-16248: Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Discovered By: Unit Test Fix Version/s: 4.0-beta4 Severity: Critical Status: Open (was: Triage Needed) > GossipTest hangs until timeout, then fails. > --- > > Key: CASSANDRA-16248 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16248 > Project: Cassandra > Issue Type: Bug > Components: Local/Startup and Shutdown, Messaging/Internode, > Test/dtest/java >Reporter: Sam Tunnicliffe >Priority: Normal > Fix For: 4.0-beta4 > > > A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}} > * There seems to have been a merge/commit race between CASSANDRA-16146 > ([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e]) > and CASSANDRA-15935 > ([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]). > The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, > but the latter changed the method signature, so this never actually gets > injected. This causes a latch in the test not to be counted down and it hangs > until timeout. > * After fixing the test code, it still hangs due to changes to > {{server_encryption_options}} initialization in CASSANDRA-16144 > ([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]). > It appears to be causing an incorrect keystore location to be specified, > which causes instance startup to fail, again leading to the test hanging > until it times out. I don't have the cycles to dig into this further right > now, but reverting that commit (and making the test fix above) restores the > green bar. 
[jira] [Updated] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.
[ https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-16248: Description: A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}} * There seems to have been a merge/commit race between CASSANDRA-16146 ([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e]) and CASSANDRA-15935 ([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]). The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, but the latter changed the method signature, so this never actually gets injected. This causes a latch in the test not to be counted down and it hangs until timeout. * After fixing the test code, it still hangs due to changes to {{server_encryption_options}} initialization in CASSANDRA-16144 ([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]). It appears to be causing an incorrect keystore location to be specified, which causes instance startup to fail, again leading to the test hanging until it times out. I don't have the cycles to dig into this further right now, but reverting that commit (and making the test fix above) restores the green bar. was: A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}} * There seems to have been a merge/commit race between CASSANDRA-16146 ({{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e) and CASSANDRA-15935 ({{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab)). The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, but the latter changed the method signature, so this never actually gets injected. This causes a latch in the test not to be counted down and it hangs until timeout. 
* After fixing the test code, it still hangs due to changes to {{server_encryption_options}} initialization in CASSANDRA-166144 ({{f293376a|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c). It appears to be causing an incorrect keystore location to be specified, which causes instance startup to fail, again leading to the test hanging until it times out. I don't have the cycles to dig into this further right now, but reverting that commit (and making the test fix above) restores the green bar. > GossipTest hangs until timeout, then fails. > --- > > Key: CASSANDRA-16248 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16248 > Project: Cassandra > Issue Type: Bug > Components: Local/Startup and Shutdown, Messaging/Internode, > Test/dtest/java >Reporter: Sam Tunnicliffe >Priority: Normal > > A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}} > * There seems to have been a merge/commit race between CASSANDRA-16146 > ([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e]) > and CASSANDRA-15935 > ([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]). > The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, > but the latter changed the method signature, so this never actually gets > injected. This causes a latch in the test not to be counted down and it hangs > until timeout. > * After fixing the test code, it still hangs due to changes to > {{server_encryption_options}} initialization in CASSANDRA-16144 > ([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]). > It appears to be causing an incorrect keystore location to be specified, > which causes instance startup to fail, again leading to the test hanging > until it times out. 
I don't have the cycles to dig into this further right > now, but reverting that commit (and making the test fix above) restores the > green bar.
[jira] [Created] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.
Sam Tunnicliffe created CASSANDRA-16248: --- Summary: GossipTest hangs until timeout, then fails. Key: CASSANDRA-16248 URL: https://issues.apache.org/jira/browse/CASSANDRA-16248 Project: Cassandra Issue Type: Bug Components: Local/Startup and Shutdown, Messaging/Internode, Test/dtest/java Reporter: Sam Tunnicliffe A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}} * There seems to have been a merge/commit race between CASSANDRA-16146 ({{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e) and CASSANDRA-15935 ({{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab)). The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, but the latter changed the method signature, so this never actually gets injected. This causes a latch in the test not to be counted down and it hangs until timeout. * After fixing the test code, it still hangs due to changes to {{server_encryption_options}} initialization in CASSANDRA-166144 ({{f293376a|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c). It appears to be causing an incorrect keystore location to be specified, which causes instance startup to fail, again leading to the test hanging until it times out. I don't have the cycles to dig into this further right now, but reverting that commit (and making the test fix above) restores the green bar. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
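The first failure mode in this report (a hook bound to a method signature that no longer exists, so a latch is never counted down) can be sketched in isolation. This is an assumption-laden illustration: it uses plain reflection rather than ByteBuddy, the `Service` class and its `bootstrap(long)` signature are invented for the example, and it uses a bounded `await` so that the miss is reported instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the GossipTest code.
public class MissedHookSketch {

    // Invented stand-in for a class whose method gained a parameter.
    static class Service {
        void bootstrap(long timeoutMillis) { }
    }

    // Returns true only if a method with exactly this signature exists.
    public static boolean hookMatches(Class<?> type, String name, Class<?>... paramTypes) {
        try {
            type.getDeclaredMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // The latch is counted down only when the hook was installed.
    // Returns false if the wait timed out.
    public static boolean awaitHook(boolean hookInstalled, long waitMillis) {
        CountDownLatch latch = new CountDownLatch(1);
        if (hookInstalled) {
            latch.countDown();
        }
        try {
            return latch.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean oldSignature = hookMatches(Service.class, "bootstrap");
        boolean newSignature = hookMatches(Service.class, "bootstrap", long.class);

        System.out.println("old signature matched: " + oldSignature);
        System.out.println("latch released (old): " + awaitHook(oldSignature, 100));
        System.out.println("latch released (new): " + awaitHook(newSignature, 100));
    }
}
```

A check like `hookMatches` run before the test body would turn a silent non-injection into an immediate failure rather than a timeout.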
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227354#comment-17227354 ]

Benedict Elliott Smith commented on CASSANDRA-12126:

To some extent that is all up for debate. My plan so far has been to avoid interfering with the 4.0 release, so I have been working towards targeting 4.x. This would also permit time to produce documentation and reach out to the list to begin the slow handshake to see if the project wants the work, and in what manner. However, the main body of work is essentially complete, so it is possible that this could be brought forwards if there were appetite.

As to target version, it would be possible to target 3.0+, at least for a portion of the work that would encompass this issue, without a great deal of work. The project's appetite would be the main decider, as it's a significant body of work.

The main contribution would be a parallel implementation of the same underlying Paxos algorithm, that is able to run concurrently alongside it (supporting live migration), but with several latency improvements, as well as several fixes to correctness. Alongside this is related work to guarantee linearizability across range movements in the form of modifications to repair, bootstrap, replace etc.

Related to this work are several patches to wider Cassandra to support automated verification of its correctness, by permitting deterministic simulation of Cassandra clusters with adversarial ordering of events. We have so far simulated billions of transactions to verify its linearizability. I anticipate that these patches will be useful for the project's overall goal of improving quality, but they are themselves quite significant and will require their own discussions around timeline and scope.
> CAS Reads Inconsistencies
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Lightweight Transactions, Legacy/Coordination
> Reporter: Sankalp Kohli
> Assignee: Sylvain Lebresne
> Priority: Normal
> Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with CAS reads. Here is how it can happen with RF=3:
>
> 1) You issue a CAS write and it fails in the propose phase. One machine, A, replies true to a propose and saves the commit in the accepted field. The other two machines, B and C, do not get to the accept phase. The current state is that machine A has this commit in the paxos table as accepted but not committed, and B and C do not.
> 2) You issue a CAS read and it goes to only B and C. You won't be able to read the value written in step 1. This step behaves as if nothing is in flight.
> 3) You issue another CAS read and it goes to A and B. Now we will discover that there is something in flight from A and will propose and commit it with the current ballot. Now we can read the value written in step 1 as part of this CAS read.
>
> If we skip step 3 and instead run step 4, we will never learn about the value written in step 1.
>
> 4) You issue a CAS write and it involves only B and C. This will succeed and commit a different value than step 1. The step 1 value will never be seen again and was never seen before.
>
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, which is how learners can find out if a majority of the acceptors have accepted the proposal.
>
> In step 3, it is correct that we propose the value again, since we don't know if it was accepted by a majority of acceptors. When we ask a majority of acceptors, and more than one acceptor but not a majority has something in flight, we have no way of knowing if it is accepted by a majority of acceptors. So this behavior is correct.
>
> However, we need to fix step 2, since it causes reads to not be linearizable with respect to writes and other reads. In this case, we know that a majority of acceptors have no in-flight commit, which means we have a majority confirming that nothing was accepted by a majority. I think we should run a propose step here with an empty commit, and that will cause the write from step 1 to never be visible afterwards.
>
> With this fix, we will either see the data written in step 1 on the next serial read or never see it, which is what we want.
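The quorum scenario in the issue description above can be sketched with a toy model. This is not Cassandra's Paxos implementation: the class, the replica names A, B and C, and the value `v1` are placeholders, and the model only tracks which replicas hold an accepted-but-uncommitted proposal. It shows why two serial reads that contact different quorums can disagree about whether anything is in flight.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model of the scenario; not Cassandra's Paxos code.
public class CasReadQuorumSketch {

    // Replica name -> value accepted but not yet committed.
    private final Map<String, String> accepted = new HashMap<>();

    // Records that `replica` accepted `value` in the propose phase.
    public void accept(String replica, String value) {
        accepted.put(replica, value);
    }

    // What a serial read that contacts only `quorum` can learn
    // about in-flight proposals.
    public Optional<String> inFlightSeenBy(List<String> quorum) {
        return quorum.stream()
                     .filter(accepted::containsKey)
                     .map(accepted::get)
                     .findFirst();
    }

    public static void main(String[] args) {
        CasReadQuorumSketch cluster = new CasReadQuorumSketch();

        // Step 1: the write is accepted by A only.
        cluster.accept("A", "v1");

        // Step 2: a read via B and C sees nothing in flight.
        System.out.println(cluster.inFlightSeenBy(Arrays.asList("B", "C")));

        // Step 3: a read via A and B discovers the in-flight value.
        System.out.println(cluster.inFlightSeenBy(Arrays.asList("A", "B")));
    }
}
```

In this model the outcome of a read depends entirely on which quorum it happens to reach, which is the non-linearizable behaviour the description attributes to step 2.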
[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227347#comment-17227347 ] Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 11:29 AM: --- Trying to summarize the problem: # SSTables used within the C* data directories should be within the data directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and the table directories should be in the form {{-}}. In this case the problem come mainly from keyspace being named {{backups}} or {{snapshots}}. # Files coming from SSTableLoader should be outside of the data directories and the table name should be without the TableID. In this case, keyspaces and tables with a {{backups}} or {{snapshots}} name will be having issues. To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above. [~stefan.miklosovic] As you pointed out there are several scenario that we never tested. {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag name. SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested by CASSANDRA-16235. The patch should add some tests for those scenarios. We should also probably test a {{nodetool refresh}} with a {{snapshots}} or {{backups}} keyspace. Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235. was (Author: blerer): Trying to summarize the problem: # SSTables used within the C* data directories should be within the data directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and the table directories should be in the form {{-}}. In this case the problem come mainly from keyspace being named {{backups}} or {{snapshots}}. 
# Files coming from SSTableLoader should be outside of the data directories and the table name should be without the TableID. In this case, keyspaces and tables with a {{backups}} or {{snapshots}} name will be having issues. To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above. [~stefan.miklosovic] As you pointed out there are several scenario that we never tested {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag name. SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested by CASSANDRA-16235. The patch should add some tests for those scenarios. We should also probably test a {{nodetool refresh}} with a {{snapshots}} or {{backups}} keyspace. Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235. > Data loss in snapshots keyspace after service restart > - > > Key: CASSANDRA-14013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14013 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Gregor Uhlenheuer >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > I am posting this bug in hope to discover the stupid mistake I am doing > because I can't imagine a reasonable answer for the behavior I see right now > :-) > In short words, I do observe data loss in a keyspace called *snapshots* after > restarting the Cassandra service. Say I do have 1000 records in a table > called *snapshots.test_idx* then after restart the table has less entries or > is even empty. > My kind of "mysterious" observation is that it happens only in a keyspace > called *snapshots*... > h3. Steps to reproduce > These steps to reproduce show the described behavior in "most" attempts (not > every single time though). 
> {code} > # create keyspace > CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > # create table > CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key)); > # insert some test data > INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1); > ... > INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000); > # count entries > SELECT count(*) FROM snapshots.test_idx; > 1000 > # restart service > kill > cassandra -f > # count entries > SELECT count(*) FROM snapshots.test_idx; > 0 > {code} > I hope someone can point me to the obvious mistake I am doing :-) > This happened to me using both Cassandra 3.9 and 3.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubsc
[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227347#comment-17227347 ] Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 11:30 AM: --- Trying to summarize the problem: # SSTables used within the C* data directories should be within the data directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and the table directories should be in the form {{-}}. In this case the problem come mainly from keyspace being named {{backups}} or {{snapshots}}. # Files coming from SSTableLoader should be outside of the data directories and the table name should be without the TableID. In this case, keyspaces and tables with a {{backups}} or {{snapshots}} name will be having issues. To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above. [~stefan.miklosovic] As you pointed out there are several scenario that we never tested. {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag name. SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested by CASSANDRA-16235). The patch should add some tests for those scenarios. We should also probably test {{nodetool refresh}} with a {{snapshots}} or {{backups}} keyspace. Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235. was (Author: blerer): Trying to summarize the problem: # SSTables used within the C* data directories should be within the data directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and the table directories should be in the form {{-}}. In this case the problem come mainly from keyspace being named {{backups}} or {{snapshots}}. 
# Files coming from SSTableLoader should be outside of the data directories and the table name should be without the TableID. In this case, keyspaces and tables with a {{backups}} or {{snapshots}} name will be having issues. To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above. [~stefan.miklosovic] As you pointed out there are several scenario that we never tested. {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag name. SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested by CASSANDRA-16235. The patch should add some tests for those scenarios. We should also probably test a {{nodetool refresh}} with a {{snapshots}} or {{backups}} keyspace. Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235. > Data loss in snapshots keyspace after service restart > - > > Key: CASSANDRA-14013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14013 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Gregor Uhlenheuer >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > I am posting this bug in hope to discover the stupid mistake I am doing > because I can't imagine a reasonable answer for the behavior I see right now > :-) > In short words, I do observe data loss in a keyspace called *snapshots* after > restarting the Cassandra service. Say I do have 1000 records in a table > called *snapshots.test_idx* then after restart the table has less entries or > is even empty. > My kind of "mysterious" observation is that it happens only in a keyspace > called *snapshots*... > h3. Steps to reproduce > These steps to reproduce show the described behavior in "most" attempts (not > every single time though). 
> {code} > # create keyspace > CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > # create table > CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key)); > # insert some test data > INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1); > ... > INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000); > # count entries > SELECT count(*) FROM snapshots.test_idx; > 1000 > # restart service > kill > cassandra -f > # count entries > SELECT count(*) FROM snapshots.test_idx; > 0 > {code} > I hope someone can point me to the obvious mistake I am doing :-) > This happened to me using both Cassandra 3.9 and 3.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubsc
[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227347#comment-17227347 ] Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 11:29 AM: --- Trying to summarize the problem: # SSTables used within the C* data directories should be within the data directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and the table directories should be in the form {{-}}. In this case the problem come mainly from keyspace being named {{backups}} or {{snapshots}}. # Files coming from SSTableLoader should be outside of the data directories and the table name should be without the TableID. In this case, keyspaces and tables with a {{backups}} or {{snapshots}} name will be having issues. To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above. [~stefan.miklosovic] As you pointed out there are several scenario that we never tested {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag name. SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested by CASSANDRA-16235. The patch should add some tests for those scenarios. We should also probably test a {{nodetool refresh}} with a {{snapshots}} or {{backups}} keyspace. Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235. was (Author: blerer): Trying to summarize the problem: # SSTables used within the C* data directories should be within the data directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and the table directories should be in the form {{-}}. In this case the problem come mainly from keyspace being named {{backups}} or {{snapshots}}. 
# Files coming from SSTableLoader should outside of the data directories and the table name should be without the TableID. In this case, keyspace and table with a {{backups}} or {{snapshots}} name will be having issues. To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above. [~stefan.miklosovic] As you pointed out there are several scenario that we never tested {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag name. SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested by CASSANDRA-16235. The patch should add some tests for those scenarios. We should also probably test a {{nodetool refresh}} with a {{snapshots}} or {{backups}} keyspace. Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235. > Data loss in snapshots keyspace after service restart > - > > Key: CASSANDRA-14013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14013 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Gregor Uhlenheuer >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > I am posting this bug in hope to discover the stupid mistake I am doing > because I can't imagine a reasonable answer for the behavior I see right now > :-) > In short words, I do observe data loss in a keyspace called *snapshots* after > restarting the Cassandra service. Say I do have 1000 records in a table > called *snapshots.test_idx* then after restart the table has less entries or > is even empty. > My kind of "mysterious" observation is that it happens only in a keyspace > called *snapshots*... > h3. Steps to reproduce > These steps to reproduce show the described behavior in "most" attempts (not > every single time though). 
> {code} > # create keyspace > CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > # create table > CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key)); > # insert some test data > INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1); > ... > INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000); > # count entries > SELECT count(*) FROM snapshots.test_idx; > 1000 > # restart service > kill > cassandra -f > # count entries > SELECT count(*) FROM snapshots.test_idx; > 0 > {code} > I hope someone can point me to the obvious mistake I am doing :-) > This happened to me using both Cassandra 3.9 and 3.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@c
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227347#comment-17227347 ] Benjamin Lerer commented on CASSANDRA-14013: Trying to summarize the problem: # SSTables used within the C* data directories should be within the data directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and the table directories should be in the form {{-}}. In this case the problem come mainly from keyspace being named {{backups}} or {{snapshots}}. # Files coming from SSTableLoader should outside of the data directories and the table name should be without the TableID. In this case, keyspace and table with a {{backups}} or {{snapshots}} name will be having issues. To be honest, the documentation I found on the SSTableloader is pretty confusing and I imagine that some people might try to use it directly on the C* data directories in which case the table directory will contains the TableID. This case is somehow the same than the {{1.}} above. [~stefan.miklosovic] As you pointed out there are several scenario that we never tested {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag name. SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested by CASSANDRA-16235. The patch should add some tests for those scenarios. We should also probably test a {{nodetool refresh}} with a {{snapshots}} or {{backups}} keyspace. Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235. 
> Data loss in snapshots keyspace after service restart > - > > Key: CASSANDRA-14013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14013 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Gregor Uhlenheuer >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > I am posting this bug in hope to discover the stupid mistake I am doing > because I can't imagine a reasonable answer for the behavior I see right now > :-) > In short words, I do observe data loss in a keyspace called *snapshots* after > restarting the Cassandra service. Say I do have 1000 records in a table > called *snapshots.test_idx* then after restart the table has less entries or > is even empty. > My kind of "mysterious" observation is that it happens only in a keyspace > called *snapshots*... > h3. Steps to reproduce > These steps to reproduce show the described behavior in "most" attempts (not > every single time though). > {code} > # create keyspace > CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > # create table > CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key)); > # insert some test data > INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1); > ... > INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000); > # count entries > SELECT count(*) FROM snapshots.test_idx; > 1000 > # restart service > kill > cassandra -f > # count entries > SELECT count(*) FROM snapshots.test_idx; > 0 > {code} > I hope someone can point me to the obvious mistake I am doing :-) > This happened to me using both Cassandra 3.9 and 3.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
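One way to picture why a keyspace named `snapshots` is hazardous is to compare a name-based check with a position-based one. The sketch below is purely hypothetical and is not taken from Cassandra's code: it assumes an on-disk layout of `keyspace/table-directory/snapshots/tag/file` under a data directory, and the paths, the `test_idx-abc123` directory name and both method names are invented for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; not Cassandra's directory handling.
public class SnapshotPathSketch {

    // Naive check: any path component named "snapshots" marks the
    // file as belonging to a snapshot.
    public static boolean naiveIsSnapshot(Path file) {
        for (Path part : file) {
            if (part.toString().equals("snapshots")) {
                return true;
            }
        }
        return false;
    }

    // Position-aware check: resolve the path against a known data
    // directory and look only at the component where the assumed
    // layout (keyspace/tableDir/snapshots/tag/file) puts "snapshots".
    public static boolean positionalIsSnapshot(Path dataDir, Path file) {
        Path relative = dataDir.relativize(file);
        return relative.getNameCount() >= 5
            && relative.getName(2).toString().equals("snapshots");
    }

    public static void main(String[] args) {
        Path dataDir = Paths.get("/var/lib/cassandra/data");
        // A live SSTable in a keyspace that happens to be called "snapshots".
        Path live = dataDir.resolve("snapshots/test_idx-abc123/md-1-big-Data.db");
        // A real snapshot of a table in another keyspace.
        Path snap = dataDir.resolve("ks1/test_idx-abc123/snapshots/tag1/md-1-big-Data.db");

        System.out.println(naiveIsSnapshot(live) + " " + positionalIsSnapshot(dataDir, live));
        System.out.println(naiveIsSnapshot(snap) + " " + positionalIsSnapshot(dataDir, snap));
    }
}
```

The position-aware variant depends on knowing the data directory root, which is the same constraint raised in this thread about input directories for SSTableLoader living outside the configured data directories.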
[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227268#comment-17227268 ]

Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 9:23 AM:
----------------------------------------------------------------------

{quote}That is not true{quote}

You are right, I should open my eyes properly ;-)

Then unless I am mistaken (again ;-)), we cannot rely on {{DatabaseDescriptor.getAllDataFileLocations()}}, as those directories will not be the same as the one in which the input directory for the SSTableLoader is stored.

was (Author: blerer):
{quote}That is not true{quote}

You are right, I should open my eyes properly ;-)

Then unless I am mistaken (again ;-)), you cannot rely on {{DatabaseDescriptor.getAllDataFileLocations()}}, as those directories will not be the same as the one in which the input directory for the SSTableLoader is stored.
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227268#comment-17227268 ]

Benjamin Lerer commented on CASSANDRA-14013:
--------------------------------------------

{quote}That is not true{quote}

You are right, I should open my eyes properly ;-)

Then unless I am mistaken (again ;-)), you cannot rely on {{DatabaseDescriptor.getAllDataFileLocations()}}, as those directories will not be the same as the one in which the input directory for the SSTableLoader is stored.
[jira] [Commented] (CASSANDRA-14925) DecimalSerializer.toString() can be used as OOM attack
[ https://issues.apache.org/jira/browse/CASSANDRA-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227257#comment-17227257 ]

Jacek Lewandowski commented on CASSANDRA-14925:
-----------------------------------------------

When is it going to be merged?

> DecimalSerializer.toString() can be used as OOM attack
> ------------------------------------------------------
>
> Key: CASSANDRA-14925
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14925
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Zhao Yang
> Assignee: Zhao Yang
> Priority: Low
>
> Currently, {{DecimalSerializer.toString(value)}} uses
> {{BigDecimal.toPlainString()}}, which generates a huge string for values
> with a large scale.
>
> {code:java}
> BigDecimal d = new BigDecimal("1e-" + (Integer.MAX_VALUE - 6));
> d.toPlainString(); // oom{code}
>
> Propose to use {{BigDecimal.toString()}} when the scale is larger than 100;
> the threshold is configurable via {{-Dcassandra.decimal.maxscaleforstring}}
>
> | patch | circle-ci |
> |[trunk|https://github.com/jasonstack/cassandra/commits/decimal-tostring-trunk]|[unit|https://circleci.com/gh/jasonstack/cassandra/751?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link]|
>
> The code should apply cleanly to 3.0+.
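The proposal in the quoted description can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: the class `DecimalToStringSketch` and the helper `toSafeString` are hypothetical names, and the check on large negative scales is an added assumption (the ticket only mentions a scale "larger than 100"). The property name and the default of 100 are taken from the ticket text.

```java
import java.math.BigDecimal;

public class DecimalToStringSketch {
    // Threshold read from the system property named in the ticket; 100 is the proposed default.
    static final int MAX_SCALE_FOR_PLAIN_STRING =
        Integer.getInteger("cassandra.decimal.maxscaleforstring", 100);

    // Hypothetical helper, not the real DecimalSerializer.toString().
    static String toSafeString(BigDecimal value) {
        if (value == null) {
            return "";
        }
        int scale = value.scale();
        // Assumption: a large negative scale is also guarded, since toPlainString()
        // would then have to emit one trailing zero per unit of scale.
        boolean extremeScale = scale > MAX_SCALE_FOR_PLAIN_STRING
                            || scale < -MAX_SCALE_FOR_PLAIN_STRING;
        // toString() may use scientific notation, so its length stays small.
        return extremeScale ? value.toString() : value.toPlainString();
    }

    public static void main(String[] args) {
        // Ordinary value: rendered in plain notation.
        System.out.println(toSafeString(new BigDecimal("1.50")));
        // The value from the ticket: rendered in scientific notation instead of
        // attempting to build a string with roughly two billion digits.
        System.out.println(toSafeString(new BigDecimal("1e-" + (Integer.MAX_VALUE - 6))));
    }
}
```

Values within the threshold keep the plain rendering, so output for ordinary decimals would be unchanged; only extreme scales fall back to the compact form.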
[jira] [Commented] (CASSANDRA-16192) Add more tests to cover compaction metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227243#comment-17227243 ]

Mohamed Zafraan commented on CASSANDRA-16192:
---------------------------------------------

[~blerer] Sorry. Must have done so by accident.

> Add more tests to cover compaction metrics
> ------------------------------------------
>
> Key: CASSANDRA-16192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16192
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/unit
> Reporter: Benjamin Lerer
> Assignee: Mohamed Zafraan
> Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: 0001-added-unit-tests-to-cover-compaction-metrics.patch
>
>
> Some compaction metrics do not seem to be tested.
[jira] [Updated] (CASSANDRA-16192) Add more tests to cover compaction metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-16192:
---------------------------------------
Status: Patch Available (was: Review In Progress)
[jira] [Commented] (CASSANDRA-16192) Add more tests to cover compaction metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227234#comment-17227234 ]

Benjamin Lerer commented on CASSANDRA-16192:
--------------------------------------------

[~mohamed_zafraan] The reviewers of a patch should be different people from those who created it. Consequently, you cannot add yourself as a reviewer. :-)
[jira] [Updated] (CASSANDRA-16192) Add more tests to cover compaction metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-16192:
---------------------------------------
Reviewers: Adam Holmberg (was: Adam Holmberg, Mohamed Zafraan)