[jira] [Updated] (CASSANDRA-19409) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*
[ https://issues.apache.org/jira/browse/CASSANDRA-19409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Berenguer Blasi updated CASSANDRA-19409:
----------------------------------------
          Fix Version/s: 4.0.13
                         4.1.5
                         5.x
          Since Version: 5.0
    Source Control Link: https://github.com/apache/cassandra-dtest/commit/aec94d67f63fcb62ec02ec448402b6fec8fdc9a9
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19409
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19409
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CI
>            Reporter: Ekaterina Dimitrova
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0.13, 4.1.5, 5.0-rc, 5.x
>
> Failing in Jenkins:
> * [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/]
> * [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/]
> * [dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test/TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade/]
> * [dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade/]
> * [dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-dtest) branch trunk updated: Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*
This is an automated email from the ASF dual-hosted git repository.

bereng pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git

The following commit(s) were added to refs/heads/trunk by this push:
     new aec94d67  Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*
aec94d67 is described below

commit aec94d67f63fcb62ec02ec448402b6fec8fdc9a9
Author: Bereng
AuthorDate: Thu Feb 22 10:44:43 2024 +0100

    Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*

    patch by Berenguer Blasi; reviewed by Ekaterina Dimitrova for CASSANDRA-19409

    Co-authored-by: Berenguer Blasi
    Co-authored-by: Ekaterina Dimitrova
---
 pytest.ini | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/pytest.ini b/pytest.ini
index 6054229a..79d18eca 100644
--- a/pytest.ini
+++ b/pytest.ini
@@ -1,10 +1,9 @@
 [pytest]
-addopts = --show-capture=stdout
+addopts = --show-capture=stdout --timeout=900
 python_files = test_*.py *_test.py *_tests.py
 junit_suite_name = Cassandra dtests
 log_level = INFO
 log_format = %(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s
-timeout = 900
 markers =
     since
     vnodes
(cassandra-website) branch asf-staging updated (e5f98fc3b -> 4edcc0e8c)
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git

 discard  e5f98fc3b  generate docs for fd550e9c
     new  4edcc0e8c  generate docs for fd550e9c

This update added new revisions after undoing existing revisions.  That is
to say, some revisions that were in the old version of the branch are not in
the new version.  This situation occurs when a user --force pushes a change
and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (e5f98fc3b)
            \
             N -- N -- N   refs/heads/asf-staging (4edcc0e8c)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions from the
common base, B.

Any revisions marked "omit" are not gone; other references still refer to
them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails.  The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
 content/search-index.js     |   2 +-
 site-ui/build/ui-bundle.zip | Bin 4883646 -> 4883646 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820913#comment-17820913 ]

Dipietro Salvatore commented on CASSANDRA-19429:
------------------------------------------------

Let me know when you have a PR ready for Cassandra 4 so that I can test it.

> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Dipietro Salvatore
>            Assignee: Dipietro Salvatore
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: Screenshot 2024-02-26 at 10.27.10.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances shows a high number of lock
> acquisitions in the `getCapacity` function from
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquisitions per 60
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04),
> this limits the CPU utilization of the system to under 50% when testing at
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing
> the call to `getCapacity` with `size` achieves up to a 2.95x increase in
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
>
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R
>
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
> bin/nodetool compact keyspace1 && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=406 -node localhost -log file=result.log -graph file=graph.html
> {code}
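The fix described in this ticket swaps a lock-acquiring accessor for a lock-free one on the hot read path. The sketch below illustrates only the pattern, not Cassandra's actual InstrumentingCache/SSTableReader code: a `synchronized` getter serializes every reader thread, while an `AtomicLong`-backed accessor needs no monitor at all, so hundreds of concurrent readers stop contending.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the contention pattern (class and field names are
// hypothetical, not Cassandra's): a shared stats object whose capacity
// accessor takes the object monitor on every call, versus a lock-free
// size counter read on the hot path.
public class CacheStats
{
    private long capacity = 1024;                 // guarded by 'this'
    private final AtomicLong size = new AtomicLong();

    // Every caller acquires the monitor, even though the value rarely
    // changes; with many reader threads this lock becomes the bottleneck.
    public synchronized long getCapacity()
    {
        return capacity;
    }

    // Lock-free read: no monitor acquisition on the hot path.
    public long size()
    {
        return size.get();
    }

    public void add(long delta)
    {
        size.addAndGet(delta);
    }

    public static void main(String[] args)
    {
        CacheStats stats = new CacheStats();
        stats.add(10);
        System.out.println(stats.size());         // prints 10
        System.out.println(stats.getCapacity());  // prints 1024
    }
}
```

The reported 2-3x throughput gain is plausible for this kind of change because a contended monitor forces readers through a serial section, so removing it lets the read path scale with core count.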
[jira] [Updated] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francisco Guerrero updated CASSANDRA-19442:
-------------------------------------------
          Fix Version/s: NA
    Source Control Link: https://github.com/apache/cassandra-analytics/commit/cf6de14d5b96ea173d6a1b2dad9bb64d563df06c
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Update access of ClearSnapshotStrategy
> --------------------------------------
>
>                 Key: CASSANDRA-19442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19442
>             Project: Cassandra
>          Issue Type: Task
>          Components: Analytics Library
>            Reporter: Saranya Krishnakumar
>            Assignee: Saranya Krishnakumar
>            Priority: Normal
>             Fix For: NA
>
> Want to update access of ClearSnapshotStrategy added to Cassandra Analytics
> library to allow setting TTL for snapshots created by bulk reader. Currently
> the access of the class is package private, for plugging in custom
> implementation, we need access outside the package.
Re: [PR] CASSANDRA-19442 Update access of ClearSnapshotStrategy [cassandra-analytics]
frankgh commented on PR #42:
URL: https://github.com/apache/cassandra-analytics/pull/42#issuecomment-1965475454

   Closed via https://github.com/apache/cassandra-analytics/commit/cf6de14d5b96ea173d6a1b2dad9bb64d563df06c

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] CASSANDRA-19442 Update access of ClearSnapshotStrategy [cassandra-analytics]
frankgh closed pull request #42: CASSANDRA-19442 Update access of ClearSnapshotStrategy
URL: https://github.com/apache/cassandra-analytics/pull/42
[jira] [Updated] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francisco Guerrero updated CASSANDRA-19442:
-------------------------------------------
    Status: Ready to Commit  (was: Review In Progress)

> Update access of ClearSnapshotStrategy
> --------------------------------------
>
>                 Key: CASSANDRA-19442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19442
>             Project: Cassandra
>          Issue Type: Task
>          Components: Analytics Library
>            Reporter: Saranya Krishnakumar
>            Assignee: Saranya Krishnakumar
>            Priority: Normal
>
> Want to update access of ClearSnapshotStrategy added to Cassandra Analytics
> library to allow setting TTL for snapshots created by bulk reader. Currently
> the access of the class is package private, for plugging in custom
> implementation, we need access outside the package.
(cassandra-analytics) branch trunk updated: CASSANDRA-19442 Update access of ClearSnapshotStrategy
This is an automated email from the ASF dual-hosted git repository. frankgh pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra-analytics.git The following commit(s) were added to refs/heads/trunk by this push: new cf6de14 CASSANDRA-19442 Update access of ClearSnapshotStrategy cf6de14 is described below commit cf6de14d5b96ea173d6a1b2dad9bb64d563df06c Author: Saranya Krishnakumar AuthorDate: Mon Feb 19 11:27:35 2024 -0800 CASSANDRA-19442 Update access of ClearSnapshotStrategy Patch by Saranya Krishnakumar; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19442 --- CHANGES.txt| 1 + .../apache/cassandra/spark/data/ClientConfig.java | 233 +++-- 2 files changed, 122 insertions(+), 112 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index 1472baf..8215822 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 1.0.0 + * Update access of ClearSnapshotStrategy (CASSANDRA-19442) * Bulk reader fails to produce a row when regular column values are null (CASSANDRA-19411) * Use XXHash32 for digest calculation of SSTables (CASSANDRA-19369) * Startup Validation Failures when Checking Sidecar Connectivity (CASSANDRA-19377) diff --git a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/ClientConfig.java b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/ClientConfig.java index 5330f1b..dd9675c 100644 --- a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/ClientConfig.java +++ b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/ClientConfig.java @@ -39,9 +39,9 @@ import org.jetbrains.annotations.Nullable; import static org.apache.cassandra.spark.data.CassandraDataLayer.aliasLastModifiedTimestamp; -public final class ClientConfig +public class ClientConfig { -private static final Logger LOGGER = LoggerFactory.getLogger(ClientConfig.class); +protected final Logger logger = LoggerFactory.getLogger(this.getClass()); public static final String 
SIDECAR_INSTANCES = "sidecar_instances"; public static final String KEYSPACE_KEY = "keyspace"; @@ -78,34 +78,34 @@ public final class ClientConfig public static final String QUOTE_IDENTIFIERS = "quote_identifiers"; public static final int DEFAULT_SIDECAR_PORT = 9043; -private final String sidecarInstances; +protected String sidecarInstances; @Nullable -private final String keyspace; +protected String keyspace; @Nullable -private final String table; -private final String snapshotName; -private final String datacenter; -private final boolean createSnapshot; -private final boolean clearSnapshot; -private final ClearSnapshotStrategy clearSnapshotStrategy; -private final int defaultParallelism; -private final int numCores; -private final ConsistencyLevel consistencyLevel; -private final Map bigNumberConfigMap; -private final boolean enableStats; -private final boolean readIndexOffset; -private final String sizing; -private final int maxPartitionSize; -private final boolean useIncrementalRepair; -private final List requestedFeatures; -private final String lastModifiedTimestampField; -private final Boolean enableExpansionShrinkCheck; -private final int sidecarPort; -private final boolean quoteIdentifiers; - -private ClientConfig(Map options) +protected String table; +protected String snapshotName; +protected String datacenter; +protected boolean createSnapshot; +protected boolean clearSnapshot; +protected ClearSnapshotStrategy clearSnapshotStrategy; +protected int defaultParallelism; +protected int numCores; +protected ConsistencyLevel consistencyLevel; +protected Map bigNumberConfigMap; +protected boolean enableStats; +protected boolean readIndexOffset; +protected String sizing; +protected int maxPartitionSize; +protected boolean useIncrementalRepair; +protected List requestedFeatures; +protected String lastModifiedTimestampField; +protected Boolean enableExpansionShrinkCheck; +protected int sidecarPort; +protected boolean quoteIdentifiers; + +protected ClientConfig(Map 
options) { -this.sidecarInstances = MapUtils.getOrThrow(options, SIDECAR_INSTANCES, "sidecar_instances"); +this.sidecarInstances = parseSidecarInstances(options); this.keyspace = MapUtils.getOrThrow(options, KEYSPACE_KEY, "keyspace"); this.table = MapUtils.getOrThrow(options, TABLE_KEY, "table"); this.snapshotName = MapUtils.getOrDefault(options, SNAPSHOT_NAME_KEY, "sbr_" + UUID.randomUUID().toString().replace("-", "")); @@ -135,48 +135,24 @@ public final class ClientConfig this.quoteIdentifiers = MapUtils.getBoolean(options, QUOTE_IDENTIFIERS, false); } +protected String
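The diff above relaxes `ClientConfig` from a `final` class with `private final` fields to an extensible one with `protected` members and a `protected` constructor that routes option parsing through an overridable hook. A minimal sketch of why that matters, with hypothetical class and option names (not the real `ClientConfig` API):

```java
import java.util.Map;

// Base class mirroring the access change: protected fields, a protected
// constructor, and a protected parse hook that subclasses can override.
// Names here are illustrative, not the actual cassandra-analytics classes.
class BaseConfig
{
    protected String sidecarInstances;

    protected BaseConfig(Map<String, String> options)
    {
        // The base constructor delegates to the overridable hook, as the
        // patched ClientConfig constructor does with parseSidecarInstances.
        this.sidecarInstances = parseSidecarInstances(options);
    }

    protected String parseSidecarInstances(Map<String, String> options)
    {
        return options.get("sidecar_instances");
    }

    public static void main(String[] args)
    {
        BaseConfig c = new CustomConfig(Map.of("sidecar_host", "h1"));
        System.out.println(c.sidecarInstances);   // prints h1
    }
}

// A custom implementation outside the package can now plug in its own
// parsing, e.g. a fallback option spelling (hypothetical).
class CustomConfig extends BaseConfig
{
    CustomConfig(Map<String, String> options)
    {
        super(options);
    }

    @Override
    protected String parseSidecarInstances(Map<String, String> options)
    {
        String v = options.get("sidecar_instances");
        return v != null ? v : options.get("sidecar_host");
    }
}
```

Note the usual caveat with this design: the base constructor calls an overridable method, so an override must rely only on its parameters, not on subclass fields that are not yet initialized.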
[jira] [Commented] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI
[ https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820897#comment-17820897 ] Ekaterina Dimitrova commented on CASSANDRA-19222: - The provided link points to 5.1 branch. The failure is a timeout, and the artifacts are expired already. [~jlewandowski], have you seen it on 5.0, too? > Leak - Strong self-ref loop detected in BTI > --- > > Key: CASSANDRA-19222 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19222 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Jacek Lewandowski >Priority: Normal > Fix For: 5.0-rc, 5.x > > > https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62534/tests > {noformat} > ERROR [Strong-Reference-Leak-Detector:1] 2023-12-21 09:50:33,072 Strong > self-ref loop detected > [/tmp/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/oa-1-big > private java.util.List > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.closeables-java.util.ArrayList > transient java.lang.Object[] > java.util.ArrayList.elementData-[Ljava.lang.Object; > transient java.lang.Object[] > java.util.ArrayList.elementData-org.apache.cassandra.io.util.FileHandle > final org.apache.cassandra.utils.concurrent.Ref > org.apache.cassandra.utils.concurrent.SharedCloseableImpl.ref-org.apache.cassandra.utils.concurrent.Ref > final org.apache.cassandra.utils.concurrent.Ref$State > org.apache.cassandra.utils.concurrent.Ref.state-org.apache.cassandra.utils.concurrent.Ref$State > final org.apache.cassandra.utils.concurrent.Ref$GlobalState > org.apache.cassandra.utils.concurrent.Ref$State.globalState-org.apache.cassandra.utils.concurrent.Ref$GlobalState > private final org.apache.cassandra.utils.concurrent.RefCounted$Tidy > org.apache.cassandra.utils.concurrent.Ref$GlobalState.tidy-org.apache.cassandra.io.util.FileHandle$Cleanup > final 
java.util.Optional > org.apache.cassandra.io.util.FileHandle$Cleanup.chunkCache-java.util.Optional > private final java.lang.Object > java.util.Optional.value-org.apache.cassandra.cache.ChunkCache > private final org.apache.cassandra.utils.memory.BufferPool > org.apache.cassandra.cache.ChunkCache.bufferPool-org.apache.cassandra.utils.memory.BufferPool > private final java.util.Set > org.apache.cassandra.utils.memory.BufferPool.localPoolReferences-java.util.Collections$SetFromMap > private final java.util.Map > java.util.Collections$SetFromMap.m-java.util.concurrent.ConcurrentHashMap > private final java.util.Map > java.util.Collections$SetFromMap.m-org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef > private final org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks > org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef.chunks-org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks > private org.apache.cassandra.utils.memory.BufferPool$Chunk > org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.chunk0-org.apache.cassandra.utils.memory.BufferPool$Chunk > private volatile org.apache.cassandra.utils.memory.BufferPool$LocalPool > org.apache.cassandra.utils.memory.BufferPool$Chunk.owner-org.apache.cassandra.utils.memory.BufferPool$LocalPool > private final java.lang.Thread > org.apache.cassandra.utils.memory.BufferPool$LocalPool.owningThread-io.netty.util.concurrent.FastThreadLocalThread > private java.lang.Runnable > java.lang.Thread.target-io.netty.util.concurrent.FastThreadLocalRunnable > private final java.lang.Runnable > io.netty.util.concurrent.FastThreadLocalRunnable.runnable-java.util.concurrent.ThreadPoolExecutor$Worker > final java.util.concurrent.ThreadPoolExecutor > java.util.concurrent.ThreadPoolExecutor$Worker.this$0-org.apache.cassandra.concurrent.ScheduledThreadPoolExecutorPlus > private final java.util.concurrent.BlockingQueue > 
java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue > private final java.util.concurrent.BlockingQueue > java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask > private java.util.concurrent.Callable > java.util.concurrent.FutureTask.callable-java.util.concurrent.Executors$RunnableAdapter > private final java.lang.Runnable > java.util.concurrent.Executors$RunnableAdapter.task-org.apache.cassandra.concurrent.ExecutionFailure$1 > final java.lang.Runnable > org.apache.cassandra.concurrent.ExecutionFailure$1.val$wrap-org.apache.cassandra.hints.HintsService$$Lambda$1142/0x000801576aa0 > private final org.apache.cassandra.hints.HintsService >
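The report above is the output of Cassandra's strong-reference leak detector: it walks instance fields from a tracked resource and prints the chain when the object graph reaches back to itself. The toy walker below shows the idea only; it is a hypothetical sketch, not Cassandra's `Ref` leak detector, and the `Node` class is invented for the demo.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy strong-reference walker: depth-first over instance fields, with an
// identity-based visited set, reporting whether the root is reachable
// from itself. Illustrative only.
public class RefWalk
{
    static class Node { Object next; }

    static boolean hasSelfRefLoop(Object root)
    {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        try
        {
            return walk(root, root, seen);
        }
        catch (IllegalAccessException e)
        {
            throw new RuntimeException(e);
        }
    }

    static boolean walk(Object obj, Object root, Set<Object> seen) throws IllegalAccessException
    {
        if (obj == null || !seen.add(obj))
            return false;
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass())
        {
            for (Field f : c.getDeclaredFields())
            {
                // Only instance reference fields are strong edges we care about.
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive())
                    continue;
                f.setAccessible(true);
                Object child = f.get(obj);
                if (child == root || walk(child, root, seen))
                    return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;                              // strong self-ref loop
        System.out.println(hasSelfRefLoop(a));   // prints true
        b.next = null;
        System.out.println(hasSelfRefLoop(a));   // prints false
    }
}
```

The real detector additionally records the field path taken (which is why the report lists each `field-class` hop) so the offending edge can be found and broken at tidy time.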
[jira] [Commented] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820894#comment-17820894 ]

Yifan Cai commented on CASSANDRA-19442:
---------------------------------------

+1

> Update access of ClearSnapshotStrategy
> --------------------------------------
>
>                 Key: CASSANDRA-19442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19442
>             Project: Cassandra
>          Issue Type: Task
>          Components: Analytics Library
>            Reporter: Saranya Krishnakumar
>            Assignee: Saranya Krishnakumar
>            Priority: Normal
>
> Want to update access of ClearSnapshotStrategy added to Cassandra Analytics
> library to allow setting TTL for snapshots created by bulk reader. Currently
> the access of the class is package private, for plugging in custom
> implementation, we need access outside the package.
[jira] [Created] (CASSANDRA-19444) AccordRepairJob should be async like CassandraRepairJob
Ariel Weisberg created CASSANDRA-19444:
------------------------------------------

             Summary: AccordRepairJob should be async like CassandraRepairJob
                 Key: CASSANDRA-19444
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19444
             Project: Cassandra
          Issue Type: Bug
            Reporter: Ariel Weisberg


The thread that manages repairs needs to be available and not block.
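The async pattern the ticket asks for can be sketched with `CompletableFuture`: the coordinating thread submits the repair work and attaches a completion callback instead of waiting on the result, so it stays free to schedule other repairs. The `runRepair` helper and the executor are illustrative stand-ins, not the actual AccordRepairJob API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: submit repair work asynchronously and react to completion via a
// callback, rather than blocking the coordinator thread on the result.
public class AsyncRepairSketch
{
    static String runRepair(String range)
    {
        return "repaired " + range;   // stand-in for the real repair work
    }

    public static void main(String[] args)
    {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Blocking style (what the ticket wants to avoid):
        //   String r = runRepair("range1");   // coordinator waits here

        // Async style: the coordinator submits and returns immediately;
        // the callback runs when the repair finishes.
        CompletableFuture<Void> done = CompletableFuture
                .supplyAsync(() -> runRepair("range1"), pool)
                .thenAccept(System.out::println);

        done.join();      // only this demo waits; a real coordinator would not
        pool.shutdown();
    }
}
```

Printed output of the demo is `repaired range1`. The point is that only the final `join()` blocks, and it exists solely so the demo does not exit early; the submission itself returns at once.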
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-18940:
----------------------------------------
          Since Version: 5.0-alpha1
    Source Control Link: https://github.com/apache/cassandra/commit/1d7bae3697b97e64de2c2b958427ef86a1b17731
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed as https://github.com/apache/cassandra/commit/1d7bae3697b97e64de2c2b958427ef86a1b17731

> SAI post-filtering reads don't update local table latency metrics
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-18940
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18940
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/2i Index, Feature/SAI, Observability/Metrics
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0-rc, 5.x
>
>         Attachments: ci_summary.html, result_details.tar.gz
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Once an SAI index finds matches (primary keys), it reads the associated rows
> and post-filters them to incorporate partial writes, tombstones, etc.
> However, those reads are not currently updating the local table latency
> metrics. It should be simple enough to attach a metrics-recording
> transformation to the iterator produced by querying local storage.
(cassandra) branch trunk updated (81e253b166 -> ce963bc991)
This is an automated email from the ASF dual-hosted git repository.

maedhroz pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

    from 81e253b166  Merge branch 'cassandra-5.0' into trunk
     new 1d7bae3697  Record latencies for SAI post-filtering reads against local storage
     new ce963bc991  Merge branch 'cassandra-5.0' into trunk

The 2 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails.  The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
 CHANGES.txt                                        |  1 +
 .../index/sai/metrics/TableQueryMetrics.java       |  4 +++
 .../cassandra/index/sai/plan/QueryController.java  | 14 +
 .../sai/plan/StorageAttachedIndexSearcher.java     | 33 +-
 .../index/sai/metrics/QueryMetricsTest.java        |  6 +++-
 .../cassandra/index/sai/plan/OperationTest.java    |  9 ++
 6 files changed, 33 insertions(+), 34 deletions(-)
(cassandra) branch cassandra-5.0 updated: Record latencies for SAI post-filtering reads against local storage
This is an automated email from the ASF dual-hosted git repository. maedhroz pushed a commit to branch cassandra-5.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/cassandra-5.0 by this push: new 1d7bae3697 Record latencies for SAI post-filtering reads against local storage 1d7bae3697 is described below commit 1d7bae3697b97e64de2c2b958427ef86a1b17731 Author: Caleb Rackliffe AuthorDate: Thu Feb 22 15:08:23 2024 -0600 Record latencies for SAI post-filtering reads against local storage patch by Caleb Rackliffe; reviewed by Mike Adamson for CASSANDRA-18940 --- CHANGES.txt| 1 + .../index/sai/metrics/TableQueryMetrics.java | 4 +++ .../cassandra/index/sai/plan/QueryController.java | 14 + .../sai/plan/StorageAttachedIndexSearcher.java | 33 +- .../index/sai/metrics/QueryMetricsTest.java| 6 +++- .../cassandra/index/sai/plan/OperationTest.java| 9 ++ 6 files changed, 33 insertions(+), 34 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index 20e0c6e959..4f35041497 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 5.0-beta2 + * Record latencies for SAI post-filtering reads against local storage (CASSANDRA-18940) * Fix VectorMemoryIndex#update logic to compare vectors. 
Fix Index view (CASSANDRA-19168) * Deprecate native_transport_port_ssl (CASSANDRA-19392) * Update packaging shell includes (CASSANDRA-19283) diff --git a/src/java/org/apache/cassandra/index/sai/metrics/TableQueryMetrics.java b/src/java/org/apache/cassandra/index/sai/metrics/TableQueryMetrics.java index 7154df241d..987c70ef75 100644 --- a/src/java/org/apache/cassandra/index/sai/metrics/TableQueryMetrics.java +++ b/src/java/org/apache/cassandra/index/sai/metrics/TableQueryMetrics.java @@ -32,6 +32,8 @@ public class TableQueryMetrics extends AbstractMetrics { public static final String TABLE_QUERY_METRIC_TYPE = "TableQueryMetrics"; +public final Timer postFilteringReadLatency; + private final PerQueryMetrics perQueryMetrics; private final Counter totalQueryTimeouts; @@ -45,6 +47,8 @@ public class TableQueryMetrics extends AbstractMetrics perQueryMetrics = new PerQueryMetrics(table); +postFilteringReadLatency = Metrics.timer(createMetricName("PostFilteringReadLatency")); + totalPartitionReads = Metrics.counter(createMetricName("TotalPartitionReads")); totalRowsFiltered = Metrics.counter(createMetricName("TotalRowsFiltered")); totalQueriesCompleted = Metrics.counter(createMetricName("TotalQueriesCompleted")); diff --git a/src/java/org/apache/cassandra/index/sai/plan/QueryController.java b/src/java/org/apache/cassandra/index/sai/plan/QueryController.java index 597e339aaa..d844304812 100644 --- a/src/java/org/apache/cassandra/index/sai/plan/QueryController.java +++ b/src/java/org/apache/cassandra/index/sai/plan/QueryController.java @@ -57,7 +57,6 @@ import org.apache.cassandra.index.sai.iterators.KeyRangeIntersectionIterator; import org.apache.cassandra.index.sai.iterators.KeyRangeIterator; import org.apache.cassandra.index.sai.iterators.KeyRangeOrderingIterator; import org.apache.cassandra.index.sai.iterators.KeyRangeUnionIterator; -import org.apache.cassandra.index.sai.metrics.TableQueryMetrics; import org.apache.cassandra.index.sai.utils.PrimaryKey; import 
org.apache.cassandra.net.ParamType; import org.apache.cassandra.schema.TableMetadata; @@ -73,7 +72,6 @@ public class QueryController private final ColumnFamilyStore cfs; private final ReadCommand command; private final QueryContext queryContext; -private final TableQueryMetrics tableQueryMetrics; private final RowFilter filterOperation; private final List ranges; private final AbstractBounds mergeRange; @@ -85,13 +83,11 @@ public class QueryController public QueryController(ColumnFamilyStore cfs, ReadCommand command, RowFilter filterOperation, - QueryContext queryContext, - TableQueryMetrics tableQueryMetrics) + QueryContext queryContext) { this.cfs = cfs; this.command = command; this.queryContext = queryContext; -this.tableQueryMetrics = tableQueryMetrics; this.filterOperation = filterOperation; this.ranges = dataRanges(command); DataRange first = ranges.get(0); @@ -249,14 +245,6 @@ public class QueryController return key.kind() == PrimaryKey.Kind.WIDE && !command.clusteringIndexFilter(key.partitionKey()).selects(key.clustering()); } -/** - * Used to release all resources and record metrics when query finishes. - */ -public void finish() -{ -if (tableQueryMetrics != null) tableQueryMetrics.record(queryContext); -} -
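The commit above attaches a latency metric to the iterator produced by the local storage read. The sketch below shows the general wrapping technique: a delegating iterator that times each `next()` call and feeds the elapsed nanoseconds into an accumulator. The real patch records into a codahale `Timer` (`PostFilteringReadLatency`); the plain `AtomicLong` here is a hypothetical stand-in so the example has no dependencies.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Delegating iterator that records how long each element fetch takes.
// In the actual patch the sink is a metrics Timer; here it is a simple
// nanosecond accumulator for illustration.
public class TimedIterator<T> implements Iterator<T>
{
    private final Iterator<T> delegate;
    private final AtomicLong totalNanos;

    public TimedIterator(Iterator<T> delegate, AtomicLong totalNanos)
    {
        this.delegate = delegate;
        this.totalNanos = totalNanos;
    }

    @Override
    public boolean hasNext()
    {
        return delegate.hasNext();
    }

    @Override
    public T next()
    {
        long start = System.nanoTime();
        try
        {
            return delegate.next();   // the "read against local storage"
        }
        finally
        {
            // Record even when next() throws, so failed reads are timed too.
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    public static void main(String[] args)
    {
        AtomicLong latency = new AtomicLong();
        Iterator<String> rows =
                new TimedIterator<>(List.of("r1", "r2").iterator(), latency);
        while (rows.hasNext())
            System.out.println(rows.next());   // prints r1 then r2
    }
}
```

Wrapping at the iterator boundary is attractive because callers are untouched: the timed iterator is a drop-in replacement for the raw one, and the metric captures exactly the storage-read portion of the query.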
(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk
This is an automated email from the ASF dual-hosted git repository.

maedhroz pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit ce963bc991506190b1092b8190e9c42a9bfb6621
Merge: 81e253b166 1d7bae3697
Author: Caleb Rackliffe
AuthorDate: Mon Feb 26 15:28:25 2024 -0600

    Merge branch 'cassandra-5.0' into trunk

    * cassandra-5.0:
      Record latencies for SAI post-filtering reads against local storage

 CHANGES.txt                                        |  1 +
 .../index/sai/metrics/TableQueryMetrics.java       |  4 +++
 .../cassandra/index/sai/plan/QueryController.java  | 14 +
 .../sai/plan/StorageAttachedIndexSearcher.java     | 33 +-
 .../index/sai/metrics/QueryMetricsTest.java        |  6 +++-
 .../cassandra/index/sai/plan/OperationTest.java    |  9 ++
 6 files changed, 33 insertions(+), 34 deletions(-)

diff --cc CHANGES.txt
index eecbc9e50e,4f35041497..f9f28ba23d
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,24 -1,5 +1,25 @@@
-5.0-beta2
+5.1
+ * Refactor cqlshmain global constants (CASSANDRA-19201)
+ * Remove native_transport_port_ssl (CASSANDRA-19397)
+ * Make nodetool reconfigurecms sync by default and add --cancel to be able to cancel ongoing reconfigurations (CASSANDRA-19216)
+ * Expose auth mode in system_views.clients, nodetool clientstats, metrics (CASSANDRA-19366)
+ * Remove sealed_periods and last_sealed_period tables (CASSANDRA-19189)
+ * Improve setup and initialisation of LocalLog/LogSpec (CASSANDRA-19271)
+ * Refactor structure of caching metrics and expose auth cache metrics via JMX (CASSANDRA-17062)
+ * Allow CQL client certificate authentication to work without sending an AUTHENTICATE request (CASSANDRA-18857)
+ * Extend nodetool tpstats and system_views.thread_pools with detailed pool parameters (CASSANDRA-19289)
+ * Remove dependency on Sigar in favor of OSHI (CASSANDRA-16565)
+ * Simplify the bind marker and Term logic (CASSANDRA-18813)
+ * Limit cassandra startup to supported JDKs, allow higher JDKs by setting CASSANDRA_JDK_UNSUPPORTED (CASSANDRA-18688)
+ * Standardize nodetool tablestats formatting of data units (CASSANDRA-19104)
+ * Make nodetool tablestats use number of significant digits for time and average values consistently (CASSANDRA-19015)
+ * Upgrade jackson to 2.15.3 and snakeyaml to 2.1 (CASSANDRA-18875)
+ * Transactional Cluster Metadata [CEP-21] (CASSANDRA-18330)
+ * Add ELAPSED command to cqlsh (CASSANDRA-18861)
+ * Add the ability to disable bulk loading of SSTables (CASSANDRA-18781)
+ * Clean up obsolete functions and simplify cql_version handling in cqlsh (CASSANDRA-18787)
+Merged from 5.0:
+ * Record latencies for SAI post-filtering reads against local storage (CASSANDRA-18940)
  * Fix VectorMemoryIndex#update logic to compare vectors. Fix Index view (CASSANDRA-19168)
  * Deprecate native_transport_port_ssl (CASSANDRA-19392)
  * Update packaging shell includes (CASSANDRA-19283)

- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRA-19442: --- Status: Review In Progress (was: Patch Available) > Update access of ClearSnapshotStrategy > -- > > Key: CASSANDRA-19442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19442 > Project: Cassandra > Issue Type: Task > Components: Analytics Library >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Want to update access of ClearSnapshotStrategy added to Cassandra Analytics > library to allow setting TTL for snapshots created by bulk reader. Currently > the access of the class is package private, for plugging in custom > implementation, we need access outside the package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820878#comment-17820878 ] Francisco Guerrero commented on CASSANDRA-19442: +1 Thanks for the patch > Update access of ClearSnapshotStrategy > -- > > Key: CASSANDRA-19442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19442 > Project: Cassandra > Issue Type: Task > Components: Analytics Library >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Want to update access of ClearSnapshotStrategy added to Cassandra Analytics > library to allow setting TTL for snapshots created by bulk reader. Currently > the access of the class is package private, for plugging in custom > implementation, we need access outside the package.
[jira] [Updated] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saranya Krishnakumar updated CASSANDRA-19442: - Test and Documentation Plan: Tested with unit tests Status: Patch Available (was: Open) > Update access of ClearSnapshotStrategy > -- > > Key: CASSANDRA-19442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19442 > Project: Cassandra > Issue Type: Task > Components: Analytics Library >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Want to update access of ClearSnapshotStrategy added to Cassandra Analytics > library to allow setting TTL for snapshots created by bulk reader. Currently > the access of the class is package private, for plugging in custom > implementation, we need access outside the package.
[jira] [Commented] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820876#comment-17820876 ] Saranya Krishnakumar commented on CASSANDRA-19442: -- Green CI [https://app.circleci.com/pipelines/github/sarankk/cassandra-analytics/52] > Update access of ClearSnapshotStrategy > -- > > Key: CASSANDRA-19442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19442 > Project: Cassandra > Issue Type: Task > Components: Analytics Library >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Want to update access of ClearSnapshotStrategy added to Cassandra Analytics > library to allow setting TTL for snapshots created by bulk reader. Currently > the access of the class is package private, for plugging in custom > implementation, we need access outside the package.
[jira] [Comment Edited] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819816#comment-17819816 ] Caleb Rackliffe edited comment on CASSANDRA-18940 at 2/26/24 9:17 PM: -- [5.0 patch|https://github.com/apache/cassandra/pull/3122] [trunk patch|https://github.com/apache/cassandra/pull/3138] Nothing interesting in the CI results. (Upgrade tests in the attached results are pretty irrelevant, but failing due to an environment-related issue on my side.) was (Author: maedhroz): [5.0 patch|https://github.com/apache/cassandra/pull/3122] [trunk patch|https://github.com/apache/cassandra/pull/3138] > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage.
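The ticket above describes attaching a metrics-recording transformation to the iterator produced by querying local storage. A minimal sketch of that idea follows; the class and interface names here are illustrative assumptions, not the actual Cassandra SAI classes, and the real patch works on Cassandra's row iterators rather than `java.util.Iterator`.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: wrap the iterator returned from local storage so that
// closing it records the elapsed read latency into a metrics sink.
public class RecordingIterator<T> implements Iterator<T>, AutoCloseable {

    /** Assumed stand-in for the table's latency metric sink. */
    public interface LatencyRecorder {
        void record(long latencyNanos);
    }

    private final Iterator<T> delegate;
    private final LatencyRecorder recorder;
    private final long startNanos = System.nanoTime();

    public RecordingIterator(Iterator<T> delegate, LatencyRecorder recorder) {
        this.delegate = delegate;
        this.recorder = recorder;
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }
    @Override public T next() { return delegate.next(); }

    @Override
    public void close() {
        // Record the time between creating and closing the iterator,
        // i.e. the duration of the post-filtering read.
        recorder.record(System.nanoTime() - startNanos);
    }

    /** Consumes a small iterator and returns the recorded latency in nanos. */
    public static long demo() {
        long[] recorded = { -1L };
        try (RecordingIterator<Integer> it =
                 new RecordingIterator<>(List.of(1, 2, 3).iterator(), nanos -> recorded[0] = nanos)) {
            while (it.hasNext()) it.next();
        }
        return recorded[0];
    }

    public static void main(String[] args) {
        System.out.println("recorded latency (ns): " + demo());
    }
}
```

The appeal of this shape is that the wrapping is transparent to consumers of the iterator, so latency is captured without touching the read path's callers.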
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Fix Version/s: 5.0-rc (was: 5.0.x) > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage.
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Attachment: ci_summary.html result_details.tar.gz > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage.
[jira] [Comment Edited] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819816#comment-17819816 ] Caleb Rackliffe edited comment on CASSANDRA-18940 at 2/26/24 9:15 PM: -- [5.0 patch|https://github.com/apache/cassandra/pull/3122] [trunk patch|https://github.com/apache/cassandra/pull/3138] was (Author: maedhroz): [5.0 patch|https://github.com/apache/cassandra/pull/3122] I'll run CI once CASSANDRA-19168 merges... > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > Time Spent: 1.5h > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage.
[jira] [Commented] (CASSANDRA-19404) Unexpected NullPointerException in ANN+WHERE when adding rows in another partition
[ https://issues.apache.org/jira/browse/CASSANDRA-19404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820870#comment-17820870 ] Ekaterina Dimitrova commented on CASSANDRA-19404: - trunk CI results: * junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit - CASSANDRA-18851 * in-jvm tests crash - CASSANDRA-19239 * org.apache.cassandra.audit.AuditLoggerAuthTest - CASSANDRA-19443 I did not re-run the 5.0 CI as the changes to the test suggested during review did not change the test coverage or the patch. I just ensured the test still passes locally. If there is nothing else to be addressed, I can commit this tomorrow. > Unexpected NullPointerException in ANN+WHERE when adding rows in another > partition > -- > > Key: CASSANDRA-19404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19404 > Project: Cassandra > Issue Type: Bug > Components: Feature/Vector Search >Reporter: Stefano Lottini >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > * *Bug observed on the Docker image 5.0-beta1* > * *Bug also observed on latest head of Cassandra repo (as of 2024-02-15)* > * _*(working fine on vsearch branch of datastax/cassandra, commit hash > 80c2f8b9ad5b89efee0645977a5ca53943717c0d)*_ > Summary: A query with _ann + where clause on a map + where clause on the > partition key_ starts erroring once there are other partitions in the table. > There are three SELECT statements in the repro minimal code below - the third > is where the error is triggered. 
> {code:java}
> // reproduced with Dockerized Cassandra 5.0-beta1 on 2024-02-15
> //////////
> // SCHEMA
> //////////
> CREATE TABLE ks.v_table (
>     pk int,
>     row_v vector<float, 2>,
>     metadata map<text, text>,
>     PRIMARY KEY (pk)
> );
> CREATE CUSTOM INDEX v_md
>     ON ks.v_table (entries(metadata))
>     USING 'StorageAttachedIndex';
> CREATE CUSTOM INDEX v_idx
>     ON ks.v_table (row_v)
>     USING 'StorageAttachedIndex';
> //////////
> // SELECT WORKS (empty table)
> //////////
> SELECT * FROM ks.v_table
>     WHERE metadata['map_k'] = 'map_v'
>     AND pk = 0
>     ORDER BY row_v ANN OF [0.1, 0.2]
>     LIMIT 4;
> //
> // ADD ONE ROW
> //
> INSERT INTO ks.v_table (pk, metadata, row_v)
>     VALUES (0, {'map_k': 'map_v'}, [0.11, 0.19]);
> //////////
> // SELECT WORKS (table has queried partition)
> //////////
> SELECT * FROM ks.v_table
>     WHERE metadata['map_k'] = 'map_v'
>     AND pk = 0
>     ORDER BY row_v ANN OF [0.1, 0.2]
>     LIMIT 4;
> //
> // ADD ONE ROW (another partition)
> //
> INSERT INTO ks.v_table (pk, metadata, row_v)
>     VALUES (10, {'map_k': 'map_v'}, [0.11, 0.19]);
> //////////
> // SELECT BREAKS (table gained another partition)
> //////////
> SELECT * FROM ks.v_table
>     WHERE metadata['map_k'] = 'map_v'
>     AND pk = 0
>     ORDER BY row_v ANN OF [0.1, 0.2]
>     LIMIT 4;
> {code}
> The error has this appearance in CQL Console:
> {code:java}
> ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read]
> message="Operation failed - received 0 responses and 1 failures: UNKNOWN from /172.17.0.2:7000"
> info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1, 'error_code_map': {'172.17.0.2': '0x'}}
> {code}
> And the Cassandra logs have this to say:
> {code:java}
> java.lang.NullPointerException: Cannot invoke
> "org.apache.cassandra.index.sai.iterators.KeyRangeIterator.skipTo(org.apache.cassandra.index.sai.utils.PrimaryKey)"
> because "this.nextIterator" is null
> {code}
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Attachment: (was: draft_fix_for_SAI_post-filtering_reads_not_updating_local_table_metrics.patch) > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage. (I've > attached a patch that should apply cleanly to trunk, but there may be a > better way...)
[jira] [Updated] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saranya Krishnakumar updated CASSANDRA-19442: - Change Category: Code Clarity Complexity: Low Hanging Fruit Component/s: Analytics Library Reviewers: Francisco Guerrero, Yifan Cai Assignee: Saranya Krishnakumar Status: Open (was: Triage Needed) Patch: [https://github.com/apache/cassandra-analytics/pull/42]. > Update access of ClearSnapshotStrategy > -- > > Key: CASSANDRA-19442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19442 > Project: Cassandra > Issue Type: Task > Components: Analytics Library >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Want to update access of ClearSnapshotStrategy added to Cassandra Analytics > library to allow setting TTL for snapshots created by bulk reader. Currently > the access of the class is package private, for plugging in custom > implementation, we need access outside the package.
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Description: Once an SAI index finds matches (primary keys), it reads the associated rows and post-filters them to incorporate partial writes, tombstones, etc. However, those reads are not currently updating the local table latency metrics. It should be simple enough to attach a metrics recording transformation to the iterator produced by querying local storage. (was: Once an SAI index finds matches (primary keys), it reads the associated rows and post-filters them to incorporate partial writes, tombstones, etc. However, those reads are not currently updating the local table latency metrics. It should be simple enough to attach a metrics recording transformation to the iterator produced by querying local storage. (I've attached a patch that should apply cleanly to trunk, but there may be a better way...)) > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage.
[jira] [Created] (CASSANDRA-19443) Test Failure: org.apache.cassandra.audit.AuditLoggerAuthTest
Ekaterina Dimitrova created CASSANDRA-19443: --- Summary: Test Failure: org.apache.cassandra.audit.AuditLoggerAuthTest Key: CASSANDRA-19443 URL: https://issues.apache.org/jira/browse/CASSANDRA-19443 Project: Cassandra Issue Type: Bug Reporter: Ekaterina Dimitrova The tests in this class are flaky for current trunk, as seen in this run: [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=AuditLoggerAuthTest-trunk] org.apache.cassandra.audit.AuditLoggerAuthTest.testUNAUTHORIZED_ATTEMPTAuditing-_jdk17 {code:java} junit.framework.AssertionFailedError: expected: but was: at org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) at org.apache.cassandra.audit.AuditLoggerAuthTest.testUNAUTHORIZED_ATTEMPTAuditing(AuditLoggerAuthTest.java:252) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} testCqlLoginAuditing-_jdk17 {code:java} junit.framework.AssertionFailedError: expected: but was: at org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) at org.apache.cassandra.audit.AuditLoggerAuthTest.testCqlLoginAuditing(AuditLoggerAuthTest.java:119) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} testCqlLISTROLESAuditing-_jdk17 {code:java} junit.framework.AssertionFailedError: expected: but was: at org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) at 
org.apache.cassandra.audit.AuditLoggerAuthTest.testCqlLISTROLESAuditing(AuditLoggerAuthTest.java:209) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} testCqlCreateRoleAuditing-_jdk17 {code:java} junit.framework.AssertionFailedError: expected: but was: at org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) at org.apache.cassandra.audit.AuditLoggerAuthTest.createTestRole(AuditLoggerAuthTest.java:424) at org.apache.cassandra.audit.AuditLoggerAuthTest.testCqlCreateRoleAuditing(AuditLoggerAuthTest.java:127) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
[jira] [Updated] (CASSANDRA-19443) Test Failure: org.apache.cassandra.audit.AuditLoggerAuthTest
[ https://issues.apache.org/jira/browse/CASSANDRA-19443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19443: Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Component/s: CI Discovered By: User Report Fix Version/s: 5.x Severity: Normal Status: Open (was: Triage Needed) > Test Failure: org.apache.cassandra.audit.AuditLoggerAuthTest > > > Key: CASSANDRA-19443 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19443 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.x > > > The tests in this class are flaky for current trunk, as seen in this run: > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=AuditLoggerAuthTest-trunk] > org.apache.cassandra.audit.AuditLoggerAuthTest.testUNAUTHORIZED_ATTEMPTAuditing-_jdk17 > {code:java} > junit.framework.AssertionFailedError: expected: but > was: at > org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) > at > org.apache.cassandra.audit.AuditLoggerAuthTest.testUNAUTHORIZED_ATTEMPTAuditing(AuditLoggerAuthTest.java:252) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > testCqlLoginAuditing-_jdk17 > {code:java} > junit.framework.AssertionFailedError: expected: but > was: at > org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) > at > org.apache.cassandra.audit.AuditLoggerAuthTest.testCqlLoginAuditing(AuditLoggerAuthTest.java:119) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > testCqlLISTROLESAuditing-_jdk17 > {code:java} > junit.framework.AssertionFailedError: expected: but > was: at > org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) > at > org.apache.cassandra.audit.AuditLoggerAuthTest.testCqlLISTROLESAuditing(AuditLoggerAuthTest.java:209) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > testCqlCreateRoleAuditing-_jdk17 > {code:java} > junit.framework.AssertionFailedError: expected: but > was: at > org.apache.cassandra.audit.AuditLoggerAuthTest.executeWithCredentials(AuditLoggerAuthTest.java:374) > at > org.apache.cassandra.audit.AuditLoggerAuthTest.createTestRole(AuditLoggerAuthTest.java:424) > at > org.apache.cassandra.audit.AuditLoggerAuthTest.testCqlCreateRoleAuditing(AuditLoggerAuthTest.java:127) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
[jira] [Created] (CASSANDRA-19442) Update access of ClearSnapshotStrategy
Saranya Krishnakumar created CASSANDRA-19442: Summary: Update access of ClearSnapshotStrategy Key: CASSANDRA-19442 URL: https://issues.apache.org/jira/browse/CASSANDRA-19442 Project: Cassandra Issue Type: Task Reporter: Saranya Krishnakumar Want to update access of ClearSnapshotStrategy added to Cassandra Analytics library to allow setting TTL for snapshots created by bulk reader. Currently the access of the class is package private, for plugging in custom implementation, we need access outside the package.
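The change requested in this ticket is an access-modifier widening: a package-private class cannot be subclassed from outside its package, which blocks custom implementations. A minimal sketch of the before/after follows; the class and method names here are illustrative, not the actual Cassandra Analytics API.

```java
// Hypothetical illustration of widening access so external packages can plug
// in a custom snapshot-clearing strategy (e.g. one that applies a TTL).
public class SnapshotAccessDemo {

    // Before the change, a strategy class like this would be declared with no
    // modifier (package-private), so code outside the package could neither
    // reference nor subclass it. Declaring it public removes that barrier.
    public abstract static class ClearSnapshotStrategy {
        public abstract String snapshotTtl();
    }

    /** A custom implementation that a bulk reader outside the package could supply. */
    public static class TtlClearSnapshotStrategy extends ClearSnapshotStrategy {
        private final String ttl;

        public TtlClearSnapshotStrategy(String ttl) { this.ttl = ttl; }

        @Override public String snapshotTtl() { return ttl; }
    }

    /** Demonstrates plugging in the custom strategy and reading back its TTL. */
    public static String demo() {
        ClearSnapshotStrategy strategy = new TtlClearSnapshotStrategy("2d");
        return strategy.snapshotTtl();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```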
[jira] [Updated] (CASSANDRASC-108) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRASC-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saranya Krishnakumar updated CASSANDRASC-108: - Resolution: Not A Problem Status: Resolved (was: Open) > Update access of ClearSnapshotStrategy > -- > > Key: CASSANDRASC-108 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-108 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Want to update access of ClearSnapshotStrategy added to allow setting TTL for > snapshots created by bulk reader. Currently the access of the class is > package private, for plugging in custom implementation, we need access > outside the package.
[jira] [Commented] (CASSANDRASC-108) Update access of ClearSnapshotStrategy
[ https://issues.apache.org/jira/browse/CASSANDRASC-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820867#comment-17820867 ] Saranya Krishnakumar commented on CASSANDRASC-108: -- This JIRA is related to the Cassandra Analytics project, hence closing it here. > Update access of ClearSnapshotStrategy > -- > > Key: CASSANDRASC-108 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-108 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Want to update access of ClearSnapshotStrategy added to allow setting TTL for > snapshots created by bulk reader. Currently the access of the class is > package private, for plugging in custom implementation, we need access > outside the package.
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Status: Ready to Commit (was: Review In Progress) > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: > draft_fix_for_SAI_post-filtering_reads_not_updating_local_table_metrics.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage. (I've > attached a patch that should apply cleanly to trunk, but there may be a > better way...)
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Reviewers: Mike Adamson, Caleb Rackliffe (was: Mike Adamson) Status: Review In Progress (was: Patch Available) > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: > draft_fix_for_SAI_post-filtering_reads_not_updating_local_table_metrics.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage. (I've > attached a patch that should apply cleanly to trunk, but there may be a > better way...)
[jira] [Updated] (CASSANDRA-18940) SAI post-filtering reads don't update local table latency metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18940: Reviewers: Mike Adamson (was: Caleb Rackliffe, Mike Adamson) > SAI post-filtering reads don't update local table latency metrics > - > > Key: CASSANDRA-18940 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18940 > Project: Cassandra > Issue Type: Bug > Components: Feature/2i Index, Feature/SAI, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: > draft_fix_for_SAI_post-filtering_reads_not_updating_local_table_metrics.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Once an SAI index finds matches (primary keys), it reads the associated rows > and post-filters them to incorporate partial writes, tombstones, etc. > However, those reads are not currently updating the local table latency > metrics. It should be simple enough to attach a metrics recording > transformation to the iterator produced by querying local storage. (I've > attached a patch that should apply cleanly to trunk, but there may be a > better way...) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
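The fix sketched in the ticket — attaching a metrics-recording transformation to the iterator produced by querying local storage — can be illustrated generically. This is a hedged sketch with invented names (MetricsRecordingIterator, the LongConsumer recorder); it is not Cassandra's actual Transformation API or the attached patch:

```java
import java.util.Iterator;
import java.util.function.LongConsumer;

// Hypothetical sketch: wrap the iterator produced by a local read so that the
// total iteration time is reported to a latency recorder when the iterator is
// exhausted. LongConsumer stands in for a table's latency metric.
final class MetricsRecordingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final LongConsumer latencyNanos;
    private final long startNanos = System.nanoTime();
    private boolean recorded = false;

    MetricsRecordingIterator(Iterator<T> delegate, LongConsumer latencyNanos) {
        this.delegate = delegate;
        this.latencyNanos = latencyNanos;
    }

    @Override
    public boolean hasNext() {
        boolean more = delegate.hasNext();
        if (!more && !recorded) {          // record once, when the read completes
            recorded = true;
            latencyNanos.accept(System.nanoTime() - startNanos);
        }
        return more;
    }

    @Override
    public T next() {
        return delegate.next();
    }
}
```

The transformation is transparent to the caller: post-filtering code iterates as before, and the latency sample lands in the table metric as a side effect of exhausting the iterator.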
[jira] [Commented] (CASSANDRA-19430) Read repair through Accord needs to only route the read repair through Accord if the range is actually migrated/running on Accord
[ https://issues.apache.org/jira/browse/CASSANDRA-19430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820838#comment-17820838 ] Ariel Weisberg commented on CASSANDRA-19430: This is narrowly scoped to just attempting to send it to the right place. We will also need to address the fact that we could send it to the wrong system and need to retry. BRR is per partition so we at least don't need to break up mutations. > Read repair through Accord needs to only route the read repair through Accord > if the range is actually migrated/running on Accord > - > > Key: CASSANDRA-19430 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19430 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > This is because the read repair will simply fail if Accord doesn't manage > that range. Not only does it need to be routed through Accord but if it races > with topology change it needs to retry and not surface an error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19436) When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes
[ https://issues.apache.org/jira/browse/CASSANDRA-19436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820837#comment-17820837 ] Ariel Weisberg commented on CASSANDRA-19436: Also need to consider materialized views in the future, which make additional mutations downstream. > When transitioning to Accord migration it's not safe to read immediately > using Accord due to concurrent non-serial writes > - > > Key: CASSANDRA-19436 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19436 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > Concurrent writes at the same time that migration starts make it unsafe to > read from Accord because txn recovery will not be deterministic in the > presence of writes not done through Accord. > Adding key migration to non-serial writes could solve this by causing writes > not going through Accord to be rejected at nodes where key migration already > occurred. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19441) Blocking read repair needs to handle racing with Accord topology changes
Ariel Weisberg created CASSANDRA-19441: -- Summary: Blocking read repair needs to handle racing with Accord topology changes Key: CASSANDRA-19441 URL: https://issues.apache.org/jira/browse/CASSANDRA-19441 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Similar to other forms of writes, it's possible for the read repair to end up on the wrong system, and it should be rejected if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14572) Expose all table metrics in virtual table
[ https://issues.apache.org/jira/browse/CASSANDRA-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820831#comment-17820831 ] Maxim Muzafarov commented on CASSANDRA-14572: - These pull requests compare approaches with and without the annotation processor (the diff in the source code lines is not too big). With the annotation processor: https://github.com/apache/cassandra/pull/2958/files +3,646 −419 Without the annotation processor: https://github.com/apache/cassandra/pull/3137/files +3,884 −418 > Expose all table metrics in virtual table > - > > Key: CASSANDRA-14572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14572 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability, Observability/Metrics >Reporter: Chris Lohfink >Assignee: Maxim Muzafarov >Priority: Low > Labels: virtual-tables > Fix For: 5.x > > Attachments: flight_recording_1270017199_13.jfr, keyspayces_group > responses times.png, keyspayces_group summary.png, select keyspaces_group by > string prefix.png, select keyspaces_group compare with wo.png, select > keyspaces_group without value.png, systemv_views.metrics_dropped_message.png, > thread_pools benchmark.png > > Time Spent: 2h 40m > Remaining Estimate: 0h > > While we want a number of virtual tables to display data in a way that's great > and intuitive, like in nodetool, there is also much value in being able to expose > the metrics we have for tooling via CQL instead of JMX. This is more for > tooling and ad hoc advanced users who know exactly what they are looking for. > *Schema:* > Initial idea is to expose data via {{((keyspace, table), metric)}} with a > column for each metric value. Could also use a Map or UDT instead of the > column-based approach, which can be a bit more specific to each metric type. To that end > there can be a {{metric_type}} column and then a UDT for each metric type > filled in, or a single value with more of a Map style. 
I am > proposing the column type though, as with {{ALLOW FILTERING}} it does allow > more extensive query capabilities. > *Implementations:* > * Use reflection to grab all the metrics from TableMetrics (see: > CASSANDRA-7622 impl). This is easiest and least abrasive towards new metric > implementors... but it's reflection and kind of a bad idea. > * Add a hook in TableMetrics to register with this virtual table when > registering > * Pull from the CassandraMetrics registry (either reporter or iterate > through metrics query on read of virtual table) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
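The last implementation option — pulling from the metrics registry and iterating through it on each read of the virtual table — can be sketched with a plain Map standing in for the real registry. The "keyspace.table.metric" naming scheme and the method names here are assumptions for illustration, not the actual CassandraMetrics layout:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "iterate the registry on read" option: a plain
// Map stands in for the real metrics registry, and each registry entry named
// "keyspace.table.metric" becomes one row keyed by ((keyspace, table), metric).
final class MetricsVirtualTableSketch {
    static Map<String, Double> rowsFor(Map<String, Double> registry,
                                       String keyspace, String table) {
        String prefix = keyspace + "." + table + ".";
        Map<String, Double> rows = new TreeMap<>();   // sorted, like a clustering key
        for (Map.Entry<String, Double> e : registry.entrySet())
            if (e.getKey().startsWith(prefix))
                rows.put(e.getKey().substring(prefix.length()), e.getValue()); // metric name -> value
        return rows;
    }
}
```

The appeal of this option is that it needs no hook in TableMetrics: the virtual table stays a read-only projection over whatever the registry already holds.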
[jira] [Comment Edited] (CASSANDRA-19431) Mutations need to split Accord/non-Accord mutations based on whether migration is completed
[ https://issues.apache.org/jira/browse/CASSANDRA-19431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820827#comment-17820827 ] Ariel Weisberg edited comment on CASSANDRA-19431 at 2/26/24 7:34 PM: - This one is narrowly scoped to just splitting and routing the mutations which can be done today without the linked issues, but still won't be correct until it can also detect mutations sent to the wrong place and then retry them without generating an error. was (Author: aweisberg): This one is narrowly scoped to just splitting and routing the mutations which can be done today without the others, but still won't be correct until it can also detect mutations sent to the wrong place and then retry them without generating an error. > Mutations need to split Accord/non-Accord mutations based on whether > migration is completed > --- > > Key: CASSANDRA-19431 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19431 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > If we don't do this then requests will fail if they span Accord and > non-Accord keys and tables. This breaks unlogged batches for example. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19431) Mutations need to split Accord/non-Accord mutations based on whether migration is completed
[ https://issues.apache.org/jira/browse/CASSANDRA-19431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820827#comment-17820827 ] Ariel Weisberg commented on CASSANDRA-19431: This one is narrowly scoped to just splitting and routing the mutations which can be done today without the others, but still won't be correct until it can also detect mutations sent to the wrong place and then retry them without generating an error. > Mutations need to split Accord/non-Accord mutations based on whether > migration is completed > --- > > Key: CASSANDRA-19431 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19431 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > If we don't do this then requests will fail if they span Accord and > non-Accord keys and tables. This breaks unlogged batches for example. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch cassandra-14572-walker created (now 03a90b40a2)
This is an automated email from the ASF dual-hosted git repository. mmuzaf pushed a change to branch cassandra-14572-walker in repository https://gitbox.apache.org/repos/asf/cassandra.git at 03a90b40a2 CASSANDRA-14572 Expose all table metrics in virtual tables No new revisions were added by this update. - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19436) When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes
[ https://issues.apache.org/jira/browse/CASSANDRA-19436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820826#comment-17820826 ] Ariel Weisberg commented on CASSANDRA-19436: This issue covers detecting and generating the appropriate refusal to accept a read/write on the wrong system. Accord and Paxos already do this detection, but non-SERIAL operations do not. The linked issues cover this, but also the loop in read/write that receives the error and automatically retries instead of failing the query. > When transitioning to Accord migration it's not safe to read immediately > using Accord due to concurrent non-serial writes > - > > Key: CASSANDRA-19436 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19436 > Project: Cassandra > Issue Type: Bug >Reporter: Ariel Weisberg >Priority: Normal > > Concurrent writes at the same time that migration starts make it unsafe to > read from Accord because txn recovery will not be deterministic in the > presence of writes not done through Accord. > Adding key migration to non-serial writes could solve this by causing writes > not going through Accord to be rejected at nodes where key migration already > occurred. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19440) Non-serial writes can race with Accord topology changes
Ariel Weisberg created CASSANDRA-19440: -- Summary: Non-serial writes can race with Accord topology changes Key: CASSANDRA-19440 URL: https://issues.apache.org/jira/browse/CASSANDRA-19440 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Accord and Paxos handle these, but non-SERIAL writes don't check for this condition, so they can't retry the portions of the write that failed on the correct system until the entire write succeeds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820823#comment-17820823 ] Stefan Miklosovic commented on CASSANDRA-19429: --- yeah hard to believe ... I will prepare 4.0 patch and [~dipiets] can run it before / after too. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. 
> Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
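The kind of contention the ticket describes — a hot read path funneling every caller through a lock-guarded accessor — can be illustrated with a stand-in (this is not the actual InstrumentingCache or SSTableReader code): a synchronized getter serializes all readers on one monitor, while a LongAdder-backed read proceeds without acquiring a lock.

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative stand-in, not Cassandra's code: every call to the synchronized
// accessor acquires the object's monitor, so concurrent readers serialize.
final class ContendedCounter {
    private long value;
    synchronized void add(long n) { value += n; }
    synchronized long getCapacityStyle() { return value; } // each read takes the lock
}

// The lock-free variant: reads sum per-thread cells with no monitor acquisition,
// so a hot read path no longer throttles CPU utilization under load.
final class LockFreeCounter {
    private final LongAdder value = new LongAdder();
    void add(long n) { value.add(n); }
    long sizeStyle() { return value.sum(); }
}
```

Both report the same value; only the synchronization cost on the read path differs, which is where the profiled 1.9M lock acquires per minute came from.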
[jira] [Created] (CASSANDRA-19439) Non-serial reads need to handle racing with Accord topology changes
Ariel Weisberg created CASSANDRA-19439: -- Summary: Non-serial reads need to handle racing with Accord topology changes Key: CASSANDRA-19439 URL: https://issues.apache.org/jira/browse/CASSANDRA-19439 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg A key or range read could end up being sent to Accord when it's not managed by Accord, and we might not find out until the execution epoch is known. In reality I think this already throws an exception in Accord for a key; we just need to propagate and handle the exception and retry with the new topology until we can complete the read. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
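The propagate-and-retry behavior described in CASSANDRA-19439 can be sketched as a small loop. The exception and method names here are invented for illustration; the actual exception Accord throws and the topology-refresh mechanism are not specified in the ticket:

```java
// Hypothetical sketch of the retry loop: if the system rejects a read because
// the key is not managed in the execution epoch, refresh the topology and
// retry instead of surfacing the error to the client.
final class RetryOnTopologyChange {
    static class WrongSystemException extends RuntimeException {}

    interface Read<T> { T execute() throws WrongSystemException; }

    static <T> T readWithRetry(Read<T> read, Runnable refreshTopology, int maxAttempts) {
        WrongSystemException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return read.execute();
            } catch (WrongSystemException e) {
                last = e;                 // key not managed here in this epoch
                refreshTopology.run();    // pick up the new topology, then retry
            }
        }
        throw last != null ? last : new WrongSystemException();
    }
}
```

The bound on attempts keeps a persistent mismatch from looping forever; a real implementation would also cap total elapsed time against the query timeout.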
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820822#comment-17820822 ] Brandon Williams commented on CASSANDRA-19429: -- A 2-3x gain sounds so good I'm having a hard time believing we've left it on the table for years, at least in the case of 4.0. Speaking of which, it would be good to check performance there too since it has the same code and everything should line up. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. 
> Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19438) Accord barriers need to handle racing with topology changes
Ariel Weisberg created CASSANDRA-19438: -- Summary: Accord barriers need to handle racing with topology changes Key: CASSANDRA-19438 URL: https://issues.apache.org/jira/browse/CASSANDRA-19438 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Topology changes can result in the ranges sent to Accord including things not managed by Accord. It might be enough to have the range barriers automatically remove the unsupported subranges, since that is likely all the caller needs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19437) Non-serial reads/range reads need to be done through Accord for Accord to support async apply/commit
Ariel Weisberg created CASSANDRA-19437: -- Summary: Non-serial reads/range reads need to be done through Accord for Accord to support async apply/commit Key: CASSANDRA-19437 URL: https://issues.apache.org/jira/browse/CASSANDRA-19437 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Currently they haven't been implemented. We have a path forward for it using ephemeral reads. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18851) Test failure: junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit
[ https://issues.apache.org/jira/browse/CASSANDRA-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820820#comment-17820820 ] Ekaterina Dimitrova commented on CASSANDRA-18851: - Seen here again: https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2656/workflows/d2a7dd60-7d1a-4781-b59e-5d5756034c83/jobs/57099/tests#failed-test-0 > Test failure: > junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit > --- > > Key: CASSANDRA-18851 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18851 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Berenguer Blasi >Priority: Normal > Fix For: 4.1.x, 5.x > > > See CASSANDRA-18707 Where this test is > [proven|https://issues.apache.org/jira/browse/CASSANDRA-18707?focusedCommentId=17761803=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17761803] > flaky > failure in testLocalSerialLocalCommit: > {noformat} > junit.framework.AssertionFailedError: numWritten: 2 < 3 > at > org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialCommit(CASMultiDCTest.java:111) > at > org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit(CASMultiDCTest.java:121) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19436) When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes
Ariel Weisberg created CASSANDRA-19436: -- Summary: When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes Key: CASSANDRA-19436 URL: https://issues.apache.org/jira/browse/CASSANDRA-19436 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Concurrent writes at the same time that migration starts make it unsafe to read from Accord because txn recovery will not be deterministic in the presence of writes not done through Accord. Adding key migration to non-serial writes could solve this by causing writes not going through Accord to be rejected at nodes where key migration already occurred. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19435) Hint delivery doesn't write through Accord
Ariel Weisberg created CASSANDRA-19435: -- Summary: Hint delivery doesn't write through Accord Key: CASSANDRA-19435 URL: https://issues.apache.org/jira/browse/CASSANDRA-19435 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Hint delivery doesn't write through Accord which would make txn recovery non-deterministic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19434) Batch log doesn't write through Accord during Accord migration
Ariel Weisberg created CASSANDRA-19434: -- Summary: Batch log doesn't write through Accord during Accord migration Key: CASSANDRA-19434 URL: https://issues.apache.org/jira/browse/CASSANDRA-19434 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg This can result in writes not through Accord occurring, which makes txn recovery non-deterministic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19412) delete useless collection:backPressureHosts in the sendToHintedReplicas to improve write performance
[ https://issues.apache.org/jira/browse/CASSANDRA-19412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19412: - Fix Version/s: 4.0.13 (was: 4.0.12) > delete useless collection:backPressureHosts in the sendToHintedReplicas to > improve write performance > > > Key: CASSANDRA-19412 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19412 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Low > Fix For: 4.0.13, 4.1.5, 5.0-beta2, 5.1 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Every normal write request will go through this > method({_}*sendToHintedReplicas*{_}). However, the list:backPressureHosts in > the method has never been used functionally. > The _*backpressure*_ was introduced by: > {code:java} > Support optional backpressure strategies at the coordinator > patch by Sergio Bossa; reviewed by Stefania Alborghetti for CASSANDRA-9318 > d43b9ce5 Sergio Bossa on 2016/9/19 at 10:42 AM {code} > {code:java} > public static void sendToHintedEndpoints(final Mutation mutation, > Iterable<InetAddress> targets, > > AbstractWriteResponseHandler<IMutation> responseHandler, > String localDataCenter, > Stage stage) > throws OverloadedException > { > int targetsSize = Iterables.size(targets); > // this dc replicas: > Collection<InetAddress> localDc = null; > // extra-datacenter replicas, grouped by dc > Map<String, Collection<InetAddress>> dcGroups = null; > // only need to create a Message for non-local writes > MessageOut<Mutation> message = null; > boolean insertLocal = false; > ArrayList<InetAddress> endpointsToHint = null; > List<InetAddress> backPressureHosts = null; > for (InetAddress destination : targets) > { > checkHintOverload(destination); > if (FailureDetector.instance.isAlive(destination)) > { > if (canDoLocalRequest(destination)) > { > insertLocal = true; > } > else > { > // belongs on a different server > if (message == null) > message = mutation.createMessage(); > String dc = 
DatabaseDescriptor.getEndpointSnitch().getDatacenter(destination); > // direct writes to local DC or old Cassandra versions > // (1.1 knows how to forward old-style String message > IDs; updated to int in 2.0) > if (localDataCenter.equals(dc)) > { > if (localDc == null) > localDc = new ArrayList<>(targetsSize); > localDc.add(destination); > } > else > { > Collection<InetAddress> messages = (dcGroups != null) > ? dcGroups.get(dc) : null; > if (messages == null) > { > messages = new ArrayList<>(3); // most DCs will > have <= 3 replicas > if (dcGroups == null) > dcGroups = new HashMap<>(); > dcGroups.put(dc, messages); > } > messages.add(destination); > } > if (backPressureHosts == null) > backPressureHosts = new ArrayList<>(targetsSize); > backPressureHosts.add(destination); > } > } > else > { > if (shouldHint(destination)) > { > if (endpointsToHint == null) > endpointsToHint = new ArrayList<>(targetsSize); > endpointsToHint.add(destination); > } > } > } > if (backPressureHosts != null) > MessagingService.instance().applyBackPressure(backPressureHosts, > responseHandler.currentTimeout()); > if (endpointsToHint != null) > submitHint(mutation, endpointsToHint, responseHandler); > if (insertLocal) > performLocally(stage, Optional.of(mutation), mutation::apply, > responseHandler); > if (localDc != null) > { > for (InetAddress destination : localDc) >
[jira] [Updated] (CASSANDRA-19120) local consistencies may get timeout if blocking read repair is sending the read repair mutation to other DC
[ https://issues.apache.org/jira/browse/CASSANDRA-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19120: - Fix Version/s: 4.0.13 (was: 4.0.12) > local consistencies may get timeout if blocking read repair is sending the > read repair mutation to other DC > > > Key: CASSANDRA-19120 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19120 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Runtian Liu >Assignee: Runtian Liu >Priority: Normal > Fix For: 4.0.13, 4.1.5, 5.0-beta2, 5.1 > > Attachments: image-2023-11-29-15-26-08-056.png, signature.asc > > Time Spent: 20m > Remaining Estimate: 0h > > For a two-DC cluster setup: when a new node is being added to DC1, blocking > read repair triggered by local_quorum in DC1 will be required to send the read > repair mutation to an extra node(1)(2). The selector for read > repair may select *ANY* node that has not been contacted before(3) instead of > selecting the DC1 nodes. If a node from DC2 is selected, this will cause 100% > timeout because of the bug described below: > When we initialize the latch(4) for blocking read repair, the shouldBlockOn > function will only return true for local nodes(5), and the blockFor value will > be reduced if a local node doesn't require repair(6). The blockFor is the same > as the number of read repair mutations sent out. But when the coordinator node > receives the response from the target nodes, the latch only counts down for > nodes in the same DC(7). The latch will wait until timeout and the read request > will time out. > This can be reproduced with a constant load on a 3 + 3 cluster when > adding a node, if you have some way to trigger blocking read repair (maybe by > adding load using the stress tool) and you use local_quorum consistency with a > constant read-after-write load in the same DC where you are adding the node. 
You > will see read timeout issues from time to time because of the bug described > above. > > I think for read repair, when selecting the extra node to do repair, we should > prefer local nodes over nodes from the other DC. Also, we need to fix the > latch part so that even if we send the mutation to nodes in the other DC, we > don't get a timeout. > (1)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L455] > (2)[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L183] > (3)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L458] > (4)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L96] > (5)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L71] > (6)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L88] > (7)[https://github.com/apache/cassandra/blob/cassandra-4.0.11/src/java/org/apache/cassandra/service/reads/repair/BlockingPartitionRepair.java#L113] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
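The latch mismatch described in CASSANDRA-19120 can be modeled in a few lines. This is a simplified model, not the real BlockingPartitionRepair: the latch is sized by the number of repair mutations sent, but an ack only counts down when the responder is in the coordinator's DC, so a repair mutation sent to the other DC guarantees the latch never reaches zero:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal model of the mismatch: blockFor (latch size) counts every repair
// mutation sent, but the ack handler only counts down same-DC responders.
final class RepairLatchModel {
    final CountDownLatch latch;
    final String localDc;

    RepairLatchModel(int mutationsSent, String localDc) {
        this.latch = new CountDownLatch(mutationsSent);
        this.localDc = localDc;
    }

    void onAck(String responderDc) {
        if (localDc.equals(responderDc))   // remote-DC acks are silently dropped
            latch.countDown();
    }

    boolean awaitMillis(long ms) {
        try {
            return latch.await(ms, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With two mutations sent (one local, one to DC2), both acks arrive but only one counts down, so the await times out: exactly the 100% timeout the ticket reports.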
[jira] [Updated] (CASSANDRA-19416) fix "if" condition for mx4j tool in cassandra-env.sh
[ https://issues.apache.org/jira/browse/CASSANDRA-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19416: - Fix Version/s: 4.0.13 (was: 4.0.12) > fix "if" condition for mx4j tool in cassandra-env.sh > > Key: CASSANDRA-19416 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19416 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools > Reporter: Stefan Miklosovic > Assignee: Stefan Miklosovic > Priority: Normal > Fix For: 4.0.13, 4.1.5, 5.0-beta2, 5.1 > Time Spent: 20m > Remaining Estimate: 0h
>
> There is this in cassandra-env.sh
> {code}
> if [[ "$MX4J_ADDRESS" == \-Dmx4jaddress* ]]; then
> {code}
> (similar for port)
> This is wrong for the /bin/sh shell (our shebang in bin/cassandra) and does not work; it probably works only in bash, because /bin/sh understands neither "[[" nor "==".
> The reason this was never detected so far is that the logic is never reached when MX4J_ADDRESS and/or MX4J_PORT are commented out a couple of lines above.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19422) Fix Git repository links
[ https://issues.apache.org/jira/browse/CASSANDRA-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19422: - Fix Version/s: 4.0.13 (was: 4.0.12) > Fix Git repository links > > > Key: CASSANDRA-19422 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19422 > Project: Cassandra > Issue Type: Bug > Components: Build, CI >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 3.0.30, 3.11.17, 4.0.13, 4.1.5, 5.0-beta2, 5.1 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > I am creating an issue based on this PR (1) > (1) https://github.com/apache/cassandra/pull/3120 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18824) Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused missing replica
[ https://issues.apache.org/jira/browse/CASSANDRA-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-18824: - Fix Version/s: 4.0.13 (was: 4.0.12) > Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused missing replica > > Key: CASSANDRA-18824 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18824 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission > Reporter: Szymon Miezal > Assignee: Szymon Miezal > Priority: Normal > Fix For: 3.0.30, 3.11.17, 4.0.13, 4.1.4, 5.0-rc, 5.1 > Time Spent: 2h 40m > Remaining Estimate: 0h
>
> Node decommission triggers data transfer to other nodes. While this transfer is in progress, receiving nodes temporarily hold token ranges in a pending state. However, the cleanup process currently doesn't consider these pending ranges when calculating token ownership. As a consequence, data that is already stored in sstables gets inadvertently cleaned up.
> STR:
> * Create two node cluster
> * Create keyspace with RF=1
> * Insert sample data (assert data is available when querying both nodes)
> * Start decommission process of node 1
> * Start running cleanup in a loop on node 2 until decommission on node 1 finishes
> * Verify all rows are in the cluster - it will fail, as the previous step removed some of the rows
> It seems that the cleanup process does not take the pending ranges into account; it uses only the local ranges - [https://github.com/apache/cassandra/blob/caad2f24f95b494d05c6b5d86a8d25fbee58d7c2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L466].
> There are two solutions to the problem. One would be to change the cleanup process so that it starts taking pending ranges into account. Even though it might sound tempting at first, it would require involved changes and a lot of testing effort.
> Alternatively, we could interrupt/prevent the cleanup process from running when any pending range on a node is detected. That sounds like a reasonable alternative and something that is relatively easy to implement.
> The bug has already been fixed in 4.x with CASSANDRA-16418; the goal of this ticket is to backport it to 3.x.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19336) Repair causes out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-19336: Fix Version/s: 4.0.13 (was: 4.0.x) > Repair causes out of memory > --- > > Key: CASSANDRA-19336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19336 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0.13, 4.1.5, 5.0-beta2, 5.1 > > Time Spent: 40m > Remaining Estimate: 0h > > CASSANDRA-14096 introduced {{repair_session_space}} as a limit for the memory > usage for Merkle tree calculations during repairs. This limit is applied to > the set of Merkle trees built for a received validation request > ({{{}VALIDATION_REQ{}}}), divided by the replication factor so as not to > overwhelm the repair coordinator, who will have requested RF sets of Merkle > trees. That way the repair coordinator should only use > {{repair_session_space}} for the RF Merkle trees. > However, a repair session without {{{}-pr-{}}}/{{{}-partitioner-range{}}} > will send RF*RF validation requests, because the repair coordinator node has > RF-1 replicas and is also the replica of RF-1 nodes. Since all the requests > are sent at the same time, at some point the repair coordinator can have up > to RF*{{{}repair_session_space{}}} worth of Merkle trees if none of the > validation responses is fully processed before the last response arrives. > Even worse, if the cluster uses virtual nodes, many nodes can be replicas of > the repair coordinator, and some nodes can be replicas of multiple token > ranges. It would mean that the repair coordinator can send more than RF or > RF*RF simultaneous validation requests. > For example, in an 11-node cluster with RF=3 and 256 tokens, we have seen a > repair session involving 44 groups of ranges to be repaired. 
This produces > 44*3=132 validation requests contacting all the nodes in the cluster. When > the responses for all these requests start to arrive to the coordinator, each > containing up to {{repair_session_space}}/3 of Merkle trees, they accumulate > quicker than they are consumed, greatly exceeding {{repair_session_space}} > and OOMing the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
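The worst-case arithmetic from the description can be checked in shell. The 11-node / RF=3 / 256-token figures are the ones quoted above; the concrete `repair_session_space` value below is an assumed example, not a recommended setting.

```shell
#!/bin/sh
# 44 groups of ranges, RF=3 -> 44*3 = 132 simultaneous validation
# requests, each response carrying up to repair_session_space/RF of
# Merkle trees, per the description.
GROUPS=44
RF=3
REQUESTS=$((GROUPS * RF))
echo "validation requests: $REQUESTS"

# If no response is consumed before the last one arrives, the
# coordinator can hold (requests * repair_session_space / RF) of trees.
# Assumed example value, in MiB:
REPAIR_SESSION_SPACE_MIB=256
WORST_CASE_MIB=$((REQUESTS * REPAIR_SESSION_SPACE_MIB / RF))
echo "worst-case Merkle tree memory: ${WORST_CASE_MIB} MiB"
```

With these numbers the bound is 44 times the configured limit (11264 MiB versus 256 MiB), which is the accumulation the ticket describes OOMing the coordinator.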
[jira] [Comment Edited] (CASSANDRA-19336) Repair causes out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820818#comment-17820818 ] Jeremiah Jordan edited comment on CASSANDRA-19336 at 2/26/24 6:58 PM: -- You are correct. This was not included in 4.0.12/4.1.5. Looks like the "latest versions" in JIRA were not updated after the last release. was (Author: jjordan): You are correct. This was not included in 4.0.12/4.1.5. Looks like the "latest versions" were not updated after the last release. > Repair causes out of memory > --- > > Key: CASSANDRA-19336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19336 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0.x, 4.1.5, 5.0-beta2, 5.1 > > Time Spent: 40m > Remaining Estimate: 0h > > CASSANDRA-14096 introduced {{repair_session_space}} as a limit for the memory > usage for Merkle tree calculations during repairs. This limit is applied to > the set of Merkle trees built for a received validation request > ({{{}VALIDATION_REQ{}}}), divided by the replication factor so as not to > overwhelm the repair coordinator, who will have requested RF sets of Merkle > trees. That way the repair coordinator should only use > {{repair_session_space}} for the RF Merkle trees. > However, a repair session without {{{}-pr-{}}}/{{{}-partitioner-range{}}} > will send RF*RF validation requests, because the repair coordinator node has > RF-1 replicas and is also the replica of RF-1 nodes. Since all the requests > are sent at the same time, at some point the repair coordinator can have up > to RF*{{{}repair_session_space{}}} worth of Merkle trees if none of the > validation responses is fully processed before the last response arrives. > Even worse, if the cluster uses virtual nodes, many nodes can be replicas of > the repair coordinator, and some nodes can be replicas of multiple token > ranges. 
It would mean that the repair coordinator can send more than RF or > RF*RF simultaneous validation requests. > For example, in an 11-node cluster with RF=3 and 256 tokens, we have seen a > repair session involving 44 groups of ranges to be repaired. This produces > 44*3=132 validation requests contacting all the nodes in the cluster. When > the responses for all these requests start to arrive to the coordinator, each > containing up to {{repair_session_space}}/3 of Merkle trees, they accumulate > quicker than they are consumed, greatly exceeding {{repair_session_space}} > and OOMing the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19336) Repair causes out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-19336: Fix Version/s: 4.0.x 4.1.5 (was: 4.0.1) > Repair causes out of memory > --- > > Key: CASSANDRA-19336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19336 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0.x, 4.1.5, 5.0-beta2, 5.1 > > Time Spent: 40m > Remaining Estimate: 0h > > CASSANDRA-14096 introduced {{repair_session_space}} as a limit for the memory > usage for Merkle tree calculations during repairs. This limit is applied to > the set of Merkle trees built for a received validation request > ({{{}VALIDATION_REQ{}}}), divided by the replication factor so as not to > overwhelm the repair coordinator, who will have requested RF sets of Merkle > trees. That way the repair coordinator should only use > {{repair_session_space}} for the RF Merkle trees. > However, a repair session without {{{}-pr-{}}}/{{{}-partitioner-range{}}} > will send RF*RF validation requests, because the repair coordinator node has > RF-1 replicas and is also the replica of RF-1 nodes. Since all the requests > are sent at the same time, at some point the repair coordinator can have up > to RF*{{{}repair_session_space{}}} worth of Merkle trees if none of the > validation responses is fully processed before the last response arrives. > Even worse, if the cluster uses virtual nodes, many nodes can be replicas of > the repair coordinator, and some nodes can be replicas of multiple token > ranges. It would mean that the repair coordinator can send more than RF or > RF*RF simultaneous validation requests. > For example, in an 11-node cluster with RF=3 and 256 tokens, we have seen a > repair session involving 44 groups of ranges to be repaired. 
This produces > 44*3=132 validation requests contacting all the nodes in the cluster. When > the responses for all these requests start to arrive to the coordinator, each > containing up to {{repair_session_space}}/3 of Merkle trees, they accumulate > quicker than they are consumed, greatly exceeding {{repair_session_space}} > and OOMing the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19336) Repair causes out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820818#comment-17820818 ] Jeremiah Jordan commented on CASSANDRA-19336: - You are correct. This was not included in 4.0.12/4.1.5. Looks like the "latest versions" were not updated after the last release. > Repair causes out of memory > --- > > Key: CASSANDRA-19336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19336 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0.x, 4.1.5, 5.0-beta2, 5.1 > > Time Spent: 40m > Remaining Estimate: 0h > > CASSANDRA-14096 introduced {{repair_session_space}} as a limit for the memory > usage for Merkle tree calculations during repairs. This limit is applied to > the set of Merkle trees built for a received validation request > ({{{}VALIDATION_REQ{}}}), divided by the replication factor so as not to > overwhelm the repair coordinator, who will have requested RF sets of Merkle > trees. That way the repair coordinator should only use > {{repair_session_space}} for the RF Merkle trees. > However, a repair session without {{{}-pr-{}}}/{{{}-partitioner-range{}}} > will send RF*RF validation requests, because the repair coordinator node has > RF-1 replicas and is also the replica of RF-1 nodes. Since all the requests > are sent at the same time, at some point the repair coordinator can have up > to RF*{{{}repair_session_space{}}} worth of Merkle trees if none of the > validation responses is fully processed before the last response arrives. > Even worse, if the cluster uses virtual nodes, many nodes can be replicas of > the repair coordinator, and some nodes can be replicas of multiple token > ranges. It would mean that the repair coordinator can send more than RF or > RF*RF simultaneous validation requests. 
> For example, in an 11-node cluster with RF=3 and 256 tokens, we have seen a > repair session involving 44 groups of ranges to be repaired. This produces > 44*3=132 validation requests contacting all the nodes in the cluster. When > the responses for all these requests start to arrive to the coordinator, each > containing up to {{repair_session_space}}/3 of Merkle trees, they accumulate > quicker than they are consumed, greatly exceeding {{repair_session_space}} > and OOMing the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19336) Repair causes out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-19336: Fix Version/s: 4.0.1 (was: 4.0.12) (was: 4.1.4) > Repair causes out of memory > --- > > Key: CASSANDRA-19336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19336 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0.1, 5.0-beta2, 5.1 > > Time Spent: 40m > Remaining Estimate: 0h > > CASSANDRA-14096 introduced {{repair_session_space}} as a limit for the memory > usage for Merkle tree calculations during repairs. This limit is applied to > the set of Merkle trees built for a received validation request > ({{{}VALIDATION_REQ{}}}), divided by the replication factor so as not to > overwhelm the repair coordinator, who will have requested RF sets of Merkle > trees. That way the repair coordinator should only use > {{repair_session_space}} for the RF Merkle trees. > However, a repair session without {{{}-pr-{}}}/{{{}-partitioner-range{}}} > will send RF*RF validation requests, because the repair coordinator node has > RF-1 replicas and is also the replica of RF-1 nodes. Since all the requests > are sent at the same time, at some point the repair coordinator can have up > to RF*{{{}repair_session_space{}}} worth of Merkle trees if none of the > validation responses is fully processed before the last response arrives. > Even worse, if the cluster uses virtual nodes, many nodes can be replicas of > the repair coordinator, and some nodes can be replicas of multiple token > ranges. It would mean that the repair coordinator can send more than RF or > RF*RF simultaneous validation requests. > For example, in an 11-node cluster with RF=3 and 256 tokens, we have seen a > repair session involving 44 groups of ranges to be repaired. 
This produces > 44*3=132 validation requests contacting all the nodes in the cluster. When > the responses for all these requests start to arrive to the coordinator, each > containing up to {{repair_session_space}}/3 of Merkle trees, they accumulate > quicker than they are consumed, greatly exceeding {{repair_session_space}} > and OOMing the node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820812#comment-17820812 ] Dipietro Salvatore edited comment on CASSANDRA-19429 at 2/26/24 6:52 PM: - My performance benchmark results look good to me: r8g.24xl: 458k op/s (2.72x) r7i.24xl: 292k op/s (1.9x) Tested with commands: {code:java} cd git clone https://github.com/instaclustr/cassandra.git cassandra-instaclustr cd cassandra-instaclustr git checkout CASSANDRA-19429-4.1 git log -10 --oneline CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt{code} was (Author: JIRAUSER304377): My performance benchmark results look good to me: r8g.24xl: 458k op/s (2.72x) r7i.24xl: 292k op/s (1.9x) Tested with commands: {code:java} cd git clone https://github.com/instaclustr/cassandra.git cassandra-instaclustr cd cassandra-instaclustr git checkout CASSANDRA-19429-4.1 git log -10 --oneline CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact keyspace1 && sleep 30s
&& tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt{code} > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. 
> Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820816#comment-17820816 ] Dipietro Salvatore commented on CASSANDRA-19429: Got a new profile with the patch; I don't see those locks anymore, and the overall number dropped to 200. Pretty amazing! !Screenshot 2024-02-26 at 10.27.10.png! > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable > Reporter: Dipietro Salvatore > Assignee: Dipietro Salvatore > Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. 
> Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dipietro Salvatore updated CASSANDRA-19429: --- Attachment: Screenshot 2024-02-26 at 10.27.10.png > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. 
> Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820814#comment-17820814 ] Stefan Miklosovic commented on CASSANDRA-19429: --- Great, so what are the next steps? [~brandon.williams] I think that if we go to call DD in MBeans, this should not stay only in 4.0 and 4.1; we should copy this behavior to newer branches too. That would make it a little bit more involved. I am probably fine with leaving the MBean bits out of 4.0 / 4.1, but it is questionable whether we are all OK with that ... > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable > Reporter: Dipietro Salvatore > Assignee: Dipietro Salvatore > Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. 
> Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820812#comment-17820812 ] Dipietro Salvatore commented on CASSANDRA-19429: My performance benchmark results look good to me: r8g.24xl: 458k op/s (2.72x) r7i.24xl: 292k op/s (1.9x) Tested with commands: {code:java} cd git clone https://github.com/instaclustr/cassandra.git cassandra-instaclustr cd cassandra-instaclustr git checkout CASSANDRA-19429-4.1 git log -10 --oneline CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt{code} > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable > Reporter: Dipietro Salvatore > Assignee: Dipietro Salvatore > Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). 
Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
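The change discussed in this ticket boils down to replacing a lock-guarded capacity getter with a lock-free size read on the hot read path. A minimal sketch of the contention pattern, with illustrative class and field names rather than Cassandra's actual `InstrumentingCache` code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative cache wrapper showing why getCapacity() contends:
// every reader serializes on the same monitor, while size() reads a
// lock-free counter and scales with the number of cores.
class InstrumentedCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final LongAdder entries = new LongAdder();
    private long capacity;

    InstrumentedCache(long capacity) { this.capacity = capacity; }

    // Lock-guarded: the hot spot in the lock profile above.
    synchronized long getCapacity() { return capacity; }

    // Lock-free: sufficient when the caller only needs the entry count.
    long size() { return entries.sum(); }

    void put(K key, V value) {
        if (map.put(key, value) == null)
            entries.increment();
    }

    V get(K key) { return map.get(key); }
}
```

With 384+ stress threads all hitting the read path, every `synchronized` read becomes a serialization point, which is consistent with the under-50% CPU utilization reported above.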
(cassandra-website) branch asf-staging updated (b9eaa9ece -> e5f98fc3b)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git discard b9eaa9ece generate docs for fd550e9c new e5f98fc3b generate docs for fd550e9c This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (b9eaa9ece) \ N -- N -- N refs/heads/asf-staging (e5f98fc3b) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: content/search-index.js | 2 +- site-ui/build/ui-bundle.zip | Bin 4883646 -> 4883646 bytes 2 files changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820793#comment-17820793 ] Dipietro Salvatore commented on CASSANDRA-19429: sure, let me test your patch > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. 
> Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19433) Nodetool cleanup can drop data Accord might need
Ariel Weisberg created CASSANDRA-19433: -- Summary: Nodetool cleanup can drop data Accord might need Key: CASSANDRA-19433 URL: https://issues.apache.org/jira/browse/CASSANDRA-19433 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg Nodetool cleanup can theoretically drop data that Accord still needs. I don't think cleanup even waits for streaming to finish. Accord in general doesn't have a strategy for dropping data after topology changes right now. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19432) Accord & Paxos migration doesn't enforce lower bound on generated timestamps
Ariel Weisberg created CASSANDRA-19432: -- Summary: Accord & Paxos migration doesn't enforce lower bound on generated timestamps Key: CASSANDRA-19432 URL: https://issues.apache.org/jira/browse/CASSANDRA-19432 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg When migrating between the two, a coordinator with bad clock sync could, for example, write data that has already been tombstoned. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19431) Mutations need to split Accord/non-Accord mutations based on whether migration is completed
Ariel Weisberg created CASSANDRA-19431: -- Summary: Mutations need to split Accord/non-Accord mutations based on whether migration is completed Key: CASSANDRA-19431 URL: https://issues.apache.org/jira/browse/CASSANDRA-19431 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg If we don't do this then requests will fail if they span Accord and non-Accord keys and tables. This breaks unlogged batches for example. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19430) Read repair through Accord needs to only route the read repair through Accord if the range is actually migrated/running on Accord
Ariel Weisberg created CASSANDRA-19430: -- Summary: Read repair through Accord needs to only route the read repair through Accord if the range is actually migrated/running on Accord Key: CASSANDRA-19430 URL: https://issues.apache.org/jira/browse/CASSANDRA-19430 Project: Cassandra Issue Type: Bug Reporter: Ariel Weisberg This is because the read repair will simply fail if Accord doesn't manage that range. Not only does it need to be routed through Accord but if it races with topology change it needs to retry and not surface an error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Source Control Link: https://github.com/apache/cassandra/pull/3136 (was: https://github.com/apache/cassandra/pull/3094) > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that is in between the two merged points based on the > weight of each point. This can overflow long arithmetic with the code that is > currently there, and the workaround is to pick one of the points and just put > the merged point there. > This can be avoided by changing the midpoint calculation so it does not > overflow. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815861#comment-17815861 ] Ariel Weisberg edited comment on CASSANDRA-19381 at 2/26/24 4:14 PM: - [https://github.com/apache/cassandra/pull/3136] Still need to run the tests was (Author: aweisberg): [https://github.com/apache/cassandra/pull/3094] Currently running tests > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that is in between the two merged points based on the > weight of each point. This can overflow long arithmetic with the code that is > currently there, and the workaround is to pick one of the points and just put > the merged point there. > This can be avoided by changing the midpoint calculation so it does not > overflow. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820764#comment-17820764 ] Ariel Weisberg commented on CASSANDRA-19381: Sorry about that, the branch name was misspelled and then I never recreated the PR and finished the tests. > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that is in between the two merged points based on the > weight of each point. This can overflow long arithmetic with the code that is > currently there, and the workaround is to pick one of the points and just put > the merged point there. > This can be avoided by changing the midpoint calculation so it does not > overflow. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
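The overflow described in this ticket is the classic weighted-midpoint problem: computing `(p1*w1 + p2*w2) / (w1 + w2)` directly can exceed `Long.MAX_VALUE` even when the result fits in a long. A hedged sketch of a non-overflowing rearrangement (the pattern, not necessarily the exact code in the linked PR; assumes non-negative points with `p1 <= p2`, positive weights, and point spacing and weights well below `Long.MAX_VALUE`):

```java
class Midpoint {
    // Overflow-prone form: the intermediate products p1*w1 and p2*w2
    // can wrap around even when the final midpoint fits in a long.
    static long naive(long p1, long w1, long p2, long w2) {
        return (p1 * w1 + p2 * w2) / (w1 + w2);
    }

    // Algebraically equal form: p1 + (p2 - p1) * w2 / (w1 + w2),
    // split into quotient and remainder so the intermediates stay on
    // the scale of the point spacing rather than the points themselves.
    static long safe(long p1, long w1, long p2, long w2) {
        long d = p2 - p1;      // spacing between the two merged points
        long s = w1 + w2;      // combined weight
        long q = d / s;
        long r = d % s;
        return p1 + q * w2 + r * w2 / s;
    }
}
```

For example, merging two points that both sit near `Long.MAX_VALUE` overflows the naive form but is exact in the rearranged one, because the spacing `d` is small even when the points are huge.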
[jira] [Commented] (CASSANDRA-19409) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*
[ https://issues.apache.org/jira/browse/CASSANDRA-19409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820762#comment-17820762 ] Ekaterina Dimitrova commented on CASSANDRA-19409: - You said you checked that those tests pass almost on the edge time-wise on other branches and in CircleCI, and they just hit the limit on Jenkins 5.0. I agree then there is nothing to pull in and we can commit the fix we agreed on: [https://github.com/apache/cassandra-dtest/pull/254] {quote}The wording on the docs didn't hint me on that. {quote} Indeed, they seem useless in these matters; I found it out from Google and by experimenting. > Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.* > - > > Key: CASSANDRA-19409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19409 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0-rc > > > Failing in Jenkins: > * > [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/] > * > [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/] > * > 
[dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test/TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade/] > * > [dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade/] > * > [dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19381) StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points correctly on overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19381: --- Status: In Progress (was: Patch Available) > StreamingTombstoneHistogramBuilder.DataHolder does not merge histogram points > correctly on overflow > --- > > Key: CASSANDRA-19381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19381 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Fix For: 5.0, 5.1 > > > The algorithm tries to merge the two nearest points in the histogram and > create a new point that is in between the two merged points based on the > weight of each point. This can overflow long arithmetic with the code that is > currently there, and the workaround is to pick one of the points and just put > the merged point there. > This can be avoided by changing the midpoint calculation so it does not > overflow. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820761#comment-17820761 ] Brad Schoening commented on CASSANDRA-18762: [~manmagic3] yes, we are using vnodes where num_tokens = 16. > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB > of physical RAM due to direct memory. This seems to be related to > CASSANDRA-15202 which moved Merkel trees off-heap in 4.0. Using Cassandra > 4.0.6 with Java 11. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 
282ms. > Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled > -XX:+PerfDisableSharedMem > -XX:+ResizeTLAB > -XX:+UseG1GC > -XX:+UseNUMA > -XX:+UseTLAB >
[jira] [Updated] (CASSANDRA-19409) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*
[ https://issues.apache.org/jira/browse/CASSANDRA-19409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19409: Status: Ready to Commit (was: Review In Progress) > Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.* > - > > Key: CASSANDRA-19409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19409 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0-rc > > > Failing in Jenkins: > * > [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/] > * > [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/] > * > [dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test/TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade/] > * > 
[dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade/] > * > [dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820757#comment-17820757 ] Stefan Miklosovic commented on CASSANDRA-19426: --- [~samt] thoughts? > Fix Double Type issues in the Gossiper#maybeGossipToCMS > --- > > Key: CASSANDRA-19426 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19426 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Transactional Cluster Metadata >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > _*issue-1:*_ > if liveEndpoints.size()=unreachableEndpoints.size()=0; probability will be > {*}_Infinity_{*}. > randDbl <= probability will always be true, then sendGossip > _*issue-2:*_ > comparing two double is safe by using *<* or {*}>{*}. However missing > accuracy will happen if we compare the equality of two double by > intuition({*}={*}). For example: > {code:java} > double probability = 0.1; > double randDbl = 0.10001; // Slightly greater than probability > if (randDbl <= probability) > { > System.out.println("randDbl <= probability(always here)"); > } > else > { > System.out.println("randDbl > probability"); > } > {code} > A good example from: _*Gossiper#maybeGossipToUnreachableMember*_ > {code:java} > if (randDbl < prob) > { > sendGossip(message, Sets.filter(unreachableEndpoints.keySet(), > ep -> > !isDeadState(getEndpointStateMap().get(ep; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19426: -- Change Category: Performance Complexity: Low Hanging Fruit Component/s: Cluster/Gossip Transactional Cluster Metadata Reviewers: Stefan Miklosovic Status: Open (was: Triage Needed) > Fix Double Type issues in the Gossiper#maybeGossipToCMS > --- > > Key: CASSANDRA-19426 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19426 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Transactional Cluster Metadata >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > _*issue-1:*_ > if liveEndpoints.size()=unreachableEndpoints.size()=0; probability will be > {*}_Infinity_{*}. > randDbl <= probability will always be true, then sendGossip > _*issue-2:*_ > comparing two double is safe by using *<* or {*}>{*}. However missing > accuracy will happen if we compare the equality of two double by > intuition({*}={*}). For example: > {code:java} > double probability = 0.1; > double randDbl = 0.10001; // Slightly greater than probability > if (randDbl <= probability) > { > System.out.println("randDbl <= probability(always here)"); > } > else > { > System.out.println("randDbl > probability"); > } > {code} > A good example from: _*Gossiper#maybeGossipToUnreachableMember*_ > {code:java} > if (randDbl < prob) > { > sendGossip(message, Sets.filter(unreachableEndpoints.keySet(), > ep -> > !isDeadState(getEndpointStateMap().get(ep; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820754#comment-17820754 ] Stefan Miklosovic commented on CASSANDRA-19426: --- I think you are right. The node itself is only added among live members in getLiveMembers, so normally it is not there. I checked the other code paths as well and they account for the fact that it might be 0. > Fix Double Type issues in the Gossiper#maybeGossipToCMS > --- > > Key: CASSANDRA-19426 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19426 > Project: Cassandra > Issue Type: Improvement >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > _*issue-1:*_ > if liveEndpoints.size()=unreachableEndpoints.size()=0; probability will be > {*}_Infinity_{*}. > randDbl <= probability will always be true, then sendGossip > _*issue-2:*_ > comparing two double is safe by using *<* or {*}>{*}. However missing > accuracy will happen if we compare the equality of two double by > intuition({*}={*}). For example: > {code:java} > double probability = 0.1; > double randDbl = 0.10001; // Slightly greater than probability > if (randDbl <= probability) > { > System.out.println("randDbl <= probability(always here)"); > } > else > { > System.out.println("randDbl > probability"); > } > {code} > A good example from: _*Gossiper#maybeGossipToUnreachableMember*_ > {code:java} > if (randDbl < prob) > { > sendGossip(message, Sets.filter(unreachableEndpoints.keySet(), > ep -> > !isDeadState(getEndpointStateMap().get(ep; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820749#comment-17820749 ] Stefan Miklosovic commented on CASSANDRA-19426: --- hi [~maoling], just wondering ... may it ever happen that "liveEndpoints.size() + unreachableEndpoints.size() == 0" ? Is not the node which is executing that code live already? So it will be always at least 1. Or liveEndpoints exclude the current node? Just asking. > Fix Double Type issues in the Gossiper#maybeGossipToCMS > --- > > Key: CASSANDRA-19426 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19426 > Project: Cassandra > Issue Type: Improvement >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > _*issue-1:*_ > if liveEndpoints.size()=unreachableEndpoints.size()=0; probability will be > {*}_Infinity_{*}. > randDbl <= probability will always be true, then sendGossip > _*issue-2:*_ > comparing two double is safe by using *<* or {*}>{*}. However missing > accuracy will happen if we compare the equality of two double by > intuition({*}={*}). For example: > {code:java} > double probability = 0.1; > double randDbl = 0.10001; // Slightly greater than probability > if (randDbl <= probability) > { > System.out.println("randDbl <= probability(always here)"); > } > else > { > System.out.println("randDbl > probability"); > } > {code} > A good example from: _*Gossiper#maybeGossipToUnreachableMember*_ > {code:java} > if (randDbl < prob) > { > sendGossip(message, Sets.filter(unreachableEndpoints.keySet(), > ep -> > !isDeadState(getEndpointStateMap().get(ep; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
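Both issues in this ticket can be shown in a few lines: dividing by a zero endpoint count yields `Infinity` in IEEE-754 double arithmetic, which any `<=` comparison passes, and equality-style comparisons on doubles are fragile. A sketch of the guarded decision, with illustrative names rather than Gossiper's actual signature:

```java
import java.util.concurrent.ThreadLocalRandom;

class GossipDecision {
    // Sketch: return false outright when there are no peers instead of
    // computing 1.0 / 0 == Infinity (issue-1), and compare with a strict
    // '<' rather than '<=' or '==' (issue-2), mirroring
    // maybeGossipToUnreachableMember.
    static boolean shouldGossip(int liveEndpoints, int unreachableEndpoints, double randDbl) {
        int total = liveEndpoints + unreachableEndpoints;
        if (total == 0)
            return false;                  // no peers: nothing to gossip to
        double probability = 1.0 / total;
        return randDbl < probability;
    }

    static boolean roll(int live, int unreachable) {
        return shouldGossip(live, unreachable, ThreadLocalRandom.current().nextDouble());
    }
}
```

Without the `total == 0` guard, `probability` becomes `Double.POSITIVE_INFINITY` and `randDbl <= probability` is true for every draw, so the node gossips unconditionally.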
[jira] [Commented] (CASSANDRA-19404) Unexpected NullPointerException in ANN+WHERE when adding rows in another partition
[ https://issues.apache.org/jira/browse/CASSANDRA-19404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820745#comment-17820745 ] Ekaterina Dimitrova commented on CASSANDRA-19404: - Thanks, I added the suggestion in a new commit - https://github.com/apache/cassandra/pull/3130/commits/b8b65d9c86316bf47c9ca3d1aaca3de8cab36220 This is also trunk patch; it is identical to 5.0: https://github.com/ekaterinadimitrova2/cassandra/pull/new/19404-trunk CI just started here: https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19404-trunk > Unexpected NullPointerException in ANN+WHERE when adding rows in another > partition > -- > > Key: CASSANDRA-19404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19404 > Project: Cassandra > Issue Type: Bug > Components: Feature/Vector Search >Reporter: Stefano Lottini >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > * *Bug observed on the Docker image 5.0-beta1* > * *Bug also observed on latest head of Cassandra repo (as of 2024-02-15)* > * _*(working fine on vsearch branch of datastax/cassandra, commit hash > 80c2f8b9ad5b89efee0645977a5ca53943717c0d)*_ > Summary: A query with _ann + where clause on a map + where clause on the > partition key_ starts erroring once there are other partitions in the table. > There are three SELECT statements in the repro minimal code below - the third > is where the error is triggered. 
> {code:java} > // reproduced with Dockerized Cassandra 5.0-beta1 on 2024-02-15 > / > // SCHEMA > / > CREATE TABLE ks.v_table ( > pk int, > row_v vector, > metadata map, > PRIMARY KEY (pk) > ); > CREATE CUSTOM INDEX v_md > ON ks.v_table (entries(metadata)) > USING 'StorageAttachedIndex'; > CREATE CUSTOM INDEX v_idx > ON ks.v_table (row_v) > USING 'StorageAttachedIndex'; > / > // SELECT WORKS (empty table) > / > SELECT * FROM ks.v_table > WHERE metadata['map_k'] = 'map_v' > AND pk = 0 > ORDER BY row_v ANN OF [0.1, 0.2] > LIMIT 4; > // > // ADD ONE ROW > // > INSERT INTO ks.v_table (pk, metadata, row_v) > VALUES > (0, {'map_k': 'map_v'}, [0.11, 0.19]); > / > // SELECT WORKS (table has queried partition) > / > SELECT * FROM ks.v_table > WHERE metadata['map_k'] = 'map_v' > AND pk = 0 > ORDER BY row_v ANN OF [0.1, 0.2] > LIMIT 4; > // > // ADD ONE ROW (another partition) > // > INSERT INTO ks.v_table (pk, metadata, row_v) > VALUES > (10, {'map_k': 'map_v'}, [0.11, 0.19]); > / > // SELECT BREAKS (table gained another partition) > / > SELECT * FROM ks.v_table > WHERE metadata['map_k'] = 'map_v' > AND pk = 0 > ORDER BY row_v ANN OF [0.1, 0.2] > LIMIT 4; {code} > The error has this appearance in CQL Console: > {code:java} > ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] > message="Operation failed - received 0 responses and 1 failures: UNKNOWN from > /172.17.0.2:7000" info={'consistency': 'ONE', 'required_responses': 1, > 'received_responses': 0, 'failures': 1, 'error_code_map': {'172.17.0.2': > '0x'}} {code} > And the Cassandra logs have this to say: > {code:java} > java.lang.NullPointerException: Cannot invoke > "org.apache.cassandra.index.sai.iterators.KeyRangeIterator.skipTo(org.apache.cassandra.index.sai.utils.PrimaryKey)" > because "this.nextIterator" is null {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: 
commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command
[ https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820744#comment-17820744 ] Stefan Miklosovic commented on CASSANDRA-19417: --- [~skoppu] would you be so kind as to squash all your work into one commit on top of the current trunk? I see there are a lot of merges etc. in your PR. It would make the review and the eventual merge much easier. [~samt] are you OK with introducing this to trunk? As I read CASSANDRA-18018, you were quite involved in the discussions there. > LIST SUPERUSERS cql command > --- > > Key: CASSANDRA-19417 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19417 > Project: Cassandra > Issue Type: Improvement > Components: Tool/cqlsh >Reporter: Shailaja Koppu >Assignee: Shailaja Koppu >Priority: Normal > Labels: CQL > Time Spent: 10m > Remaining Estimate: 0h > > Developing a new CQL command LIST SUPERUSERS to return the list of roles with > superuser privilege. This includes roles that acquired superuser privilege in > the hierarchy. > Context: the LIST ROLES cql command lists roles and their membership details, and > displays super=true for immediate superusers. But there can be roles that > acquired superuser privilege due to a grant. The LIST ROLES command won't display > super=true for such roles, and the only way to recognize them is to look > for at least one row with super=true in the output of the LIST ROLES OF <name> command. While this works to check if a given role has superuser > privilege, there may be services (for example, Sidecar) working with C* that > need to maintain a list of roles with superuser privilege. There is no > existing command/tool to retrieve such roles' details. Hence we are developing this > command, which returns all roles having superuser privilege. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
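The transitive-superuser behaviour the ticket describes — a role inheriting superuser through a chain of grants — can be sketched as a small role-graph walk. This is an illustrative model only (hypothetical class and field names, not the IRoleManager/role-cache code paths an actual Cassandra patch would use):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SuperuserResolutionSketch {
    // role -> roles directly granted to it
    static final Map<String, Set<String>> grants = new HashMap<>();
    // roles created with SUPERUSER = true
    static final Set<String> directSuperusers = new HashSet<>();

    // A role is an "effective" superuser if it, or any role reachable
    // through its grants, carries the superuser flag.
    static boolean isEffectiveSuperuser(String role, Set<String> seen) {
        if (!seen.add(role)) return false; // guard against grant cycles
        if (directSuperusers.contains(role)) return true;
        for (String granted : grants.getOrDefault(role, Set.of()))
            if (isEffectiveSuperuser(granted, seen)) return true;
        return false;
    }

    public static void main(String[] args) {
        directSuperusers.add("admin");      // CREATE ROLE admin WITH SUPERUSER = true
        grants.put("ops", Set.of("admin")); // GRANT admin TO ops
        grants.put("reader", Set.of());

        // LIST ROLES shows super=true only for admin; the proposed
        // LIST SUPERUSERS would also surface ops.
        System.out.println(isEffectiveSuperuser("ops", new HashSet<>()));    // true
        System.out.println(isEffectiveSuperuser("reader", new HashSet<>())); // false
    }
}
```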
[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command
[ https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820738#comment-17820738 ] Brandon Williams commented on CASSANDRA-19417: -- I think this is the outcome of CASSANDRA-18018 > LIST SUPERUSERS cql command > --- > > Key: CASSANDRA-19417 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19417 > Project: Cassandra > Issue Type: Improvement > Components: Tool/cqlsh >Reporter: Shailaja Koppu >Assignee: Shailaja Koppu >Priority: Normal > Labels: CQL > Time Spent: 10m > Remaining Estimate: 0h > > Developing a new CQL command LIST SUPERUSERS to return the list of roles with > superuser privilege. This includes roles that acquired superuser privilege in > the hierarchy. > Context: the LIST ROLES cql command lists roles and their membership details, and > displays super=true for immediate superusers. But there can be roles that > acquired superuser privilege due to a grant. The LIST ROLES command won't display > super=true for such roles, and the only way to recognize them is to look > for at least one row with super=true in the output of the LIST ROLES OF <name> command. While this works to check if a given role has superuser > privilege, there may be services (for example, Sidecar) working with C* that > need to maintain a list of roles with superuser privilege. There is no > existing command/tool to retrieve such roles' details. Hence we are developing this > command, which returns all roles having superuser privilege. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820737#comment-17820737 ] Berenguer Blasi commented on CASSANDRA-19414: - I understand all the annoyances of having your scripting feng-shui broken when somebody commits something that collides with it. But this is going to be the case for every committer's personal toolbox that is not committed source in the project, unfortunately. There can be newer workflows in the future, changes, tuning, etc., so it is painful but necessary and unavoidable imo. On the renaming issue, it was _not_ my intention to do it in this ticket but to _eventually_ do it, as I agree a separate workflow makes sense just for clarity, as this is not pre-commit indeed. So now that the code is there, and I was going to do it already anyway, I am in favor of merging it as it is. IIUC I have +1 from you both to move forward. Please shout if I have it wrong. > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19404) Unexpected NullPointerException in ANN+WHERE when adding rows in another partition
[ https://issues.apache.org/jira/browse/CASSANDRA-19404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-19404: Reviewers: Andres de la Peña, Ekaterina Dimitrova (was: Andres de la Peña) Status: Review In Progress (was: Patch Available) > Unexpected NullPointerException in ANN+WHERE when adding rows in another > partition > -- > > Key: CASSANDRA-19404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19404 > Project: Cassandra > Issue Type: Bug > Components: Feature/Vector Search >Reporter: Stefano Lottini >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > * *Bug observed on the Docker image 5.0-beta1* > * *Bug also observed on latest head of Cassandra repo (as of 2024-02-15)* > * _*(working fine on vsearch branch of datastax/cassandra, commit hash > 80c2f8b9ad5b89efee0645977a5ca53943717c0d)*_ > Summary: A query with _ann + where clause on a map + where clause on the > partition key_ starts erroring once there are other partitions in the table. > There are three SELECT statements in the repro minimal code below - the third > is where the error is triggered. 
> {code:java} > // reproduced with Dockerized Cassandra 5.0-beta1 on 2024-02-15 > / > // SCHEMA > / > CREATE TABLE ks.v_table ( > pk int, > row_v vector, > metadata map, > PRIMARY KEY (pk) > ); > CREATE CUSTOM INDEX v_md > ON ks.v_table (entries(metadata)) > USING 'StorageAttachedIndex'; > CREATE CUSTOM INDEX v_idx > ON ks.v_table (row_v) > USING 'StorageAttachedIndex'; > / > // SELECT WORKS (empty table) > / > SELECT * FROM ks.v_table > WHERE metadata['map_k'] = 'map_v' > AND pk = 0 > ORDER BY row_v ANN OF [0.1, 0.2] > LIMIT 4; > // > // ADD ONE ROW > // > INSERT INTO ks.v_table (pk, metadata, row_v) > VALUES > (0, {'map_k': 'map_v'}, [0.11, 0.19]); > / > // SELECT WORKS (table has queried partition) > / > SELECT * FROM ks.v_table > WHERE metadata['map_k'] = 'map_v' > AND pk = 0 > ORDER BY row_v ANN OF [0.1, 0.2] > LIMIT 4; > // > // ADD ONE ROW (another partition) > // > INSERT INTO ks.v_table (pk, metadata, row_v) > VALUES > (10, {'map_k': 'map_v'}, [0.11, 0.19]); > / > // SELECT BREAKS (table gained another partition) > / > SELECT * FROM ks.v_table > WHERE metadata['map_k'] = 'map_v' > AND pk = 0 > ORDER BY row_v ANN OF [0.1, 0.2] > LIMIT 4; {code} > The error has this appearance in CQL Console: > {code:java} > ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] > message="Operation failed - received 0 responses and 1 failures: UNKNOWN from > /172.17.0.2:7000" info={'consistency': 'ONE', 'required_responses': 1, > 'received_responses': 0, 'failures': 1, 'error_code_map': {'172.17.0.2': > '0x'}} {code} > And the Cassandra logs have this to say: > {code:java} > java.lang.NullPointerException: Cannot invoke > "org.apache.cassandra.index.sai.iterators.KeyRangeIterator.skipTo(org.apache.cassandra.index.sai.utils.PrimaryKey)" > because "this.nextIterator" is null {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: 
commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820729#comment-17820729 ] Stefan Miklosovic edited comment on CASSANDRA-19414 at 2/26/24 2:52 PM: I have never done that and I think it is a bad practice in general. In a lot of cases one config is not even applicable to other people because of the differences in CircleCI plan, parallelism, resource classes, etc. ... that commit even carries the commit message "DO NOT COMMIT". Anyway, to close this, just name that workflow differently if you all think it is better and I'll somehow cope with it. was (Author: smiklosovic): I have never done that and I think it is a bad practice in general. In a lot of cases one config is not even applicable to the other because of the differences in CircleCI plan, parallelism, resource classes etc ... that commit has even such commit message "DO NOT COMMIT". > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820729#comment-17820729 ] Stefan Miklosovic commented on CASSANDRA-19414: --- I have never done that and I think it is a bad practice in general. In a lot of cases one config is not even applicable to the other because of the differences in CircleCI plan, parallelism, resource classes etc ... that commit has even such commit message "DO NOT COMMIT". > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820727#comment-17820727 ] Ekaterina Dimitrova edited comment on CASSANDRA-19414 at 2/26/24 2:49 PM: -- You forget many people cherry-pick a config commit from others and do not use the script, [~smiklosovic] {quote}there is always a committer looking over their shoulder. {quote} As a committer, I prefer clarity so we can have shorter review cycles was (Author: e.dimitrova): You forget many people cherry-pick a config commit from others and do not use the script, [~smiklosovic] > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820727#comment-17820727 ] Ekaterina Dimitrova edited comment on CASSANDRA-19414 at 2/26/24 2:48 PM: -- You forget many people cherry-pick a config commit from others and do not use the script, [~smiklosovic] was (Author: e.dimitrova): You forget many people cherry-pick the config and do not use the script, [~smiklosovic] > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820727#comment-17820727 ] Ekaterina Dimitrova commented on CASSANDRA-19414: - You forget many people cherry-pick the config and do not use the script, [~smiklosovic] > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820726#comment-17820726 ] Stefan Miklosovic commented on CASSANDRA-19414: --- _If we have the same names - people will start pushing pre-commit the dev workflow_ I do not think this will happen. You need to make a conscious decision to run "dev workflow" by specifying "-d". People who are not working on this daily do not have any strong reason to use "-d" and even if they do, there is always a committer looking over their shoulder. Anyway ... if anything I would call it java11_dev_tests your call [~bereng] > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820725#comment-17820725 ] Manish Khandelwal commented on CASSANDRA-18762: --- [~bschoeni] were vnodes enabled for the 4-DC cluster when you ran the parallel repair and got the direct buffer OOM? Also, what was the vnodes value? > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG, > image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, > image-2023-12-06-15-58-55-007.png > > > We are seeing repeated failures of nodes with 16GB of heap on a VM with 32GB > of physical RAM due to direct memory. This seems to be related to > CASSANDRA-15202 which moved Merkle trees off-heap in 4.0. Using Cassandra > 4.0.6 with Java 11. 
> {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+ParallelRefProcEnabled >
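The OutOfMemoryError in the stack trace above is thrown by the direct (off-heap) buffer pool, which is capped by -XX:MaxDirectMemorySize (by default roughly the max heap size) rather than by -Xmx alone — so off-heap Merkle trees received during parallel repairs can fail even while the heap itself is healthy. A small standalone sketch (illustrative only, not Cassandra code) for observing direct-buffer consumption via the standard management beans:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryUsage {
    public static void main(String[] args) {
        // Allocate off-heap, as MerkleTree.allocate() does via allocateDirect().
        ByteBuffer tree = ByteBuffer.allocateDirect(4 << 20); // 4 MiB

        // The "direct" buffer pool bean reports the usage that counts
        // against the -XX:MaxDirectMemorySize budget.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class))
            if (pool.getName().equals("direct"))
                System.out.println("direct buffers: " + pool.getCount()
                        + ", bytes used: " + pool.getMemoryUsed());
    }
}
```

When the cumulative allocateDirect() demand (for example, many Merkle trees deserialized concurrently) exceeds that budget, Bits.reserveMemory raises exactly the "Direct buffer memory" OutOfMemoryError shown in the log, which is why monitoring this pool during repairs is informative.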
[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow
[ https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820723#comment-17820723 ] Ekaterina Dimitrova commented on CASSANDRA-19414: - {quote}I would need to make the difference here and I do not see that to be necessary. {quote} I disagree; it is a bad user experience to have different workflows under the same name. The idea is to encourage people to spend fewer resources and less of their time on CI. If we have the same names, people will start pushing the dev workflow pre-commit, and then they still have to go do all the clicking for the rest of the jobs or, worse, run the real pre-commit workflow and spend more time and resources. Many people do not work on C* daily; we need to make it easy and not confusing for all contributors. When someone looks into the UI, it needs to be clear what they are looking at. [~smiklosovic], if you want something different that can easily integrate with your scripts, please submit a patch addressing the stated concern about user experience. Your scripts were never open sourced, so we cannot know what would work for you and take it into account here. Otherwise, I am +1 on the latest version that [~Bereng] pushed. > Skinny dev circle workflow > -- > > Key: CASSANDRA-19414 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19414 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0.x, 5.x > > > CircleCi CI runs are getting pretty heavy. During dev iterations we trigger > many CI pre-commit jobs which are just an overkill. > This ticket has the purpose to purge from the pre-commit workflow all > variations of the test matrix but the vanilla one. That should enable us for > a quick and cheap to iterate *during dev*, this is not a substitute for > pre-commit . 
This ticket's work will serve as the basis for the upcoming > changes being discussed > [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command
[ https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820714#comment-17820714 ] Stefan Miklosovic commented on CASSANDRA-19417: --- hi [~skoppu], thank you for the patch! Is anybody else aware of this idea? It would be cool to have a broader consensus here. Is there anything like a mailing list thread or similar? > LIST SUPERUSERS cql command > --- > > Key: CASSANDRA-19417 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19417 > Project: Cassandra > Issue Type: Improvement > Components: Tool/cqlsh >Reporter: Shailaja Koppu >Assignee: Shailaja Koppu >Priority: Normal > Labels: CQL > Time Spent: 10m > Remaining Estimate: 0h > > Developing a new CQL command LIST SUPERUSERS to return the list of roles with > superuser privilege. This includes roles that acquired superuser privilege in > the hierarchy. > Context: the LIST ROLES cql command lists roles and their membership details, and > displays super=true for immediate superusers. But there can be roles that > acquired superuser privilege due to a grant. The LIST ROLES command won't display > super=true for such roles, and the only way to recognize them is to look > for at least one row with super=true in the output of the LIST ROLES OF <name> command. While this works to check if a given role has superuser > privilege, there may be services (for example, Sidecar) working with C* that > need to maintain a list of roles with superuser privilege. There is no > existing command/tool to retrieve such roles' details. Hence we are developing this > command, which returns all roles having superuser privilege. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org