[jira] [Comment Edited] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default
[ https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198173#comment-17198173 ]

ZhaoYang edited comment on CASSANDRA-16036 at 9/18/20, 7:16 AM:

!16036_128mb.png!

Above is write perf in a mixed read-write test using a 128 MB cache, comparing 16036-disable-chunk-cache against its baseline. Disabling the chunk cache significantly improves latency. (Read perf is similar to write perf.)

!15229_128mb.png!

Above is write perf in a mixed read-write test using a 128 MB cache, comparing 15229-disable-chunk-cache against 15229-improved-buffer-pool. Disabling the chunk cache shows some improvement in latency. (Read perf is similar to write perf.)

was (Author: jasonstack):
!16036_128mb.png!

Above is write perf in a mixed read-write test comparing 16036-disable-chunk-cache against its baseline. Disabling the chunk cache significantly improves latency. (Read perf is similar to write perf.)

!15229_128mb.png!

Above is write perf in a mixed read-write test comparing 15229-disable-chunk-cache against 15229-improved-buffer-pool. Disabling the chunk cache shows some improvement in latency.
(Read perf is similar to write perf.)

> Add flag to disable chunk cache and disable by default
> ------------------------------------------------------
>
>                 Key: CASSANDRA-16036
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16036
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.0-beta
>
>         Attachments: 15229_128mb.png, 16036_128mb.png, async-profile.collapsed.svg,
>                      clustering-in-clause_latency_selects_baseline.png,
>                      clustering-in-clause_latency_selects_baseline_attempt3.png,
>                      clustering-in-clause_latency_under90_selects_baseline.png,
>                      clustering-in-clause_latency_under90_selects_baseline_attempt3.png,
>                      clustering-slice_latency_selects_baseline.png,
>                      clustering-slice_latency_under90_selects_baseline.png,
>                      medium-blobs_latency_selects_baseline.png,
>                      medium-blobs_latency_under90_selects_baseline.png,
>                      partition-single-row-read_latency_selects_baseline.png,
>                      partition-single-row-read_latency_under90_selects_baseline.png
>
> Chunk cache is enabled by default and doesn't have a flag to disable it without impacting networking. In performance testing of 4.0 against 3.0, I found that reads were slower in 4.0, and after profiling found that the ChunkCache was partially to blame; after disabling the chunk cache, read performance improved.
>
> {code}
> 40_w_cc-selects.hdr
> #[Mean = 11.50063, StdDeviation = 13.44014]
> #[Max = 482.41254, Total count = 316477]
> #[Buckets = 25, SubBuckets = 262144]
> 40_wo_cc-selects.hdr
> #[Mean = 9.82115, StdDeviation = 10.14270]
> #[Max = 522.36493, Total count = 317444]
> #[Buckets = 25, SubBuckets = 262144]
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
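The HdrHistogram summaries quoted in the issue description can be turned into a rough percentage with a quick sketch. This is not from the ticket; the numbers are copied from the `{code}` block above and the arithmetic is only illustrative:

```python
# HdrHistogram summary values (milliseconds) quoted in CASSANDRA-16036
# for 4.0 with the chunk cache (40_w_cc) and without it (40_wo_cc).
with_cc = {"mean": 11.50063, "stddev": 13.44014, "max": 482.41254}
without_cc = {"mean": 9.82115, "stddev": 10.14270, "max": 522.36493}

# Relative improvement of the mean read latency when the cache is disabled.
improvement = (with_cc["mean"] - without_cc["mean"]) / with_cc["mean"]
print(f"mean latency improvement without chunk cache: {improvement:.1%}")
```

This works out to roughly a 15% lower mean latency without the cache, which matches the direction of the perf graphs attached to the ticket.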
[jira] [Commented] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default
[ https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198175#comment-17198175 ]

ZhaoYang commented on CASSANDRA-16036:
--------------------------------------

+1 to disabling the chunk cache until we get CASSANDRA-15229 and other improvements (e.g. a fixed buffer size) into the chunk cache.
[jira] [Commented] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default
[ https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198173#comment-17198173 ]

ZhaoYang commented on CASSANDRA-16036:
--------------------------------------

!16036_128mb.png!

Above is write perf in a mixed read-write test comparing 16036-disable-chunk-cache against its baseline. Disabling the chunk cache significantly improves latency. (Read perf is similar to write perf.)

!15229_128mb.png!

Above is write perf in a mixed read-write test comparing 15229-disable-chunk-cache against 15229-improved-buffer-pool. Disabling the chunk cache shows some improvement in latency. (Read perf is similar to write perf.)
[jira] [Updated] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default
[ https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16036:
---------------------------------
    Attachment: 16036_128mb.png
                15229_128mb.png
[jira] [Commented] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default
[ https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197107#comment-17197107 ]

ZhaoYang commented on CASSANDRA-16036:
--------------------------------------

I wonder if the chunk-cache regression is related to CASSANDRA-15229; let me run some tests from CASSANDRA-15229. Also, I am not a committer, so you may need to find one more...
[jira] [Updated] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default
[ https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16036:
---------------------------------
    Reviewers: Jon Meredith, ZhaoYang  (was: Jon Meredith)
[jira] [Commented] (CASSANDRA-16123) Use materialized view, CPU usage over 100%
[ https://issues.apache.org/jira/browse/CASSANDRA-16123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194804#comment-17194804 ]

ZhaoYang commented on CASSANDRA-16123:
--------------------------------------

The jstack shows HintsDispatcher is running, and it may generate writes to the MV. {{Keyspace.applyInternal, line 545}} will retry until the write times out (2s), so I don't think it will consume CPU indefinitely.

> Use materialized view, CPU usage over 100%
> ------------------------------------------
>
>                 Key: CASSANDRA-16123
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16123
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Core
>            Reporter: chenbing
>            Priority: Normal
>         Attachments: image-2020-09-12-16-50-41-341.png, image-2020-09-12-16-52-46-581.png,
>                      image-2020-09-12-16-55-49-813.png, jstack_25640.txt
>
> Environment info:
> os: CentOS Linux release 7.4.1708
> cassandra: apache-cassandra-3.11.8
> jdk: 1.8.0_261
>
> I use a materialized view, but CPU usage stays over 100% even when there are no CQL client requests.
> My analysis process is as follows:
> 1. top
> Found the pid: 25640
> 2. top -pH 25640
> !image-2020-09-12-16-52-46-581.png!
> 3. printf "%x\n" 26065
> Converted thread id 26065; the hex value is 65d1.
> 4. jstack -l 25640 > jstack_25640.txt
> Searched for 65d1 in jstack_25640.txt:
> !image-2020-09-12-16-55-49-813.png!
> 5. Found the corresponding source at org.apache.cassandra.db.Keyspace.applyInternal, line 545.
> I guess the CPU usage over 100% is caused by a loop calling Keyspace.applyInternal.
> Does anybody have any suggestions?
> The jstack_25640.txt file is in the attachments.
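The hot-thread recipe in the report (take a thread id from `top -H`, convert it to hex, grep the jstack dump for that native id) can be sketched as follows. This is a generic illustration, not Cassandra code; the thread id and file name are the ones quoted in the ticket:

```python
# Convert a Linux thread id from `top -H` to the hex "nid" that appears
# in a jstack thread dump, then build the grep command to locate it.
tid = 26065                 # hot thread id reported by `top -pH 25640`
nid = format(tid, "x")      # jstack prints native ids as lowercase hex
print(nid)
print(f"grep 'nid=0x{nid}' jstack_25640.txt")
```

Running this reproduces the `65d1` value the reporter searched for in the dump.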
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193818#comment-17193818 ]

ZhaoYang commented on CASSANDRA-15861:
--------------------------------------

Thanks for the review and feedback.

> Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15861
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15861
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair, Consistency/Streaming, Local/Compaction
>            Reporter: ZhaoYang
>            Assignee: ZhaoYang
>            Priority: Normal
>             Fix For: 4.0-beta3
>
> Flaky dtest: [test_dead_sync_initiator - repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/]
> {code:java|title=stacktrace}
> Unexpected error found in node logs (see stdout for full details). Errors:
> [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 CassandraEntireSSTableStreamReader.java:145 - [Stream 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream for table = keyspace1.standard1
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> 	at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219)
> 	at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198)
> 	at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129)
> 	at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226)
> 	at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140)
> 	at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78)
> 	at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49)
> 	at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
> 	at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49)
> 	at org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181)
> 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Checksums do not match for /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db
> {code}
>
> In the above test, "nodetool repair" is executed on node1 and node2 is killed during repair. At the end, node3 reports a checksum validation failure on the sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which modifies the sstable's repairAt to 0 and its pending repair id to the session id.
> 2. Then node1 creates the {{ComponentManifest}}, which contains the file lengths to be transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast a repair-failure message to all participants in {{CoordinatorSession#fail}}.
> 4. Node1 receives its own repair-failure message and fails its local repair sessions at {{LocalSessions#failSession}}, which triggers async background compaction.
> 5. Node1's background compaction mutates the sstable's repairAt to 0 and its pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair.
> 6. Node1 actually sends the sstable to node3, where the sstable's STATS component size differs from the original size recorded in the manifest.
> 7. At the end, node3 reports a checksum validation failure when it tries to mutate the sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> Currently, entire-sstable-streaming requires sstable components to be immutable, because the {{ComponentManifest}} with component sizes is sent before the actual files. This wasn't a problem in legacy streaming, as the STATS file length didn't matter.
>
> Ideally it would be great to make sstable STATS metadata immutable
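The core of the race described in steps 1-7 (component sizes are snapshotted into the manifest before the files are sent, so a concurrent metadata mutation invalidates the transfer) can be illustrated with a toy sketch. This is not Cassandra code; the file name and byte contents are made up for illustration:

```python
# Toy model of the entire-sstable-streaming race: the sender snapshots
# component lengths into a manifest, a concurrent "background compaction"
# rewrites the Statistics component, and the receiver's validation of the
# streamed bytes against the manifest then fails.

def build_manifest(components):
    # Snapshot of component sizes, taken before the transfer starts.
    return {name: len(data) for name, data in components.items()}

components = {"Statistics.db": b"repairedAt=session-1234"}
manifest = build_manifest(components)

# Concurrent metadata mutation after the manifest was built (step 5 above):
# the pending-repair marker is cleared, changing the component's bytes.
components["Statistics.db"] = b"repairedAt=0"

# Receiver-side check (stand-in for the checksum validation in step 7):
streamed = components["Statistics.db"]
size_matches = manifest["Statistics.db"] == len(streamed)
print("manifest size:", manifest["Statistics.db"], "streamed size:", len(streamed))
print("validation passes:", size_matches)
```

The size recorded in the manifest no longer matches the streamed component, which is exactly why node3 rejects the sstable with a checksum failure.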
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-15861:
---------------------------------
    Test and Documentation Plan: https://app.circleci.com/pipelines/github/jasonstack/cassandra/306/workflows/27e49813-d93b-49df-9722-737b932710b3
                                 (was: https://app.circleci.com/pipelines/github/jasonstack/cassandra/301/workflows/4daf1646-77d4-4b83-8df6-5caeb73f2fe8)
[jira] [Updated] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
[ https://issues.apache.org/jira/browse/CASSANDRA-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16092:
---------------------------------
    Test and Documentation Plan: https://app.circleci.com/pipelines/github/jasonstack/cassandra/305/workflows/6c813342-2bdb-4740-8599-6a8c34ab97da
                         Status: Patch Available  (was: In Progress)

> Add Index Group Interface for Storage Attached Index
> ----------------------------------------------------
>
>                 Key: CASSANDRA-16092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16092
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/SASI
>            Reporter: ZhaoYang
>            Assignee: ZhaoYang
>            Priority: Normal
>             Fix For: 4.x
>
> The [index group|https://github.com/datastax/cassandra/blob/storage_attached_index/src/java/org/apache/cassandra/index/Index.java#L634] interface allows:
> * indexes on the same table to receive centralized lifecycle events, called secondary index groups. Sharing of data between multiple column indexes on the same table allows SAI disk usage to realise significant space savings over other index implementations.
> * an index group to analyze a user query and provide a query plan that leverages all available indexes within the group.
[jira] [Updated] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
[ https://issues.apache.org/jira/browse/CASSANDRA-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16092:
---------------------------------
    Change Category: Semantic  (was: Code Clarity)
[jira] [Updated] (CASSANDRA-16108) Concurrent Index Memtable implementation using Trie
[ https://issues.apache.org/jira/browse/CASSANDRA-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16108:
---------------------------------
    Fix Version/s: 4.x

> Concurrent Index Memtable implementation using Trie
> ---------------------------------------------------
>
>                 Key: CASSANDRA-16108
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16108
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: ZhaoYang
>            Assignee: ratcharod
>            Priority: Normal
>             Fix For: 4.x
>
> Replace the existing {{ConcurrentRadixTree}} with a Trie implementation for both the numeric index and the string index to reduce memory usage.
[jira] [Commented] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
[ https://issues.apache.org/jira/browse/CASSANDRA-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191625#comment-17191625 ]

ZhaoYang commented on CASSANDRA-16092:
--------------------------------------

I have ported the [Index interface changes|https://github.com/apache/cassandra/pull/735] for Storage Attached Index:
* {{Index#Group}} to manage the lifecycle of multiple indexes that can communicate with each other.
* {{Index#QueryPlan}} to provide a set of indexes that can work together for a given query.
* {{Index#Searcher}} to perform the actual index searching.
* Enhanced {{SSTableFlushObserver}} to pass the partition deletion, static row, and unfiltered rows separately.
* Moved {{UpdateTransaction}} into {{CFS}} so that we can make sure the memtable and index memtable are in sync.

cc [~adelapena] [~maedhroz]
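The {{Index#Group}} idea above (one group receives each lifecycle event once and fans it out to member indexes, so per-table work and resources can be shared) can be sketched in miniature. All names here are hypothetical stand-ins, not the actual Cassandra interface:

```python
# Minimal sketch of an index-group: lifecycle events are delivered to the
# group once, and the group forwards them to every member index on the
# same table. Illustrative only; not the Cassandra Index.Group API.
class ColumnIndex:
    def __init__(self, column):
        self.column = column
        self.flush_events = 0

    def on_flush(self, memtable):
        self.flush_events += 1

class IndexGroup:
    def __init__(self):
        self.indexes = []

    def add(self, index):
        self.indexes.append(index)

    # One flush notification per table, fanned out to all member indexes.
    def on_flush(self, memtable):
        for index in self.indexes:
            index.on_flush(memtable)

group = IndexGroup()
a, b = ColumnIndex("a"), ColumnIndex("b")
group.add(a)
group.add(b)
group.on_flush("memtable-1")
print(a.flush_events, b.flush_events)
```

The point of the design is that the table delivers each event once to the group instead of once per index, which is what lets the member indexes share data and disk structures.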
[jira] [Updated] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
[ https://issues.apache.org/jira/browse/CASSANDRA-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16092:
---------------------------------
    Change Category: Code Clarity
         Complexity: Normal
      Fix Version/s: 4.x
             Status: Open  (was: Triage Needed)
[jira] [Updated] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
[ https://issues.apache.org/jira/browse/CASSANDRA-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-16092:
-
Source Control Link: https://github.com/apache/cassandra/pull/735
[jira] [Created] (CASSANDRA-16108) Concurrent Index Memtable implementation using Trie
ZhaoYang created CASSANDRA-16108:
Summary: Concurrent Index Memtable implementation using Trie
Key: CASSANDRA-16108
URL: https://issues.apache.org/jira/browse/CASSANDRA-16108
Project: Cassandra
Issue Type: New Feature
Reporter: ZhaoYang

Replace the existing {{ConcurrentRadixTree}} with a trie-based implementation for both the numeric and string indexes, to reduce memory usage.
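The memory argument behind the ticket above is that a trie stores shared key prefixes once instead of per key. A minimal (non-concurrent) trie sketch makes the effect concrete; this is illustrative only and is not the CASSANDRA-16108 implementation.

```java
import java.util.*;

// Minimal trie sketch: keys sharing a prefix share nodes, so total node
// count can be far below total key length. Illustrative only; the actual
// index memtable also needs concurrency and per-term postings.
public class TrieSketch {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a key ends at this node
    }

    final Node root = new Node();

    void insert(String key) {
        Node n = root;
        for (char c : key.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new Node());
        n.terminal = true;
    }

    boolean contains(String key) {
        Node n = root;
        for (char c : key.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return false;
        }
        return n.terminal;
    }

    int nodeCount(Node n) {
        int count = 1;
        for (Node child : n.children.values()) count += nodeCount(child);
        return count;
    }

    public static void main(String[] args) {
        TrieSketch t = new TrieSketch();
        // 3 keys, 20 characters in total, sharing the prefix "token"
        for (String k : new String[]{"token", "tokens", "tokenizer"}) t.insert(k);
        // Shared prefix stored once: 1 root + 5 ("token") + 1 ("s") + 4 ("izer")
        System.out.println(t.nodeCount(t.root)); // prints 11, not 21
    }
}
```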
[jira] [Assigned] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
[ https://issues.apache.org/jira/browse/CASSANDRA-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang reassigned CASSANDRA-16092:
Assignee: ZhaoYang
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Test and Documentation Plan: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/301/workflows/4daf1646-77d4-4b83-8df6-5caeb73f2fe8] (was: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/300/workflows/f41f6585-cd97-4791-abdc-a2935694948e]) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes 
"nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports a checksum validation failure on the sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair starts on node1, it performs anti-compaction, which modifies the sstable's repairedAt to 0 and its pending repair id to the session id.
> 2. Node1 then creates a {{ComponentManifest}} containing the file lengths to be transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast a repair-failure message to all participants in {{CoordinatorSession#fail}}.
> 4. Node1 receives its own repair-failure message and fails its local repair sessions in {{LocalSessions#failSession}}, which triggers an async background compaction.
> 5. Node1's background compaction mutates the sstable's repairedAt to 0 and its pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair.
> 6. Node1 then sends the sstable to node3, where the sstable's STATS component size differs from the original size recorded in the manifest.
> 7. At the end, node3 reports a checksum validation failure when it tries to mutate the sstable level and the "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> Currently, entire-sstable-streaming requires sstable components to be immutable, because the {{ComponentManifest}}
> with component sizes are
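The race described in the ticket reduces to: the sender snapshots a component's length into the manifest, a concurrent compaction then rewrites that component in place, and the receiver's validation against the snapshot fails. The sketch below shows only that core sequence with an in-memory temp file; the file name and byte contents are hypothetical and nothing here is Cassandra's actual serialization.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the ZCS race: manifest snapshot (step 2), in-place mutation of
// the Statistics component by background compaction (step 5), then failed
// validation of the streamed bytes against the manifest (step 7).
public class ManifestRaceSketch {
    public static boolean transferValidates() {
        try {
            Path stats = Files.createTempFile("na-2-big-Statistics", ".db");
            Files.write(stats, new byte[]{1, 2, 3, 4});      // original component

            long manifestLength = Files.size(stats);         // step 2: manifest snapshot

            // step 5: background compaction rewrites the component in place
            // (e.g. clearing the pending repair id), changing its length
            Files.write(stats, new byte[]{1, 2, 3, 4, 5, 6});

            long streamedLength = Files.size(stats);         // step 6: bytes actually sent
            Files.deleteIfExists(stats);
            return streamedLength == manifestLength;         // step 7: receiver's check
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transferValidates() ? "checksum ok" : "validation failure"); // prints "validation failure"
    }
}
```

This is why the fix space is either to make components immutable during the transfer or to hard-link/snapshot them before building the manifest.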
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861:
-
Test and Documentation Plan: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/300/workflows/f41f6585-cd97-4791-abdc-a2935694948e] (was: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/298/workflows/4e56c8b2-a998-4785-9daf-0ceee52a9a83])
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188976#comment-17188976 ] ZhaoYang commented on CASSANDRA-15861:
--
Pushed a unit test to verify that "compressionMetadata" is used to calculate the transferred size for compressed sstables.
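The point of that unit test is that for a compressed sstable the bytes streamed are the on-disk compressed chunks, so the transfer size must be derived from the compression metadata's chunk lengths rather than the uncompressed data length. A small sketch of that distinction, using plain {{java.util.zip}} with arbitrary illustrative chunk sizes (not Cassandra's compression code):

```java
import java.util.zip.Deflater;

// Sketch: transferred size for compressed data is the sum of compressed
// chunk lengths, not the uncompressed data length. Chunk size and contents
// are arbitrary illustrative values.
public class CompressedSizeSketch {
    static int compressedChunkLength(byte[] chunk) {
        Deflater deflater = new Deflater();
        deflater.setInput(chunk);
        deflater.finish();
        byte[] out = new byte[chunk.length * 2 + 64]; // ample room for worst case
        int n = deflater.deflate(out);                // bytes actually on disk
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        int chunkSize = 4096;
        byte[] chunk = new byte[chunkSize];           // highly compressible zeros
        long uncompressed = 0, transferred = 0;
        for (int i = 0; i < 8; i++) {                 // 8 chunks of data
            uncompressed += chunkSize;
            transferred += compressedChunkLength(chunk);
        }
        // Sizing the transfer from uncompressed length would badly overstate
        // (and mis-validate) what is actually sent.
        System.out.println("uncompressed=" + uncompressed + " transferred=" + transferred);
    }
}
```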
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861:
-
Test and Documentation Plan: [https://app.circleci.com/pipelines/github/jasonstack/cassandra/298/workflows/4e56c8b2-a998-4785-9daf-0ceee52a9a83] (was: https://circleci.com/workflow-run/610e8169-e60c-420b-a556-4120967db6cb)
[jira] [Created] (CASSANDRA-16092) Add Index Group Interface for Storage Attached Index
ZhaoYang created CASSANDRA-16092:
Summary: Add Index Group Interface for Storage Attached Index
Key: CASSANDRA-16092
URL: https://issues.apache.org/jira/browse/CASSANDRA-16092
Project: Cassandra
Issue Type: New Feature
Components: Feature/SASI
Reporter: ZhaoYang
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188178#comment-17188178 ] ZhaoYang edited comment on CASSANDRA-15861 at 9/1/20, 6:46 AM:
---
bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again?

hmm.. some moved code in {{CassandraOutgoingFiles}}. What are you worried about?

bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right?

This patch is for zero-copy streaming, to avoid partially written files, not about how sizes are calculated. The change in {{CassandraStreamHeader}} around compressed size restores the original (reduced-GC) behavior from before the storage-engine refactoring.

was (Author: jasonstack):
bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again?

hmm.. some moved code in {{CassandraOutgoingFiles}}. What are you worried about?

bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right?

This patch is for zero-copy streaming, to avoid partially written files, not about how sizes are calculated. The change in {{CassandraStreamHeader}} around compressed size restores the original behavior from before the storage-engine refactoring.
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188178#comment-17188178 ] ZhaoYang edited comment on CASSANDRA-15861 at 9/1/20, 6:45 AM:
---
bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again?

hmm.. some moved code in {{CassandraOutgoingFiles}}. What are you worried about?

bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right?

This patch is for zero-copy streaming, to avoid partially written files, not about how sizes are calculated. The change in {{CassandraStreamHeader}} around compressed size restores the original behavior from before the storage-engine refactoring.

was (Author: jasonstack):
bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again?

hmm.. some moved code in {{CassandraOutgoingFiles}}. What are you worried about?

bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right?

This patch is for zero-copy streaming, to avoid partially written files, not about how sizes are calculated. The change in {{ComponentMetadata}} around compressed size restores the original behavior from before the storage-engine refactoring.
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188178#comment-17188178 ]

ZhaoYang edited comment on CASSANDRA-15861 at 9/1/20, 6:44 AM:
---
bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again?

hmm.. some moved code in {{CassandraOutgoingFiles}}. What are you worried about?

bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right?

this patch is about zero-copy streaming avoiding partially written files, not about how sizes are calculated. The change in {{ComponentMetadata}} around compressed size restores the original behavior from before the storage-engine refactoring.

was (Author: jasonstack):
bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again?

hmm.. some moved code in {{CassandraOutgoingFiles}}. What are you worried about?

bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right?

this patch is about zero-copy streaming avoiding partially written files, not about how sizes are calculated.
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188178#comment-17188178 ]

ZhaoYang commented on CASSANDRA-15861:
--
bq. Could you elaborate on what needs to be changed specifically in mine code so it will be fully ok again?

hmm.. some moved code in {{CassandraOutgoingFiles}}. What are you worried about?

bq. Are you already sure that your changes are computing sizes in both compressed and uncompressed paths right?

this patch is about zero-copy streaming avoiding partially written files, not about how sizes are calculated.
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188166#comment-17188166 ]

ZhaoYang commented on CASSANDRA-15861:
--
[~stefan.miklosovic] I took a brief look at the patch in 15406; I think there are only minor, superficial conflicts and no compatibility issues.
[jira] [Updated] (CASSANDRA-16076) Batch schema statements to create multiple SASI and MV at once to reduce disk IO
[ https://issues.apache.org/jira/browse/CASSANDRA-16076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16076:
-
Summary: Batch schema statements to create multiple SASI and MV at once to reduce disk IO (was: Batch schema statement to create multiple SASI and MV at once to reduce disk IO)

> Batch schema statements to create multiple SASI and MV at once to reduce disk IO
>
> Key: CASSANDRA-16076
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16076
> Project: Cassandra
> Issue Type: New Feature
> Components: Feature/Materialized Views, Feature/SASI
> Reporter: ZhaoYang
> Priority: Normal
> Fix For: 4.x
>
> Currently, an operator has to create SASI indexes and MVs one by one on the same table, and every index/view build needs to read all the data on disk.
> In order to speed up the creation of multiple SASI indexes/MVs, I propose adding a new batch schema statement that creates them at once, so that C* only needs to read the on-disk data once.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16076) Batch schema statement to create multiple SASI and MV at once to reduce disk IO
[ https://issues.apache.org/jira/browse/CASSANDRA-16076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16076:
-
Summary: Batch schema statement to create multiple SASI and MV at once to reduce disk IO (was: Batch schema statement to multiple SASI and MV at once to reduce disk IO)
[jira] [Created] (CASSANDRA-16076) Batch schema statement to multiple SASI and MV at once to reduce disk IO
ZhaoYang created CASSANDRA-16076:

Summary: Batch schema statement to multiple SASI and MV at once to reduce disk IO
Key: CASSANDRA-16076
URL: https://issues.apache.org/jira/browse/CASSANDRA-16076
Project: Cassandra
Issue Type: New Feature
Components: Feature/Materialized Views, Feature/SASI
Reporter: ZhaoYang

Currently, an operator has to create SASI indexes and MVs one by one on the same table, and every index/view build needs to read all the data on disk. In order to speed up the creation of multiple SASI indexes/MVs, I propose adding a new batch schema statement that creates them at once, so that C* only needs to read the on-disk data once.
[jira] [Updated] (CASSANDRA-16076) Batch schema statement to create multiple SASI and MV at once to reduce disk IO
[ https://issues.apache.org/jira/browse/CASSANDRA-16076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-16076:
-
Fix Version/s: 4.x
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185031#comment-17185031 ] ZhaoYang commented on CASSANDRA-15861: -- [~bdeggleston] I have restored previous commits, sorry for the trouble > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes 
"nodetool repair" on node1 and kills node2 during repair. At the end, node3 reports checksum validation failure on sstable transferred from node1.
> {code:java|title=what happened}
> 1. When repair started on node1, it performs anti-compaction which modifies sstable's repairAt to 0 and pending repair id to session-id.
> 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be transferred to node3.
> 3. Before node1 actually sends the files to node3, node2 is killed and node1 starts to broadcast repair-failure-message to all participants in {{CoordinatorSession#fail}}
> 4. Node1 receives its own repair-failure-message and fails its local repair sessions at {{LocalSessions#failSession}} which triggers async background compaction.
> 5. Node1's background compaction will mutate sstable's repairAt to 0 and pending repair id to null via {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more in-progress repair.
> 6. Node1 actually sends the sstable to node3 where the sstable's STATS component size is different from the original size recorded in the manifest.
> 7. At the end, node3 reports checksum validation failure when it tries to mutate sstable level and "isTransient" attribute in {{CassandraEntireSSTableStreamReader#read}}.
> {code}
> Currently, entire-sstable-streaming requires sstable components to be immutable, because {{ComponentManifest}} with component sizes is sent before sending actual files. This isn't a problem in legacy streaming as the STATS file length didn't matter.
>
> Ideally it will be great to
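For illustration, the stale-manifest race in steps 2-6 above reduces to a size snapshot going out of date. The sketch below is illustrative only, not Cassandra code: the class name and file contents are made up, and a plain temp file stands in for the STATS component.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: a manifest snapshots a component's length before
// transfer; a concurrent mutation of that component makes the snapshot
// stale, which is what the receiver's validation detects.
public class ManifestRaceSketch
{
    static boolean raceDetected()
    {
        try
        {
            Path stats = Files.createTempFile("Statistics", ".db");
            Files.write(stats, new byte[] { 1, 2, 3, 4 });

            // Step 2: the sender records the component length in its manifest.
            long manifestLength = Files.size(stats);

            // Step 5: a concurrent metadata mutation rewrites the component
            // (simulated here by appending a byte).
            Files.write(stats, new byte[] { 5 }, StandardOpenOption.APPEND);

            // Steps 6-7: the on-disk size no longer matches the manifest,
            // so transfer-time validation fails.
            boolean mismatch = Files.size(stats) != manifestLength;
            Files.deleteIfExists(stats);
            return mismatch;
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(raceDetected()); // prints "true"
    }
}
```

This is why the fix needs either immutable components during the transfer or a manifest taken under the same guard as the mutation.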
[jira] [Commented] (CASSANDRA-16071) max_compaction_flush_memory_in_mb is interpreted as bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184111#comment-17184111 ]

ZhaoYang commented on CASSANDRA-16071:
--
Thanks for the patch. LGTM.

> max_compaction_flush_memory_in_mb is interpreted as bytes
> -
>
> Key: CASSANDRA-16071
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16071
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/SASI
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> In CASSANDRA-12662, [~scottcarey] [reported|https://issues.apache.org/jira/browse/CASSANDRA-12662?focusedCommentId=17070055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17070055] that the {{max_compaction_flush_memory_in_mb}} setting gets incorrectly interpreted in bytes rather than megabytes as its name implies.
> {quote}
> 1. the setting 'max_compaction_flush_memory_in_mb' is a misnomer, it is actually memory in BYTES. If you take it at face value, and set it to say, '512' thinking that means 512MB, you will produce a million temp files rather quickly in a large compaction, which will exhaust even large values of max_map_count rapidly, and get the OOM: Map Error issue above and possibly have a very difficult situation to get a cluster back into a place where nodes aren't crashing while initializing or soon after. This issue is minor if you know about it in advance and set the value IN BYTES.
> {quote}
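The report above is essentially a unit mix-up. A minimal sketch of the pitfall follows; the class and method names are hypothetical and not the actual SASI code, which reads the option in {{IndexMode}}.

```java
// Hypothetical sketch of the reported pitfall: a setting named *_in_mb is
// consumed as a raw byte count, so "512" (meant as 512 MB) becomes 512 bytes.
public class FlushMemoryUnits
{
    static final long ONE_MB = 1048576L;

    // Buggy reading: the parsed value is used directly as bytes.
    static long buggyLimitBytes(String maxFlushMemoryInMb)
    {
        return Long.parseLong(maxFlushMemoryInMb);
    }

    // Fixed reading: megabytes are converted to bytes before use.
    static long fixedLimitBytes(String maxFlushMemoryInMb)
    {
        return ONE_MB * Long.parseLong(maxFlushMemoryInMb);
    }

    public static void main(String[] args)
    {
        System.out.println(buggyLimitBytes("512")); // prints "512" (512 bytes!)
        System.out.println(fixedLimitBytes("512")); // prints "536870912" (512 MB)
    }
}
```

With the buggy reading, a 512 MB flush budget becomes a 512-byte one, so SASI flushes a tiny temp index file almost continuously, which is how the reporter exhausted max_map_count.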
[jira] [Comment Edited] (CASSANDRA-16071) max_compaction_flush_memory_in_mb is interpreted as bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183177#comment-17183177 ]

ZhaoYang edited comment on CASSANDRA-16071 at 8/24/20, 11:57 AM:
-
{code:java}
long maxMemMb = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
              ? (long) (1048576 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
              : Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[~mck] I think the default {{"1048576 * 0.15 mb" (153GB)}} may still cause OOM. How about we rename {{"maxMemMb"}} to {{"maxMemBytes"}} and rename {{"IndexMode#maxCompactionFlushMemoryInMb"}} to {{"maxCompactionFlushMemoryInBytes"}}? So it should be:
{code:java}
long maxMemBytes = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
                 ? (long) (1073741824 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
                 : 1048576L * Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}

was (Author: jasonstack):
{code:java}
long maxMemMb = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
              ? (long) (1048576 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
              : Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[~mck] I think the default {{"1048576 * 0.15 mb" (153GB)}} may still cause OOM. How about we rename {{"maxMemMb"}} to {{"maxMemBytes"}} and rename {{"IndexMode#maxCompactionFlushMemoryInMb"}} to {{"maxCompactionFlushMemoryInBytes"}}? So it should be:
{code:java}
long maxMemBytes = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
                 ? (long) (1073741824 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
                 : 1048576L * Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[jira] [Comment Edited] (CASSANDRA-16071) max_compaction_flush_memory_in_mb is interpreted as bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183177#comment-17183177 ]

ZhaoYang edited comment on CASSANDRA-16071 at 8/24/20, 11:53 AM:
-
{code:java}
long maxMemMb = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
              ? (long) (1048576 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
              : Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[~mck] I think the default {{"1048576 * 0.15 mb" (153GB)}} may still cause OOM. How about we rename {{"maxMemMb"}} to {{"maxMemBytes"}} and rename {{"IndexMode#maxCompactionFlushMemoryInMb"}} to {{"maxCompactionFlushMemoryInBytes"}}? So it should be:
{code:java}
long maxMemBytes = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
                 ? (long) (1073741824 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
                 : 1048576L * Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}

was (Author: jasonstack):
{code:java}
long maxMemMb = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
              ? (long) (1048576 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
              : Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[~mck] I think the default {{"1048576 * 0.15 mb"}} may still cause OOM. How about we rename {{"maxMemMb"}} to {{"maxMemBytes"}} and rename {{"IndexMode#maxCompactionFlushMemoryInMb"}} to {{"maxCompactionFlushMemoryInBytes"}}? So it should be:
{code:java}
long maxMemBytes = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
                 ? (long) (1073741824 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
                 : 1048576L * Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[jira] [Commented] (CASSANDRA-16071) max_compaction_flush_memory_in_mb is interpreted as bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183177#comment-17183177 ] ZhaoYang commented on CASSANDRA-16071:
--
{code:java}
long maxMemMb = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
              ? (long) (1048576 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
              : Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[~mck] I think the default of {{1048576 * 0.15}} MB may still cause OOM. How about we rename {{maxMemMb}} to {{maxMemBytes}}? So it should be:
{code:java}
long maxMemBytes = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
                 ? (long) (1073741824 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // 1G default for memtable
                 : 1048576L * Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
{code}
[jira] [Updated] (CASSANDRA-16071) max_compaction_flush_memory_in_mb is interpreted as bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-16071:
-
Test and Documentation Plan: CI running
Status: Patch Available (was: In Progress)
[jira] [Updated] (CASSANDRA-16071) max_compaction_flush_memory_in_mb is interpreted as bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-16071:
-
Reviewers: ZhaoYang
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181851#comment-17181851 ] ZhaoYang commented on CASSANDRA-15861: -- updated the patch based on caleb's builder approach, now dfile/ifile/bf/indexSummary are all final. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes 
"nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because \{{ComponentManifest}} > with component sizes are sent before sending actual files. This isn't a > problem in legacy streaming as STATS file length didn't matter. >
[jira] [Updated] (CASSANDRA-16052) CEP-7 Storage Attached Index for Apache Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-16052:
-
Summary: CEP-7 Storage Attached Index for Apache Cassandra (was: Storage Attached Index for Apache Cassandra)

> CEP-7 Storage Attached Index for Apache Cassandra
> -
>
> Key: CASSANDRA-16052
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16052
> Project: Cassandra
> Issue Type: Epic
> Components: Feature/2i Index
> Reporter: ZhaoYang
> Priority: Normal
>
> [CEP|https://docs.google.com/document/d/1V830eAMmQAspjJdjviVZIaSolVGvZ1hVsqOLWyV0DS4/edit#heading=h.67ap6rr1mxr] - A new index implementation, called Storage Attached Index (SAI), based on the advancements made by SASI. It improves:
> * disk usage, by sharing common data between multiple column indexes on the same table and better compressing on-disk structures;
> * numeric range query performance, with a modified KD-tree and collection type support;
> * compaction performance and stability for larger data sets.
[jira] [Created] (CASSANDRA-16052) Storage Attached Index for Apache Cassandra
ZhaoYang created CASSANDRA-16052:
Summary: Storage Attached Index for Apache Cassandra
Key: CASSANDRA-16052
URL: https://issues.apache.org/jira/browse/CASSANDRA-16052
Project: Cassandra
Issue Type: Epic
Components: Feature/2i Index
Reporter: ZhaoYang
[jira] [Commented] (CASSANDRA-16036) Add flag to disable chunk cache and disable by default
[ https://issues.apache.org/jira/browse/CASSANDRA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176756#comment-17176756 ] ZhaoYang commented on CASSANDRA-16036:
--
[~dcapwell] Sorry, I won't be able to look deeper into the chunk cache this week. But based on the comparison between the 3.0 baseline, 4.0 with chunk cache, and 4.0 without chunk cache, disabling the chunk cache didn't bridge the gap between 3.0 and 4.0. I wonder if something else besides the chunk cache is affecting the perf. Do you have JFR?

> Add flag to disable chunk cache and disable by default
> -
>
> Key: CASSANDRA-16036
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16036
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: clustering-in-clause_latency_selects_baseline.png, clustering-in-clause_latency_under90_selects_baseline.png, clustering-slice_latency_selects_baseline.png, clustering-slice_latency_under90_selects_baseline.png, medium-blobs_latency_selects_baseline.png, medium-blobs_latency_under90_selects_baseline.png, partition-single-row-read_latency_selects_baseline.png, partition-single-row-read_latency_under90_selects_baseline.png
>
> Chunk cache is enabled by default and doesn’t have a flag to disable without impacting networking. In performance testing 4.0 against 3.0 I found that reads were slower in 4.0 and after profiling found that the ChunkCache was partially to blame; after disabling the chunk cache, read performance had improved.
> {code} > 40_w_cc-selects.hdr > #[Mean= 11.50063, StdDeviation = 13.44014] > #[Max =482.41254, Total count= 316477] > #[Buckets = 25, SubBuckets = 262144] > 40_wo_cc-selects.hdr > #[Mean= 9.82115, StdDeviation = 10.14270] > #[Max =522.36493, Total count= 317444] > #[Buckets = 25, SubBuckets = 262144] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
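The flag requested in this ticket amounts to choosing between a caching read path and a pass-through one at startup. Below is a toy sketch of that shape; all names here, including the boolean flag, are invented for illustration and are not Cassandra's actual classes or config keys:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ChunkCacheToggle
{
    interface Reader
    {
        byte[] read(long position);
    }

    /** Pass-through: every read goes to the underlying source. */
    static Reader direct(Function<Long, byte[]> source)
    {
        return source::apply;
    }

    /** Caching layer: memoizes chunks by position; this is what the flag bypasses. */
    static Reader cached(Function<Long, byte[]> source)
    {
        Map<Long, byte[]> cache = new HashMap<>();
        return position -> cache.computeIfAbsent(position, source::apply);
    }

    /** Mirrors a hypothetical "chunk cache enabled" switch read at startup. */
    static Reader build(boolean chunkCacheEnabled, Function<Long, byte[]> source)
    {
        return chunkCacheEnabled ? cached(source) : direct(source);
    }

    public static void main(String[] args)
    {
        int[] calls = {0};
        Function<Long, byte[]> source = pos -> { calls[0]++; return new byte[]{pos.byteValue()}; };

        Reader withCache = build(true, source);
        withCache.read(42); withCache.read(42);
        assert calls[0] == 1; // second read served from the cache

        Reader noCache = build(false, source);
        noCache.read(42); noCache.read(42);
        assert calls[0] == 3; // both reads hit the source
    }
}
```

The point of the pass-through shape is that disabling the cache is purely local to the read path and leaves the rest of the stack (e.g. networking buffers) untouched.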
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861:
-
Test and Documentation Plan: https://circleci.com/workflow-run/610e8169-e60c-420b-a556-4120967db6cb (was: https://circleci.com/workflow-run/9e2af3a1-7b63-423d-8cde-d2cd178c81d6)
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175412#comment-17175412 ] ZhaoYang edited comment on CASSANDRA-15861 at 8/11/20, 10:48 AM:
-
bq. 1) Orphaned hard links need to be cleaned up on startup.
If the hard links end with `.tmp`, they will be cleaned up on startup by {{StartupChecks#checkSystemKeyspaceState}}.
bq. 2) Using the streaming session id for the hard link name, instead of a time uuid, would make debugging some issues easier.
I think the same streaming plan id is used by different peers, so it may fail to create a hard link when streaming the same sstables to different peers in the same stream plan.
bq. We could leave ComponentManifest the way it was before this patch and have a separate class, let's call it ComponentContext, that embeds it.
+1
bq. In this case, if you could guarantee that no more than 1 index resample can happen at once for a given sstable, the only thing you'd need to synchronize in `cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could just synchronize hard link creation on `tidy.global`, instead of introducing a new lock.
Agreed with Caleb: no more than one index resample can happen concurrently for a given sstable, as the sstable is marked as compacting before resampling.
bq. That leaves indexSummary, which perhaps we could make volatile, and all the state used in cloneAndReplace()...but we could just extend the synchronized (tidy.global) block to include the latter. Nothing expensive happens inside cloneAndReplace(), AFAICT.
Good idea.
bq. synchronized (tidy.global)
The old approach was to synchronize the entire streaming phase, so I didn't use "synchronized (tidy.global)", which may block concurrent compactions. But now that only hard-link creation is synchronized, using "synchronized (tidy.global)" is better than introducing a new lock.

was (Author: jasonstack):
bq. 1) Orphaned hard links need to be cleaned up on startup.
If the hard links end with `.tmp`, they will be cleaned up on startup by {{StartupChecks#checkSystemKeyspaceState}}.
bq. 2) Using the streaming session id for the hard link name, instead of a time uuid, would make debugging some issues easier.
+1
bq. We could leave ComponentManifest the way it was before this patch and have a separate class, let's call it ComponentContext, that embeds it.
+1
bq. In this case, if you could guarantee that no more than 1 index resample can happen at once for a given sstable, the only thing you'd need to synchronize in `cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could just synchronize hard link creation on `tidy.global`, instead of introducing a new lock.
Agreed with Caleb: no more than one index resample can happen concurrently for a given sstable, as the sstable is marked as compacting before resampling.
bq. That leaves indexSummary, which perhaps we could make volatile, and all the state used in cloneAndReplace()...but we could just extend the synchronized (tidy.global) block to include the latter. Nothing expensive happens inside cloneAndReplace(), AFAICT.
Good idea.
bq. synchronized (tidy.global)
The old approach was to synchronize the entire streaming phase, so I didn't use "synchronized (tidy.global)", which may block concurrent compactions. But now that only hard-link creation is synchronized, using "synchronized (tidy.global)" is better than introducing a new lock.
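The locking conclusion above (reuse the existing per-sstable monitor and keep the critical section down to the cheap hard-link/length-capture step) can be sketched as follows; the field and method names are stand-ins for illustration, not the real SSTableReader code:

```java
public class SnapshotLocking
{
    // Stand-in for the per-sstable tidy.global monitor that both the
    // streaming path and index-summary redistribution already share.
    private final Object tidyGlobal = new Object();

    private long componentLength = 4;   // mutable component state, guarded by tidyGlobal
    private long manifestLength = -1;   // length frozen for the stream manifest

    /** Streaming side: only the cheap link/length capture runs under the lock. */
    void snapshotForStreaming()
    {
        synchronized (tidyGlobal)
        {
            manifestLength = componentLength; // stands in for hard-link creation
        }
        // the expensive byte transfer would happen here, outside the lock
    }

    /** Compaction/resample side: component rewrites take the same monitor. */
    void mutateComponent(long newLength)
    {
        synchronized (tidyGlobal)
        {
            componentLength = newLength;
        }
    }

    long manifestLength()
    {
        synchronized (tidyGlobal)
        {
            return manifestLength;
        }
    }
}
```

Because both sides synchronize on the same pre-existing monitor, a mutation can never interleave with the length capture, and no new lock object is introduced.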
> Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Stat
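For readers following the locking discussion above, the agreed approach (serialize only hard-link creation on the sstable's global tidy lock, rather than the whole streaming phase) can be sketched roughly as below. This is a hypothetical, simplified illustration and not the actual patch: `globalTidyLock` stands in for `tidy.global`, and real component mutation goes through much more machinery.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: serialize only hard-link creation, not the whole
// streaming phase, so concurrent compactions are not blocked for long.
final class HardLinkSnapshot
{
    static List<Path> createHardLinks(Object globalTidyLock, List<Path> components, Path linkDir) throws IOException
    {
        synchronized (globalTidyLock) // component mutators must hold the same lock while swapping files
        {
            List<Path> links = new ArrayList<>();
            for (Path component : components)
            {
                Path link = linkDir.resolve(component.getFileName());
                // A later delete-and-rewrite of the original leaves this link
                // pointing at the old, consistent contents.
                Files.createLink(link, component);
                links.add(link);
            }
            return links;
        }
    }
}
```

The streaming side then sends the hard-linked files, which cannot be swapped out from under it by a concurrent metadata mutation.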
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175412#comment-17175412 ] ZhaoYang edited comment on CASSANDRA-15861 at 8/11/20, 9:47 AM: bq. 1) Orphaned hard links need to be cleaned up on startup. If the hard links end with `.tmp`, they will be cleaned up on startup by {{StartupChecks#checkSystemKeyspaceState}} bq. 2) Using the streaming session id for the hard link name, instead of a time uuid, would make debugging some issues easier. +1 bq. We could leave ComponentManifest the way it was before this patch and have a separate class, let's call it ComponentContext, that embeds it. +1 bq. In this case, if you could guarantee that no more than 1 index resample can happen at once for a given sstable, the only thing you'd need to synchronize in `cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could just synchronize hard link creation on `tidy.global`, instead of introducing a new lock. Agreed with Caleb: no more than 1 index resample can happen concurrently for a given sstable, as the sstable is marked as compacting before resampling. bq. That leaves indexSummary, which perhaps we could make volatile, and all the state used in cloneAndReplace()...but we could just extend the synchronized (tidy.global) block to include the latter. Nothing expensive happens inside cloneAndReplace(), AFAICT. Good idea. bq. synchronized (tidy.global) The old approach was to synchronize the entire streaming phase, so I didn't use "synchronized (tidy.global)", which may block concurrent compactions. But now that only hard-link creation is synchronized, using "synchronized (tidy.global)" is better than introducing a new lock. was (Author: jasonstack): bq. 1) Orphaned hard links need to be cleaned up on startup. If the hard links end with `.tmp`, they will be cleaned up on startup by {{StartupChecks#checkSystemKeyspaceState}} bq. 
2) Using the streaming session id for the hard link name, instead of a time uuid, would make debugging some issues easier. +1 bq. We could leave ComponentManifest the way it was before this patch and have a separate class, let's call it ComponentContext, that embeds it. +1 bq. In this case, if you could guarantee that no more than 1 index resample can happen at once for a given sstable, the only thing you'd need to synchronize in `cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could just synchronize hard link creation on `tidy.global`, instead of introducing a new lock. Agreed with Caleb: no more than 1 index resample can happen concurrently for a given sstable, as the sstable is marked as compacting before resampling. bq. That leaves indexSummary, which perhaps we could make volatile, and all the state used in cloneAndReplace()...but we could just extend the synchronized (tidy.global) block to include the latter. Nothing expensive happens inside cloneAndReplace(), AFAICT. Good idea. bq. synchronized (tidy.global) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.muta
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175412#comment-17175412 ] ZhaoYang commented on CASSANDRA-15861: -- bq. 1) Orphaned hard links need to be cleaned up on startup. If the hard links end with `.tmp`, they will be cleaned up on startup by {{StartupChecks#checkSystemKeyspaceState}} bq. 2) Using the streaming session id for the hard link name, instead of a time uuid, would make debugging some issues easier. +1 bq. We could leave ComponentManifest the way it was before this patch and have a separate class, let's call it ComponentContext, that embeds it. +1 bq. In this case, if you could guarantee that no more than 1 index resample can happen at once for a given sstable, the only thing you'd need to synchronize in `cloneWithNewSummarySamplingLevel` is `saveSummary`. If you did that, you could just synchronize hard link creation on `tidy.global`, instead of introducing a new lock. Agreed with Caleb: no more than 1 index resample can happen concurrently for a given sstable, as the sstable is marked as compacting before resampling. bq. That leaves indexSummary, which perhaps we could make volatile, and all the state used in cloneAndReplace()...but we could just extend the synchronized (tidy.global) block to include the latter. Nothing expensive happens inside cloneAndReplace(), AFAICT. Good idea. bq. 
synchronized (tidy.global) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3.
[jira] [Updated] (CASSANDRA-16044) Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or RangeAwaredCompaction
[ https://issues.apache.org/jira/browse/CASSANDRA-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-16044: - Fix Version/s: 4.x > Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or > RangeAwaredCompaction > > > Key: CASSANDRA-16044 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16044 > Project: Cassandra > Issue Type: Improvement > Components: Feature/SASI >Reporter: ZhaoYang >Priority: Normal > Fix For: 4.x > > > Currently SASI searches all SSTable indexes that may include the query > partition key and indexed term, but this will cause large IO overhead with > range index query (ie. age > 18) when sstable count is huge. > Proposed improvement: query sstable indexes in token-sorted-runs lazily. When > the data in the first few token ranges is sufficient for limit, SASI can > reduce the overhead of searching sstable indexes for the remaining ranges. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16044) Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or RangeAwaredCompaction
[ https://issues.apache.org/jira/browse/CASSANDRA-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-16044: - Summary: Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or RangeAwaredCompaction (was: Query SSTable Indexes in token sorted runs for LCS and TWCS) > Query SSTable Indexes lazily in token sorted runs for LCS, TWCS or > RangeAwaredCompaction > > > Key: CASSANDRA-16044 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16044 > Project: Cassandra > Issue Type: Improvement > Components: Feature/SASI >Reporter: ZhaoYang >Priority: Normal > > Currently SASI searches all SSTable indexes that may include the query > partition key and indexed term, but this will cause large IO overhead with > range index query (ie. age > 18) when sstable count is huge. > Proposed improvement: query sstable indexes in token-sorted-runs lazily. When > the data in the first few token ranges is sufficient for limit, SASI can > reduce the overhead of searching sstable indexes for the remaining ranges. 
[jira] [Created] (CASSANDRA-16044) Query SSTable Indexes in token sorted runs for LCS and TWCS
ZhaoYang created CASSANDRA-16044: Summary: Query SSTable Indexes in token sorted runs for LCS and TWCS Key: CASSANDRA-16044 URL: https://issues.apache.org/jira/browse/CASSANDRA-16044 Project: Cassandra Issue Type: Improvement Components: Feature/SASI Reporter: ZhaoYang Currently SASI searches all SSTable indexes that may include the query partition key and indexed term, but this will cause large IO overhead with range index query (ie. age > 18) when sstable count is huge. Proposed improvement: query sstable indexes in token-sorted-runs lazily. When the data in the first few token ranges is sufficient for limit, SASI can reduce the overhead of searching sstable indexes for the remaining ranges. 
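The proposed improvement above can be sketched abstractly. In this hypothetical illustration (names and types are simplified stand-ins, with sstables and result rows modeled as plain strings), each "run" is a token-ordered group of sstables such as an LCS level, and the search stops touching further indexes as soon as the LIMIT is satisfied:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of lazy token-sorted-run index search (not SASI's API):
// visit sstable indexes run by run and exit early once LIMIT rows are found,
// instead of searching every candidate sstable index up front.
final class LazyIndexSearch
{
    static List<String> search(List<List<String>> tokenSortedRuns,
                               Function<String, List<String>> searchSSTableIndex,
                               int limit)
    {
        List<String> rows = new ArrayList<>();
        for (List<String> run : tokenSortedRuns)
        {
            for (String sstable : run)
            {
                rows.addAll(searchSSTableIndex.apply(sstable));
                if (rows.size() >= limit)
                    return rows.subList(0, limit); // early exit: remaining indexes are never read
            }
        }
        return rows;
    }
}
```

The IO saving comes entirely from the early return: for a range query like `age > 18` with a small LIMIT, most sstable indexes in later token ranges are never opened.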
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167210#comment-17167210 ] ZhaoYang commented on CASSANDRA-15861: -- [~maedhroz] thanks for the feedback. I have squashed and pushed. There are two types of concurrent component mutations: * index summary redistribution compaction - deletes the index summary and writes a new one * pending repair manager's RepairFinishedCompactionTask - atomically replaces the old stats with a new stats file (delete and rewrite on Windows). In order to avoid streaming a mismatched ComponentManifest and files, the manifest will now create hard links on the mutable components and stream the hard-linked files instead of the original files, which may have been modified. To prevent creating hard links on a partially written index summary or stats file on Windows, a read lock is needed to create hard links and a write lock is needed for saving the index summary and stats metadata. With this approach, only saving the index summary may block entire-sstable streaming, but index summary redistribution is not very frequent. We can get rid of the blocking by writing the index summary to a temp file and replacing the old summary atomically. 
(Note: atomic replace doesn't work on Windows, so we have to delete first) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants i
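The read/write-lock scheme described in the comment above can be sketched as follows. This is an assumption-laden illustration with made-up names, not Cassandra's actual code: streaming takes the shared (read) lock while creating hard links, so multiple streams can snapshot concurrently, while saving the index summary or mutating stats takes the exclusive (write) lock so a link is never taken on a partially written component.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of the locking scheme: many concurrent link creators
// (streams), one exclusive component mutator at a time.
final class ComponentLock
{
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    <T> T withLinkLock(Supplier<T> createLinks)
    {
        lock.readLock().lock();          // shared: hard-link creation for streaming
        try { return createLinks.get(); }
        finally { lock.readLock().unlock(); }
    }

    void withMutationLock(Runnable mutateComponent)
    {
        lock.writeLock().lock();         // exclusive: save index summary / mutate stats
        try { mutateComponent.run(); }
        finally { lock.writeLock().unlock(); }
    }
}
```

The trade-off mirrors the comment: a summary save briefly blocks new streaming snapshots, but redistribution is rare, and link creation never blocks other link creation.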
[jira] [Commented] (CASSANDRA-15665) StreamManager should clearly differentiate between "initiator" and "receiver" sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164532#comment-17164532 ] ZhaoYang commented on CASSANDRA-15665: -- the version barrier is defined in {{MessagingService.accept_streaming}} > StreamManager should clearly differentiate between "initiator" and "receiver" > sessions > -- > > Key: CASSANDRA-15665 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15665 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta1 > > > {{StreamManager}} does currently a suboptimal job in differentiating between > stream sessions (in form of {{StreamResultFuture}}) which have been either > initiated or "received", for the following reasons: > 1) Naming is IMO confusing: a "receiver" session could actually both send and > receive files, so technically an initiator is also a receiver. > 2) {{StreamManager#findSession()}} assumes we should first looking into > "initiator" sessions, then into "receiver" ones: this is a dangerous > assumptions, in particular for test environments where the same process could > work as both an initiator and a receiver. > I would recommend the following changes: > 1) Rename "receiver" with "follower" everywhere the former is used. > 2) Introduce a new flag into {{StreamMessageHeader}} to signal if the message > comes from an initiator or follower session, in order to correctly > differentiate and look for sessions in {{StreamManager}}. > While my arguments above might seem trivial, I believe they will improve > clarity and save from potential bugs/headaches at testing time, and doing > such changes now that we're revamping streaming for 4.0 seems the right time. 
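Suggestion (2) in the quoted description can be illustrated with a small sketch. The class and field names here are illustrative stand-ins, not Cassandra's actual StreamMessageHeader/StreamManager API: a flag in the message header says which side sent the message, so the receiver looks up the session on the correct side instead of guessing initiator-first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical header carrying the sender's role for a stream plan.
final class SessionHeader
{
    final UUID planId;
    final boolean sentByFollower; // true when the peer is the follower of this session

    SessionHeader(UUID planId, boolean sentByFollower)
    {
        this.planId = planId;
        this.sentByFollower = sentByFollower;
    }
}

// Hypothetical registry differentiating initiator and follower sessions.
final class SessionRegistry
{
    private final Map<UUID, String> initiatorSessions = new HashMap<>();
    private final Map<UUID, String> followerSessions = new HashMap<>();

    void registerInitiator(UUID planId, String session) { initiatorSessions.put(planId, session); }
    void registerFollower(UUID planId, String session)  { followerSessions.put(planId, session); }

    // A message sent by the follower belongs to our initiator session, and vice versa,
    // so no initiator-first guessing is needed even when one process plays both roles.
    String findSession(SessionHeader header)
    {
        Map<UUID, String> side = header.sentByFollower ? initiatorSessions : followerSessions;
        return side.get(header.planId);
    }
}
```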
[jira] [Comment Edited] (CASSANDRA-15665) StreamManager should clearly differentiate between "initiator" and "receiver" sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164075#comment-17164075 ] ZhaoYang edited comment on CASSANDRA-15665 at 7/24/20, 1:57 AM: [~maedhroz] does it fail anything? I think we don't allow cross-version streaming between 3.x and 4.0..It's guarded by version when establishing connections. was (Author: jasonstack): [~maedhroz] does it fail anything? I think we don't allow cross-version streaming between 3.x and 4.0.. > StreamManager should clearly differentiate between "initiator" and "receiver" > sessions > -- > > Key: CASSANDRA-15665 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15665 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta1 > > > {{StreamManager}} does currently a suboptimal job in differentiating between > stream sessions (in form of {{StreamResultFuture}}) which have been either > initiated or "received", for the following reasons: > 1) Naming is IMO confusing: a "receiver" session could actually both send and > receive files, so technically an initiator is also a receiver. > 2) {{StreamManager#findSession()}} assumes we should first looking into > "initiator" sessions, then into "receiver" ones: this is a dangerous > assumptions, in particular for test environments where the same process could > work as both an initiator and a receiver. > I would recommend the following changes: > 1) Rename "receiver" with "follower" everywhere the former is used. > 2) Introduce a new flag into {{StreamMessageHeader}} to signal if the message > comes from an initiator or follower session, in order to correctly > differentiate and look for sessions in {{StreamManager}}. 
> While my arguments above might seem trivial, I believe they will improve > clarity and save from potential bugs/headaches at testing time, and doing > such changes now that we're revamping streaming for 4.0 seems the right time. 
[jira] [Commented] (CASSANDRA-15665) StreamManager should clearly differentiate between "initiator" and "receiver" sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164075#comment-17164075 ] ZhaoYang commented on CASSANDRA-15665: -- [~maedhroz] does it fail anything? I think we don't allow cross-version streaming between 3.x and 4.0.. > StreamManager should clearly differentiate between "initiator" and "receiver" > sessions > -- > > Key: CASSANDRA-15665 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15665 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta1 > > > {{StreamManager}} does currently a suboptimal job in differentiating between > stream sessions (in form of {{StreamResultFuture}}) which have been either > initiated or "received", for the following reasons: > 1) Naming is IMO confusing: a "receiver" session could actually both send and > receive files, so technically an initiator is also a receiver. > 2) {{StreamManager#findSession()}} assumes we should first looking into > "initiator" sessions, then into "receiver" ones: this is a dangerous > assumptions, in particular for test environments where the same process could > work as both an initiator and a receiver. > I would recommend the following changes: > 1) Rename "receiver" with "follower" everywhere the former is used. > 2) Introduce a new flag into {{StreamMessageHeader}} to signal if the message > comes from an initiator or follower session, in order to correctly > differentiate and look for sessions in {{StreamManager}}. > While my arguments above might seem trivial, I believe they will improve > clarity and save from potential bugs/headaches at testing time, and doing > such changes now that we're revamping streaming for 4.0 seems the right time. 
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163769#comment-17163769 ] ZhaoYang edited comment on CASSANDRA-15861 at 7/23/20, 5:29 PM: Updated the patch to load stats component into memory, so that entire-sstable streaming will not block LCS and incremental repair.. If we want to reduce the blocking time for index summary redistribution, we can consider: * writing new index summary to a temp file and replacing the old file atomically; at the beginning of streaming, open all file channel instances which still point to the old files (this is file system dependent). * writing new index summary to a temp file and replacing the old file atomically; on the streaming side, use hard link to make sure it streams the same file. WDYT? was (Author: jasonstack): Updated the patch to load stats component into memory, so that entire-sstable streaming will not block LCS and incremental repair.. If we want to reduce the blocking time for index summary redistribution, we can consider: writing new index summary to a temp file and replacing the old file atomically; at the beginning of streaming, open all file channel instances which still point to the old files (this is file system dependent). WDYT? 
> Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) 
> at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes "nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-mess
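The temp-file-plus-atomic-rename option proposed in the comment above can be sketched like this. It is a minimal illustration, not Cassandra's index summary code; the file names and helper are hypothetical. On POSIX file systems, channels opened on the old file before the swap keep reading the old contents (rename() semantics), which is why the comment calls the approach file system dependent:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of "write the new summary to a temp file, swap it in atomically".
public class AtomicReplaceSketch {
    public static void replaceAtomically(Path target, byte[] newContents) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, newContents);
        // The rename is atomic: readers see either the old or the new file,
        // never a partially written one. Already-open channels keep the old
        // inode on POSIX file systems.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
                   StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("summary");
        Path summary = dir.resolve("Summary.db");
        Files.write(summary, "old summary".getBytes());
        replaceAtomically(summary, "new summary".getBytes());
        System.out.println(new String(Files.readAllBytes(summary))); // prints new summary
    }
}
```

The hard-link variant is the same idea from the streaming side: the stream opens a hard link to the component, so a later atomic replace of the canonical name cannot change the bytes being transferred.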
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163769#comment-17163769 ] ZhaoYang commented on CASSANDRA-15861: -- Updated the patch to load the stats component into memory, so that entire-sstable streaming will not block LCS and incremental repair. If we want to reduce the blocking time for index summary redistribution, we can consider: writing the new index summary to a temp file and replacing the old file atomically; at the beginning of streaming, open all file channel instances which still point to the old files (this is file system dependent). WDYT?
[jira] [Updated] (CASSANDRA-15972) SASI should handle ReversedType when using "instanceof" on AbstractType
[ https://issues.apache.org/jira/browse/CASSANDRA-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15972: - Severity: Low (was: Normal) > SASI should handle ReversedType when using "instanceof" on AbstractType > --- > > Key: CASSANDRA-15972 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15972 > Project: Cassandra > Issue Type: Bug > Components: Feature/SASI >Reporter: ZhaoYang >Priority: Low > Fix For: 4.x > > > {code:java} > createTable("CREATE TABLE %s (pk int, ck text, v int, primary key(pk, ck)) > WITH CLUSTERING ORDER BY (ck DESC);"); > createIndex("CREATE CUSTOM INDEX ON %s (ck) USING > 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS" + > " = {'analyzer_class': > 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', > 'case_sensitive': 'false'} "); > {code}
[jira] [Updated] (CASSANDRA-15972) SASI should handle ReversedType when using "instanceof" on AbstractType
[ https://issues.apache.org/jira/browse/CASSANDRA-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15972: - Description: {code:java} createTable("CREATE TABLE %s (pk int, ck text, v int, primary key(pk, ck)) WITH CLUSTERING ORDER BY (ck DESC);"); createIndex("CREATE CUSTOM INDEX ON %s (ck) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS" + " = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'} "); {code} was: {code} createTable("CREATE TABLE %s (pk int, ck text, v int, primary key(pk, ck)) WITH CLUSTERING ORDER BY (ck DESC);"); createIndex("CREATE CUSTOM INDEX ON %s (ck) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS" + " = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'} "); {code}
[jira] [Created] (CASSANDRA-15972) SASI should handle ReversedType when using "instanceof" on AbstractType
ZhaoYang created CASSANDRA-15972: Summary: SASI should handle ReversedType when using "instanceof" on AbstractType Key: CASSANDRA-15972 URL: https://issues.apache.org/jira/browse/CASSANDRA-15972 Project: Cassandra Issue Type: Bug Components: Feature/SASI Reporter: ZhaoYang {code} createTable("CREATE TABLE %s (pk int, ck text, v int, primary key(pk, ck)) WITH CLUSTERING ORDER BY (ck DESC);"); createIndex("CREATE CUSTOM INDEX ON %s (ck) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS" + " = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'} "); {code}
[jira] [Updated] (CASSANDRA-15972) SASI should handle ReversedType when using "instanceof" on AbstractType
[ https://issues.apache.org/jira/browse/CASSANDRA-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15972: - Bug Category: Parent values: Code(13163) Complexity: Normal Discovered By: Code Inspection Fix Version/s: 4.x Severity: Normal Status: Open (was: Triage Needed)
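The ticket's {code} snippet builds a SASI index on a DESC clustering column, which wraps the column's comparator in ReversedType. Below is a minimal model of the bug and of the unwrap-before-instanceof fix pattern, using stand-in classes rather than Cassandra's real type system:

```java
// Minimal model (stand-in classes, not Cassandra's real AbstractType hierarchy)
// of why raw `instanceof` checks break for DESC clustering columns: the
// comparator is wrapped in ReversedType, so `type instanceof UTF8Type` is
// false even though the base type is UTF8.
public class ReversedTypeSketch {
    abstract static class AbstractType {}
    static class UTF8Type extends AbstractType {}
    static class ReversedType extends AbstractType {
        final AbstractType baseType;
        ReversedType(AbstractType baseType) { this.baseType = baseType; }
    }

    // The fix pattern: unwrap ReversedType before any instanceof dispatch.
    static AbstractType unwrap(AbstractType type) {
        return type instanceof ReversedType ? ((ReversedType) type).baseType : type;
    }

    public static void main(String[] args) {
        // Models "ck text ... WITH CLUSTERING ORDER BY (ck DESC)" from the ticket.
        AbstractType ck = new ReversedType(new UTF8Type());
        System.out.println(ck instanceof UTF8Type);         // false: the bug
        System.out.println(unwrap(ck) instanceof UTF8Type); // true: after unwrapping
    }
}
```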
[jira] [Updated] (CASSANDRA-15921) 4.0 quality testing: Materialized View
[ https://issues.apache.org/jira/browse/CASSANDRA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15921: - Description: The main purpose of this ticket is to get a better understanding of 4.0 MV status as a guideline for future improvements. I don't think it should block the 4.0 release since MV is already marked as experimental. Main areas to test: * Write perf: We expect to see a [10% write throughput drop per MV added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. ** Attached C40_MV.png is an alpha-4, 5-node, rf3 MV write test: with 1 MV, throughput dropped 50% * Read perf: identical to a normal table * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 * Repair: write path required * Chaos monkey: take down coordinator/base-replica/view-replica during read/write/token-movement and verify data consistency (may need a tool) * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918 was: The main purpose of this ticket is to get a better understanding of 4.0 MV status as a guideline for future improvements. I don't think it should block the 4.0 release since MV is already marked as experimental. Main areas to test: * Write perf: We expect to see a [10% write throughput drop per MV added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. ** Attached C40_MV.png is an alpha-4, 5-node, rf3 MV write test. * Read perf: identical to a normal table * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 * Repair: write path required * Chaos monkey: take down coordinator/base-replica/view-replica during read/write/token-movement and verify data consistency (may need a tool) * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918
[jira] [Updated] (CASSANDRA-15921) 4.0 quality testing: Materialized View
[ https://issues.apache.org/jira/browse/CASSANDRA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15921: - Description: The main purpose of this ticket is to get a better understanding of 4.0 MV status as a guideline for future improvements. I don't think it should block the 4.0 release since MV is already marked as experimental. Main areas to test: * Write perf: We expect to see a [10% write throughput drop per MV added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. ** Attached C40_MV.png is an alpha-4, 5-node, rf3 MV write test. * Read perf: identical to a normal table * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 * Repair: write path required * Chaos monkey: take down coordinator/base-replica/view-replica during read/write/token-movement and verify data consistency (may need a tool) * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918 was: The main purpose of this ticket is to get a better understanding of 4.0 MV status as a guideline for future improvements. I don't think it should block the 4.0 release since MV is already marked as experimental. Main areas to test: * Write perf: We expect to see a [10% write throughput drop per MV added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. * Read perf: identical to a normal table * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 * Repair: write path required * Chaos monkey: take down coordinator/base-replica/view-replica during read/write/token-movement and verify data consistency (may need a tool) * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918
[jira] [Updated] (CASSANDRA-15921) 4.0 quality testing: Materialized View
[ https://issues.apache.org/jira/browse/CASSANDRA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15921: - Attachment: C40_MV.png
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Test and Documentation Plan: https://circleci.com/workflow-run/9e2af3a1-7b63-423d-8cde-d2cd178c81d6 (was: https://circleci.com/workflow-run/fde45c54-e845-4040-b59e-abcdabda2b29) Status: Patch Available (was: Open)
[jira] [Commented] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160790#comment-17160790 ] ZhaoYang commented on CASSANDRA-15861: -- [~maedhroz] thanks for the suggestions. bq. (where that "completion" happens in the non-SSL case isn't 100% clear to me) The netty streaming itself is async, but {{CassandraEntireSSTableStreamWriter#write}} is actually blocking because {{AsyncStreamingOutputPlus#flush}} will wait for the data to be written to the network. We don't need to worry about it. I ended up with the sstable read/write lock approach: * During entire-sstable streaming, {{CassandraOutgoingFile}} will execute the streaming code within the sstable read-lock, so multiple streaming sessions on the same sstable can start at the same time. I think it's fine to block stats-mutation/index-summary redistribution until streaming completion. * Stats mutation and index summary redistribution perform the component mutation in the sstable write-lock. * Didn't reuse the synchronization on `tidy.global` because it is used in normal compaction tasks, so I added a separate read-write lock. bq. simplest thing might be handling the stats and index summary in slightly different ways. I feel handling stats differently may make it harder to maintain and reason about.
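The read/write-lock scheme described in the comment above can be sketched as follows. Class and method names are hypothetical, not the patch's actual code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Sketch of the locking scheme: concurrent entire-sstable streams share the
// read lock, while a component mutation (stats mutation or index summary
// redistribution) takes the write lock and waits for in-flight streams.
public class SSTableComponentLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Multiple streaming sessions may hold the read lock concurrently.
    public <T> T runWithStreamLock(Supplier<T> stream) {
        lock.readLock().lock();
        try { return stream.get(); }
        finally { lock.readLock().unlock(); }
    }

    // Component mutation is exclusive: it blocks until no stream is in flight,
    // so component files can no longer change size mid-transfer.
    public void runWithMutationLock(Runnable mutate) {
        lock.writeLock().lock();
        try { mutate.run(); }
        finally { lock.writeLock().unlock(); }
    }

    public static void main(String[] args) {
        SSTableComponentLock guard = new SSTableComponentLock();
        int bytesStreamed = guard.runWithStreamLock(() -> 42); // stand-in for the transfer
        guard.runWithMutationLock(() -> { /* rewrite Statistics.db here */ });
        System.out.println(bytesStreamed); // prints 42
    }
}
```

A read/write lock fits the described constraint exactly: streams never change the components (so they can overlap), while a mutation must see no concurrent reader of the on-disk sizes recorded in the ComponentManifest.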
[jira] [Updated] (CASSANDRA-15766) NoSpamLogger arguments building objects on hot paths
[ https://issues.apache.org/jira/browse/CASSANDRA-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15766: - Reviewers: ZhaoYang LGTM > NoSpamLogger arguments building objects on hot paths > > > Key: CASSANDRA-15766 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15766 > Project: Cassandra > Issue Type: Bug > Components: Observability/Logging >Reporter: Jon Meredith >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 1.5h > Remaining Estimate: 0h > > NoSpamLogger is used in hot logging paths to prevent logs being overrun. For > that to be most effective the arguments to the logger need to be cheap to > construct. During the internode messaging refactor CASSANDRA-15066, > performance changes to BufferPool for CASSANDRA-14416 > were accidentally reverted in the merge up from 3.11. > Reviewing other uses since then, it looks like there are a few places where the > arguments require some form of String building. > org.apache.cassandra.net.InboundSink#accept > org.apache.cassandra.net.InboundMessageHandler#processCorruptFrame > org.apache.cassandra.net.InboundMessageHandler.LargeMessage#deserialize > org.apache.cassandra.net.OutboundConnection#onOverloaded > org.apache.cassandra.utils.memory.BufferPool.GlobalPool#allocateMoreChunks > Formatting arguments should either be precomputed, or if expensive they > should be computed after the decision on whether to noSpamLog has been made.
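The recommendation in the last sentence (compute expensive arguments only after the no-spam decision) can be sketched like this. This is a toy rate limiter, not NoSpamLogger's real API:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of the fix the ticket asks for: decide whether this statement is
// rate-limited *before* building any expensive message arguments, so a
// suppressed log costs nothing on the hot path.
public class NoSpamSketch {
    private long lastLoggedNanos = Long.MIN_VALUE; // MIN_VALUE == never logged
    private final long intervalNanos;

    public NoSpamSketch(long intervalNanos) { this.intervalNanos = intervalNanos; }

    public synchronized boolean shouldLog(long nowNanos) {
        if (lastLoggedNanos != Long.MIN_VALUE && nowNanos - lastLoggedNanos < intervalNanos)
            return false; // suppressed: still inside the no-spam interval
        lastLoggedNanos = nowNanos;
        return true;
    }

    // The expensive String building runs only after the decision to log.
    public void log(String template, Supplier<Object> expensiveArg) {
        if (shouldLog(System.nanoTime()))
            System.out.printf(template + "%n", expensiveArg.get());
    }

    public static void main(String[] args) {
        NoSpamSketch logger = new NoSpamSketch(TimeUnit.SECONDS.toNanos(1));
        logger.log("corrupt frame: %s", () -> "header=" + Integer.toHexString(0xCAFE));
        // The supplier below is never invoked: suppressed within the interval.
        logger.log("corrupt frame: %s", () -> "header=" + Integer.toHexString(0xBEEF));
    }
}
```

Passing a Supplier (or precomputed constants) instead of eagerly concatenated Strings is the pattern the ticket asks reviewers to check for at each of the call sites listed above.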
[jira] [Updated] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15861: - Test and Documentation Plan: https://circleci.com/workflow-run/fde45c54-e845-4040-b59e-abcdabda2b29 (was: https://circleci.com/workflow-run/3a2fed2c-c469-4f3f-a620-07079f0dc0db) > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes 
"nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3 where the sstable's STATS > component size is different from the original size recorded in the manifest. > 7. At the end, node3 reports checksum validation failure when it tries to > mutate sstable level and "isTransient" attribute in > {{CassandraEntireSSTableStreamReader#read}}. > {code} > Currently, entire-sstable-streaming requires sstable components to be > immutable, because \{{ComponentManifest}} > with component sizes are sent before sending actual files. This isn't a > problem in legacy streaming as STATS file
[jira] [Updated] (CASSANDRA-15908) Improve messaging on indexing frozen collections
[ https://issues.apache.org/jira/browse/CASSANDRA-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15908: - Reviewers: Bryn Cooke, ZhaoYang, ZhaoYang (was: Bryn Cooke, ZhaoYang) Bryn Cooke, ZhaoYang, ZhaoYang Status: Review In Progress (was: Patch Available) > Improve messaging on indexing frozen collections > > > Key: CASSANDRA-15908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15908 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Rocco Varela >Assignee: Rocco Varela >Priority: Low > Fix For: 4.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When attempting to create an index on a frozen collection the error message > produced can be improved to provide more detail about the problem and > possible workarounds. Currently, a user will receive a message indicating > "...Frozen collections only support full() indexes" which is not immediately > clear for users new to Cassandra indexing and datatype compatibility. > Here is an example: > {code:java} > cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > cqlsh> CREATE TABLE test.mytable ( id int primary key, addresses > frozen> ); > cqlsh> CREATE INDEX mytable_addresses_idx on test.mytable (addresses); > InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot > create values() index on frozen column addresses. Frozen collections only > support full() indexes"{code} > > I'm proposing possibly enhancing the messaging to something like this. > {quote}Cannot create values() index on frozen column addresses. Frozen > collections only support indexes on the entire data structure due to > immutability constraints of being frozen, wrap your frozen column with the > full() target type to index properly. 
> {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15908) Improve messaging on indexing frozen collections
[ https://issues.apache.org/jira/browse/CASSANDRA-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15908: - Test and Documentation Plan: circle ci tests Status: Patch Available (was: Open) > Improve messaging on indexing frozen collections > > > Key: CASSANDRA-15908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15908 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Rocco Varela >Assignee: Rocco Varela >Priority: Low > Fix For: 4.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When attempting to create an index on a frozen collection the error message > produced can be improved to provide more detail about the problem and > possible workarounds. Currently, a user will receive a message indicating > "...Frozen collections only support full() indexes" which is not immediately > clear for users new to Cassandra indexing and datatype compatibility. > Here is an example: > {code:java} > cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > cqlsh> CREATE TABLE test.mytable ( id int primary key, addresses > frozen> ); > cqlsh> CREATE INDEX mytable_addresses_idx on test.mytable (addresses); > InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot > create values() index on frozen column addresses. Frozen collections only > support full() indexes"{code} > > I'm proposing possibly enhancing the messaging to something like this. > {quote}Cannot create values() index on frozen column addresses. Frozen > collections only support indexes on the entire data structure due to > immutability constraints of being frozen, wrap your frozen column with the > full() target type to index properly. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15859) Avoid per-host hinted-handoff throttle being rounded to 0 in large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155307#comment-17155307 ] ZhaoYang commented on CASSANDRA-15859: -- |patch|circle| | [trunk|https://github.com/apache/cassandra/pull/616/files] | [ci|https://circleci.com/workflow-run/63f19a49-568a-4350-b368-9c33eeaa17de] | | [3.11|https://github.com/apache/cassandra/pull/674/files] | [ci|https://circleci.com/workflow-run/f18b7afa-36c7-4d7b-a5a3-9792528cc963] | | [3.0|https://github.com/apache/cassandra/pull/673/files] | [ci|https://circleci.com/workflow-run/e2b22eef-f0b2-4752-a4b6-f1b5766e170c] | ported to 3.0 and 3.11.. > Avoid per-host hinted-handoff throttle being rounded to 0 in large cluster > -- > > Key: CASSANDRA-15859 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15859 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Hints >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > > When "hinted_handoff_throttle_in_kb" is sufficiently small or num of nodes in > the cluster is sufficiently large, the per-host throttle will be rounded to > 0, aka. unthrottled. > > {code:java|title=HintsDispatchExecutor.java} > int throttleInKB = DatabaseDescriptor.getHintedHandoffThrottleInKB() / > nodesCount; > this.rateLimiter = RateLimiter.create(throttleInKB == 0 ? Double.MAX_VALUE : > throttleInKB * 1024); > {code} > [trunk-patch|https://github.com/apache/cassandra/pull/616] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
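The rounding bug quoted in the description, and one possible guard, can be illustrated as follows. The guarded version is a sketch and not necessarily identical to the patch on PR 616:

```java
// Illustration of the integer-division rounding bug, plus one possible guard.
public class ThrottleSketch
{
    // Mirrors the quoted HintsDispatchExecutor logic: integer division can hit 0,
    // and 0 is then treated as "unthrottled" (a Double.MAX_VALUE rate).
    static int perHostThrottleKb(int totalThrottleKb, int nodeCount)
    {
        return totalThrottleKb / nodeCount;
    }

    // Guarded version: keep 0 as the explicit "unthrottled" setting, but never
    // let a non-zero configured throttle silently round down to unthrottled.
    static int perHostThrottleKbFixed(int totalThrottleKb, int nodeCount)
    {
        if (totalThrottleKb == 0)
            return 0;
        return Math.max(1, totalThrottleKb / nodeCount);
    }

    public static void main(String[] args)
    {
        // 1024 KB/s shared across 2000 nodes: the buggy version disables throttling.
        System.out.println(perHostThrottleKb(1024, 2000));      // 0 -> unthrottled
        System.out.println(perHostThrottleKbFixed(1024, 2000)); // 1 KB/s per host
    }
}
```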
[jira] [Commented] (CASSANDRA-10307) Avoid always locking the partition key when a table has a materialized view
[ https://issues.apache.org/jira/browse/CASSANDRA-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154367#comment-17154367 ] ZhaoYang commented on CASSANDRA-10307: -- the lock contention issue will not be a problem in a thread-per-core architecture, but the lock is still needed to prevent racing with a previous insertion that is waiting for async IO from read-before-write. > Avoid always locking the partition key when a table has a materialized view > --- > > Key: CASSANDRA-10307 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10307 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Materialized Views >Reporter: T Jake Luciani >Priority: Normal > Labels: materializedviews > Fix For: 4.x > > > When a table has associated materialized views we must restrict other > concurrent changes to the affected rows. We currently lock the entire > partition. > The issue is that many updates to the same partition on the base table are > now effectively serialized. > We can't lock the primary key instead, because range tombstones cover a range > of rows. > If we created (or perhaps reused, if one already exists) a clustering range > class, we could lock at this level.
[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152771#comment-17152771 ] ZhaoYang commented on CASSANDRA-15900: -- thanks for the review > Close channel and reduce buffer allocation during entire sstable streaming > with SSL > --- > > Key: CASSANDRA-15900 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15900 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-15740 added the ability to stream an entire sstable by loading the > on-disk file into a user-space off-heap buffer when SSL is enabled, because > netty doesn't support zero-copy with SSL. > But there are two issues: > # the file channel is not closed. > # a 1mb batch size is used. 1mb exceeds the buffer pool's max allocation size, > thus it is all allocated outside the pool, causing a large number of > allocations. > [Patch|https://github.com/apache/cassandra/pull/651]: > # close the file channel when the last batch is loaded into the off-heap > bytebuffer. I don't think we need to wait until the buffer is flushed by netty. > # reduce the batch size to 64kb, which is more buffer-pool friendly when > streaming an entire sstable with SSL.
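The batching change can be illustrated with a simplified chunked reader. Names and structure here are assumptions for illustration; the real change lives in Cassandra's entire-sstable streaming path:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Illustrative chunking only: split a file into batches no larger than the
// buffer pool's preferred allocation size, and close the source as soon as
// the last batch has been read (rather than waiting on the network flush).
public class SslBatchSketch
{
    static final int BATCH_SIZE = 64 * 1024;   // 64 KiB: pool-friendly, unlike 1 MiB

    static int countBatches(long fileLength)
    {
        return (int) ((fileLength + BATCH_SIZE - 1) / BATCH_SIZE); // ceiling division
    }

    static int drain(InputStream in, long length)
    {
        int batches = 0;
        byte[] buf = new byte[BATCH_SIZE];     // stands in for an off-heap pool buffer
        long remaining = length;
        try (InputStream source = in)          // closed when the last batch is loaded
        {
            while (remaining > 0)
            {
                int read = source.read(buf, 0, (int) Math.min(BATCH_SIZE, remaining));
                if (read < 0)
                    break;
                remaining -= read;
                batches++;
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return batches;
    }

    public static void main(String[] args)
    {
        long sstableBytes = 1_000_000;
        System.out.println(countBatches(sstableBytes)); // 16 batches of <= 64 KiB
        System.out.println(drain(new ByteArrayInputStream(new byte[(int) sstableBytes]), sstableBytes));
    }
}
```

At 64 KiB per batch, each buffer can come from the pool instead of being allocated (and freed) outside it on every batch.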
[jira] [Updated] (CASSANDRA-15921) 4.0 quality testing: Materialized View
[ https://issues.apache.org/jira/browse/CASSANDRA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15921: - Change Category: Quality Assurance Complexity: Normal Status: Open (was: Triage Needed) > 4.0 quality testing: Materialized View > -- > > Key: CASSANDRA-15921 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15921 > Project: Cassandra > Issue Type: Task > Components: Feature/Materialized Views >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.x > > > The main purpose of this ticket is to get a better understanding of 4.0 MV > status as a guideline for future improvements. I don't think it should block > the 4.0 release since MV is already marked as experimental. > Main areas to test: > * Write perf: We expect to see a [10% write throughput drop per MV > added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. > * Read perf: identical to a normal table > * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 > * Repair: write path required > * Chaos monkey: take down coordinator/base-replica/view-replica during > read/write/token-movement and verify data consistency (may need a tool) > * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 > * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918
[jira] [Updated] (CASSANDRA-15921) 4.0 quality testing: Materialized View
[ https://issues.apache.org/jira/browse/CASSANDRA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15921: - Issue Type: Task (was: Improvement) > 4.0 quality testing: Materialized View > -- > > Key: CASSANDRA-15921 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15921 > Project: Cassandra > Issue Type: Task > Components: Feature/Materialized Views >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.x > > > The main purpose of this ticket is to get a better understanding of 4.0 MV > status as a guideline for future improvements. I don't think it should block > the 4.0 release since MV is already marked as experimental. > Main areas to test: > * Write perf: We expect to see a [10% write throughput drop per MV > added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. > * Read perf: identical to a normal table > * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 > * Repair: write path required > * Chaos monkey: take down coordinator/base-replica/view-replica during > read/write/token-movement and verify data consistency (may need a tool) > * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 > * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918
[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151008#comment-17151008 ] ZhaoYang commented on CASSANDRA-15900: -- both SimpleReadWriteTest and ImportTest passed locally with JDK11, I don't think they use streaming. > Close channel and reduce buffer allocation during entire sstable streaming > with SSL > --- > > Key: CASSANDRA-15900 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15900 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk > file into user-space off-heap buffer when SSL is enabled, because netty > doesn't support zero-copy with SSL. > But there are two issues: > # file channel is not closed. > # 1mb batch size is used. 1mb exceeds buffer pool's max allocation size, > thus it's all allocated outside the pool and will cause large amount of > allocations. > [Patch|https://github.com/apache/cassandra/pull/651]: > # close file channel when the last batch is loaded into off-heap bytebuffer. > I don't think we need to wait until buffer is flushed by netty. > # reduce the batch to 64kb which is more buffer pool friendly when streaming > entire sstable with SSL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15921) 4.0 quality testing: Materialized View
[ https://issues.apache.org/jira/browse/CASSANDRA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15921: - Fix Version/s: 4.x > 4.0 quality testing: Materialized View > -- > > Key: CASSANDRA-15921 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15921 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Materialized Views >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.x > > > The main purpose of this ticket is to get a better understanding of 4.0 MV > status as a guideline for future improvements. I don't think it should block > the 4.0 release since MV is already marked as experimental. > Main areas to test: > * Write perf: We expect to see a [10% write throughput drop per MV > added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. > * Read perf: identical to a normal table > * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 > * Repair: write path required > * Chaos monkey: take down coordinator/base-replica/view-replica during > read/write/token-movement and verify data consistency (may need a tool) > * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 > * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918
[jira] [Created] (CASSANDRA-15921) 4.0 quality testing: Materialized View
ZhaoYang created CASSANDRA-15921: Summary: 4.0 quality testing: Materialized View Key: CASSANDRA-15921 URL: https://issues.apache.org/jira/browse/CASSANDRA-15921 Project: Cassandra Issue Type: Improvement Components: Feature/Materialized Views Reporter: ZhaoYang Assignee: ZhaoYang The main purpose of this ticket is to get a better understanding of 4.0 MV status as a guideline for future improvements. I don't think it should block the 4.0 release since MV is already marked as experimental. Main areas to test: * Write perf: We expect to see a [10% write throughput drop per MV added|https://www.datastax.com/blog/2016/05/materialized-view-performance-cassandra-3x]. * Read perf: identical to a normal table * Bootstrap/Decommission: no write-path required since CASSANDRA-13065 * Repair: write path required * Chaos monkey: take down coordinator/base-replica/view-replica during read/write/token-movement and verify data consistency (may need a tool) * Hint Replay: able to throttle if the table has an MV - CASSANDRA-13810 * Schema race: create/drop - CASSANDRA-15845/CASSANDRA-15918
[jira] [Updated] (CASSANDRA-15918) materialized view rebuild automatically after drop multiple views
[ https://issues.apache.org/jira/browse/CASSANDRA-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15918: - Component/s: Feature/Materialized Views > materialized view rebuild automatically after drop multiple views > - > > Key: CASSANDRA-15918 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15918 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema, Consistency/Repair, Feature/Materialized > Views >Reporter: chonghao li >Priority: Normal > > Background: > Cassandra version: 3.0.12 > Our cassandra cluster has 9 hosts in DC1 and 3 hosts in DC2, > each host: ||node||memory||disk|| |DC1 1|256 GB|5*788GB| |DC1 2|256 GB|5*788GB| |DC1 3|256 GB|5*788GB| |DC1 4|256 GB|5*788GB| |DC1 5|256 GB|5*788GB| |DC1 6|256 GB|5*788GB| |DC1 7|512 GB|5 TB| |DC1 8|512 GB|5 TB| |DC1 9|512 GB|5 TB| |DC2 1|256 GB|8*788GB| |DC2 2|256 GB|8*788GB| |DC2 3|256 GB|8*788GB| > According to nodetool status, node load in DC1 is about 1.5 TB and node load in DC2 > is about 4 TB > QPS: 270 > - > Problem we met: > On the DC1 1 node, we entered the cql command line and executed commands like the following > at the same time: > "drop materialized view if exists view1; > drop materialized view if exists view2; > drop materialized view if exists view3; > drop materialized view if exists view4;" > after a while, the command line displayed a warning like "schema version mismatch > detected..." (sorry, we cannot find the exact output from that time) > After that, we found that the view files on node "DC1 7" had not been deleted yet. > At this moment, the performance of the cluster dropped sharply; the cluster almost > stopped responding to any request. > by running: select * from system.views_builds_in_progress; > we could see several views were building.
> then we executed: > 1, nodetool stop VIEW_BUILD on each node > 2, in cql: delete from system.views_builds_in_progress where view_name= > 3, rolling restart of cassandra nodes > > About an hour later, performance returned to normal. > -- > Why did this happen? > How can we avoid this problem? > Is there a better way to deal with this problem?
[jira] [Created] (CASSANDRA-15913) Avoid "ALLOW FILTERING" requirement for multiple restricted columns if index can handle them
ZhaoYang created CASSANDRA-15913: Summary: Avoid "ALLOW FILTERING" requirement for multiple restricted columns if index can handle them Key: CASSANDRA-15913 URL: https://issues.apache.org/jira/browse/CASSANDRA-15913 Project: Cassandra Issue Type: Improvement Components: Feature/SASI Reporter: ZhaoYang When executing the following query, "ALLOW FILTERING" is required even if both columns are indexed by SASI and used in the {{QueryPlan}}: bq. SELECT * FROM table WHERE age="20" and address="SF" We should consider providing a proper {{"QueryPlan"}} under the {{"Index"}} interface to avoid "ALLOW FILTERING" when all restricted columns are handled by the index.
[jira] [Comment Edited] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149071#comment-17149071 ] ZhaoYang edited comment on CASSANDRA-15900 at 7/1/20, 7:39 AM: --- rebased and submit another round of ci: [j8|https://circleci.com/workflow-run/cdf55335-c876-450b-8bf9-1d778a2df806] and [j11|https://circleci.com/workflow-run/2080f225-f689-4243-ad67-288bef608640] bq. test_restart_node_localhost - pushed_notifications_test.TestPushedNotifications should have been addressed by CASSANDRA-15677 a few days ago. it's failing after rebase... bq. J11 - readRepairTest - org.apache.cassandra.distributed.test.SimpleReadWriteTest bq. J11 - testImportCorrupt - org.apache.cassandra.db.ImportTest doesn't seem to be related. was (Author: jasonstack): rebased and submit another round of ci: [j8|https://circleci.com/workflow-run/cdf55335-c876-450b-8bf9-1d778a2df806] and [j11|https://circleci.com/workflow-run/2080f225-f689-4243-ad67-288bef608640] > Close channel and reduce buffer allocation during entire sstable streaming > with SSL > --- > > Key: CASSANDRA-15900 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15900 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk > file into user-space off-heap buffer when SSL is enabled, because netty > doesn't support zero-copy with SSL. > But there are two issues: > # file channel is not closed. > # 1mb batch size is used. 1mb exceeds buffer pool's max allocation size, > thus it's all allocated outside the pool and will cause large amount of > allocations. > [Patch|https://github.com/apache/cassandra/pull/651]: > # close file channel when the last batch is loaded into off-heap bytebuffer. > I don't think we need to wait until buffer is flushed by netty. 
> # reduce the batch to 64kb which is more buffer pool friendly when streaming > entire sstable with SSL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149071#comment-17149071 ] ZhaoYang commented on CASSANDRA-15900: -- rebased and submit another round of ci: [j8|https://circleci.com/workflow-run/cdf55335-c876-450b-8bf9-1d778a2df806] and [j11|https://circleci.com/workflow-run/2080f225-f689-4243-ad67-288bef608640] > Close channel and reduce buffer allocation during entire sstable streaming > with SSL > --- > > Key: CASSANDRA-15900 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15900 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk > file into user-space off-heap buffer when SSL is enabled, because netty > doesn't support zero-copy with SSL. > But there are two issues: > # file channel is not closed. > # 1mb batch size is used. 1mb exceeds buffer pool's max allocation size, > thus it's all allocated outside the pool and will cause large amount of > allocations. > [Patch|https://github.com/apache/cassandra/pull/651]: > # close file channel when the last batch is loaded into off-heap bytebuffer. > I don't think we need to wait until buffer is flushed by netty. > # reduce the batch to 64kb which is more buffer pool friendly when streaming > entire sstable with SSL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection
[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148401#comment-17148401 ] ZhaoYang commented on CASSANDRA-15907: -- {quote}If the number of stale results is very large (i.e. a "silent" replica exists in the vast majority of responses), won't those two approaches result in about the same performance profile? {quote} the second approach will execute RFP requests in two places: # at the beginning of 2nd phase, based on the collected outdated rows from 1st phase. These RFP requests can run in parallel and the number can be large. # at merge-listener, for additional rows requested by SRP. These RFP requests have to run in serial, but the number is usually small. > Operational Improvements & Hardening for Replica Filtering Protection > - > > Key: CASSANDRA-15907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15907 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Coordination, Feature/2i Index >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Labels: 2i, memory > Fix For: 4.0-beta > > > CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i > and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a > few things we should follow up on, however, to make life a bit easier for > operators and generally de-risk usage: > (Note: Line numbers are based on {{trunk}} as of > {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.) > *Minor Optimizations* > * {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be > able to use simple arrays instead of lists for {{rowsToFetch}} and > {{originalPartitions}}. Alternatively (or also), we may be able to null out > references in these two collections more aggressively. (ex. Using > {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}}, > assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.) 
> * {{ReplicaFilteringProtection:323}} - We may be able to use > {{EncodingStats.merge()}} and remove the custom {{stats()}} method. > * {{DataResolver:111 & 228}} - Cache an instance of > {{UnaryOperator#identity()}} instead of creating one on the fly. > * {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather > rather than serially querying every row that needs to be completed. This > isn't a clear win perhaps, given it targets the latency of single queries and > adds some complexity. (Certainly a decent candidate to kick even out of this > issue.) > *Documentation and Intelligibility* > * There are a few places (CHANGES.txt, tracing output in > {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side > filtering protection" (which makes it seem like the coordinator doesn't > filter) rather than "replica filtering protection" (which sounds more like > what we actually do, which is protect ourselves against incorrect replica > filtering results). It's a minor fix, but would avoid confusion. > * The method call chain in {{DataResolver}} might be a bit simpler if we put > the {{repairedDataTracker}} in {{ResolveContext}}. > *Guardrails* > * As it stands, we don't have a way to enforce an upper bound on the memory > usage of {{ReplicaFilteringProtection}} which caches row responses from the > first round of requests. (Remember, these are later used to merged with the > second round of results to complete the data for filtering.) Operators will > likely need a way to protect themselves, i.e. simply fail queries if they hit > a particular threshold rather than GC nodes into oblivion. (Having control > over limits and page sizes doesn't quite get us there, because stale results > _expand_ the number of incomplete results we must cache.) The fun question is > how we do this, with the primary axes being scope (per-query, global, etc.) > and granularity (per-partition, per-row, per-cell, actual heap usage, etc.). 
> My starting disposition on the right trade-off between > performance/complexity and accuracy is having something along the lines of > cached rows per query. Prior art suggests this probably makes sense alongside > things like {{tombstone_failure_threshold}} in {{cassandra.yaml}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
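The cached-rows-per-query guardrail suggested above could be sketched as a simple per-query counter that fails the query once a threshold is crossed. This is an illustrative sketch only: {{RowCacheGuard}}, {{failThreshold}}, and the exception used here are hypothetical names, not Cassandra's actual API; the real threshold would presumably live in {{cassandra.yaml}} alongside settings like {{tombstone_failure_threshold}}.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-query guardrail on rows cached by replica filtering
// protection (RFP). Names are illustrative, not Cassandra's actual API.
class RowCacheGuard
{
    private final int failThreshold;
    private final AtomicInteger cachedRows = new AtomicInteger();

    RowCacheGuard(int failThreshold)
    {
        this.failThreshold = failThreshold;
    }

    /** Called each time a first-phase row response is cached for later merging. */
    void onRowCached()
    {
        int rows = cachedRows.incrementAndGet();
        if (rows > failThreshold)
            throw new IllegalStateException("RFP cached " + rows + " rows for this query, " +
                                            "exceeding the threshold of " + failThreshold);
    }

    int cachedRows()
    {
        return cachedRows.get();
    }
}
```

Failing fast like this trades query availability for bounded coordinator heap usage, which is the direction the guardrail discussion above argues for.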
[jira] [Commented] (CASSANDRA-15907) Operational Improvements & Hardening for Replica Filtering Protection
[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148356#comment-17148356 ] ZhaoYang commented on CASSANDRA-15907: -- As discussed with Caleb, the memory issue is that potentially outdated rows in the 1st phase of replica filtering protection (RFP) do not count towards the merged counter, so short-read protection (SRP) can potentially query and cache all data in the query range if only one replica has data. Some ideas to cap memory usage during RFP: * Single-phase approach: ** Issue a blocking RFP read immediately at {{MergeListener#onMergedRows}} when detecting potentially outdated rows. ** This guarantees the coordinator will cache at most "limit * replicas" rows, assuming there are no tombstones. ** This should have performance similar to the current 2-phase approach, but the current approach can be optimized to execute RFP reads in parallel. * Two-phase approach with SRP only in the 2nd phase: ** The 1st phase is almost the same as the current approach: collect potentially outdated rows, but without SRP. ** In the second phase, issue RFP reads in parallel based on the rows collected in the 1st phase. *** When the parallel RFP reads complete, merge the responses (original + RFP) again using the merger described in the previous approach, but only do blocking RFP for rows requested by SRP. ** With this approach, the amount of memory used is the same as the single-phase approach, and the number of blocking RFP reads for SRP rows is usually small.
[jira] [Assigned] (CASSANDRA-15866) stream sstable attached index files entirely with data file
[ https://issues.apache.org/jira/browse/CASSANDRA-15866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang reassigned CASSANDRA-15866: Assignee: (was: ZhaoYang) > stream sstable attached index files entirely with data file > --- > > Key: CASSANDRA-15866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15866 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Streaming >Reporter: ZhaoYang >Priority: Normal > > When an sstable is streamed entirely, there is no need to rebuild the > sstable-attached index on the receiver if the index files can be streamed entirely as well.
[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15900: - Test and Documentation Plan: [https://circleci.com/workflow-run/ba9f4692-da21-44e9-ac31-fe8d2e6215cb] (was: [https://circleci.com/workflow-run/8d266871-2d78-4c67-80ec-3e817187af0c]) > Close channel and reduce buffer allocation during entire sstable streaming > with SSL > --- > > Key: CASSANDRA-15900 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15900 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > CASSANDRA-15740 added the ability to stream an entire sstable by loading the on-disk > file into a user-space off-heap buffer when SSL is enabled, because netty > doesn't support zero-copy with SSL. > But there are two issues: > # The file channel is not closed. > # A 1 MB batch size is used; 1 MB exceeds the buffer pool's max allocation size, > so the buffers are all allocated outside the pool, causing a large number of > allocations. > [Patch|https://github.com/apache/cassandra/pull/651]: > # Close the file channel when the last batch is loaded into the off-heap bytebuffer. > I don't think we need to wait until the buffer is flushed by netty. > # Reduce the batch size to 64 KB, which is more buffer-pool friendly when streaming > an entire sstable with SSL.
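The two fixes in the patch above (64 KB batches plus closing the channel as soon as the last batch is read) can be sketched roughly as follows. This is not the actual {{AsyncStreamingOutputPlus}} code; the class and method names are illustrative, and the real implementation hands each batch to netty rather than just counting bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: load a file in 64 KiB batches (small enough to be
// served by a buffer pool) and close the channel once the last batch has
// been read, without waiting for the transport (netty) to flush buffers.
class BatchedFileReader
{
    static final int BATCH_SIZE = 64 * 1024; // 64 KiB, buffer-pool friendly

    /** Reads the file in BATCH_SIZE chunks; returns the total bytes read. */
    static long readInBatches(Path file) throws IOException
    {
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ))
        {
            ByteBuffer buffer = ByteBuffer.allocateDirect(BATCH_SIZE); // off-heap
            while (channel.read(buffer) != -1)
            {
                buffer.flip();
                total += buffer.remaining();
                // the real code would hand this batch to the transport here
                buffer.clear();
            }
        } // channel is closed here, as soon as the last batch is loaded
        return total;
    }
}
```

The try-with-resources block makes the channel's lifetime end with the last read, which is the behavior the patch argues for.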
[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15900: - Status: Review In Progress (was: Changes Suggested)
[jira] [Commented] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145693#comment-17145693 ] ZhaoYang commented on CASSANDRA-15900: -- bq. It might be worthwhile to have a test in AsyncStreamingOutputPlusTest that verifies AsyncStreamingOutputPlus#writeFileToChannel() closes the provided channel. +1 bq. AsyncStreamingOutputPlus#writeFileToChannel(FileChannel, StreamRateLimiter, int) and AsyncStreamingOutputPlus#writeFileToChannelZeroCopy() may be better off at private visibility, given we're treating them as transport-level implementation details. (Perhaps writeFileToChannel would be easier to test at package-private though.) I left them as public and marked "@VisibleForTesting". bq. The JavaDoc for writeFileToChannel(FileChannel, StreamRateLimiter) is slightly out-of-date now, given we've lowered the batch size for the SSL case. (We should make sure to preserve the bit about the method taking ownership of the FileChannel.) +1
[jira] [Created] (CASSANDRA-15903) Doc update: stream-entire-sstable supports all compaction strategies and internode encryption
ZhaoYang created CASSANDRA-15903: Summary: Doc update: stream-entire-sstable supports all compaction strategies and internode encryption Key: CASSANDRA-15903 URL: https://issues.apache.org/jira/browse/CASSANDRA-15903 Project: Cassandra Issue Type: Task Reporter: ZhaoYang As [~mck2] points out, the docs need to be updated for CASSANDRA-15657 and CASSANDRA-15740.
[jira] [Comment Edited] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143778#comment-17143778 ] ZhaoYang edited comment on CASSANDRA-15900 at 6/24/20, 4:58 PM: [~djoshi] do you mind reviewing and checking it on Apache CI? was (Author: jasonstack): [~djoshi] do you mind reviewing?
[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15900: - Test and Documentation Plan: [https://circleci.com/workflow-run/8d266871-2d78-4c67-80ec-3e817187af0c] (was: [https://circleci.com/workflow-run/48b5c613-f3a5-485f-ad0e-8362fddea5d8])
[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15900: - Description: CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk file into user-space off-heap buffer when SSL is enabled, because netty doesn't support zero-copy with SSL. But there are two issues: # file channel is not closed. # 1mb batch size is used. 1mb exceeds buffer pool's max allocation size, thus it's all allocated outside the pool and will cause large amount of allocations. [Patch|https://github.com/apache/cassandra/pull/651]: # close file channel when the last batch is loaded into off-heap bytebuffer. I don't think we need to wait until buffer is flushed by netty. # reduce the batch to 64kb which is more buffer pool friendly when streaming entire sstable with SSL. was: CASSANDRA-15740 added the ability to stream entire sstable by loading on-disk file into user-space off-heap buffer when SSL is enabled, because netty doesn't support zero-copy with SSL. But there are two issues: # file channel is not closed. # 1mb batch size is used. 1mb exceeds buffer pool's max allocation size, thus it's all allocated outside the pool and will cause large amount of allocations. [Patch|https://github.com/apache/cassandra/pull/651]: # close file channel when the last batch is loaded into off-heap bytebuffer. # reduce the batch to 64kb which is more buffer pool friendly when streaming entire sstable with SSL. 
[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15900: - Test and Documentation Plan: [https://circleci.com/workflow-run/48b5c613-f3a5-485f-ad0e-8362fddea5d8] Status: Patch Available (was: Open) [~djoshi] do you mind reviewing?
[jira] [Updated] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
[ https://issues.apache.org/jira/browse/CASSANDRA-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15900: - Bug Category: Parent values: Degradation(12984)Level 1 values: Resource Management(12995) Complexity: Normal Discovered By: Code Inspection Fix Version/s: 4.0-beta Severity: Normal Status: Open (was: Triage Needed)
[jira] [Created] (CASSANDRA-15900) Close channel and reduce buffer allocation during entire sstable streaming with SSL
ZhaoYang created CASSANDRA-15900: Summary: Close channel and reduce buffer allocation during entire sstable streaming with SSL Key: CASSANDRA-15900 URL: https://issues.apache.org/jira/browse/CASSANDRA-15900 Project: Cassandra Issue Type: Bug Components: Legacy/Streaming and Messaging Reporter: ZhaoYang Assignee: ZhaoYang
[jira] [Assigned] (CASSANDRA-14754) Add verification of state machine in StreamSession
[ https://issues.apache.org/jira/browse/CASSANDRA-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang reassigned CASSANDRA-14754: Assignee: (was: ZhaoYang) > Add verification of state machine in StreamSession > -- > > Key: CASSANDRA-14754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14754 > Project: Cassandra > Issue Type: Task > Components: Legacy/Streaming and Messaging >Reporter: Jason Brown >Priority: Normal > Fix For: 4.x > > > {{StreamSession}} contains an implicit state machine, but we have no > verification of the safety of the transitions between states. For example, we > have no checks to ensure we cannot leave the final states (COMPLETED, FAILED). > I propose we add some program logic in {{StreamSession}}, tests, and > documentation to ensure the correctness of the state transitions.
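An explicit transition table is one way to add the verification the ticket proposes. The sketch below assumes a simplified set of states and legal transitions; {{StreamSession}}'s real state machine has more states, so treat both the enum and the table as illustrative, not the actual set of legal transitions.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of explicit state-transition verification for a
// stream session. The states and transition table are simplified
// assumptions, not StreamSession's actual machine.
class StreamStateMachine
{
    enum State { INITIALIZED, PREPARING, STREAMING, COMPLETED, FAILED }

    private static final Map<State, Set<State>> LEGAL = new EnumMap<>(State.class);
    static
    {
        LEGAL.put(State.INITIALIZED, EnumSet.of(State.PREPARING, State.FAILED));
        LEGAL.put(State.PREPARING,   EnumSet.of(State.STREAMING, State.FAILED));
        LEGAL.put(State.STREAMING,   EnumSet.of(State.COMPLETED, State.FAILED));
        // terminal states: nothing may leave COMPLETED or FAILED
        LEGAL.put(State.COMPLETED,   EnumSet.noneOf(State.class));
        LEGAL.put(State.FAILED,      EnumSet.noneOf(State.class));
    }

    private State current = State.INITIALIZED;

    void transition(State next)
    {
        if (!LEGAL.get(current).contains(next))
            throw new IllegalStateException("Illegal transition " + current + " -> " + next);
        current = next;
    }

    State current()
    {
        return current;
    }
}
```

Centralizing transitions in one checked method is what makes the "cannot leave the final states" property testable.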
[jira] [Updated] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-15299: - Reviewers: Alex Petrov (was: Alex Petrov, ZhaoYang) > CASSANDRA-13304 follow-up: improve checksumming and compression in protocol > v5-beta > --- > > Key: CASSANDRA-15299 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15299 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Client >Reporter: Aleksey Yeschenko >Assignee: Sam Tunnicliffe >Priority: Normal > Labels: protocolv5 > Fix For: 4.0-alpha > > > CASSANDRA-13304 made an important improvement to our native protocol: it > introduced checksumming/CRC32 to request and response bodies. It’s an > important step forward, but it doesn’t cover the entire stream. In > particular, the message header is not covered by a checksum or a crc, which > poses a correctness issue if, for example, {{streamId}} gets corrupted. > Additionally, we aren’t quite using CRC32 correctly, in two ways: > 1. We are calculating the CRC32 of the *decompressed* value instead of > computing the CRC32 on the bytes written on the wire - losing the properties > of the CRC32. In some cases, due to this sequencing, attempting to decompress > a corrupt stream can cause a segfault by LZ4. > 2. When using CRC32, the CRC32 value is written in the incorrect byte order, > also losing some of the protections. > See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for > explanation for the two points above. > Separately, there are some long-standing issues with the protocol - since > *way* before CASSANDRA-13304. Importantly, both checksumming and compression > operate on individual message bodies rather than frames of multiple complete > messages. In reality, this has several important additional downsides. To > name a couple: > # For compression, we are getting poor compression ratios for smaller > messages - when operating on tiny sequences of bytes. 
In reality, for most > small requests and responses we are discarding the compressed value as it’d > be larger than the uncompressed one - incurring both redundant allocations > and compressions. > # For checksumming and CRC32 we pay a high overhead price for small messages. > 4 bytes extra is *a lot* for an empty write response, for example. > To address the correctness issue of {{streamId}} not being covered by the > checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we > should switch to a framing protocol with multiple messages in a single frame. > I suggest we reuse the framing protocol recently implemented for internode > messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, > and that we do it before native protocol v5 graduates from beta. See > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java > and > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
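The ordering the ticket argues for in point 1 — compute the CRC32 over the exact bytes written on the wire (the compressed payload), not over the decompressed value — can be sketched like this. It is an illustrative sketch only: the real v5 framing also covers headers, uses LZ4 rather than Deflate, and writes the CRC field in a defined byte order.

```java
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

// Illustrative sketch: compress first, then checksum the bytes that
// actually hit the wire, so a corrupt stream can be rejected before
// any attempt to decompress it (avoiding e.g. LZ4 segfaults on
// corrupt input, as described above).
class WireChecksum
{
    /** Compresses the payload and returns the CRC32 of the wire (compressed) bytes. */
    static long crcOfWireBytes(byte[] payload)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(payload);
        deflater.finish();
        byte[] out = new byte[payload.length + 64]; // enough for small payloads
        int n = deflater.deflate(out);
        deflater.end();
        byte[] wireBytes = Arrays.copyOf(out, n);

        CRC32 crc = new CRC32();
        crc.update(wireBytes); // checksum what actually goes on the wire
        return crc.getValue();
    }
}
```

A receiver verifying this CRC before inflating never feeds corrupt bytes to the decompressor, which is the property lost when the CRC is computed over the decompressed value.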
[jira] [Comment Edited] (CASSANDRA-15861) Mutating sstable component may race with entire-sstable-streaming(ZCS) causing checksum validation failure
[ https://issues.apache.org/jira/browse/CASSANDRA-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143096#comment-17143096 ] ZhaoYang edited comment on CASSANDRA-15861 at 6/24/20, 4:44 AM: {quote}Don't we write to a tmp file then do a atomic move and replace? So would we need to worry about a partial file? {quote} For index summary, it deletes first. (I believe the reason for deletion is that index summary file can be large, up to 2GB. It'd be nice to release the old file earlier if it's not used) Of course, we can change it to use temp file.. was (Author: jasonstack): {quote}Don't we write to a tmp file then do a atomic move and replace? So would we need to worry about a partial file? {quote} For index summary, it deletes first. Of course, we can change it to use temp file.. > Mutating sstable component may race with entire-sstable-streaming(ZCS) > causing checksum validation failure > -- > > Key: CASSANDRA-15861 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15861 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Consistency/Streaming, > Local/Compaction >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0-beta > > > Flaky dtest: [test_dead_sync_initiator - > repair_tests.repair_test.TestRepair|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/143/testReport/junit/dtest.repair_tests.repair_test/TestRepair/test_dead_sync_initiator/] > {code:java|title=stacktrace} > Unexpected error found in node logs (see stdout for full details). 
Errors: > [ERROR [Stream-Deserializer-127.0.0.1:7000-570871f3] 2020-06-03 04:05:19,081 > CassandraEntireSSTableStreamReader.java:145 - [Stream > 6f1c3360-a54f-11ea-a808-2f23710fdc90] Error while reading sstable from stream > for table = keyspace1.standard1 > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.maybeValidateChecksum(MetadataSerializer.java:219) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:198) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:129) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutate(MetadataSerializer.java:226) > at > org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:140) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:78) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:49) > at > org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36) > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:49) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:181) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Checksums do not match for > /home/cassandra/cassandra/cassandra-dtest/tmp/dtest-te4ty0r9/test/node3/data0/keyspace1/standard1-5f5ab140a54f11eaa8082f23710fdc90/na-2-big-Statistics.db > {code} > > In the above test, it executes 
"nodetool repair" on node1 and kills node2 > during repair. At the end, node3 reports checksum validation failure on > sstable transferred from node1. > {code:java|title=what happened} > 1. When repair started on node1, it performs anti-compaction which modifies > sstable's repairAt to 0 and pending repair id to session-id. > 2. Then node1 creates {{ComponentManifest}} which contains file lengths to be > transferred to node3. > 3. Before node1 actually sends the files to node3, node2 is killed and node1 > starts to broadcast repair-failure-message to all participants in > {{CoordinatorSession#fail}} > 4. Node1 receives its own repair-failure-message and fails its local repair > sessions at {{LocalSessions#failSession}} which triggers async background > compaction. > 5. Node1's background compaction will mutate sstable's repairAt to 0 and > pending repair id to null via > {{PendingRepairManager#getNextRepairFinishedTask}}, as there is no more > in-progress repair. > 6. Node1 actually sends the sstable to node3