[jira] [Updated] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-17456:
    Fix Version/s: 4.1 (was: 4.x)

> Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17456
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17456
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CI
>            Reporter: Ekaterina Dimitrova
>            Assignee: Aleksandr Sorokoumov
>            Priority: Normal
>             Fix For: 4.1
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1002/testReport/dtest-offheap.write_failures_test/TestMultiDCWriteFailures/test_oversized_mutation/
> {code:java}
> Error Message
> AssertionError: assert 0 == 8
>  +  where 8 = <bound method JolokiaAgent.read_attribute of <JolokiaAgent object at 0x7f1fca78dac0>>('org.apache.cassandra.metrics:type=Storage,name=TotalHints', 'Count')
>  +  and 'org.apache.cassandra.metrics:type=Storage,name=TotalHints' = make_mbean('metrics', type='Storage', name='TotalHints')
>
> Stacktrace
>     def test_oversized_mutation(self):
>         """
>         Test that multi-DC write failures return operation failed rather than a timeout.
>         @jira_ticket CASSANDRA-16334.
>         """
>         cluster = self.cluster
>         cluster.populate([2, 2])
>         cluster.set_configuration_options(values={'max_mutation_size_in_kb': 128})
>         cluster.start()
>
>         node1 = cluster.nodelist()[0]
>         session = self.patient_exclusive_cql_connection(node1)
>
>         session.execute("CREATE KEYSPACE k WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2}")
>         session.execute("CREATE TABLE k.t (key int PRIMARY KEY, val blob)")
>
>         payload = '1' * 1024 * 256
>         query = "INSERT INTO k.t (key, val) VALUES (1, textAsBlob('{}'))".format(payload)
>
>         assert_write_failure(session, query, ConsistencyLevel.LOCAL_ONE)
>         assert_write_failure(session, query, ConsistencyLevel.ONE)
>
>         # verify that no hints are created
>         with JolokiaAgent(node1) as jmx:
>             assert 0 == jmx.read_attribute(make_mbean('metrics', type='Storage', name='TotalHints'), 'Count')
> E       AssertionError: assert 0 == 8
>
> write_failures_test.py:277: AssertionError
> {code}

--
This message was sent by Atlassian Jira (v8.20.7#820007)
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
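For context on the failure above, the size arithmetic in the test is worth spelling out: the 256 KiB payload is deliberately twice the configured 128 KiB mutation limit, so the coordinator is expected to fail the write outright rather than time out and write hints. A minimal Python sketch — the constants mirror the test's values, nothing here comes from Cassandra itself:

```python
# Constants mirror the dtest configuration above; this is an illustration,
# not Cassandra code.
MAX_MUTATION_SIZE_IN_KB = 128      # cluster option set by the test
payload = '1' * 1024 * 256         # the test's payload: 256 KiB of '1' characters

payload_kb = len(payload) // 1024
print(payload_kb)                  # 256

# The payload alone already exceeds the limit, before any row/cell overhead,
# so both INSERTs must fail and the TotalHints counter should stay at 0.
assert payload_kb > MAX_MUTATION_SIZE_IN_KB
```

The asserted count of 8 in the failure (rather than 0) indicates hints were recorded for the remote replicas, which is exactly the behavior the ticket addresses.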
[jira] [Updated] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-17456:
    Status: Ready to Commit (was: Review In Progress)
[jira] [Updated] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-17456:
    Status: Review In Progress (was: Needs Committer)
[jira] [Updated] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-17456:
    Since Version: 4.1
    Source Control Link: Committed as 9f3bc657273dfa9e20d233636adf662904f01f34 to 4.1 and 11bdf1bf8038fa7f872fe9161a0568d023e6cfac to trunk.
    Resolution: Fixed
    Status: Resolved (was: Ready to Commit)
[jira] [Commented] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533708#comment-17533708 ]

Aleksandr Sorokoumov commented on CASSANDRA-17456:

Committed as [9f3bc657273dfa9e20d233636adf662904f01f34|https://github.com/apache/cassandra/commit/9f3bc657273dfa9e20d233636adf662904f01f34] to 4.1 and [11bdf1bf8038fa7f872fe9161a0568d023e6cfac|https://github.com/apache/cassandra/commit/11bdf1bf8038fa7f872fe9161a0568d023e6cfac] to trunk.
[jira] [Commented] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532703#comment-17532703 ]

Aleksandr Sorokoumov commented on CASSANDRA-17456:

CI looks good. I am moving the issue to Ready to Commit.
[jira] [Updated] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-17456:
    Status: Needs Committer (was: Patch Available)
[jira] [Commented] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530386#comment-17530386 ]

Aleksandr Sorokoumov commented on CASSANDRA-17456:

I've put the size validation back into CommitLog#add and added the NEWS entry.
[jira] [Commented] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529899#comment-17529899 ]

Aleksandr Sorokoumov commented on CASSANDRA-17456:

Looking at the CommitLog code, it will allocate a new segment if the mutation does not fit in the current segment ([link|https://github.com/apache/cassandra/blob/7ce140bd1dea311b9f98cdfbcd07dcff9fbd457c/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerStandard.java#L52-L57]). This effectively gives us the desired behavior as long as a single mutation fits in a segment, or am I missing something? The one corner case I can think of is a mutation larger than an entire segment; I can add an assertion for that, just to be on the safe side.
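The allocation behavior described in the comment above can be sketched as follows. This is a simplified Python model of the linked Java logic, with invented names and sizes; the real segment manager is considerably more involved:

```python
# Simplified model of commit-log segment allocation: advance to a fresh
# segment when the mutation does not fit, and treat a mutation larger than
# an entire segment as impossible to allocate (the corner case noted above).
SEGMENT_SIZE = 32 * 1024 * 1024  # hypothetical segment size in bytes

class Segment:
    def __init__(self, size=SEGMENT_SIZE):
        self.remaining = size

    def try_allocate(self, n):
        """Reserve n bytes if they fit in this segment."""
        if n <= self.remaining:
            self.remaining -= n
            return True
        return False

def allocate(mutation_size, segments):
    if mutation_size > SEGMENT_SIZE:
        # a mutation bigger than a whole segment can never succeed
        raise ValueError("mutation larger than a commit log segment")
    for segment in segments:  # "advance" through fresh segments as needed
        if segment.try_allocate(mutation_size):
            return segment
    raise RuntimeError("ran out of segments")
```

Under this model, any mutation up to a full segment in size eventually lands in some segment — the "desired behavior" the comment refers to — and only the larger-than-a-segment case needs an explicit guard.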
[jira] [Updated] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-17456:
    Test and Documentation Plan: Made the existing dtest applicable to C* versions up to 4.0.x and added an in-jvm dtest to cover rejection of oversized mutations on insert.
    Status: Patch Available (was: In Progress)

As Benedict suggested, I moved the mutation size check from the CommitLog to the client and internode connections. Patches:
* [17456-trunk|https://github.com/apache/cassandra/compare/trunk...Ge:17456-trunk?expand=1]
* [dtest|https://github.com/apache/cassandra-dtest/pull/186]

[Jenkins CI run|https://ci-cassandra.apache.org/job/Cassandra-devbranch/1626/#showFailuresLink]
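The intent of moving the size check to the client and internode connections can be illustrated with a hypothetical Python model — not the actual patch — in which an oversized mutation is rejected before replication or hinting, so the client receives a write failure and the TotalHints metric the dtest asserts on stays at zero:

```python
# Hypothetical model of coordinator-side validation; all names are invented.
MAX_MUTATION_SIZE = 128 * 1024  # bytes, mirroring max_mutation_size_in_kb: 128

class WriteFailure(Exception):
    """Stand-in for the write-failure error surfaced to the client."""

hints_written = 0  # stand-in for the Storage TotalHints JMX counter

def coordinate_write(serialized_size):
    # Validate before any replica interaction: a rejected mutation can
    # never leave a hint behind, unlike a mid-write timeout.
    if serialized_size > MAX_MUTATION_SIZE:
        raise WriteFailure("mutation of %d bytes exceeds the %d byte limit"
                           % (serialized_size, MAX_MUTATION_SIZE))
    return "applied"
```

The key design point is the ordering: because validation happens before any replica or hint interaction, a failed write leaves no state behind, which is what the dtest's `TotalHints == 0` assertion checks.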
[jira] [Assigned] (CASSANDRA-17456) Test Failures: write_failures_test.TestMultiDCWriteFailures.test_oversized_mutation
[ https://issues.apache.org/jira/browse/CASSANDRA-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov reassigned CASSANDRA-17456:
    Assignee: Aleksandr Sorokoumov (was: Josh McKenzie)
[jira] [Comment Edited] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494218#comment-17494218 ] Aleksandr Sorokoumov edited comment on CASSANDRA-16349 at 2/17/22, 8:43 PM: I've rebased the patch and the dtest. [~blerer] can you please review? Links: * [patch|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-sstableloader-4.0?expand=1] * [dtest|https://github.com/apache/cassandra-dtest/pull/151] was (Author: ge): I've rebased the patch and the dtest. [~blerer] can you please review? > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce create a cluster with ccm with more nodes than the RF, put some > data into it copy a SSTable and stream it. > > The error originates on the nodes, the following stack trace is shown in the > logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at >
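The progress output above shows the failing peers at `0:0/1 100%`: they received nothing because none of the SSTable's data falls in their token ranges, yet their sessions are reported as failed. A toy sketch of the distinction the fix needs to draw (hypothetical helper names; this is not the actual streaming code):

```python
def session_outcome(bytes_owned, files_received, files_planned):
    # Hypothetical sketch of the intended behavior: a peer that owns no
    # byte ranges from the SSTable(s) legitimately receives nothing and
    # should still count as a successful session, not a failed one.
    if bytes_owned == 0:
        return 'success'   # nothing to stream is not an error
    return 'success' if files_received == files_planned else 'failed'
```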
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494218#comment-17494218 ] Aleksandr Sorokoumov commented on CASSANDRA-16349: -- I've rebased the patch and the dtest. [~blerer] can you please review? > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce create a cluster with ccm with more nodes than the RF, put some > data into it copy a SSTable and stream it. > > The error originates on the nodes, the following stack trace is shown in the > logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at > org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at >
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469088#comment-17469088 ] Aleksandr Sorokoumov commented on CASSANDRA-16349: -- [~e.dimitrova] Do you have spare cycles to review this patch? > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce create a cluster with ccm with more nodes than the RF, put some > data into it copy a SSTable and stream it. > > The error originates on the nodes, the following stack trace is shown in the > logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at > org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at >
[jira] [Updated] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-15215: - Reviewers: Benedict Elliott Smith, Branimir Lambov (was: Benedict Elliott Smith) > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
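For reference, the encoded size of a Cassandra-style unsigned vint is determined purely by the magnitude of the value: the first byte carries N leading 1-bits announcing N extra bytes. Since most values land in the 1-2 byte range while buffers usually have 8 or more bytes spare, assembling the encoded bytes in a long register and storing them in one copy is exactly the win the ticket describes. A Python port of the branch-free size computation (assumed here to mirror `VIntCoding.computeUnsignedVIntSize`; treat it as a sketch):

```python
def unsigned_vint_size(value):
    # 64-bit leading-zero count of (value | 1); the OR keeps value = 0
    # at magnitude 63 so it encodes in a single byte.
    magnitude = 64 - (value | 1).bit_length()
    # Branch-free mapping of magnitude -> encoded size in bytes (1..9).
    return (639 - magnitude * 9) >> 6
```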
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468706#comment-17468706 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- I added [~blambov] as a reviewer as he approved the PR to trunk. [~benedict] is there anything I can do to facilitate the merge? > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459045#comment-17459045 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- No worries, CQLConnectionTest failures indeed looked suspicious. I agree with your commit and am looking forward to green CI and merge :) > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457638#comment-17457638 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- I looked into the most recent test failures and I am fairly convinced that none of them are caused by this patch: * [CQLConnectionTest test failures |https://app.circleci.com/pipelines/github/belliottsmith/cassandra/216/workflows/9b2ff75d-d2fd-47ad-a4d6-a407a649780c/jobs/5659/tests#failed-test-2] - there are recent bug reports regarding this test suite failing in various ways - CASSANDRA-16677 as an "aggregate issue" and a number of linked duplicates. One example is [this build|https://app.circleci.com/pipelines/github/dcapwell/cassandra/1037/workflows/c728d370-49b9-41aa-bdfb-8c41cf0355d8/jobs/6577/tests] from CASSANDRA-16949 that has exactly the same failures. * [TestClientRequestMetrics|https://app.circleci.com/pipelines/github/belliottsmith/cassandra/216/workflows/418d4b46-8d8b-41df-ad80-06f377593caf/jobs/5646/tests#failed-test-0] - was also observed in https://issues.apache.org/jira/browse/CASSANDRA-15234?focusedCommentId=17454221=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17454221. * [MessagingServiceTest |https://app.circleci.com/pipelines/github/belliottsmith/cassandra/216/workflows/418d4b46-8d8b-41df-ad80-06f377593caf/jobs/5637] - CASSANDRA-17033. 
> VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454582#comment-17454582 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- I'll fix the last test failures on the weekend. Enjoy your holiday :) > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453616#comment-17453616 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- The issue was caused by the slow path in {{BufferedDataOutputStreamPlus#writeBytes}} when the underlying buffer has less than 8 bytes remaining. Previously, this method fell back to {{writeSlow}}. This was not correct because it writes N least significant bytes to the wire. As {{writeBytes}} treats the register as an optimized version of a byte array, it should write N most significant bytes instead. I added a test case that isolates the issue and fixed it in all branches. [~benedict] can you please re-run the CI? > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
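The distinction the comment describes is easy to get wrong: when a 64-bit register stands in for a byte array, its first N logical bytes are the N *most* significant bytes, while the broken slow path emitted the N *least* significant ones. A small illustration in plain Python (not the actual `BufferedDataOutputStreamPlus` code):

```python
import struct

REGISTER = 0x1122334455667788  # register viewed as bytes 11 22 ... 88

def top_bytes(register, n):
    # Correct: the register is an optimized byte array, so its first n
    # bytes are the n most significant bytes (big-endian view).
    return struct.pack('>Q', register)[:n]

def bottom_bytes(register, n):
    # The bug: the slow path wrote the n least significant bytes.
    return struct.pack('>Q', register)[-n:]
```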
[jira] [Commented] (CASSANDRA-17169) Flaky RecomputingSupplierTest
[ https://issues.apache.org/jira/browse/CASSANDRA-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452291#comment-17452291 ] Aleksandr Sorokoumov commented on CASSANDRA-17169: -- The patch makes sense to me and seems to fix the test. +1 to merge it. > Flaky RecomputingSupplierTest > - > > Key: CASSANDRA-17169 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17169 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0.x, 4.x > > > See > https://ci-cassandra.apache.org/job/Cassandra-4.0/293/testReport/junit/org.apache.cassandra.utils/RecomputingSupplierTest/recomputingSupplierTest/ > {noformat} > java.util.concurrent.TimeoutException > at > java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021) > at > org.apache.cassandra.utils.RecomputingSupplier.get(RecomputingSupplier.java:110) > at > org.apache.cassandra.utils.RecomputingSupplierTest.recomputingSupplierTest(RecomputingSupplierTest.java:120) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
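For readers unfamiliar with the class under test: a recomputing supplier caches a computed value and lazily recomputes it once marked stale. A minimal single-threaded sketch of that contract (the real `RecomputingSupplier` is concurrent, which is exactly where the flaky timeout lived):

```python
class RecomputingSupplier:
    """Single-threaded sketch of the caching contract."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._stale = True

    def recompute(self):
        # Mark the cached value stale; the next get() recomputes it.
        self._stale = True

    def get(self):
        if self._stale:
            self._value = self._compute()
            self._stale = False
        return self._value
```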
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451023#comment-17451023 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- I'll fix test failures closer to the end of the week, probably on the weekend. > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-17142) Limit the maximum hints size per host
[ https://issues.apache.org/jira/browse/CASSANDRA-17142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449856#comment-17449856 ] Aleksandr Sorokoumov edited comment on CASSANDRA-17142 at 11/27/21, 3:19 PM: - This change can fit nicely in the Guardrails framework, similar to CASSANDRA-17150. was (Author: ge): This change can fit nicely in the Guardrails framework, similarly to https://issues.apache.org/jira/browse/CASSANDRA-17150. > Limit the maximum hints size per host > - > > Key: CASSANDRA-17142 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17142 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Hints >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > The hints system defines a time window, i.e. max_hint_window_in_ms, to store > the hints. > It defines no limit on how much data can be kept during the time window. The > hints can grow excessively and make the node run out of disk. In such a > scenario, the operators have to truncate the hints manually. > I'd propose that in addition to the conventional hints window, operators > should be able to define the maximum hints size per host, i.e. > max_hints_size_per_host_in_mb, to provide another layer of protection. A > node stops storing hints for the down node whenever it reaches either the time > cap or the size cap. In order not to surprise the users, the config should be > disabled by default. It should also be configurable via JMX. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
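The proposal layers a size cap on top of the existing time window: a node hints to a down peer only while both caps hold. A sketch of the combined check (hypothetical function and parameter names; `max_hints_size_per_host_in_mb` is the option proposed in the ticket, disabled by default):

```python
def should_store_hint(down_for_ms, hinted_bytes,
                      max_hint_window_in_ms,
                      max_hints_size_per_host_in_mb=0):
    # Time cap: stop hinting once the peer has been down past the window.
    if down_for_ms > max_hint_window_in_ms:
        return False
    # Size cap: 0 means disabled, matching the proposed default.
    if (max_hints_size_per_host_in_mb > 0
            and hinted_bytes >= max_hints_size_per_host_in_mb * 1024 * 1024):
        return False
    return True
```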
[jira] [Commented] (CASSANDRA-17142) Limit the maximum hints size per host
[ https://issues.apache.org/jira/browse/CASSANDRA-17142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449856#comment-17449856 ] Aleksandr Sorokoumov commented on CASSANDRA-17142: -- This change can fit nicely in the Guardrails framework, similarly to https://issues.apache.org/jira/browse/CASSANDRA-17150. > Limit the maximum hints size per host > - > > Key: CASSANDRA-17142 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17142 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Hints >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > The hints system defines a time window, i.e. max_hint_window_in_ms, to store > the hints. > It defines no limit on how much data can be kept during the time window. The > hints can grow excessively and make the node run out of disk. In such a > scenario, the operators have to truncate the hints manually. > I'd propose that in addition to the conventional hints window, operators > should be able to define the maximum hints size per host, i.e. > max_hints_size_per_host_in_mb, to provide another layer of protection. A > node stops storing hints for the down node whenever it reaches either the time > cap or the size cap. In order not to surprise the users, the config should be > disabled by default. It should also be configurable via JMX. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16840) Close native transport port before hint transfer during decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16840: - Reviewers: Aleksandr Sorokoumov, Brandon Williams (was: Brandon Williams) > Close native transport port before hint transfer during decommission > > > Key: CASSANDRA-16840 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16840 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Hints >Reporter: Matt Fleming >Assignee: Matt Fleming >Priority: Normal > Fix For: 4.x > > > New hints can be generated on a node when it's decommissioning which is a > problem if the node has already started hint transfer because any hints that > come in after the transfer has begun will remain on-disk and not be > transferred to a peer. > You can work around this problem by manually closing the native transport > port before starting the decommission with {{nodetool disablebinary}} but it > feels like something we might want to do automatically. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
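The ordering the ticket asks for can be made concrete with a toy model: close the client port before hint transfer begins, so no new hints can be generated mid-transfer. All names below are hypothetical; this only illustrates the sequencing, not Cassandra's actual decommission path:

```python
class DecommissioningNode:
    def __init__(self):
        self.steps = []

    def disable_binary(self):
        # Close the native transport port first, so clients can no
        # longer trigger writes (and thus new hints) on this node.
        self.steps.append('disablebinary')

    def transfer_hints(self):
        self.steps.append('transfer_hints')

    def decommission(self):
        self.disable_binary()   # the automatic step the patch adds
        self.transfer_hints()
        self.steps.append('stream_ranges')
        self.steps.append('leave_ring')

node = DecommissioningNode()
node.decommission()
# Hint transfer only starts after client traffic is shut off.
assert node.steps.index('disablebinary') < node.steps.index('transfer_hints')
```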
[jira] [Commented] (CASSANDRA-16840) Close native transport port before hint transfer during decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449855#comment-17449855 ]

Aleksandr Sorokoumov commented on CASSANDRA-16840:
--------------------------------------------------

Hey Matt! This patch looks good to me. In my opinion, the interaction between {{nodetool decommission}} and transferring and creating new hints is subtle enough to benefit from a (d)test. WDYT? Please let me know if you need a hand with it.
[jira] [Updated] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-15215: - Test and Documentation Plan: Added unit tests for new methods and benchmarks to show performance improvements. Status: Patch Available (was: In Progress) > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: testWriteRandomLongDOP_final.png, > writeUnsignedVInt_megamorphic_BB.png, writeUnsignedVInt_megamorphic_DOP.png > > Time Spent: 40m > Remaining Estimate: 0h > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449488#comment-17449488 ]

Aleksandr Sorokoumov commented on CASSANDRA-15215:
--------------------------------------------------

While working on the read path I realized that it has already had the optimization we discussed since CASSANDRA-8630 - https://github.com/apache/cassandra/blob/951d72cd929d1f6c9329becbdd7604a9e709587b/src/java/org/apache/cassandra/io/util/RebufferingInputStream.java#L239-L268.

Since the last update I have added new test cases to {{VIntCodingTest}} to cover buffered and unbuffered reads and writes, and extended {{DataOutputTest}} to cover {{DataOutputPlus#writeBytes}}.

Patches:
* [3.0|https://github.com/apache/cassandra/pull/1343]
* [3.11|https://github.com/apache/cassandra/pull/1344]
* [4.0|https://github.com/apache/cassandra/pull/1345]
* [trunk|https://github.com/apache/cassandra/pull/1346]

To demonstrate the results I picked a single benchmark, {{testWriteRandomLongDOP}}, as it shows the overall performance improvement and is relevant for all Cassandra versions. The results are for the megamorphic benchmark variation.

!testWriteRandomLongDOP_final.png|width=800px!

[~benedict], [~blambov] Can you please review the patch and run the CI?
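The register technique from the ticket description can be sketched as follows. This is a simplified illustration under two stated assumptions - the buffer is big-endian (the {{ByteBuffer}} default) and the caller guarantees capacity - not the actual patch:

```java
import java.nio.ByteBuffer;

// Simplified sketch of the register technique: build the encoded vint in a
// long and write it with a single 8-byte store instead of one put() per byte.
// Assumes a big-endian buffer (the ByteBuffer default) with enough capacity.
class VIntRegisterSketch {
    // Encoded size in bytes of an unsigned vint (1..9).
    static int computeUnsignedVIntSize(long value) {
        int magnitude = Long.numberOfLeadingZeros(value | 1);
        return 9 - ((magnitude - 1) / 7);
    }

    // Reference path: one byte at a time.
    static void writeSlow(long value, ByteBuffer out) {
        int size = computeUnsignedVIntSize(value);
        if (size == 9) { // 0xff marker byte followed by the raw 8 value bytes
            out.put((byte) 0xff);
            out.putLong(value);
            return;
        }
        int extraBytes = size - 1;
        // `extraBytes` leading 1-bits in the first byte encode the length.
        long prefix = (long) (~(0xff >>> extraBytes) & 0xff) << (8 * extraBytes);
        long encoded = value | prefix;
        for (int i = extraBytes; i >= 0; --i)
            out.put((byte) (encoded >>> (8 * i)));
    }

    // Register path: one 8-byte store when the buffer has >= 8 bytes spare.
    static void writeFast(long value, ByteBuffer out) {
        int size = computeUnsignedVIntSize(value);
        if (size == 9 || out.remaining() < 8) {
            writeSlow(value, out); // rare cases keep the byte-at-a-time path
            return;
        }
        int extraBytes = size - 1;
        long prefix = (long) (~(0xff >>> extraBytes) & 0xff) << (8 * extraBytes);
        long encoded = (value | prefix) << (8 * (8 - size)); // left-align for big-endian putLong
        int pos = out.position();
        out.putLong(pos, encoded);  // stores 8 bytes, but only `size` of them matter
        out.position(pos + size);   // advance past the vint only
    }
}
```

The fast path trades up to seven wasted (later overwritten) trailing bytes for a single store, which is exactly the "construct the bytes in a register" idea the ticket describes.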
[jira] [Updated] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Sorokoumov updated CASSANDRA-15215:
---------------------------------------------
    Attachment: testWriteRandomLongDOP_final.png
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447439#comment-17447439 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- Thanks for the review and the suggestions [~benedict] ! This Wednesday I plan to work on read performance, applying your changes, and adding unit tests for new code branches. Hopefully, the patch is going to be ready by the end of the week. > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: writeUnsignedVInt_megamorphic_BB.png, > writeUnsignedVInt_megamorphic_DOP.png > > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444639#comment-17444639 ]

Aleksandr Sorokoumov commented on CASSANDRA-15215:
--------------------------------------------------

I implemented {{DataOutputPlus#writeBytes}} and added benchmarks that use the {{DataOutputPlus}} version of the method. The register approach definitely improves write throughput. Due to the increased number of benchmarks, I also added a visualization for megamorphic calls in addition to the raw results. "Multiple writes" below refers to the initial approach I tried, with switch-cases for different numbers of bytes. I am going to apply the same register approach to reads next.

!writeUnsignedVInt_megamorphic_DOP.png|width=800!
!writeUnsignedVInt_megamorphic_BB.png|width=800!

h4. Register
{noformat}
Benchmark                                    (allocation)  Mode  Cnt   Score   Error  Units
VIntCodingBench.testComputeUnsignedVIntSize  monomorphic   avgt   15  15.939 ± 0.235  ns/op
VIntCodingBench.testComputeUnsignedVIntSize  bimorphic     avgt   15  15.972 ± 0.170  ns/op
VIntCodingBench.testComputeUnsignedVIntSize  megamorphic   avgt   15  15.976 ± 0.225  ns/op
VIntCodingBench.testWrite1ByteBB             monomorphic   avgt   15   9.555 ± 0.059  ns/op
VIntCodingBench.testWrite1ByteBB             bimorphic     avgt   15  16.777 ± 0.107  ns/op
VIntCodingBench.testWrite1ByteBB             megamorphic   avgt   15  18.286 ± 0.155  ns/op
VIntCodingBench.testWrite1ByteDOP            monomorphic   avgt   15  10.507 ± 0.522  ns/op
VIntCodingBench.testWrite1ByteDOP            bimorphic     avgt   15  19.048 ± 0.262  ns/op
VIntCodingBench.testWrite1ByteDOP            megamorphic   avgt   15  19.339 ± 0.155  ns/op
VIntCodingBench.testWrite2BytesBB            monomorphic   avgt   15  14.688 ± 0.170  ns/op
VIntCodingBench.testWrite2BytesBB            bimorphic     avgt   15  19.421 ± 0.115  ns/op
VIntCodingBench.testWrite2BytesBB            megamorphic   avgt   15  21.975 ± 0.110  ns/op
VIntCodingBench.testWrite2BytesDOP           monomorphic   avgt   15  14.675 ± 0.102  ns/op
VIntCodingBench.testWrite2BytesDOP           bimorphic     avgt   15  22.644 ± 0.217  ns/op
VIntCodingBench.testWrite2BytesDOP           megamorphic   avgt   15  22.789 ± 0.854  ns/op
VIntCodingBench.testWrite3BytesBB            monomorphic   avgt   15  14.764 ± 0.112  ns/op
VIntCodingBench.testWrite3BytesBB            bimorphic     avgt   15  19.543 ± 0.363  ns/op
VIntCodingBench.testWrite3BytesBB            megamorphic   avgt   15  22.054 ± 0.138  ns/op
VIntCodingBench.testWrite3BytesDOP           monomorphic   avgt   15  14.706 ± 0.115  ns/op
VIntCodingBench.testWrite3BytesDOP           bimorphic     avgt   15  22.549 ± 0.151  ns/op
VIntCodingBench.testWrite3BytesDOP           megamorphic   avgt   15  22.560 ± 0.370  ns/op
VIntCodingBench.testWrite4BytesBB            monomorphic   avgt   15  14.679 ± 0.158  ns/op
VIntCodingBench.testWrite4BytesBB            bimorphic     avgt   15  19.593 ± 0.254  ns/op
VIntCodingBench.testWrite4BytesBB            megamorphic   avgt   15  22.202 ± 0.194  ns/op
VIntCodingBench.testWrite4BytesDOP           monomorphic   avgt   15  14.669 ± 0.098  ns/op
VIntCodingBench.testWrite4BytesDOP           bimorphic     avgt   15  22.469 ± 0.195  ns/op
VIntCodingBench.testWrite4BytesDOP           megamorphic   avgt   15  22.681 ± 0.643  ns/op
VIntCodingBench.testWrite5BytesBB            monomorphic   avgt   15  14.655 ± 0.142  ns/op
VIntCodingBench.testWrite5BytesBB            bimorphic     avgt   15  19.390 ± 0.100  ns/op
VIntCodingBench.testWrite5BytesBB            megamorphic   avgt   15  22.086 ± 0.185  ns/op
VIntCodingBench.testWrite5BytesDOP           monomorphic   avgt   15  14.668 ± 0.137  ns/op
VIntCodingBench.testWrite5BytesDOP           bimorphic     avgt   15  22.833 ± 0.615  ns/op
VIntCodingBench.testWrite5BytesDOP           megamorphic   avgt   15  22.127 ± 0.298  ns/op
VIntCodingBench.testWrite6BytesBB            monomorphic   avgt   15  14.766 ± 0.252  ns/op
VIntCodingBench.testWrite6BytesBB            bimorphic     avgt   15  19.502 ± 0.128  ns/op
VIntCodingBench.testWrite6BytesBB            megamorphic   avgt   15  22.386 ± 0.314  ns/op
VIntCodingBench.testWrite6BytesDOP           monomorphic   avgt   15  14.690 ± 0.122  ns/op
VIntCodingBench.testWrite6BytesDOP           bimorphic     avgt   15  22.543 ± 0.200  ns/op
VIntCodingBench.testWrite6BytesDOP           megamorphic   avgt   15  22.278 ± 0.469  ns/op
VIntCodingBench.testWrite7BytesBB            monomorphic   avgt   15  14.687 ± 0.268  ns/op
VIntCodingBench.testWrite7BytesBB            bimorphic     avgt   15  19.434 ± 0.179  ns/op
VIntCodingBench.testWrite7BytesBB            megamorphic   avgt   15  21.991 ± 0.160  ns/op
VIntCodingBench.testWrite7BytesDOP           monomorphic   avgt   15  14.677 ± 0.131  ns/op
{noformat}
[jira] [Updated] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-15215: - Attachment: writeUnsignedVInt_megamorphic_BB.png writeUnsignedVInt_megamorphic_DOP.png > VIntCoding should read and write more efficiently > - > > Key: CASSANDRA-15215 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15215 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Benedict Elliott Smith >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > Attachments: writeUnsignedVInt_megamorphic_BB.png, > writeUnsignedVInt_megamorphic_DOP.png > > > Most vints occupy significantly fewer than 8 bytes, and most buffers have >= > 8 bytes spare, in which case we can construct the relevant bytes in a > register and memcpy them to the correct position. Since we read and write a > lot of vints, this waste is probably measurable, particularly during > compaction and flush, and can probably be considered a performance bug. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440071#comment-17440071 ]

Aleksandr Sorokoumov commented on CASSANDRA-15215:
--------------------------------------------------

Short status update: I just pushed changes to the benchmarks as suggested by Branimir and Benedict. The changes are:
* All tests have monomorphic, bimorphic, and megamorphic versions.
* Added a test that writes random longs to the ByteBuffer.
* Added a test for calculating the VInt size and applied Branimir's formula.

Results:

h4. Baseline
{noformat}
Benchmark                                    (allocation)  Mode  Cnt   Score   Error  Units
VIntCodingBench.testComputeUnsignedVIntSize  monomorphic   avgt   15  17.069 ± 1.087  ns/op
VIntCodingBench.testComputeUnsignedVIntSize  bimorphic     avgt   15  17.323 ± 0.656  ns/op
VIntCodingBench.testComputeUnsignedVIntSize  megamorphic   avgt   15  16.791 ± 0.473  ns/op
VIntCodingBench.testWrite1Byte               monomorphic   avgt   15   9.047 ± 0.254  ns/op
VIntCodingBench.testWrite1Byte               bimorphic     avgt   15  16.935 ± 0.207  ns/op
VIntCodingBench.testWrite1Byte               megamorphic   avgt   15  17.835 ± 0.090  ns/op
VIntCodingBench.testWrite2Bytes              monomorphic   avgt   15  18.612 ± 0.194  ns/op
VIntCodingBench.testWrite2Bytes              bimorphic     avgt   15  25.033 ± 0.239  ns/op
VIntCodingBench.testWrite2Bytes              megamorphic   avgt   15  28.352 ± 0.115  ns/op
VIntCodingBench.testWrite3Bytes              monomorphic   avgt   15  21.333 ± 0.197  ns/op
VIntCodingBench.testWrite3Bytes              bimorphic     avgt   15  26.173 ± 0.170  ns/op
VIntCodingBench.testWrite3Bytes              megamorphic   avgt   15  29.983 ± 0.208  ns/op
VIntCodingBench.testWrite4Bytes              monomorphic   avgt   15  21.229 ± 0.245  ns/op
VIntCodingBench.testWrite4Bytes              bimorphic     avgt   15  28.966 ± 0.606  ns/op
VIntCodingBench.testWrite4Bytes              megamorphic   avgt   15  33.219 ± 1.276  ns/op
VIntCodingBench.testWrite5Bytes              monomorphic   avgt   15  22.886 ± 0.602  ns/op
VIntCodingBench.testWrite5Bytes              bimorphic     avgt   15  29.209 ± 1.077  ns/op
VIntCodingBench.testWrite5Bytes              megamorphic   avgt   15  32.731 ± 0.944  ns/op
VIntCodingBench.testWrite6Bytes              monomorphic   avgt   15  22.579 ± 0.794  ns/op
VIntCodingBench.testWrite6Bytes              bimorphic     avgt   15  29.067 ± 0.678  ns/op
VIntCodingBench.testWrite6Bytes              megamorphic   avgt   15  35.419 ± 1.496  ns/op
VIntCodingBench.testWrite7Bytes              monomorphic   avgt   15  22.823 ± 0.527  ns/op
VIntCodingBench.testWrite7Bytes              bimorphic     avgt   15  29.521 ± 1.216  ns/op
VIntCodingBench.testWrite7Bytes              megamorphic   avgt   15  34.295 ± 2.327  ns/op
VIntCodingBench.testWrite8Bytes              monomorphic   avgt   15  22.032 ± 0.918  ns/op
VIntCodingBench.testWrite8Bytes              bimorphic     avgt   15  30.388 ± 1.015  ns/op
VIntCodingBench.testWrite8Bytes              megamorphic   avgt   15  33.632 ± 1.200  ns/op
VIntCodingBench.testWrite9Bytes              monomorphic   avgt   15  22.616 ± 1.309  ns/op
VIntCodingBench.testWrite9Bytes              bimorphic     avgt   15  29.291 ± 1.096  ns/op
VIntCodingBench.testWrite9Bytes              megamorphic   avgt   15  32.597 ± 0.807  ns/op
VIntCodingBench.testWriteRandomLong          monomorphic   avgt   15  35.010 ± 1.145  ns/op
VIntCodingBench.testWriteRandomLong          bimorphic     avgt   15  43.090 ± 0.615  ns/op
VIntCodingBench.testWriteRandomLong          megamorphic   avgt   15  43.196 ± 1.742  ns/op
{noformat}

h4. Patch
{noformat}
VIntCodingBench.testComputeUnsignedVIntSize  monomorphic   avgt   15  16.339 ± 0.418  ns/op
VIntCodingBench.testComputeUnsignedVIntSize  bimorphic     avgt   15  16.340 ± 0.417  ns/op
VIntCodingBench.testComputeUnsignedVIntSize  megamorphic   avgt   15  16.435 ± 0.408  ns/op
VIntCodingBench.testWrite1Byte               monomorphic   avgt   15   9.362 ± 0.208  ns/op
VIntCodingBench.testWrite1Byte               bimorphic     avgt   15  18.164 ± 0.839  ns/op
VIntCodingBench.testWrite1Byte               megamorphic   avgt   15  19.800 ± 0.942  ns/op
VIntCodingBench.testWrite2Bytes              monomorphic   avgt   15  10.094 ± 0.444  ns/op
VIntCodingBench.testWrite2Bytes              bimorphic     avgt   15  18.310 ± 0.813  ns/op
VIntCodingBench.testWrite2Bytes              megamorphic   avgt   15  19.685 ± 0.692  ns/op
VIntCodingBench.testWrite3Bytes              monomorphic   avgt   15  11.541 ± 0.433  ns/op
VIntCodingBench.testWrite3Bytes              bimorphic     avgt   15  19.087 ± 0.720  ns/op
VIntCodingBench.testWrite3Bytes              megamorphic   avgt   15  20.518 ± 1.035  ns/op
VIntCodingBench.testWrite4Bytes              monomorphic   avgt
{noformat}
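For context on the three benchmark variants above: HotSpot can inline a virtual call site that has observed one receiver type (monomorphic) or two (bimorphic, via a guarded inline), but with three or more observed types (megamorphic) it falls back to a genuine virtual dispatch. A toy sketch of the distinction, not the actual {{VIntCodingBench}} code:

```java
// Toy illustration of call-site polymorphism, the property the
// mono/bi/megamorphic benchmark variants control for.
interface Sink { long write(long v); }

class PlusOne implements Sink { public long write(long v) { return v + 1; } }
class PlusTwo implements Sink { public long write(long v) { return v + 2; } }
class PlusThree implements Sink { public long write(long v) { return v + 3; } }

class CallSites {
    // With 1 distinct Sink class, the call below is monomorphic; with 2 it is
    // bimorphic; with 3+ it is megamorphic. The bytecode is identical - only
    // the set of receiver types the JIT observes at this call site changes.
    static long drive(Sink[] sinks, int rounds) {
        long sum = 0;
        for (int i = 0; i < rounds; i++)
            sum += sinks[i % sinks.length].write(i);
        return sum;
    }
}
```

This is why the same {{testWrite*}} body shows three different scores per row: the work is constant, but the dispatch cost grows with call-site polymorphism.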
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438899#comment-17438899 ] Aleksandr Sorokoumov commented on CASSANDRA-16349: -- I've rebased the dtest and added {{@since("3.0")}} to it. > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce, create a cluster with ccm with more nodes than the RF, put some data into it, copy an SSTable, and stream it. > > The error originates on the nodes; the following stack trace is shown in the logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at > org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at >
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438226#comment-17438226 ]

Aleksandr Sorokoumov commented on CASSANDRA-16349:
--------------------------------------------------

Thank you for the review and running the tests [~e.dimitrova]! Tomorrow I will mark the dtest to run only since 3.0, rebase the patch against the latest trunk, and backport it to 4.0.
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437257#comment-17437257 ]

Aleksandr Sorokoumov commented on CASSANDRA-15215:
--------------------------------------------------

Thank you for the suggestion! I will add a benchmark to see if there is a measurable difference.
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436539#comment-17436539 ]

Aleksandr Sorokoumov commented on CASSANDRA-15215:
--------------------------------------------------

Hey [~benedict]! Thank you so much for a quick and elaborate answer! As a next step, I am going to extend the benchmark for implementations of {{DataOutput}} to estimate how well my patch works there. After that, I will extend {{DataOutputPlus}} and {{DataInputPlus}} as you suggested. As I am working on this patch in my free time, it might take a bit. I hope to provide an update by the end of next week.
[jira] [Commented] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436519#comment-17436519 ] Aleksandr Sorokoumov commented on CASSANDRA-15215: -- [~benedict] if I understood your idea correctly, you suggest writing the relevant bytes directly instead of preparing a thread-local byte array and memcpy'ing it into the given buffer. Is my interpretation correct? In the title and description you also mentioned reads, but I haven't figured out how to apply this idea there, so here are results for writes only.

h2. Code

||Branch||Description||
|[baseline|https://github.com/apache/cassandra/compare/trunk...Ge:15215-baseline-trunk?expand=1]|trunk + benchmark|
|[patch|https://github.com/apache/cassandra/compare/trunk...Ge:15215-trunk?expand=1]|patch + benchmark|

h2. Setup

Each benchmark calls {{VIntCoding.writeUnsignedVInt}} on a {{long}} that encodes to 1 to 9 bytes. The write target is a {{ByteBuffer}}, both on- and off-heap. The results were produced on a 2019 MacBook Pro (2.3 GHz 8-Core Intel Core i9).

h2. Results

h3. Baseline

{noformat}
Benchmark                         (allocation)  Mode  Cnt   Score   Error  Units
VIntCodingBench.testWrite1Byte    HEAP          avgt   15   9.084 ± 5.196  ns/op
VIntCodingBench.testWrite1Byte    DIRECT        avgt   15   5.037 ± 0.638  ns/op
VIntCodingBench.testWrite2Bytes   HEAP          avgt   15  15.604 ± 0.646  ns/op
VIntCodingBench.testWrite2Bytes   DIRECT        avgt   15  15.028 ± 0.568  ns/op
VIntCodingBench.testWrite3Bytes   HEAP          avgt   15  16.704 ± 0.461  ns/op
VIntCodingBench.testWrite3Bytes   DIRECT        avgt   15  17.410 ± 0.489  ns/op
VIntCodingBench.testWrite4Bytes   HEAP          avgt   15  17.086 ± 0.527  ns/op
VIntCodingBench.testWrite4Bytes   DIRECT        avgt   15  20.307 ± 0.705  ns/op
VIntCodingBench.testWrite5Bytes   HEAP          avgt   15  17.395 ± 0.578  ns/op
VIntCodingBench.testWrite5Bytes   DIRECT        avgt   15  17.558 ± 0.512  ns/op
VIntCodingBench.testWrite6Bytes   HEAP          avgt   15  18.114 ± 0.967  ns/op
VIntCodingBench.testWrite6Bytes   DIRECT        avgt   15  19.023 ± 0.591  ns/op
VIntCodingBench.testWrite7Bytes   HEAP          avgt   15  18.004 ± 0.298  ns/op
VIntCodingBench.testWrite7Bytes   DIRECT        avgt   15  19.081 ± 0.601  ns/op
VIntCodingBench.testWrite8Bytes   HEAP          avgt   15  18.466 ± 0.463  ns/op
VIntCodingBench.testWrite8Bytes   DIRECT        avgt   15  20.228 ± 5.620  ns/op
VIntCodingBench.testWrite9Bytes   HEAP          avgt   15  18.553 ± 0.537  ns/op
VIntCodingBench.testWrite9Bytes   DIRECT        avgt   15  20.101 ± 0.476  ns/op
{noformat}

h3. Patch

{noformat}
Benchmark                         (allocation)  Mode  Cnt   Score   Error  Units
VIntCodingBench.testWrite1Byte    HEAP          avgt   15   4.728 ± 0.077  ns/op
VIntCodingBench.testWrite1Byte    DIRECT        avgt   15   6.415 ± 3.157  ns/op
VIntCodingBench.testWrite2Bytes   HEAP          avgt   15   8.244 ± 0.440  ns/op
VIntCodingBench.testWrite2Bytes   DIRECT        avgt   15   9.136 ± 3.979  ns/op
VIntCodingBench.testWrite3Bytes   HEAP          avgt   15   8.714 ± 0.134  ns/op
VIntCodingBench.testWrite3Bytes   DIRECT        avgt   15   9.690 ± 2.735  ns/op
VIntCodingBench.testWrite4Bytes   HEAP          avgt   15   8.634 ± 0.164  ns/op
VIntCodingBench.testWrite4Bytes   DIRECT        avgt   15   6.830 ± 0.061  ns/op
VIntCodingBench.testWrite5Bytes   HEAP          avgt   15   8.389 ± 0.207  ns/op
VIntCodingBench.testWrite5Bytes   DIRECT        avgt   15   8.059 ± 1.537  ns/op
VIntCodingBench.testWrite6Bytes   HEAP          avgt   15  10.861 ± 0.336  ns/op
VIntCodingBench.testWrite6Bytes   DIRECT        avgt   15   9.816 ± 1.482  ns/op
VIntCodingBench.testWrite7Bytes   HEAP          avgt   15  11.045 ± 0.419  ns/op
VIntCodingBench.testWrite7Bytes   DIRECT        avgt   15  10.702 ± 2.377  ns/op
VIntCodingBench.testWrite8Bytes   HEAP          avgt   15  10.375 ± 0.423  ns/op
VIntCodingBench.testWrite8Bytes   DIRECT        avgt   15   7.237 ± 0.176  ns/op
VIntCodingBench.testWrite9Bytes   HEAP          avgt   15  11.200 ± 0.365  ns/op
VIntCodingBench.testWrite9Bytes   DIRECT        avgt   15   8.152 ± 0.282  ns/op
{noformat}

> VIntCoding should read and write more efficiently
> -
>
> Key: CASSANDRA-15215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15215
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction, Local/SSTable
>Reporter: Benedict Elliott Smith
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Most vints occupy significantly fewer than 8 bytes, and most buffers have >=
> 8 bytes spare, in which case we can construct the relevant bytes in a
> register and memcpy them to the correct position. Since we read and write a
> lot of vints, this waste is probably measurable, particularly during
> compaction and flush, and can probably be considered a performance bug.
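The register-based write the ticket describes can be sketched as follows. This is a simplified LEB128-style varint, NOT Cassandra's actual vint format (which encodes the length in the leading bits of the first byte), and the class and method names are hypothetical; it assumes the value encodes to at most 8 bytes (value < 2^56):

```java
import java.nio.ByteBuffer;

class VIntWriteSketch
{
    /**
     * Assembles the encoded bytes in a long "register", then writes them
     * with a single putLong when the buffer has at least 8 spare bytes,
     * falling back to byte-at-a-time writes otherwise.
     */
    static int writeUnsignedVarInt(long value, ByteBuffer out)
    {
        long register = 0;
        int size = 0;
        do
        {
            long b = value & 0x7F;
            value >>>= 7;
            if (value != 0)
                b |= 0x80; // continuation bit: more 7-bit groups follow
            register |= b << (8 * size);
            size++;
        }
        while (value != 0);

        int pos = out.position();
        if (out.remaining() >= 8)
        {
            // One bounds check and one 8-byte store instead of `size` stores.
            // reverseBytes moves the first encoded byte into the MSB, so the
            // default big-endian putLong lays the bytes out in encoding order.
            out.putLong(pos, Long.reverseBytes(register));
            out.position(pos + size); // advance only past the bytes actually used
        }
        else
        {
            for (int i = 0; i < size; i++)
                out.put((byte) (register >>> (8 * i)));
        }
        return size;
    }
}
```

The same idea presumably applies in reverse on the read side (load 8 bytes with a single getLong and extract the groups from the register), which may be what the ticket title alludes to for reads.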
[jira] [Assigned] (CASSANDRA-15215) VIntCoding should read and write more efficiently
[ https://issues.apache.org/jira/browse/CASSANDRA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov reassigned CASSANDRA-15215: Assignee: Aleksandr Sorokoumov -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16349: - Status: Needs Committer (was: Review In Progress) > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Fix For: 4.0.x > > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce, create a cluster with ccm with more nodes than the RF, put some > data into it, copy an SSTable, and stream it. > > The error originates on the nodes; the following stack trace is shown in the > logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at > org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at >
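For context, the {{IllegalStateException}} in the node-side stack trace comes from a fail-fast guard of this general shape. The class below is an illustrative sketch, not Cassandra's actual {{CassandraIncomingFile}}: the size of an incoming file is only known after its stream has been consumed, so asking for it before any bytes arrive (as can happen for a peer that owned no data) trips the check.

```java
// Hypothetical sketch of the guard pattern seen in the stack trace above.
class IncomingFileSketch
{
    private long size = -1;
    private boolean streamRead = false;

    // Called once the file's stream has actually been deserialized.
    void onStreamRead(long bytesReceived)
    {
        size = bytesReceived;
        streamRead = true;
    }

    // Fails fast if the stream was never consumed, mirroring
    // Preconditions.checkState(...) in the trace.
    long getSize()
    {
        if (!streamRead)
            throw new IllegalStateException("Stream hasn't been read yet");
        return size;
    }
}
```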
[jira] [Updated] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16349: - Authors: Aleksandr Sorokoumov, Serban Teodorescu (was: Serban Teodorescu)
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434736#comment-17434736 ] Aleksandr Sorokoumov commented on CASSANDRA-16349: -- Thank you for a quick review [~marcuse]! I fixed the nit. AFAIU, with one +1 from a committer, the correct status for this issue is {{NEEDS COMMITTER}}; I will change it accordingly.
[jira] [Updated] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16349: - Reviewers: Aleksandr Sorokoumov, Marcus Eriksson, Aleksandr Sorokoumov (was: Aleksandr Sorokoumov, Marcus Eriksson) Status: Review In Progress (was: Patch Available)
[jira] [Updated] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16349: - Reviewers: Aleksandr Sorokoumov, Marcus Eriksson (was: Aleksandr Sorokoumov)
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434226#comment-17434226 ] Aleksandr Sorokoumov commented on CASSANDRA-16349: -- [~bdeggleston], [~marcuse] Do you have cycles to review? [Streaming fix + SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-sstableloader-4.0?expand=1] from the comment above is the patch I think we should merge.
[jira] [Commented] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429575#comment-17429575 ] Aleksandr Sorokoumov commented on CASSANDRA-16334: -- Thank you for the review and running the CI [~adelapena]! I added the non-null check in 3.0 and 3.11 branches. The same check is not necessary in 4.0 onward, because BatchlogManager no longer passes null as a configured CL level. > Replica failure causes timeout on multi-DC write > > > Key: CASSANDRA-16334 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16334 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Messaging/Internode >Reporter: Paulo Motta >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > Time Spent: 1h > Remaining Estimate: 0h > > Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws > a write error on a single DC keyspace with RF=3: > {noformat} > cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to > execute write] message="Operation failed - received 0 responses and 3 > failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN > from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': > 1, 'received_responses': 0, 'failures': 3} > {noformat} > The same insert wrongly causes a timeout on a keyspace with 2 dcs (RF=3 each): > {noformat} > cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed > out waiting for replica nodes' responses] message="Operation timed out - > received only 0 responses." 
info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0}
> {noformat}
> Reproduction steps:
> {noformat}
> # Set up the cluster
> ccm create -n 3:3 test
> for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done
> ccm start
> # Create the schema
> ccm node1 cqlsh
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
> CREATE TABLE test.test (key int PRIMARY KEY, val blob);
> exit;
> # Insert the data
> python
> from cassandra.cluster import Cluster
> cluster = Cluster()
> session = cluster.connect('test')
> blob = open("2mbBlob", "rb").read().hex()
> session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))")
> {noformat}
> Reproduced in 3.0, 3.11, 4.0, trunk.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
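The write failure above comes from the replica-side mutation size limit. A minimal sketch of that guard, in the document's Python: the names `apply_mutation` and `MutationTooLarge` are made up for illustration (the real check lives in Cassandra's mutation serialization path), but the size arithmetic mirrors the `max_mutation_size_in_kb` setting and the dtest's 256 KiB payload.

```python
# Illustrative sketch of the replica-side size guard (not Cassandra's API).
MAX_MUTATION_SIZE_IN_KB = 128  # matches the dtest's max_mutation_size_in_kb

class MutationTooLarge(Exception):
    """Raised instead of applying the mutation. This surfaces as a write
    FAILURE (not a timeout), so the coordinator can fail fast."""

def apply_mutation(payload: bytes, max_kb: int = MAX_MUTATION_SIZE_IN_KB) -> int:
    limit = max_kb * 1024
    if len(payload) > limit:
        raise MutationTooLarge(
            f"mutation of {len(payload)} bytes exceeds limit of {limit} bytes")
    return len(payload)  # "applied": return the written size

# A 256 KiB payload, like the dtest's '1' * 1024 * 256, must be rejected.
oversized = b"1" * (1024 * 256)
try:
    apply_mutation(oversized)
    rejected = False
except MutationTooLarge:
    rejected = True
```

The point of the bug reports above is that every replica raises this kind of rejection, yet the coordinator reports a timeout instead of a failure when multiple DCs are involved.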
[jira] [Updated] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16334: - Fix Version/s: 4.x 4.0.x 3.11.x 3.0.x
> Replica failure causes timeout on multi-DC write
>
> Key: CASSANDRA-16334
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16334
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Messaging/Internode
> Reporter: Paulo Motta
> Assignee: Aleksandr Sorokoumov
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16334: - Test and Documentation Plan: Added a dtest that covers both bugs causing replica failures to appear as timeouts. Status: Patch Available (was: In Progress)
> Replica failure causes timeout on multi-DC write
>
> Key: CASSANDRA-16334
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16334
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Messaging/Internode
> Reporter: Paulo Motta
> Assignee: Aleksandr Sorokoumov
> Priority: Normal
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426575#comment-17426575 ] Aleksandr Sorokoumov commented on CASSANDRA-16334: -- I described the root cause in the previous comment. Two distinct bugs make replica failures appear as timeouts: one affects DC-local consistency levels, the other global ones. Fixing the latter also resolves the "zombie-hint" issue I described at the end of the previous message. A replica failure appears as a timeout at DC-local consistency levels because {{AbstractWriteResponseHandler}} counts nodes in all DCs as potential candidates to wait for; the fix is to wait only for the DC-local nodes. The second bug, responsible for both the "zombie hints" and the timeouts at global consistency levels, concerns forwarding replica failures to the correct address: this patch makes replicas send request failures to the original coordinator rather than to the DC-local node that forwarded them the message. In 3.0 and 3.11 I also added the missing respond-on-failure flag to the forwarded messages. Patches: * [dtest|https://github.com/apache/cassandra-dtest/pull/165] * [3.0|https://github.com/apache/cassandra/pull/1259] * [3.11|https://github.com/apache/cassandra/pull/1260] * [4.0|https://github.com/apache/cassandra/pull/1261] * [trunk|https://github.com/apache/cassandra/pull/1262] [~paulo] Can you please start the CI?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
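The DC-local counting bug described in the comment above can be pictured with a toy response handler. This is a simulation with made-up node names, not Cassandra code: if the candidate set for a LOCAL_ONE write includes remote-DC replicas, the local failures never add up to "all candidates failed" and the coordinator waits until it times out; counting only DC-local replicas lets it fail fast.

```python
# Toy model of AbstractWriteResponseHandler candidate counting (illustrative only).

def write_outcome(replicas, local_dc, failures, count_all_dcs):
    """replicas: list of (node, dc); failures: set of nodes that rejected the write.
    Returns 'failure' when the handler can rule out success, 'timeout' when it
    would keep waiting for responses that never arrive, 'success' otherwise."""
    candidates = [n for n, dc in replicas if count_all_dcs or dc == local_dc]
    failed = [n for n in candidates if n in failures]
    # With CL=LOCAL_ONE we need one success among DC-local replicas.
    local = [n for n, dc in replicas if dc == local_dc]
    if all(n in failures for n in local):
        # Every node that could satisfy the CL has failed...
        if len(failed) == len(candidates):
            return "failure"  # all candidates failed: report fast failure
        return "timeout"      # still "waiting" on remote-DC candidates
    return "success"

replicas = [("n1", "dc1"), ("n2", "dc1"), ("n3", "dc2"), ("n4", "dc2")]
# dc1 replicas reject the oversized mutation; dc2 failures never reach the
# coordinator in this model (the second, failure-forwarding bug).
failures = {"n1", "n2"}

buggy = write_outcome(replicas, "dc1", failures, count_all_dcs=True)   # -> "timeout"
fixed = write_outcome(replicas, "dc1", failures, count_all_dcs=False)  # -> "failure"
```

With `count_all_dcs=True` the two remote replicas stay in the candidate set forever, which is exactly the WriteTimeout the issue describes; restricting candidates to `dc1` turns the same situation into an immediate WriteFailure.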
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426570#comment-17426570 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- The CI results are a bit far from green, but none of the failures seem to be related to this patch. Is there anything else I should do before this patch is ready to commit [~e.dimitrova] [~azotcsit] [~stefan.miklosovic]? > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. > In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
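Before the JMX method and nodetool command above existed, the workaround the ticket mentions was reading host IDs off the hint filenames. A rough sketch of that, assuming a `<host-id>-<millis-timestamp>-<version>.hints` naming scheme (treat the exact scheme as an assumption; check your Cassandra version's hints directory):

```python
# Sketch: recover per-endpoint pending-hint info from hint filenames alone,
# assuming names like <host-id>-<millis-timestamp>-<version>.hints.
import os
import re
import tempfile

HINT_RE = re.compile(r"^(?P<host>[0-9a-f-]{36})-(?P<ts>\d+)-\d+\.hints$")

def pending_hints(hints_dir):
    """Map host-id -> (file count, oldest timestamp, newest timestamp)."""
    summary = {}
    for name in os.listdir(hints_dir):
        m = HINT_RE.match(name)
        if not m:
            continue  # skip non-hint files (e.g. .crc32 companions)
        host, ts = m.group("host"), int(m.group("ts"))
        count, oldest, newest = summary.get(host, (0, ts, ts))
        summary[host] = (count + 1, min(oldest, ts), max(newest, ts))
    return summary

# Demo with fabricated filenames mirroring the nodetool output above
# (two files for one down endpoint).
d = tempfile.mkdtemp()
for ts in (1537272308811, 1537272318835):
    open(os.path.join(
        d, f"5762b140-3fdf-4057-9ca7-05c070ccc9c3-{ts}-1.hints"), "w").close()
info = pending_hints(d)
```

This only tells you which endpoint a hint file targets and when it was written, which is exactly the gap the `listendpointspendinghints` command closes without shell access to the data directory.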
[jira] [Comment Edited] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426196#comment-17426196 ] Aleksandr Sorokoumov edited comment on CASSANDRA-14795 at 10/8/21, 5:32 PM: As there are no more outstanding review feedback, I squashed the changes to prepare for commit: * [patch|https://github.com/apache/cassandra/pull/1232/commits/989ace231731a822a7d583625f3f0615ceba4a35] * [dtest|https://github.com/apache/cassandra-dtest/pull/162/commits/ca882c704e0cba48027a2f6b84e603b63b81f882] * [CI|https://ci-cassandra.apache.org/job/Cassandra-devbranch/1202/] was (Author: ge): As there are no more outstanding review feedback, I squashed the changes to prepare for commit: * [patch|https://github.com/apache/cassandra/pull/1232/commits/989ace231731a822a7d583625f3f0615ceba4a35] * [dtest|https://github.com/apache/cassandra-dtest/pull/162/commits/ca882c704e0cba48027a2f6b84e603b63b81f882] * CI (TBA) > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. 
> In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426196#comment-17426196 ] Aleksandr Sorokoumov edited comment on CASSANDRA-14795 at 10/8/21, 5:27 PM: As there are no more outstanding review feedback, I squashed the changes to prepare for commit: * [patch|https://github.com/apache/cassandra/pull/1232/commits/989ace231731a822a7d583625f3f0615ceba4a35] * [dtest|https://github.com/apache/cassandra-dtest/pull/162/commits/ca882c704e0cba48027a2f6b84e603b63b81f882] * CI (TBA) was (Author: ge): As there are no more outstanding review feedback, I squashed the changes to prepare for commit: * [patch|https://github.com/apache/cassandra/pull/1232/commits/989ace231731a822a7d583625f3f0615ceba4a35] * [dtest|https://github.com/apache/cassandra-dtest/pull/162/commits/ca882c704e0cba48027a2f6b84e603b63b81f882] * [CI|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1196/] > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. 
> In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426196#comment-17426196 ] Aleksandr Sorokoumov edited comment on CASSANDRA-14795 at 10/8/21, 5:26 PM: As there are no more outstanding review feedback, I squashed the changes to prepare for commit: * [patch|https://github.com/apache/cassandra/pull/1232/commits/989ace231731a822a7d583625f3f0615ceba4a35] * [dtest|https://github.com/apache/cassandra-dtest/pull/162/commits/ca882c704e0cba48027a2f6b84e603b63b81f882] * [CI|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1196/] was (Author: ge): As there are no more outstanding review feedback, I squashed the changes to prepare for commit: * [patch|https://github.com/apache/cassandra/pull/1232/commits/0f733e070bb6a33b6d1f7b7bde33d40383d5fcfa] * [dtest|https://github.com/apache/cassandra-dtest/pull/162/commits/ca882c704e0cba48027a2f6b84e603b63b81f882] * [CI|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1196/] > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. 
> In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426196#comment-17426196 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- As there is no outstanding review feedback left, I squashed the changes to prepare for commit: * [patch|https://github.com/apache/cassandra/pull/1232/commits/0f733e070bb6a33b6d1f7b7bde33d40383d5fcfa] * [dtest|https://github.com/apache/cassandra-dtest/pull/162/commits/ca882c704e0cba48027a2f6b84e603b63b81f882] * [CI|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1196/] > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. > In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426110#comment-17426110 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- Thank you [~stefan.miklosovic] for starting the CI! In this run there were no timeout failures, which supports the overall running time of {{HintsServiceTest}} as the likely cause. To fix it, I moved the newly added {{testListPendingHints}} to a separate test suite and reverted the increased timeout. [~azotcsit] I don't see any new comments in the PR. Perhaps you need to click "Submit review" for them to appear? > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. > In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425425#comment-17425425 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- I have a suspicion that the test times out because the entire suite takes too long to run for the given timeout. We observe it as {{HintsServiceTest.testListPendingHints}} failure because it is the last test case in the suite. For test report and logs see e.g. https://nightlies.apache.org/cassandra/devbranch/Cassandra-devbranch/1187/ {{test.timeout}} is set to 240 seconds. On successful runs the suite takes a bit longer than 200 seconds to finish, each case taking between 30 and 60 seconds. As a speculation, a small hiccup or a slight deviation in test time might lead to a timeout. I'd like to verify this idea by increasing {{test.timeout}} to 360 seconds and re-running the CI. If this theory is correct, {{HintsServiceTest}} should succeed and overall test duration might exceed 240 seconds on 1 or more attempts. [~azotcsit], [~e.dimitrova] Can I kindly ask you to re-run https://ci-cassandra.apache.org/job/Cassandra-devbranch/1187/? > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 5.5h > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. 
> In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
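The arithmetic behind the timeout theory in the comment above is simple enough to sketch. Only the 240-second `test.timeout`, the proposed 360 seconds, the ~200-second suite total, and the 30-60-second per-case range come from the comment; the individual durations below are illustrative.

```python
# Back-of-the-envelope check of the timeout theory: a suite whose cases each
# take 30-60 s already runs close to test.timeout, so one hiccup on the last
# case (testListPendingHints) is enough to trip the limit.
TEST_TIMEOUT_S = 240       # current test.timeout
PROPOSED_TIMEOUT_S = 360   # proposed value to verify the theory

typical_durations = [55, 50, 45, 60]  # illustrative per-case times, ~210 s total
suite_time = sum(typical_durations)
margin = TEST_TIMEOUT_S - suite_time  # only ~30 s of slack for the whole suite

hiccup = 40  # one slow case or a CI stall
over_budget = suite_time + hiccup > TEST_TIMEOUT_S    # times out today
still_fits = suite_time + hiccup <= PROPOSED_TIMEOUT_S  # passes with 360 s
```

This also explains the eventual fix in the later comment: moving `testListPendingHints` to its own suite restores the margin without keeping the larger timeout.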
[jira] [Updated] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16334: - Description: Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws a write error on a single DC keyspace with RF=3: {noformat} cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 3 failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 3} {noformat} The same insert wrongly causes a timeout on a keyspace with 2 dcs (RF=3 each): {noformat} cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} {noformat} Reproduction steps: {noformat} # Setup cluster ccm create -n 3:3 test for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done ccm start # Create schema ccm node1 cqlsh CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}; CREATE TABLE test.test (key int PRIMARY KEY, val blob); exit; # Insert data python from cassandra.cluster import Cluster cluster = Cluster() session = cluster.connect('test') blob = f = open("2mbBlob", "rb").read().hex() session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))") {noformat} Reproduced in 3.0, 3.11, 4.0, trunk. 
was: Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws a write error on a single DC keyspace with RF=3: {noformat} cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 3 failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 3} {noformat} The same insert wrongly causes a timeout on a keyspace with 2 dcs (RF=3 each): {noformat} cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} {noformat} Reproduction steps: {noformat} # Setup cluster ccm create -n 3:3 test for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done ccm start # Create schema ccm node1 cqlsh CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}; CREATE TABLE test.test (key int PRIMARY KEY, val blob); exit; # Insert data python from cassandra.cluster import Cluster cluster = Cluster() session = cluster.connect('test') blob = f = open("2mbBlob", "rb").read().hex() session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))") {noformat} Reproduced in 3.0, 3.11, trunk. 
[jira] [Updated] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16334: - Description: Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws a write error on a single DC keyspace with RF=3: {noformat} cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 3 failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 3} {noformat} The same insert wrongly causes a timeout on a keyspace with 2 dcs (RF=3 each): {noformat} cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} {noformat} Reproduction steps: {noformat} # Setup cluster ccm create -n 3:3 test for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done ccm start # Create schema ccm node1 cqlsh CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}; CREATE TABLE test.test (key int PRIMARY KEY, val blob); exit; # Insert data python from cassandra.cluster import Cluster cluster = Cluster() session = cluster.connect('test') blob = f = open("2mbBlob", "rb").read().hex() session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))") {noformat} Reproduced in 3.0, 3.11, trunk. 
was: Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws a write error on a single DC keyspace with RF=3: {noformat} cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 3 failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 3} {noformat} The same insert wrongly causes a timeout on a keyspace with 2 dcs (RF=3 each): {noformat} cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} {noformat} Reproduction steps: {noformat} # Setup cluster ccm create -n 3:3 test for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done ccm start # Create schema ccm node1 cqlsh CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}; CREATE TABLE test.test (key int PRIMARY KEY, val blob); exit; # Insert data python from cassandra.cluster import Cluster cluster = Cluster() session = cluster.connect('test') blob = f = open("2mbBlob", "rb").read().hex() session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))") {noformat} Reproduced in 3.11, trunk. 
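As a quick sanity check on the reproduction steps above, a few lines of Python confirm that the hex-encoded blob comfortably exceeds the configured limit (the sizes are taken from the steps; the 2 MB figure is just the example file's size):

```python
# The repro hex-encodes a 2 MB file before embedding it in the INSERT,
# and bytes.hex() emits two ASCII characters per input byte, so the
# CQL literal alone is ~4 MB -- well over max_mutation_size_in_kb: 1000.
blob_bytes = 2 * 1024 * 1024          # size of the "2mbBlob" example file
hex_chars = blob_bytes * 2            # bytes.hex() doubles the size
max_mutation_bytes = 1000 * 1024      # max_mutation_size_in_kb from the yaml

print(hex_chars > max_mutation_bytes)
```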
[jira] [Updated] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16334: - Description: Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws a write error on a single DC keyspace with RF=3: {noformat} cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 3 failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 3} {noformat} The same insert wrongly causes a timeout on a keyspace with 2 dcs (RF=3 each): {noformat} cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} {noformat} Reproduction steps: {noformat} # Setup cluster ccm create -n 3:3 test for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done ccm start # Create schema ccm node1 cqlsh CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}; CREATE TABLE test.test (key int PRIMARY KEY, val blob); exit; # Insert data python from cassandra.cluster import Cluster cluster = Cluster() session = cluster.connect('test') blob = f = open("2mbBlob", "rb").read().hex() session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))") {noformat} Reproduced in 3.11, trunk. 
was: Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws a write error on a single DC keyspace with RF=3: {noformat} cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 3 failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 3} {noformat} The same insert wrongly causes a timeout on a keyspace with 2 dcs (RF=3 each): {noformat} cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} {noformat} Reproduction steps: {noformat} # Setup cluster ccm create -n 3:3 test for i in {1..6}; do echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml; done ccm start # Create schema ccm node1 cqlsh CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}; CREATE TABLE test.test (key int PRIMARY KEY, val blob); exit; # Insert data python from cassandra.cluster import Cluster session = cluster.connect('test') blob = f = open("2mbBlob", "rb").read().hex() session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))") {noformat} Reproduced in 3.11, trunk. 
[jira] [Commented] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425115#comment-17425115 ] Aleksandr Sorokoumov commented on CASSANDRA-16334: -- This bug happens in the [AbstractWriteResponseHandler#onFailure|https://github.com/apache/cassandra/blob/2e2db4dc40c4935305b9a2d5d271580e96dabe42/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L252-L265]: {code} @Override public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason) { logger.trace("Got failure from {}", from); int n = waitingFor(from) ? failuresUpdater.incrementAndGet(this) : failures; failureReasonByEndpoint.put(from, failureReason); if (blockFor() + n > candidateReplicaCount()) signal(); } {code} In the reproduction steps, {{INSERT INTO TEST}} uses CL {{LOCAL_ONE}}. Accordingly, [DatacenterWriteResponseHandler#waitingFor|https://github.com/apache/cassandra/blob/2e2db4dc40c4935305b9a2d5d271580e96dabe42/src/java/org/apache/cassandra/service/DatacenterWriteResponseHandler.java#L59-L63] only waits for the local nodes: {code} private final Predicate waitingFor = InOurDcTester.endpoints(); @Override protected boolean waitingFor(InetAddressAndPort from) { return waitingFor.test(from); } {code} [AbstractWriteResponseHandler#candidateReplicaCount()|https://github.com/apache/cassandra/blob/2e2db4dc40c4935305b9a2d5d271580e96dabe42/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L205-L213] in the condition above, however, counts live and down replicas in ALL DCs as valid candidates: {code} protected int candidateReplicaCount() { return replicaPlan.liveAndDown().size(); } {code} As a result, even after all local nodes respond with {{FAILURE_RSP}}, the coordinator waits for responses from nodes in other DCs... but never counts them in. There is more! Following the timeout or request failure, the coordinator creates hints for the nodes in other DCs which it will try to deliver forever. 
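To see why the signalling condition above can never fire for a local-DC write spanning two DCs, the arithmetic can be modeled in a few lines of Python (a deliberately simplified sketch of the quoted Java, not actual Cassandra code; the names mirror the handler's fields):

```python
# LOCAL_ONE write to a keyspace with 'dc1': 3, 'dc2': 3 -- six replicas,
# all of which reject the oversized mutation with a failure response.
replica_dcs = ["dc1", "dc1", "dc1", "dc2", "dc2", "dc2"]

block_for = 1                                # LOCAL_ONE: one local response needed
candidate_replica_count = len(replica_dcs)   # liveAndDown(): replicas in ALL DCs

# waitingFor() (InOurDcTester) only matches local-DC nodes, so only the
# three dc1 failure responses ever increment the failure counter.
failures = sum(1 for dc in replica_dcs if dc == "dc1")

# signal() -- and thus a WriteFailure to the client -- requires this
# inequality; with remote replicas inflating the candidate count it never
# holds, and the client eventually gets a WriteTimeout instead.
print(block_for + failures > candidate_replica_count)   # 1 + 3 > 6 -> False
```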
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424934#comment-17424934 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- Thank you for the review [~stefan.miklosovic]! I do not think that CASSANDRA-14309 collides with this patch. A brief review of the code did not show any conceptual clash - the changes in 14309 should not be affected by the changes in my patch. I also cherry-picked [https://github.com/instaclustr/cassandra/tree/CASSANDRA-14309] and [https://github.com/apache/cassandra-dtest/pull/153]. Resolving git conflicts was trivial and all tests added by both patches passed. > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > Time Spent: 5.5h > Remaining Estimate: 0h > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. > In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat}
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424439#comment-17424439 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- [~e.dimitrova], [~azotcsit] I rebased the PR against latest trunk and squashed review commits; I haven't started new CI runs as I no longer have access to CircleCI's enterprise account. Please let me know if you have more suggestions.
[jira] [Commented] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423500#comment-17423500 ] Aleksandr Sorokoumov commented on CASSANDRA-16986: -- [~maedhroz] Yes, please! > DROP Table should not recycle active CommitLog segments > --- > > Key: CASSANDRA-16986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > > Right now, DROP TABLE recycles all active CL segments and explicitly marks > intervals as clean for all dropping tables. I believe that this is not > necessary. > Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was > necessary to recycle all active segments because: > 1. CommitLog reused old segments after they were clean. This is no longer the > case, I believe, since CASSANDRA-6809. > 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to > avoid resurrecting data if a table with the same name is created. This was an > issue because tables didn't have unique ids yet (CASSANDRA-5202). > Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals > in Keyspace#unloadCF, I think that we can avoid the call to > {{forceRecycleAll}} there.
[jira] [Assigned] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write
[ https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov reassigned CASSANDRA-16334: Assignee: Aleksandr Sorokoumov
[jira] [Commented] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422678#comment-17422678 ] Aleksandr Sorokoumov commented on CASSANDRA-16975: -- Ah, you are right! I am still not used to the fact that 4.0 is not trunk :) > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone were to write, say, an offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released.
[jira] [Commented] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422664#comment-17422664 ] Aleksandr Sorokoumov commented on CASSANDRA-16975: -- Thank you!
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422646#comment-17422646 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- [~e.dimitrova] {quote} I would suggest running the two new tests in a loop in the Circle CI multiplexer to ensure no weird flakiness appears in the future. {quote} * [j8 repeated tests|https://app.circleci.com/pipelines/github/Ge/cassandra/216/workflows/c789a0c0-2974-48b5-bd27-2a33de2d72b0] * [j11 repeated tests|https://app.circleci.com/pipelines/github/Ge/cassandra/216/workflows/b59c5c5a-4f25-47ba-ae3b-917b01db8b67]
[jira] [Commented] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422601#comment-17422601 ] Aleksandr Sorokoumov commented on CASSANDRA-16975: -- [~adelapena] As this patch has two +1s, should I move it to {{READY TO COMMIT}}?
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422105#comment-17422105 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- Thank you for the review [~azotcsit], [~e.dimitrova]! I've created a PR https://github.com/apache/cassandra/pull/1232 as you asked. In my opinion, per-target hints together with the status column are helpful to understand what nodes we accumulate hints for and what nodes are ready for the hand-off. I added information about dc and rack to correlate the number of hints and the nodes' status with their location in a single output. I don't have too much experience operating C*, so maybe I am over-complicating it in an attempt to design a convenient UX :) Looking forward to seeing other opinions.
[jira] [Updated] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16975: - Fix Version/s: 3.11.x 3.0.x
[jira] [Updated] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16986: - Change Category: Code Clarity (was: Performance)
[jira] [Commented] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420753#comment-17420753 ] Aleksandr Sorokoumov commented on CASSANDRA-16986: -- Patches: * [3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...Ge:16986-3.0?expand=1] * [3.11, 4.0, 4.1|https://github.com/apache/cassandra/compare/cassandra-3.11...Ge:16986-3.11?expand=1]
[jira] [Updated] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16986: - Fix Version/s: 4.0.x 3.11.x 3.0.x > DROP Table should not recycle active CommitLog segments > --- > > Key: CASSANDRA-16986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > > Right now, DROP TABLE recycles all active CL segments and explicitly marks > intervals as clean for all dropping tables. I believe that this is not > necessary. > Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was > necessary to recycle all active segments because: > 1. CommitLog reused old segments after they were clean. This is no longer the > case, I believe, since CASSANDRA-6809. > 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to > avoid resurrecting data if a table with the same name is created. This was an > issue because tables didn't have unique ids yet (CASSANDRA-5202). > Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals > in Keyspace#unloadCF, I think that we can avoid the call to > {{forceRecycleAll}} there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16986: - Priority: Low (was: Normal) > DROP Table should not recycle active CommitLog segments > --- > > Key: CASSANDRA-16986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > > Right now, DROP TABLE recycles all active CL segments and explicitly marks > intervals as clean for all dropping tables. I believe that this is not > necessary. > Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was > necessary to recycle all active segments because: > 1. CommitLog reused old segments after they were clean. This is no longer the > case, I believe, since CASSANDRA-6809. > 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to > avoid resurrecting data if a table with the same name is created. This was an > issue because tables didn't have unique ids yet (CASSANDRA-5202). > Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals > in Keyspace#unloadCF, I think that we can avoid the call to > {{forceRecycleAll}} there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420607#comment-17420607 ] Aleksandr Sorokoumov commented on CASSANDRA-16986: -- Thank you for the discussion! I agree with Caleb's points and will create a new patch later today with a comment that explains why we still need to recycle segments on DROP TABLE. > DROP Table should not recycle active CommitLog segments > --- > > Key: CASSANDRA-16986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.x > > > Right now, DROP TABLE recycles all active CL segments and explicitly marks > intervals as clean for all dropping tables. I believe that this is not > necessary. > Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was > necessary to recycle all active segments because: > 1. CommitLog reused old segments after they were clean. This is no longer the > case, I believe, since CASSANDRA-6809. > 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to > avoid resurrecting data if a table with the same name is created. This was an > issue because tables didn't have unique ids yet (CASSANDRA-5202). > Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals > in Keyspace#unloadCF, I think that we can avoid the call to > {{forceRecycleAll}} there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419144#comment-17419144 ] Aleksandr Sorokoumov commented on CASSANDRA-16975: -- I removed {{throws Exception}} from the test and added patches + CI for 3.0 and 3.11 to the table in the previous comment. > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone were to write, say, an offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
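The failure mode this ticket describes — a long-running offline tool losing track of the sstables it just produced — can be sketched with a minimal model. The {{Tracker}} and {{Transaction}} types below are illustrative assumptions for the sketch, not Cassandra's real {{Tracker}}/{{LifecycleTransaction}} API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, trimmed-down stand-ins for the real lifecycle machinery.
class Tracker {
    final Set<String> live = new HashSet<>();
}

class Transaction {
    private final boolean offline;
    Transaction(boolean offline) { this.offline = offline; }
    boolean isOffline() { return offline; }
}

public class CompactionSketch {
    // After a compaction, newly written sstables should stay registered even
    // for offline transactions; otherwise an offline compaction daemon would
    // lose track of its own output between compactions.
    static void finishCompaction(Tracker tracker, Transaction txn, List<String> newSSTables) {
        tracker.live.addAll(newSSTables);
        // The behavior the ticket objects to was, in effect:
        //   if (txn.isOffline()) release/deregister newSSTables;
        // The proposed fix keeps the new sstables registered regardless.
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        finishCompaction(tracker, new Transaction(true), List.of("big-Data.db"));
        System.out.println(tracker.live.contains("big-Data.db")); // prints "true"
    }
}
```

The point of the sketch is only the invariant: a subsequent compaction started by the same offline process must still see "big-Data.db" as live.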
[jira] [Comment Edited] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417326#comment-17417326 ] Aleksandr Sorokoumov edited comment on CASSANDRA-16975 at 9/23/21, 11:24 AM: - ||Branch||CI|| |[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...Ge:16975-3.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/208/workflows/ed8ff9b7-f126-477b-8191-b39efd61345d]| |[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...Ge:16975-3.11?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/207/workflows/12bf105b-60b2-4fec-b98e-ab08ef6d33bc]| |[4.0|https://github.com/apache/cassandra/compare/trunk...Ge:CASSANDRA-16975?expand=1] |[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/9f0978f7-b363-440d-aa88-1a8a2b4b6316] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/6fbd5910-0e98-457f-8d1a-0b1f2048052c]| was (Author: ge): ||Branch||CI|| |[4.0|https://github.com/apache/cassandra/compare/trunk...Ge:CASSANDRA-16975?expand=1] |[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/9f0978f7-b363-440d-aa88-1a8a2b4b6316] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/6fbd5910-0e98-457f-8d1a-0b1f2048052c]| > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). 
> This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone were to write, say, an offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16986: - Test and Documentation Plan: I added a test that ensures that a dropped table does not leave dirty intervals in the active segments. Status: Patch Available (was: Open) This patch removes {{CommitLog.instance#forceRecycleAllSegments}} from the {{DROP TABLE}} path. In addition, {{AbstractCommitLogSegmentManager#forceRecycleAll}} no longer explicitly cleans up the segment intervals for the dropping table. ||Branch||CI|| |[trunk|https://github.com/apache/cassandra/compare/trunk...Ge:CASSANDRA-16986?expand=1] |[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/204/workflows/1125c27e-392f-49b7-9488-e702d5afbc84] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/204/workflows/a55543ad-e4ca-469a-8037-79dd508ba606]| > DROP Table should not recycle active CommitLog segments > --- > > Key: CASSANDRA-16986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.x > > > Right now, DROP TABLE recycles all active CL segments and explicitly marks > intervals as clean for all dropping tables. I believe that this is not > necessary. > Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was > necessary to recycle all active segments because: > 1. CommitLog reused old segments after they were clean. This is no longer the > case, I believe, since CASSANDRA-6809. > 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to > avoid resurrecting data if a table with the same name is created. This was an > issue because tables didn't have unique ids yet (CASSANDRA-5202). 
> Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals > in Keyspace#unloadCF, I think that we can avoid the call to > {{forceRecycleAll}} there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16986: - Change Category: Performance Complexity: Normal Status: Open (was: Triage Needed) > DROP Table should not recycle active CommitLog segments > --- > > Key: CASSANDRA-16986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.x > > > Right now, DROP TABLE recycles all active CL segments and explicitly marks > intervals as clean for all dropping tables. I believe that this is not > necessary. > Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was > necessary to recycle all active segments because: > 1. CommitLog reused old segments after they were clean. This is no longer the > case, I believe, since CASSANDRA-6809. > 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to > avoid resurrecting data if a table with the same name is created. This was an > issue because tables didn't have unique ids yet (CASSANDRA-5202). > Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals > in Keyspace#unloadCF, I think that we can avoid the call to > {{forceRecycleAll}} there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
[ https://issues.apache.org/jira/browse/CASSANDRA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16986: - Fix Version/s: 4.x > DROP Table should not recycle active CommitLog segments > --- > > Key: CASSANDRA-16986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 > Project: Cassandra > Issue Type: Improvement > Components: Local/Commit Log >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.x > > > Right now, DROP TABLE recycles all active CL segments and explicitly marks > intervals as clean for all dropping tables. I believe that this is not > necessary. > Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was > necessary to recycle all active segments because: > 1. CommitLog reused old segments after they were clean. This is no longer the > case, I believe, since CASSANDRA-6809. > 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to > avoid resurrecting data if a table with the same name is created. This was an > issue because tables didn't have unique ids yet (CASSANDRA-5202). > Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals > in Keyspace#unloadCF, I think that we can avoid the call to > {{forceRecycleAll}} there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16986) DROP Table should not recycle active CommitLog segments
Aleksandr Sorokoumov created CASSANDRA-16986: Summary: DROP Table should not recycle active CommitLog segments Key: CASSANDRA-16986 URL: https://issues.apache.org/jira/browse/CASSANDRA-16986 Project: Cassandra Issue Type: Improvement Components: Local/Commit Log Reporter: Aleksandr Sorokoumov Assignee: Aleksandr Sorokoumov Right now, DROP TABLE recycles all active CL segments and explicitly marks intervals as clean for all dropping tables. I believe that this is not necessary. Recycling of CL segments was introduced in CASSANDRA-3578. Back then, it was necessary to recycle all active segments because: 1. CommitLog reused old segments after they were clean. This is no longer the case, I believe, since CASSANDRA-6809. 2. CommitLog segments must have been closed and recycled on {{DROP TABLE}} to avoid resurrecting data if a table with the same name is created. This was an issue because tables didn't have unique ids yet (CASSANDRA-5202). Given that {{DROP TABLE}} triggers flush, which in turn cleans CL intervals in Keyspace#unloadCF, I think that we can avoid the call to {{forceRecycleAll}} there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419060#comment-17419060 ] Aleksandr Sorokoumov commented on CASSANDRA-14795: -- This patch introduces both a nodetool command and a virtual table for pending hints. ||Branch||CI|| |[dtest|https://github.com/apache/cassandra-dtest/pull/162]| | |[trunk|https://github.com/apache/cassandra/compare/trunk...Ge:14795?expand=1] |[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/203/workflows/81678055-f5ee-44cc-b975-49225a2dc6b0] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/203/workflows/3a1b9c54-befb-4a7a-957c-a18cf536ab93]| > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. > In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
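The ticket notes that, before this change, the only way to attribute pending hints to an endpoint was to inspect the hint file names (which embed host ids) on disk. A minimal sketch of that attribution step follows; the "<hostId>-<timestamp>-<generation>.hints" file-name layout is an assumption made for illustration, so check the actual {{HintsDescriptor}} naming before relying on it.

```java
import java.util.UUID;

// Sketch: recover the target endpoint's host id from a hint file name,
// assuming the name starts with the textual UUID of the host.
public class HintFileSketch {
    static UUID hostIdOf(String fileName) {
        // A textual UUID is exactly 36 characters; the host id is the prefix.
        return UUID.fromString(fileName.substring(0, 36));
    }

    public static void main(String[] args) {
        UUID id = hostIdOf("5762b140-3fdf-4057-9ca7-05c070ccc9c3-1537272318835-1.hints");
        System.out.println(id); // prints "5762b140-3fdf-4057-9ca7-05c070ccc9c3"
    }
}
```

The nodetool command and virtual table added by this patch do this aggregation server-side, so operators no longer need to scrape the data directory themselves.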
[jira] [Updated] (CASSANDRA-14795) Expose information about stored hints via JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-14795: - Test and Documentation Plan: Added a dtest for the new nodetool command and the virtual table. Status: Patch Available (was: In Progress) > Expose information about stored hints via JMX > - > > Key: CASSANDRA-14795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14795 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Observability >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Low > Fix For: 4.x > > > Currently there is no way to determine what kind of hints a node has, apart > from looking at the filenames (thus host-ids) on disk. Having a way to access > this information would help with debugging hint creation/replay scenarios. > In addition to the JMX method, there is a new nodetool command: > {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints > Host ID Address Rack DC Status Total files Newest Oldest > 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 > 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16975: - Test and Documentation Plan: I added a test to {{CompactionTaskTest}} that ensures that SSTables produced by offline CompactionTasks are not released. Status: Patch Available (was: Open) > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone is to write, say, offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16975: - Bug Category: Parent values: Correctness(12982) Complexity: Normal Discovered By: Code Inspection Severity: Low Status: Open (was: Triage Needed) > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone is to write, say, offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417326#comment-17417326 ] Aleksandr Sorokoumov commented on CASSANDRA-16975: -- ||Branch||CI|| |[4.0|https://github.com/apache/cassandra/compare/trunk...Ge:CASSANDRA-16975?expand=1] |[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/9f0978f7-b363-440d-aa88-1a8a2b4b6316] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/6fbd5910-0e98-457f-8d1a-0b1f2048052c]| > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone is to write, say, offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417326#comment-17417326 ] Aleksandr Sorokoumov edited comment on CASSANDRA-16975 at 9/19/21, 12:28 PM: - ||Branch||CI|| |[4.0|https://github.com/apache/cassandra/compare/trunk...Ge:CASSANDRA-16975?expand=1] |[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/9f0978f7-b363-440d-aa88-1a8a2b4b6316] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/6fbd5910-0e98-457f-8d1a-0b1f2048052c]| was (Author: ge): ||Branch||CI|| |[4.0|https://github.com/apache/cassandra/compare/trunk...Ge:CASSANDRA-16975?expand=1] |[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/9f0978f7-b363-440d-aa88-1a8a2b4b6316] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/200/workflows/6fbd5910-0e98-457f-8d1a-0b1f2048052c]| > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. 
> However, if someone were to write, say, an offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16975) CompactionTask#runMayThrow should not release new SSTables for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16975: - Summary: CompactionTask#runMayThrow should not release new SSTables for offline transactions (was: CompactionTask#runMayThrow should not remove new SSTables from the tracker for offline transactions) > CompactionTask#runMayThrow should not release new SSTables for offline > transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone is to write, say, offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16975) CompactionTask#runMayThrow should not remove new SSTables from the tracker for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16975: - Description: Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline transactions ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). This change was added in CASSANDRA-8962, prior to the introduction of lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might be undesired and could have just fallen through the cracks. To my knowledge, this code does not cause any known bugs solely because in-tree tools do not access the SSTables they produce before exiting. However, if someone is to write, say, offline compaction daemon, it might break on subsequent compactions because newly created SSTables will be released. was: Right now, {{CompactionTask#runMayThrow}} removes new SSTables from the tracker for offline transactions ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). This change was added in CASSANDRA-8962, prior to the introduction of lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might be undesired and could have just fallen through the cracks. To my knowledge, this code does not cause any known bugs solely because in-tree tools do not access the SSTables they produce before exiting. However, if someone is to write, say, offline compaction daemon, it might break on subsequent iterations because newly created SSTables won't be in the tracker. 
> CompactionTask#runMayThrow should not remove new SSTables from the tracker > for offline transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} releases new SSTables for offline > transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone is to write, say, offline compaction daemon, it might > break on subsequent compactions because newly created SSTables will be > released. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16975) CompactionTask#runMayThrow should not remove new SSTables from the tracker for offline transactions
Aleksandr Sorokoumov created CASSANDRA-16975: Summary: CompactionTask#runMayThrow should not remove new SSTables from the tracker for offline transactions Key: CASSANDRA-16975 URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 Project: Cassandra Issue Type: Bug Components: Local/Compaction Reporter: Aleksandr Sorokoumov Assignee: Aleksandr Sorokoumov Right now, {{CompactionTask#runMayThrow}} removes new SSTables from the tracker for offline transactions ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). This change was added in CASSANDRA-8962, prior to the introduction of lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might be undesired and could have just fallen through the cracks. To my knowledge, this code does not cause any known bugs solely because in-tree tools do not access the SSTables they produce before exiting. However, if someone were to write, say, an offline compaction daemon, it might break on subsequent iterations because newly created SSTables won't be in the tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16975) CompactionTask#runMayThrow should not remove new SSTables from the tracker for offline transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16975: - Fix Version/s: 4.0.x > CompactionTask#runMayThrow should not remove new SSTables from the tracker > for offline transactions > --- > > Key: CASSANDRA-16975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16975 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 4.0.x > > > Right now, {{CompactionTask#runMayThrow}} removes new SSTables from the > tracker for offline transactions > ([code|https://github.com/apache/cassandra/blob/f7c71f65c000c2c3ef7df1b034b8fdd822a396d8/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L227-L230]). > This change was added in CASSANDRA-8962, prior to the introduction of > lifecycle transactions in CASSANDRA-8568. I suspect that this behavior might > be undesired and could have just fallen through the cracks. > To my knowledge, this code does not cause any known bugs solely because > in-tree tools do not access the SSTables they produce before exiting. > However, if someone is to write, say, offline compaction daemon, it might > break on subsequent iterations because newly created SSTables won't be in the > tracker. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
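The failure mode described in CASSANDRA-16975 can be illustrated with a minimal sketch. This is a toy Python model, not Cassandra's actual tracker or {{CompactionTask}} code; the class and method names are invented for illustration only:

```python
# Hypothetical model of the behavior questioned above: after an *offline*
# compaction, the newly produced SSTable is released from the tracker,
# so a long-running offline tool's next compaction pass sees no inputs.

class Tracker:
    """Toy stand-in for an SSTable tracker; names are illustrative only."""

    def __init__(self, sstables):
        self.sstables = set(sstables)

    def compact(self, offline):
        # Merge all tracked "SSTables" into one new output.
        merged = "+".join(sorted(self.sstables))
        self.sstables = {merged}
        if offline:
            # Mirrors the suspect code path: the new SSTable is released
            # from the tracker for offline transactions.
            self.sstables.clear()
        return merged

online = Tracker({"a", "b"})
online.compact(offline=False)
assert online.sstables == {"a+b"}      # next iteration still has its input

offline_tool = Tracker({"a", "b"})
offline_tool.compact(offline=True)
assert offline_tool.sstables == set()  # a second pass would find nothing
```

In-tree tools exit after producing their output, so the empty tracker is never observed; a hypothetical daemon that loops over `compact()` would break on its second iteration.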
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411254#comment-17411254 ] Aleksandr Sorokoumov commented on CASSANDRA-16349: -- Hey [~ascott], Can you please try to reproduce the error with the *Streaming fix* patch I linked above? If you still can reproduce it, it'd help if you can attach relevant stack traces from the failing nodes. > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Fix For: 4.0.x > > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce create a cluster with ccm with more nodes than the RF, put some > data into it copy a SSTable and stream it. > > The error originates on the nodes, the following stack trace is shown in the > logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at > org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587) > at >
[jira] [Updated] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16349: - Fix Version/s: 4.0.x > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Fix For: 4.0.x > > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce create a cluster with ccm with more nodes than the RF, put some > data into it copy a SSTable and stream it. > > The error originates on the nodes, the following stack trace is shown in the > logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at > org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at
[jira] [Commented] (CASSANDRA-15985) python dtest TestCqlsh added enable_scripted_user_defined_functions which breaks on 2.2
[ https://issues.apache.org/jira/browse/CASSANDRA-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384729#comment-17384729 ] Aleksandr Sorokoumov commented on CASSANDRA-15985: -- I added fixes for the rest of the broken tests mentioned in [my comment above|https://issues.apache.org/jira/browse/CASSANDRA-15985?focusedCommentId=17264377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17264377] and started [CI|https://app.circleci.com/pipelines/github/Ge/cassandra/196/workflows/61015777-9b3f-4994-8098-405b3485d658]. > python dtest TestCqlsh added enable_scripted_user_defined_functions which > breaks on 2.2 > --- > > Key: CASSANDRA-15985 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15985 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: David Capwell >Assignee: Aleksandr Sorokoumov >Priority: Normal > Fix For: 2.2.x > > > {code} > ERROR [main] 2020-07-26 03:03:14,108 CassandraDaemon.java:744 - Exception > encountered during startup > org.apache.cassandra.exceptions.ConfigurationException: Invalid yaml. 
Please > remove properties [enable_scripted_user_defined_functions] from your > cassandra.yaml > at > org.apache.cassandra.config.YamlConfigurationLoader$MissingPropertiesChecker.check(YamlConfigurationLoader.java:146) > ~[main/:na] > at > org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:113) > ~[main/:na] > at > org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:85) > ~[main/:na] > at > org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:151) > ~[main/:na] > at > org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:133) > ~[main/:na] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:604) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:731) > [main/:na]] > {code} > This test doesn’t put a version limit, so all tests fail on 2.2 since the > property was added to all clusters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
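The fix direction the ticket implies (only set `enable_scripted_user_defined_functions` on versions that know the property) can be sketched as a small version-gated config helper. This is an illustrative sketch, not the actual dtest change; the function name and the exact version gate are assumptions:

```python
# Hedged sketch: build per-version configuration so a 2.2 node is never
# handed a yaml property it does not understand (which aborts startup).

def config_for_version(version):
    """Return configuration options safe for the given Cassandra version.

    `version` is a (major, minor) tuple. Gating at (3, 0) is an assumption
    for illustration; the real cutoff is whichever release introduced the
    property.
    """
    options = {}
    if version >= (3, 0):
        options['enable_scripted_user_defined_functions'] = 'true'
    return options

# 2.2 clusters get a config without the unknown property...
assert 'enable_scripted_user_defined_functions' not in config_for_version((2, 2))
# ...while newer clusters still receive it.
assert 'enable_scripted_user_defined_functions' in config_for_version((4, 0))
```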
[jira] [Comment Edited] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382609#comment-17382609 ] Aleksandr Sorokoumov edited comment on CASSANDRA-16349 at 7/19/21, 8:09 AM: *Short version of the review* * The bug is reproducible in 4.0+ * The fix for SSTableLoader LGTM as a way to avoid useless streaming tasks * I added a python dtest for the issue * We should also fix the way streaming handles empty SSTables after CASSANDRA-14115 *Code and CI* ||branch||CI|| |[dtest|https://github.com/apache/cassandra-dtest/pull/151]| | |[4.0 (baseline)|https://github.com/ge/cassandra/tree/cassandra-4.0-16349-dtest]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/27d68d7c-3ae8-4dcd-869b-d8bbd47157a4] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/844e33c6-1327-439d-980f-0112cf958829]| |[SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-sstableloader-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/2a308aa6-6ff6-4294-842a-6e691831c59f] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/c9729460-f035-49ab-873d-14f0cf6e2cc5]| |[Streaming fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/193/workflows/3d4d2069-0dd5-4b86-9510-d3d140ed49bf] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/193/workflows/0738e555-aa9a-4367-9c77-96ff957147c5]| |[Streaming fix + SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-sstableloader-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/194/workflows/a07f3909-987c-4ccc-8625-9166f74a7000] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/194/workflows/524e0774-ddd3-4d0d-b2d7-6d55d673a773]| *Long version of the review* I was able to reproduce the bug following the steps in 
the issue description in {{cassandra-4.0}} and {{trunk}}. The issue does not reproduce in the earlier versions. Since there were no changes in the SSTableLoader between {{3.11}} and {{trunk}}, I got curious whether the fix should be on the streaming side instead. AFAIU the failing assertion ([link|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/streaming/CassandraIncomingFile.java#L96]) was introduced in CASSANDRA-14115 as a sanity check that the file's size is not accessed before it has been read. However, this assertion might be incorrect as the default state for the size is -1, and the intention is to verify that the value has been updated. As an experiment, I changed the assertion in {{getSize}} and re-ran the test. Streaming tasks started to crash in [StreamReceiveTask#receive|https://github.com/apache/cassandra/blob/9cc7a0025d8b0859d8e9c947f6fdffd8455dd141/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L87] due to [no open SSTable writers|https://github.com/apache/cassandra/blob/9cc7a0025d8b0859d8e9c947f6fdffd8455dd141/src/java/org/apache/cassandra/io/sstable/format/RangeAwareSSTableWriter.java#L168-L171]. In my opinion, this is a bug as C* could handle streaming empty SSTables in prior versions, so I created a patch that handles empty streams without throwing exceptions. Even though it works without Serban's SSTableLoader fix, we should include it to prevent SSTableLoader from doing unnecessary work. 
was (Author: ge): *Short version of the review* * The bug is reproducible in 4.0+ * The fix for SSTableLoader LGTM as a way to avoid useless streaming tasks * I added a python dtest for the issue * We should also fix the way streaming handles empty SSTables after CASSANDRA-14115 *Code and CI* ||branch||CI|| |[dtest|https://github.com/apache/cassandra-dtest/pull/151]| | |[4.0 (baseline)|https://github.com/ge/cassandra/tree/cassandra-4.0-16349-dtest]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/27d68d7c-3ae8-4dcd-869b-d8bbd47157a4] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/844e33c6-1327-439d-980f-0112cf958829]| |[SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-sstableloader-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/2a308aa6-6ff6-4294-842a-6e691831c59f] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/c9729460-f035-49ab-873d-14f0cf6e2cc5]| |[Streaming fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/191/workflows/4855a5e0-8ba3-4007-8e87-3c4094702b53]
[jira] [Comment Edited] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382609#comment-17382609 ] Aleksandr Sorokoumov edited comment on CASSANDRA-16349 at 7/17/21, 4:12 PM: *Short version of the review* * The bug is reproducible in 4.0+ * The fix for SSTableLoader LGTM as a way to avoid useless streaming tasks * I added a python dtest for the issue * We should also fix the way streaming handles empty SSTables after CASSANDRA-14115 *Code and CI* ||branch||CI|| |[dtest|https://github.com/apache/cassandra-dtest/pull/151]| | |[4.0 (baseline)|https://github.com/ge/cassandra/tree/cassandra-4.0-16349-dtest]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/27d68d7c-3ae8-4dcd-869b-d8bbd47157a4] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/844e33c6-1327-439d-980f-0112cf958829]| |[SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-sstableloader-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/2a308aa6-6ff6-4294-842a-6e691831c59f] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/c9729460-f035-49ab-873d-14f0cf6e2cc5]| |[Streaming fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/191/workflows/4855a5e0-8ba3-4007-8e87-3c4094702b53] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/191/workflows/b3e1df91-fcae-410f-a3d7-2f5176203586]| |[Streaming fix + SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-sstableloader-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/192/workflows/0918fc51-7492-467b-8f87-8ea46830f262] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/192/workflows/0d15992d-c33b-4955-a531-e8c371beab15]| *Long version of the review* I was able to reproduce the bug following the steps in 
the issue description in {{cassandra-4.0}} and {{trunk}}. The issue does not reproduce in the earlier versions. Since there were no changes in the SSTableLoader between {{3.11}} and {{trunk}}, I got curious whether the fix should be on the streaming side instead. AFAIU the failing assertion ([link|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/streaming/CassandraIncomingFile.java#L96]) was introduced in CASSANDRA-14115 as a sanity check that the file's size is not accessed before it has been read. However, this assertion might be incorrect as the default state for the size is -1, and the intention is to verify that the value has been updated. As an experiment, I changed the assertion in {{getSize}} and re-ran the test. Streaming tasks started to crash in [StreamReceiveTask#receive|https://github.com/apache/cassandra/blob/9cc7a0025d8b0859d8e9c947f6fdffd8455dd141/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L87] due to [no open SSTable writers|https://github.com/apache/cassandra/blob/9cc7a0025d8b0859d8e9c947f6fdffd8455dd141/src/java/org/apache/cassandra/io/sstable/format/RangeAwareSSTableWriter.java#L168-L171]. In my opinion, this is a bug as C* could handle streaming empty SSTables in prior versions, so I created a patch that handles empty streams without throwing exceptions. Even though it works without Serban's SSTableLoader fix, we should include it to prevent SSTableLoader from doing unnecessary work. 
was (Author: ge): *The short version of the review* * The bug is reproducible in 4.0+ * The fix for SSTableLoader LGTM as a way to avoid useless streaming tasks * I added a python dtest for the issue * We should also fix the way streaming handles empty SSTables after CASSANDRA-14115 *Code and CI* ||branch||CI| |[dtest|https://github.com/apache/cassandra-dtest/pull/151]| | |[4.0 (baseline)|https://github.com/ge/cassandra/tree/cassandra-4.0-16349-dtest]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/27d68d7c-3ae8-4dcd-869b-d8bbd47157a4] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/844e33c6-1327-439d-980f-0112cf958829] |[SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-sstableloader-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/2a308aa6-6ff6-4294-842a-6e691831c59f] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/c9729460-f035-49ab-873d-14f0cf6e2cc5] |[Streaming fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/191/workflows/4855a5e0-8ba3-4007-8e87-3c4094702b53]
[jira] [Updated] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16349: - Status: Needs Reviewer (was: Review In Progress) > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). This can happen in at least 2 > cases: > # SSTableLoader is used to stream backups while keeping the same token ranges > # SSTable(s) are created with CQLSSTableWriter to match token ranges (this > can bring better performance by using ZeroCopy streaming) > Partial output of the SSTableLoader: > {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] > Remote peer /127.0.0.4:7000 failed stream session. > ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer > /127.0.0.3:7000 failed stream session. 
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.611KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.515KiB/s) > progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% > [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% > 0.000KiB/s (avg: 1.427KiB/s) > {quote} > > Stack trace: > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220) > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196) > at > 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505) > at > org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:844) > {quote} > To reproduce create a cluster with ccm with more nodes than the RF, put some > data into it copy a SSTable and stream it. > > The error originates on the nodes, the following stack trace is shown in the > logs: > {quote}java.lang.IllegalStateException: Stream hasn't been read yet > at > com.google.common.base.Preconditions.checkState(Preconditions.java:507) > at > org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96) > at > org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789) > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587) > at > org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at
[jira] [Commented] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382609#comment-17382609 ] Aleksandr Sorokoumov commented on CASSANDRA-16349: -- *The short version of the review* * The bug is reproducible in 4.0+ * The fix for SSTableLoader LGTM as a way to avoid useless streaming tasks * I added a python dtest for the issue * We should also fix the way streaming handles empty SSTables after CASSANDRA-14115 *Code and CI* ||branch||CI| |[dtest|https://github.com/apache/cassandra-dtest/pull/151]| | |[4.0 (baseline)|https://github.com/ge/cassandra/tree/cassandra-4.0-16349-dtest]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/27d68d7c-3ae8-4dcd-869b-d8bbd47157a4] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/188/workflows/844e33c6-1327-439d-980f-0112cf958829] |[SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-sstableloader-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/2a308aa6-6ff6-4294-842a-6e691831c59f] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/189/workflows/c9729460-f035-49ab-873d-14f0cf6e2cc5] |[Streaming fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-fix-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/191/workflows/4855a5e0-8ba3-4007-8e87-3c4094702b53] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/191/workflows/b3e1df91-fcae-410f-a3d7-2f5176203586] |[Streaming fix + SSTableLoader fix|https://github.com/apache/cassandra/compare/trunk...Ge:16349-streaming-sstableloader-4.0?expand=1]|[j8|https://app.circleci.com/pipelines/github/Ge/cassandra/192/workflows/0918fc51-7492-467b-8f87-8ea46830f262] [j11|https://app.circleci.com/pipelines/github/Ge/cassandra/192/workflows/0d15992d-c33b-4955-a531-e8c371beab15] *Long version of the review* I was able to reproduce the bug following the steps in the issue description in 
{{cassandra-4.0}} and {{trunk}}. The issue does not reproduce in the earlier versions. Since there were no changes in the SSTableLoader between {{3.11}} and {{trunk}}, I got curious whether the fix should be on the streaming side instead. AFAIU the failing assertion ([link|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/streaming/CassandraIncomingFile.java#L96]) was introduced in CASSANDRA-14115 as a sanity check that the file's size is not accessed before it has been read. However, this assertion might be incorrect as the default state for the size is -1, and the intention is to verify that the value has been updated. As an experiment, I changed the assertion in {{getSize}} and re-ran the test. Streaming tasks started to crash in [StreamReceiveTask#receive|https://github.com/apache/cassandra/blob/9cc7a0025d8b0859d8e9c947f6fdffd8455dd141/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L87] due to [no open SSTable writers|https://github.com/apache/cassandra/blob/9cc7a0025d8b0859d8e9c947f6fdffd8455dd141/src/java/org/apache/cassandra/io/sstable/format/RangeAwareSSTableWriter.java#L168-L171]. In my opinion, this is a bug as C* could handle streaming empty SSTables in prior versions, so I created a patch that handles empty streams without throwing exceptions. Even though it works without Serban's SSTableLoader fix, we should include it to prevent SSTableLoader from doing unnecessary work. > SSTableLoader reports error when SSTable(s) do not have data for some nodes > --- > > Key: CASSANDRA-16349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16349 > Project: Cassandra > Issue Type: Bug > Components: Tool/sstable >Reporter: Serban Teodorescu >Assignee: Serban Teodorescu >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > Running SSTableLoader in verbose mode will show error(s) if there are node(s) > that do not own any data from the SSTable(s). 
This can happen in at least 2 cases:
> # SSTableLoader is used to stream backups while keeping the same token ranges
> # SSTable(s) are created with CQLSSTableWriter to match token ranges (this can bring better performance by using ZeroCopy streaming)
>
> Partial output of the SSTableLoader:
> {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer /127.0.0.4:7000 failed stream session.
> ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer /127.0.0.3:7000 failed stream session.
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% 0.000KiB/s (avg: 1.611KiB/s)
> progress:
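The SSTableLoader-side fix discussed in the review amounts to not creating streaming tasks for nodes that own none of the SSTable's data. The following is a toy Python model of that idea, not the actual Java patch; all names and the simplified non-wrapping range representation are illustrative assumptions:

```python
def ranges_overlap(a, b):
    # Treat ranges as half-open (start, end) intervals on the token ring;
    # ring wraparound is ignored for simplicity in this sketch.
    return a[0] < b[1] and b[0] < a[1]

def plan_streams(sstable_ranges, node_ownership):
    # Build a stream plan that only includes nodes owning data present in
    # the SSTable. Nodes with no overlap get no session at all, rather than
    # an empty session that later fails on the remote side.
    plan = {}
    for node, owned in node_ownership.items():
        overlapping = [r for r in owned
                       if any(ranges_overlap(r, s) for s in sstable_ranges)]
        if overlapping:
            plan[node] = overlapping
    return plan
```

Under this model, a node like /127.0.0.4 above, which owns no tokens covered by the SSTable, would simply be absent from the plan instead of reporting a failed (empty) stream session.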
[jira] [Commented] (CASSANDRA-15985) python dtest TestCqlsh added enable_scripted_user_defined_functions which breaks on 2.2
[ https://issues.apache.org/jira/browse/CASSANDRA-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382572#comment-17382572 ] Aleksandr Sorokoumov commented on CASSANDRA-15985:
--

Thanks for reviewing my patch [~e.dimitrova]! Feel free to cherry-pick the fix for TestCqlsh#test_pycodestyle_compliance. Regarding the proposed changes, should I maybe ask in Slack? wdyt?

> python dtest TestCqlsh added enable_scripted_user_defined_functions which breaks on 2.2
> ---
>
> Key: CASSANDRA-15985
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15985
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Aleksandr Sorokoumov
> Priority: Normal
> Fix For: 2.2.x
>
> {code}
> ERROR [main] 2020-07-26 03:03:14,108 CassandraDaemon.java:744 - Exception encountered during startup
> org.apache.cassandra.exceptions.ConfigurationException: Invalid yaml. Please remove properties [enable_scripted_user_defined_functions] from your cassandra.yaml
> at org.apache.cassandra.config.YamlConfigurationLoader$MissingPropertiesChecker.check(YamlConfigurationLoader.java:146) ~[main/:na]
> at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:113) ~[main/:na]
> at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:85) ~[main/:na]
> at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:151) ~[main/:na]
> at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133) ~[main/:na]
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:604) [main/:na]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:731) [main/:na]
> {code}
> This test doesn't put a version limit, so all tests fail on 2.2 since the property was added to all clusters.
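The fix the ticket calls for is the usual dtest pattern of gating yaml properties on the cluster version, so that options unknown to 2.2 are never written into its cassandra.yaml. A minimal Python sketch of that pattern follows; the function name and the exact version cutoff for enable_scripted_user_defined_functions are assumptions, not the actual dtest code:

```python
def cluster_config(version):
    # Base options assumed valid on every version under test.
    opts = {'enable_user_defined_functions': 'true'}
    # Gate newer properties on the cluster version. The 3.0 cutoff for
    # enable_scripted_user_defined_functions is an assumption here; on 2.2
    # an unknown property fails startup with "Invalid yaml. Please remove
    # properties [...] from your cassandra.yaml".
    if version >= (3, 0):
        opts['enable_scripted_user_defined_functions'] = 'true'
    return opts
```

A test fixture would call this with the version tuple of the cluster being populated, so 2.2 clusters start cleanly while newer clusters still get scripted UDFs enabled.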
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16349) SSTableLoader reports error when SSTable(s) do not have data for some nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-16349:
-
Reviewers: Aleksandr Sorokoumov (was: Aleksandr Sorokoumov)
Status: Review In Progress (was: Patch Available)

> SSTableLoader reports error when SSTable(s) do not have data for some nodes
> ---
>
> Key: CASSANDRA-16349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16349
> Project: Cassandra
> Issue Type: Bug
> Components: Tool/sstable
> Reporter: Serban Teodorescu
> Assignee: Serban Teodorescu
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Running SSTableLoader in verbose mode will show error(s) if there are node(s) that do not own any data from the SSTable(s). This can happen in at least 2 cases:
> # SSTableLoader is used to stream backups while keeping the same token ranges
> # SSTable(s) are created with CQLSSTableWriter to match token ranges (this can bring better performance by using ZeroCopy streaming)
>
> Partial output of the SSTableLoader:
> {quote}ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer /127.0.0.4:7000 failed stream session.
> ERROR 02:47:47,842 [Stream #fa8e73b0-3da5-11eb-9c47-c5d27ae8fe47] Remote peer /127.0.0.3:7000 failed stream session.
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% 0.000KiB/s (avg: 1.611KiB/s)
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% 0.000KiB/s (avg: 1.611KiB/s)
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% 0.000KiB/s (avg: 1.515KiB/s)
> progress: [/127.0.0.4:7000]0:0/1 100% [/127.0.0.3:7000]0:0/1 100% [/127.0.0.2:7000]0:7/7 100% [/127.0.0.1:7000]0:7/7 100% total: 100% 0.000KiB/s (avg: 1.427KiB/s)
> {quote}
>
> Stack trace:
> {quote}java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
> at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
> at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533)
> at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:99)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:49)
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
> at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
> at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
> at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
> at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
> at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
> at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
> at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220)
> at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196)
> at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:505)
> at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:819)
> at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:595)
> at org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:844)
> {quote}
> To reproduce, create a cluster with ccm with more nodes than the RF, put some data into it, copy an SSTable, and stream it.
>
> The error originates on the nodes; the following stack trace is shown in the logs:
> {quote}java.lang.IllegalStateException: Stream hasn't been read yet
> at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
> at org.apache.cassandra.db.streaming.CassandraIncomingFile.getSize(CassandraIncomingFile.java:96)
> at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:789)
> at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:587)
> at
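The "Stream hasn't been read yet" check that fires above can be modeled with a small Python sketch of one plausible reading of the review comment (this is not the Java implementation; the class and names are hypothetical): the size field starts at a -1 sentinel, reading the stream populates it, and the guard checks the sentinel itself, so an empty-but-read stream legitimately reports size 0 instead of tripping the check.

```python
UNKNOWN = -1  # sentinel: size not populated yet

class IncomingFileModel:
    # Toy stand-in for a streaming file's size bookkeeping.
    def __init__(self):
        self._size = UNKNOWN

    def read(self, payload):
        # Reading populates the size; an empty stream yields 0, which is a
        # valid known size, distinct from the UNKNOWN sentinel.
        self._size = len(payload)

    def get_size(self):
        # Guard on the sentinel, i.e. "was the value ever updated?",
        # rather than on a separate "stream read" flag.
        if self._size == UNKNOWN:
            raise RuntimeError("Stream hasn't been read yet")
        return self._size
```

In this model, calling get_size() before read() still fails fast, but receiving an empty SSTable stream no longer does, which matches the behavior the streaming-fix branch aims for.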