[jira] [Commented] (CASSANDRA-15625) Nodetool toppartitions error
[ https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058434#comment-17058434 ]

Yuki Morishita commented on CASSANDRA-15625:
--------------------------------------------

I've seen this error before on C* 3.0. The cause was CASSANDRA-9241, which is only fixed in 3.x, so I suggest upgrading to the latest 3.11.x if possible.

> Nodetool toppartitions error
> ----------------------------
>
>                 Key: CASSANDRA-15625
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15625
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Antonio
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>
> C* version: 3.0.15
> Here's my test table:
> {code}
> CREATE TABLE app300.test (
>     a bigint PRIMARY KEY,
>     b text,
>     c text
> );
> INSERT INTO app300.test (a, b, c) VALUES (50, 'test1', 'test1');
> {code}
> When I run toppartitions (nodetool toppartitions app300 test 50), I get this error:
> {code}
> error: Expected 8 or 0 byte long (1048576)
> -- StackTrace --
> org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long (1048576)
>         at org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42)
>         at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
>         at org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}
> But after I flush the table, toppartitions works.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
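To make the error message above easier to read, here is a minimal sketch (Python, not Cassandra's actual code) of the 8-or-0-byte check that LongSerializer.validate performs on a serialized bigint: the sampler hands back a buffer of 1048576 bytes, so validation fails before the partition key can be rendered.

```python
def validate_long(buf: bytes) -> None:
    """Mirror of the 8-or-0-byte length check that LongSerializer.validate
    performs on a serialized bigint key (illustrative sketch only)."""
    if len(buf) not in (0, 8):
        raise ValueError("Expected 8 or 0 byte long (%d)" % len(buf))

# An 8-byte buffer (a real bigint key, e.g. 50) passes:
validate_long((50).to_bytes(8, "big"))

# A 1048576-byte buffer -- the length reported in the stack trace --
# fails, which suggests the sampler returned raw sampled bytes rather
# than a decoded partition key.
try:
    validate_long(bytes(1048576))
except ValueError as e:
    print(e)  # Expected 8 or 0 byte long (1048576)
```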
[jira] [Updated] (CASSANDRA-12760) SELECT JSON "firstName" FROM ... results in {"\"firstName\"": "Bill"}
[ https://issues.apache.org/jira/browse/CASSANDRA-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyu Qi updated CASSANDRA-12760:
----------------------------------
    Attachment: (was: Cesar Agustin Garcia Vazquez.url)

> SELECT JSON "firstName" FROM ... results in {"\"firstName\"": "Bill"}
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12760
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12760
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Core
>         Environment: Cassandra 3.7
>            Reporter: Niek Bartholomeus
>            Assignee: Shivang Nagaria
>            Priority: Normal
>              Labels: lhf
>
> I'm using Cassandra to store data coming from Spark, intended to be consumed by a JavaScript front end.
> To avoid unnecessary field-name mappings I decided to use mixed-case fields in Cassandra. I also happily leave it to Cassandra to jsonify the data (using SELECT JSON ...) so my Scala/Play web server can send the results from Cassandra straight through to the front end.
> I noticed however that all mixed-case fields (which were created with quotes, as Cassandra demands) end up with a double set of quotes:
> {code}
> create table user(id text PRIMARY KEY, "firstName" text);
> insert into user(id, "firstName") values ('b', 'Bill');
> select json * from user;
>
>  [json]
> --------
>  {"id": "b", "\"firstName\"": "Bill"}
> {code}
> Ideally that would be:
> {code}
>  [json]
> --------
>  {"id": "b", "firstName": "Bill"}
> {code}
> I worked around it for now by removing all "\"" sequences before sending the JSON to the front end.
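The workaround the reporter describes (stripping the escaped quotes before forwarding the JSON) could be sketched like this; an illustrative Python snippet, not part of Cassandra, and the helper name is made up:

```python
import json

def unescape_quoted_keys(row_json: str) -> str:
    """Strip the literal \" wrappers that SELECT JSON puts around
    case-sensitive (quoted) column names. Hypothetical helper,
    shown only to illustrate the reporter's workaround."""
    row = json.loads(row_json)
    cleaned = {key.strip('"'): value for key, value in row.items()}
    return json.dumps(cleaned)

# The raw output from SELECT JSON, with the doubled quotes:
raw = '{"id": "b", "\\"firstName\\"": "Bill"}'
print(unescape_quoted_keys(raw))  # {"id": "b", "firstName": "Bill"}
```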
[jira] [Updated] (CASSANDRA-12760) SELECT JSON "firstName" FROM ... results in {"\"firstName\"": "Bill"}
[ https://issues.apache.org/jira/browse/CASSANDRA-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyu Qi updated CASSANDRA-12760:
----------------------------------
    Attachment: Cesar Agustin Garcia Vazquez.url
[jira] [Commented] (CASSANDRA-15152) Batch Log - Mutation too large while bootstrapping a newly added node
[ https://issues.apache.org/jira/browse/CASSANDRA-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058410#comment-17058410 ]

JeongHun Kim commented on CASSANDRA-15152:
------------------------------------------

I also hit the same problem while removing a node from a cluster. On my node commitlog_segment_size_in_mb is 64, and the inserted mutation size is certainly less than 32 MB, so I don't understand why this ERROR keeps appearing in the log. Has anyone solved this problem?

> Batch Log - Mutation too large while bootstrapping a newly added node
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-15152
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15152
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Batch Log
>            Reporter: Avraham Kalvo
>            Priority: Normal
>
> While scaling our six-node cluster by three more nodes, we came upon behavior in which bootstrap appears hung in `UJ` (the two previously added nodes had joined within approximately 2.5 hours).
> Examining the logs, the following became apparent shortly after the bootstrap process commenced for this node:
> ```
> ERROR [BatchlogTasks:1] 2019-06-05 14:43:46,508 CassandraDaemon.java:207 - Exception in thread Thread[BatchlogTasks:1,5,main]
> java.lang.IllegalArgumentException: Mutation of 108035175 bytes is too large for the maximum size of 16777216
>         at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:256) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:520) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:399) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.db.Mutation.apply(Mutation.java:213) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendSingleReplayMutation(BatchlogManager.java:427) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendReplays(BatchlogManager.java:402) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.replay(BatchlogManager.java:318) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.batchlog.BatchlogManager.processBatchlogEntries(BatchlogManager.java:238) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.batchlog.BatchlogManager.replayFailedBatches(BatchlogManager.java:207) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-3.0.10.jar:3.0.10]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_201]
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_201]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_201]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_201]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_201]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_201]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
> ```
> And it has been repeating itself in the logs ever since.
> We decided to discard the newly added, apparently still-joining node by doing the following:
> 1. At first, simply restarting it, which resulted in it starting up apparently normally.
> 2. Then decommissioning it by issuing `nodetool decommission`; this took long (over 2.5 hours) and was eventually terminated by issuing `nodetool removenode`.
> 3. Node removal hung on a specific token, which led us to complete it by force.
> 4. Forcing the node removal generated a corruption in one of the `system.batches` table SSTables, which was removed (backed up) from its underlying data dir as mitigation (78 MB worth).
> 5. A cluster-wide repair was run.
> 6. The `Mutation too large` error is now repeating itself in three different permutations (alerted sizes) on three different nodes (our standard replication factor is three).
> We're not sure whether we're hitting https://issues.apache.org/jira/browse/CASSANDRA-11670 or not, as it's said to be resolved in our current version of 3.0.10.
> Still would like to verify what's the
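For context on the size check that fails above: by default, a single commitlog mutation may not exceed half of commitlog_segment_size_in_mb, which gives the 16777216 bytes (16 MiB) in the log for the default 32 MiB segment. A quick sketch of that arithmetic, with a made-up helper name:

```python
def max_mutation_size_bytes(commitlog_segment_size_in_mb: int) -> int:
    """Sketch of the documented default: the maximum mutation size is
    half the commitlog segment size (max_mutation_size_in_kb can also
    be set explicitly in cassandra.yaml)."""
    return commitlog_segment_size_in_mb * 1024 * 1024 // 2

print(max_mutation_size_bytes(32))  # 16777216, the limit in the error above
print(max_mutation_size_bytes(64))  # 33554432: even a 64 MB segment leaves
# the 108035175-byte batchlog replay mutation far over the limit
```

This also explains the follow-up comment: with commitlog_segment_size_in_mb at 64 the limit is 32 MiB, but the failing mutation here is a batchlog *replay* batch of ~108 MB, not a single inserted row.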
[jira] [Updated] (CASSANDRA-15639) Jenkins build for jvm test should use testclasslist to support parameterized tests
[ https://issues.apache.org/jira/browse/CASSANDRA-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated CASSANDRA-15639:
---------------------------------------
    Labels: pull-request-available  (was: )

> Jenkins build for jvm test should use testclasslist to support parameterized tests
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15639
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15639
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>              Labels: pull-request-available
>
> We switched Circle CI to use testclasslist in CASSANDRA-15508; this was to solve the following exception:
> {code}
> java.lang.Exception: No tests found matching Method testFailingMessage(org.apache.cassandra.distributed.test.FailingRepairTest) from org.junit.internal.requests.ClassRequest@551aa95a
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> {code}
> The core issue is that the test-locating logic in org.apache.cassandra.distributed.test.TestLocator does not handle parameterized tests, so it fails to find any tests for those classes.
> I think it is better to switch to testclasslist, as it also helps make way for running tests concurrently.
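The testclasslist approach selects whole classes rather than individual methods, which sidesteps the naming problem with parameterized tests. A rough sketch of turning test source paths into a class list (a hypothetical Python helper for illustration, not the project's actual tooling; the source-root path is assumed):

```python
def build_testclasslist(test_files):
    """Convert test source paths into fully-qualified class names,
    one per line, the shape a testclasslist-style input expects.
    Hypothetical helper for illustration."""
    classes = []
    for path in test_files:
        rel = path.removeprefix("test/distributed/")  # assumed source root
        classes.append(rel.removesuffix(".java").replace("/", "."))
    return "\n".join(classes)

files = ["test/distributed/org/apache/cassandra/distributed/test/FailingRepairTest.java"]
print(build_testclasslist(files))
# org.apache.cassandra.distributed.test.FailingRepairTest
```

Selecting by class name lets the JUnit runner discover all parameterized variants itself, instead of asking for a method name that never matches the generated `[param]` suffixes.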
[jira] [Updated] (CASSANDRA-15639) Jenkins build for jvm test should use testclasslist to support parameterized tests
[ https://issues.apache.org/jira/browse/CASSANDRA-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-15639:
--------------------------------------
    Bug Category: Parent values: Correctness(12982), Level 1 values: Test Failure(12990)
      Complexity: Low Hanging Fruit
     Component/s: Build
   Discovered By: Unit Test
        Severity: Normal
          Status: Open  (was: Triage Needed)
[jira] [Created] (CASSANDRA-15639) Jenkins build for jvm test should use testclasslist to support parameterized tests
David Capwell created CASSANDRA-15639:
-----------------------------------------
             Summary: Jenkins build for jvm test should use testclasslist to support parameterized tests
                 Key: CASSANDRA-15639
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15639
             Project: Cassandra
          Issue Type: Bug
            Reporter: David Capwell
            Assignee: David Capwell
[jira] [Assigned] (CASSANDRA-15593) seems reading repair bug
[ https://issues.apache.org/jira/browse/CASSANDRA-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio reassigned CASSANDRA-15593:
-----------------------------------
    Assignee: Alex Lumpov

> seems reading repair bug
> ------------------------
>
>                 Key: CASSANDRA-15593
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15593
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Antonio
>            Assignee: Alex Lumpov
>            Priority: Normal
>
> Cassandra version: 2.1.15
> I have one DC and 3 nodes.
> 1. CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = 'true';
> 2. CREATE TABLE test (a int, b int, c int, PRIMARY KEY (a)) WITH dclocal_read_repair_chance = 1.0;
> 3. Insert one row into table test (INSERT INTO test (a, b, c) VALUES (1, 1, 1);) and remove the SSTable on two nodes, with this result:
> {code}
> node1: has the correct row 1 1 1
> node2: doesn't have the row
> node3: doesn't have the row
> {code}
> 4. Then I select with LOCAL_ONE node by node, like this:
> {code}
> node1 un, node2 dn, node3 dn: returns 1 1 1
> node1 dn, node2 un, node3 dn: returns null
> node1 dn, node2 dn, node3 un: returns null
> {code}
> This proves node1 has the correct row.
> 5. Then, with all nodes up, I use LOCAL_QUORUM to select: select * from test where a = 1;
> But read repair doesn't work every time (I tested many times); that's the problem (same on 3.0.15).
>
> I hoped that with dclocal_read_repair_chance = 1.0, every time I read at LOCAL_QUORUM, if any replica's digest doesn't match, read repair would run and repair all nodes.
>
> I'm not sure whether the problem happens in this code (my reading: once the number of responses reaches rf/2 + 1 and a data response has returned, the comparison starts without waiting for all responses):
> {code}
> public void response(MessageIn message)
> {
>     resolver.preprocess(message);
>     int n = waitingFor(message)
>           ? recievedUpdater.incrementAndGet(this)
>           : received;
>     if (n >= blockfor && resolver.isDataPresent())
>     {
>         condition.signalAll();
>         // kick off a background digest comparison if this is a result that (may have) arrived after
>         // the original resolve that get() kicks off as soon as the condition is signaled
>         if (blockfor < endpoints.size() && n == endpoints.size())
>         {
>             TraceState traceState = Tracing.instance.get();
>             if (traceState != null)
>                 traceState.trace("Initiating read-repair");
>             StageManager.getStage(Stage.READ_REPAIR).execute(new AsyncRepairRunner(traceState));
>         }
>     }
> }
> {code}
> Looking forward to your reply, thanks.
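The condition the reporter is asking about can be isolated: in the quoted handler, the background digest comparison only starts once every endpoint has responded (n == endpoints) while the consistency level blocked on fewer (blockfor < endpoints). A small sketch of just that predicate (illustrative Python, not Cassandra code):

```python
def starts_async_repair(blockfor: int, endpoints: int, n: int) -> bool:
    """Mirror of the guard around AsyncRepairRunner in the quoted 2.1
    handler (illustrative sketch): the comparison of *all* responses
    only starts once the last replica has answered, and only when the
    consistency level blocked on fewer than all replicas."""
    return blockfor < endpoints and n == endpoints

# LOCAL_QUORUM with RF=3 blocks on 2 responses:
print(starts_async_repair(blockfor=2, endpoints=3, n=2))  # False: the read
# has already returned, but the full comparison has not been kicked off yet
print(starts_async_repair(blockfor=2, endpoints=3, n=3))  # True: the third,
# late response triggers the background digest comparison
```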
[jira] [Assigned] (CASSANDRA-15625) Nodetool toppartitions error
[ https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio reassigned CASSANDRA-15625:
-----------------------------------
    Assignee: Sam Tunnicliffe  (was: Alex Lumpov)
[jira] [Commented] (CASSANDRA-15625) Nodetool toppartitions error
[ https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058351#comment-17058351 ]

Antonio commented on CASSANDRA-15625:
-------------------------------------

Dear [~samt]: thanks for your reply.
app300.test was newly created, and I only inserted one row. I also tried a 1s sampling duration and the same stack trace appears, but once I flush the table, toppartitions works well and displays both reads and writes.
It seems that if a row is newly inserted and I run toppartitions (with 1s, 2s, or 5s) to get the READS sampler while selecting that row at the same time, it doesn't work.
Hoping for your reply.
[jira] [Commented] (CASSANDRA-12760) SELECT JSON "firstName" FROM ... results in {"\"firstName\"": "Bill"}
[ https://issues.apache.org/jira/browse/CASSANDRA-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058285#comment-17058285 ]

Tianyu Qi commented on CASSANDRA-12760:
---------------------------------------

Hi, I just started learning Cassandra and would like to contribute to this project. As [~shivtej] seems busy, may I take this over? I've started looking into the codebase. I will try my best first, but I may need some help later. Thanks!
[jira] [Commented] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc
[ https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058270#comment-17058270 ]

David Capwell commented on CASSANDRA-15637:
-------------------------------------------

Marked this as 4.0-alpha, but since this is a regression in 3.0 we may want to back-port.

> CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc deployments
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15637
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15637
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Tools
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0-alpha
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In 3.0, CqlInputFormat switched away from thrift in favor of the new system.size_estimates table, but the semantics changed when dealing with multiple DCs or when Cassandra is not collocated with Hadoop.
> The core issues are:
> * system.size_estimates uses the primary range; in a multi-DC setup this can lead to uneven ranges. Example:
> {code}
> DC1: [0, 10, 20, 30]
> DC2: [1, 11, 21, 31]
> DC3: [2, 12, 22, 32]
> {code}
> Using NetworkTopologyStrategy, the primary ranges are: [0, 1), [1, 2), [2, 10), [10, 11), [11, 12), [12, 20), [20, 21), [21, 22), [22, 30), [30, 31), [31, 32), [32, 0).
> Given this, the only ranges spanning more than one token are: [2, 10), [12, 20), [22, 30).
> * system.size_estimates is not replicated, so you need to hit every node in the cluster to get estimates; if nodes are down in the DC with non-size-1 ranges, there is no way to get an estimate.
> * CqlInputFormat used to call describe_local_ring, so all interactions were with a single DC. The java driver doesn't filter by DC, so it appears to allow cross-DC traffic and includes nodes from other DCs in the replica set; in the example above, the number of splits went from 4 to 12.
> * CqlInputFormat used to call describe_splits_ex to dynamically calculate the estimates; this was done on the "local primary range" and could hit replicas to create estimates if the primary was down. With system.size_estimates we no longer have a backup, and we no longer expose the "local primary range" in multi-DC.
> * CqlInputFormat had a config, cassandra.input.keyRange, which let you define your own range. If the range doesn't perfectly match the local range, then the intersectWith calls will produce ranges with no estimates. Example: tokens [0, 10, 20] with cassandra.input.keyRange=5,15 won't find any estimates, so it will produce 2 splits with an estimate of 128 (the default when not found).
> * CqlInputFormat special-cases Cassandra being collocated with Hadoop and assumes this when querying system.size_estimates, as it doesn't filter to the specific host; this means that non-collocated deployments randomly select nodes and create splits with ranges the hosts do not have locally.
> The problem is deterministic to replicate; the following test will show it:
> 1) Deploy a 3-DC cluster with 3 nodes each.
> 2) Create DC2 tokens as DC1's +1, and DC3's as DC2's +1.
> 3) CREATE KEYSPACE simpleuniform0 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3};
> 4) CREATE TABLE simpletable0 (pk bigint, ck bigint, value blob, PRIMARY KEY (pk, ck))
> 5) Insert 500k partitions uniformly: [0, 500,000).
> 6) Wait until estimates catch up with the writes.
> 7) On every node, SELECT * FROM system.size_estimates
> You will get the following:
> {code}
> keyspace_name  | table_name   | range_start          | range_end            | mean_partition_size | partitions_count
> ---------------+--------------+----------------------+----------------------+---------------------+------------------
> simpleuniform0 | simpletable0 | -9223372036854775808 | -6148914691236517206 |                  87 |           122240
> simpleuniform0 | simpletable0 |  6148914691236517207 | -9223372036854775808 |                  87 |           121472
> (2 rows)
>
> keyspace_name  | table_name   | range_start | range_end           | mean_partition_size | partitions_count
> ---------------+--------------+-------------+---------------------+---------------------+------------------
> simpleuniform0 | simpletable0 |           2 | 6148914691236517205 |                  87 |           243072
> (1 rows)
>
> keyspace_name  | table_name   | range_start | range_end | mean_partition_size | partitions_count
>
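The uneven primary ranges described in the ticket can be reproduced in a few lines. This sketch (illustrative Python, assuming simple integer tokens) merges the three DCs' tokens into one ring and takes the gaps between consecutive tokens:

```python
dc_tokens = {
    "DC1": [0, 10, 20, 30],
    "DC2": [1, 11, 21, 31],
    "DC3": [2, 12, 22, 32],
}

# One shared ring: every node's tokens, sorted
ring = sorted(t for tokens in dc_tokens.values() for t in tokens)

# List the consecutive gaps (the primary ranges), wrapping at the end
ranges = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]

# Only three ranges span more than one token, as the ticket notes
wide = [(s, e) for s, e in ranges if e - s > 1]
print(wide)  # [(2, 10), (12, 20), (22, 30)]
```

This matches the ticket's claim that interleaving +1 tokens across DCs collapses most primary ranges to width 1, leaving all the data concentrated in a handful of wide ranges.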
[jira] [Updated] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc de
[ https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-15637:
--------------------------------------
    Fix Version/s: 4.0-alpha
[jira] [Commented] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc
[ https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058257#comment-17058257 ] David Capwell commented on CASSANDRA-15637: --- One feedback I have gotten is that I could avoid breaking the size_estimates table by adding a new one {code} CREATE TABLE system.table_estimates ( keyspace_name text, table_name text, range_type text, range_start text, range_end text, mean_partition_size bigint, partitions_count bigint, PRIMARY KEY (keyspace_name, table_name, range_type, range_start, range_end) ) WITH CLUSTERING ORDER BY (table_name ASC, range_type ASC, range_start ASC, range_end ASC) AND gc_grace_seconds = 0; {code} In this new table I added a new column to define the type of range, this would allow us to add to the table as we need too. Thoughts? > CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference > between thrift and the new system.size_estimates table when dealing with > multiple dc deployments > -- > > Key: CASSANDRA-15637 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15637 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In 3.0 CqlInputFormat switched away from thrift in favor of a new > system.size_estimates table, but the semantics changed when dealing with > multiple DCs or when Cassandra is not collocated with Hadoop. > The core issues are: > * system.size_estimates uses the primary range, in a multi-dc setup this > could lead to uneven ranges > example: > {code} > DC1: [0, 10, 20, 30] > DC2: [1, 11, 21, 31] > DC3: [2, 12, 22, 32] > {code} > Using NetworkTopologyStrategy the primary ranges are: [0, 1), [1, 2), [2, > 10), [10, 11), [11, 12), [12, 20), [20, 21), [21, 22), [22, 30), [30, 31), > [31, 32), [32, 0). 
> Given this the only ranges that are more than one token are: [2, 10), [12, > 20), [22, 30). > * system.size_estimates is not replicated so need to hit every node in the > cluster to get estimates, if nodes are down in the DC with non-size-1 ranges > there is no way to get a estimate. > * CqlInputFormat used to call describe_local_ring so all interactions were > with a single DC, the java driver doesn't filter the DC so looks to allow > cross DC traffic and includes nodes from other DCs in the replica set; in the > example above, the amount of splits went from 4 to 12. > * CqlInputFormat used to call describe_splits_ex to dynamically calculate the > estimates, this was on the "local primary range" and was able to hit replicas > to create estimates if the primary was down. With system.size_estimates we no > longer have backup and no longer expose the "local primary range" in multi-dc. > * CqlInputFormat had a config cassandra.input.keyRange which let you define > your own range. If the range doesn't perfectly match the local range then > the intersectWith calls will produce ranges with no estimates. Example: [0, > 10, 20], cassandra.input.keyRange=5,15. This won't find any estimates so > will produce 2 splits with 128 estimate (default when not found). > * CqlInputFormat special cases Cassandra being collocated with Hadoop and > assumes this when querying system.size_estimates as it doesn't filter to the > specific host, this means that non-collocated deployments randomly select the > nodes and create splits with ranges the hosts do not have locally. 
> The problems are deterministic to replicate, the following test will show it > 1) deploy a 3 DC cluster with 3 nodes each > 2) create DC2 tokens are +1 of DC1 and DC3 are +1 of DC2 > 3) CREATE KEYSPACE simpleuniform0 WITH replication = {‘class’: > ‘NetworkTopologyStrategy’, ‘DC1’: 3, ‘DC2’: 3, ‘DC3’: 3}; > 4) CREATE TABLE simpletable0 (pk bigint, ck bigint, value blob, PRIMARY KEY > (pk, ck)) > 5) insert 500k partitions uniformly: [0, 500,000) > 6) wait until estimates catch up to writes > 7) for all nodes, SELECT * FROM system.size_estimates > You will get the following > {code} > keyspace_name | table_name | range_start | range_end > | mean_partition_size | partitions_count > +--+--+--+-+-- > simpleuniform0 | simpletable0 | -9223372036854775808 | -6148914691236517206 > | 87 | 122240 > simpleuniform0 | simpletable0 | 6148914691236517207 | -9223372036854775808 > | 87 | 121472 > (2
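The uneven primary ranges in the multi-DC example above can be reproduced with a short sketch (illustrative Python, not Cassandra's actual token-ring code; the token values are taken from the ticket's example):

```python
# Merging each DC's token ring shows why primary ranges become uneven in a
# multi-DC cluster: most ranges collapse to a single token.
dc_tokens = {
    "DC1": [0, 10, 20, 30],
    "DC2": [1, 11, 21, 31],
    "DC3": [2, 12, 22, 32],
}

# The full ring is the union of every DC's tokens.
ring = sorted(t for tokens in dc_tokens.values() for t in tokens)

# Each primary range runs from one token to the next (wrapping at the end).
primary_ranges = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]

# Only three of the twelve ranges span more than one token.
wide = [(s, e) for s, e in primary_ranges if e > s and e - s > 1]
print(len(primary_ranges), wide)
```

Splitting work by these ranges hands out nine single-token ranges plus three large ones, which is the imbalance (and the jump from 4 to 12 splits) that the ticket describes.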
[jira] [Updated] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
[ https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15358: -- Status: Ready to Commit (was: Review In Progress) +1. Tests look good; the only issues that stand out look unrelated to this work. > Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue > > > Key: CASSANDRA-15358 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15358 > Project: Cassandra > Issue Type: Bug > Components: Test/benchmark >Reporter: Santhosh Kumar Ramalingam >Assignee: Benedict Elliott Smith >Priority: Normal > Labels: 4.0, alpha > Fix For: 4.0, 4.0-beta > > Attachments: all_errors.txt, debug_logs_during_repair.txt, > repair_1_trace.txt, verbose_logs.diff, verbose_logs.txt > > > Hitting a bug with the cassandra 4 alpha version. The same bug is reproduced with > different versions of Java (8, 11 & 12) [~benedict] > > Stack trace: > {code:java} > INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for > CQL clients on /1.3.0.6:9042 (unencrypted)... 
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - > 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message > of type PING_REQ whose timeout expired before reaching the network > WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - > 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect > io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) > failed: Connection refused: /1.3.0.2:7000 > Caused by: java.net.ConnectException: finishConnect(..) 
failed: Connection > refused > at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124) > at io.netty.channel.unix.Socket.finishConnect(Socket.java:243) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524) > at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414) > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - > 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of > type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network > INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 > - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect > > ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 > InboundMessageHandler.java:657 - > 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception > caught while processing inbound messages; terminating connection > java.lang.IllegalArgumentException: initialBuffer is not a direct buffer. 
> at io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59) > at > org.apache.cassandra.net.BufferPoolAllocator$Wrapped.(BufferPoolAllocator.java:95) > at > org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178) > at > io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53) > at > io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) >
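The invariant that tripped in the stack trace can be modelled with a toy sketch (illustrative Python, not Netty or Cassandra code; `Buffer` and `wrap_as_direct` are hypothetical stand-ins for `java.nio.ByteBuffer.isDirect()` and the `UnpooledDirectByteBuf` constructor check):

```python
# Toy model of the guard seen above: wrapping a buffer as "direct" must fail
# when the backing buffer is heap-allocated rather than off-heap.
class Buffer:
    def __init__(self, size, direct):
        self.data = bytearray(size)
        self.direct = direct  # stand-in for ByteBuffer.isDirect()

def wrap_as_direct(buf):
    # Mirrors the check that raised IllegalArgumentException in the log.
    if not buf.direct:
        raise ValueError("initialBuffer is not a direct buffer.")
    return buf

wrap_as_direct(Buffer(64, direct=True))  # fine
try:
    wrap_as_direct(Buffer(64, direct=False))
    failed = False
except ValueError:
    failed = True
print(failed)  # True: a heap buffer handed to the allocator terminates the connection
```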
[jira] [Updated] (CASSANDRA-15543) flaky test org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
[ https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Gallardo updated CASSANDRA-15543: --- Test and Documentation Plan: No doc change needed Status: Patch Available (was: In Progress) > flaky test > org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement > --- > > Key: CASSANDRA-15543 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15543 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: David Capwell >Assignee: Kevin Gallardo >Priority: Normal > Fix For: 4.0-alpha > > > This fails infrequently, last seen failure was on java 8 > {code} > junit.framework.AssertionFailedError > at > org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15543) flaky test org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
[ https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058206#comment-17058206 ] Kevin Gallardo commented on CASSANDRA-15543: Changes available at https://github.com/newkek/cassandra/tree/15543-trunk/ > flaky test > org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement > --- > > Key: CASSANDRA-15543 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15543 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: David Capwell >Assignee: Kevin Gallardo >Priority: Normal > Fix For: 4.0-alpha > > > This fails infrequently, last seen failure was on java 8 > {code} > junit.framework.AssertionFailedError > at > org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14773) Overflow of 32-bit integer during compaction.
[ https://issues.apache.org/jira/browse/CASSANDRA-14773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holmberg reassigned CASSANDRA-14773: - Assignee: (was: Vladimir Bukhtoyarov) > Overflow of 32-bit integer during compaction. > - > > Key: CASSANDRA-14773 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14773 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Vladimir Bukhtoyarov >Priority: Urgent > Fix For: 4.0, 4.0-beta > > > In the scope of CASSANDRA-13444 compaction was significantly improved from a > CPU and memory perspective. However, this improvement introduced a bug in > rounding. When rounding an expiration time which is close to > *Cell.MAX_DELETION_TIME* (which is just *Integer.MAX_VALUE*) the math overflows > (because in the scope of -CASSANDRA-13444- the data type for the point was > changed from Long to Integer in order to reduce memory footprint); as a result the > point becomes negative and acts as a silent poison for internal structures of > StreamingTombstoneHistogramBuilder like *DistanceHolder* and *DataHolder*. > Then, depending on the point intervals: > * The TombstoneHistogram produces wrong values when the interval of points is > less than binSize; this is not critical. > * Compaction crashes with ArrayIndexOutOfBoundsException if the number of point > intervals is greater than binSize; this case is very critical. > > This is the pull request [https://github.com/apache/cassandra/pull/273] that > reproduces the issue and provides the fix. 
> > The stacktrace when running(on codebase without fix) > *testMathOverflowDuringRoundingOfLargeTimestamp* without -ea JVM flag > {noformat} > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$DistanceHolder.add(StreamingTombstoneHistogramBuilder.java:208) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushValue(StreamingTombstoneHistogramBuilder.java:140) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$$Lambda$1/1967205423.consume(Unknown > Source) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.forEach(StreamingTombstoneHistogramBuilder.java:574) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushHistogram(StreamingTombstoneHistogramBuilder.java:124) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.build(StreamingTombstoneHistogramBuilder.java:184) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilderTest.testMathOverflowDuringRoundingOfLargeTimestamp(StreamingTombstoneHistogramBuilderTest.java:183) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41) > at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at org.junit.runners.ParentRunner.run(ParentRunner.java:220) > at org.junit.runner.JUnitCore.run(JUnitCore.java:159) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {noformat} > > The stacktrace when running(on codebase without fix) > *testMathOverflowDuringRoundingOfLargeTimestamp* with
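The overflow the reporter describes can be demonstrated with a short sketch (illustrative Python emulating Java's 32-bit int wraparound; the bucket size of 60 is an arbitrary assumption, not StreamingTombstoneHistogramBuilder's actual rounding logic):

```python
INT_MAX = 2**31 - 1  # Cell.MAX_DELETION_TIME is Integer.MAX_VALUE

def to_int32(x):
    # Emulate Java's int wraparound.
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

def round_up(point, bucket):
    # Round `point` up to a multiple of `bucket`, wrapping each step the way
    # Java int arithmetic would.
    num = to_int32(point + bucket - 1)
    q = int(num / bucket)  # Java integer division truncates toward zero
    return to_int32(q * bucket)

# An expiration time close to Integer.MAX_VALUE overflows when rounded up,
# producing the negative "poisoned" point the reporter describes.
poisoned = round_up(INT_MAX - 1, 60)
print(poisoned)  # negative
```

Once such a negative key reaches the builder's internal structures, the sorted-array bookkeeping breaks down, which is consistent with the ArrayIndexOutOfBoundsException in the stack trace above.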
[jira] [Updated] (CASSANDRA-15339) Make available the known JMX endpoints across the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15339: -- Fix Version/s: 4.x > Make available the known JMX endpoints across the cluster > - > > Key: CASSANDRA-15339 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15339 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Labels: 4.0-feature-freeze-review-requested > Fix For: 4.x > > > With the addition of multiple nodes running on the same server using > different ports: CASSANDRA-7544 ; it becomes more difficult for third-party > tools to easily connect to all nodes based on the jmx connection details to > just one node. > By adding jmx host and port to gossip, and saving it in {{system.peers_v2}}, > the list of all jmx endpoints in a cluster can be fetch after just the > initial successful jmx connection and the > {{StorageServiceMBean.getJmxEndpoints()}} method. > And example of the difficulty can be illustrated through the potential > workaround… > Such a third-party tool could make a native protocol connection, and via the > driver obtain the list of all possible `host:port` native protocol > connections, and make a connection to each of these then requesting the > configuration virtual table, from which the jmx port can be obtained. This is > a rather cumbersome approach, and can involve third-party tools having to add > native connection functionality and dependencies. It's also currently not > possible because CASSANDRA-14573 does not provide the jmx port (it only > offers the yaml settings). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058139#comment-17058139 ] David Capwell commented on CASSANDRA-15564: --- if you think you will have the patch this week, w/e you feel like. if you think in 1 month, then maybe update the status to reflect? > Refactor repair coordinator so errors are consistent > > > Key: CASSANDRA-15564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15564 > Project: Cassandra > Issue Type: Sub-task > Components: Consistency/Repair >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 17h 20m > Remaining Estimate: 0h > > This is to split the change in CASSANDRA-15399 so the refactor is isolated > out. > Currently the repair coordinator special cases the exit cases at each call > site; this makes it so that errors can be inconsistent and there are cases > where proper complete isn't done (proper notifications, and forgetting to > update ActiveRepairService). > [Circle > CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058138#comment-17058138 ] David Capwell commented on CASSANDRA-15234: --- Yep, think we are on the same page here; I just prefer smaller patches so rather have ~3-4 (based off your split) patches rather than the same number of commits but one patch. > Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-beta > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yams and JVM properties. It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {{code}} > u|micros(econds?)? > ms|millis(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? > {{code}} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
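A minimal sketch of the proposed temporal-suffix parsing (illustrative Python; the unit table, the normalization to milliseconds, and the restriction to integer values are assumptions, and only a subset of the proposed suffixes is shown):

```python
import re

# Map suffix -> milliseconds; factors and the chosen subset of suffixes are
# assumptions based on the proposal above, not Cassandra's implementation.
UNITS = {
    "u": 0.001, "micros": 0.001,
    "ms": 1, "millis": 1,
    "s": 1000, "seconds": 1000,
    "m": 60_000, "minutes": 60_000,
    "h": 3_600_000, "hours": 3_600_000,
    "d": 86_400_000, "days": 86_400_000,
}

def parse_duration_ms(value):
    # Accept "<integer><suffix>", e.g. "10s" or "500ms".
    m = re.fullmatch(r"(\d+)\s*([a-z]+)", value.strip().lower())
    if not m or m.group(2) not in UNITS:
        raise ValueError(f"unparseable duration: {value!r}")
    return int(m.group(1)) * UNITS[m.group(2)]
```

The same table-driven shape extends naturally to the rate units (`B/s`, `KiB/s`, ...), which is what makes keeping or rejecting ambiguous forms like `KB/s` a one-line policy decision.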
[jira] [Commented] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc
[ https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058131#comment-17058131 ] David Capwell commented on CASSANDRA-15637: --- Submitted a patch to update size_estimates to be the local range and updated CqlInputFormat to expect this. Open issue: the job range (cassandra.input.keyRange) doesn't make a lot of sense in 3.0. If the range aligns with the ring then it will do what you expect, but if it's offset at all then it becomes inaccurate and is no longer able to split (no estimate). > CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference > between thrift and the new system.size_estimates table when dealing with > multiple dc deployments > -- > > Key: CASSANDRA-15637 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15637 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In 3.0 CqlInputFormat switched away from thrift in favor of a new > system.size_estimates table, but the semantics changed when dealing with > multiple DCs or when Cassandra is not collocated with Hadoop. > The core issues are: > * system.size_estimates uses the primary range, in a multi-dc setup this > could lead to uneven ranges > example: > {code} > DC1: [0, 10, 20, 30] > DC2: [1, 11, 21, 31] > DC3: [2, 12, 22, 32] > {code} > Using NetworkTopologyStrategy the primary ranges are: [0, 1), [1, 2), [2, > 10), [10, 11), [11, 12), [12, 20), [20, 21), [21, 22), [22, 30), [30, 31), > [31, 32), [32, 0). > Given this the only ranges that are more than one token are: [2, 10), [12, > 20), [22, 30). > * system.size_estimates is not replicated so need to hit every node in the > cluster to get estimates, if nodes are down in the DC with non-size-1 ranges > there is no way to get a estimate. 
> * CqlInputFormat used to call describe_local_ring so all interactions were > with a single DC, the java driver doesn't filter the DC so looks to allow > cross DC traffic and includes nodes from other DCs in the replica set; in the > example above, the amount of splits went from 4 to 12. > * CqlInputFormat used to call describe_splits_ex to dynamically calculate the > estimates, this was on the "local primary range" and was able to hit replicas > to create estimates if the primary was down. With system.size_estimates we no > longer have backup and no longer expose the "local primary range" in multi-dc. > * CqlInputFormat had a config cassandra.input.keyRange which let you define > your own range. If the range doesn't perfectly match the local range then > the intersectWith calls will produce ranges with no estimates. Example: [0, > 10, 20], cassandra.input.keyRange=5,15. This won't find any estimates so > will produce 2 splits with 128 estimate (default when not found). > * CqlInputFormat special cases Cassandra being collocated with Hadoop and > assumes this when querying system.size_estimates as it doesn't filter to the > specific host, this means that non-collocated deployments randomly select the > nodes and create splits with ranges the hosts do not have locally. 
> The problems are deterministic to replicate, the following test will show it > 1) deploy a 3 DC cluster with 3 nodes each > 2) create DC2 tokens are +1 of DC1 and DC3 are +1 of DC2 > 3) CREATE KEYSPACE simpleuniform0 WITH replication = {‘class’: > ‘NetworkTopologyStrategy’, ‘DC1’: 3, ‘DC2’: 3, ‘DC3’: 3}; > 4) CREATE TABLE simpletable0 (pk bigint, ck bigint, value blob, PRIMARY KEY > (pk, ck)) > 5) insert 500k partitions uniformly: [0, 500,000) > 6) wait until estimates catch up to writes > 7) for all nodes, SELECT * FROM system.size_estimates > You will get the following > {code} > keyspace_name | table_name | range_start | range_end > | mean_partition_size | partitions_count > +--+--+--+-+-- > simpleuniform0 | simpletable0 | -9223372036854775808 | -6148914691236517206 > | 87 | 122240 > simpleuniform0 | simpletable0 | 6148914691236517207 | -9223372036854775808 > | 87 | 121472 > (2 rows) > keyspace_name | table_name | range_start | range_end | > mean_partition_size | partitions_count > +--+-+-+-+-- > simpleuniform0 | simpletable0 | 2 | 6148914691236517205 | >
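The cassandra.input.keyRange mismatch can be modelled with a simplified sketch (illustrative Python; the stored estimate values are hypothetical, while the default of 128 and the job range 5,15 over ring [0, 10, 20] follow the description above):

```python
# Hypothetical estimate rows keyed by token range -> partitions_count.
estimates = {(0, 10): 50_000, (10, 20): 48_000}

def intersect(a, b):
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def split_estimates(job_range, default=128):
    # Intersect the job range with each stored range; an offset job range
    # produces pieces that match no stored row, so the default kicks in.
    out = []
    for rng in estimates:
        piece = intersect(rng, job_range)
        if piece is not None:
            out.append((piece, estimates.get(piece, default)))
    return out

print(split_estimates((5, 15)))  # every piece falls back to 128
```

A job range that exactly matches a stored range (e.g. 0,10) finds its real estimate, which is why the feature only behaves sensibly when the configured range lines up with the ring.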
[jira] [Updated] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc de
[ https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15637: -- Test and Documentation Plan: tested cluster using MapReduce. The test case targeted is in the description Status: Patch Available (was: Open) > CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference > between thrift and the new system.size_estimates table when dealing with > multiple dc deployments > -- > > Key: CASSANDRA-15637 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15637 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In 3.0 CqlInputFormat switched away from thrift in favor of a new > system.size_estimates table, but the semantics changed when dealing with > multiple DCs or when Cassandra is not collocated with Hadoop. > The core issues are: > * system.size_estimates uses the primary range, in a multi-dc setup this > could lead to uneven ranges > example: > {code} > DC1: [0, 10, 20, 30] > DC2: [1, 11, 21, 31] > DC3: [2, 12, 22, 32] > {code} > Using NetworkTopologyStrategy the primary ranges are: [0, 1), [1, 2), [2, > 10), [10, 11), [11, 12), [12, 20), [20, 21), [21, 22), [22, 30), [30, 31), > [31, 32), [32, 0). > Given this the only ranges that are more than one token are: [2, 10), [12, > 20), [22, 30). > * system.size_estimates is not replicated so need to hit every node in the > cluster to get estimates, if nodes are down in the DC with non-size-1 ranges > there is no way to get a estimate. > * CqlInputFormat used to call describe_local_ring so all interactions were > with a single DC, the java driver doesn't filter the DC so looks to allow > cross DC traffic and includes nodes from other DCs in the replica set; in the > example above, the amount of splits went from 4 to 12. 
> * CqlInputFormat used to call describe_splits_ex to dynamically calculate the > estimates, this was on the "local primary range" and was able to hit replicas > to create estimates if the primary was down. With system.size_estimates we no > longer have backup and no longer expose the "local primary range" in multi-dc. > * CqlInputFormat had a config cassandra.input.keyRange which let you define > your own range. If the range doesn't perfectly match the local range then > the intersectWith calls will produce ranges with no estimates. Example: [0, > 10, 20], cassandra.input.keyRange=5,15. This won't find any estimates so > will produce 2 splits with 128 estimate (default when not found). > * CqlInputFormat special cases Cassandra being collocated with Hadoop and > assumes this when querying system.size_estimates as it doesn't filter to the > specific host, this means that non-collocated deployments randomly select the > nodes and create splits with ranges the hosts do not have locally. > The problems are deterministic to replicate, the following test will show it > 1) deploy a 3 DC cluster with 3 nodes each > 2) create DC2 tokens are +1 of DC1 and DC3 are +1 of DC2 > 3) CREATE KEYSPACE simpleuniform0 WITH replication = {‘class’: > ‘NetworkTopologyStrategy’, ‘DC1’: 3, ‘DC2’: 3, ‘DC3’: 3}; > 4) CREATE TABLE simpletable0 (pk bigint, ck bigint, value blob, PRIMARY KEY > (pk, ck)) > 5) insert 500k partitions uniformly: [0, 500,000) > 6) wait until estimates catch up to writes > 7) for all nodes, SELECT * FROM system.size_estimates > You will get the following > {code} > keyspace_name | table_name | range_start | range_end > | mean_partition_size | partitions_count > +--+--+--+-+-- > simpleuniform0 | simpletable0 | -9223372036854775808 | -6148914691236517206 > | 87 | 122240 > simpleuniform0 | simpletable0 | 6148914691236517207 | -9223372036854775808 > | 87 | 121472 > (2 rows) > keyspace_name | table_name | range_start | range_end | > mean_partition_size | partitions_count 
> +--+-+-+-+-- > simpleuniform0 | simpletable0 | 2 | 6148914691236517205 | > 87 | 243072 > (1 rows) > keyspace_name | table_name | range_start | range_end > | mean_partition_size | partitions_count >
[jira] [Updated] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc de
[ https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-15637: --- Labels: pull-request-available (was: ) > CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference > between thrift and the new system.size_estimates table when dealing with > multiple dc deployments > -- > > Key: CASSANDRA-15637 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15637 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > > In 3.0 CqlInputFormat switched away from thrift in favor of a new > system.size_estimates table, but the semantics changed when dealing with > multiple DCs or when Cassandra is not collocated with Hadoop. > The core issues are: > * system.size_estimates uses the primary range, in a multi-dc setup this > could lead to uneven ranges > example: > {code} > DC1: [0, 10, 20, 30] > DC2: [1, 11, 21, 31] > DC3: [2, 12, 22, 32] > {code} > Using NetworkTopologyStrategy the primary ranges are: [0, 1), [1, 2), [2, > 10), [10, 11), [11, 12), [12, 20), [20, 21), [21, 22), [22, 30), [30, 31), > [31, 32), [32, 0). > Given this the only ranges that are more than one token are: [2, 10), [12, > 20), [22, 30). > * system.size_estimates is not replicated so need to hit every node in the > cluster to get estimates, if nodes are down in the DC with non-size-1 ranges > there is no way to get a estimate. > * CqlInputFormat used to call describe_local_ring so all interactions were > with a single DC, the java driver doesn't filter the DC so looks to allow > cross DC traffic and includes nodes from other DCs in the replica set; in the > example above, the amount of splits went from 4 to 12. 
> * CqlInputFormat used to call describe_splits_ex to dynamically calculate the > estimates, this was on the "local primary range" and was able to hit replicas > to create estimates if the primary was down. With system.size_estimates we no > longer have a backup and no longer expose the "local primary range" in multi-dc. > * CqlInputFormat had a config cassandra.input.keyRange which let you define > your own range. If the range doesn't perfectly match the local range then > the intersectWith calls will produce ranges with no estimates. Example: [0, > 10, 20], cassandra.input.keyRange=5,15. This won't find any estimates so > will produce 2 splits with a 128 estimate (the default when not found). > * CqlInputFormat special-cases Cassandra being collocated with Hadoop and > assumes this when querying system.size_estimates as it doesn't filter to the > specific host; this means that non-collocated deployments randomly select the > nodes and create splits with ranges the hosts do not have locally. > The problems are deterministic to replicate; the following test will show it: > 1) deploy a 3 DC cluster with 3 nodes each > 2) create DC2 tokens at +1 of DC1 and DC3 at +1 of DC2 > 3) CREATE KEYSPACE simpleuniform0 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3}; > 4) CREATE TABLE simpletable0 (pk bigint, ck bigint, value blob, PRIMARY KEY > (pk, ck)) > 5) insert 500k partitions uniformly: [0, 500,000) > 6) wait until estimates catch up to writes > 7) for all nodes, SELECT * FROM system.size_estimates > You will get the following:
> {code}
>  keyspace_name  | table_name   | range_start          | range_end            | mean_partition_size | partitions_count
> ----------------+--------------+----------------------+----------------------+---------------------+------------------
>  simpleuniform0 | simpletable0 | -9223372036854775808 | -6148914691236517206 |                  87 |           122240
>  simpleuniform0 | simpletable0 |  6148914691236517207 | -9223372036854775808 |                  87 |           121472
> (2 rows)
>  keyspace_name  | table_name   | range_start | range_end           | mean_partition_size | partitions_count
> ----------------+--------------+-------------+---------------------+---------------------+------------------
>  simpleuniform0 | simpletable0 |           2 | 6148914691236517205 |                  87 |           243072
> (1 rows)
>  keyspace_name  | table_name   | range_start          | range_end            | mean_partition_size | partitions_count
> ----------------+--------------+----------------------+----------------------+---------------------+------------------
>  simpleuniform0 | simpletable0 | -6148914691236517206 | -6148914691236517205 |                  87 |
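The cassandra.input.keyRange mismatch described in the bullets above can be sketched in a few lines. The tokens, the user range 5,15, and the default estimate of 128 come from the description; the helper names are illustrative and are not Cassandra's actual code.

```python
# Sketch of the keyRange problem: intersecting the user range (5, 15] with
# the local token ranges yields sub-ranges that have no exact entry in
# system.size_estimates, so every split falls back to the 128 default.

def intersect(user_range, token_ranges):
    """Intersect a (start, end] user range with each (start, end] token range."""
    lo, hi = user_range
    out = []
    for start, end in token_ranges:
        s, e = max(lo, start), min(hi, end)
        if s < e:
            out.append((s, e))
    return out

def estimate(split, estimates, default=128):
    # Only an exact range match is found in the estimates table.
    return estimates.get(split, default)

# Local token ring [0, 10, 20] -> primary ranges (0,10] and (10,20],
# with hypothetical partition counts per range.
estimates = {(0, 10): 122240, (10, 20): 121472}
splits = intersect((5, 15), [(0, 10), (10, 20)])
print(splits)                                    # [(5, 10), (10, 15)]
print([estimate(s, estimates) for s in splits])  # [128, 128]
```

Neither sub-range matches a stored range exactly, so both splits get the not-found default, exactly the "2 splits with 128 estimate" outcome the bullet describes.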
[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058092#comment-17058092 ] Alex Petrov commented on CASSANDRA-15564: - Thank you for the patch. Committed to trunk with [cd9fd9e83f507e2bab5075399d812e3fb4368920|https://github.com/apache/cassandra/commit/cd9fd9e83f507e2bab5075399d812e3fb4368920]. I'll include the dtest part of this ticket in [CASSANDRA-15539], and will ping you again for a rebase in the other branches. Should we keep this one open for now? > Refactor repair coordinator so errors are consistent > > > Key: CASSANDRA-15564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15564 > Project: Cassandra > Issue Type: Sub-task > Components: Consistency/Repair >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 17h 20m > Remaining Estimate: 0h > > This is to split the change in CASSANDRA-15399 out so the refactor is isolated. > Currently the repair coordinator special-cases the exit paths at each call > site; this makes it so that errors can be inconsistent and there are cases > where proper completion isn't done (proper notifications, and forgetting to > update ActiveRepairService). > [Circle > CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15638) Starting of Cassandra node failing with below message
[ https://issues.apache.org/jira/browse/CASSANDRA-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15638: - Resolution: Invalid Status: Resolved (was: Triage Needed) Again, jira is not a support system and you should contact the community for help either on the mailing list or slack: http://cassandra.apache.org/community/ > Starting of Cassandra node faking with below message > -- > > Key: CASSANDRA-15638 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15638 > Project: Cassandra > Issue Type: Task >Reporter: Chirantan >Priority: Normal > > Starting of Cassandra node faking with below message. in debug log and > cassandra log > DEBUG [MessagingService-Outgoing-/10.134.180.97-Gossip] 2020-03-12 > 10:21:40,350 OutboundTcpConnection.java:546 - Unable to connect to > /10.134.180.97 > java.net.ConnectException: Connection refused > at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0] > at sun.nio.ch.Net.connect(Net.java:481) ~[na:1.8.0] > at sun.nio.ch.Net.connect(Net.java:473) ~[na:1.8.0] > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:662) > ~[na:1.8.0] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:434) > [apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) > [apache-cassandra-3.11.6.jar:3.11.6] > DEBUG [MessagingService-Outgoing-/10.134.180.97-Gossip] 2020-03-12 > 10:21:40,451 OutboundTcpConnection.java:546 - Unable to connect to > /10.134.180.97 > java.net.ConnectException: Connection refused > at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0] > at 
sun.nio.ch.Net.connect(Net.java:481) ~[na:1.8.0] > at sun.nio.ch.Net.connect(Net.java:473) ~[na:1.8.0] > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:662) > ~[na:1.8.0] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:434) > [apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) > [apache-cassandra-3.11.6.jar:3.11.6] > > > cassandtra log > > Exception (java.lang.RuntimeException) encountered during startup: Unable to > gossip with any peers > java.lang.RuntimeException: Unable to gossip with any peers > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1530) > at > org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:586) > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:844) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:703) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:652) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:397) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:757) > ERROR [main] 2020-03-12 10:22:11,485 CassandraDaemon.java:774 - Exception > encountered during startup > java.lang.RuntimeException: Unable to gossip with any peers > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1530) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:586) 
> ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:844) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:703) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:652) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:397) > [apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) > [apache-cassandra-3.11.6.jar:3.11.6] > at >
[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058024#comment-17058024 ] Benedict Elliott Smith edited comment on CASSANDRA-15234 at 3/12/20, 3:39 PM: -- My personal view is that there's no need to overcomplicate this ticket, and that perhaps David Capwell anticipates greater complexity in addressing it than I do. My goal is to make the config more intuitive and consistent, and to not make any assumptions about reasonable units. Every property has its own units picked arbitrarily, and this is confusing. Publishing a simple regex to validate the value types for tools seems sufficient to address [~rssvihla]'s concerns, I think? And I don't think we should make it pluggable, just accept quantities and rates, and we can be quite restrictive, even just accepting GiB/s, MiB/s and KiB/s. My personal approach would be, in separate commits but one ticket, to: # Move properties to a single file, with strongly typed methods and sensible names for fetching the property. # Note property names present in either {{Config}} or via system properties that relate to the same concept but use different terms - not just prefixes like otc_, but also: #* Why do we use {{enable}}, {{enabled}} and {{disable}} in our property names? Why does it sometimes go at the start or end? #* Why do we use both {{dc}} and {{datacenter}}? #* Why does {{max}} sometimes go at the start, sometimes in the middle? #* ... # Rename things, and make sure {{Config}} and system properties look for both old and new forms # Support super simple parsing for rate and size # Done was (Author: benedict): My personal view is that there's no need to overcomplicate this ticket, and that perhaps David Capwell anticipates greater complexity in addressing it than I do. My goal is to make the config more intuitive and consistent, and to not make any assumptions about reasonable units. 
Every property has its own units picked arbitrarily, and this is confusing. Publishing a simple regex to validate the value types for tools seems sufficient to address [~rssvihla]'s concerns, I think? And I don't think we should make it pluggable, just accept quantities and rates, and we can be quite restrictive, even just accepting GiB/s, MiB/s and KiB/s. My personal approach would be, in separate commits but one ticket, to: # Move properties to a single file, with strongly typed methods and sensible names for fetching the property. # Note property names present in either {{Config}} or via system properties that relate to the same concept but use different terms - not just prefixes like otc_, but also: #* Why do we use {{enable}}, {{enabled}} and {{disable}} in our property names? Why does it sometimes go at the start or end? #* Why do we use both {{dc}} and {{datacenter}}? #* Why does {{max}} sometimes go at the start, sometimes in the middle? #* ... # Rename things, and make sure {{Config}} and system properties look for both old and new forms # Done > Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-beta > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yaml and JVM properties. It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {code} > u|micros(econds?)? > ms|millis(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? 
> {code} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps, to avoid ambiguity, we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
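As a rough illustration of the temporal-suffix grammar proposed above, a parser might look like the sketch below. The suffix patterns mirror the ticket's list; the conversion into microseconds, the 30-day month factor, and the function name are assumptions for illustration, not Cassandra's eventual implementation.

```python
import re

# Suffix patterns from the proposal, each mapped to a factor converting the
# value into microseconds. The 30-day month is an assumption; the proposal
# does not define the month length.
_SUFFIXES = [
    (r"u|micros(?:econds?)?", 1),
    (r"ms|millis(?:econds?)?", 1_000),
    (r"s(?:econds?)?", 1_000_000),
    (r"m(?:inutes?)?", 60 * 1_000_000),
    (r"h(?:ours?)?", 3_600 * 1_000_000),
    (r"d(?:ays?)?", 86_400 * 1_000_000),
    (r"mo(?:nths?)?", 30 * 86_400 * 1_000_000),
]

def parse_duration_micros(value: str) -> int:
    """Parse '10s', '5 minutes', '2mo', ... into microseconds."""
    for pattern, factor in _SUFFIXES:
        m = re.fullmatch(rf"(\d+)\s*(?:{pattern})", value.strip())
        if m:
            return int(m.group(1)) * factor
    raise ValueError(f"unparseable duration: {value!r}")

print(parse_duration_micros("10s"))        # 10000000
print(parse_duration_micros("5 minutes"))  # 300000000
```

Because `fullmatch` requires the entire suffix to be consumed, ambiguous-looking inputs still resolve the way the grammar intends: `5m` is minutes, `5mo` is months, `5ms` is milliseconds.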
[jira] [Created] (CASSANDRA-15638) Starting of Cassandra node failing with below message
Chirantan created CASSANDRA-15638: - Summary: Starting of Cassandra node faking with below message Key: CASSANDRA-15638 URL: https://issues.apache.org/jira/browse/CASSANDRA-15638 Project: Cassandra Issue Type: Task Reporter: Chirantan Starting of Cassandra node faking with below message. in debug log and cassandra log DEBUG [MessagingService-Outgoing-/10.134.180.97-Gossip] 2020-03-12 10:21:40,350 OutboundTcpConnection.java:546 - Unable to connect to /10.134.180.97 java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0] at sun.nio.ch.Net.connect(Net.java:481) ~[na:1.8.0] at sun.nio.ch.Net.connect(Net.java:473) ~[na:1.8.0] at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:662) ~[na:1.8.0] at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:434) [apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.6.jar:3.11.6] DEBUG [MessagingService-Outgoing-/10.134.180.97-Gossip] 2020-03-12 10:21:40,451 OutboundTcpConnection.java:546 - Unable to connect to /10.134.180.97 java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0] at sun.nio.ch.Net.connect(Net.java:481) ~[na:1.8.0] at sun.nio.ch.Net.connect(Net.java:473) ~[na:1.8.0] at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:662) ~[na:1.8.0] at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.6.jar:3.11.6] at 
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:434) [apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.6.jar:3.11.6] cassandtra log Exception (java.lang.RuntimeException) encountered during startup: Unable to gossip with any peers java.lang.RuntimeException: Unable to gossip with any peers at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1530) at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:586) at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:844) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:703) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:652) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:397) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:757) ERROR [main] 2020-03-12 10:22:11,485 CassandraDaemon.java:774 - Exception encountered during startup java.lang.RuntimeException: Unable to gossip with any peers at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1530) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:586) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:844) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:703) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:652) ~[apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:397) [apache-cassandra-3.11.6.jar:3.11.6] at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) [apache-cassandra-3.11.6.jar:3.11.6] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:757) [apache-cassandra-3.11.6.jar:3.11.6] INFO [StorageServiceShutdownHook] 2020-03-12 10:22:11,500 HintsService.java:209 - Paused hints dispatch WARN [StorageServiceShutdownHook] 2020-03-12 10:22:11,501 Gossiper.java:1655 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown INFO [StorageServiceShutdownHook] 2020-03-12 10:22:11,501 MessagingService.java:985 - Waiting for messaging service to quiesce INFO [ACCEPT-/10.134.179.29] 2020-03-12 10:22:11,502 MessagingService.java:1346 - MessagingService has terminated the accept() thread INFO [StorageServiceShutdownHook]
[jira] [Commented] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc
[ https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058010#comment-17058010 ] David Capwell commented on CASSANDRA-15637: --- Tried a simple prototype to get the below expression working: {code} SELECT size_estimate("keyspace", "table") {code} It seems the frontend is too limited at the moment. I got it working by faking a table to read from, but to properly implement that would be a very deep change to the frontend. Given that, I think the local range is the only thing that makes sense. We would have to drop user-defined ranges as they don't work with pre-computed estimates (ignoring the fact that you could get an estimate for the local primary range and then trim off the estimate for the range removed). > CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference > between thrift and the new system.size_estimates table when dealing with > multiple dc deployments > -- > > Key: CASSANDRA-15637 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15637 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > > In 3.0 CqlInputFormat switched away from thrift in favor of a new > system.size_estimates table, but the semantics changed when dealing with > multiple DCs or when Cassandra is not collocated with Hadoop. > The core issues are: > * system.size_estimates uses the primary range; in a multi-dc setup this > could lead to uneven ranges > example: > {code} > DC1: [0, 10, 20, 30] > DC2: [1, 11, 21, 31] > DC3: [2, 12, 22, 32] > {code} > Using NetworkTopologyStrategy the primary ranges are: [0, 1), [1, 2), [2, > 10), [10, 11), [11, 12), [12, 20), [20, 21), [21, 22), [22, 30), [30, 31), > [31, 32), [32, 0). > Given this, the only ranges that span more than one token are: [2, 10), [12, > 20), [22, 30). 
> * system.size_estimates is not replicated so need to hit every node in the > cluster to get estimates, if nodes are down in the DC with non-size-1 ranges > there is no way to get a estimate. > * CqlInputFormat used to call describe_local_ring so all interactions were > with a single DC, the java driver doesn't filter the DC so looks to allow > cross DC traffic and includes nodes from other DCs in the replica set; in the > example above, the amount of splits went from 4 to 12. > * CqlInputFormat used to call describe_splits_ex to dynamically calculate the > estimates, this was on the "local primary range" and was able to hit replicas > to create estimates if the primary was down. With system.size_estimates we no > longer have backup and no longer expose the "local primary range" in multi-dc. > * CqlInputFormat had a config cassandra.input.keyRange which let you define > your own range. If the range doesn't perfectly match the local range then > the intersectWith calls will produce ranges with no estimates. Example: [0, > 10, 20], cassandra.input.keyRange=5,15. This won't find any estimates so > will produce 2 splits with 128 estimate (default when not found). > * CqlInputFormat special cases Cassandra being collocated with Hadoop and > assumes this when querying system.size_estimates as it doesn't filter to the > specific host, this means that non-collocated deployments randomly select the > nodes and create splits with ranges the hosts do not have locally. 
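The uneven-primary-range effect described above can be reproduced with a short sketch. The token values come from the ticket's example; the helper is illustrative and not Cassandra's implementation.

```python
# Sketch: the primary range of a token is the interval back to the previous
# token in the *whole* ring (all DCs merged), so interleaved per-DC tokens
# shrink most primary ranges to a width of 1.

def primary_ranges(*dc_tokens):
    ring = sorted(t for dc in dc_tokens for t in dc)
    # Range i is (previous_token, token]; the first entry wraps the ring.
    return [(ring[i - 1], ring[i]) for i in range(len(ring))]

dc1, dc2, dc3 = [0, 10, 20, 30], [1, 11, 21, 31], [2, 12, 22, 32]
ranges = primary_ranges(dc1, dc2, dc3)
print(len(ranges))  # 12 ranges instead of 4 per DC
wide = [r for r in ranges if r[0] < r[1] and r[1] - r[0] > 1]
print(wide)         # [(2, 10), (12, 20), (22, 30)] -- the only multi-token ranges
```

This matches the ticket's enumeration: twelve ranges total, with only [2, 10), [12, 20) and [22, 30) wider than a single token, which is why the split count jumps from 4 to 12 and the estimates become lopsided.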
> The problems are deterministic to replicate; the following test will show it: > 1) deploy a 3 DC cluster with 3 nodes each > 2) create DC2 tokens at +1 of DC1 and DC3 at +1 of DC2 > 3) CREATE KEYSPACE simpleuniform0 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3}; > 4) CREATE TABLE simpletable0 (pk bigint, ck bigint, value blob, PRIMARY KEY > (pk, ck)) > 5) insert 500k partitions uniformly: [0, 500,000) > 6) wait until estimates catch up to writes > 7) for all nodes, SELECT * FROM system.size_estimates > You will get the following > {code} > keyspace_name | table_name | range_start | range_end > | mean_partition_size | partitions_count >
[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057969#comment-17057969 ] Ekaterina Dimitrova commented on CASSANDRA-15234: - OK, back to this one. Unfortunately, I was parallelizing with some other stuff at the beginning of the week. Thanks [~mck] for sharing that ticket, that was helpful. I started by looking into standardizing the parameter names and step 1 of the plan [~dcapwell] shared. Also, my idea was to reshuffle the order of the parameters in the yaml a bit and try to put them in sections, for example: * Quick start - The minimal properties needed for configuring a cluster. * Commonly used - Properties most frequently used when configuring Cassandra. * Performance Tuning - Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes. * Advanced - Properties for advanced users or properties that are less commonly used. * Security - Server and client security settings. Yes, this idea came after looking at online documentation and trying to think from the user's perspective. On another topic, looking at the plan, I realized that you might actually want a parser and unit converter for the values, in order to give users the freedom to add whatever value they would like in the yaml. Initially, I was thinking that we were talking about the suffixes of the parameter names as we see them now, plus some name changes and backward compatibility. So my question is: what is the reason behind doing this, and shall we then split this Jira into a couple of smaller ones if we go in that direction? 
> Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-beta > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yaml and JVM properties. It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {code} > u|micros(econds?)? > ms|millis(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? > {code} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps, to avoid ambiguity, we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
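As a rough sketch of the sectioning idea in the comment above, a reorganized cassandra.yaml might group properties like this. The section names come from the comment; which existing properties land in each section is an illustrative assumption, not a proposed final ordering.

```yaml
# Illustrative layout only -- section names from the comment above.

# --- Quick start: minimal properties needed to configure a cluster ---
cluster_name: 'Test Cluster'
listen_address: localhost
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"

# --- Commonly used ---
num_tokens: 256
endpoint_snitch: SimpleSnitch

# --- Performance tuning: commit log, compaction, memory, disk I/O ---
concurrent_reads: 32
concurrent_writes: 32
compaction_throughput_mb_per_sec: 16

# --- Security: server and client security settings ---
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
```

Grouping by task rather than alphabetically would let a new operator stop reading after the "Quick start" block, which is the user-perspective benefit the comment is after.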
[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057944#comment-17057944 ] Ryan Svihla commented on CASSANDRA-15234: - I think this is overall a good idea, but it definitely needs some limits on the number of combinations supported. Let's just look at a hypothetical new configuration `compaction_throughput` instead of `compaction_throughput_mb_per_sec` (which is a poster child for a difficult-to-remember name in need of review): * Any downstream tooling that reads configuration will have to take on everything we add, which is fine, but the more math we require the worse it is on them to get updated. An older tool will read compaction_throughput_mb_per_sec and it was self-documenting. A new tool will have to take into account every variant we support for MiB/MB/mb/bytes/etc. * What about `500mb` vs `500mb/s` vs `500 mbs` (note the space) vs `500MiB/s` vs `500 mb/s` (note the 2 spaces)? Which of those is obviously valid or wrong at a glance to a new user? So if we're going to do it, definitely only accept one valid format that's the same for everything. I still think it's adding some learning curve for new users. > Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-beta > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yaml and JVM properties. It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {code} > u|micros(econds?)? > ms|millis(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? > {code} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps, to avoid ambiguity, we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
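The "only accept one valid format" point in the comment above could be enforced by a deliberately strict parser, sketched below. The single accepted grammar (`<integer><unit>/s`, binary units only, no spaces) is an assumption chosen for illustration; the variants it rejects are the ones listed in the comment.

```python
import re

# Strict rate parser: accepts exactly "<int>B/s", "<int>KiB/s", "<int>MiB/s",
# "<int>GiB/s", "<int>TiB/s" -- no spaces, no lowercase, no powers of 1000.
_RATE = re.compile(r"(\d+)(B|KiB|MiB|GiB|TiB)/s")
_FACTOR = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def parse_rate_bytes_per_sec(value: str) -> int:
    m = _RATE.fullmatch(value)
    if not m:
        raise ValueError(f"invalid rate (expected e.g. 500MiB/s): {value!r}")
    return int(m.group(1)) * _FACTOR[m.group(2)]

print(parse_rate_bytes_per_sec("500MiB/s"))  # 524288000
for bad in ("500mb", "500mb/s", "500 mbs", "500 MiB/s", "500KB/s"):
    try:
        parse_rate_bytes_per_sec(bad)
    except ValueError:
        pass  # every ambiguous variant from the comment is rejected
```

Failing fast with a message that shows the one canonical form keeps downstream tooling simple: a tool only ever has to parse a single grammar, which addresses the "every variant we support" concern.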
[jira] [Assigned] (CASSANDRA-14793) Improve system table handling when losing a disk when using JBOD
[ https://issues.apache.org/jira/browse/CASSANDRA-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer reassigned CASSANDRA-14793: -- Assignee: Benjamin Lerer > Improve system table handling when losing a disk when using JBOD > > > Key: CASSANDRA-14793 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14793 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Marcus Eriksson >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.0 > > > We should improve the way we handle disk failures when losing a disk in a > JBOD setup > One way could be to pin the system tables to a special data directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15629) Fix flakey testSendSmall - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Svihla reassigned CASSANDRA-15629: --- Assignee: (was: Ryan Svihla) > Fix flakey testSendSmall - org.apache.cassandra.net.ConnectionTest > -- > > Key: CASSANDRA-15629 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15629 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Yifan Cai >Priority: Normal > Fix For: 4.0-beta > > > The test fails sometimes with the following error message and trace. > {code:java} > processed count values don't match expected:<10> but was:<9> > junit.framework.AssertionFailedError: processed count values don't match > expected:<10> but was:<9> > at > org.apache.cassandra.net.ConnectionUtils$InboundCountChecker.doCheck(ConnectionUtils.java:217) > at > org.apache.cassandra.net.ConnectionUtils$InboundCountChecker.check(ConnectionUtils.java:200) > at > org.apache.cassandra.net.ConnectionTest.lambda$testSendSmall$11(ConnectionTest.java:305) > at > org.apache.cassandra.net.ConnectionTest.lambda$doTest$8(ConnectionTest.java:242) > at > org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:262) > at org.apache.cassandra.net.ConnectionTest.doTest(ConnectionTest.java:240) > at org.apache.cassandra.net.ConnectionTest.test(ConnectionTest.java:229) > at > org.apache.cassandra.net.ConnectionTest.testSendSmall(ConnectionTest.java:277) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057843#comment-17057843 ]

Alex Petrov commented on CASSANDRA-15564:
-----------------------------------------

+1 on the dtest part. Just wanted to mention that we only had to register outbound filters after registering mock messaging. Inbound filters can be registered at any time, since they are not stacked with mock messaging in the same way.

> Refactor repair coordinator so errors are consistent
> ----------------------------------------------------
>
>                 Key: CASSANDRA-15564
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15564
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Consistency/Repair
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>              Labels: pull-request-available
>          Time Spent: 17h
>  Remaining Estimate: 0h
>
> This splits the change out of CASSANDRA-15399 so the refactor is isolated.
> Currently the repair coordinator special-cases the exit paths at each call
> site; this makes errors inconsistent, and in some cases completion is not
> handled properly (notifications are missed and ActiveRepairService is not
> updated).
> [Circle CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]
[jira] [Assigned] (CASSANDRA-15629) Fix flakey testSendSmall - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Svihla reassigned CASSANDRA-15629:
---------------------------------------
    Assignee: Ryan Svihla

> Fix flakey testSendSmall - org.apache.cassandra.net.ConnectionTest
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-15629
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15629
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Yifan Cai
>            Assignee: Ryan Svihla
>            Priority: Normal
>             Fix For: 4.0-beta
>
> The test fails sometimes with the following error message and trace.
> {code:java}
> processed count values don't match expected:<10> but was:<9>
> junit.framework.AssertionFailedError: processed count values don't match expected:<10> but was:<9>
> 	at org.apache.cassandra.net.ConnectionUtils$InboundCountChecker.doCheck(ConnectionUtils.java:217)
> 	at org.apache.cassandra.net.ConnectionUtils$InboundCountChecker.check(ConnectionUtils.java:200)
> 	at org.apache.cassandra.net.ConnectionTest.lambda$testSendSmall$11(ConnectionTest.java:305)
> 	at org.apache.cassandra.net.ConnectionTest.lambda$doTest$8(ConnectionTest.java:242)
> 	at org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:262)
> 	at org.apache.cassandra.net.ConnectionTest.doTest(ConnectionTest.java:240)
> 	at org.apache.cassandra.net.ConnectionTest.test(ConnectionTest.java:229)
> 	at org.apache.cassandra.net.ConnectionTest.testSendSmall(ConnectionTest.java:277)
> {code}
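The `expected:<10> but was:<9>` failure above is the classic shape of a flaky async test: the count is asserted before the last in-flight message has been processed. A common deflaking pattern is a bounded spin-wait that polls the counter until it reaches the expected value or a deadline passes. The sketch below is illustrative only (`SpinAssertDemo` and `awaitCount` are hypothetical names, not Cassandra's `ConnectionUtils` API):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SpinAssertDemo {
    static final AtomicInteger processed = new AtomicInteger();

    // Poll until the counter reaches the expected value or the deadline
    // passes, instead of asserting immediately (which races with delivery).
    static boolean awaitCount(AtomicInteger counter, int expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (counter.get() < expected) {
            if (System.currentTimeMillis() > deadline)
                return false;
            Thread.sleep(10);
        }
        return counter.get() == expected;
    }

    public static void main(String[] args) throws Exception {
        // Simulate an async consumer that lags slightly behind the producer,
        // the situation where an immediate check could observe 9 of 10.
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                try { Thread.sleep(5); } catch (InterruptedException e) { return; }
                processed.incrementAndGet();
            }
        });
        consumer.start();
        boolean ok = awaitCount(processed, 10, 2000);
        consumer.join();
        System.out.println(ok ? "count matched" : "count mismatch");
    }
}
```

The bounded wait keeps a genuine bug detectable (the timeout still fails the check) while removing the race on when the last message lands.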
[jira] [Commented] (CASSANDRA-15605) Broken dtest replication_test.py::TestSnitchConfigurationUpdate
[ https://issues.apache.org/jira/browse/CASSANDRA-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057724#comment-17057724 ]

Ryan Svihla commented on CASSANDRA-15605:
-----------------------------------------

After running for a while I did get a failure on a different test (not one of the ones listed), so that may be worth another Jira, but there were no flaky failures on the listed tests after an overnight run.

> Broken dtest replication_test.py::TestSnitchConfigurationUpdate
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-15605
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15605
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: Sam Tunnicliffe
>            Assignee: Ryan Svihla
>            Priority: Normal
>             Fix For: 4.0-alpha
>
> Noticed this failing on a couple of CI runs; it reproduces when running
> trunk locally and on CircleCI. 2 or 3 tests are consistently failing:
> * {{test_rf_expand_gossiping_property_file_snitch}}
> * {{test_rf_expand_property_file_snitch}}
> * {{test_move_forwards_between_and_cleanup}}
> [https://circleci.com/workflow-run/f23f13a9-bbdc-4764-8336-109517e137f1]