[jira] [Commented] (CASSANDRA-14507) OutboundMessagingConnection backlog is not fully written in case of race conditions
[ https://issues.apache.org/jira/browse/CASSANDRA-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506789#comment-16506789 ] Dinesh Joshi commented on CASSANDRA-14507: -- [~sbtourist] thank you for the bug report. {quote}3) The writer threads are scheduled back and add to the backlog, but the channel state is READY at this point, so those writes would sit in the backlog and expire. {quote} I am not clear on how the messages would simply sit in the backlog queue and expire. Wouldn't they be picked up by {{MessageOutHandler::channelWritabilityChanged}} and then get drained? What am I missing here? > OutboundMessagingConnection backlog is not fully written in case of race > conditions > --- > > Key: CASSANDRA-14507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14507 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Priority: Major > > The {{OutboundMessagingConnection}} writes into a backlog queue before the > connection handshake is successfully completed, and then writes such backlog > to the channel as soon as the successful handshake moves the channel state to > {{READY}}. > This is unfortunately race prone, as the following could happen: > 1) One or more writer threads see the channel state as {{NOT_READY}} in > {{#sendMessage()}} and are about to enqueue to the backlog, but they get > descheduled by the OS. > 2) The handshake thread is scheduled by the OS and moves the channel state to > {{READY}}, emptying the backlog. > 3) The writer threads are scheduled back and add to the backlog, but the > channel state is {{READY}} at this point, so those writes would sit in the > backlog and expire. > Please note a similar race condition exists between > {{OutboundMessagingConnection#sendMessage()}} and > {{MessageOutHandler#channelWritabilityChanged()}}, which is way more serious > as the channel writability could frequently change; luckily, it looks like > {{ChannelWriter#write()}} never gets invoked with {{checkWritability}} at > {{true}} (so writes never go to the backlog when the channel is not writable). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
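To make the interleaving in steps 1-3 of the report concrete, here is a minimal Java sketch of the check-then-enqueue race, using a simplified state/backlog model. The class, fields, and method names are illustrative only and are not the actual OutboundMessagingConnection code.

{code}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Simplified model of the race described in the report, not the real implementation.
public class BacklogRaceSketch
{
    enum State { NOT_READY, READY }

    private final AtomicReference<State> state = new AtomicReference<>(State.NOT_READY);
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();

    // Writer thread path (steps 1 and 3 of the report).
    void sendMessage(String message)
    {
        if (state.get() == State.NOT_READY)
        {
            // The writer can be descheduled HERE, after the check but before the offer.
            // If the handshake completes and drains the backlog in between, this message
            // lands in a backlog that is never drained again and just sits until it expires.
            backlog.offer(message);
        }
        else
        {
            writeToChannel(message);
        }
    }

    // Handshake thread path (step 2 of the report).
    void handshakeComplete()
    {
        state.set(State.READY);
        // Drain whatever is in the backlog *now*; late writers miss this drain.
        String message;
        while ((message = backlog.poll()) != null)
            writeToChannel(message);
    }

    private void writeToChannel(String message)
    {
        System.out.println("write: " + message); // stand-in for the Netty channel write
    }
}
{code}

Dinesh's question above is whether a later channelWritabilityChanged callback would drain that queue anyway; the sketch only models the handshake-driven drain described in the report.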
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506754#comment-16506754 ] Jordan West commented on CASSANDRA-14499: - {quote}disabling gossip alone is insufficient; we also need to disable the native transport {quote} Agreed. I hadn't updated the description to reflect it, but what I am working on does this as well. {quote}still not sure I buy the argument that it’s wrong to serve reads in this case - it may be true that some table is getting out of sync, but that doesn’t mean every table is, {quote} I agree it depends on the workload for each specific dataset, but since we can't know which one we have, we have to assume it could get badly out of sync. {quote}and we already have a mechanism to deal with nodes that can serve reads but not writes (speculating on the read repair). {quote} Even if we speculate, we still attempt it. That work will always be for naught, and being at quota is likely a prolonged state (the ways out of it take a while). {quote}If you don’t serve reads either, then any GC pause will be guaranteed to impact client request latency, as we can’t speculate around it in the common rf=3 case. {quote} This is true. But that's almost the same as losing a node because its disk has filled up completely. If we have one unhealthy node, we are another unhealthy node away from unavailability in the rf=3/quorum case. That said, I'll consider the reads more over the weekend. It's a valid concern. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
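For reference, a minimal Java sketch of how the setting proposed in the ticket description could resolve to an effective limit. The option names hinted at here (a hypothetical {{disk_quota_in_mb}} and {{disk_quota_percentage}}) are illustrative only and are not part of any committed cassandra.yaml; the only behavior taken from the ticket is that both a percentage and an absolute value can be configured and that the absolute value takes precedence.

{code}
// Hypothetical resolution of the proposed quota options; names are illustrative only.
public final class DiskQuotaSketch
{
    private final long absoluteQuotaMb;   // e.g. disk_quota_in_mb, <= 0 meaning "unset"
    private final double quotaPercentage; // e.g. disk_quota_percentage, <= 0 meaning "unset"

    public DiskQuotaSketch(long absoluteQuotaMb, double quotaPercentage)
    {
        this.absoluteQuotaMb = absoluteQuotaMb;
        this.quotaPercentage = quotaPercentage;
    }

    /** Effective quota in bytes; per the ticket, the absolute value wins when both are set. */
    public long effectiveQuotaBytes(long totalDiskBytes)
    {
        if (absoluteQuotaMb > 0)
            return absoluteQuotaMb * 1024L * 1024L;
        if (quotaPercentage > 0)
            return (long) (totalDiskBytes * (quotaPercentage / 100.0));
        return Long.MAX_VALUE; // no quota configured
    }

    /** True when the node should be marked unhealthy (live data at or above quota). */
    public boolean overQuota(long liveDataBytes, long totalDiskBytes)
    {
        return liveDataBytes >= effectiveQuotaBytes(totalDiskBytes);
    }
}
{code}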
[jira] [Commented] (CASSANDRA-14509) AsyncOneResponse uses the incorrect timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506751#comment-16506751 ] Dinesh Joshi commented on CASSANDRA-14509: -- [~krummas] I have updated the branch with a unit test. > AsyncOneResponse uses the incorrect timeout > --- > > Key: CASSANDRA-14509 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14509 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Fix For: 4.x > > > {{AsyncOneResponse}} has a bug where it uses the initial timeout value > instead of the adjustedTimeout. Combined with passing in the wrong > {{TimeUnit}}, it leads to a shorter timeout than expected. This can have > unintended consequences; for example, in > {{StorageService::sendReplicationNotification}}, instead of waiting 10 seconds > ({{request_timeout_in_ms}}), we wait for {{1}} nanosecond. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
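For readers not familiar with the code, here is a hedged Java sketch of the bug class described in the ticket: an adjusted nanosecond timeout is computed but the original, unconverted value ends up in the wait call with the wrong unit. This is not the real AsyncOneResponse implementation, just a simplified await loop showing the buggy shape and one corrected shape.

{code}
import java.util.concurrent.TimeUnit;

// Illustration of the timeout bug class; NOT the actual AsyncOneResponse code.
public class TimeoutSketch
{
    private final Object lock = new Object();
    private volatile boolean done = false;

    // Buggy shape: adjustedTimeout is computed and then ignored, and the raw value is
    // treated as if it were already in the target unit.
    public boolean awaitBuggy(long timeout, TimeUnit unit) throws InterruptedException
    {
        long adjustedTimeout = unit.toNanos(timeout); // correct conversion...
        synchronized (lock)
        {
            if (!done)
                // ...but the unconverted 'timeout' is waited on as if it were milliseconds,
                // so e.g. 10 passed as TimeUnit.SECONDS waits only 10 ms. adjustedTimeout
                // is never used -- that is the bug shape.
                lock.wait(timeout);
        }
        return done;
    }

    // Fixed shape: convert once, track a deadline, and wait on the converted remainder.
    public boolean awaitFixed(long timeout, TimeUnit unit) throws InterruptedException
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock)
        {
            while (!done)
            {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0)
                    return false;
                lock.wait(remainingMs);
            }
        }
        return true;
    }

    public void signal()
    {
        synchronized (lock)
        {
            done = true;
            lock.notifyAll();
        }
    }
}
{code}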
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506748#comment-16506748 ] Jeff Jirsa commented on CASSANDRA-14499: 1) Disabling gossip alone is insufficient; we also need to disable the native transport. 2) Recovery is likely some combination of compactions and host replacement. 3) Still not sure I buy the argument that it’s wrong to serve reads in this case - it may be true that some table is getting out of sync, but that doesn’t mean every table is, and we already have a mechanism to deal with nodes that can serve reads but not writes (speculating on the read repair). If you don’t serve reads either, then any GC pause will be guaranteed to impact client request latency, as we can’t speculate around it in the common rf=3 case. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506746#comment-16506746 ] Jordan West edited comment on CASSANDRA-14499 at 6/9/18 12:36 AM: -- The other reason the OS level wouldn't work is that we are trying to track *live* data, which the OS can't distinguish from the rest. EDIT: also to clarify, the goal here isn't to implement a perfect quota. There will be some room for error where the quota can be exceeded. The goal is to mark the node unhealthy when it reaches this level and to have enough headroom for compaction or other operations to get it back to a healthy state. Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some offline. Since the node can only get more and more out of sync while not taking write traffic and can't participate in (read) repair until the amount of storage used is below quota, we thought it better to disable both reads and writes. Less-blocking and speculative read repair makes us more available in this case (as it should). Disabling gossip is a quick route to disabling reads/writes. Is it the best approach to doing so? I'm not 100% sure. My concern is for how the operator gets back to a healthy state once a quota is reached on a node. They have a few options: migrate data to a bigger node, compaction catches up and deletes data, quota is raised so it's not met anymore, node(s) are added to take storage responsibility away from the node, or data is forcefully deleted from the node. We need to ensure we don't prevent those operations from taking place. I've been discussing this with [~jasobrown] offline as well. was (Author: jrwest): The other reason the OS level wouldn't work is that we are trying to track *live* data, which the OS can't distinguish from the rest. Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some offline. Since the node can only get more and more out of sync while not taking write traffic and can't participate in (read) repair until the amount of storage used is below quota, we thought it better to disable both reads and writes. Less-blocking and speculative read repair makes us more available in this case (as it should). Disabling gossip is a quick route to disabling reads/writes. Is it the best approach to doing so? I'm not 100% sure. My concern is for how the operator gets back to a healthy state once a quota is reached on a node. They have a few options: migrate data to a bigger node, compaction catches up and deletes data, quota is raised so it's not met anymore, node(s) are added to take storage responsibility away from the node, or data is forcefully deleted from the node. We need to ensure we don't prevent those operations from taking place. I've been discussing this with [~jasobrown] offline as well. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. 
When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506746#comment-16506746 ] Jordan West commented on CASSANDRA-14499: - The other reason the OS level wouldn't work is that we are trying to track *live* data, which the OS can't distinguish from the rest. Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some offline. Since the node can only get more and more out of sync while not taking write traffic and can't participate in (read) repair until the amount of storage used is below quota, we thought it better to disable both reads and writes. Less-blocking and speculative read repair makes us more available in this case (as it should). Disabling gossip is a quick route to disabling reads/writes. Is it the best approach to doing so? I'm not 100% sure. My concern is for how the operator gets back to a healthy state once a quota is reached on a node. They have a few options: migrate data to a bigger node, compaction catches up and deletes data, quota is raised so it's not met anymore, node(s) are added to take storage responsibility away from the node, or data is forcefully deleted from the node. We need to ensure we don't prevent those operations from taking place. I've been discussing this with [~jasobrown] offline as well. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-14489) Test cqlsh authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Bannister resolved CASSANDRA-14489. --- Resolution: Not A Problem This problem didn't actually exist. > Test cqlsh authentication > - > > Key: CASSANDRA-14489 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14489 > Project: Cassandra > Issue Type: Sub-task >Reporter: Patrick Bannister >Priority: Critical > Labels: cqlsh, security, test > Fix For: 4.x > > > Coverage analysis of the cqlshlib unittests (pylib/cqlshlib/test/test*.py) > and the dtest cqlsh_tests (cqlsh_tests.py and cqlsh_copy_tests.py) showed no > coverage of authentication related code. > Before we can release a port of cqlsh, we should identify an existing test > for cqlsh authentication, or write a new one. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14489) Test cqlsh authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506718#comment-16506718 ] Patrick Bannister commented on CASSANDRA-14489: --- There are some tests for cqlsh username/password login in the dtests. There are several of them in cqlsh_tests/cqlsh_tests.py::TestCqlLogin. I re-read my coverage report, and in fact, we did observe coverage of connecting with cqlsh using cassandra.auth.PlainTextAuthProvider. Furthermore, the relevant dtests are passing for the pure Python 3 port. We don't have coverage of using the LOGIN command during a connected cqlsh session, but I think it's sufficient that we're already testing the initial login with a password. > Test cqlsh authentication > - > > Key: CASSANDRA-14489 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14489 > Project: Cassandra > Issue Type: Sub-task >Reporter: Patrick Bannister >Priority: Critical > Labels: cqlsh, security, test > Fix For: 4.x > > > Coverage analysis of the cqlshlib unittests (pylib/cqlshlib/test/test*.py) > and the dtest cqlsh_tests (cqlsh_tests.py and cqlsh_copy_tests.py) showed no > coverage of authentication related code. > Before we can release a port of cqlsh, we should identify an existing test > for cqlsh authentication, or write a new one. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14510) Flaky uTest: RemoveTest.testRemoveHostId
Jay Zhuang created CASSANDRA-14510: -- Summary: Flaky uTest: RemoveTest.testRemoveHostId Key: CASSANDRA-14510 URL: https://issues.apache.org/jira/browse/CASSANDRA-14510 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Jay Zhuang https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/619/testReport/org.apache.cassandra.service/RemoveTest/testRemoveHostId/ {noformat} Failed 13 times in the last 30 runs. Flakiness: 31%, Stability: 56% {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14509) AsyncOneResponse uses the incorrect timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14509: - Fix Version/s: 4.x Status: Patch Available (was: Open) ||14509|| |[branch|https://github.com/dineshjoshi/cassandra/tree/trunk-14509]| |[utests & dtests|https://circleci.com/gh/dineshjoshi/workflows/cassandra/tree/trunk-14509]| > AsyncOneResponse uses the incorrect timeout > --- > > Key: CASSANDRA-14509 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14509 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Fix For: 4.x > > > {{AsyncOneResponse}} has a bug where it uses the initial timeout value > instead of the adjustedTimeout. Combined with passing in the wrong > {{TimeUnit}}, it leads to a shorter timeout than expected. This can have > unintended consequences; for example, in > {{StorageService::sendReplicationNotification}}, instead of waiting 10 seconds > ({{request_timeout_in_ms}}), we wait for {{1}} nanosecond. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506701#comment-16506701 ] Jeff Jirsa commented on CASSANDRA-14499: Not clear to me how you'd do this as gracefully at the OS level as you can at the Cassandra level (by, e.g., blocking writes and inbound streaming). It's also not clear to me that disabling gossip is the right answer. You can still serve reads; the coordinator will know if it's out of sync and can attempt a (now non-blocking and speculating) read repair if necessary. If read repair is required to meet consistency, we'll fail there, but that's still likely better than not serving the already consistent read. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14509) AsyncOneResponse uses the incorrect timeout
Dinesh Joshi created CASSANDRA-14509: Summary: AsyncOneResponse uses the incorrect timeout Key: CASSANDRA-14509 URL: https://issues.apache.org/jira/browse/CASSANDRA-14509 Project: Cassandra Issue Type: Bug Components: Core Reporter: Dinesh Joshi Assignee: Dinesh Joshi {{AsyncOneResponse}} has a bug where it uses the initial timeout value instead of the adjustedTimeout. Combined with passing in the wrong {{TimeUnit}}, it leads to a shorter timeout than expected. This can have unintended consequences; for example, in {{StorageService::sendReplicationNotification}}, instead of waiting 10 seconds ({{request_timeout_in_ms}}), we wait for {{1}} nanosecond. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506659#comment-16506659 ] Joseph Lynch edited comment on CASSANDRA-14459 at 6/8/18 11:36 PM: --- I think that we need some way to get latency measurements for hosts that have been excluded from traffic due to high minimums. For example, if during the initial {{PingMessages}} a local DC host gets a very high measurement (e.g. 100ms), we will never send traffic to it again. My understanding is that's why we reset in the first place. I'll try to come up with a solution that doesn't involve additional traffic. was (Author: jolynch): I think that we need some way to get latency measurements for hosts that have been excluded from traffic due to high minimums. For example, if during the initial {{PingMessages}} a local DC host gets a very high measurement (e.g. 100ms), we will never send traffic to it again. My understanding is that's why we reset in the first place. I'll work on a feedback mechanism for the {{DES}} to ask for latency probes (which I guess would be best implemented as {{PingMessages}} since you're concerned about {{EchoMessages}}). I see two possible designs: one where I send the probes directly from the {{DES}}, or one where I have a method expressing the desire for probes that propagates up to e.g. the {{MessagingService}}. Are there better options? > DynamicEndpointSnitch should never prefer latent nodes > -- > > Key: CASSANDRA-14459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14459 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Fix For: 4.x > > > The DynamicEndpointSnitch has two unfortunate behaviors that allow it to > provide latent hosts as replicas: > # Loses all latency information when Cassandra restarts > # Clears latency information entirely every ten minutes (by default), > allowing global queries to be routed to _other datacenters_ (and local > queries cross racks/azs) > This means that the first few queries after restart/reset could be quite slow > compared to average latencies. I propose we solve this by resetting to the > minimum observed latency instead of completely clearing the samples and > extending the {{isLatencyForSnitch}} idea to a three state variable instead > of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows > {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the > DS should use those measurements if it only has one or fewer samples for a > host. This fixes both problems because on process restart we send out > {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to > effectively the RTT of the hosts (also at that point normal gossip > {{EchoMessages}} have an opportunity to add an additional latency > measurement). > This strategy also nicely deals with the "a host got slow but now it's fine" > problem that the DS resets were (afaik) designed to stop because the > {{EchoMessage}} ping latency will count only after the reset for that host. > Ping latency is a more reasonable lower bound on host latency (as opposed to > status quo of zero). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
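For context, a rough Java sketch of the shape of the proposal in the issue description: a three-valued hint replacing the boolean {{isLatencyForSnitch}}, plus a periodic reset that keeps the minimum observed latency instead of dropping all samples. The names and structure are illustrative and are not the actual DynamicEndpointSnitch code.

{code}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the proposal only; not the real DynamicEndpointSnitch.
public class SnitchResetSketch
{
    /** Proposed replacement for the boolean isLatencyForSnitch flag. */
    enum LatencyUsage { YES, NO, MAYBE }

    private final Map<InetAddress, Long> minLatencyNanos = new ConcurrentHashMap<>();
    private final Map<InetAddress, Integer> sampleCount = new ConcurrentHashMap<>();

    void receiveTiming(InetAddress host, long latencyNanos, LatencyUsage usage)
    {
        int samples = sampleCount.getOrDefault(host, 0);
        // MAYBE (e.g. from PingMessage/EchoMessage) only counts while we have one or fewer samples.
        if (usage == LatencyUsage.NO || (usage == LatencyUsage.MAYBE && samples > 1))
            return;
        minLatencyNanos.merge(host, latencyNanos, Math::min);
        sampleCount.merge(host, 1, Integer::sum);
        // a real implementation would also feed latencyNanos into the per-host histogram
    }

    /** Periodic reset: keep the minimum seen so far instead of clearing everything. */
    void reset()
    {
        sampleCount.clear();
        // minLatencyNanos is retained so a reset host never looks like it has zero latency
    }
}
{code}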
[jira] [Resolved] (CASSANDRA-14237) Unittest failed: org.apache.cassandra.utils.BitSetTest.compareBitSets
[ https://issues.apache.org/jira/browse/CASSANDRA-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang resolved CASSANDRA-14237. Resolution: Won't Fix Makes sense to me. Closing as "won't fix" > Unittest failed: org.apache.cassandra.utils.BitSetTest.compareBitSets > - > > Key: CASSANDRA-14237 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14237 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Jay Zhuang >Priority: Minor > Labels: testing > > {noformat} > [junit] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 0.822 sec > [junit] > [junit] Testcase: compareBitSets(org.apache.cassandra.utils.BitSetTest): > Caused an ERROR > [junit] java.io.FileNotFoundException: /usr/share/dict/words (No such > file or directory) > [junit] java.lang.RuntimeException: java.io.FileNotFoundException: > /usr/share/dict/words (No such file or directory) > [junit] at > org.apache.cassandra.utils.KeyGenerator$WordGenerator.reset(KeyGenerator.java:137) > [junit] at > org.apache.cassandra.utils.KeyGenerator$WordGenerator.(KeyGenerator.java:126) > [junit] at > org.apache.cassandra.utils.BitSetTest.compareBitSets(BitSetTest.java:50) > [junit] Caused by: java.io.FileNotFoundException: /usr/share/dict/words > (No such file or directory) > [junit] at java.io.FileInputStream.open0(Native Method) > [junit] at java.io.FileInputStream.open(FileInputStream.java:195) > [junit] at java.io.FileInputStream.(FileInputStream.java:138) > [junit] at java.io.FileInputStream.(FileInputStream.java:93) > [junit] at > org.apache.cassandra.utils.KeyGenerator$WordGenerator.reset(KeyGenerator.java:135) > [junit] > [junit] > [junit] Test org.apache.cassandra.utils.BitSetTest FAILED > {noformat} > Works fine on my mac but failed on some linux hosts which do not have > {{/usr/share/dict/words}}. It's the same issue as CASSANDRA-7389, should we > backport that? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
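For context, one common way to handle a missing word list is to skip, rather than error, the affected tests. A hedged JUnit 4 sketch follows; the path comes from the stack trace above, and the Assume-based guard is only one possible shape, not necessarily what CASSANDRA-7389 did.

{code}
import java.io.File;

import org.junit.Assume;
import org.junit.BeforeClass;
import org.junit.Test;

// Sketch: skip (rather than fail) when the host has no /usr/share/dict/words,
// which is the situation described in this ticket on some Linux hosts.
public class WordListDependentTestSketch
{
    private static final String WORDS = "/usr/share/dict/words";

    @BeforeClass
    public static void checkDictionary()
    {
        // Assume.assumeTrue marks the whole class as skipped instead of errored when false.
        Assume.assumeTrue("No word list at " + WORDS + ", skipping", new File(WORDS).exists());
    }

    @Test
    public void compareBitSets()
    {
        // ... the original KeyGenerator.WordGenerator based comparison would run here ...
    }
}
{code}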
[jira] [Commented] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506659#comment-16506659 ] Joseph Lynch commented on CASSANDRA-14459: -- I think that we need some way to get latency measurements for hosts that have been excluded from traffic due to high minimums. For example, if during the initial {{PingMessages}} a local DC host gets a very high measurement (e.g. 100ms), we will never send traffic to it again. My understanding is that's why we reset in the first place. I'll work on a feedback mechanism for the {{DES}} to ask for latency probes (which I guess would be best implemented as {{PingMessages}} since you're concerned about {{EchoMessages}}). I see two possible designs: one where I send the probes directly from the {{DES}}, or one where I have a method expressing the desire for probes that propagates up to e.g. the {{MessagingService}}. Are there better options? > DynamicEndpointSnitch should never prefer latent nodes > -- > > Key: CASSANDRA-14459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14459 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Fix For: 4.x > > > The DynamicEndpointSnitch has two unfortunate behaviors that allow it to > provide latent hosts as replicas: > # Loses all latency information when Cassandra restarts > # Clears latency information entirely every ten minutes (by default), > allowing global queries to be routed to _other datacenters_ (and local > queries cross racks/azs) > This means that the first few queries after restart/reset could be quite slow > compared to average latencies. I propose we solve this by resetting to the > minimum observed latency instead of completely clearing the samples and > extending the {{isLatencyForSnitch}} idea to a three state variable instead > of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows > {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the > DS should use those measurements if it only has one or fewer samples for a > host. This fixes both problems because on process restart we send out > {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to > effectively the RTT of the hosts (also at that point normal gossip > {{EchoMessages}} have an opportunity to add an additional latency > measurement). > This strategy also nicely deals with the "a host got slow but now it's fine" > problem that the DS resets were (afaik) designed to stop because the > {{EchoMessage}} ping latency will count only after the reset for that host. > Ping latency is a more reasonable lower bound on host latency (as opposed to > status quo of zero). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14508) Jenkins Slave doesn't have permission to write to /tmp/ directory
Jay Zhuang created CASSANDRA-14508: -- Summary: Jenkins Slave doesn't have permission to write to /tmp/ directory Key: CASSANDRA-14508 URL: https://issues.apache.org/jira/browse/CASSANDRA-14508 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Jay Zhuang Which is causing uTest failed, e.g.: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testCompressedReadUncompressedChunks/ h3. Error Message {noformat} java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db {noformat} h3. Stacktrace {noformat} java.lang.RuntimeException: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) at org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) at org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) at org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:118) at org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadUncompressedChunks(CompressedInputStreamTest.java:83) Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) at java.nio.channels.FileChannel.open(FileChannel.java:287) at java.nio.channels.FileChannel.open(FileChannel.java:335) at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
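One common fix for this class of failure is to stop hard-coding /tmp and let the test framework allocate a writable scratch directory per test. A hedged JUnit 4 sketch follows; the file name comes from the stack trace above, and the TemporaryFolder approach is just one option, not what the test currently does.

{code}
import java.io.File;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

// Sketch: write test SSTable data under a per-test temporary directory instead of
// a literal "/tmp/na-1-big-Data.db", so restrictive permissions on /tmp don't matter.
public class TempDirSketch
{
    @Rule
    public TemporaryFolder tmp = new TemporaryFolder();

    @Test
    public void writesGoToAWritableScratchDir() throws Exception
    {
        File dataFile = tmp.newFile("na-1-big-Data.db");
        // pass dataFile.getAbsolutePath() to the writer under test instead of "/tmp/..."
        System.out.println("writing to " + dataFile.getAbsolutePath());
    }
}
{code}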
[jira] [Updated] (CASSANDRA-14146) [DTEST] cdc_test::TestCDC::test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space assertion always fails (Extra items in the left set)
[ https://issues.apache.org/jira/browse/CASSANDRA-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14146: --- Resolution: Fixed Status: Resolved (was: Patch Available) > [DTEST] > cdc_test::TestCDC::test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space > assertion always fails (Extra items in the left set) > > > Key: CASSANDRA-14146 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14146 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Michael Kjellman >Priority: Major > > Dtest > cdc_test::TestCDC::test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space > always fails on an assertion. > the assert is the final step of the test and it checks that > pre_non_cdc_write_cdc_raw_segments == _get_cdc_raw_files(node.get_path()) > This fails 100% of the time locally, 100% of the time on circleci executed > under pytest, and 100% of the time for the past 40 test runs on ASF Jenkins > runs against trunk. > This is the only test failure (excluding flaky one-off failures) remaining on > the pytest dtest branch. I'm going to annotate the test with a skip marker > (including a reason reference to this JIRA)... when it's fixed we should also > remove the skip annotation from the test. > {code} > > assert pre_non_cdc_write_cdc_raw_segments == > > _get_cdc_raw_files(node.get_path()) > E AssertionError: assert {'/tmp/dtest-...169.log', ...} == > {'/tmp/dtest-v...169.log', ...} > E Extra items in the left set: > E > '/tmp/dtest-vrn4k8ov/test/node1/cdc_raw/CommitLog-7-1515030005097.log' > E > '/tmp/dtest-vrn4k8ov/test/node1/cdc_raw/CommitLog-7-1515030005098.log' > E Extra items in the right set: > E > '/tmp/dtest-vrn4k8ov/test/node1/cdc_raw/CommitLog-7-1515030005099.log' > E > '/tmp/dtest-vrn4k8ov/test/node1/cdc_raw/CommitLog-7-1515030005100.log' > E Use -v to get the full diff > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506549#comment-16506549 ] Jeremiah Jordan commented on CASSANDRA-14499: - Isn't this pretty easy to do with OS level settings? Getting this tracking right across all the places we use disk seems like something we are bound to fail at, whereas using the OS would not? > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14457) Add a virtual table with current compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506543#comment-16506543 ] Jeff Jirsa commented on CASSANDRA-14457: [~iamaleksey] Re: {{CompactionMetrics}} - [~krummas] recently noted in passing that most of compaction has been slowly evolving over time and could probably use a nice, thorough, ground-up rewrite in the near future. May be worth a chat on the dev@ list about a potential redesign. > Add a virtual table with current compactions > > > Key: CASSANDRA-14457 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14457 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Labels: virtual-tables > Fix For: 4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-14505) Removal of last element on a List deletes the entire row
[ https://issues.apache.org/jira/browse/CASSANDRA-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan resolved CASSANDRA-14505. - Resolution: Duplicate > Removal of last element on a List deletes the entire row > > > Key: CASSANDRA-14505 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14505 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: * Java: 1.8.0_171 > * SO: Ubuntu 18.04 LTS > * Cassandra: 3.11.2 >Reporter: André Paris >Assignee: Benjamin Lerer >Priority: Major > > The behavior of an element removal from a list by an UPDATE differs by how > the row was created: > Given the table > {{CREATE TABLE table_test (}} > {{ id int PRIMARY KEY,}} > {{ list list}} > {{)}} > If the row is created by an INSERT, the row remains after the UPDATE to > remove the last element on the list: > {{cqlsh:ks_test> INSERT INTO table_test (id, list ) VALUES ( 1, ['foo']) ;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +}} > 1 | ['foo'] > {{(1 rows)}} > {{cqlsh:ks_test> UPDATE table_test SET list = list - ['foo'] WHERE id=1;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +--}} > {{ 1 | null}} > {{(1 rows)}} > > But, if the row is created by an UPDATE, the row is deleted after the UPDATE > to remove the last element on the list: > {{cqlsh:ks_test> UPDATE table_test SET list = list + ['foo'] WHERE id=2;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +-}} > 2 | ['foo'] > {{(1 rows)}} > {{cqlsh:ks_test> UPDATE table_test SET list = list - ['foo'] WHERE id=2;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +--}} > {{(0 rows)}} > > Thanks in advance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14505) Removal of last element on a List deletes the entire row
[ https://issues.apache.org/jira/browse/CASSANDRA-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506538#comment-16506538 ] Jeremiah Jordan commented on CASSANDRA-14505: - This has nothing to do with lists. In general if you "create" a row using UPDATE, rather than INSERT, when all columns are nulled out, the row will be gone. This is because INSERT creates a row marker, while UPDATE does not. > Removal of last element on a List deletes the entire row > > > Key: CASSANDRA-14505 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14505 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: * Java: 1.8.0_171 > * SO: Ubuntu 18.04 LTS > * Cassandra: 3.11.2 >Reporter: André Paris >Assignee: Benjamin Lerer >Priority: Major > > The behavior of an element removal from a list by an UPDATE differs by how > the row was created: > Given the table > {{CREATE TABLE table_test (}} > {{ id int PRIMARY KEY,}} > {{ list list}} > {{)}} > If the row is created by an INSERT, the row remains after the UPDATE to > remove the last element on the list: > {{cqlsh:ks_test> INSERT INTO table_test (id, list ) VALUES ( 1, ['foo']) ;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +}} > 1 | ['foo'] > {{(1 rows)}} > {{cqlsh:ks_test> UPDATE table_test SET list = list - ['foo'] WHERE id=1;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +--}} > {{ 1 | null}} > {{(1 rows)}} > > But, if the row is created by an UPDATE, the row is deleted after the UPDATE > to remove the last element on the list: > {{cqlsh:ks_test> UPDATE table_test SET list = list + ['foo'] WHERE id=2;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +-}} > 2 | ['foo'] > {{(1 rows)}} > {{cqlsh:ks_test> UPDATE table_test SET list = list - ['foo'] WHERE id=2;}} > {{cqlsh:ks_test> SELECT * FROM table_test;}} > {{ id | list}} > {{ +--}} > {{(0 rows)}} > > Thanks in advance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506533#comment-16506533 ] Jordan West commented on CASSANDRA-14499: - [~jeromatron] I understand those concerns. This would be opt-in for folks who wanted automatic action taken and any such action should take care to not cause the node to flap, for example. One use case where we see this as valuable is QA/perf/test clusters that may not have the full monitoring setup but need to be protected from errant clients filling up disks to a point where worse things happen. The warning system can be accomplished today with monitoring and alerting on the same metrics. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14481) Using nodetool status after enabling Cassandra internal auth for JMX access fails with currently documented permissions
[ https://issues.apache.org/jira/browse/CASSANDRA-14481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14481: - Labels: security (was: ) > Using nodetool status after enabling Cassandra internal auth for JMX access > fails with currently documented permissions > --- > > Key: CASSANDRA-14481 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14481 > Project: Cassandra > Issue Type: Bug > Components: Documentation and Website > Environment: Apache Cassandra 3.11.2 > Centos 6.9 >Reporter: Valerie Parham-Thompson >Priority: Minor > Labels: security > > Using the documentation here: > [https://cassandra.apache.org/doc/latest/operating/security.html#cassandra-integrated-auth] > Running `nodetool status` on a cluster fails as follows: > {noformat} > error: Access Denied > -- StackTrace -- > java.lang.SecurityException: Access Denied > at > org.apache.cassandra.auth.jmx.AuthorizationProxy.invoke(AuthorizationProxy.java:172) > at com.sun.proxy.$Proxy4.invoke(Unknown Source) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) > at > javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) > at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:835) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > at > sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:283) > at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:260) > at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161) > at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source) > at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source) > at > javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020) > at > javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298) > at com.sun.proxy.$Proxy7.effectiveOwnership(Unknown Source) > at org.apache.cassandra.tools.NodeProbe.effectiveOwnership(NodeProbe.java:489) > at org.apache.cassandra.tools.nodetool.Status.execute(Status.java:74) > at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:255) > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:169) {noformat} > Permissions on two additional mbeans were required: > {noformat} > GRANT SELECT, EXECUTE ON MBEAN ‘org.apache.cassandra.db:type=StorageService’ > TO jmx; > GRANT EXECUTE ON MBEAN ‘org.apache.cassandra.db:type=EndpointSnitchInfo’ TO > jmx; > {noformat} > I've updated the documentation in my fork here and would like to do a pull > request for the addition: > [https://github.com/dataindataout/cassandra/blob/trunk/doc/source/operating/security.rst#cassandra-integrated-auth] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14489) Test cqlsh authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14489: - Labels: cqlsh security test (was: cqlsh test) > Test cqlsh authentication > - > > Key: CASSANDRA-14489 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14489 > Project: Cassandra > Issue Type: Sub-task >Reporter: Patrick Bannister >Priority: Critical > Labels: cqlsh, security, test > Fix For: 4.x > > > Coverage analysis of the cqlshlib unittests (pylib/cqlshlib/test/test*.py) > and the dtest cqlsh_tests (cqlsh_tests.py and cqlsh_copy_tests.py) showed no > coverage of authentication related code. > Before we can release a port of cqlsh, we should identify an existing test > for cqlsh authentication, or write a new one. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14497) Add Role login cache
[ https://issues.apache.org/jira/browse/CASSANDRA-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14497: - Labels: security (was: ) > Add Role login cache > > > Key: CASSANDRA-14497 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14497 > Project: Cassandra > Issue Type: Improvement > Components: Auth >Reporter: Jay Zhuang >Assignee: Sam Tunnicliffe >Priority: Major > Labels: security > > The > [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313] > function is used for all auth message: > [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82]. > But the > [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521] > information is not cached. So it hits the database every time: > [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407]. > For a cluster with lots of new connections, it's causing performance issue. > The mitigation for us is to increase the {{system_auth}} replication factor > to match the number of nodes, so > [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488] > would be very cheap. The P99 dropped immediately, but I don't think it is > not a good solution. > I would purpose to add {{Role.canLogin}} to the RolesCache to improve the > auth performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
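As an illustration of the kind of caching being proposed, here is a hedged Java sketch that memoizes the canLogin flag per role so the hot login path does not query {{system_auth}} on every new connection. The {{RoleStore}} interface and the standalone Guava cache are assumptions made for the example only; the actual change would presumably extend the existing {{RolesCache}} / auth-cache machinery rather than introduce a new cache.

{code}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Illustration only: cache the canLogin flag per role name so repeated logins
// don't hit the underlying role store. Not the actual RolesCache change.
public class CanLoginCacheSketch
{
    interface RoleStore
    {
        boolean fetchCanLogin(String roleName); // stand-in for the system_auth query
    }

    private final LoadingCache<String, Boolean> canLogin;

    public CanLoginCacheSketch(RoleStore store, long validityMillis)
    {
        this.canLogin = CacheBuilder.newBuilder()
                                    .expireAfterWrite(validityMillis, TimeUnit.MILLISECONDS)
                                    .build(new CacheLoader<String, Boolean>()
                                    {
                                        @Override
                                        public Boolean load(String roleName)
                                        {
                                            return store.fetchCanLogin(roleName);
                                        }
                                    });
    }

    public boolean canLogin(String roleName)
    {
        return canLogin.getUnchecked(roleName);
    }
}
{code}

With a cache like this, only the first login for a role (or the first after expiry) pays the cost of the underlying query; subsequent logins are served from memory, which is the performance effect the ticket is after.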
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506516#comment-16506516 ] Jeremy Hanna commented on CASSANDRA-14499: -- I just want to add a note of caution to anything automatic happening when certain metrics trigger. I've seen where metrics can misfire under certain circumstances which leads to unpredictable cluster behavior. I would favor having a warning system over anything done automatically if it were my cluster. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14457) Add a virtual table with current compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14457: - Labels: virtual-tables (was: ) > Add a virtual table with current compactions > > > Key: CASSANDRA-14457 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14457 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Labels: virtual-tables > Fix For: 4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14458) Add virtual table to list active connections
[ https://issues.apache.org/jira/browse/CASSANDRA-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14458: - Labels: virtual-tables (was: ) > Add virtual table to list active connections > > > Key: CASSANDRA-14458 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14458 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Labels: virtual-tables > Fix For: 4.x > > > List all active connections in virtual table like: > {code:sql} > cqlsh:system> select * from system_views.clients ; > > client_address | cipher | driver_name | driver_version | keyspace | > protocol | requests | ssl | user | version > --+---+-++--+---+--+---+---+- > /127.0.0.1:63903 | undefined | undefined | undefined | | > undefined | 13 | False | anonymous | 4 > /127.0.0.1:63904 | undefined | undefined | undefined | system | > undefined | 16 | False | anonymous | 4 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-13929: --- Resolution: Fixed Reproduced In: 3.11.0, 3.8 (was: 3.11.0) Status: Resolved (was: Ready to Commit) > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506462#comment-16506462 ] Jay Zhuang commented on CASSANDRA-13929: Thanks [~jasobrown] again for the review. Committed as [{{ed5f834}}|https://github.com/apache/cassandra/commit/ed5f8347ef0c7175cd96e59bc8bfaf3ed1f4697a]. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[2/3] cassandra git commit: Remove BTree.Builder Recycler to reduce memory usage
Remove BTree.Builder Recycler to reduce memory usage patch by Jay Zhuang; reviewed by jasobrown for CASSANDRA-13929 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ed5f8347 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ed5f8347 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ed5f8347 Branch: refs/heads/trunk Commit: ed5f8347ef0c7175cd96e59bc8bfaf3ed1f4697a Parents: b174819 Author: Jay Zhuang Authored: Mon Jan 29 18:17:56 2018 -0800 Committer: Jay Zhuang Committed: Fri Jun 8 10:40:06 2018 -0700 -- CHANGES.txt | 1 + build.xml | 4 +- .../columniterator/SSTableReversedIterator.java | 2 +- .../org/apache/cassandra/db/rows/BTreeRow.java | 2 +- .../cassandra/db/rows/ComplexColumnData.java| 5 +- .../org/apache/cassandra/utils/btree/BTree.java | 69 +- .../test/microbench/BTreeBuildBench.java| 96 .../org/apache/cassandra/utils/BTreeTest.java | 33 ++- 8 files changed, 161 insertions(+), 51 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 2e77d2e..7f4b655 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.11.3 + * Remove BTree.Builder Recycler to reduce memory usage (CASSANDRA-13929) * Reduce nodetool GC thread count (CASSANDRA-14475) * Fix New SASI view creation during Index Redistribution (CASSANDRA-14055) * Remove string formatting lines from BufferPool hot path (CASSANDRA-14416) http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/build.xml -- diff --git a/build.xml b/build.xml index 4edfbb1..54c5372 100644 --- a/build.xml +++ b/build.xml @@ -422,8 +422,8 @@ - - + + http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java -- diff --git a/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java b/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java index cf8798d..6a0b7be 100644 --- a/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java +++ b/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java @@ -426,7 +426,7 @@ public class SSTableReversedIterator extends AbstractSSTableIterator public void reset() { built = null; -rowBuilder = BTree.builder(metadata.comparator); +rowBuilder.reuse(); deletionBuilder = MutableDeletionInfo.builder(partitionLevelDeletion, metadata().comparator, false); } http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/src/java/org/apache/cassandra/db/rows/BTreeRow.java -- diff --git a/src/java/org/apache/cassandra/db/rows/BTreeRow.java b/src/java/org/apache/cassandra/db/rows/BTreeRow.java index 15ac30a..c70e0e2 100644 --- a/src/java/org/apache/cassandra/db/rows/BTreeRow.java +++ b/src/java/org/apache/cassandra/db/rows/BTreeRow.java @@ -738,7 +738,7 @@ public class BTreeRow extends AbstractRow this.clustering = null; this.primaryKeyLivenessInfo = LivenessInfo.EMPTY; this.deletion = Deletion.LIVE; -this.cells_ = null; +this.cells_.reuse(); this.hasComplex = false; } http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java -- diff --git a/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java b/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java index 380af7a..1395782 100644 --- a/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java +++ b/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java 
@@ -242,7 +242,10 @@ public class ComplexColumnData extends ColumnData implements Iterable { this.column = column; this.complexDeletion = DeletionTime.LIVE; // default if writeComplexDeletion is not called -this.builder = BTree.builder(column.cellComparator()); +if (builder == null) +builder = BTree.builder(column.cellComparator()); +else +builder.reuse(column.cellComparator()); } public void addComplexDeletion(DeletionTime
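The change above removes the thread-local Recycler pooling and instead reuses an existing builder in place. A minimal, self-contained sketch of that reuse pattern, with simplified names that do not match the real BTree.Builder API:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a simplified analogue of the reuse() approach in the diff,
// not the actual org.apache.cassandra.utils.btree.BTree.Builder implementation.
class ReusableBuilder<T>
{
    private final List<T> values = new ArrayList<>();
    private Comparator<? super T> comparator;

    ReusableBuilder(Comparator<? super T> comparator)
    {
        this.comparator = comparator;
    }

    // Rather than handing the builder back to a Recycler (whose thread-local
    // stack can keep large internal buffers reachable), clear its state and
    // keep using the same instance on the hot path.
    ReusableBuilder<T> reuse(Comparator<? super T> comparator)
    {
        this.comparator = comparator;
        values.clear();
        return this;
    }

    void add(T value)
    {
        values.add(value);
    }

    List<T> build()
    {
        List<T> result = new ArrayList<>(values);
        result.sort(comparator);
        return result;
    }
}
{code}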
[1/3] cassandra git commit: Remove BTree.Builder Recycler to reduce memory usage
Repository: cassandra Updated Branches: refs/heads/cassandra-3.11 b1748198e -> ed5f8347e refs/heads/trunk 800f0b394 -> 958e13d16 Remove BTree.Builder Recycler to reduce memory usage patch by Jay Zhuang; reviewed by jasobrown for CASSANDRA-13929 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ed5f8347 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ed5f8347 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ed5f8347 Branch: refs/heads/cassandra-3.11 Commit: ed5f8347ef0c7175cd96e59bc8bfaf3ed1f4697a Parents: b174819 Author: Jay Zhuang Authored: Mon Jan 29 18:17:56 2018 -0800 Committer: Jay Zhuang Committed: Fri Jun 8 10:40:06 2018 -0700 -- CHANGES.txt | 1 + build.xml | 4 +- .../columniterator/SSTableReversedIterator.java | 2 +- .../org/apache/cassandra/db/rows/BTreeRow.java | 2 +- .../cassandra/db/rows/ComplexColumnData.java| 5 +- .../org/apache/cassandra/utils/btree/BTree.java | 69 +- .../test/microbench/BTreeBuildBench.java| 96 .../org/apache/cassandra/utils/BTreeTest.java | 33 ++- 8 files changed, 161 insertions(+), 51 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 2e77d2e..7f4b655 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.11.3 + * Remove BTree.Builder Recycler to reduce memory usage (CASSANDRA-13929) * Reduce nodetool GC thread count (CASSANDRA-14475) * Fix New SASI view creation during Index Redistribution (CASSANDRA-14055) * Remove string formatting lines from BufferPool hot path (CASSANDRA-14416) http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/build.xml -- diff --git a/build.xml b/build.xml index 4edfbb1..54c5372 100644 --- a/build.xml +++ b/build.xml @@ -422,8 +422,8 @@ - - + + http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java -- diff --git a/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java b/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java index cf8798d..6a0b7be 100644 --- a/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java +++ b/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java @@ -426,7 +426,7 @@ public class SSTableReversedIterator extends AbstractSSTableIterator public void reset() { built = null; -rowBuilder = BTree.builder(metadata.comparator); +rowBuilder.reuse(); deletionBuilder = MutableDeletionInfo.builder(partitionLevelDeletion, metadata().comparator, false); } http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/src/java/org/apache/cassandra/db/rows/BTreeRow.java -- diff --git a/src/java/org/apache/cassandra/db/rows/BTreeRow.java b/src/java/org/apache/cassandra/db/rows/BTreeRow.java index 15ac30a..c70e0e2 100644 --- a/src/java/org/apache/cassandra/db/rows/BTreeRow.java +++ b/src/java/org/apache/cassandra/db/rows/BTreeRow.java @@ -738,7 +738,7 @@ public class BTreeRow extends AbstractRow this.clustering = null; this.primaryKeyLivenessInfo = LivenessInfo.EMPTY; this.deletion = Deletion.LIVE; -this.cells_ = null; +this.cells_.reuse(); this.hasComplex = false; } http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed5f8347/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java -- diff --git a/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java b/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java index 380af7a..1395782 
100644 --- a/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java +++ b/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java @@ -242,7 +242,10 @@ public class ComplexColumnData extends ColumnData implements Iterable { this.column = column; this.complexDeletion = DeletionTime.LIVE; // default if writeComplexDeletion is not called -this.builder = BTree.builder(column.cellComparator()); +if (builder == null) +builder = BTree.builder(column.cellComparator()
[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/958e13d1 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/958e13d1 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/958e13d1 Branch: refs/heads/trunk Commit: 958e13d1667391c69ec82f54da7d371e6eba29d6 Parents: 800f0b3 ed5f834 Author: Jay Zhuang Authored: Fri Jun 8 10:47:14 2018 -0700 Committer: Jay Zhuang Committed: Fri Jun 8 10:48:15 2018 -0700 -- CHANGES.txt | 1 + build.xml | 4 +- .../columniterator/SSTableReversedIterator.java | 2 +- .../org/apache/cassandra/db/rows/BTreeRow.java | 2 +- .../cassandra/db/rows/ComplexColumnData.java| 5 +- .../org/apache/cassandra/utils/btree/BTree.java | 69 +- .../test/microbench/BTreeBuildBench.java| 96 .../org/apache/cassandra/utils/BTreeTest.java | 33 ++- 8 files changed, 161 insertions(+), 51 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/958e13d1/CHANGES.txt -- diff --cc CHANGES.txt index 9857704,7f4b655..27c9561 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,253 -1,5 +1,254 @@@ +4.0 + * Add option to sanity check tombstones on reads/compactions (CASSANDRA-14467) + * Add a virtual table to expose all running sstable tasks (CASSANDRA-14457) + * Let nodetool import take a list of directories (CASSANDRA-14442) + * Avoid unneeded memory allocations / cpu for disabled log levels (CASSANDRA-14488) + * Implement virtual keyspace interface (CASSANDRA-7622) + * nodetool import cleanup and improvements (CASSANDRA-14417) + * Bump jackson version to >= 2.9.5 (CASSANDRA-14427) + * Allow nodetool toppartitions without specifying table (CASSANDRA-14360) + * Audit logging for database activity (CASSANDRA-12151) + * Clean up build artifacts in docs container (CASSANDRA-14432) + * Minor network authz improvements (Cassandra-14413) + * Automatic sstable upgrades (CASSANDRA-14197) + * Replace deprecated junit.framework.Assert usages with org.junit.Assert (CASSANDRA-14431) + * Cassandra-stress throws NPE if insert section isn't specified in user profile (CASSSANDRA-14426) + * List clients by protocol versions `nodetool clientstats --by-protocol` (CASSANDRA-14335) + * Improve LatencyMetrics performance by reducing write path processing (CASSANDRA-14281) + * Add network authz (CASSANDRA-13985) + * Use the correct IP/Port for Streaming when localAddress is left unbound (CASSANDRA-14389) + * nodetool listsnapshots is missing local system keyspace snapshots (CASSANDRA-14381) + * Remove StreamCoordinator.streamExecutor thread pool (CASSANDRA-14402) + * Rename nodetool --with-port to --print-port to disambiguate from --port (CASSANDRA-14392) + * Client TOPOLOGY_CHANGE messages have wrong port. 
(CASSANDRA-14398) + * Add ability to load new SSTables from a separate directory (CASSANDRA-6719) + * Eliminate background repair and probablistic read_repair_chance table options + (CASSANDRA-13910) + * Bind to correct local address in 4.0 streaming (CASSANDRA-14362) + * Use standard Amazon naming for datacenter and rack in Ec2Snitch (CASSANDRA-7839) + * Fix junit failure for SSTableReaderTest (CASSANDRA-14387) + * Abstract write path for pluggable storage (CASSANDRA-14118) + * nodetool describecluster should be more informative (CASSANDRA-13853) + * Compaction performance improvements (CASSANDRA-14261) + * Refactor Pair usage to avoid boxing ints/longs (CASSANDRA-14260) + * Add options to nodetool tablestats to sort and limit output (CASSANDRA-13889) + * Rename internals to reflect CQL vocabulary (CASSANDRA-14354) + * Add support for hybrid MIN(), MAX() speculative retry policies + (CASSANDRA-14293, CASSANDRA-14338, CASSANDRA-14352) + * Fix some regressions caused by 14058 (CASSANDRA-14353) + * Abstract repair for pluggable storage (CASSANDRA-14116) + * Add meaningful toString() impls (CASSANDRA-13653) + * Add sstableloader option to accept target keyspace name (CASSANDRA-13884) + * Move processing of EchoMessage response to gossip stage (CASSANDRA-13713) + * Add coordinator write metric per CF (CASSANDRA-14232) + * Correct and clarify SSLFactory.getSslContext method and call sites (CASSANDRA-14314) + * Handle static and partition deletion properly on ThrottledUnfilteredIterator (CASSANDRA-14315) + * NodeTool clientstats should show SSL Cipher (CASSANDRA-14322) + * Add ability to specify driver name and version (CASSANDRA-14275) + * Abstract streaming for pluggable storage (CASSANDRA-14115) + * Forced incremental repairs should promote sstables if they can (CASSANDRA-14294) + * Use Murm
[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506404#comment-16506404 ] Kevin Zhang commented on CASSANDRA-14358: - [~jolynch] currently we have an issue with gossip going one way after a node had sudden loss of storage controller. After the offending node comes back online, all the rest nodes show TCP connections on gossip, but those connections (in bold) are not seen on the offending node. On offending node, nodetool gossipinfo shows generation 0 for all other nodes and nodetool status show DN for all other nodes. On other nodes, nodetool gossipinfo seems fine but nodetool status shows offending node down. This can be resolved by restarting cassandra on all nodes except the offending node, or wait for 2 hours after crash event (tcp_keepalive is set to 7200s on debian?). I don't know if this is related, but wondering if there is any way to verify (like TRACE logging on org.apache.cassandra.gms and/or org.apache.cassandra.net, or maybe packet capture). So far we can reproduce it at 30% chance. Thanks in advance. node 10.96.105.4 *tcp 0 0 10.96.105.4:7001 10.96.105.6:55629 ESTABLISHED keepalive (729.79/0/0)* *tcp 0 0 10.96.105.4:39219 10.96.105.6:7001 ESTABLISHED keepalive (783.04/0/0)* *tcp 0 0 10.96.105.4:7001 10.96.105.6:60007 ESTABLISHED keepalive (729.79/0/0)* tcp 0 0 10.96.105.4:7001 10.96.105.6:45318 ESTABLISHED keepalive (1471.16/0/0) node 10.96.105.6 tcp 0 0 10.96.105.6:7001 0.0.0.0:* LISTEN off (0.00/0/0) tcp 0 0 10.96.105.6:45318 10.96.105.4:7001 ESTABLISHED keepalive (1477.00/0/0) > OutboundTcpConnection can hang for many minutes when nodes restart > -- > > Key: CASSANDRA-14358 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14358 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running > with {{internode_encryption: all}} and the EC2 multi region snitch on Linux > 4.13 within the same AWS region. Smallest cluster I've seen the problem on is > 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently > reproduce on at least one node in the cluster. > So all the connections are SSL and we're connecting on the internal ip > addresses (not the public endpoint ones). > Potentially relevant sysctls: > {noformat} > /proc/sys/net/ipv4/tcp_syn_retries = 2 > /proc/sys/net/ipv4/tcp_synack_retries = 5 > /proc/sys/net/ipv4/tcp_keepalive_time = 7200 > /proc/sys/net/ipv4/tcp_keepalive_probes = 9 > /proc/sys/net/ipv4/tcp_keepalive_intvl = 75 > /proc/sys/net/ipv4/tcp_retries2 = 15 > {noformat} >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x > > Attachments: 10 Minute Partition.pdf > > > edit summary: This primarily impacts networks with stateful firewalls such as > AWS. I'm working on a proper patch for trunk but unfortunately it relies on > the Netty refactor in 4.0 so it will be hard to backport to previous > versions. A workaround for earlier versions is to set the > {{net.ipv4.tcp_retries2}} sysctl to ~5. 
This can be done with the following: > {code:java} > $ cat /etc/sysctl.d/20-cassandra-tuning.conf > net.ipv4.tcp_retries2=5 > $ # Reload all sysctls > $ sysctl --system{code} > Original Bug Report: > I've been trying to debug nodes not being able to see each other during > longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can > contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and > 2.1.x clusters for us. I think I finally have a lead. It appears that prior > to trunk (with the awesome Netty refactor) we do not set socket connect > timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set > {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe > that this means that we could potentially block forever on {{connect}} or > {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I > think that the OS will protect us somewhat (and that may be what's causing > the eventual timeout) but I think that given the right network conditions our > {{OutboundTCPConnection}} threads can just be stuck never making any progress > until the OS intervenes. > I have attached some logs of such a network partition during a rolling > restart where an old node in the cluster has a completely foobarred > {{OutboundTcpConnection}} for ~10 minutes before finally getting a > {{java.net.SocketException: Connection timed out (Write failed)}} and > immediately successfully reconnecting. I conclude that the old node is the > problem because the new node (the one that restarted) is sending ECHOs to the >
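As quoted above, the pre-4.0 code paths set neither a connect timeout nor SO_TIMEOUT on outbound SSL connections, so connect/recv can block until the OS gives up. A rough illustration of the missing bounds using the plain java.net API (not the actual OutboundTcpConnection code; host, port, and the 10s values are arbitrary):

{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;

public class BoundedConnectSketch
{
    // Illustration only: the standard calls that bound how long a connect or a
    // blocking read can hang.
    public static Socket open(String host, int port) throws Exception
    {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress(host, port), 10_000); // fail the connect after 10s
        socket.setSoTimeout(10_000);                               // bound blocking reads to 10s
        return socket;
    }
}
{code}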
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506282#comment-16506282 ] Vinay Chella commented on CASSANDRA-14482: -- (y)(y). Looking forward for your contributions [~sushm...@gmail.com] > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance > Fix For: 3.11.x, 4.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
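For anyone picking this up, a standalone round trip with the zstd-jni bindings looks roughly like the sketch below. It only demonstrates the library calls and is not the ICompressor integration the ticket asks for:

{code:java}
import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ZstdRoundTrip
{
    public static void main(String[] args)
    {
        byte[] original = "example block of sstable bytes".getBytes(StandardCharsets.UTF_8);

        // Level 3 is zstd's default; higher levels trade CPU for compression ratio.
        byte[] compressed = Zstd.compress(original, 3);

        // This decompress overload needs the original (uncompressed) length up front.
        byte[] restored = Zstd.decompress(compressed, original.length);

        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
{code}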
[jira] [Commented] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506243#comment-16506243 ] Jason Brown commented on CASSANDRA-14459: - tbqh, I am -1 on sending an {{EchoMessage}} on every gossip round. This increases the gossip traffic by 66% [1], if not 100%, adds more processing demands to the single-threaded Gossip stage, and will not even give you realistic latency data (except, possibly, a rudimentary floor latency, but that assumes a small cluster that is rather quiescent). Seed nodes would also bear a lot of this additional traffic. If we don't have any latency data in DES for a host, it's because we have not communicated meaningfully with it (as far as latency numbers go). I am totally fine with that, and we don't need to goose the traffic to get latencies for a node which we don't talk to. Your original patch was probably good enough to start a proper review, as I believe this behavior is a worthwhile addition. [1] Currently there's 2-3 msgs per gossip round (Ack2 is optional), EchoMsg + response adds two more. > DynamicEndpointSnitch should never prefer latent nodes > -- > > Key: CASSANDRA-14459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14459 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Fix For: 4.x > > > The DynamicEndpointSnitch has two unfortunate behaviors that allow it to > provide latent hosts as replicas: > # Loses all latency information when Cassandra restarts > # Clears latency information entirely every ten minutes (by default), > allowing global queries to be routed to _other datacenters_ (and local > queries cross racks/azs) > This means that the first few queries after restart/reset could be quite slow > compared to average latencies. I propose we solve this by resetting to the > minimum observed latency instead of completely clearing the samples and > extending the {{isLatencyForSnitch}} idea to a three state variable instead > of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows > {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the > DS should use those measurements if it only has one or fewer samples for a > host. This fixes both problems because on process restart we send out > {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to > effectively the RTT of the hosts (also at that point normal gossip > {{EchoMessages}} have an opportunity to add an additional latency > measurement). > This strategy also nicely deals with the "a host got slow but now it's fine" > problem that the DS resets were (afaik) designed to stop because the > {{EchoMessage}} ping latency will count only after the reset for that host. > Ping latency is a more reasonable lower bound on host latency (as opposed to > status quo of zero). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
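The YES/NO/MAYBE idea in the description amounts to letting ping/echo-style measurements seed a host's score only when little or no real traffic data exists. A rough sketch of that gating logic with hypothetical names (not the actual DynamicEndpointSnitch code):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LatencyGatingSketch
{
    // Hypothetical illustration of the three-state proposal in the description;
    // names and structure are invented and do not match the real snitch.
    enum LatencyUsage { YES, NO, MAYBE }

    private final Map<String, Integer> sampleCounts = new ConcurrentHashMap<>();

    void receiveTiming(String host, long latencyNanos, LatencyUsage usage)
    {
        int samples = sampleCounts.getOrDefault(host, 0);

        // MAYBE measurements (e.g. ping/echo round trips) are only used when we
        // have one or fewer real samples for the host; NO is always ignored.
        if (usage == LatencyUsage.NO || (usage == LatencyUsage.MAYBE && samples > 1))
            return;

        sampleCounts.merge(host, 1, Integer::sum);
        record(host, latencyNanos);
    }

    private void record(String host, long latencyNanos)
    {
        // feed the per-host latency estimate (histogram / EWMA) here
    }
}
{code}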
[jira] [Updated] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14482: --- Fix Version/s: 4.x > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance > Fix For: 3.11.x, 4.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14482: --- Component/s: Compression > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance > Fix For: 3.11.x, 4.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14482: --- Labels: performance (was: ) > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance > Fix For: 3.11.x, 4.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506282#comment-16506282 ] Vinay Chella commented on CASSANDRA-14482: -- (y)(y). Looking forward to your contributions [~sushm...@gmail.com] > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance > Fix For: 3.11.x, 4.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa reassigned CASSANDRA-14482: -- Assignee: Sushma A Devendrappa (was: Vinay Chella) > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506145#comment-16506145 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- [~vinaykumarcse] [~jjirsa] [~zznate] do you guys mind if I work on this? I am already working on this internally and would love to take it forward; this will be my first chance to contribute to the community. Thanks Sushma > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Assignee: Vinay Chella >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14462) CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X
[ https://issues.apache.org/jira/browse/CASSANDRA-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Rousseau reassigned CASSANDRA-14462: --- Assignee: Fabien Rousseau > CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X > --- > > Key: CASSANDRA-14462 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14462 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Fabien Rousseau >Assignee: Fabien Rousseau >Priority: Major > Attachments: 14662-2.1-2.2.patch > > > Issue CASSANDRA-12127 changed the way the reversed comparator behaves. Before > scrubbing tables with reversed clustering keys, requests with CAS won't apply > (even if the condition is true). > Below is a simple scenario to reproduce it: > - use C* 2.1.14/2.2.6 > - create the schema > {code:java} > CREATE KEYSPACE IF NOT EXISTS test_ks WITH replication = {'class': > 'SimpleStrategy', 'replication_factor': 1}; > USE test_ks; > CREATE TABLE IF NOT EXISTS test_cf ( > pid text, > total int static, > sid uuid, > amount int, > PRIMARY KEY ((pid), sid) > ) WITH CLUSTERING ORDER BY (sid DESC); > {code} > > - insert data > {code:java} > INSERT INTO test_cf (pid, sid, amount) VALUES ('1', > b2495ad2-9b64-4aab-b000-2ed20dda60ab, 2); > INSERT INTO test_cf (pid, total) VALUES ('1', 2);{code} > > - nodetool flush (this is necessary for the scenario to show the problem) > - upgrade to C* 2.1.20/2.2.12 > - execute the following queries: > {code:java} > UPDATE test_cf SET total = 3 WHERE pid = '1' IF total = 2; > UPDATE test_cf SET amount = 3 WHERE pid = '1' AND sid = > b2495ad2-9b64-4aab-b000-2ed20dda60ab IF amount = 2;{code} > > Both statements won't be applied while they should be applied. > It seems related to the min/maxColumn sstable checks (before the scrubbing, > the min is an empty array, after it is no more) which filter too many > sstables. > The SliceQueryFilter.shouldInclude method filter too many SSTables. > Note: When doing a simple "SELECT total FROM test_cf WHERE pid ='1';" works > well because the selected slices are different (and thus do not filter the > sstables). > Note: This does not seem to affect the 3.0.X versions -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14462) CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X
[ https://issues.apache.org/jira/browse/CASSANDRA-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506104#comment-16506104 ] Fabien Rousseau edited comment on CASSANDRA-14462 at 6/8/18 2:52 PM: - I propose the patch in attachments. It works with the scenario described in the ticket. This might not be the right way to fix the issue, but it has the advantage of being rather simple and safe, at the expense of a performance penalty before scrubbing. (Once scrubbed, there is no more penalty). The patch works by disabling the min/max column check for tables with a clustering key and for the first clustering component being a reversed comparator (and automatically including it). There are no unit tests because I don't know how to automatically generate SSTable of a previous version then use them (but please let me know if it is possible). Note: the patch is simple enough to apply cleanly to both 2.1 & 2.2 branches. was (Author: frousseau): I propose the following patch. It works with the scenario described in the ticket. This might not be the right way to fix the issue, but it has the advantage of being rather simple and safe, at the expense of a performance penalty before scrubbing. (Once scrubbed, there is no more penalty). The patch works by disabling the min/max column check for tables with a clustering key and for the first clustering component being a reversed comparator (and automatically including it). There are no unit tests because I don't know how to automatically generate SSTable of a previous version then use them (but please let me know if it is possible). Note: the patch is simple enough to apply cleanly to both 2.1 & 2.2 branches. > CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X > --- > > Key: CASSANDRA-14462 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14462 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Fabien Rousseau >Priority: Major > Attachments: 14662-2.1-2.2.patch > > > Issue CASSANDRA-12127 changed the way the reversed comparator behaves. Before > scrubbing tables with reversed clustering keys, requests with CAS won't apply > (even if the condition is true). > Below is a simple scenario to reproduce it: > - use C* 2.1.14/2.2.6 > - create the schema > {code:java} > CREATE KEYSPACE IF NOT EXISTS test_ks WITH replication = {'class': > 'SimpleStrategy', 'replication_factor': 1}; > USE test_ks; > CREATE TABLE IF NOT EXISTS test_cf ( > pid text, > total int static, > sid uuid, > amount int, > PRIMARY KEY ((pid), sid) > ) WITH CLUSTERING ORDER BY (sid DESC); > {code} > > - insert data > {code:java} > INSERT INTO test_cf (pid, sid, amount) VALUES ('1', > b2495ad2-9b64-4aab-b000-2ed20dda60ab, 2); > INSERT INTO test_cf (pid, total) VALUES ('1', 2);{code} > > - nodetool flush (this is necessary for the scenario to show the problem) > - upgrade to C* 2.1.20/2.2.12 > - execute the following queries: > {code:java} > UPDATE test_cf SET total = 3 WHERE pid = '1' IF total = 2; > UPDATE test_cf SET amount = 3 WHERE pid = '1' AND sid = > b2495ad2-9b64-4aab-b000-2ed20dda60ab IF amount = 2;{code} > > Both statements won't be applied while they should be applied. > It seems related to the min/maxColumn sstable checks (before the scrubbing, > the min is an empty array, after it is no more) which filter too many > sstables. > The SliceQueryFilter.shouldInclude method filter too many SSTables. 
> Note: When doing a simple "SELECT total FROM test_cf WHERE pid ='1';" works > well because the selected slices are different (and thus do not filter the > sstables). > Note: This does not seem to affect the 3.0.X versions -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14462) CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X
[ https://issues.apache.org/jira/browse/CASSANDRA-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506104#comment-16506104 ] Fabien Rousseau commented on CASSANDRA-14462: - I propose the following patch. It works with the scenario described in the ticket. This might not be the right way to fix the issue, but it has the advantage of being rather simple and safe, at the expense of a performance penalty before scrubbing. (Once scrubbed, there is no more penalty). The patch works by disabling the min/max column check for tables with a clustering key and for the first clustering component being a reversed comparator (and automatically including it). There are no unit tests because I don't know how to automatically generate SSTable of a previous version then use them (but please let me know if it is possible). Note: the patch is simple enough to apply cleanly to both 2.1 & 2.2 branches. > CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X > --- > > Key: CASSANDRA-14462 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14462 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Fabien Rousseau >Priority: Major > Attachments: 14662-2.1-2.2.patch > > > Issue CASSANDRA-12127 changed the way the reversed comparator behaves. Before > scrubbing tables with reversed clustering keys, requests with CAS won't apply > (even if the condition is true). > Below is a simple scenario to reproduce it: > - use C* 2.1.14/2.2.6 > - create the schema > {code:java} > CREATE KEYSPACE IF NOT EXISTS test_ks WITH replication = {'class': > 'SimpleStrategy', 'replication_factor': 1}; > USE test_ks; > CREATE TABLE IF NOT EXISTS test_cf ( > pid text, > total int static, > sid uuid, > amount int, > PRIMARY KEY ((pid), sid) > ) WITH CLUSTERING ORDER BY (sid DESC); > {code} > > - insert data > {code:java} > INSERT INTO test_cf (pid, sid, amount) VALUES ('1', > b2495ad2-9b64-4aab-b000-2ed20dda60ab, 2); > INSERT INTO test_cf (pid, total) VALUES ('1', 2);{code} > > - nodetool flush (this is necessary for the scenario to show the problem) > - upgrade to C* 2.1.20/2.2.12 > - execute the following queries: > {code:java} > UPDATE test_cf SET total = 3 WHERE pid = '1' IF total = 2; > UPDATE test_cf SET amount = 3 WHERE pid = '1' AND sid = > b2495ad2-9b64-4aab-b000-2ed20dda60ab IF amount = 2;{code} > > Both statements won't be applied while they should be applied. > It seems related to the min/maxColumn sstable checks (before the scrubbing, > the min is an empty array, after it is no more) which filter too many > sstables. > The SliceQueryFilter.shouldInclude method filter too many SSTables. > Note: When doing a simple "SELECT total FROM test_cf WHERE pid ='1';" works > well because the selected slices are different (and thus do not filter the > sstables). > Note: This does not seem to affect the 3.0.X versions -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
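A self-contained sketch of the guard the comment describes, using plain integers in place of clustering values; the names are invented for illustration and this is not the attached patch:

{code:java}
import java.util.List;

public class ReversedMinMaxGuardSketch
{
    // Hypothetical illustration: when the first clustering component is reversed,
    // skip the per-sstable min/max pruning check (whose metadata may predate
    // scrubbing) and include the sstable unconditionally.
    public static boolean shouldInclude(boolean firstClusteringReversed,
                                        List<Integer> requestedRange,
                                        List<Integer> sstableMinMax)
    {
        if (firstClusteringReversed)
            return true; // cannot trust min/max values written before scrubbing

        int requestedStart = requestedRange.get(0), requestedEnd = requestedRange.get(1);
        int sstableMin = sstableMinMax.get(0), sstableMax = sstableMinMax.get(1);
        return requestedStart <= sstableMax && requestedEnd >= sstableMin;
    }
}
{code}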
[jira] [Updated] (CASSANDRA-14462) CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X
[ https://issues.apache.org/jira/browse/CASSANDRA-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Rousseau updated CASSANDRA-14462: Attachment: 14662-2.1-2.2.patch > CAS temporarily broken on reversed tables after upgrading on 2.1.X or 2.2.X > --- > > Key: CASSANDRA-14462 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14462 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Fabien Rousseau >Priority: Major > Attachments: 14662-2.1-2.2.patch > > > Issue CASSANDRA-12127 changed the way the reversed comparator behaves. Before > scrubbing tables with reversed clustering keys, requests with CAS won't apply > (even if the condition is true). > Below is a simple scenario to reproduce it: > - use C* 2.1.14/2.2.6 > - create the schema > {code:java} > CREATE KEYSPACE IF NOT EXISTS test_ks WITH replication = {'class': > 'SimpleStrategy', 'replication_factor': 1}; > USE test_ks; > CREATE TABLE IF NOT EXISTS test_cf ( > pid text, > total int static, > sid uuid, > amount int, > PRIMARY KEY ((pid), sid) > ) WITH CLUSTERING ORDER BY (sid DESC); > {code} > > - insert data > {code:java} > INSERT INTO test_cf (pid, sid, amount) VALUES ('1', > b2495ad2-9b64-4aab-b000-2ed20dda60ab, 2); > INSERT INTO test_cf (pid, total) VALUES ('1', 2);{code} > > - nodetool flush (this is necessary for the scenario to show the problem) > - upgrade to C* 2.1.20/2.2.12 > - execute the following queries: > {code:java} > UPDATE test_cf SET total = 3 WHERE pid = '1' IF total = 2; > UPDATE test_cf SET amount = 3 WHERE pid = '1' AND sid = > b2495ad2-9b64-4aab-b000-2ed20dda60ab IF amount = 2;{code} > > Both statements won't be applied while they should be applied. > It seems related to the min/maxColumn sstable checks (before the scrubbing, > the min is an empty array, after it is no more) which filter too many > sstables. > The SliceQueryFilter.shouldInclude method filter too many SSTables. > Note: When doing a simple "SELECT total FROM test_cf WHERE pid ='1';" works > well because the selected slices are different (and thus do not filter the > sstables). > Note: This does not seem to affect the 3.0.X versions -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Deleted] (CASSANDRA-14506) Cassandra has a serious bug
[ https://issues.apache.org/jira/browse/CASSANDRA-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa deleted CASSANDRA-14506: --- > Cassandra has a serious bug > --- > > Key: CASSANDRA-14506 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14506 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Priority: Critical > > TBA -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14506) Cassandra has a serious bug
[ https://issues.apache.org/jira/browse/CASSANDRA-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14506: --- Summary: Cassandra has a serious bug (was: Cassandra is an idiot) > Cassandra has a serious bug > --- > > Key: CASSANDRA-14506 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14506 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksey Yeschenko >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.0.x > > > TBA -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14459: --- Fix Version/s: 4.x > DynamicEndpointSnitch should never prefer latent nodes > -- > > Key: CASSANDRA-14459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14459 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Fix For: 4.x > > > The DynamicEndpointSnitch has two unfortunate behaviors that allow it to > provide latent hosts as replicas: > # Loses all latency information when Cassandra restarts > # Clears latency information entirely every ten minutes (by default), > allowing global queries to be routed to _other datacenters_ (and local > queries cross racks/azs) > This means that the first few queries after restart/reset could be quite slow > compared to average latencies. I propose we solve this by resetting to the > minimum observed latency instead of completely clearing the samples and > extending the {{isLatencyForSnitch}} idea to a three state variable instead > of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows > {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the > DS should use those measurements if it only has one or fewer samples for a > host. This fixes both problems because on process restart we send out > {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to > effectively the RTT of the hosts (also at that point normal gossip > {{EchoMessages}} have an opportunity to add an additional latency > measurement). > This strategy also nicely deals with the "a host got slow but now it's fine" > problem that the DS resets were (afaik) designed to stop because the > {{EchoMessage}} ping latency will count only after the reset for that host. > Ping latency is a more reasonable lower bound on host latency (as opposed to > status quo of zero). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-13929: Status: Ready to Commit (was: Patch Available) > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505984#comment-16505984 ] Jason Brown commented on CASSANDRA-13929: - [~jay.zhuang] +1 > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[1/2] cassandra-builds git commit: Update centos docker file to avoid py ssl warnings
Repository: cassandra-builds Updated Branches: refs/heads/master 8f796c668 -> fb5df10b6 Update centos docker file to avoid py ssl warnings Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/a317ef0c Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/a317ef0c Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/a317ef0c Branch: refs/heads/master Commit: a317ef0c79c6c38f2ed7627482a609cf9c7bc4e7 Parents: 8f796c6 Author: Stefan Podkowinski Authored: Fri Jun 8 14:32:24 2018 +0200 Committer: Stefan Podkowinski Committed: Fri Jun 8 14:32:24 2018 +0200 -- docker/centos7-image.docker | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/a317ef0c/docker/centos7-image.docker -- diff --git a/docker/centos7-image.docker b/docker/centos7-image.docker index c082939..90d7b28 100644 --- a/docker/centos7-image.docker +++ b/docker/centos7-image.docker @@ -24,11 +24,17 @@ RUN yum -y install \ # via epel-releases RUN yum -y install python2-pip +# install ssl enabled urllib for retrieving python packages +# this will produce some ssl related warnings, which will be resolved once the package has been installed +RUN pip install urllib3[secure] + +# upgrade to modern pip version +RUN pip install --upgrade pip + # install Sphinx to generate docs RUN pip install \ Sphinx \ - sphinx_rtd_theme \ - urllib3 + sphinx_rtd_theme # create and change to build user RUN adduser build - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[2/2] cassandra-builds git commit: README: on updating rpm/deb repositories
README: on updating rpm/deb repositories Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/fb5df10b Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/fb5df10b Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/fb5df10b Branch: refs/heads/master Commit: fb5df10b6341b82fc887c7b60109b1a25f485334 Parents: a317ef0 Author: Stefan Podkowinski Authored: Fri Jun 8 14:34:14 2018 +0200 Committer: Stefan Podkowinski Committed: Fri Jun 8 14:34:14 2018 +0200 -- README.md | 29 +++-- 1 file changed, 27 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/fb5df10b/README.md -- diff --git a/README.md b/README.md index 8bb85ee..14faed1 100644 --- a/README.md +++ b/README.md @@ -59,7 +59,32 @@ Once the RPM is signed, both the import key and verification steps should take p See use of `debsign` in `cassandra-release/prepare_release.sh`. +## Updating package repositories -## Publishing packages +### Prerequisites -TODO +Artifacts for RPM and Debian package repositories, as well as tar archives, are keept in a single SVN repository. You need to have your own local copy for adding new packages: + +``` +svn co --config-option 'config:miscellany:use-commit-times=yes' https://dist.apache.org/repos/dist/release/cassandra +``` + +(you may also want to set `use-commit-times = yes` in your local svn config) + +We'll further refer to the local directory created by the svn command as `$artifacts_svn_dir`. + +Required build tools: +* [createrepo](https://packages.ubuntu.com/bionic/createrepo) (RPMs) +* [reprepro](https://packages.ubuntu.com/bionic/reprepro) (Debian) + +### RPM + +Adding new packages to the official repository starts by copying the RPMs to `$artifacts_svn_dir/redhat/`. Afterwards, recreate the metadata by executing `createrepo -v .` in that directory. Finally, sign the generated meta data files in the `repodata` sub-directory: + +``` +for i in `ls *.bz2 *.gz *.xml`; do gpg -sba --local-user MyAlias $i; done; +``` + +### Debian + +See `finish_release.sh` - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14507) OutboundMessagingConnection backlog is not fully written in case of race conditions
Sergio Bossa created CASSANDRA-14507: Summary: OutboundMessagingConnection backlog is not fully written in case of race conditions Key: CASSANDRA-14507 URL: https://issues.apache.org/jira/browse/CASSANDRA-14507 Project: Cassandra Issue Type: Bug Components: Streaming and Messaging Reporter: Sergio Bossa The {{OutboundMessagingConnection}} writes into a backlog queue before the connection handshake is successfully completed, and then writes such backlog to the channel as soon as the successful handshake moves the channel state to {{READY}}. This is unfortunately race prone, as the following could happen: 1) One or more writer threads see the channel state as {{NOT_READY}} in {{#sendMessage()}} and are about to enqueue to the backlog, but they get descheduled by the OS. 2) The handshake thread is scheduled by the OS and moves the channel state to {{READY}}, emptying the backlog. 3) The writer threads are scheduled back and add to the backlog, but the channel state is {{READY}} at this point, so those writes would sit in the backlog and expire. Please note a similar race condition exists between {{OutboundMessagingConnection#sendMessage()}} and {{MessageOutHandler#channelWritabilityChanged()}}, which is way more serious as the channel writability could frequently change, luckily it looks like {{ChannelWriter#write()}} never gets invoked with {{checkWritability}} at {{true}} (so writes never go to the backlog when the channel is not writable). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
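To make the check-then-act window in steps 1-3 concrete, here is a minimal, self-contained Java sketch of the pattern; the class and method names are hypothetical and this is deliberately not the actual {{OutboundMessagingConnection}}/{{ChannelWriter}} code. The re-check after the enqueue is one possible mitigation, not necessarily the fix the ticket will land on.

{code:java}
// Minimal sketch of the race described in steps 1-3; hypothetical names,
// not the actual OutboundMessagingConnection code.
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

public class BacklogRaceSketch
{
    enum State { NOT_READY, READY }

    private final AtomicReference<State> state = new AtomicReference<>(State.NOT_READY);
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();

    // Writer-thread path: check the state, then enqueue. A thread can observe
    // NOT_READY here, get descheduled, and only enqueue after the handshake
    // thread has already drained the backlog -- that message is then stranded.
    public void sendMessage(String message)
    {
        if (state.get() == State.NOT_READY)
        {
            backlog.offer(message);
            // One possible mitigation: re-check the state after enqueueing and
            // drain again, so a message added "too late" still gets flushed.
            if (state.get() == State.READY)
                drainBacklog();
            return;
        }
        writeToChannel(message);
    }

    // Handshake-thread path: flip the state to READY, then drain the backlog.
    public void onHandshakeSuccess()
    {
        state.set(State.READY);
        drainBacklog();
    }

    private void drainBacklog()
    {
        String queued;
        while ((queued = backlog.poll()) != null)
            writeToChannel(queued);
    }

    private void writeToChannel(String message)
    {
        System.out.println("write: " + message); // stand-in for the Netty channel write
    }
}
{code}

Without the re-check, a writer descheduled between the state check and the offer can enqueue after the drain in onHandshakeSuccess() has completed, which is exactly the interleaving described in steps 1-3.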
[jira] [Created] (CASSANDRA-14506) Cassandra is an idiot
Aleksey Yeschenko created CASSANDRA-14506: - Summary: Cassandra is an idiot Key: CASSANDRA-14506 URL: https://issues.apache.org/jira/browse/CASSANDRA-14506 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Fix For: 3.0.x, 3.11.x, 4.0.x TBA -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14502) toDate() CQL function is instantiated for wrong argument type
[ https://issues.apache.org/jira/browse/CASSANDRA-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer reassigned CASSANDRA-14502: -- Assignee: Benjamin Lerer > toDate() CQL function is instantiated for wrong argument type > - > > Key: CASSANDRA-14502 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14502 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Piotr Sarna >Assignee: Benjamin Lerer >Priority: Minor > Fix For: 4.0.x > > > {{toDate()}} function is instantiated to work for {{timeuuid}} and {{date}} > types passed as an argument, instead of {{timeuuid}} and {{timestamp}}, as > stated in this documentation: > [http://cassandra.apache.org/doc/latest/cql/functions.html#datetime-functions] > As a result it's possible to convert a {{date}} into {{date}}, but not a > {{timestamp}} into {{date}}, which is probably what was meant. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
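As a toy illustration of the symptom only (this is not Cassandra's actual function-resolution machinery, and every name below is made up), registering the overloads under the wrong argument type makes the documented conversion unresolvable while the redundant one succeeds:

{code:java}
// Toy model of overload registration keyed by argument type; hypothetical names.
import java.util.EnumMap;
import java.util.Map;

public class ToDateResolutionSketch
{
    enum CqlType { TIMEUUID, TIMESTAMP, DATE }

    private static final Map<CqlType, String> TO_DATE_OVERLOADS = new EnumMap<>(CqlType.class);

    static
    {
        // What the ticket reports: toDate is instantiated for timeuuid and date...
        TO_DATE_OVERLOADS.put(CqlType.TIMEUUID, "toDate(timeuuid) -> date");
        TO_DATE_OVERLOADS.put(CqlType.DATE, "toDate(date) -> date");
        // ...while the docs describe timeuuid and timestamp, i.e. the missing entry:
        // TO_DATE_OVERLOADS.put(CqlType.TIMESTAMP, "toDate(timestamp) -> date");
    }

    static String resolve(CqlType argumentType)
    {
        String overload = TO_DATE_OVERLOADS.get(argumentType);
        if (overload == null)
            throw new IllegalArgumentException("no toDate overload for argument type " + argumentType);
        return overload;
    }

    public static void main(String[] args)
    {
        System.out.println(resolve(CqlType.DATE));      // resolves, but date -> date is a no-op
        System.out.println(resolve(CqlType.TIMESTAMP)); // throws, although this is the documented conversion
    }
}
{code}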
[jira] [Commented] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505784#comment-16505784 ]

Joseph Lynch commented on CASSANDRA-14459:
--

OK, I've pushed another version of the patch to that branch which:

# Adds a guaranteed {{EchoMessage}} to live hosts, in addition to the GossipSyn request, during the first step of gossip, so that we can get some latency measurements even from latent nodes at some point. Increasing the gossip messaging slightly concerns me, so the other option is to have the DES send explicit {{EchoMessages}} when it notices that a host doesn't have any data in {{reset}}. That is more deterministic (we can guarantee that after 2 reset intervals we'll probe the host), but also has the DES actively sending messages...
# Creates a JMX method on {{DynamicEndpointSnitchMBean}} to allow users to force timing resets (if someone wants the old behavior back they can just call it on a cron ;)).

I've been playing around with a local CCM cluster, using `netem` to delay traffic to a particular localhost node and a small (~5s) reset interval, to exercise the reset logic, and it appears to work well. The only issue I ran into is that if a node is really fast once and then becomes slow, it will get some traffic after every reset because we reset to the fast measurement. This is no worse than the status quo, but I tried to mitigate it by special-casing a host that has only two measurements (a fast one and a subsequent slow one) to use the mean instead of the minimum, which eventually converges either up or down to the new RTT.

> DynamicEndpointSnitch should never prefer latent nodes
> --
>
> Key: CASSANDRA-14459
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14459
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Minor
>
> The DynamicEndpointSnitch has two unfortunate behaviors that allow it to
> provide latent hosts as replicas:
> # Loses all latency information when Cassandra restarts
> # Clears latency information entirely every ten minutes (by default),
> allowing global queries to be routed to _other datacenters_ (and local
> queries cross racks/azs)
> This means that the first few queries after restart/reset could be quite slow
> compared to average latencies. I propose we solve this by resetting to the
> minimum observed latency instead of completely clearing the samples and
> extending the {{isLatencyForSnitch}} idea to a three state variable instead
> of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows
> {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the
> DS should use those measurements if it only has one or fewer samples for a
> host. This fixes both problems because on process restart we send out
> {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to
> effectively the RTT of the hosts (also at that point normal gossip
> {{EchoMessages}} have an opportunity to add an additional latency
> measurement).
> This strategy also nicely deals with the "a host got slow but now it's fine"
> problem that the DS resets were (afaik) designed to stop because the
> {{EchoMessage}} ping latency will count only after the reset for that host.
> Ping latency is a more reasonable lower bound on host latency (as opposed to
> status quo of zero).
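A rough sketch of the reset strategy being discussed, under the assumption that per-host latencies are simply a window of samples (hypothetical names, not the actual {{DynamicEndpointSnitch}} implementation): instead of clearing everything, collapse the window to the minimum observed latency, except in the two-sample fast-then-slow case, where the mean is used so the estimate can converge toward the new RTT.

{code:java}
// Rough sketch of "reset to a seed value" instead of "clear all samples";
// hypothetical names, not the actual DynamicEndpointSnitch code.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LatencyResetSketch
{
    private final List<Double> samplesMillis = new ArrayList<>();

    public synchronized void record(double latencyMillis)
    {
        samplesMillis.add(latencyMillis);
    }

    // Status quo: wipe everything, losing all knowledge of latent hosts.
    public synchronized void resetByClearing()
    {
        samplesMillis.clear();
    }

    // Sketched alternative: collapse the window to a single seed value.
    public synchronized void resetToSeed()
    {
        if (samplesMillis.isEmpty())
            return;

        double seed;
        if (samplesMillis.size() == 2)
            // The special case from the comment: one fast and one subsequent slow
            // measurement average out, rather than pinning the host to "fast".
            seed = (samplesMillis.get(0) + samplesMillis.get(1)) / 2.0;
        else
            seed = Collections.min(samplesMillis);

        samplesMillis.clear();
        samplesMillis.add(seed);
    }

    public synchronized double score()
    {
        // No samples means no opinion; a real snitch would probe (EchoMessage/PingMessage).
        return samplesMillis.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
{code}

The netem-delayed node with a small (~5s) reset interval described above could then be reproduced by calling resetToSeed() on a timer and checking that the delayed node's score stays high across resets.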
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14505) Removal of last element on a List deletes the entire row
[ https://issues.apache.org/jira/browse/CASSANDRA-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer reassigned CASSANDRA-14505:
--

Assignee: Benjamin Lerer

> Removal of last element on a List deletes the entire row
>
>
> Key: CASSANDRA-14505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14505
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: * Java: 1.8.0_171
> * OS: Ubuntu 18.04 LTS
> * Cassandra: 3.11.2
> Reporter: André Paris
> Assignee: Benjamin Lerer
> Priority: Major
>
> The behavior of removing an element from a list with an UPDATE differs depending on how the row was created.
> Given the table
> {{CREATE TABLE table_test (}}
> {{    id int PRIMARY KEY,}}
> {{    list list}}
> {{)}}
> If the row was created by an INSERT, the row remains after the UPDATE that removes the last element of the list:
> {{cqlsh:ks_test> INSERT INTO table_test (id, list) VALUES (1, ['foo']);}}
> {{cqlsh:ks_test> SELECT * FROM table_test;}}
> {{ id | list}}
> {{----+---------}}
> {{  1 | ['foo']}}
> {{(1 rows)}}
> {{cqlsh:ks_test> UPDATE table_test SET list = list - ['foo'] WHERE id=1;}}
> {{cqlsh:ks_test> SELECT * FROM table_test;}}
> {{ id | list}}
> {{----+------}}
> {{  1 | null}}
> {{(1 rows)}}
> But if the row was created by an UPDATE, the row is deleted after the UPDATE that removes the last element of the list:
> {{cqlsh:ks_test> UPDATE table_test SET list = list + ['foo'] WHERE id=2;}}
> {{cqlsh:ks_test> SELECT * FROM table_test;}}
> {{ id | list}}
> {{----+---------}}
> {{  2 | ['foo']}}
> {{(1 rows)}}
> {{cqlsh:ks_test> UPDATE table_test SET list = list - ['foo'] WHERE id=2;}}
> {{cqlsh:ks_test> SELECT * FROM table_test;}}
> {{ id | list}}
> {{----+------}}
> {{(0 rows)}}
> Thanks in advance.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org