[jira] [Comment Edited] (CASSANDRA-14499) node-level disk quota
    [ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506971#comment-16506971 ]

Jeremiah Jordan edited comment on CASSANDRA-14499 at 6/9/18 1:03 PM:
---------------------------------------------------------------------

If one node has reached “full”, how likely is it that others are about to as well? Without monitoring, how will an operator know to do something to “fix” the situation? I’m just not convinced that it’s worth adding the logic and complications in the rest of the code to allow this feature, which will maybe add a short bandaid of time before things completely fall over, and possibly just have things fall over early. If you are lucky and one node has enough more data than others that it hits this first, without others following shortly behind, you might give a small amount of breathing room for compaction to clean a little space out, but that is only going to do so much; it won’t fix the problem. You need to recognize as an operator that your nodes are full and add more nodes to your cluster, or add more disk space to your cluster.


was (Author: jjordan):
If one node has reached “full” how likely is it that others are about to as well? Without monitoring how will an operator know to do something to “fix” the situation? I’m just not convinced that it’s worth adding the logic and complications in the rest of the code to allow this feature, which will maybe add a short bandaid of time before things completely fall over, and possible just have things fall over early. If you are lucky and one node has enough more data than other that it hits this first, without other following shortly behind, you might give a small amount of breathing room for compaction to clean a little space out, but that is only going to do so much, it won’t fix the problem. You need to recognize as an operator that your nodes are full and add more nodes to your cluster, or add more disk space to your cluster.


> node-level disk quota
> ---------------------
>
>                 Key: CASSANDRA-14499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jordan West
>            Assignee: Jordan West
>            Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk
> space on a node as a percentage of the total available or as an absolute
> value. If both are specified, the absolute value should take precedence. This
> allows operators to reserve space available to the database for background
> tasks -- primarily compaction. When a node reaches its quota, gossip should
> be disabled to prevent it taking further writes (which would increase the
> amount of data stored), being involved in reads (which are likely to be more
> inconsistent over time), or participating in repair (which may increase the
> amount of space used on the machine). The node re-enables gossip when the
> amount of data it stores is below the quota.
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which
> reserves some amount of space on each drive that is not usable by the
> database.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
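The precedence rule and quota check described in the ticket are simple enough to sketch. The following is a minimal, hypothetical illustration only: the option names {{disk_quota_in_mb}} and {{disk_quota_percentage}} and the {{DiskQuota}} class are assumptions made for this sketch, not existing Cassandra options or code.

{code:java}
// Hypothetical sketch of the proposed quota semantics; the option names and the
// DiskQuota class are illustrative and do not exist in Cassandra.
public final class DiskQuota
{
    private final long absoluteQuotaBytes; // e.g. from a hypothetical disk_quota_in_mb option, <= 0 if unset
    private final double quotaPercentage;  // e.g. from a hypothetical disk_quota_percentage option, <= 0 if unset
    private final long totalDiskBytes;

    public DiskQuota(long absoluteQuotaBytes, double quotaPercentage, long totalDiskBytes)
    {
        this.absoluteQuotaBytes = absoluteQuotaBytes;
        this.quotaPercentage = quotaPercentage;
        this.totalDiskBytes = totalDiskBytes;
    }

    // The absolute value wins over the percentage when both are set, as proposed in the description.
    public long effectiveQuotaBytes()
    {
        if (absoluteQuotaBytes > 0)
            return absoluteQuotaBytes;
        if (quotaPercentage > 0)
            return (long) (totalDiskBytes * (quotaPercentage / 100.0));
        return Long.MAX_VALUE; // no quota configured
    }

    // True once the node has crossed its quota and, per the proposal, should stop gossiping
    // until usage drops back below the quota.
    public boolean exceeded(long usedBytes)
    {
        return usedBytes >= effectiveQuotaBytes();
    }
}
{code}

Under this shape, re-enabling gossip would simply be the inverse check ({{!exceeded(usedBytes)}}) evaluated periodically.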
[jira] [Commented] (CASSANDRA-14507) OutboundMessagingConnection backlog is not fully written in case of race conditions
    [ https://issues.apache.org/jira/browse/CASSANDRA-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506886#comment-16506886 ]

Sergio Bossa commented on CASSANDRA-14507:
------------------------------------------

bq. Wouldn't they be picked up by the MessageOutHandler::channelWritabilityChanged and then get drained?

That is assuming the writability changes before the timeout window expires, which might very well not be the case, unless I'm misunderstanding your question? Also, {{channelWritabilityChanged()}} is race-prone by itself, as mentioned above.

> OutboundMessagingConnection backlog is not fully written in case of race
> conditions
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14507
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14507
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Sergio Bossa
>            Priority: Major
>
> The {{OutboundMessagingConnection}} writes into a backlog queue before the
> connection handshake is successfully completed, and then writes that backlog
> to the channel as soon as the successful handshake moves the channel state to
> {{READY}}.
> This is unfortunately race-prone, as the following could happen:
> 1) One or more writer threads see the channel state as {{NOT_READY}} in
> {{#sendMessage()}} and are about to enqueue to the backlog, but they get
> descheduled by the OS.
> 2) The handshake thread is scheduled by the OS and moves the channel state to
> {{READY}}, emptying the backlog.
> 3) The writer threads are scheduled back and add to the backlog, but the
> channel state is already {{READY}} at this point, so those writes would sit
> in the backlog and expire.
> Please note a similar race condition exists between
> {{OutboundMessagingConnection#sendMessage()}} and
> {{MessageOutHandler#channelWritabilityChanged()}}, which is way more serious
> as the channel writability could change frequently; luckily, it looks like
> {{ChannelWriter#write()}} never gets invoked with {{checkWritability}} at
> {{true}} (so writes never go to the backlog when the channel is not writable).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
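The interleaving in points 1) to 3) can be reproduced with a much simpler model. The sketch below is illustrative only: the class, field, and method names are simplified stand-ins for the real {{OutboundMessagingConnection}} internals, and the "re-check after enqueuing" variant is just one possible mitigation, not a proposed or committed fix.

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Simplified stand-in for OutboundMessagingConnection; the point is the interleaving,
// not the actual Cassandra internals.
final class BacklogSketch
{
    enum State { NOT_READY, READY }

    private final AtomicReference<State> state = new AtomicReference<>(State.NOT_READY);
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();

    // Racy shape, as described in the ticket: the state check and the enqueue are not
    // atomic, so a message enqueued after the handshake thread has already drained the
    // backlog just sits there and eventually expires.
    void sendMessageRacy(String message)
    {
        if (state.get() == State.NOT_READY)
            backlog.add(message);   // may happen *after* handshakeComplete() drained the queue
        else
            writeToChannel(message);
    }

    // One possible mitigation (an assumption for this sketch, not the actual fix): re-check
    // the state after enqueuing and drain again if the handshake completed in between.
    void sendMessageRechecking(String message)
    {
        if (state.get() == State.NOT_READY)
        {
            backlog.add(message);
            if (state.get() == State.READY)
                drainBacklog();
        }
        else
        {
            writeToChannel(message);
        }
    }

    // Models the handshake thread moving the channel to READY and flushing the backlog.
    void handshakeComplete()
    {
        state.set(State.READY);
        drainBacklog();
    }

    private void drainBacklog()
    {
        String queued;
        while ((queued = backlog.poll()) != null)
            writeToChannel(queued);
    }

    private void writeToChannel(String message)
    {
        System.out.println("writing: " + message);
    }
}
{code}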