[jira] [Assigned] (CASSANDRA-12106) Add ability to blacklist a CQL partition so all requests are ignored
[ https://issues.apache.org/jira/browse/CASSANDRA-12106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu reassigned CASSANDRA-12106:
---------------------------------------

    Assignee: Sumanth Pasupuleti  (was: Geoffrey Yu)

> Add ability to blacklist a CQL partition so all requests are ignored
> --------------------------------------------------------------------
>
>          Key: CASSANDRA-12106
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-12106
>      Project: Cassandra
>   Issue Type: New Feature
>     Reporter: Geoffrey Yu
>     Assignee: Sumanth Pasupuleti
>     Priority: Minor
>      Fix For: 4.x
>
>  Attachments: 12106-trunk.txt
>
> Sometimes reads/writes to a given partition may cause problems due to the data present. It would be useful to have a manual way to blacklist such partitions so all read and write requests to them are rejected.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495387#comment-15495387 ]

Geoffrey Yu commented on CASSANDRA-12367:
-----------------------------------------

Thanks for the first pass [~slebresne]! I added another commit to address your comments [here|https://github.com/geoffxy/cassandra/commit/a71968ebba8b67591b88cafd2daf3b37e17fec52]. I added {{rowCount()}} to the {{Partition}} interface so that a {{rowEstimate}} can be passed to {{UnfilteredRowIteratorSerializer.serializedSize()}}; all the implementing classes already had that method available. Please let me know how it looks now!

{quote}
Wonders if it wouldn't be more user friendly to return 0 if the key is not hosted on that replica (which will simply happen if we don't check anything). Genuine question though, I could see both options having advantages, so mentioning it for the sake of discussion.
{quote}

I don't feel strongly either way since I agree that both options have merit. I've left the check in for now, but I have no objection to removing it if others feel strongly.

> Add an API to request the size of a CQL partition
> -------------------------------------------------
>
>          Key: CASSANDRA-12367
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-12367
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Geoffrey Yu
>     Assignee: Geoffrey Yu
>     Priority: Minor
>      Fix For: 3.x
>
>  Attachments: 12367-trunk-v2.txt, 12367-trunk.txt
>
> It would be useful to have an API that we could use to get the total serialized size of a CQL partition, scoped by keyspace and table, on disk.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461776#comment-15461776 ]

Geoffrey Yu commented on CASSANDRA-12367:
-----------------------------------------

[~slebresne]: Are [these changes|https://github.com/geoffxy/cassandra/compare/trunk...geoffxy:CASSANDRA-12367?w=1] similar to what you had in mind? The patch subtracts the offsets of the {{RowIndexEntry}} objects corresponding to the partition key and the next partition key in the file to get a size in bytes. I also kept the code that reads the partition from the memtable, so the operator can get information on the partition's footprint in the memtable as well; however, that path ignores {{Unfiltered}} objects that are not {{Row}}s.
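The offset-subtraction idea above can be sketched as follows. This is a minimal, illustrative model, not Cassandra's actual index API: the map stands in for the partition index, and the names are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a partition's on-disk size is the gap between its
// starting offset and the next partition's starting offset in the data file
// (or end-of-file for the last partition). In Cassandra this offset
// information comes from the index (RowIndexEntry).
class PartitionSizeSketch {
    // indexOffsets maps a partition's token to its starting byte offset.
    static long sizeOf(NavigableMap<Long, Long> indexOffsets, long token, long dataFileLength) {
        Long start = indexOffsets.get(token);
        if (start == null)
            return 0; // partition not present in this sstable
        // Offset of the next partition, or end of file if this is the last one.
        Map.Entry<Long, Long> next = indexOffsets.higherEntry(token);
        long end = next == null ? dataFileLength : next.getValue();
        return end - start;
    }

    public static void main(String[] args) {
        NavigableMap<Long, Long> idx = new TreeMap<>();
        idx.put(10L, 0L);    // partition at token 10 starts at byte 0
        idx.put(20L, 338L);  // next partition starts at byte 338
        System.out.println(sizeOf(idx, 10L, 700L)); // 338
        System.out.println(sizeOf(idx, 20L, 700L)); // 362
    }
}
```

The sketch also shows why the last partition in the file needs the data file length as a fallback "next offset".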
[jira] [Updated] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12367:
------------------------------------
    Attachment: 12367-trunk-v2.txt

I've attached another patch that adds a new statement to CQL, as described in the ticket, for some early feedback on the approach. It's implemented as a new statement because its semantics didn't fit well into the existing {{SELECT}} statement.

{code}
cqlsh> SELECT SIZE FROM demo.test WHERE type = 'person';

 endpoint  | size (bytes)
-----------+--------------
 127.0.0.2 |          338
 127.0.0.3 |          338

(2 rows)
{code}

The statement must be restricted to a single partition and returns results based on the consistency level (here it was {{ALL}} on a keyspace with RF=2).

{quote}
Could we use SSTableReader.getScanner(Range range, ...) instead of scanning all the partitions in the sstable? We would need to create the range so that it includes the token requested but I think it should save us some time by seeking to the correct position directly.
{quote}

Using {{SSTableReader.getScanner(Range range, ...)}} makes sense. Is there a recommended approach for creating a small {{Range}} that will wrap the requested token? For a {{LongToken}} it seems straightforward to decrease the token value slightly to create a range, but I'm not sure what a reasonable approach looks like across the different token types.
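The "small range wrapping a token" question above can be sketched for the numeric-token case. Cassandra's {{Range}} is start-exclusive / end-inclusive, so for a {{LongToken}} t the range (t - 1, t] selects exactly that token; the {{Range}} type below is a hypothetical stand-in, not Cassandra's class.

```java
// Hypothetical sketch of wrapping a single numeric token in the smallest
// possible (start-exclusive, end-inclusive] range. Caveat mirroring the JIRA
// question: this only works for token types with a predecessor (e.g. longs,
// and even there token == Long.MIN_VALUE would wrap around).
class TokenRangeSketch {
    record Range(long leftExclusive, long rightInclusive) {
        boolean contains(long token) {
            return token > leftExclusive && token <= rightInclusive;
        }
    }

    static Range wrap(long token) {
        return new Range(token - 1, token);
    }

    public static void main(String[] args) {
        Range r = wrap(42L);
        System.out.println(r.contains(42L)); // true
        System.out.println(r.contains(41L)); // false
    }
}
```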
[jira] [Resolved] (CASSANDRA-12075) Include whether or not the client should retry the request when throwing a RequestExecutionException
[ https://issues.apache.org/jira/browse/CASSANDRA-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu resolved CASSANDRA-12075.
-------------------------------------
    Resolution: Won't Fix

> Include whether or not the client should retry the request when throwing a RequestExecutionException
> ----------------------------------------------------------------------------------------------------
>
>          Key: CASSANDRA-12075
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-12075
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Geoffrey Yu
>     Assignee: Geoffrey Yu
>     Priority: Minor
>
> Some requests that result in an error should not be retried by the client. Right now if the client gets an error, it has no way of knowing whether or not it should retry. We can include an extra field in each {{RequestExecutionException}} that will indicate whether the client should retry, retry on a different host, or not retry at all.
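The extra field proposed in the ticket (ultimately resolved Won't Fix) can be sketched as an enum carried by the exception. All names here are hypothetical; Cassandra's real {{RequestExecutionException}} carries no such field.

```java
// Hypothetical sketch of the proposal: tag each request failure with a hint
// about whether the client should retry, retry elsewhere, or give up.
class RetryHintSketch {
    enum RetryDecision { RETRY, RETRY_OTHER_HOST, DO_NOT_RETRY }

    static class RequestExecutionException extends RuntimeException {
        final RetryDecision retryDecision;

        RequestExecutionException(String message, RetryDecision decision) {
            super(message);
            this.retryDecision = decision;
        }
    }

    public static void main(String[] args) {
        try {
            throw new RequestExecutionException("replica overloaded", RetryDecision.RETRY_OTHER_HOST);
        } catch (RequestExecutionException e) {
            // The client can branch on the hint instead of guessing.
            System.out.println(e.retryDecision); // RETRY_OTHER_HOST
        }
    }
}
```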
[jira] [Commented] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433601#comment-15433601 ]

Geoffrey Yu commented on CASSANDRA-9875:
----------------------------------------

Thanks! I've opened a PR for the dtests [here|https://github.com/riptano/cassandra-dtest/pull/1273]. I'll keep an eye on the tests and look into any failures that come up.

> Rebuild from targeted replica
> -----------------------------
>
>          Key: CASSANDRA-9875
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-9875
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: sankalp kohli
>     Assignee: Geoffrey Yu
>     Priority: Minor
>       Labels: lhf
>      Fix For: 3.x
>
>  Attachments: 9875-dtest-master-v2.txt, 9875-dtest-master.txt, 9875-trunk-v2.txt, 9875-trunk.txt
>
> The nodetool rebuild command will rebuild all the token ranges handled by the endpoint. Sometimes we want to rebuild only a certain token range. We should add this ability to the rebuild command. We should also add the ability to stream from a given replica.
[jira] [Updated] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-9875:
-----------------------------------
    Attachment: 9875-dtest-master-v2.txt

I've attached a new dtest patch with the changes. I ended up adding a new test so we can get more granularity in the reporting. Please let me know how it looks!
[jira] [Updated] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-9875:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-9875:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-9875:
-----------------------------------
    Attachment: 9875-dtest-master.txt
                9875-trunk-v2.txt

I've attached a new patch that uses a source filter, as well as a patch for two new dtests. One test verifies the behavior of rebuilding with a specific range, and the other verifies that {{nodetool rebuild}} disallows rebuilding a range that the current node does not own. Please let me know how these look!
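The two checks described in the update above can be sketched as follows. This is an illustrative model only; the names are hypothetical and do not match Cassandra's {{RangeStreamer}} API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the rebuild changes: (1) restrict streaming sources
// to a whitelisted replica via a source filter, and (2) reject a requested
// rebuild range that the node does not own.
class RebuildFilterSketch {
    record Range(long left, long right) {
        boolean contains(Range other) {
            return left <= other.left && other.right <= right;
        }
    }

    // Source filter: keep only candidate endpoints in the whitelist.
    static List<String> filterSources(List<String> candidates, Set<String> whitelist) {
        return candidates.stream().filter(whitelist::contains).toList();
    }

    // Validation: the requested range must fall within a locally owned range.
    static void validateRequestedRange(List<Range> owned, Range requested) {
        boolean ok = owned.stream().anyMatch(r -> r.contains(requested));
        if (!ok)
            throw new IllegalArgumentException("node does not own requested range " + requested);
    }

    public static void main(String[] args) {
        System.out.println(filterSources(List.of("10.0.0.1", "10.0.0.2"), Set.of("10.0.0.2")));
        validateRequestedRange(List.of(new Range(0, 100)), new Range(10, 50)); // ok, contained
    }
}
```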
[jira] [Updated] (CASSANDRA-2848) Make the Client API support passing down timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-2848:
-----------------------------------
    Attachment: 2848-trunk-v2.txt

I'm attaching a second version of the patch that incorporates the changes in CASSANDRA-12256.

*TL;DR:* The timeout is represented as an {{OptionalLong}} that is encoded in {{QueryOptions}}. It is passed all the way to the replica nodes on reads through {{ReadCommand}}, but is only kept on the coordinator for writes.

The optional client-specified timeout is decoded as part of {{QueryOptions}}. Since this timeout may or may not be specified by a client, I opted to use an {{OptionalLong}} to make it clearer in the code that the value is optional. I've gated the use of the new timeout flag (and encoding the timeout) to protocol v5 and above.

On the read path, the timeout is kept within the {{ReadCommand}} and referenced in {{ReadCallback.awaitResults()}}. It is also serialized within the {{ReadCommand}} so that replica nodes can use it when setting the monitoring time in {{ReadCommandVerbHandler}}. Of course, because the time when the query started is not propagated to the replicas, this will only enforce the timeout from when the {{MessageIn}} was constructed.

On the write path, the timeout is just passed through the call stack into the {{AbstractWriteResponseHandler}}/{{AbstractPaxosCallback}}, where it is referenced in the respective {{await()}} calls. I had investigated passing the timeout to the replicas on the write path as well. To do so, we'd need to incorporate it into the outgoing internode message when making a write, meaning placing it into {{Mutation}} or otherwise creating some sort of wrapper around a mutation that can hold the timeout. That seemed like a very invasive change for minimal gain, since being able to abort an in-progress write didn't seem as useful as aborting an in-progress read.

This still requires a version bump in the internode protocol to support the change in serialization of {{ReadCommand}} (I haven't touched {{MessagingService.current_version}} yet, though). If we don't want to wait until 4.0, we can delay this part of the patch and just retain the custom timeout on the coordinator (i.e. not serialize the timeout). Once the branch for 4.0 is available, we can modify the serialization to allow us to pass the timeout to the replicas.

I'd also like to include some dtests for this, namely to validate which timeout is being used on the coordinator. Is the accepted practice for something like this to log a message and assert on the presence of the log entry? I want to avoid relying on the actual timeout observed, since that can make the test flaky.

> Make the Client API support passing down timeouts
> -------------------------------------------------
>
>          Key: CASSANDRA-2848
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-2848
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Chris Goffinet
>     Assignee: Geoffrey Yu
>     Priority: Minor
>      Fix For: 3.x
>
>  Attachments: 2848-trunk-v2.txt, 2848-trunk.txt
>
> Having a max server RPC timeout is good for the worst case, but many applications that have middleware in front of Cassandra might have tighter timeout requirements. In a fail-fast environment, if my application, starting at say the front-end, only has 20ms to process a request, and it must connect to X services down the stack, by the time it hits Cassandra we might only have 10ms. I propose we provide the ability to optionally specify the timeout on each call we do.
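The "optional timeout encoded behind a flag" scheme described above can be sketched with a presence flag followed by the value only when set. This is a generic illustration of the technique, not Cassandra's actual {{QueryOptions}} wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.OptionalLong;

// Hypothetical sketch: serialize an OptionalLong timeout as a presence flag
// plus the value only when present, and decode it back symmetrically.
class OptionalTimeoutCodec {
    static void write(OptionalLong timeoutMillis, DataOutput out) throws IOException {
        out.writeBoolean(timeoutMillis.isPresent()); // the "timeout set" flag
        if (timeoutMillis.isPresent())
            out.writeLong(timeoutMillis.getAsLong());
    }

    static OptionalLong read(DataInput in) throws IOException {
        return in.readBoolean() ? OptionalLong.of(in.readLong()) : OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(OptionalLong.of(250), new DataOutputStream(buf));
        OptionalLong decoded = read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(decoded); // OptionalLong[250]
    }
}
```

The same round-trip works for the absent case: a single false byte decodes back to {{OptionalLong.empty()}}.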
[jira] [Commented] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428853#comment-15428853 ]

Geoffrey Yu commented on CASSANDRA-12311:
-----------------------------------------

Thanks for all the help as well! :)

> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
>          Key: CASSANDRA-12311
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Geoffrey Yu
>     Assignee: Geoffrey Yu
>     Priority: Minor
>       Labels: client-impacting, doc-impacting
>      Fix For: 3.10
>
>  Attachments: 12311-dtest.txt, 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk-v4.txt, 12311-trunk-v5.txt, 12311-trunk.txt
>
> Right now if a data node fails to perform a read because it ran into a {{TombstoneOverwhelmingException}}, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed, and subsequently the client only gets a generic {{ReadFailureException}}. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.
[jira] [Updated] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-9875:
-----------------------------------
    Status: Open  (was: Patch Available)
[jira] [Commented] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428458#comment-15428458 ]

Geoffrey Yu commented on CASSANDRA-9875:
----------------------------------------

No worries about the delay! I totally agree that adding a host whitelist would be a better interface; I somehow missed the source filter in {{RangeStreamer}}. I'll take a look, make the changes, and add a dtest.
[jira] [Commented] (CASSANDRA-12256) Count entire coordinated request against timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423289#comment-15423289 ]

Geoffrey Yu commented on CASSANDRA-12256:
-----------------------------------------

Thanks for the review and help along the way!

> Count entire coordinated request against timeout
> ------------------------------------------------
>
>          Key: CASSANDRA-12256
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-12256
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Sylvain Lebresne
>     Assignee: Geoffrey Yu
>      Fix For: 3.10
>
>  Attachments: 12256-trunk-v1v2.diff, 12256-trunk-v2.txt, 12256-trunk.txt
>
> We have a number of {{request_timeout_*}} options that probably every user expects to be an upper bound on how long the coordinator will wait before timing out a request, but that's actually not always the case, especially for read requests.
>
> I believe we don't respect those timeouts properly in at least the following cases:
> * On a digest mismatch: in that case, we reset the timeout for the data query, which means the overall query might take up to twice the configured timeout before timing out.
> * On a range query: the timeout is reset for every sub-range that is queried. With many nodes and vnodes, a range query could span tons of sub-ranges, and so a range query could take pretty much arbitrarily long before actually timing out for the user.
> * On short reads: we also reset the timeout for every short-read "retry".
>
> It's also worth noting that even outside those cases, the timeouts don't take most of the processing done by the coordinator (query parsing and CQL handling, for instance) into account.
>
> Now, in all fairness, the reason it is this way is that the timeouts currently are *not* timeouts for the full user request, but rather how long a coordinator should wait on any given replica for any given internal query before giving up. *However*, I'm pretty sure this is not what users intuitively expect and want, *especially* in the context of CASSANDRA-2848, where the goal is explicitly to have an upper bound on the query from the user's point of view.
>
> So I'm suggesting we change how those timeouts are handled to really be timeouts on the whole user query. By that I basically just mean that we'd mark the start of each query as soon as possible in the processing, and use that starting time as the base in {{ReadCallback.await}} and {{AbstractWriteResponseHandler.get()}}. It won't be perfect in the sense that we'll still only possibly time out during "blocking" operations, so if parsing a query takes more than your timeout, you still won't time out until that query is sent, but I think that's fine in practice because 1) if your timeouts are small enough that this matters, you're probably doing it wrong, and 2) we can totally improve on that later if need be.
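The deadline-based approach suggested in the ticket, marking the query start once and budgeting every subsequent wait against it, can be sketched as follows. The class and method names are illustrative, not Cassandra's.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: record the query start time once, then compute the
// remaining budget at each blocking wait instead of restarting the full
// timeout (as happens today on digest mismatch, per-sub-range queries, and
// short-read retries).
class QueryDeadlineSketch {
    final long startNanos;
    final long timeoutNanos;

    QueryDeadlineSketch(long startNanos, long timeoutMillis) {
        this.startNanos = startNanos;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    // Remaining wait budget at 'nowNanos'; zero once the deadline has passed.
    // Each internal retry would wait at most this long, never the full timeout.
    long remainingNanos(long nowNanos) {
        return Math.max(0, startNanos + timeoutNanos - nowNanos);
    }

    public static void main(String[] args) {
        QueryDeadlineSketch q = new QueryDeadlineSketch(0, 100); // 100 ms budget
        System.out.println(q.remainingNanos(TimeUnit.MILLISECONDS.toNanos(30)));  // 70 ms left, in nanos
        System.out.println(q.remainingNanos(TimeUnit.MILLISECONDS.toNanos(150))); // 0: deadline passed
    }
}
```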
[jira] [Commented] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421934#comment-15421934 ]

Geoffrey Yu commented on CASSANDRA-12311:
-----------------------------------------

I looked through the dtest failures and they all seem to be related to https://github.com/riptano/cassandra-dtest/pull/1147. I think they should pass if you rebase your {{CASSANDRA-12311-tests}} dtest branch off of the latest upstream master and rerun them. I was able to rebase the branch locally without any merge conflicts.
[jira] [Commented] (CASSANDRA-12256) Properly respect the request timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421866#comment-15421866 ]

Geoffrey Yu commented on CASSANDRA-12256:
-----------------------------------------

Thanks for rerunning! I looked through a handful of the remaining failing dtests and they all seem to be failing due to timeouts. I wasn't able to replicate the failures locally when running them individually this time, which leads me to _suspect_ that they fail because the existing dtest timeouts are now too strict.

I'm not super familiar with the dtest setup, so I'm looking for input on how best to proceed. Do the tests use the timeouts configured in {{dtest.py}} if they don't specify their own custom values? If so, would it be a good approach to try with those defaults increased, versus specifying custom values for just the failing tests? I do realize that increasing the defaults could make the tests run even longer than they already do, however.

Also, is there a way for me to kick off a subset of these tests myself on Jenkins? I don't want to have to keep bugging you with these failures :)
[jira] [Updated] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12367:
------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12367:
------------------------------------
    Fix Version/s: 3.x
[jira] [Updated] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12367:
------------------------------------
    Attachment: 12367-trunk.txt

I've attached a patch that exposes a new method through JMX that allows an operator to get the size of a partition on disk, scoped by keyspace and table. I implemented it by iterating through the sstables (leveraging the bloom filter) and adding up the sizes of the CQL rows that fall within the partition. The patch also adds a nodetool command that can be used to invoke the API.
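The per-table aggregation described above, checking each sstable's bloom filter before summing, can be sketched with hypothetical names (this is not Cassandra's {{SSTableReader}} API):

```java
import java.util.List;

// Hypothetical sketch of the JMX-exposed computation: consult each sstable's
// bloom filter first and only add up sizes for sstables that may contain the
// partition; a bloom-filter miss means the key is definitely absent.
class PartitionSizeAggregator {
    interface SSTable {
        boolean mightContain(String partitionKey);      // bloom filter check
        long partitionSizeOnDisk(String partitionKey);  // 0 if actually absent
    }

    // Stand-in sstable for demonstration; real sstables would read from disk.
    record FakeSSTable(boolean maybe, long size) implements SSTable {
        public boolean mightContain(String key) { return maybe; }
        public long partitionSizeOnDisk(String key) { return size; }
    }

    static long totalSize(List<? extends SSTable> sstables, String key) {
        return sstables.stream()
                       .filter(s -> s.mightContain(key)) // bloom filter skips sstables cheaply
                       .mapToLong(s -> s.partitionSizeOnDisk(key))
                       .sum();
    }

    public static void main(String[] args) {
        List<FakeSSTable> tables = List.of(
            new FakeSSTable(true, 338),  // contains the partition
            new FakeSSTable(false, 999), // bloom filter miss: never read
            new FakeSSTable(true, 24));  // contains the partition
        System.out.println(totalSize(tables, "person")); // 362
    }
}
```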
[jira] [Comment Edited] (CASSANDRA-12256) Properly respect the request timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417999#comment-15417999 ]

Geoffrey Yu edited comment on CASSANDRA-12256 at 8/11/16 9:52 PM:
------------------------------------------------------------------

I've attached a v2 patch that removes the query start timestamp from {{QueryState}} and instead records it inside {{Message}} and passes it through the call chain. It turned out that {{QueryState}} is not exactly the best place to keep the query start timestamp because it is reused for queries that have the same {{streamId}}. I also attached a diff file between version 1 and 2 of the patches so it is easier to review since the changes are quite noisy.

If the changes are alright, could you trigger the tests again? This should fix the majority of them, which will make it easier to identify and address any further failures.

was (Author: geoffxy):
I've attached a v2 patch that removes the query start timestamp from {{QueryState}} and instead records it inside {{Message}} and passes it through the call chain. It turned out that {{QueryState}} is not exactly the best place to keep the query start timestamp because it is reused for queries that have the same {{streamId}}. I also attached a diff file between version 1 and 2 of the patches so it is easier to review since the changes are quite noisy. If the changes are alright, could you trigger the tests again? This should fix the majority of them, which will make it easier to address any further failures.
> Properly respect the request timeouts > - > > Key: CASSANDRA-12256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12256 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Geoffrey Yu > Fix For: 3.x > > Attachments: 12256-trunk-v1v2.diff, 12256-trunk-v2.txt, > 12256-trunk.txt > > > We have a number of {{request_timeout_*}} options that probably every user > expects to be an upper bound on how long the coordinator will wait before > timing out a request, but that's actually not always the case, especially for > read requests. > I believe we don't respect those timeouts properly in at least the following > cases: > * On a digest mismatch: in that case, we reset the timeout for the data > query, which means the overall query might take up to twice the configured > timeout before timing out. > * On a range query: the timeout is reset for every sub-range that is queried. > With many nodes and vnodes, a range query could span tons of sub-ranges, and so > a range query could take pretty much arbitrarily long before actually > timing out for the user. > * On short reads: we also reset the timeout for every short read "retry". > It's also worth noting that even outside those, the timeouts don't take most > of the processing done by the coordinator (query parsing and CQL handling, for > instance) into account. > Now, in all fairness, the reason it is this way is that the timeouts > currently are *not* timeouts for the full user request, but rather how long a > coordinator should wait on any given replica for any given internal query > before giving up. *However*, I'm pretty sure this is not what users > intuitively expect and want, *especially* in the context of CASSANDRA-2848, > where the goal is explicitly to have an upper bound on the query from the > user's point of view. > So I'm suggesting we change how those timeouts are handled to really be > timeouts on the whole user query.
> And by that I basically just mean that we'd mark the start of each query as > soon as possible in the processing, and use that starting time as the base in > {{ReadCallback.await}} and {{AbstractWriteResponseHandler.get()}}. It won't > be perfect in the sense that we'll still only possibly time out during > "blocking" operations, so typically if parsing a query takes more than your > timeout, you still won't time out until that query is sent, but I think that's > probably fine in practice because 1) if your timeouts are small enough that > this matters, you're probably doing it wrong and 2) we can totally improve on > that later if need be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
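The proposal above amounts to measuring every internal wait against the original query start time, so each retry shrinks the remaining budget instead of resetting it. A minimal generic sketch with hypothetical names ({{QueryDeadline}} is not Cassandra's API):

```python
# Illustrative sketch only: measure every internal wait against the query's
# original start time, so digest retries, sub-range queries, and short-read
# retries spend the remaining budget rather than getting a fresh timeout.
import time

class RequestTimeout(Exception):
    pass

class QueryDeadline:
    """Track one user query's overall timeout from its start time."""
    def __init__(self, timeout_s):
        self.start = time.monotonic()
        self.timeout_s = timeout_s

    def remaining(self):
        # Called before each internal wait: returns what is left of the
        # original budget, or raises once the whole query is out of time.
        left = self.timeout_s - (time.monotonic() - self.start)
        if left <= 0:
            raise RequestTimeout("query exceeded its overall timeout")
        return left

deadline = QueryDeadline(timeout_s=2.0)
time.sleep(0.1)                    # pretend the initial data query took 100 ms
assert deadline.remaining() < 2.0  # the retry budget has shrunk, not reset
```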
[jira] [Updated] (CASSANDRA-12256) Properly respect the request timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12256: Status: Patch Available (was: Open) > Properly respect the request timeouts > - > > Key: CASSANDRA-12256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12256 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Geoffrey Yu > Fix For: 3.x > > Attachments: 12256-trunk-v1v2.diff, 12256-trunk-v2.txt, > 12256-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12256) Properly respect the request timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12256: Attachment: 12256-trunk-v1v2.diff 12256-trunk-v2.txt I've attached a v2 patch that removes the query start timestamp from {{QueryState}} and instead records it inside {{Message}} and passes it through the call chain. It turned out that {{QueryState}} is not exactly the best place to keep the query start timestamp because it is reused for queries that have the same {{streamId}}. I also attached a diff file between version 1 and 2 of the patches so it is easier to review since the changes are quite noisy. If the changes are alright, could you trigger the tests again? This should fix the majority of them, which will make it easier to address any further failures. > Properly respect the request timeouts > - > > Key: CASSANDRA-12256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12256 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Geoffrey Yu > Fix For: 3.x > > Attachments: 12256-trunk-v1v2.diff, 12256-trunk-v2.txt, > 12256-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12256) Properly respect the request timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416338#comment-15416338 ] Geoffrey Yu commented on CASSANDRA-12256: - Thanks for the review! I looked through the test results and it seems like there are quite a few failures that are timeouts. I'll take a look and see what I can do. > Properly respect the request timeouts > - > > Key: CASSANDRA-12256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12256 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Geoffrey Yu > Fix For: 3.x > > Attachments: 12256-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415774#comment-15415774 ] Geoffrey Yu edited comment on CASSANDRA-9876 at 8/10/16 8:37 PM: - I've attached a v3 patch that makes some changes to {{RepairOptionTest}} to address the new behavior of {{RepairOption.parse()}}. The patch has four commits. The first is the v2 patch, the second and third are your ninja commits, and the fourth is the test changes. Please take a look and let me know what you think! I've also updated the dtest PR to address the failures in {{deprecated_repair_test.py}}. It looks like the failures are related to the changes in {{RepairOption.toString()}}. was (Author: geoffxy): I've attached a v3 patch that makes some changes to {{RepairOptionTest}} to address the new behavior of {{RepairOption.parse()}}. The patch has four commits. The first is the v2 patch, the second and third are your ninja commits, and the fourth are the test changes. Please take a look and let me know what you think! > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 3.x > > Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, > 9876-trunk-v3.txt, 9876-trunk.txt > > > Many applications use C* by writing to one local DC. The other DC is used > when the local DC is unavailable. When the local DC becomes available, we > want to run a targeted repair b/w one endpoint from each DC to minimize the > data transfer over WAN. In this case, it will be helpful to do a one way > repair in which data will only be streamed from other DC to local DC instead > of streaming the data both ways. This will further minimize the traffic over > WAN. This feature should only be supported if a targeted repair is run > involving 2 hosts. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9876: --- Attachment: 9876-trunk-v3.txt I've attached a v3 patch that makes some changes to {{RepairOptionTest}} to address the new behavior of {{RepairOption.parse()}}. The patch has four commits. The first is the v2 patch, the second and third are your ninja commits, and the fourth is the test changes. Please take a look and let me know what you think! > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 3.x > > Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, > 9876-trunk-v3.txt, 9876-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415711#comment-15415711 ] Geoffrey Yu commented on CASSANDRA-9876: It looks like the new check for ensuring both {{--dc}} and {{--hosts}} are not specified together is causing the {{RepairOptionTest.testParseOptions}} test to fail. I'll take a look at fixing it and add some new tests for the pull repair option parsing. > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 3.x > > Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
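The new validation mentioned above is a standard mutually-exclusive-options check. A generic sketch, not {{RepairOption.parse()}}'s actual code:

```python
# Illustrative sketch of the kind of check being tested: a repair may be
# scoped to datacenters or to specific hosts, but not both at once.
def parse_repair_options(dcs=(), hosts=()):
    if dcs and hosts:
        raise ValueError("Cannot combine --dc and --hosts options")
    return {"dcs": list(dcs), "hosts": list(hosts)}

parse_repair_options(hosts=["10.0.0.1", "10.0.0.2"])      # fine
try:
    parse_repair_options(dcs=["dc1"], hosts=["10.0.0.1"])  # rejected
except ValueError as e:
    print(e)
```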
[jira] [Commented] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415629#comment-15415629 ] Geoffrey Yu commented on CASSANDRA-9876: Thanks for the quick follow up and review! Your changes look good to me. I've opened a PR for the dtest here: https://github.com/riptano/cassandra-dtest/pull/1209 > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 3.x > > Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9875: --- Summary: Rebuild from targeted replica (was: Rebuild with start and end token and from targeted replica) > Rebuild from targeted replica > - > > Key: CASSANDRA-9875 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9875 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Priority: Minor > Labels: lhf > Fix For: 3.x > > Attachments: 9875-trunk.txt > > > Nodetool rebuild command will rebuild all the token ranges handled by the > endpoint. Sometimes we want to rebuild only a certain token range. We should > add this ability to rebuild command. We should also add the ability to stream > from a given replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9875: --- Fix Version/s: 3.x Status: Patch Available (was: Open) > Rebuild from targeted replica > - > > Key: CASSANDRA-9875 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9875 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Labels: lhf > Fix For: 3.x > > Attachments: 9875-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-9875) Rebuild from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu reassigned CASSANDRA-9875: -- Assignee: Geoffrey Yu > Rebuild from targeted replica > - > > Key: CASSANDRA-9875 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9875 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Labels: lhf > Fix For: 3.x > > Attachments: 9875-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9875) Rebuild with start and end token and from targeted replica
[ https://issues.apache.org/jira/browse/CASSANDRA-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9875: --- Attachment: 9875-trunk.txt Since CASSANDRA-10406 already implements the ability to specify ranges for {{nodetool rebuild}}, I attached a patch that adds the ability to specify the sources to stream from during the rebuild (which is the other improvement this ticket mentions). *Usage:* {{nodetool rebuild --keyspace <keyspace> --tokens <token_ranges> --sources <sources>}} The implementation in this ticket requires that if {{--sources}} is used, a source must be specified for every single token range provided using {{--tokens}}. I also added some code to validate the input ranges to make sure that the current node owns all of them. > Rebuild with start and end token and from targeted replica > -- > > Key: CASSANDRA-9875 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9875 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Priority: Minor > Labels: lhf > Attachments: 9875-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
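The constraints described above (one source per token range, and every requested range must be owned by the local node) can be sketched as follows; the function and its argument shapes are hypothetical, not the patch's real signature:

```python
# Illustrative sketch of the patch's validation rules; ranges are modeled as
# simple (start, end) tuples rather than real token ranges.
def validate_rebuild_args(local_ranges, tokens, sources):
    # One streaming source must be given for every requested token range.
    if len(sources) != len(tokens):
        raise ValueError("a source must be specified for every token range")
    # The node can only rebuild ranges it actually owns.
    owned = set(local_ranges)
    for rng in tokens:
        if rng not in owned:
            raise ValueError(f"this node does not own range {rng}")
    # Pair each range with the replica it will stream from.
    return dict(zip(tokens, sources))

plan = validate_rebuild_args(
    local_ranges=[("0", "100"), ("100", "200")],
    tokens=[("0", "100")],
    sources=["10.0.0.5"],
)
print(plan)  # {('0', '100'): '10.0.0.5'}
```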
[jira] [Commented] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414589#comment-15414589 ] Geoffrey Yu commented on CASSANDRA-12311: - Thanks, that sounds great! > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-dtest.txt, 12311-trunk-v2.txt, 12311-trunk-v3.txt, > 12311-trunk-v4.txt, 12311-trunk-v5.txt, 12311-trunk.txt > > > Right now if a data node fails to perform a read because it ran into a > {{TombstoneOverwhelmingException}}, it only responds back to the coordinator > node with a generic failure. Under this scheme, the coordinator won't be able > to know exactly why the request failed and subsequently the client only gets > a generic {{ReadFailureException}}. It would be useful to inform the client > that their read failed because we read too many tombstones. We should have > the data nodes reply with a failure type so the coordinator can pass this > information to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414586#comment-15414586 ] Geoffrey Yu edited comment on CASSANDRA-9876 at 8/10/16 1:51 AM: - Thanks for the quick review! I’ve attached a new patch that addresses your comments, with the exception of one of them for which I wanted to get some more feedback first. I also attached a patch that adds one dtest to test the pull repair. It works nearly identically to the token range repair with the exception that it asserts that one of the nodes only sends data and the other only receives. {quote} I don't think it's necessary to make specifying --start-token and --end-token mandatory, since if that is not specified it will just pull repair all common ranges between specified hosts. {quote} The reason I added the check for a token range was that the repair code as it is now doesn’t actually add only the common ranges between the specified hosts. I wasn’t sure if this was the intended behavior or a bug. To replicate the issue, just create a 3 node cluster, add a keyspace with replication factor 2, and run a regular repair through nodetool on that keyspace with exactly two nodes specified. The reason it happens is that if no ranges are specified, the repair will [add all ranges on the local node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137]. Then when we hit {{RepairRunnable}}, we try to [find a list of neighbors for each range|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162]. The problem here is that it isn’t always true that every range the local node owns is also owned by the remote node we specified through the nodetool command. 
Because of this the [check here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251] may result in an exception being thrown, which aborts the repair. If this is intended behavior, then forcing the user to specify a token range that is common between the nodes prevents that exception from being thrown. Otherwise the error message, “Repair requires at least two endpoints that are neighbours before it can continue” can be confusing to the operator since the two specified nodes may actually share a common range. What do you think? was (Author: geoffxy): Thanks for the quick review! I’ve attached a new patch that addresses your comments, with the exception of one of them for which I wanted to get some more feedback first. I also attached a patch that adds one dtest to test the pull repair. It works nearly identically to the token range repair with the exception that it asserts that one of the nodes only sends data and the other only receives. {quote} I don't think it's necessary to make specifying --start-token and --end-token mandatory, since if that is not specified it will just pull repair all common ranges between specified hosts. {quote} The reason why I added in the check for a token range was that the repair code as it is now doesn’t actually add only the common ranges between the specified hosts. I wasn’t sure if this is was the intended behavior or a bug. To replicate the issue, just create a 3 node cluster, add a keyspace with replication factor 2, and run a regular repair through nodetool on that keyspace with exactly two nodes specified. The reason it happens is that if no ranges are specified, the repair will [add all ranges on the local node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137]. 
Then when we hit {{RepairRunnable}}, we try to [find a list of neighbors for each range|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162]. The problem here is that it isn’t always true that every range the local node owns is also owned by the remote node we specified through the nodetool command. In the example above, only one range will be common between any two nodes. Because of this the [check here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251] may result in an exception being thrown, which aborts the repair. If this is intended behavior, then forcing the user to specify a token range that is common between the nodes prevents that exception from being thrown. Otherwise the error message, “Repair requires at least two endpoints that are neighbours before it can continue” can be confusing to the operator since the two specified nodes may actually share a common range. What do you think? > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://i
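The failure mode described in this comment is easiest to see with a toy replica map: with three nodes and replication factor 2, any two nodes share only some of their ranges. A hedged sketch (hypothetical range names, no real token math):

```python
# Illustrative model of the scenario from the comment above: a 3-node
# cluster with RF=2, where each range is replicated on exactly two nodes.
def common_ranges(replica_map, node_a, node_b):
    # Ranges replicated on BOTH endpoints: the only ranges a two-host
    # targeted repair can meaningfully cover.
    return [r for r, replicas in replica_map.items()
            if node_a in replicas and node_b in replicas]

replica_map = {
    "r1": {"n1", "n2"},
    "r2": {"n2", "n3"},
    "r3": {"n3", "n1"},
}
# n1 owns r1 and r3, but only r1 is shared with n2; starting from ALL of
# n1's ranges (as the current code does) hits the "no neighbours" error
# for r3 instead of repairing just the common range.
print(common_ranges(replica_map, "n1", "n2"))  # ['r1']
```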
[jira] [Comment Edited] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414586#comment-15414586 ] Geoffrey Yu edited comment on CASSANDRA-9876 at 8/10/16 1:50 AM: - Thanks for the quick review! I’ve attached a new patch that addresses your comments, with the exception of one of them for which I wanted to get some more feedback first. I also attached a patch that adds one dtest to test the pull repair. It works nearly identically to the token range repair with the exception that it asserts that one of the nodes only sends data and the other only receives. {quote} I don't think it's necessary to make specifying --start-token and --end-token mandatory, since if that is not specified it will just pull repair all common ranges between specified hosts. {quote} The reason I added the check for a token range was that the repair code as it is now doesn’t actually add only the common ranges between the specified hosts. I wasn’t sure if this was the intended behavior or a bug. To replicate the issue, just create a 3 node cluster, add a keyspace with replication factor 2, and run a regular repair through nodetool on that keyspace with exactly two nodes specified. The reason it happens is that if no ranges are specified, the repair will [add all ranges on the local node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137]. Then when we hit {{RepairRunnable}}, we try to [find a list of neighbors for each range|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162]. The problem here is that it isn’t always true that every range the local node owns is also owned by the remote node we specified through the nodetool command. In the example above, only one range will be common between any two nodes. 
Because of this the [check here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251] may result in an exception being thrown, which aborts the repair. If this is intended behavior, then forcing the user to specify a token range that is common between the nodes prevents that exception from being thrown. Otherwise the error message, “Repair requires at least two endpoints that are neighbours before it can continue” can be confusing to the operator since the two specified nodes may actually share a common range. What do you think? > One way targeted repair >
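As a rough sketch of the common-range behavior discussed in this comment (using a simplified, non-wrapping, hypothetical stand-in for Cassandra's {{Range<Token>}}, not the project's actual classes): repairing two hosts should cover only the intersection of the ranges they both own, not everything the local node owns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: intersect the token ranges owned by two nodes.
// With RF=2 on a 3-node cluster, any two nodes share exactly one range,
// even though each node owns two.
public final class CommonRanges {
    static final class Range {
        final long start, end; // half-open (start, end], no wrap-around
        Range(long start, long end) { this.start = start; this.end = end; }
    }

    static List<Range> intersect(List<Range> local, List<Range> remote) {
        List<Range> common = new ArrayList<>();
        for (Range a : local)
            for (Range b : remote) {
                long s = Math.max(a.start, b.start);
                long e = Math.min(a.end, b.end);
                if (s < e) common.add(new Range(s, e)); // keep real overlaps only
            }
        return common;
    }

    public static void main(String[] args) {
        List<Range> node1 = new ArrayList<>();
        node1.add(new Range(0, 100));
        node1.add(new Range(100, 200));
        List<Range> node2 = new ArrayList<>();
        node2.add(new Range(100, 200));
        node2.add(new Range(200, 300));
        // Only (100, 200] is common to both nodes.
        System.out.println(intersect(node1, node2).size()); // 1
    }
}
```

A repair restricted to this intersection would never hit the "requires at least two endpoints that are neighbours" exception described above.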
[jira] [Updated] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9876: --- Status: Awaiting Feedback (was: Open) > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 3.x > > Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt > > > Many applications use C* by writing to one local DC. The other DC is used > when the local DC is unavailable. When the local DC becomes available, we > want to run a targeted repair b/w one endpoint from each DC to minimize the > data transfer over WAN. In this case, it will be helpful to do a one way > repair in which data will only be streamed from other DC to local DC instead > of streaming the data both ways. This will further minimize the traffic over > WAN. This feature should only be supported if a targeted repair is run > involving 2 hosts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9876: --- Attachment: 9876-dtest-master.txt 9876-trunk-v2.txt Thanks for the quick review! I’ve attached a new patch that addresses your comments, with the exception of one of them, for which I wanted to get some more feedback first. I also attached a patch that adds one dtest to test the pull repair. It works nearly identically to the token range repair, except that it asserts that one of the nodes only sends data and the other only receives. {quote} I don't think it's necessary to make specifying --start-token and --end-token mandatory, since if that is not specified it will just pull repair all common ranges between specified hosts. {quote} The reason I added the check for a token range is that the repair code as it stands doesn’t actually restrict the repair to only the common ranges between the specified hosts. I wasn’t sure whether this was the intended behavior or a bug. To reproduce the issue, create a 3-node cluster, add a keyspace with replication factor 2, and run a regular repair through nodetool on that keyspace with exactly two nodes specified. The reason it happens is that if no ranges are specified, the repair will [add all ranges on the local node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137]. Then when we hit {{RepairRunnable}}, we try to [find a list of neighbors for each range|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162]. The problem here is that it isn’t always true that every range the local node owns is also owned by the remote node we specified through the nodetool command. In the example above, only one range will be common between any two nodes.
Because of this the [check here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251] may result in an exception being thrown, which aborts the repair. If this is intended behavior, then forcing the user to specify a token range that is common between the nodes prevents that exception from being thrown. Otherwise the error message, “Repair requires at least two endpoints that are neighbours before it can continue” can be confusing to the operator since the two specified nodes may actually share a common range. What do you think? > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 3.x > > Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12256) Properly respect the request timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12256: Attachment: 12256-trunk.txt I've attached a first pass at this ticket. The majority of the changes pass the query start timestamp all the way down to the {{ReadCallback}} and {{AbstractWriteResponseHandler}}. The timestamp is recorded when the {{QueryState}} is created for a particular query. > Properly respect the request timeouts > - > > Key: CASSANDRA-12256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12256 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Geoffrey Yu > Fix For: 3.x > > Attachments: 12256-trunk.txt > > > We have a number of {{request_timeout_*}} options that probably every user expects to be an upper bound on how long the coordinator will wait before timing out a request, but that's actually not always the case, especially for read requests. > I believe we don't respect those timeouts properly in at least the following cases: > * On a digest mismatch: in that case, we reset the timeout for the data query, which means the overall query might take up to twice the configured timeout before timing out. > * On a range query: the timeout is reset for every sub-range that is queried. With many nodes and vnodes, a range query could span tons of sub-ranges, so a range query could take pretty much arbitrarily long before actually timing out for the user. > * On short reads: we also reset the timeout for every short-read "retry". > It's also worth noting that even outside those cases, the timeouts don't take most of the processing done by the coordinator (query parsing and CQL handling, for instance) into account. > Now, in all fairness, the reason it is this way is that the timeouts currently are *not* timeouts for the full user request, but rather how long a coordinator should wait on any given replica for any given internal query before giving up. *However*, I'm pretty sure this is not what users intuitively expect and want, *especially* in the context of CASSANDRA-2848, where the goal is explicitly to have an upper bound on the query from the user's point of view. > So I'm suggesting we change how those timeouts are handled to really be timeouts on the whole user query. > And by that I basically just mean that we'd mark the start of each query as soon as possible in the processing, and use that starting time as the base in {{ReadCallback.await}} and {{AbstractWriteResponseHandler.get()}}. It won't be perfect in the sense that we'll still only possibly time out during "blocking" operations, so typically if parsing a query takes more than your timeout, you still won't time out until that query is sent, but I think that's probably fine in practice because 1) if your timeouts are small enough that this matters, you're probably doing it wrong and 2) we can totally improve on that later if need be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
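The fix the ticket proposes, timing out against the query's start rather than resetting per internal request, can be sketched as follows. This is a hypothetical illustration of the idea, not the actual code in {{ReadCallback.await}} or the attached patch:

```java
// Sketch (hypothetical class): one deadline anchored at the query's start.
// A digest-mismatch data read, a sub-range query, or a short-read retry all
// wait only for whatever remains of the original budget, instead of each
// getting a fresh full timeout.
public final class QueryDeadline {
    private final long queryStartNanos;
    private final long timeoutNanos;

    public QueryDeadline(long queryStartNanos, long timeoutNanos) {
        this.queryStartNanos = queryStartNanos;
        this.timeoutNanos = timeoutNanos;
    }

    /** Nanoseconds left before the whole user query times out (never negative). */
    public long remainingNanos(long nowNanos) {
        long elapsed = nowNanos - queryStartNanos;
        return Math.max(0L, timeoutNanos - elapsed);
    }

    public static void main(String[] args) {
        QueryDeadline d = new QueryDeadline(0L, 2_000_000_000L); // 2s read timeout
        // After 1.5s spent on the digest round, the data read gets only 0.5s.
        System.out.println(d.remainingNanos(1_500_000_000L)); // 500000000
    }
}
```

Passing the recorded {{QueryState}} creation time down as {{queryStartNanos}} is exactly what makes every blocking wait share one budget.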
[jira] [Updated] (CASSANDRA-12256) Properly respect the request timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12256: Fix Version/s: 3.x Status: Patch Available (was: Open) > Properly respect the request timeouts > - > > Key: CASSANDRA-12256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12256 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Geoffrey Yu > Fix For: 3.x > > Attachments: 12256-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12311: Attachment: 12311-dtest.txt Thanks for the help with the driver and the example test! I really appreciate it. :) I've attached a patch meant to be applied on top of your {{CASSANDRA-12311-tests}} branch. It modifies {{paging_test.py:TestPagingWithDeletions.test_failure_threshold_deletions}} and {{write_failure_tests.py}} so that they check for the failure map when protocol v5 is used. I also added another file, {{read_failure_tests.py}} to test read failures due to reading too many tombstones. I modeled it after {{write_failure_tests.py}} and added tests for protocol v3 and v4 as well. > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-dtest.txt, 12311-trunk-v2.txt, 12311-trunk-v3.txt, > 12311-trunk-v4.txt, 12311-trunk-v5.txt, 12311-trunk.txt > > > Right now if a data node fails to perform a read because it ran into a > {{TombstoneOverwhelmingException}}, it only responds back to the coordinator > node with a generic failure. Under this scheme, the coordinator won't be able > to know exactly why the request failed and subsequently the client only gets > a generic {{ReadFailureException}}. It would be useful to inform the client > that their read failed because we read too many tombstones. We should have > the data nodes reply with a failure type so the coordinator can pass this > information to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410124#comment-15410124 ] Geoffrey Yu commented on CASSANDRA-12311: - Thanks! I really appreciate it. :) > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk-v4.txt, 12311-trunk-v5.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410118#comment-15410118 ] Geoffrey Yu commented on CASSANDRA-12311: - Unfortunately I'm not quite familiar with the python driver. :( > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk-v4.txt, 12311-trunk-v5.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12311: Attachment: 12311-trunk-v5.txt Thanks! I've attached a patch with changes to the documentation and also unit tests to cover the serialization and deserialization of read/write failure error messages. I took a look at the dtests but, since this changes the encoding for the client-facing protocol, won't the python driver need to be changed first to recognize the new failure code map? > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk-v4.txt, 12311-trunk-v5.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12311: Attachment: 12311-trunk-v4.txt I've attached a patch with {{failures}} removed. I removed it from the exceptions themselves, which does have the implication that we lose some information when decoding an {{ErrorMessage}} while using protocol v4 (i.e. we can't meaningfully re-create the failure reason code map with just the number of failures). I feel that this is okay since as far as I'm aware, decoding the number of failures is meaningful (in this codebase) only when it is actually being used by {{o.a.c.transport.Client}} client-side. Let me know if I should change this. I'll get working on the dtests and update here once I have them done. > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk-v4.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9876: --- Fix Version/s: 3.x Status: Patch Available (was: Open) > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 3.x > > Attachments: 9876-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-9876: --- Attachment: 9876-trunk.txt I've attached a patch that implements what is described in the ticket. Specifically, it adds a new option {{--pull-repair}} to {{nodetool repair}} that can be used as follows: {{nodetool repair --in-hosts <host1>,<host2> --start-token <start> --end-token <end> --pull-repair}} Suppose {{<host1>}} is the node where the command is being run. Then {{<host1>}} will only request data from {{<host2>}} during the streaming step (if there is a mismatch) but will not send any data to {{<host2>}}. The node where the command is being run must be one of the two nodes specified by {{--in-hosts}}. And of course, the token range specified must be a range that both nodes own. > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor > Attachments: 9876-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
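The two preconditions stated in that comment, exactly two hosts and the local node among them, can be sketched as a validation helper. The class and messages below are hypothetical illustrations, not code from the attached patch:

```java
import java.util.Set;

// Hypothetical sketch of the pull-repair preconditions described above:
// pull repair is only supported between exactly two hosts, and the node
// running the command must be one of the two --in-hosts entries.
public final class PullRepairValidation {
    public static void validate(String localHost, Set<String> inHosts) {
        if (inHosts.size() != 2)
            throw new IllegalArgumentException("Pull repair requires exactly two hosts");
        if (!inHosts.contains(localHost))
            throw new IllegalArgumentException("The local node must be one of the --in-hosts hosts");
    }
}
```

With this shape, a misconfigured invocation fails fast with a message naming the violated constraint instead of surfacing a confusing neighbor-check error later in the repair.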
[jira] [Assigned] (CASSANDRA-9876) One way targeted repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu reassigned CASSANDRA-9876: -- Assignee: Geoffrey Yu > One way targeted repair > --- > > Key: CASSANDRA-9876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9876 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Geoffrey Yu >Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408173#comment-15408173 ] Geoffrey Yu commented on CASSANDRA-12311: - Also, for what it's worth, while going through the protocol documentation I noticed that {{\[byte\]}} is referenced a few times but never explicitly defined under the "Notations" section. This could lead to ambiguity when it is used to define an encoding for an integer (i.e. signed or unsigned). Is this something we should consider adding to the specification? (I'm guessing that, when used as an integer, it was intended to be interpreted as an unsigned 8-bit integer.) > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
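The signed-vs-unsigned ambiguity flagged in this comment is concrete in Java, where {{byte}} is always signed: a decoder that forgets to mask sign-extends {{0xFF}} to -1. A minimal sketch (hypothetical helper, not driver code):

```java
// Sketch: if the protocol spec defines [byte] as an unsigned 8-bit integer,
// a Java decoder must mask off the sign extension when widening to int.
public final class UnsignedByte {
    public static int decode(byte b) {
        return b & 0xFF; // 0xFF read as a signed Java byte is -1; masked, it is 255
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0xFF)); // 255
    }
}
```

Reading the same wire byte without the mask would yield -1, which is exactly the kind of divergence an explicit "Notations" entry would prevent.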
[jira] [Comment Edited] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406949#comment-15406949 ] Geoffrey Yu edited comment on CASSANDRA-12311 at 8/4/16 1:21 AM: - Those ideas sound good to me. I can see how having extensibility in the failure codes can be useful so that we don't need to wait for protocol version bumps. Also passing back an endpoint to failure code map would be nice since we won't need to interpret the potentially different responses from the replicas at the coordinator to determine which (single) failure code should be used. I attached a patch with those changes incorporated. Since we need to pass some sort of failure code back from the replicas, I wanted to use the same set of failure codes between nodes as between the client and coordinator. So I placed the codes in a new enum {{RequestFailureReason}} and placed the map under {{RequestFailureException}}, meaning {{WriteFailureException}}s will carry this endpoint to failure code map as well. Please let me know what you think. > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12311: Attachment: 12311-trunk-v3.txt Those ideas sound good to me. I can see how having extensibility in the failure codes can be useful so that we don't need to wait for protocol version bumps. Also passing back an endpoint to failure code map would be nice since we won't need to interpret the potentially different responses from the replicas at the coordinator to determine which (single) failure code should be used. I attached a patch with those changes incorporated. Since we need to pass some sort of failure code back from the replicas, I wanted to use the same set of failure codes between nodes as between the client and coordinator. So I placed the codes in a new enum {{RequestFailureReason}} and placed the map under {{RequestFailureException}}, meaning {{WriteFailureException}}s will carry this endpoint to failure code map as well. Please let me know what you think. > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Labels: client-impacting, doc-impacting > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
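The design described in this comment, a shared failure-reason enum with stable numeric codes plus an endpoint-to-reason map on the coordinator, can be sketched as below. The enum values and codes here are illustrative assumptions, not the actual constants in the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the comment's idea: one RequestFailureReason enum shared between
// node-to-node and coordinator-to-client paths, with numeric codes so new
// reasons can be added without a protocol version bump. Codes are made up.
public final class FailureMapSketch {
    enum RequestFailureReason {
        UNKNOWN(0),
        READ_TOO_MANY_TOMBSTONES(1);

        final int code;
        RequestFailureReason(int code) { this.code = code; }

        // Unrecognized codes fall back to UNKNOWN, keeping decoding extensible.
        static RequestFailureReason fromCode(int code) {
            for (RequestFailureReason r : values())
                if (r.code == code) return r;
            return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // Coordinator-side map of replica endpoint -> failure reason,
        // forwarded to the client instead of a single collapsed code.
        Map<String, RequestFailureReason> failureReasonByEndpoint = new HashMap<>();
        failureReasonByEndpoint.put("10.0.0.2", RequestFailureReason.READ_TOO_MANY_TOMBSTONES);
        System.out.println(failureReasonByEndpoint.get("10.0.0.2"));
    }
}
```

Because the client receives the whole map, it can see which replica failed for which reason rather than guessing from an aggregate count.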
[jira] [Created] (CASSANDRA-12367) Add an API to request the size of a CQL partition
Geoffrey Yu created CASSANDRA-12367: --- Summary: Add an API to request the size of a CQL partition Key: CASSANDRA-12367 URL: https://issues.apache.org/jira/browse/CASSANDRA-12367 Project: Cassandra Issue Type: Improvement Reporter: Geoffrey Yu Assignee: Geoffrey Yu Priority: Minor It would be useful to have an API that we could use to get the total serialized size of a CQL partition, scoped by keyspace and table, on disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoffrey Yu updated CASSANDRA-12311: Attachment: 12311-trunk-v2.txt I've attached an updated patch that removes the new exception and instead adds a new {{reason}} field within {{ReadFailureException}} that can be used to indicate why the read query failed. > Propagate TombstoneOverwhelmingException to the client > -- > > Key: CASSANDRA-12311 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12311 > Project: Cassandra > Issue Type: Improvement >Reporter: Geoffrey Yu >Assignee: Geoffrey Yu >Priority: Minor > Fix For: 4.x > > Attachments: 12311-trunk-v2.txt, 12311-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400265#comment-15400265 ]

Geoffrey Yu commented on CASSANDRA-12311:
-----------------------------------------

Sure, that sounds reasonable to me. I'll make the changes and update the patch.

> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
> Key: CASSANDRA-12311
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 4.x
>
> Attachments: 12311-trunk.txt
>
>
> Right now if a data node fails to perform a read because it ran into a {{TombstoneOverwhelmingException}}, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed and subsequently the client only gets a generic {{ReadFailureException}}. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12311:
------------------------------------
Description: 
Right now if a data node fails to perform a read because it ran into a {{TombstoneOverwhelmingException}}, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed and subsequently the client only gets a generic {{ReadFailureException}}. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.

  was:
Right now if a data node fails to perform a read because it ran into a TombstoneOverwhelmingException, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed and subsequently the client only gets a generic ReadFailureException. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.

> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
> Key: CASSANDRA-12311
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 4.x
>
> Attachments: 12311-trunk.txt
>
>
> Right now if a data node fails to perform a read because it ran into a {{TombstoneOverwhelmingException}}, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed and subsequently the client only gets a generic {{ReadFailureException}}. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12311:
------------------------------------
Attachment: 12311-trunk.txt

> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
> Key: CASSANDRA-12311
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
>
> Attachments: 12311-trunk.txt
>
>
> Right now if a data node fails to perform a read because it ran into a TombstoneOverwhelmingException, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed and subsequently the client only gets a generic ReadFailureException. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12311:
------------------------------------
Fix Version/s: 4.x
Status: Patch Available  (was: Open)

I've attached a proposed patch that implements these changes. It adds a new exception code and also makes changes to internode messaging, so I've marked it for 4.x.

> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
> Key: CASSANDRA-12311
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 4.x
>
> Attachments: 12311-trunk.txt
>
>
> Right now if a data node fails to perform a read because it ran into a TombstoneOverwhelmingException, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed and subsequently the client only gets a generic ReadFailureException. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-12311) Propagate TombstoneOverwhelmingException to the client
Geoffrey Yu created CASSANDRA-12311:
---------------------------------------

Summary: Propagate TombstoneOverwhelmingException to the client
Key: CASSANDRA-12311
URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
Project: Cassandra
Issue Type: Improvement
Reporter: Geoffrey Yu
Assignee: Geoffrey Yu
Priority: Minor


Right now if a data node fails to perform a read because it ran into a TombstoneOverwhelmingException, it only responds back to the coordinator node with a generic failure. Under this scheme, the coordinator won't be able to know exactly why the request failed and subsequently the client only gets a generic ReadFailureException. It would be useful to inform the client that their read failed because we read too many tombstones. We should have the data nodes reply with a failure type so the coordinator can pass this information to the client.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12106) Add ability to blacklist a CQL partition so all requests are ignored
[ https://issues.apache.org/jira/browse/CASSANDRA-12106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12106:
------------------------------------
Fix Version/s: 4.x
Status: Patch Available  (was: Open)

I've attached a patch that implements this. There are a lot of changes, so I thought I'd highlight the high-level approach I took to make it easier to understand what is going on.

This patch lets us blacklist any particular CQL partition, scoped by keyspace and table. Any reads/writes to a blacklisted partition will be rejected, and the client will receive a Read/WriteRejectedException accordingly. The mechanism for blacklisting a partition is exposed through a nodetool command.

The approach is to perform the rejection at the data replica level, so that each node only needs to be aware of blacklisted partitions for ranges it owns, allowing this to scale to larger clusters. The blacklist is stored in a new table under the {{system_distributed}} keyspace. Each node then maintains an in-memory cache of the blacklist entries corresponding to its token ranges.

For single-partition reads and writes, we reject the request as soon as one replica responds with a rejection. For partition range reads, we reject the request if there is a blacklisted partition within the range. CAS writes are rejected by the data nodes only on the prepare/promise step, and potentially when the coordinator performs the read before the propose/accept step; if the write proceeds past these points, the mutation will be allowed to be applied.

A mutation in a batch log that is rejected will not be considered a "failure" in the replay. This means that if all mutations were either applied successfully or rejected, the replay is considered successful and the batch log is deleted. Mutations that are rejected are not hinted, and any hints that are rejected when replayed will still be considered "successful" and deleted.

There are also changes included to keep the cache consistent when a node starts up, undergoes a range movement, or is decommissioned.

> Add ability to blacklist a CQL partition so all requests are ignored
> --------------------------------------------------------------------
>
> Key: CASSANDRA-12106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12106
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 4.x
>
> Attachments: 12106-trunk.txt
>
>
> Sometimes reads/writes to a given partition may cause problems due to the data present. It would be useful to have a manual way to blacklist such partitions so all read and write requests to them are rejected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
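The replica-side check at the heart of the approach can be illustrated with a minimal, hypothetical sketch of the in-memory blacklist cache. The class and method names here are invented for the example; the real patch would populate its cache from the {{system_distributed}} blacklist table rather than via direct calls.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not the patch's actual classes) of a per-replica
// blacklist cache keyed by (keyspace, table, partition key).
public class PartitionBlacklistCache {
    private final Set<String> entries = new HashSet<>();

    private static String key(String keyspace, String table, String partitionKey) {
        return keyspace + "/" + table + "/" + partitionKey;
    }

    // In the real patch, entries would be loaded from the blacklist table in
    // the system_distributed keyspace, restricted to token ranges this node owns.
    public void blacklist(String keyspace, String table, String partitionKey) {
        entries.add(key(keyspace, table, partitionKey));
    }

    // Consulted on the read/write path: a hit means the replica rejects the
    // request instead of executing it.
    public boolean isBlacklisted(String keyspace, String table, String partitionKey) {
        return entries.contains(key(keyspace, table, partitionKey));
    }
}
```

Keeping the check replica-local like this is what lets the design scale: no node needs a view of the whole cluster's blacklist, only of the ranges it serves.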
[jira] [Updated] (CASSANDRA-12106) Add ability to blacklist a CQL partition so all requests are ignored
[ https://issues.apache.org/jira/browse/CASSANDRA-12106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12106:
------------------------------------
Attachment: 12106-trunk.txt

> Add ability to blacklist a CQL partition so all requests are ignored
> --------------------------------------------------------------------
>
> Key: CASSANDRA-12106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12106
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
>
> Attachments: 12106-trunk.txt
>
>
> Sometimes reads/writes to a given partition may cause problems due to the data present. It would be useful to have a manual way to blacklist such partitions so all read and write requests to them are rejected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-2848) Make the Client API support passing down timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-2848:
-----------------------------------
Status: Patch Available  (was: Open)

I've attached a patch implementing this, and would love some feedback! At a high level, the approach I took was to use the last flag available in the protocol to let the client indicate whether it supplied a timeout (as a {{long}}, in milliseconds). Cassandra then uses the minimum of the client-specified timeout and the configured RPC timeout. The rest of the changes are essentially for passing the client-supplied timeout down to where it's actually needed. I also bumped the messaging service version to allow passing the timeout to the replica nodes as part of serialization/deserialization for {{ReadCommand}} and {{Mutation}}.

> Make the Client API support passing down timeouts
> -------------------------------------------------
>
> Key: CASSANDRA-2848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2848
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Goffinet
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.x
>
> Attachments: 2848-trunk.txt
>
>
> Having a max server RPC timeout is good for worst case, but many applications that have middleware in front of Cassandra, might have higher timeout requirements. In a fail fast environment, if my application starting at say the front-end, only has 20ms to process a request, and it must connect to X services down the stack, by the time it hits Cassandra, we might only have 10ms. I propose we provide the ability to specify the timeout on each call we do optionally.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
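The timeout-selection rule described in the patch (take the smaller of the client-supplied timeout and the configured RPC timeout) can be sketched as below. The class and method names are hypothetical, with a null client value standing in for the protocol flag being unset.

```java
// Hypothetical sketch of the effective-timeout rule from the patch: use the
// minimum of the client-supplied timeout and the configured RPC timeout.
// Names are illustrative, not Cassandra's actual API.
public class TimeoutResolver {
    static long effectiveTimeoutMillis(Long clientTimeoutMillis, long rpcTimeoutMillis) {
        if (clientTimeoutMillis == null)  // protocol flag unset: client sent no timeout
            return rpcTimeoutMillis;
        // A client may tighten the deadline but never loosen it past the
        // server's configured maximum.
        return Math.min(clientTimeoutMillis, rpcTimeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeoutMillis(10L, 10000L));   // 10
        System.out.println(effectiveTimeoutMillis(null, 10000L));  // 10000
    }
}
```

This matches the fail-fast motivation in the ticket: a front-end with only 10 ms of budget left can pass that down, while the server-side RPC timeout still caps worst-case behavior.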
[jira] [Updated] (CASSANDRA-2848) Make the Client API support passing down timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-2848:
-----------------------------------
Attachment: 2848-trunk.txt

> Make the Client API support passing down timeouts
> -------------------------------------------------
>
> Key: CASSANDRA-2848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2848
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Goffinet
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.x
>
> Attachments: 2848-trunk.txt
>
>
> Having a max server RPC timeout is good for worst case, but many applications that have middleware in front of Cassandra, might have higher timeout requirements. In a fail fast environment, if my application starting at say the front-end, only has 20ms to process a request, and it must connect to X services down the stack, by the time it hits Cassandra, we might only have 10ms. I propose we provide the ability to specify the timeout on each call we do optionally.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12178) Add prefixes to the name of snapshots created before a truncate or drop
[ https://issues.apache.org/jira/browse/CASSANDRA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378423#comment-15378423 ]

Geoffrey Yu commented on CASSANDRA-12178:
-----------------------------------------

Okay, that makes sense. Thanks for the quick review!

> Add prefixes to the name of snapshots created before a truncate or drop
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-12178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12178
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.x
>
> Attachments: 12178-3.0.txt, 12178-trunk.txt
>
>
> It would be useful to be able to identify snapshots that are taken because a table was truncated or dropped. We can do this by prepending a prefix to snapshot names for snapshots that are created before a truncate/drop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12178) Add prefixes to the name of snapshots created before a truncate or drop
[ https://issues.apache.org/jira/browse/CASSANDRA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12178:
------------------------------------
Attachment: 12178-3.0.txt

> Add prefixes to the name of snapshots created before a truncate or drop
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-12178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12178
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.0.x
>
> Attachments: 12178-3.0.txt, 12178-trunk.txt
>
>
> It would be useful to be able to identify snapshots that are taken because a table was truncated or dropped. We can do this by prepending a prefix to snapshot names for snapshots that are created before a truncate/drop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12178) Add prefixes to the name of snapshots created before a truncate or drop
[ https://issues.apache.org/jira/browse/CASSANDRA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12178:
------------------------------------
Fix Version/s: 3.0.x

> Add prefixes to the name of snapshots created before a truncate or drop
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-12178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12178
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.0.x
>
> Attachments: 12178-3.0.txt, 12178-trunk.txt
>
>
> It would be useful to be able to identify snapshots that are taken because a table was truncated or dropped. We can do this by prepending a prefix to snapshot names for snapshots that are created before a truncate/drop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12178) Add prefixes to the name of snapshots created before a truncate or drop
[ https://issues.apache.org/jira/browse/CASSANDRA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12178:
------------------------------------
Attachment: 12178-trunk.txt

> Add prefixes to the name of snapshots created before a truncate or drop
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-12178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12178
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
>
> Attachments: 12178-trunk.txt
>
>
> It would be useful to be able to identify snapshots that are taken because a table was truncated or dropped. We can do this by prepending a prefix to snapshot names for snapshots that are created before a truncate/drop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12178) Add prefixes to the name of snapshots created before a truncate or drop
[ https://issues.apache.org/jira/browse/CASSANDRA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12178:
------------------------------------
Status: Patch Available  (was: Open)

> Add prefixes to the name of snapshots created before a truncate or drop
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-12178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12178
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
>
> Attachments: 12178-trunk.txt
>
>
> It would be useful to be able to identify snapshots that are taken because a table was truncated or dropped. We can do this by prepending a prefix to snapshot names for snapshots that are created before a truncate/drop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-12178) Add prefixes to the name of snapshots created before a truncate or drop
Geoffrey Yu created CASSANDRA-12178:
---------------------------------------

Summary: Add prefixes to the name of snapshots created before a truncate or drop
Key: CASSANDRA-12178
URL: https://issues.apache.org/jira/browse/CASSANDRA-12178
Project: Cassandra
Issue Type: Improvement
Reporter: Geoffrey Yu
Assignee: Geoffrey Yu
Priority: Minor


It would be useful to be able to identify snapshots that are taken because a table was truncated or dropped. We can do this by prepending a prefix to snapshot names for snapshots that are created before a truncate/drop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
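The prefixing idea above can be sketched minimally as follows. The prefix strings and method names are assumptions for illustration; the ticket does not specify the exact prefixes the patch uses.

```java
// Hypothetical sketch of the naming scheme: prepend a marker so snapshots
// taken automatically before a truncate or drop are identifiable at a
// glance. The actual prefixes chosen by the patch may differ.
public class SnapshotNames {
    static String preTruncateName(long timestampMillis) {
        return "truncated-" + timestampMillis;
    }

    static String preDropName(long timestampMillis) {
        return "dropped-" + timestampMillis;
    }
}
```

With a scheme like this, an operator listing snapshots can tell at a glance which ones were created as safety copies before destructive operations.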
[jira] [Created] (CASSANDRA-12106) Add ability to blacklist a CQL partition so all requests are ignored
Geoffrey Yu created CASSANDRA-12106:
---------------------------------------

Summary: Add ability to blacklist a CQL partition so all requests are ignored
Key: CASSANDRA-12106
URL: https://issues.apache.org/jira/browse/CASSANDRA-12106
Project: Cassandra
Issue Type: New Feature
Reporter: Geoffrey Yu
Assignee: Geoffrey Yu
Priority: Minor


Sometimes reads/writes to a given partition may cause problems due to the data present. It would be useful to have a manual way to blacklist such partitions so all read and write requests to them are rejected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12076) Add username to AuthenticationException messages
[ https://issues.apache.org/jira/browse/CASSANDRA-12076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12076:
------------------------------------
Attachment: 12076-dtest-master.txt

> Add username to AuthenticationException messages
> ------------------------------------------------
>
> Key: CASSANDRA-12076
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12076
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Trivial
>
> Attachments: 12076-dtest-master.txt, 12076-trunk-v2.txt, 12076-trunk.txt
>
>
> When an {{AuthenticationException}} is thrown, there are a few places where the user that initiated the request is not included in the exception message. It can be useful to have this information included for logging purposes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12076) Add username to AuthenticationException messages
[ https://issues.apache.org/jira/browse/CASSANDRA-12076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353742#comment-15353742 ]

Geoffrey Yu commented on CASSANDRA-12076:
-----------------------------------------

I attached a patch for the affected dtests in auth_test.py. The patched tests ran fine locally.

> Add username to AuthenticationException messages
> ------------------------------------------------
>
> Key: CASSANDRA-12076
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12076
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Trivial
>
> Attachments: 12076-dtest-master.txt, 12076-trunk-v2.txt, 12076-trunk.txt
>
>
> When an {{AuthenticationException}} is thrown, there are a few places where the user that initiated the request is not included in the exception message. It can be useful to have this information included for logging purposes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12076) Add username to AuthenticationException messages
[ https://issues.apache.org/jira/browse/CASSANDRA-12076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347482#comment-15347482 ]

Geoffrey Yu commented on CASSANDRA-12076:
-----------------------------------------

Absolutely - I made the changes and attached a new patch. How do the messages look now? As for the dtests, which version should I be restricting the existing relevant tests to?

> Add username to AuthenticationException messages
> ------------------------------------------------
>
> Key: CASSANDRA-12076
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12076
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Trivial
>
> Attachments: 12076-trunk-v2.txt, 12076-trunk.txt
>
>
> When an {{AuthenticationException}} is thrown, there are a few places where the user that initiated the request is not included in the exception message. It can be useful to have this information included for logging purposes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12076) Add username to AuthenticationException messages
[ https://issues.apache.org/jira/browse/CASSANDRA-12076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12076:
------------------------------------
Attachment: 12076-trunk-v2.txt

> Add username to AuthenticationException messages
> ------------------------------------------------
>
> Key: CASSANDRA-12076
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12076
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Trivial
>
> Attachments: 12076-trunk-v2.txt, 12076-trunk.txt
>
>
> When an {{AuthenticationException}} is thrown, there are a few places where the user that initiated the request is not included in the exception message. It can be useful to have this information included for logging purposes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12076) Add username to AuthenticationException messages
[ https://issues.apache.org/jira/browse/CASSANDRA-12076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12076:
------------------------------------
Status: Patch Available  (was: Open)

> Add username to AuthenticationException messages
> ------------------------------------------------
>
> Key: CASSANDRA-12076
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12076
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Trivial
>
> Attachments: 12076-trunk.txt
>
>
> When an {{AuthenticationException}} is thrown, there are a few places where the user that initiated the request is not included in the exception message. It can be useful to have this information included for logging purposes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12076) Add username to AuthenticationException messages
[ https://issues.apache.org/jira/browse/CASSANDRA-12076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12076:
------------------------------------
Attachment: 12076-trunk.txt

> Add username to AuthenticationException messages
> ------------------------------------------------
>
> Key: CASSANDRA-12076
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12076
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Trivial
>
> Attachments: 12076-trunk.txt
>
>
> When an {{AuthenticationException}} is thrown, there are a few places where the user that initiated the request is not included in the exception message. It can be useful to have this information included for logging purposes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-12076) Add username to AuthenticationException messages
Geoffrey Yu created CASSANDRA-12076:
---------------------------------------

Summary: Add username to AuthenticationException messages
Key: CASSANDRA-12076
URL: https://issues.apache.org/jira/browse/CASSANDRA-12076
Project: Cassandra
Issue Type: Improvement
Reporter: Geoffrey Yu
Assignee: Geoffrey Yu
Priority: Trivial


When an {{AuthenticationException}} is thrown, there are a few places where the user that initiated the request is not included in the exception message. It can be useful to have this information included for logging purposes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12075) Include whether or not the client should retry the request when throwing a RequestExecutionException
[ https://issues.apache.org/jira/browse/CASSANDRA-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-12075:
------------------------------------
Issue Type: Improvement  (was: New Feature)

> Include whether or not the client should retry the request when throwing a RequestExecutionException
> ----------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-12075
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12075
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
>
> Some requests that result in an error should not be retried by the client. Right now if the client gets an error, it has no way of knowing whether or not it should retry. We can include an extra field in each {{RequestExecutionException}} that will indicate whether the client should retry, retry on a different host, or not retry at all.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-12075) Include whether or not the client should retry the request when throwing a RequestExecutionException
Geoffrey Yu created CASSANDRA-12075:
---------------------------------------

Summary: Include whether or not the client should retry the request when throwing a RequestExecutionException
Key: CASSANDRA-12075
URL: https://issues.apache.org/jira/browse/CASSANDRA-12075
Project: Cassandra
Issue Type: New Feature
Reporter: Geoffrey Yu
Assignee: Geoffrey Yu
Priority: Minor


Some requests that result in an error should not be retried by the client. Right now if the client gets an error, it has no way of knowing whether or not it should retry. We can include an extra field in each {{RequestExecutionException}} that will indicate whether the client should retry, retry on a different host, or not retry at all.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
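The extra field proposed in this ticket might look something like the sketch below. The enum values, class name, and field name are invented for illustration; the ticket only specifies the three retry outcomes, not any concrete API.

```java
// Hypothetical sketch of a retry hint carried by request-execution
// exceptions so clients know whether retrying can help. Names are
// illustrative only.
enum RetryDecision { RETRY_SAME_HOST, RETRY_OTHER_HOST, DO_NOT_RETRY }

public class RequestExecutionFailure extends RuntimeException {
    final RetryDecision retryDecision;

    public RequestExecutionFailure(String message, RetryDecision retryDecision) {
        super(message);
        this.retryDecision = retryDecision;
    }
}
```

A driver's retry policy could then branch on this field directly instead of guessing from the exception type alone.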
[jira] [Updated] (CASSANDRA-11880) Display number of tables in cfstats
[ https://issues.apache.org/jira/browse/CASSANDRA-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-11880:
------------------------------------
Attachment: 11880-trunk.txt

> Display number of tables in cfstats
> -----------------------------------
>
> Key: CASSANDRA-11880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11880
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
>
> Attachments: 11880-trunk.txt
>
>
> We should display the number of tables in a Cassandra cluster in {{nodetool cfstats}}. This would be useful for monitoring.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11880) Display number of tables in cfstats
[ https://issues.apache.org/jira/browse/CASSANDRA-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Yu updated CASSANDRA-11880:
------------------------------------
Status: Patch Available  (was: Open)

> Display number of tables in cfstats
> -----------------------------------
>
> Key: CASSANDRA-11880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11880
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
>
> Attachments: 11880-trunk.txt
>
>
> We should display the number of tables in a Cassandra cluster in {{nodetool cfstats}}. This would be useful for monitoring.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-11880) Display number of tables in cfstats
Geoffrey Yu created CASSANDRA-11880:
---------------------------------------

Summary: Display number of tables in cfstats
Key: CASSANDRA-11880
URL: https://issues.apache.org/jira/browse/CASSANDRA-11880
Project: Cassandra
Issue Type: Improvement
Reporter: Geoffrey Yu
Priority: Minor


We should display the number of tables in a Cassandra cluster in {{nodetool cfstats}}. This would be useful for monitoring.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)