[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556007#comment-15556007 ] Tupshin Harper commented on CASSANDRA-7296: --- Given the fresh activity, I'd like to re-emphasize my support for this ticket. I think node/data debugging via request pinning is an excellent use of it, and is basically the original reason for the ticket. Spark turned out to be an irrelevant tangent, but there is significant benefit in supporting this (degenerately simple) form of consistency. If [~jjirsa]'s patch is still applicable (or can be), I'd love to see it given a fair shake. > Add CL.COORDINATOR_ONLY > --- > > Key: CASSANDRA-7296 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper > > For reasons such as CASSANDRA-6340 and similar, it would be nice to have a > read that never gets distributed, and only works if the coordinator you are > talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
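The requested semantics — a read that fails unless the coordinator itself is a replica for the row — can be sketched in a few lines. This is purely an illustration: the names (`coordinator_only_read`, `CoordinatorOnlyError`) are invented, and the ring/replica model is a simplification, not Cassandra's actual replication strategy.

```python
# Hypothetical sketch of CL.COORDINATOR_ONLY: the coordinator serves the
# read only if it is a replica for the requested token; otherwise it errors
# instead of forwarding the request to another node.
from bisect import bisect_right

class CoordinatorOnlyError(Exception):
    pass

def replicas_for(token, ring, rf):
    """ring: sorted list of (token, node). Simplified: the rf nodes
    clockwise from the token's position own it."""
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

def coordinator_only_read(local_node, token, ring, rf, read_local):
    if local_node not in replicas_for(token, ring, rf):
        raise CoordinatorOnlyError("coordinator does not own this token")
    return read_local(token)  # never distributed to other nodes

ring = [(0, "A"), (100, "B"), (200, "C")]
# token 150 falls between B and C -> owned by C (and A with rf=2)
print(coordinator_only_read("C", 150, ring, 2, lambda t: "row"))  # row
```

Asking node B for the same token would raise instead of silently proxying — which is exactly the debugging property the comment is after.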
[jira] [Commented] (CASSANDRA-9779) Append-only optimization
[ https://issues.apache.org/jira/browse/CASSANDRA-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344434#comment-15344434 ] Tupshin Harper commented on CASSANDRA-9779: --- Basically we are talking about frozen rows (as an analogy to frozen collections), and I am very much in favor of this. *Many* use cases, particularly IoT, would be able to use such an optimization while still benefiting from representing data in highly structured columns. > Append-only optimization > > > Key: CASSANDRA-9779 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9779 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: Jonathan Ellis > Fix For: 3.x > > > Many common workloads are append-only: that is, they insert new rows but do > not update existing ones. However, Cassandra has no way to infer this and so > it must treat all tables as if they may experience updates in the future. > If we added syntax to tell Cassandra about this ({{WITH INSERTS ONLY}} for > instance) then we could do a number of optimizations: > - Compaction would only need to worry about defragmenting partitions, not > rows. We could default to DTCS or similar. > - CollationController could stop scanning sstables as soon as it finds a > matching row > - Most importantly, materialized views wouldn't need to worry about deleting > prior values, which would eliminate the majority of the MV overhead -- This message was sent by Atlassian JIRA (v6.3.4#6332)
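One of the optimizations the ticket lists — CollationController stopping the sstable scan at the first matching row — can be shown with a toy read path. This is a sketch only: sstables are modeled as plain dicts, and the merge rule is a simplification of real cell reconciliation.

```python
# Toy illustration of the append-only read shortcut: if rows are never
# updated, the first (newest) sstable containing the row is authoritative,
# so the scan can stop early instead of merging every sstable.
def read_row(key, sstables_newest_first, append_only):
    merged, scanned = None, 0
    for sst in sstables_newest_first:
        scanned += 1
        row = sst.get(key)
        if row is None:
            continue
        if append_only:
            return row, scanned          # rows are never updated: first hit wins
        merged = row if merged is None else {**row, **merged}  # newer cells win
    return merged, scanned

sstables = [{"k": {"a": 2}}, {"k": {"a": 1, "b": 9}}]
print(read_row("k", sstables, append_only=True))   # ({'a': 2}, 1)
print(read_row("k", sstables, append_only=False))  # ({'a': 2, 'b': 9}, 2)
```

The second call has to touch both sstables and reconcile cells; the append-only path returns after one.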
[jira] [Commented] (CASSANDRA-8119) More Expressive Consistency Levels
[ https://issues.apache.org/jira/browse/CASSANDRA-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330525#comment-15330525 ] Tupshin Harper commented on CASSANDRA-8119: --- I like the overall approach that Tyler proposes, but I have been convinced for a long time that the ultimate desirable functionality would be to combine the above expressive consistency levels with multiple CL callbacks per request (e.g. one callback at LQ and another at EQ). I would love to see this ticket prepare to make protocol/conceptual changes to support that even though it would surely be prohibitive to implement multiple CL callbacks within the scope of this ticket. > More Expressive Consistency Levels > -- > > Key: CASSANDRA-8119 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8119 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: Tyler Hobbs > Fix For: 3.x > > > For some multi-datacenter environments, the current set of consistency levels > is too restrictive. For example, the following consistency requirements > cannot be expressed: > * LOCAL_QUORUM in two specific DCs > * LOCAL_QUORUM in the local DC plus LOCAL_QUORUM in at least one other DC > * LOCAL_QUORUM in the local DC plus N remote replicas in any DC > I propose that we add a new consistency level: CUSTOM. In the v4 (or v5) > protocol, this would be accompanied by an additional map argument. A map of > {DC: CL} or a map of {DC: int} is sufficient to cover the first example. If > we accept a special key to represent "any datacenter", the second case can > be handled. A similar technique could be used for "any other nodes". > I'm not in love with the special keys, so if anybody has ideas for something > more elegant, feel free to propose them. The main idea is that we want to be > flexible enough to cover any reasonable consistency or durability > requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
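The proposed CUSTOM level with a per-DC map, including a special "any other DC" key, could be evaluated roughly as follows. This is a sketch: the `"*"` key, the function name, and the ack bookkeeping are all invented for illustration, not part of any protocol.

```python
# Sketch of checking the proposed CUSTOM consistency level: a
# {dc: required_acks} map, where the hypothetical "*" key means
# "at least this many acks in some DC not otherwise named".
def custom_cl_satisfied(required, acks):
    """required: {dc: n}, possibly with '*'; acks: {dc: acks received}."""
    named = {dc: n for dc, n in required.items() if dc != "*"}
    for dc, n in named.items():
        if acks.get(dc, 0) < n:
            return False
    if "*" in required:
        others = [n for dc, n in acks.items() if dc not in named]
        if not any(n >= required["*"] for n in others):
            return False
    return True

# "LOCAL_QUORUM (2 of 3) in DC1 plus a quorum in at least one other DC":
print(custom_cl_satisfied({"DC1": 2, "*": 2}, {"DC1": 2, "DC3": 2}))  # True
print(custom_cl_satisfied({"DC1": 2, "*": 2}, {"DC1": 2, "DC3": 1}))  # False
```

The same shape covers the first example from the ticket ({"DC1": 2, "DC2": 2}) with no special key at all.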
[jira] [Commented] (CASSANDRA-7666) Range-segmented sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325050#comment-15325050 ] Tupshin Harper commented on CASSANDRA-7666: --- In addition to being relevant to CASSANDRA-11989, I believe range-segmented sstables represent an under-appreciated potential optimization for compaction strategies. As a rule of thumb, we tend to recommend that STCS workloads be kept under 2TB or so. The main reason for this (besides operational concerns involving time to bootstrap/repair/etc.) is that STCS compaction performance scales sublinearly with the amount of data in a table/node, and that the write amplification factor is substantially higher at 10TB than at 2TB. With range-segmented sstables, just 5 segments would allow 10TB to be isolated into 2TB sections, and as long as the cumulative IO and CPU of the nodes was sufficient for the total workload, they could sustain performance at that scale. I suggest that this ticket be re-opened for those two reasons. > Range-segmented sstables > > > Key: CASSANDRA-7666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7666 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: Jonathan Ellis > Labels: dense-storage > > It would be useful to segment sstables by data range (not just token range as > envisioned by CASSANDRA-6696). > The primary use case is to allow deleting those data ranges for "free" by > dropping the sstables involved. We should also (possibly as a separate > ticket) be able to leverage this information in query planning to avoid > unnecessary sstable reads. > Relational databases typically call this "partitioning" the table, but > obviously we use that term already for something else: > http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html > Tokutek's take for mongodb: > http://docs.tokutek.com/tokumx/tokumx-partitioned-collections.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
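The core routing idea — each sstable segment covers a contiguous data range, so writes land in exactly one segment and dropping a range is just deleting its segments — reduces to a binary search over segment bounds. A minimal sketch, with invented names and time-bucketed bounds as an example:

```python
# Sketch of range-segmented sstable routing: segments are defined by sorted
# inclusive upper bounds on the data range; a value's segment is found by
# binary search. Dropping a data range = deleting whole segments for "free".
from bisect import bisect_left

def segment_for(value, upper_bounds):
    """upper_bounds: sorted inclusive upper bound of each segment."""
    i = bisect_left(upper_bounds, value)
    if i == len(upper_bounds):
        raise ValueError("value beyond last segment")
    return i

bounds = ["2016-03", "2016-06", "2016-09", "2016-12"]
print(segment_for("2016-05", bounds))  # 1
# Deleting all data before 2016-06 is then just dropping segments 0 and 1.
```

The same lookup serves the query-planning use case the ticket mentions: a read restricted to one data range only needs to touch that range's segments.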
[jira] [Updated] (CASSANDRA-11989) Rehabilitate Byte Ordered Partitioning
[ https://issues.apache.org/jira/browse/CASSANDRA-11989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-11989: --- Description: This is a placeholder ticket to aid in NGCC discussion and should lead to a design doc. The general idea is that Byte Ordered Partitioning is the only way to maximize locality (beyond the healthy size of a single partition). Because of random/murmur's inability to do so, BOP has intrinsic value, assuming the operational downsides are eliminated. This ticket tries to address the operational challenges of BOP and proposes that it should be the default in the distant future. http://slides.com/tupshinharper/rehabilitating_bop https://docs.google.com/a/datastax.com/document/d/1zcvLbyZAebmvrqnKidpXlTtdICNox92pWYGKSd7SS7M/edit?usp=docslist_api was: This is a placeholder ticket to aid in NGCC discussion and should lead to a design doc. The general idea is that Byte Ordered Partitioning is the only way to maximize locality (beyond the healthy size of a single partition). Because of random/murmur's inability to do so, BOP has intrinsic value, assuming the operational downsides are eliminated. This ticket tries to address the operational challenges of BOP and proposes that it should be the default in the distant future. http://slides.com/tupshinharper/rehabilitating_bop > Rehabilitate Byte Ordered Partitioning > -- > > Key: CASSANDRA-11989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11989 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > Fix For: 4.x > > > This is a placeholder ticket to aid in NGCC discussion and should lead to a > design doc. > The general idea is that Byte Ordered Partitioning is the only way to maximize > locality (beyond the healthy size of a single partition). Because of > random/murmur's inability to do so, BOP has intrinsic value, assuming the > operational downsides are eliminated. 
This ticket tries to address the > operational challenges of BOP and proposes that it should be the default in > the distant future. > http://slides.com/tupshinharper/rehabilitating_bop > https://docs.google.com/a/datastax.com/document/d/1zcvLbyZAebmvrqnKidpXlTtdICNox92pWYGKSd7SS7M/edit?usp=docslist_api -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11989) Rehabilitate Byte Ordered Partitioning
[ https://issues.apache.org/jira/browse/CASSANDRA-11989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324754#comment-15324754 ] Tupshin Harper commented on CASSANDRA-11989: I'm envisioning that everything would be built off of low-level "acquire_token" and "release_token" type operations, and that giving nodes the ability to dynamically perform those two operations safely will be a pre-requisite, so would require a gossip enhancement. I'm avoiding depending on any more complex semantics, and am working on mechanisms to dynamically reallocate based on just those two primitives. > Rehabilitate Byte Ordered Partitioning > -- > > Key: CASSANDRA-11989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11989 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > Fix For: 4.x -- This message was sent by Atlassian JIRA (v6.3.4#6332)
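The two primitives the comment proposes can be sketched as a tiny ownership registry. All names are hypothetical, and the single in-process lock here merely stands in for the gossip/consensus step a real cluster would need to make the operations safe:

```python
# Sketch of "acquire_token"/"release_token" primitives: a token may have at
# most one owner; acquire fails if another node holds it, release fails if
# the caller is not the owner. A threading.Lock stands in for cluster-wide
# coordination in this toy version.
import threading

class TokenRegistry:
    def __init__(self):
        self._owners = {}            # token -> owning node
        self._lock = threading.Lock()

    def acquire_token(self, token, node):
        with self._lock:
            if self._owners.get(token) not in (None, node):
                return False         # another node owns it
            self._owners[token] = node
            return True

    def release_token(self, token, node):
        with self._lock:
            if self._owners.get(token) != node:
                return False         # only the owner may release
            del self._owners[token]
            return True

reg = TokenRegistry()
print(reg.acquire_token(42, "node-a"))   # True
print(reg.acquire_token(42, "node-b"))   # False until node-a releases
print(reg.release_token(42, "node-a"))   # True
print(reg.acquire_token(42, "node-b"))   # True
```

Any higher-level rebalancing policy would then be expressed purely as sequences of these two calls, which is the point of the comment.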
[jira] [Updated] (CASSANDRA-11989) Rehabilitate Byte Ordered Partitioning
[ https://issues.apache.org/jira/browse/CASSANDRA-11989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-11989: --- Labels: ponies (was: ) Issue Type: Improvement (was: Bug) > Rehabilitate Byte Ordered Partitioning > -- > > Key: CASSANDRA-11989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11989 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > Fix For: 4.x -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11989) Rehabilitate Byte Ordered Partitioning
Tupshin Harper created CASSANDRA-11989: -- Summary: Rehabilitate Byte Ordered Partitioning Key: CASSANDRA-11989 URL: https://issues.apache.org/jira/browse/CASSANDRA-11989 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tupshin Harper Fix For: 4.x This is a placeholder ticket to aid in NGCC discussion and should lead to a design doc. The general idea is that Byte Ordered Partitioning is the only way to maximize locality (beyond the healthy size of a single partition). Because of random/murmur's inability to do so, BOP has intrinsic value, assuming the operational downsides are eliminated. This ticket tries to address the operational challenges of BOP and proposes that it should be the default in the distant future. http://slides.com/tupshinharper/rehabilitating_bop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319696#comment-15319696 ] Tupshin Harper commented on CASSANDRA-7622: --- +1 from me. I just wanted to make sure that write support would stay on the short term road map. Fine staging it that way. > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319483#comment-15319483 ] Tupshin Harper commented on CASSANDRA-7622: --- As the filer of this ticket, I largely agree about the scope issue. I was surprised to see any notion of replication being discussed, because I didn't view persistence or cross-node awareness/aggregation as being a feature of virtual tables. I do think that exposing JMX provides the best initial use case, and would like to target that first. The higher-level interface that Sylvain proposes is also very much in the right direction. That said, I disagree with one aspect. There's no reason to restrict the API to read-only, even initially. Most JMX metrics are read-only, and those would either be ignored or raise an error if a write were attempted. But JMX metrics that are settable should be exposable as r/w (with separate read vs. write permissions, of course). If an interface is designed sufficiently to allow the elegant reading and writing of JMX metrics, it will be widely usable for many other plugins/virtual tables as well. > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x -- This message was sent by Atlassian JIRA (v6.3.4#6332)
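The r/w surface argued for here — settable JMX attributes accept writes, read-only ones reject them — can be sketched as a minimal virtual-table shim. Everything below is illustrative (class and method names invented; real JMX access would go through a JMX connector, not a dict):

```python
# Sketch of a read/write virtual table over JMX-like attributes: reads work
# for everything, writes succeed only for attributes declared settable and
# raise a descriptive error otherwise.
class ReadOnlyAttribute(Exception):
    pass

class JmxVirtualTable:
    def __init__(self, attrs, writable):
        self._attrs = dict(attrs)      # attribute name -> current value
        self._writable = set(writable) # names that accept writes

    def select(self, name):
        return self._attrs[name]

    def update(self, name, value):
        if name not in self._writable:
            raise ReadOnlyAttribute(name)  # e.g. a counter like ReadCount
        self._attrs[name] = value

t = JmxVirtualTable(
    {"ReadCount": 10, "CompactionThroughput": 16},
    writable={"CompactionThroughput"},
)
t.update("CompactionThroughput", 64)
print(t.select("CompactionThroughput"))  # 64
```

Separate read vs. write permissions, as the comment suggests, would layer naturally on top of the `select`/`update` split.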
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088221#comment-15088221 ] Tupshin Harper commented on CASSANDRA-8844: --- Relying on the Cassandra libs doesn't prevent you from copying the logs elsewhere and processing there, and doesn't require cassandra to be running on those machines. It does require the Java consumer to be implemented in a JVM language, however. I'm not fond of that last part, and would love it if we formalized the format, but I suppose I'll start by reverse engineering it. :) > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. 
> # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. 
Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order, even when it is multiple logfiles behind in > processing > -- Be able to continu
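The consumer-daemon behavior the proposal describes — predictably named logfiles processed in written order, with a checkpoint artifact left in the CDC directory — can be sketched directly. The filename pattern (`cdc-<seq>.log`) and checkpoint filename are assumptions for illustration only:

```python
# Sketch of a CDC consumer daemon's loop: logfiles carry a sortable sequence
# number, are processed strictly in order, and a checkpoint file in the same
# directory records the last fully consumed sequence so the daemon can
# resume (and catch up when multiple logfiles behind).
import os
import re

CHECKPOINT = "daemon.checkpoint"  # hypothetical artifact name

def process_cdc_dir(cdc_dir, handle_file):
    cp_path = os.path.join(cdc_dir, CHECKPOINT)
    done = int(open(cp_path).read()) if os.path.exists(cp_path) else -1
    logs = sorted(
        (int(m.group(1)), f)
        for f in os.listdir(cdc_dir)
        if (m := re.match(r"cdc-(\d+)\.log$", f))
    )
    for seq, fname in logs:
        if seq <= done:
            continue                      # already consumed on a prior run
        handle_file(os.path.join(cdc_dir, fname))
        with open(cp_path, "w") as cp:    # checkpoint after each file
            cp.write(str(seq))

# Usage: process_cdc_dir("/var/lib/cassandra/cdc", my_consumer)
```

Checkpointing after each file (rather than at the end) is what makes a crashed daemon resume without re-delivering whole batches, at the cost of possible re-delivery of the file in flight — consistent with the deliver-at-least-once goal above.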
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069834#comment-15069834 ] Tupshin Harper commented on CASSANDRA-8844: --- While I haven't really followed how MVs are doing mutation-based repair, your idea to go down that path mirrors my own thinking. To clarify, I believe there are two separate issues: 1) Currently, nothing, including repair, is able to cause a partially replicated CDC table to converge towards fully CDC-replicated, even when only worrying about delivering the latest copy and not caring about intermediate mutations 2) intermediate mutations aren't retained, and therefore any plausible fixes to #1, short of mutation-based repair, will still not recover all mutations that were applied to mutable-state columns. So +1 to [~JoshuaMcKenzie]'s suggestion. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066606#comment-15066606 ] Tupshin Harper commented on CASSANDRA-8844: --- It is designed to be RF copies for redundancy and high availability. If Cassandra were to deduplicate, and then the node that owned the remaining copy goes down, you have CDC data loss (failure to capture and send some data to a remote system). It is essential that the consumer be given enough capability that they can build a highly reliable system out of it. I believe that there will need to be a small number of reliably-enqueuing implementations built on top of CDC that will have any necessary de-dupe logic built in. What I would *most* like to see is a Kafka consumer of CDC that could then be used as the delivery mechanism to other systems. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x
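The comment above deliberately leaves RF duplicate copies in the CDC logs and pushes de-duplication to the consumer. A minimal consumer-side sketch — the mutation identity key `(partition, timestamp)` and all field names are assumptions for illustration, not a real Cassandra mutation format:

```python
# Sketch of consumer-side de-dupe over CDC streams: with RF copies of every
# mutation logged, the consumer drops any mutation whose identity it has
# already delivered, keeping a set of seen identities.
def dedupe(mutations, seen=None):
    seen = set() if seen is None else seen
    out = []
    for m in mutations:
        key = (m["partition"], m["timestamp"])  # hypothetical identity
        if key in seen:
            continue                 # replica copy of an already-seen write
        seen.add(key)
        out.append(m)
    return out

rf_copies = [
    {"partition": "p1", "timestamp": 1, "value": "x"},
    {"partition": "p1", "timestamp": 1, "value": "x"},  # same write, other replica
    {"partition": "p1", "timestamp": 2, "value": "y"},
]
print([m["value"] for m in dedupe(rf_copies)])  # ['x', 'y']
```

Passing a shared `seen` set (or a persistent equivalent) across batches is what a "reliably-enqueuing" layer such as the proposed Kafka consumer would need in practice.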
[jira] [Commented] (CASSANDRA-7464) Retire/replace sstable2json and json2sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730687#comment-14730687 ] Tupshin Harper commented on CASSANDRA-7464: --- With sstable2json going away in 3.0, but with no activity or timeline on this ticket, it seems we are going to be left in a situation where we have no way to debug the contents of an sstable. This would seem to be a requirement for a 3.0-final release. > Retire/replace sstable2json and json2sstable > > > Key: CASSANDRA-7464 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7464 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Priority: Minor > > Both tools are pretty awful. They are primarily meant for debugging (there > are much more efficient and convenient ways to import/export data), but their > output manages to be hard to handle both for humans and for tools (especially > as soon as you have modern stuff like composites). > There is value to having tools to export sstable contents into a format that > is easy to manipulate by human and tools for debugging, small hacks and > general tinkering, but sstable2json and json2sstable are not that. > So I propose that we deprecate those tools and consider writing better > replacements. It shouldn't be too hard to come up with an output format that > is more aware of modern concepts like composites, UDTs, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
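The kind of output the ticket asks for — structure-aware and easy for both humans and tools — might look like one JSON object per partition, with clustering values and cells kept as nested structure rather than the flat blobs sstable2json emitted. Entirely illustrative; the function name and row shape are invented:

```python
# Sketch of a structure-aware sstable dump format: one JSON line per
# partition, clustering keys and cells rendered as nested JSON so composites
# and UDTs stay readable and machine-parseable.
import json

def dump_partitions(partitions):
    """partitions: iterable of (partition_key, rows). Yields JSON lines."""
    for pk, rows in partitions:
        yield json.dumps({"key": pk, "rows": rows}, sort_keys=True)

data = [("user:1", [{"clustering": ["2016-06-01"], "cells": {"name": "ada"}}])]
for line in dump_partitions(data):
    print(line)
```

One-object-per-line output also keeps dumps greppable and streamable, which matters for the "debugging, small hacks and general tinkering" use case.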
[jira] [Comment Edited] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631882#comment-14631882 ] Tupshin Harper edited comment on CASSANDRA-6477 at 7/17/15 8:55 PM: OK, so let me summarize my view of the conflicting viewpoints here # If the MV shares the same partition key (and only reorders the partition based on different clustering columns), then the problem is relatively easy. Unfortunately the general consensus is that a common case will be to have different partition keys in the MV than the base table, so we can't support only that easy case. # If the MV has a different partition key than the base table, then there are inherently more nodes involved in fulfilling the entire request, and we have to address that case. # As [~tjake] and [~jbellis] say, the more nodes involved in a query, the higher the risk of unavailability if the MV is updated synchronously. # Some use cases expect synchronous updates (as argued by [~rustyrazorblade] and [~brianmhess]). # But other use cases definitely do not. I think it is absurd to say that just because a table has a MV, every write should care about the MV. Even more absurd to say that adding an MV to a table will reduce the availability of all writes to the base table. Given all of those, the conclusion that both sync and async forms are necessary seems totally unavoidable. Ideally, I'd like to see an extension of what [~iamaleksey] proposed above but be much more thorough and flexible about it. If each request were able to pass multiple consistency-level contracts to the coordinator, each one could represent the expectation for a separate callback at the driver level. e.g. A query to a table with a MV could express the following compound consistency levels. {noformat} {LQ, LOCAL_ONE{DC3,DC4}, LQ{MV1,MV2}} {noformat} That would tell the coordinator to deliver three separate notifications back to the client. One when LQ in the local dc was fulfilled. 
Another when at least one copy was delivered to each of DC3 and DC4, and another when LQ was fulfilled in the local dc for MV1 and MV2. And yes, you would need more flexible syntax that could express per-dc, per-table consistency, e.g. {noformat}LQ{DCs:DC3,DC4,VIEWS:MV1,MV2}{noformat} I realize that this is a very far-fetched proposal, but I wanted to throw it out there as, imo, it reflects the theoretically best option that fulfills everybody's requirements. (and is also a very general mechanism that could be used in other scenarios). Short of that, I don't think there is any choice but to support both sync and async forms of writes to tables with MVs. One more point (not to distract from the above). With the current design of MVs, there will always be risk of inconsistent reads (timeouts leaving data queryable in the primary table but not in one or more MVs) until the data is eventually propagated to the MV. While it would be at a high cost, RAMP would still be useful to provide read isolation in that scenario. was (Author: tupshin): OK, so let me summarize my view of the conflicting viewpoints here # If the MV shares the same partition key (and only reorders the partition based on different clustering columns), then the problem is relatively easy. Unfortunately the general consensus is that a common case will be to have different partition keys in the MV than the base table, so we can't support only that easy case. # If the MV has a different partition key than the base table, then there are inherently more nodes involved in fulfilling the entire request, and we have to address that case. # As [~tjake] and [~jbellis] say, the more nodes involved in a query, the higher the risk of unavailability if the MV is updated synchronously. # Some use cases expect synchronous updates (as argued by [~rustyrazorblade] and [~brianmhess]). # But other use cases definitely do not. 
I think it is absurd to say that just because a table has a MV, every write should care about the MV. Even more absurd to say that adding an MV to a table will reduce the availability of all writes to the base table. Given all of those, the conclusion that both sync and async forms are necessary seems totally unavoidable. Ideally, I'd like to see an extension of what [~iamaleksey] proposed above but be much more thorough and flexible about it. If each request were able to pass multiple consistency-level contracts to the coordinator, each one could represent the expectation for a separate callback at the driver level. e.g. A query to a table with a MV could express the following compound consistency levels. {noformat} {LQ, LOCAL_ONE{DC3,DC4}, LQ{MV1,MV2}} {noformat} and yes, you would need more flexible syntax that could express per-dc per table consistency, e.g. {noformat}LQ{DCs:DC3,DC4,VIEWS:MV1,MV2}{noformat} That would tell the coordinator t
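The compound-contract idea above — one write, several consistency conditions, each with its own notification back to the client — can be sketched as follows. This is a hypothetical illustration only: the contract names, the replica-ack model, and the Future-per-contract shape are all invented here, not any real driver or coordinator API:

```python
from concurrent.futures import Future

class CompoundWrite:
    """One write tracked against several consistency contracts; each
    contract's Future completes independently as replica acks arrive."""
    def __init__(self, contracts):
        # contracts: name -> (acks_needed, set of eligible replicas)
        self.contracts = contracts
        self.acked = set()
        self.futures = {name: Future() for name in contracts}

    def ack(self, replica):
        self.acked.add(replica)
        for name, (needed, eligible) in self.contracts.items():
            fut = self.futures[name]
            if not fut.done() and len(self.acked & eligible) >= needed:
                fut.set_result(name)  # fire this contract's callback

# e.g. quorum of 3 local replicas, plus at least one copy in a remote DC:
w = CompoundWrite({
    "LQ_local": (2, {"n1", "n2", "n3"}),
    "ONE_DC3": (1, {"n4", "n5"}),
})
```

The point of the shape is that a fast remote ack can complete its contract before the local quorum is met, and vice versa, without either blocking the other.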
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631500#comment-14631500 ] Tupshin Harper commented on CASSANDRA-6477: --- Just a reminder (since it was a long time ago in this ticket) that we were going to target immediate consistency once we could leverage RAMP, and not before. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626761#comment-14626761 ] Tupshin Harper commented on CASSANDRA-6477: --- I find myself disagreeing with the hard requirement that all rows in the table must show up in the materialized views. While it would be nice, I believe that clearly documenting the limitation and providing a couple of reasonable choices is far preferable to handing the user enough rope to hang themselves. My suggestion: * Create a formal notion of NOT NULL columns in the schema that can be applied to a table, irrespective of any MV usage. * Columns that are NOT NULL would have the exact same restrictions as PK columns, namely that they need to be included in all inserts and updates (with the possible exception of LWT updates) * Document (and warn in cqlsh) the fact that if you create a MV with a PK using a nullable column from the table, then those values will not be in the view It seems to me like this is far less dangerous (and in many ways less surprising) than automatically creating a hotspot in the MV because lots of data with NULLs gets added. Now with 8099 supporting NULLs for clustering columns, this might only apply to columns that would be a partition key in the MV, and that seems appealing. But I can't talk myself into liking inserting nulls into a MV partition key. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. 
However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
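The third bullet of the suggestion above — a row whose MV partition-key column is null simply never appears in the view, rather than landing in a NULL hotspot — can be sketched as a toy materialization step. Illustrative only: the row/view shapes are invented, and this models the *proposed* behavior from the comment, not what Cassandra ships:

```python
def materialize(rows, view_key):
    """Build a toy 'view' keyed on view_key, omitting rows where that
    column is null instead of piling them under a single NULL partition."""
    view = {}
    for row in rows:
        key = row.get(view_key)
        if key is None:
            continue  # nullable column => row not visible in the view
        view.setdefault(key, []).append(row)
    return view
```

The trade-off being debated is exactly this `continue`: silently dropping the row from the view versus refusing the null up front via a NOT NULL constraint on the base table.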
[jira] [Issue Comment Deleted] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-6477: -- Comment: was deleted (was: I find myself disagreeing with the hard requirement that all rows in the table must show up in the materialized views. While it would be nice, I believe that clearly documenting the limitation and providing a couple of reasonable choices is far preferable then encouraging using rope sufficient to hang the user. My suggestion: * Create a formal notion of NOT NULL columns in the schema that can be applied to a table, irrespective of any MV usage. * Columns that are NOT NULL would have the exact same restrictions as PK columns, namely that they need to be included in all inserts and updates (with the possible exception of LWT updates) * Document (and warn in cqlsh) the fact that if you create a MV with a PK using a nullable column from the table, then those values will not be in the view It seems to me like this is a far less dangerous (and in many ways less surprising) than automatically creating a hotspot in the MV because lots of data with NULLs get added. Now with 8099 supporting NULLs for clustering columns, this might only apply to columns that would be a partition key in the MV, and that seems appealing. But I can't talk myself into liking inserting nulls into a MV partition key.) > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9200) Sequences
[ https://issues.apache.org/jira/browse/CASSANDRA-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610346#comment-14610346 ] Tupshin Harper commented on CASSANDRA-9200: --- An example of an application domain where strictly increasing integers are required is the IMAP protocol. https://tools.ietf.org/html/rfc3501#page-8 where this is mandatory. {{A 32-bit value assigned to each message, which when used with the unique identifier validity value (see below) forms a 64-bit value that MUST NOT refer to any other message in the mailbox or any subsequent mailbox with the same name forever. Unique identifiers are assigned in a strictly ascending fashion in the mailbox; as each message is added to the mailbox it is assigned a higher UID than the message(s) which were added previously. Unlike message sequence numbers, unique identifiers are not necessarily contiguous.}} Building this kind of system on top of C* today requires an external CP system (ick operational complexity), though it is likely the case that the sequences here really only need to be modeled as clustering keys and not partition keys. > Sequences > - > > Key: CASSANDRA-9200 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9200 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Robert Stupp > Fix For: 3.x > > > UUIDs are usually the right choice for surrogate keys, but sometimes > application constraints dictate an increasing numeric value. > We could do this by using LWT to reserve "blocks" of the sequence for each > member of the cluster, which would eliminate paxos contention at the cost of > not being strictly increasing. > PostgreSQL syntax: > http://www.postgresql.org/docs/9.4/static/sql-createsequence.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
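The block-reservation scheme from the ticket description (use LWT to claim a block of the sequence, then hand out ids locally without further contention) can be simulated in a few lines. A sketch under stated assumptions: the lock stands in for a Paxos/LWT round against a sequence table, and all class names are invented:

```python
import threading

class SequenceTable:
    """Stand-in for a table holding the sequence high-water mark;
    the lock plays the role of the LWT/Paxos round."""
    def __init__(self, block_size):
        self.block_size = block_size
        self._next = 0
        self._lock = threading.Lock()

    def reserve_block(self):
        with self._lock:  # morally: UPDATE ... IF next_block = expected
            start = self._next
            self._next += self.block_size
        return iter(range(start, start + self.block_size))

class NodeAllocator:
    """Per-node allocator: ids it hands out are strictly increasing,
    but not contiguous across nodes, matching the ticket's trade-off."""
    def __init__(self, table):
        self.table = table
        self.block = table.reserve_block()

    def next_id(self):
        try:
            return next(self.block)
        except StopIteration:
            self.block = self.table.reserve_block()
            return next(self.block)
```

Paxos contention is paid only at block boundaries; the cost is gaps in the global order. Each allocator's output is still strictly ascending, which is the property the IMAP UID rule requires within a mailbox served by one allocator.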
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578794#comment-14578794 ] Tupshin Harper commented on CASSANDRA-7066: --- Users don't care about SSTables, users care about their data. It's unclear what, if any, impact this would have on the availability/existence of data. So a few questions about failure conditions, all of which would apply to a single node cluster, and with commitlog durability set to batch, for simplicity of discussion. Could this result in any circumstances where: # a write was acknowledged to be written (consistency level met), but then no longer exists on disk through this sstable cleanup/deletion? # a datum was queryable (through memtable or sstable read), but then is either no longer on disk or queryable? # a datum was deleted (tombstone?) and then comes back? # similar questions to above when a snapshot/backup occurred prior to the sstable cleanup, and restoration from that backup was necessary. If the answer to all of those is "no", then I have a hard time imagining any objections, though would love additional input from others. If yes, then huge problem. :) Given the reference to "partial results" above, I'd also like some clarity on whether that has had any user-facing impact of data availability/queryability. > Simplify (and unify) cleanup of compaction leftovers > > > Key: CASSANDRA-7066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Stefania >Priority: Minor > Labels: compaction > Fix For: 3.x > > Attachments: 7066.txt > > > Currently we manage a list of in-progress compactions in a system table, > which we use to cleanup incomplete compactions when we're done. 
The problem > with this is that 1) it's a bit clunky (and leaves us in positions where we > can unnecessarily clean up completed files, or conversely not clean up files > that have been superseded); and 2) it's only used for a regular compaction - > no other compaction types are guarded in the same way, so can result in > duplication if we fail before deleting the replacements. > I'd like to see each sstable store in its metadata its direct ancestors, and > on startup we simply delete any sstables that occur in the union of all > ancestor sets. This way as soon as we finish writing we're capable of > cleaning up any leftovers, so we never get duplication. It's also much easier > to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
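The startup rule proposed in the ticket — delete any sstable that appears in the union of the live sstables' ancestor sets — is compact enough to state precisely. A sketch with invented names; real sstable metadata is of course not a dict:

```python
def leftover_sstables(live):
    """live: sstable name -> set of names of its direct ancestors.
    Any live sstable that is an ancestor of another live sstable is a
    leftover from an interrupted compaction, so flag it for deletion."""
    ancestors = set()
    for parents in live.values():
        ancestors |= parents
    return {name for name in live if name in ancestors}
```

E.g. if a and b were compacted into c but the process died before deleting them, startup finds both in c's ancestor set and removes them; once they are gone, the ancestor references are simply ignored.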
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539854#comment-14539854 ] Tupshin Harper commented on CASSANDRA-7622: --- An additional thought is that the capabilities framework (CASSANDRA-8303) could be used to restrict the available commands that would make it through to the virtual table implementation. A possibly controversial example of this would be to only support UPDATE operations and not INSERT operations, to semantically denote the fact that this table doesn't support adding new hosts, metrics, or attributes, but does support updating them. This wouldn't restrict all unsupported behavior, and the table implementation would still have to return errors if a read-only (or non-existent) attribute were updated, but it seems a bit cleaner than having the table claim to support INSERTs. A (maybe) less controversial use of 8303 would be to disallow all write operations (both UPDATE and INSERT as well as others) for tables that are truly read-only. And in the JMX case, it would certainly make sense to have different users have either SELECT-only permissions or SELECT and UPDATE. > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper >Assignee: Benjamin Lerer > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
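The capability idea in the comment above — a per-virtual-table set of permitted operations, checked before a statement ever reaches the table implementation — might look like the following. Purely hypothetical shape; nothing here reflects CASSANDRA-8303's actual design:

```python
# Hypothetical capability sets: a jmx virtual table is updatable but
# rejects INSERT (no creating new hosts/metrics/attributes), while a
# config table is strictly read-only.
CAPABILITIES = {
    "jmx": {"SELECT", "UPDATE"},
    "config": {"SELECT"},
}

def check_capability(table, operation):
    """Gate an operation before dispatching to the virtual table."""
    if operation not in CAPABILITIES.get(table, set()):
        raise PermissionError(
            f"{operation} not supported on virtual table {table!r}")
```

The table implementation would still validate individual attributes (a permitted UPDATE can still target a read-only attribute), so this gate only removes whole classes of statements up front.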
[jira] [Comment Edited] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539841#comment-14539841 ] Tupshin Harper edited comment on CASSANDRA-7622 at 5/12/15 1:54 PM: I think the assumption should be that each virtual table supports a subset of the queries performed on regular tables. If the virtual table can support all operations great, but otherwise noops or unsupported exceptions should be fine if a given operation doesn't make sense for the table. The locality of the data (and whether distributed or not), should be internal to the implementation of each virtual table. Using JMX, I suggest this as a simplified starting point: {code} CREATE TABLE jmx ( node_id uuid, metric_type text, attributes map, host_ip text static, PRIMARY KEY ((node_id), metric_type) ) CREATE INDEX host_by_ip ON jmx (host_ip) #this will work after CASSANDRA-8103 SELECT metric_type, attributes FROM jmx where node_id = 'eedea3e3-e36d-4371-8937-57f5a8303165' #returns all metrics for a given node SELECT attributes FROM jmx where host_ip = '10.10.10.10' and metric_type='CompactionManager' #returns all compaction metrics for a given node, looking up the node by a pseudo secondary index {code} was (Author: tupshin): I think the assumption should be that each virtual table supports a subset of the queries performed on regular tables. If the virtual table can support all operations great, but otherwise noops or unsupported exceptions should be fine if a given operation doesn't make sense for the table. The locality of the data (and whether distributed or not), should be internal to the implementation of each virtual table. 
Using JMX, I suggest this as a simplified starting point: {code} CREATE TABLE jmx ( node_id uuid, metric_type text, attributes map, host_ip text static, PRIMARY KEY ((node_id), metric_type) ) CREATE INDEX host_by_ip ON jmx (host_ip) #this will work after CASSANDRA-8103 SELECT metrics_type, attributes FROM jmx where node_id = 'eedea3e3-e36d-4371-8937-57f5a8303165' #returns all metrics for a given node SELECT attributes FROM jmx where host_ip = '10.10.10.10' and metric_type='CompactionManager' #returns all compaction metrics for a given node, looking up the node by a pseudo secondary index {code} > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper >Assignee: Benjamin Lerer > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539841#comment-14539841 ] Tupshin Harper commented on CASSANDRA-7622: --- I think the assumption should be that each virtual table supports a subset of the queries performed on regular tables. If the virtual table can support all operations, great; otherwise noops or unsupported exceptions should be fine if a given operation doesn't make sense for the table. The locality of the data (and whether distributed or not) should be internal to the implementation of each virtual table. Using JMX, I suggest this as a simplified starting point: {code} CREATE TABLE jmx ( node_id uuid, metric_type text, attributes map<text, text>, host_ip text static, PRIMARY KEY ((node_id), metric_type) ) CREATE INDEX host_by_ip ON jmx (host_ip) #this will work after CASSANDRA-8103 SELECT attributes FROM jmx where node_id = 'eedea3e3-e36d-4371-8937-57f5a8303165' #returns all metrics for a given node SELECT attributes FROM jmx where host_ip = '10.10.10.10' and metric_type='CompactionManager' #returns all compaction metrics for a given node, looking up the node by a pseudo secondary index {code} > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper >Assignee: Benjamin Lerer > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533578#comment-14533578 ] Tupshin Harper commented on CASSANDRA-7622: --- The correct decision to make JMX bind to localhost only for security reasons creates additional importance and urgency for this as a feature. I'd like to promote it from hand-wavey 3.x to more concrete 3.1 in hopes that it wouldn't slip from there. We really need to simplify the access patterns and reduce the surface area. > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper >Assignee: Benjamin Lerer > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9242) Add PerfDisableSharedMem to default JVM params
[ https://issues.apache.org/jira/browse/CASSANDRA-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512125#comment-14512125 ] Tupshin Harper commented on CASSANDRA-9242: --- Big plus one on this. Since that linked article came out, I've heard of a couple of cases where this was tried, and in each case, it helped with long tail latencies. > Add PerfDisableSharedMem to default JVM params > -- > > Key: CASSANDRA-9242 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9242 > Project: Cassandra > Issue Type: Improvement > Components: Config >Reporter: Matt Stump > > We should add PerfDisableSharedMem to default JVM params. The JVM will save > stats to a memory mapped file when reaching a safepoint. This is performed > synchronously and the JVM remains paused while this action takes place. > Occasionally the OS will stall the calling thread while this happens > resulting in significant impact to worst case JVM pauses. By disabling the > save in the JVM these mysterious multi-second pauses disappear. > The behavior is outlined in [this > article|http://www.evanjones.ca/jvm-mmap-pause.html]. Another manifestation > is significant time spent in sys during GC pauses. In [the linked > test|http://cstar.datastax.com/graph?stats=762d9c2a-eace-11e4-8236-42010af0688f&metric=gc_max_ms&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=110.77&ymin=0&ymax=10421.4] > you'll notice multiple seconds spent in sys during the longest pauses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
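For anyone wanting to try this ahead of a default change, a sketch of the change in cassandra-env.sh (the exact file and variable vary by Cassandra version and packaging, so treat this as illustrative):

{code}
# Keep the JVM from writing its memory-mapped hsperfdata stats file at safepoints,
# avoiding the page-cache stalls described in the linked article
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
{code}

One trade-off to note: tools that read hsperfdata, such as jps and jstat, will no longer see the process once this flag is set.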
[jira] [Commented] (CASSANDRA-8692) Coalesce intra-cluster network messages
[ https://issues.apache.org/jira/browse/CASSANDRA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360966#comment-14360966 ] Tupshin Harper commented on CASSANDRA-8692: --- I commented on CASSANDRA-7032 that "It seems like there might be a way to constrain vnode RDF (replication distribution factor) in the general scope of this ticket as well." I feel like there are some very compelling availability arguments (in addition to these possible performance optimizations) in favor of being able to constrain how many other nodes (within a DC) that a given vnode-enabled node actually replicates with. e.g. you could have 256 vnodes, but guarantee that those 256 would only replicate to 32 (out of possibly thousands) of other nodes. > Coalesce intra-cluster network messages > --- > > Key: CASSANDRA-8692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8692 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.4 > > Attachments: batching-benchmark.png > > > While researching CASSANDRA-8457 we found that it is effective and can be > done without introducing additional latency at low concurrency/throughput. > The patch from that was used and found to be useful in a real life scenario > so I propose we implement this in 2.1 in addition to 3.0. > The change set is a single file and is small enough to be reviewable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330198#comment-14330198 ] Tupshin Harper commented on CASSANDRA-8844: --- To clarify what I think is the minimum viable feature set that Cassandra should support: # A DDL mechanism for turning on and off logging for a given table # Either file-based logging built in, or a pluggable interface where such logging could be built # If it's a pluggable interface, the ability to specify the classname of the logger in the DDL command Ideally, I'd love to see the pluggable interface to allow for other logging mechanisms, but for Cassandra itself to include a bare-bones logger that could be integrated with out of the box, and to serve as an example for how others should implement the interface. I certainly see the CQL delivery mechanism, as well as the more flexible logging (multiple logs per table along with filtering), as out of scope for this ticket. I would create another "future" one for both of those. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Tupshin Harper > Fix For: 3.1 > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. 
In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. 
Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > -
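To make the proposed DDL mechanism concrete, a hypothetical sketch (none of this syntax exists; the option and class names are invented purely for illustration):

{code}
-- hypothetical: enable the built-in file-based CDC logger for one table
ALTER TABLE sensors.readings WITH cdc_log = true;

-- hypothetical: plug in a custom logger implementation by classname
ALTER TABLE sensors.readings WITH cdc_log = true
  AND cdc_logger_class = 'com.example.cdc.KafkaLogger';

-- hypothetical: turn logging back off
ALTER TABLE sensors.readings WITH cdc_log = false;
{code}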
[jira] [Updated] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-8844: -- Description: "In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. Also, Change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources." -Wikipedia As Cassandra is increasingly being used as the Source of Record (SoR) for mission critical data in large enterprises, it is increasingly being called upon to act as the central hub of traffic and data flow to other systems. In order to try to address the general need, we (cc [~brianmhess]), propose implementing a simple data logging mechanism to enable per-table CDC patterns. h2. The goals: # Use CQL as the primary ingestion mechanism, in order to leverage its Consistency Level semantics, and in order to treat it as the single reliable/durable SoR for the data. # To provide a mechanism for implementing good and reliable (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) continuous semi-realtime feeds of mutations going into a Cassandra cluster. # To eliminate the developmental and operational burden of users so that they don't have to do dual writes to other systems. # For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding. h2. The mechanism: We propose a durable logging mechanism that functions similar to a commitlog, with the following nuances: - Takes place on every node, not just the coordinator, so RF number of copies are logged. - Separate log per table. - Per-table configuration. Only tables that are specified as CDC_LOG would do any logging. - Per DC. 
We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs that are being replicated to - In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions. - Be written in a Row-centric manner such that it is easy for consumers to reconstitute rows atomically. - Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages h2. Nice-to-haves I strongly suspect that the following features will be asked for, but I also believe that they can be deferred for a subsequent release, and to gauge actual interest. - Multiple logs per table. This would make it easy to have multiple "subscribers" to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer. - Log filtering. Being able to apply filters, including UDF-based filters would make Cassandra a much more versatile feeder into other systems, and again, reduce complexity that would otherwise need to be built into the daemons. h2. Format and Consumption - Cassandra would only write to the CDC log, and never delete from it. - Cleaning up consumed logfiles would be the client daemon's responsibility - Logfile size should probably be configurable. - Logfiles should be named with a predictable naming schema, making it trivial to process them in order. - Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory. 
- A sophisticated daemon should be able to be written that could -- Catch up, in written-order, even when it is multiple logfiles behind in processing -- Be able to continuously "tail" the most recent logfile and get low-latency(ms?) access to the data as it is written. h2. Alternate approach In order to make consuming a change log easy and efficient to do with low latency, the following could supplement the approach outlined above - Instead of writing to a logfile, by default, Cassandra could expose a socket for a daemon to connect to, and from which it could pull each row. - Cassandra would have a limited buffer for storing rows, should the listener become backlogged, but it would immediately spill to disk in that case, never incurring large in-memory costs. h2. Additional consumption possibility With all of the above, still relevant: - instead of (or in addition to) using the other logging mechanisms, use CQL transport itself as a logger. - Extend the CQL protocol slightly so that rows of data can be returned to a listener that didn't explicitly make a query, but instead registered itself with Cassandra as a listener for a particular event
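The checkpoint-and-catch-up behavior the daemon would need can be sketched in a few lines of Python. This is a minimal, hypothetical consumer-side sketch only: the logfile naming (lexical order = written order) and the checkpoint file format are invented for illustration, not part of any proposed Cassandra format.

```python
import os

CHECKPOINT = "consumer.checkpoint"  # hypothetical artifact the daemon leaves in the CDC log directory

def read_checkpoint(log_dir):
    """Return the name of the last fully consumed logfile, or None on first run."""
    path = os.path.join(log_dir, CHECKPOINT)
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip() or None
    return None

def write_checkpoint(log_dir, filename):
    """Record that every logfile up to and including `filename` has been consumed."""
    with open(os.path.join(log_dir, CHECKPOINT), "w") as f:
        f.write(filename)

def consume(log_dir, handle_row):
    """Process unconsumed logfiles in name (i.e. written) order, checkpointing after each."""
    done = read_checkpoint(log_dir)
    logs = sorted(n for n in os.listdir(log_dir) if n != CHECKPOINT)
    for name in logs:
        if done is not None and name <= done:
            continue  # already consumed on a previous run
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                handle_row(line.rstrip("\n"))
        write_checkpoint(log_dir, name)  # checkpoint only whole files, so a crash replays at most one file
```

A real daemon would additionally "tail" the newest file for low latency; checkpointing only completed files is what gives the deliver-at-least-once guarantee described above.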
[jira] [Updated] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-8844: -- Description: "In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. Also, Change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources." -Wikipedia As Cassandra is increasingly being used as the Source of Record (SoR) for mission critical data in large enterprises, it is increasingly being called upon to act as the central hub of traffic and data flow to other systems. In order to try to address the general need, we (cc [~brianmhess]), propose implementing a simple data logging mechanism to enable per-table CDC patterns. h2. The goals: # Use CQL as the primary ingestion mechanism, in order to leverage its Consistency Level semantics, and in order to treat it as the single reliable/durable SoR for the data. # To provide a mechanism for implementing good and reliable (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) continuous semi-realtime feeds of mutations going into a Cassandra cluster. # To eliminate the developmental and operational burden of users so that they don't have to do dual writes to other systems. # For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding. The mechanism: We propose a durable logging mechanism that functions similar to a commitlog, with the following nuances: - Takes place on every node, not just the coordinator, so RF number of copies are logged. - Separate log per table. - Per-table configuration. Only tables that are specified as CDC_LOG would do any logging. - Per DC. 
We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs that are being replicated to - In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions. - Be written in a Row-centric manner such that it is easy for consumers to reconstitute rows atomically. - Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages h2. Nice-to-haves I strongly suspect that the following features will be asked for, but I also believe that they can be deferred for a subsequent release, and to gauge actual interest. - Multiple logs per table. This would make it easy to have multiple "subscribers" to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer. - Log filtering. Being able to apply filters, including UDF-based filters would make Cassandra a much more versatile feeder into other systems, and again, reduce complexity that would otherwise need to be built into the daemons. h2. Format and Consumption - Cassandra would only write to the CDC log, and never delete from it. - Cleaning up consumed logfiles would be the client daemon's responsibility - Logfile size should probably be configurable. - Logfiles should be named with a predictable naming schema, making it trivial to process them in order. - Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory. 
- A sophisticated daemon should be able to be written that could -- Catch up, in written-order, even when it is multiple logfiles behind in processing -- Be able to continuously "tail" the most recent logfile and get low-latency(ms?) access to the data as it is written. h2. Alternate approach In order to make consuming a change log easy and efficient to do with low latency, the following could supplement the approach outlined above - Instead of writing to a logfile, by default, Cassandra could expose a socket for a daemon to connect to, and from which it could pull each row. - Cassandra would have a limited buffer for storing rows, should the listener become backlogged, but it would immediately spill to disk in that case, never incurring large in-memory costs. h2. Additional consumption possibility With all of the above, still relevant: - instead of (or in addition to) using the other logging mechanisms, use CQL transport itself as a logger. - Extend the CQL protocol slightly so that rows of data can be returned to a listener that didn't explicitly make a query, but instead registered itself with Cassandra as a listener for a particular event type
[jira] [Created] (CASSANDRA-8844) Change Data Capture (CDC)
Tupshin Harper created CASSANDRA-8844: - Summary: Change Data Capture (CDC) Key: CASSANDRA-8844 URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tupshin Harper Fix For: 3.1 "In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. Also, Change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources." -Wikipedia As Cassandra is increasingly being used as the Source of Record (SoR) for mission critical data in large enterprises, it is increasingly being called upon to act as the central hub of traffic and data flow to other systems. In order to try to address the general need, we (cc [~brianmhess]), propose implementing a simple data logging mechanism to enable per-table CDC patterns. h2. The goals: # Use CQL as the primary ingestion mechanism, in order to leverage its Consistency Level semantics, and in order to treat it as the single reliable/durable SoR for the data. # To provide a mechanism for implementing good and reliable (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) continuous semi-realtime feeds of mutations going into a Cassandra cluster. # To eliminate the developmental and operational burden of users so that they don't have to do dual writes to other systems. # For users that are currently doing batch export from a Cassandra system, give them the opportunity to make that realtime with a minimum of coding. The mechanism: We propose a durable logging mechanism that functions similar to a commitlog, with the following nuances: - Takes place on every node, not just the coordinator, so RF number of copies are logged. - Separate log per table. - Per-table configuration. 
Only tables that are specified as CDC_LOG would do any logging. - Per DC. We are trying to keep the complexity to a minimum to make this an easy enhancement, but most likely use cases would prefer to only implement CDC logging in one (or a subset) of the DCs that are being replicated to - In the critical path of ConsistencyLevel acknowledgment. Just as with the commitlog, failure to write to the CDC log should fail that node's write. If that means the requested consistency level was not met, then clients *should* experience UnavailableExceptions. - Be written in a Row-centric manner such that it is easy for consumers to reconstitute rows atomically. - Written in a simple format designed to be consumed *directly* by daemons written in non-JVM languages h2. Nice-to-haves I strongly suspect that the following features will be asked for, but I also believe that they can be deferred for a subsequent release, and to gauge actual interest. - Multiple logs per table. This would make it easy to have multiple "subscribers" to a single table's changes. A workaround would be to create a forking daemon listener, but that's not a great answer. - Log filtering. Being able to apply filters, including UDF-based filters would make Cassandra a much more versatile feeder into other systems, and again, reduce complexity that would otherwise need to be built into the daemons. h2. Format and Consumption - Cassandra would only write to the CDC log, and never delete from it. - Cleaning up consumed logfiles would be the client daemon's responsibility - Logfile size should probably be configurable. - Logfiles should be named with a predictable naming schema, making it trivial to process them in order. - Daemons should be able to checkpoint their work, and resume from where they left off. This means they would have to leave some file artifact in the CDC log's directory. 
- A sophisticated daemon should be able to be written that could -- Catch up, in written-order, even when it is multiple logfiles behind in processing -- Be able to continuously "tail" the most recent logfile and get low-latency(ms?) access to the data as it is written. h2. Alternate approach In order to make consuming a change log easy and efficient to do with low latency, the following could supplement the approach outlined above - Instead of writing to a logfile, by default, Cassandra could expose a socket for a daemon to connect to, and from which it could pull each row. - Cassandra would have a limited buffer for storing rows, should the listener become backlogged, but it would immediately spill to disk in that case, never incurring large in-memory costs. h2. Additional consumption possibility With all of the above, still relevant: - instead of (or in addition to) using the other logging mechanisms, use CQL transport itself as a logger. - Extend the CQL protocol slightly
[jira] [Commented] (CASSANDRA-8754) Required consistency level
[ https://issues.apache.org/jira/browse/CASSANDRA-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309509#comment-14309509 ] Tupshin Harper commented on CASSANDRA-8754: --- something like a set of ALLOWED_CONSISTENCY_LEVELS (maybe separate ones for reads and writes?) per table. The biggest benefit would be to enforce sanity on LWT operations not mixing with non-LWT, but in general, useful to reduce the amount of rope users have to hang themselves with. > Required consistency level > -- > > Key: CASSANDRA-8754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8754 > Project: Cassandra > Issue Type: New Feature >Reporter: Ryan Svihla > > Idea is to prevent a query based on a consistency level not being met. For > example we can specify that all queries should be at least CL LOCAL_QUORUM. > Lots of customers struggle with getting all their dev teams on board with > consistency levels and all the ramifications. The normal solution for this > has traditionally to build a service in front of Cassandra that the entire > dev team accesses. However, this has proven challenging for some > organizations to do correctly, and I think an easier approach would be to > require a given consistency level as a matter of enforced policy in the > database. > I'm open for where this belongs. The most flexible approach is at a table > level, however I'm concerned this is potentially error prone and labor > intensive. It could be a table attribute similar to compaction strategy. > The simplest administratively is a cluster level, in say the cassandra.yaml > The middle ground is at they keyspace level, the only downside I could > foresee is keyspace explosion to fit involved minimum schemes. It could be a > keyspace attribute such as replication strategy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
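A hypothetical shape for such a table-level restriction (the syntax and option names are invented here purely to illustrate the idea; nothing like this exists today):

{code}
-- hypothetical: restrict which consistency levels this table accepts
ALTER TABLE bank.accounts
  WITH allowed_read_consistency_levels = {'LOCAL_QUORUM', 'SERIAL'}
  AND allowed_write_consistency_levels = {'LOCAL_QUORUM'};

-- a request at, say, CL.ONE against this table would then be rejected up front
{code}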
[jira] [Commented] (CASSANDRA-8754) Required consistency level
[ https://issues.apache.org/jira/browse/CASSANDRA-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309493#comment-14309493 ] Tupshin Harper commented on CASSANDRA-8754: --- -1 on cluster level (too limiting, IMO), but big +1 for table level restrictions of CL > Required consistency level > -- > > Key: CASSANDRA-8754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8754 > Project: Cassandra > Issue Type: New Feature >Reporter: Ryan Svihla > > Idea is to prevent a query based on a consistency level not being met. > Lots of customers struggle with getting all their dev teams on board with > consistency levels and all the ramifications. The normal solution for this > has traditionally been to build a service in front of Cassandra that the entire > dev team accesses. However, this has proven challenging for some > organizations to do correctly, and I think an easier approach would be to > require a given consistency level as a matter of enforced policy in the > database. > I'm open for where this belongs. The most flexible approach is at a table > level, however I'm concerned this is potentially error prone and labor > intensive. It could be a table attribute similar to compaction strategy. > The simplest administratively is a cluster level, in say the cassandra.yaml > The middle ground is at the keyspace level, the only downside I could > foresee is keyspace explosion to fit involved minimum schemes. It could be a > keyspace attribute such as replication strategy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
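A small driver-side guard gives a feel for the per-table restriction these comments argue for. This is a hypothetical sketch: Cassandra has no ALLOWED_CONSISTENCY_LEVELS table option today, and `ALLOWED_CL` / `check_consistency` are invented names, not any driver's API.

```python
# table -> (allowed read CLs, allowed write CLs); None means unrestricted.
# Separate read/write sets, per the comment; a LWT-only table can forbid
# non-serial writes so LWT and non-LWT operations never mix.
ALLOWED_CL = {
    "ks.users": ({"LOCAL_QUORUM", "QUORUM"}, {"LOCAL_QUORUM"}),
    "ks.ledger": ({"SERIAL"}, {"SERIAL"}),
}

def check_consistency(table: str, cl: str, is_write: bool) -> None:
    """Raise if the requested consistency level is not permitted for the table."""
    reads, writes = ALLOWED_CL.get(table, (None, None))
    allowed = writes if is_write else reads
    if allowed is not None and cl not in allowed:
        kind = "write" if is_write else "read"
        raise ValueError(
            f"{kind} at {cl} not permitted on {table}; "
            f"allowed: {sorted(allowed)}")
```

In the real feature this check would be enforced server-side as schema policy; the sketch only shows the shape of the rule.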
[jira] [Commented] (CASSANDRA-8692) Coalesce intra-cluster network messages
[ https://issues.apache.org/jira/browse/CASSANDRA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299100#comment-14299100 ] Tupshin Harper commented on CASSANDRA-8692: --- At least 2.1 inclusion please. This is looking to be a pretty substantial win. > Coalesce intra-cluster network messages > --- > > Key: CASSANDRA-8692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8692 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Attachments: batching-benchmark.png > > > While researching CASSANDRA-8457 we found that it is effective and can be > done without introducing additional latency at low concurrency/throughput. > The patch from that was used and found to be useful in a real life scenario > so I propose we implement this in 2.1 in addition to 3.0. > The change set is a single file and is small enough to be reviewable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
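A toy model of the coalescing idea: gather outbound intra-cluster messages until either a batch-size or a time threshold is hit, then hand the whole batch to the network layer in one call, amortizing per-message overhead. Illustrative Python with invented names, not Cassandra's internal API; the real patch works on the inter-node messaging path.

```python
import time

class Coalescer:
    """Batch messages by count or elapsed time before flushing (illustrative)."""

    def __init__(self, flush_fn, max_batch=64, max_wait_s=0.0002):
        self.flush_fn = flush_fn      # sends one batch, e.g. one syscall/packet
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.pending = []
        self.first_enqueue = None

    def send(self, msg):
        if not self.pending:
            self.first_enqueue = time.monotonic()
        self.pending.append(msg)
        # Flush when the batch is full, or the oldest message has waited
        # long enough that batching would start adding visible latency.
        if (len(self.pending) >= self.max_batch
                or time.monotonic() - self.first_enqueue >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []
```

With a small `max_wait_s`, low-throughput traffic flushes almost immediately (little added latency), while high-throughput traffic naturally fills batches, which matches the finding quoted in the ticket.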
[jira] [Created] (CASSANDRA-8586) support millions of sstables by lazily acquiring/caching/dropping filehandles
Tupshin Harper created CASSANDRA-8586: - Summary: support millions of sstables by lazily acquiring/caching/dropping filehandles Key: CASSANDRA-8586 URL: https://issues.apache.org/jira/browse/CASSANDRA-8586 Project: Cassandra Issue Type: New Feature Reporter: Tupshin Harper Assignee: Aleksey Yeschenko This might turn into a meta ticket if other obstacles are found in the goal of supporting a huge number of sstables. Technically, the only gap that I know of to prevent us from supporting absurd numbers of sstables is the fact that we hold on to an open filehandle for every single sstable. For use cases that are willing to take a hit to read-performance in order to achieve high densities and low write amplification, a mechanism for only retaining file handles for recently read sstables could be very valuable. This will allow for alternate compaction strategies and compaction strategy tuning that don't try to optimize for read performance as aggressively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
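The "only retain file handles for recently read sstables" mechanism is essentially an LRU cache of open files. A minimal sketch, assuming a plain least-recently-used policy (`FileHandleCache` is an invented name, not proposed Cassandra code):

```python
from collections import OrderedDict

class FileHandleCache:
    """Keep at most `capacity` files open; lazily reopen on demand and
    close the least-recently-read handle on overflow (illustrative)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.handles = OrderedDict()  # path -> open file, in LRU order

    def get(self, path: str):
        if path in self.handles:
            self.handles.move_to_end(path)      # mark as recently read
            return self.handles[path]
        if len(self.handles) >= self.capacity:  # drop the coldest handle
            _, coldest = self.handles.popitem(last=False)
            coldest.close()
        f = open(path, "rb")                    # lazily acquire
        self.handles[path] = f
        return f

    def close_all(self):
        for f in self.handles.values():
            f.close()
        self.handles.clear()
```

The read-performance hit the ticket accepts shows up as the reopen cost whenever a cold sstable is read again.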
[jira] [Commented] (CASSANDRA-7666) Range-segmented sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257751#comment-14257751 ] Tupshin Harper commented on CASSANDRA-7666: --- I think that it's sufficient to let this be dormant until or unless it is needed to support other features. DTCS covers most of the immediate benefit. Future possible features such as tiered storage and the ability to drop whole segments at a time, however, mean that we should not defer this one indefinitely. > Range-segmented sstables > > > Key: CASSANDRA-7666 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7666 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Sam Tunnicliffe > Fix For: 3.0 > > > It would be useful to segment sstables by data range (not just token range as > envisioned by CASSANDRA-6696). > The primary use case is to allow deleting those data ranges for "free" by > dropping the sstables involved. We should also (possibly as a separate > ticket) be able to leverage this information in query planning to avoid > unnecessary sstable reads. > Relational databases typically call this "partitioning" the table, but > obviously we use that term already for something else: > http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html > Tokutek's take for mongodb: > http://docs.tokutek.com/tokumx/tokumx-partitioned-collections.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7275) Errors in FlushRunnable may leave threads hung
[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250554#comment-14250554 ] Tupshin Harper commented on CASSANDRA-7275: --- Strongly in favor of the opt in policy based approach that [~jbellis] mentioned. There isn't a one size fits all approach to deal with this > Errors in FlushRunnable may leave threads hung > -- > > Key: CASSANDRA-7275 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7275 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Tyler Hobbs >Assignee: Pavel Yaskevich >Priority: Minor > Fix For: 2.0.12 > > Attachments: 0001-Move-latch.countDown-into-finally-block.patch, > 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch > > > In Memtable.FlushRunnable, the CountDownLatch will never be counted down if > there are errors, which results in hanging any threads that are waiting for > the flush to complete. For example, an error like this causes the problem: > {noformat} > ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line > 198) Exception in thread Thread[FlushWriter:474,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.position(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138) > at > org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103) > at > org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439) > at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194) > at > org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397) > at > org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350) > at > 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
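The fix named in the attached patch title — moving `latch.countDown()` into a `finally` block — can be shown in miniature. This is a Python analogue (an `Event` standing in for Java's `CountDownLatch`), for illustration only:

```python
import threading

def run_flush(latch: threading.Event, write_sorted_contents) -> None:
    """Count the latch down in `finally`, so threads waiting on the
    flush are released even when writing the memtable contents fails."""
    try:
        write_sorted_contents()
    finally:
        latch.set()  # analogous to CountDownLatch.countDown()
```

Without the `finally`, an exception like the `IllegalArgumentException` above would skip the countdown and leave every waiter hung forever.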
[jira] [Commented] (CASSANDRA-8494) incremental bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249114#comment-14249114 ] Tupshin Harper commented on CASSANDRA-8494: --- bq. I think the improved feedback will make a huge difference for people wondering if bootstrap is working! So much this. Would make the ticket worthwhile by itself. > incremental bootstrap > - > > Key: CASSANDRA-8494 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8494 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jon Haddad >Assignee: Yuki Morishita >Priority: Minor > Labels: density > Fix For: 3.0 > > > Current bootstrapping involves (to my knowledge) picking tokens and streaming > data before the node is available for requests. This can be problematic with > "fat nodes", since it may require 20TB of data to be streamed over before the > machine can be useful. This can result in a massive window of time before > the machine can do anything useful. > As a potential approach to mitigate the huge window of time before a node is > available, I suggest modifying the bootstrap process to only acquire a single > initial token before being marked UP. This would likely be a configuration > parameter "incremental_bootstrap" or something similar. > After the node is bootstrapped with this one token, it could go into UP > state, and could then acquire additional tokens (one or a handful at a time), > which would be streamed over while the node is active and serving requests. > The benefit here is that with the default 256 tokens a node could become an > active part of the cluster with less than 1% of its final data streamed over. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting
[ https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230450#comment-14230450 ] Tupshin Harper commented on CASSANDRA-8371: --- FWIW, I believe that setting the parameter to less than a day will be a common case, and not an unusual one. For write-heavy, high velocity workloads, the additional read cost of reading from an extra (repair-created) sstable in the case of repair taking place after the segment is frozen will often be the correct optimization, in order to minimize write amplification at the expense of tiny additional read overhead. > DateTieredCompactionStrategy is always compacting > -- > > Key: CASSANDRA-8371 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8371 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: mck >Assignee: Björn Hegerfors > Labels: compaction, performance > Attachments: java_gc_counts_rate-month.png, > read-latency-recommenders-adview.png, read-latency.png, > sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png > > > Running 2.0.11 and having switched a table to > [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that > disk IO and gc count increase, along with the number of reads happening in > the "compaction" hump of cfhistograms. > Data, and generally performance, looks good, but compactions are always > happening, and pending compactions are building up. > The schema for this is > {code}CREATE TABLE search ( > loginid text, > searchid timeuuid, > description text, > searchkey text, > searchurl text, > PRIMARY KEY ((loginid), searchid) > );{code} > We're sitting on about 82G (per replica) across 6 nodes in 4 DCs. 
> CQL executed against this keyspace, and traffic patterns, can be seen in > slides 7+8 of https://prezi.com/b9-aj6p2esft/ > Attached are sstables-per-read and read-latency graphs from cfhistograms, and > screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), > to DTCS (week ~46). > These screenshots are also found in the prezi on slides 9-11. > [~pmcfadin], [~Bj0rn], > Can this be a consequence of occasional deleted rows, as is described under > (3) in the description of CASSANDRA-6602 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226474#comment-14226474 ] Tupshin Harper commented on CASSANDRA-7438: --- [~xedin] I'm lost in too many layers of snark and indirection (not just yours). Can you elaborate on what strategy you actually find appealing? > Serializing Row cache alternative (Fully off heap) > -- > > Key: CASSANDRA-7438 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7438 > Project: Cassandra > Issue Type: Improvement > Components: Core > Environment: Linux >Reporter: Vijay >Assignee: Vijay > Labels: performance > Fix For: 3.0 > > Attachments: 0001-CASSANDRA-7438.patch > > > Currently SerializingCache is partially off heap, keys are still stored in > JVM heap as BB, > * There is a higher GC cost for a reasonably big cache. > * Some users have used the row cache efficiently in production for better > results, but this requires careful tuning. > * Overhead in memory for the cache entries is relatively high. > So the proposal for this ticket is to move the LRU cache logic completely off > heap and use JNI to interact with cache. We might want to ensure that the new > implementation matches the existing APIs (ICache), and the implementation > needs to have safe memory access, low overhead in memory and less memcpy's > (As much as possible). > We might also want to make this cache configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
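A minimal model of the serializing-cache idea, assuming the key point is simply that cached values live as opaque serialized bytes (standing in for off-heap memory, so the garbage collector never traverses them) and are copied/deserialized on each read. Illustrative Python only, not the proposed JNI implementation:

```python
import pickle
from collections import OrderedDict

class SerializingLRUCache:
    """LRU cache whose values are stored as opaque byte strings, modeling
    an off-heap cache: CPU (serialize/copy) is traded for GC pressure."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # key -> serialized bytes, in LRU order

    def put(self, key, value):
        self.data[key] = pickle.dumps(value)   # "move off heap"
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)      # evict the LRU entry

    def get(self, key):
        blob = self.data.get(key)
        if blob is None:
            return None
        self.data.move_to_end(key)
        return pickle.loads(blob)              # copy back "on heap"
```

Every `get` allocates a fresh deserialized object, which is exactly the extra memcpy/deserialization cost the ticket wants to keep as low as possible.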
[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting
[ https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226341#comment-14226341 ] Tupshin Harper commented on CASSANDRA-8371: --- And I'd also like to see an option to change max_sstable_age_days to be a smaller unit of time. Right now, you can only set it to integer days. Particularly with high ingestion rates, and low TTL, I see legitimate use cases where that could benefit from being as low as an hour, or even less, in order to minimize any write amplification. Just switching to use seconds as the unit of time here would make a lot of sense to me. 365 days would then be expressible as 31536000. :) > DateTieredCompactionStrategy is always compacting > -- > > Key: CASSANDRA-8371 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8371 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: mck >Assignee: Björn Hegerfors > Labels: compaction, performance > Attachments: java_gc_counts_rate-month.png, read-latency.png, > sstables.png, vg2_iad-month.png > > > Running 2.0.11 and having switched a table to > [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that > disk IO and gc count increase, along with the number of reads happening in > the "compaction" hump of cfhistograms. > Data, and generally performance, looks good, but compactions are always > happening, and pending compactions are building up. > The schema for this is > {code}CREATE TABLE search ( > loginid text, > searchid timeuuid, > description text, > searchkey text, > searchurl text, > PRIMARY KEY ((loginid), searchid) > );{code} > We're sitting on about 82G (per replica) across 6 nodes in 4 DCs. 
> CQL executed against this keyspace, and traffic patterns, can be seen in > slides 7+8 of https://prezi.com/b9-aj6p2esft > Attached are sstables-per-read and read-latency graphs from cfhistograms, and > screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), > to DTCS (week ~46). > These screenshots are also found in the prezi on slides 9-11. > [~pmcfadin], [~Bj0rn], > Can this be a consequence of occasional deleted rows, as is described under > (3) in the description of CASSANDRA-6602 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
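The unit conversion behind the seconds-instead-of-days suggestion is trivial but worth making concrete (assuming plain days-to-seconds arithmetic; `days_to_seconds` is an invented helper, not a Cassandra option):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def days_to_seconds(days: float) -> int:
    """Express a max-sstable-age horizon in seconds rather than whole days,
    allowing sub-day values such as an hour or less."""
    return round(days * SECONDS_PER_DAY)
```

A year-long horizon becomes 31536000 seconds, and the one-hour case the comment mentions becomes 3600, something integer days cannot express.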
[jira] [Commented] (CASSANDRA-7826) support arbitrary nesting of collection
[ https://issues.apache.org/jira/browse/CASSANDRA-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200853#comment-14200853 ] Tupshin Harper commented on CASSANDRA-7826: --- Correct interpretation of my end goal. I'm neutral on the need/benefit of doing frozen/nested in 2.x. Personally I'd be OK deferring full nesting of collections until unfrozen nested support in 3.0. > support arbitrary nesting of collection > --- > > Key: CASSANDRA-7826 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7826 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper >Assignee: Tyler Hobbs > Labels: ponies > > The inability to nest collections is one of the bigger data modelling > limitations we have right now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8225) Production-capable COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194640#comment-14194640 ] Tupshin Harper commented on CASSANDRA-8225: --- fwiw, i agree wholeheartedly with sylvain. the cqlsh-based approach (executing python code) is a dead end for getting decent performance out of bulk loading. > Production-capable COPY FROM > > > Key: CASSANDRA-8225 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8225 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis > Fix For: 2.1.2 > > > Via [~schumacr], > bq. I pulled down a sourceforge data generator and created a mock file of > 500,000 rows that had an incrementing sequence number, date, and SSN. I then > used our COPY command and MySQL's LOAD DATA INFILE to load the file on my > Mac. Results were: > {noformat} > mysql> load data infile '/Users/robin/dev/datagen3.txt' into table p_test > fields terminated by ','; > Query OK, 500000 rows affected (2.18 sec) > {noformat} > C* 2.1.0 (pre-CASSANDRA-7405) > {noformat} > cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with > delimiter=','; > 500000 rows imported in 16 minutes and 45.485 seconds. > {noformat} > Cassandra 2.1.1: > {noformat} > cqlsh:dev> copy p_test from '/Users/robin/dev/datagen3.txt' with > delimiter=','; > Processed 500000 rows; Write: 4037.46 rows/s > 500000 rows imported in 2 minutes and 3.058 seconds. > {noformat} > [jbellis] 7405 gets us almost an order of magnitude improvement. > Unfortunately we're still almost 2 orders slower than mysql. > I don't think we can continue to tell people, "use sstableloader instead." > The number of users sophisticated enough to use the sstable writers is small > and (relatively) decreasing as our user base expands. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8168) Require Java 8
[ https://issues.apache.org/jira/browse/CASSANDRA-8168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187203#comment-14187203 ] Tupshin Harper commented on CASSANDRA-8168: --- I'm also +1 on it, but more so if we can endorse openjdk8 (as opposed to just oracle jdk) from day 1. > Require Java 8 > -- > > Key: CASSANDRA-8168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8168 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani > Fix For: 3.0 > > > This is to discuss requiring Java 8 for version >= 3.0 > There are a couple big reasons for this. > * Better support for complex async work e.g (CASSANDRA-5239) > http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html > * Use Nashorn for Javascript UDFs CASSANDRA-7395 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (CASSANDRA-7028) Allow C* to compile under java 8
[ https://issues.apache.org/jira/browse/CASSANDRA-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper reopened CASSANDRA-7028: --- > Allow C* to compile under java 8 > > > Key: CASSANDRA-7028 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7028 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 2.1.1, 3.0 > > Attachments: 7028.txt, 7028_v2.txt, 7028_v3.txt, 7028_v4.txt, > 7028_v5.patch > > > antlr 3.2 has a problem with java 8, as described here: > http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8015656 > updating to antlr 3.5.2 solves this, however they have split up the jars > differently, which adds some changes, but also the generation of > CqlParser.java causes a method to be too large, so i needed to split that > method to reduce the size of it. > (patch against trunk) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7849) Server logged error messages (in binary protocol) for unexpected exceptions could be more helpful
[ https://issues.apache.org/jira/browse/CASSANDRA-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125066#comment-14125066 ] Tupshin Harper commented on CASSANDRA-7849: --- Strong +1 to disabling those kinds of messages except at debug level. Less noise, please. > Server logged error messages (in binary protocol) for unexpected exceptions > could be more helpful > - > > Key: CASSANDRA-7849 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7849 > Project: Cassandra > Issue Type: Improvement >Reporter: graham sanderson > Fix For: 1.2.19, 2.0.11 > > Attachments: cassandra-1.2-7849.txt > > > From time to time (actually quite frequently) we get error messages in the > server logs like this > {code} > ERROR [Native-Transport-Requests:288] 2014-08-29 04:48:07,118 > ErrorMessage.java (line 222) Unexpected exception during request > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > These particular cases are almost certainly problems with the client driver, > client machine, client process, 
however after the fact this particular > exception is practically impossible to debug because there is no indication > in the underlying JVM/netty exception of who the peer was. I should note we > have lots of different types of applications running against the cluster so > it is very hard to correlate these to anything -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7857) Ability to froze UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117740#comment-14117740 ] Tupshin Harper edited comment on CASSANDRA-7857 at 9/1/14 9:39 PM: --- I'll suggest "fixed", as an alternative to frozen, static, or serialized. was (Author: tupshin): I'll sugget "fixed", as an alternative to frozen, static, or serialized. > Ability to froze UDT > > > Key: CASSANDRA-7857 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7857 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne > Fix For: 2.1.0 > > Attachments: 7857-v2.txt, 7857.txt > > > Currently, UDT are serialized into a single value. For 3.0, we want to change > that somewhat and allow updating individual subfields: CASSANDRA-7423 (and > ultimately, we'll probably allow querying subpart of UDT to some extent). > Also for 3.0, we want to allow some nesting of collections (CASSANDRA-7826). > However, migrating the currently serialized UDT would be challenging. Besides > that, even with nested collections, we probably won't be able to support > nesting within map keys and sets without serializing (at the very least, not > initially). Also, it can be useful in some specific case to have UDT or > collections for PK columns, even if those are serialized. > So we need a better way to distinguish when composite types (collections & > UDT) are serialized (which implies you can't update subpart of the value, you > have to rewrite it fully) and when they are not. The suggestion is then to > introduce a new keyword, {{frozen}}, to indicate that a type is serialized: > {noformat} > CREATE TYPE foo (a int, b int); > CREATE TABLE bar ( > k frozen<foo> PRIMARY KEY, > m map<frozen<map<int, int>>, text> > ) > {noformat} > A big advantage is that it makes the downside (you can't update the value > without rewriting it all) clear and upfront. 
> Now, as of 2.1, we only support frozen UDT, and so we should make this clear > by 1) adding the frozen keyword and 2) don't allow use of UDT unless they are > "frozen" (since that's all we really support). This is what this ticket > proposes to do. And this should be done in 2.1.0 or this will be a breaking > change. > We will have a follow-up ticket that will extend {{frozen}} to collection, > but this is less urgent since this will be strictly an improvement. > I'll note that in term of syntax, {{serialized}} was suggested as an > alternative to {{frozen}}. I personally have a minor preference for > {{serialized}} but it was argued that it had a "sequential" connotation which > {{frozen}} don't have. Changing that is still up for discussion, but we need > to reach a decision quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7857) Ability to froze UDT
[ https://issues.apache.org/jira/browse/CASSANDRA-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117740#comment-14117740 ] Tupshin Harper commented on CASSANDRA-7857: --- I'll suggest "fixed", as an alternative to frozen, static, or serialized. > Ability to froze UDT > > > Key: CASSANDRA-7857 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7857 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne > Fix For: 2.1.0 > > Attachments: 7857-v2.txt, 7857.txt > > > Currently, UDT are serialized into a single value. For 3.0, we want to change > that somewhat and allow updating individual subfields: CASSANDRA-7423 (and > ultimately, we'll probably allow querying subpart of UDT to some extent). > Also for 3.0, we want to allow some nesting of collections (CASSANDRA-7826). > However, migrating the currently serialized UDT would be challenging. Besides > that, even with nested collections, we probably won't be able to support > nesting within map keys and sets without serializing (at the very least, not > initially). Also, it can be useful in some specific case to have UDT or > collections for PK columns, even if those are serialized. > So we need a better way to distinguish when composite types (collections & > UDT) are serialized (which implies you can't update subpart of the value, you > have to rewrite it fully) and when they are not. The suggestion is then to > introduce a new keyword, {{frozen}}, to indicate that a type is serialized: > {noformat} > CREATE TYPE foo (a int, b int); > CREATE TABLE bar ( > k frozen<foo> PRIMARY KEY, > m map<frozen<map<int, int>>, text> > ) > {noformat} > A big advantage is that it makes the downside (you can't update the value > without rewriting it all) clear and upfront. 
> Now, as of 2.1, we only support frozen UDT, and so we should make this clear > by 1) adding the frozen keyword and 2) don't allow use of UDT unless they are > "frozen" (since that's all we really support). This is what this ticket > proposes to do. And this should be done in 2.1.0 or this will be a breaking > change. > We will have a follow-up ticket that will extend {{frozen}} to collection, > but this is less urgent since this will be strictly an improvement. > I'll note that in term of syntax, {{serialized}} was suggested as an > alternative to {{frozen}}. I personally have a minor preference for > {{serialized}} but it was argued that it had a "sequential" connotation which > {{frozen}} don't have. Changing that is still up for discussion, but we need > to reach a decision quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7826) support arbitrary nesting of collection
[ https://issues.apache.org/jira/browse/CASSANDRA-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111690#comment-14111690 ] Tupshin Harper commented on CASSANDRA-7826: --- I'd much rather see it done right, with individual cell level access in 3.0 rather than rushed in. > support arbitrary nesting of collection > --- > > Key: CASSANDRA-7826 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7826 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper >Assignee: Tyler Hobbs > Labels: ponies > > The inability to nest collections is one of the bigger data modelling > limitations we have right now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-7028) Allow C* to compile under java 8
[ https://issues.apache.org/jira/browse/CASSANDRA-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110777#comment-14110777 ] Tupshin Harper edited comment on CASSANDRA-7028 at 8/26/14 2:54 PM: Re-opening and adding additional 2.1.1 target for [~tuxslayer] was (Author: tupshin): Re-opening and adding additional 2.1.1 target for [~skyline81] > Allow C* to compile under java 8 > > > Key: CASSANDRA-7028 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7028 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 2.1.1, 3.0 > > Attachments: 7028.txt, 7028_v2.txt, 7028_v3.txt, 7028_v4.txt, > 7028_v5.patch > > > antlr 3.2 has a problem with java 8, as described here: > http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8015656 > updating to antlr 3.5.2 solves this, however they have split up the jars > differently, which adds some changes, but also the generation of > CqlParser.java causes a method to be too large, so i needed to split that > method to reduce the size of it. > (patch against trunk) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7028) Allow C* to compile under java 8
[ https://issues.apache.org/jira/browse/CASSANDRA-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-7028: -- Fix Version/s: 2.1.1 Assignee: Aleksey Yeschenko (was: Dave Brosius) Re-opening and adding additional 2.1.1 target for [~skyline81] > Allow C* to compile under java 8 > > > Key: CASSANDRA-7028 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7028 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 2.1.1, 3.0 > > Attachments: 7028.txt, 7028_v2.txt, 7028_v3.txt, 7028_v4.txt, > 7028_v5.patch > > > antlr 3.2 has a problem with java 8, as described here: > http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8015656 > updating to antlr 3.5.2 solves this, however they have split up the jars > differently, which adds some changes, but also the generation of > CqlParser.java causes a method to be too large, so i needed to split that > method to reduce the size of it. > (patch against trunk) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7826) support arbitrary nesting of collection
[ https://issues.apache.org/jira/browse/CASSANDRA-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109449#comment-14109449 ] Tupshin Harper commented on CASSANDRA-7826: --- Actually, it might not require a specific nesting depth if you can nest the same UDT in itself (haven't tried since it doesn't matter in this case). > support arbitrary nesting of collection > --- > > Key: CASSANDRA-7826 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7826 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > > The inability to nest collections is one of the bigger data modelling > limitations we have right now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7826) support arbitrary nesting of collection
[ https://issues.apache.org/jira/browse/CASSANDRA-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109443#comment-14109443 ] Tupshin Harper commented on CASSANDRA-7826: --- UDT would require predefining a specific nesting depth, though that's not necessarily a huge obstacle. But without CASSANDRA-7423 I couldn't begin to recommend UDTs for most use cases. > support arbitrary nesting of collection > --- > > Key: CASSANDRA-7826 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7826 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > > The inability to nest collections is one of the bigger data modelling > limitations we have right now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7826) support arbitrary nesting of collection
Tupshin Harper created CASSANDRA-7826: - Summary: support arbitrary nesting of collection Key: CASSANDRA-7826 URL: https://issues.apache.org/jira/browse/CASSANDRA-7826 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Tupshin Harper Fix For: 3.0 The inability to nest collections is one of the bigger data modelling limitations we have right now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7642) Adaptive Consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107052#comment-14107052 ] Tupshin Harper commented on CASSANDRA-7642: --- I don't like the min/max consistency terminology in the context of: "Transparent downgrading violates the CL contract, and that contract is considered to be just about the most important element of Cassandra's runtime behaviour. Fully transparent downgrading without any contract is dangerous. However, would it be a problem if we specify explicitly only two discrete CL levels - MIN_CL and MAX_CL?" I strongly believe that it is a problem even with only two explicit levels specified. As such, I propose two changes to the spec: 1) the terminology changes from min/max to terms representing "block until" for max and "actual contractual consistency level" for min. 2) Even more critically, ensure that the protocol and driver provide a communication mechanism back to the client indicating, for every operation, which of the two CL levels was fulfilled by the request. > Adaptive Consistency > > > Key: CASSANDRA-7642 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7642 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Rustam Aliyev > Fix For: 3.0 > > > h4. Problem > At minimum, the application requires a consistency level of X, which must be a fault > tolerant CL. However, when there is no failure it would be advantageous to > use stronger consistency Y (Y>X). > h4. Suggestion > Application defines minimum (X) and maximum (Y) consistency levels. C* can > apply adaptive consistency logic to use Y whenever possible and downgrade to > X when failure occurs. > Implementation should not negatively impact performance. Therefore, state has > to be maintained globally (not per request). > h4. Example > {{MIN_CL=LOCAL_QUORUM}} > {{MAX_CL=EACH_QUORUM}} > h4. Use Case > Consider a case where the user wants to maximize their uptime and consistency. 
> They are designing a system using C* where transactions are read/written with > LOCAL_QUORUM and distributed across 2 DCs. Occasional inconsistencies between > DCs can be tolerated. R/W with LOCAL_QUORUM is satisfactory in most of the > cases. > Application requires new transactions to be read back right after they were > generated. Write and read could be done through different DCs (no > stickiness). In some cases when user writes into DC1 and reads immediately > from DC2, replication delay may cause problems. Transaction won't show up on > read in DC2, user will retry and create a duplicate transaction. Occasional > duplicates are fine and the goal is to minimize the number of dups. > Therefore, we want to perform writes with stronger consistency (EACH_QUORUM) > whenever possible without compromising on availability. Using adaptive > consistency they should be able to define: >{{Read CL = LOCAL_QUORUM}} >{{Write CL = ADAPTIVE (MIN:LOCAL_QUORUM, MAX:EACH_QUORUM)}} > A similar scenario can be described for the {{Write CL = ADAPTIVE (MIN:QUORUM, > MAX:ALL)}} case. > h4. Criticism > # This functionality can/should be implemented by the user himself. > bq. It will be hard for an average user to implement topology monitoring and > a state machine. Moreover, this is a pattern which repeats. > # Transparent downgrading violates the CL contract, and that contract > is considered to be just about the most important element of Cassandra's runtime > behavior. > bq. Fully transparent downgrading without any contract is dangerous. However, > would it be a problem if we specify explicitly only two discrete CL levels - > MIN_CL and MAX_CL? > # If you have split brain DCs (partitioned in CAP), you have to sacrifice > either consistency or availability, and auto downgrading sacrifices the > consistency in dangerous ways if the application isn't designed to handle it. 
> And if the application is designed to handle it, then it should be able to > handle it in normal circumstances, not just degraded/extraordinary ones. > bq. Agreed. Application should be designed for MIN_CL. In that case, MAX_CL > will not cause much harm, only add flexibility. > # It might be a better idea to loudly downgrade, instead of silently > downgrading, meaning that the client code does an explicit retry with lower > consistency on failure and takes some other kind of action to attempt to > inform either users or operators of the problem. It is the silent part of the > downgrading that could be dangerous. > bq. There are certainly cases where the user should be informed when consistency > changes in order to perform a custom action. For this purpose we could > allow/require the user to register a callback function which will be triggered when > the consistency level changes. Best practices could be enforced by requiring a > callback. -- This message
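The "loud downgrade" alternative raised in the criticism above can be sketched client-side. This is a minimal illustration only, not any real driver API: `CL`, `Unavailable`, `execute`, and `adaptive_write` are all hypothetical names.

```python
from enum import Enum

class CL(Enum):
    LOCAL_QUORUM = 1
    EACH_QUORUM = 2

class Unavailable(Exception):
    """Not enough live replicas for the requested consistency level."""

def adaptive_write(execute, statement, min_cl, max_cl, on_downgrade):
    """Attempt the write at max_cl (the "block until" level); on failure,
    retry at min_cl (the actual contractual level) and loudly report the
    downgrade, returning the level that was actually fulfilled."""
    try:
        execute(statement, max_cl)
        return max_cl
    except Unavailable:
        execute(statement, min_cl)
        on_downgrade(statement, max_cl, min_cl)
        return min_cl

# Simulate a partitioned remote DC: EACH_QUORUM fails, LOCAL_QUORUM succeeds.
def execute(statement, cl):
    if cl is CL.EACH_QUORUM:
        raise Unavailable()

downgrades = []
fulfilled = adaptive_write(execute, "INSERT ...", CL.LOCAL_QUORUM,
                           CL.EACH_QUORUM, lambda *a: downgrades.append(a))
```

The return value is exactly the per-operation feedback that point 2) of the comment asks the protocol and driver to carry back to the client.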
[jira] [Created] (CASSANDRA-7730) altering a table to add a static column bypasses clustering column requirement check
Tupshin Harper created CASSANDRA-7730: - Summary: altering a table to add a static column bypasses clustering column requirement check Key: CASSANDRA-7730 URL: https://issues.apache.org/jira/browse/CASSANDRA-7730 Project: Cassandra Issue Type: Bug Reporter: Tupshin Harper Fix For: 2.1.0 cqlsh:test_ks> create TABLE foo ( bar int, primary key (bar)); cqlsh:test_ks> alter table foo add bar2 text static; cqlsh:test_ks> describe table foo; CREATE TABLE foo ( bar int, bar2 text static, PRIMARY KEY ((bar)) ) cqlsh:test_ks> select * from foo; TSocket read 0 bytes ERROR [Thrift:12] 2014-08-09 15:08:22,518 CassandraDaemon.java (line 199) Exception in thread Thread[Thrift:12,5,main] java.lang.AssertionError at org.apache.cassandra.config.CFMetaData.getStaticColumnNameBuilder(CFMetaData.java:2142) at org.apache.cassandra.cql3.statements.SelectStatement.makeFilter(SelectStatement.java:454) at org.apache.cassandra.cql3.statements.SelectStatement.getRangeCommand(SelectStatement.java:360) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:206) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:61) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:158) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7370) Create a new system table "node_config" to load cassandra.yaml config data.
[ https://issues.apache.org/jira/browse/CASSANDRA-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074541#comment-14074541 ] Tupshin Harper commented on CASSANDRA-7370: --- While I'm very much in favor of this feature, I'd like to propose that the implementation get deferred and ultimately redone in terms of CASSANDRA-7622, so that we will have a more general mechanism for other similar needs. > Create a new system table "node_config" to load cassandra.yaml config data. > --- > > Key: CASSANDRA-7370 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7370 > Project: Cassandra > Issue Type: Wish > Components: Config >Reporter: Hayato Shimizu >Assignee: Robert Stupp >Priority: Minor > Labels: ponies > Attachments: 7370-v3.txt > > > Currently the node configuration information specified in cassandra.yaml can > only be viewed via JMX or by looking at the file on individual machines. > As an administrator, it would be extremely useful to be able to execute > queries like the following example; > select concurrent_reads from system.node_config; > which will list all the concurrent_reads value from all of the nodes in a > cluster. > This will require a new table in the system keyspace and the data to be > loaded (if required) during the bootstrap, and updated when MBeans attribute > value updates are performed. The data from other nodes in the cluster is also > required in the table. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7622) Implement virtual tables
Tupshin Harper created CASSANDRA-7622: - Summary: Implement virtual tables Key: CASSANDRA-7622 URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Tupshin Harper Fix For: 3.0 There are a variety of reasons to want virtual tables, which would be any table that would be backed by an API, rather than data explicitly managed and stored as sstables. One possible use case would be to expose JMX data through CQL as a resurrection of CASSANDRA-3527. Another is a more general framework to implement the ability to expose yaml configuration information. So it would be an alternate approach to CASSANDRA-7370. A possible implementation would be in terms of CASSANDRA-7443, but I am not presupposing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-7075: -- Fix Version/s: 3.0 > Add the ability to automatically distribute your commitlogs across all data > volumes > --- > > Key: CASSANDRA-7075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7075 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Tupshin Harper >Priority: Minor > Labels: performance > Fix For: 3.0 > > > given the prevalence of ssds (no need to separate commitlog and data), and > improved jbod support, along with CASSANDRA-3578, it seems like we should > have an option to have one commitlog per data volume, to even the load. i've > been seeing more and more cases where there isn't an obvious "extra" volume > to put the commitlog on, and sticking it on only one of the jbodded ssd > volumes leads to IO imbalance. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7026) CQL:WHERE ... IN with full partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-7026: -- Fix Version/s: (was: 3.0) > CQL:WHERE ... IN with full partition keys > - > > Key: CASSANDRA-7026 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7026 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Dan Hunt >Priority: Minor > Labels: cql > > It would be handy to be able to pass in a list of fully qualified composite > partition keys in an IN filter to retrieve multiple distinct rows with a > single select. Not entirely sure how that would work. It looks like maybe > it could be done with the existing token() function, like: > SELECT * FROM table WHERE token(keyPartA, keyPartB) IN (token(1, 1), token(4, > 2)) > Though, I guess you'd also want some way to pass a list of tokens to a > prepared statement through the driver. This of course all assumes that an IN > filter could be faster than a bunch of prepared statements, which might not > be true. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7026) CQL:WHERE ... IN with full partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-7026: -- Fix Version/s: 3.0 > CQL:WHERE ... IN with full partition keys > - > > Key: CASSANDRA-7026 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7026 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Dan Hunt >Priority: Minor > Labels: cql > > It would be handy to be able to pass in a list of fully qualified composite > partition keys in an IN filter to retrieve multiple distinct rows with a > single select. Not entirely sure how that would work. It looks like maybe > it could be done with the existing token() function, like: > SELECT * FROM table WHERE token(keyPartA, keyPartB) IN (token(1, 1), token(4, > 2)) > Though, I guess you'd also want some way to pass a list of tokens to a > prepared statement through the driver. This of course all assumes that an IN > filter could be faster than a bunch of prepared statements, which might not > be true. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7471) sstableloader should have the ability to strip ttls
Tupshin Harper created CASSANDRA-7471: - Summary: sstableloader should have the ability to strip ttls Key: CASSANDRA-7471 URL: https://issues.apache.org/jira/browse/CASSANDRA-7471 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: Tupshin Harper Priority: Minor When restoring data from backup, for reasons of data recovery or analysis, if the data was written with a TTL, then some or all of the data will be inaccessible unless you either force your entire cluster to have their clocks set in the past, or slowly and painfully use sstable2json, strip the ttls there, and then run json2sstable before loading. I propose a flag "-ignore-ttl" that could be passed to sstableloader that would automatically strip any ttls from cells as they are loaded. -- This message was sent by Atlassian JIRA (v6.2#6252)
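The stripping step of that sstable2json workaround might look like the sketch below. It assumes a simplified version of the JSON cell layout, where an expiring cell is a list whose fourth element is an "e" marker followed by the ttl and local expiration time; the exact layout varies by Cassandra version, so this is illustrative only.

```python
def strip_ttls(partition):
    """Rewrite expiring cells as ordinary live cells by truncating each
    ["name", "value", timestamp, "e", ttl, expires_at] entry down to its
    first three fields; regular cells pass through untouched."""
    out = dict(partition)
    out["columns"] = [
        cell[:3] if len(cell) > 3 and cell[3] == "e" else cell
        for cell in partition.get("columns", [])
    ]
    return out

# One partition roughly as sstable2json might emit it (simplified).
row = {
    "key": "74657374",
    "columns": [
        ["c1", "76616c7565", 1404000000000000],
        ["c2", "76616c7565", 1404000000000001, "e", 86400, 1404086400],
    ],
}
stripped = strip_ttls(row)
```

A real `-ignore-ttl` flag in sstableloader would do the equivalent transformation on cells as they stream, skipping the JSON round trip entirely.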
[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046902#comment-14046902 ] Tupshin Harper commented on CASSANDRA-7056: --- I also want to point out that [~iamaleksey]'s response to global indexes (CASSANDRA-6477) was: "I think we should leave it to people's client code. We don't need more complexity on our read/write paths when this can be done client-side." That combined with "alternatively, we just don't invent new unnecessary concepts (batch reads) to justify hypothetical things we could do that nobody asked us for" would leave us with absolutely no approach to achieve consistent cross-partition indexes through either client or server-side code. > Add RAMP transactions > - > > Key: CASSANDRA-7056 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7056 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Tupshin Harper >Priority: Minor > > We should take a look at > [RAMP|http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/] > transactions, and figure out if they can be used to provide more efficient > LWT (or LWT-like) operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7463) Update CQLSSTableWriter to allow parallel writing of SSTables on the same table within the same JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046082#comment-14046082 ] Tupshin Harper commented on CASSANDRA-7463: --- Since we push people towards doing SSTableLoading for fast import, and since the CQLSSTableWriter is the new shiny way to create sstables, we need to make it easy to generate sstables in parallel. High priority, imo. > Update CQLSSTableWriter to allow parallel writing of SSTables on the same > table within the same JVM > --- > > Key: CASSANDRA-7463 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7463 > Project: Cassandra > Issue Type: Improvement >Reporter: Johnny Miller > > Currently it is not possible to programmatically write multiple SSTables for > the same table in parallel using the CQLSSTableWriter. This is quite a > limitation and the workaround of attempting to do this in a separate JVM is > not a great solution. > See: > http://stackoverflow.com/questions/24396902/using-cqlsstablewriter-concurrently -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044162#comment-14044162 ] Tupshin Harper commented on CASSANDRA-7056: --- I am absolutely fine with vetting it as part of another feature (indexes) before exposing a new API to provide explicit support for RAMP transactions. I'm simply refuting the "hypothetical things we could do that nobody asked us for" part. Just because nobody thought to ask for this specific form of consistency doesn't mean the practical benefits are at all unclear. > Add RAMP transactions > - > > Key: CASSANDRA-7056 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7056 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Tupshin Harper >Priority: Minor > > We should take a look at > [RAMP|http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/] > transactions, and figure out if they can be used to provide more efficient > LWT (or LWT-like) operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043367#comment-14043367 ] Tupshin Harper edited comment on CASSANDRA-7056 at 6/25/14 12:25 PM: - Cross table consistent reads are of fundamental importance. Once you allow that they are useful for consistent index reads, then you have admitted that they are useful for direct consumption by users, since we are constantly advising them to build their own index solutions since 2i are horrendously weak. That pressure will be only slightly reduced with global indexes. Even separate from custom (client-side) 2i implementations, having all or nothing read visibility of writes spanning tables captures fundamental business logic that is either painfully worked around today, or else is glossed over as statistically unlikely (depending on the r/w patterns) and the race conditions duly ignored. It would be a tragic mistake to ignore the benefits of the gains in correctness that can be achieved. was (Author: tupshin): Cross table consistent reads are of fundamental importance. Once you allow that they are useful for consistent index reads, then you have admitted that they are useful for for direction consumption by users, since we are constantly advising them to build their own index solutions since 2i are horrendously weak. That pressure will be only slightly reduced with global indexes. Even separate from custom (client-side) 2i implementations, having all or nothing read visibility of writes spanning tables captures fundamental business logic that is either painfully worked around today, or else is glossed over as statistically unlikely (depending on the r/w patterns) and the race conditions duly ignored. It would be a tragic mistake to ignore the benefits of the gains in correctness that can be achieved. 
> Add RAMP transactions > - > > Key: CASSANDRA-7056 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7056 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Tupshin Harper >Priority: Minor > > We should take a look at > [RAMP|http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/] > transactions, and figure out if they can be used to provide more efficient > LWT (or LWT-like) operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043367#comment-14043367 ] Tupshin Harper edited comment on CASSANDRA-7056 at 6/25/14 12:05 PM: - Cross table consistent reads are of fundamental importance. Once you allow that they are useful for consistent index reads, then you have admitted that they are useful for direct consumption by users, since we are constantly advising them to build their own index solutions since 2i are horrendously weak. That pressure will be only slightly reduced with global indexes. Even separate from custom (client-side) 2i implementations, having all or nothing read visibility of writes spanning tables captures fundamental business logic that is either painfully worked around today, or else is glossed over as statistically unlikely (depending on the r/w patterns) and the race conditions duly ignored. It would be a tragic mistake to ignore the benefits of the gains in correctness that can be achieved. was (Author: tupshin): Cross table consistent reads are of fundamental importance. Once you allow that they are useful for consistent index reads, then you have admitted that they are useful for for direction consumption by users, since we are constantly advising them to build their own index solutions since 2i are horrendously weak. That pressure will be only slightly reduced with global indexes. Even separate from custom (client-side) 2i implementations, having all or nothing read visibility of writes spanning partitions/tables captures fundamental business logic that is either painfully worked around today, or else is glossed over as statistically unlikely (depending on the r/w patterns) and the race conditions duly ignored. It would be a tragic mistake to ignore the benefits of the gains in correctness that can be achieved. 
> Add RAMP transactions > - > > Key: CASSANDRA-7056 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7056 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Tupshin Harper >Priority: Minor > > We should take a look at > [RAMP|http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/] > transactions, and figure out if they can be used to provide more efficient > LWT (or LWT-like) operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043367#comment-14043367 ] Tupshin Harper commented on CASSANDRA-7056: --- Cross table consistent reads are of fundamental importance. Once you allow that they are useful for consistent index reads, then you have admitted that they are useful for direct consumption by users, since we are constantly advising them to build their own index solutions since 2i are horrendously weak. That pressure will be only slightly reduced with global indexes. Even separate from custom (client-side) 2i implementations, having all or nothing read visibility of writes spanning partitions/tables captures fundamental business logic that is either painfully worked around today, or else is glossed over as statistically unlikely (depending on the r/w patterns) and the race conditions duly ignored. It would be a tragic mistake to ignore the benefits of the gains in correctness that can be achieved. > Add RAMP transactions > - > > Key: CASSANDRA-7056 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7056 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Tupshin Harper >Priority: Minor > > We should take a look at > [RAMP|http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/] > transactions, and figure out if they can be used to provide more efficient > LWT (or LWT-like) operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7423) make user defined types useful for non-trivial use cases
[ https://issues.apache.org/jira/browse/CASSANDRA-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-7423: -- Description: Since user defined types were implemented in CASSANDRA-5590 as blobs (you have to rewrite the entire type in order to make any modifications), they can't be safely used without LWT for any operation that wants to modify a subset of the UDT's fields by any client process that is not authoritative for the entire blob. When trying to use UDTs to model complex records (particularly with nesting), this is not an exceptional circumstance, this is the totally expected normal situation. The use of UDTs for anything non-trivial is harmful to either performance or consistency or both. edit: to clarify, i believe that most potential uses of UDTs should be considered anti-patterns until/unless we have field-level r/w access to individual elements of the UDT, with individual timestamps and standard LWW semantics was: Since user defined types were implemented in CASSANDRA-5590 as blobs (you have to rewrite the entire type in order to make any modifications), they can't be safely used without LWT for any operation that wants to modify a subset of the UDT's fields by any client process that is not authoritative for the entire blob. When trying to use UDTs to model complex records (particularly with nesting), this is not an exceptional circumstance, this is the totally expected normal situation. The use of UDTs for anything non-trivial is harmful to either performance or consistency or both. 
> make user defined types useful for non-trivial use cases > > > Key: CASSANDRA-7423 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7423 > Project: Cassandra > Issue Type: Improvement > Components: API, Core >Reporter: Tupshin Harper > > Since user defined types were implemented in CASSANDRA-5590 as blobs (you > have to rewrite the entire type in order to make any modifications), they > can't be safely used without LWT for any operation that wants to modify a > subset of the UDT's fields by any client process that is not authoritative > for the entire blob. > When trying to use UDTs to model complex records (particularly with nesting), > this is not an exceptional circumstance, this is the totally expected normal > situation. > The use of UDTs for anything non-trivial is harmful to either performance or > consistency or both. > edit: to clarify, i believe that most potential uses of UDTs should be > considered anti-patterns until/unless we have field-level r/w access to > individual elements of the UDT, with individual timestamps and standard LWW > semantics -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7370) Create a new system table "node_config" to load cassandra.yaml config data.
[ https://issues.apache.org/jira/browse/CASSANDRA-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039248#comment-14039248 ] Tupshin Harper commented on CASSANDRA-7370: --- I'm +1 on abusing the system with virtual/phantom tables. They are a well-established RDBMS pattern that is conceptually simple (albeit not particularly elegant) and well understood. Reflection could be leveraged to eliminate the need to keep an up-to-date list. > Create a new system table "node_config" to load cassandra.yaml config data. > --- > > Key: CASSANDRA-7370 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7370 > Project: Cassandra > Issue Type: Wish > Components: Config >Reporter: Hayato Shimizu >Priority: Minor > Labels: ponies > > Currently the node configuration information specified in cassandra.yaml can > only be viewed via JMX or by looking at the file on individual machines. > As an administrator, it would be extremely useful to be able to execute > queries like the following example; > select concurrent_reads from system.node_config; > which will list all the concurrent_reads value from all of the nodes in a > cluster. > This will require a new table in the system keyspace and the data to be > loaded (if required) during the bootstrap, and updated when MBeans attribute > value updates are performed. The data from other nodes in the cluster is also > required in the table. -- This message was sent by Atlassian JIRA (v6.2#6252)
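The reflection idea can be shown in miniature: enumerate the config object's attributes at runtime instead of hand-maintaining a column list, so every setting (including future ones) appears in the virtual row automatically. `NodeConfig` and its fields are invented stand-ins for the parsed cassandra.yaml, not Cassandra's actual config class.

```python
class NodeConfig:
    """Stand-in for a node's parsed cassandra.yaml (fields invented)."""
    concurrent_reads = 32
    concurrent_writes = 32
    commitlog_sync = "periodic"

def virtual_row(config):
    """Reflect over the config's attributes to build one virtual-table row,
    skipping private/dunder names; no hand-written column list to keep
    in sync when new settings are added."""
    return {
        name: getattr(config, name)
        for name in vars(type(config))
        if not name.startswith("_")
    }

row = virtual_row(NodeConfig())
```

Adding a fourth field to `NodeConfig` would surface it in `row` with no change to `virtual_row`, which is exactly the maintenance property the comment is after.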
[jira] [Created] (CASSANDRA-7423) make user defined types useful for non-trivial use cases
Tupshin Harper created CASSANDRA-7423: - Summary: make user defined types useful for non-trivial use cases Key: CASSANDRA-7423 URL: https://issues.apache.org/jira/browse/CASSANDRA-7423 Project: Cassandra Issue Type: Improvement Components: API, Core Reporter: Tupshin Harper Since user defined types were implemented in CASSANDRA-5590 as blobs (you have to rewrite the entire type in order to make any modifications), they can't be safely used without LWT for any operation that wants to modify a subset of the UDT's fields by any client process that is not authoritative for the entire blob. When trying to use UDTs to model complex records (particularly with nesting), this is not an exceptional circumstance, this is the totally expected normal situation. The use of UDTs for anything non-trivial is harmful to either performance or consistency or both. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7156) Add a new seed provider for Apache Cloudstack platforms
[ https://issues.apache.org/jira/browse/CASSANDRA-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033815#comment-14033815 ] Tupshin Harper commented on CASSANDRA-7156: --- I'm fine with linking to external github repos to provide additional seed providers, at least initially. There just need to be very clear and straightforward instructions for building and deploying them. > Add a new seed provider for Apache Cloudstack platforms > --- > > Key: CASSANDRA-7156 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7156 > Project: Cassandra > Issue Type: Improvement > Components: Core > Environment: needs access to a cloudstack API endpoint >Reporter: Pierre-Yves Ritschard >Assignee: Michael Shuler >Priority: Minor > Fix For: 2.0.9, 2.1.1 > > Attachments: 0001-initial-work-on-a-cloudstack-seed-provider.patch > > > The attached patch adds a new seed provider which queries a cloudstack API > endpoint for instances having a specific tag. > The tag key and value can be controlled in the configuration file and > > will default to 'cassandra_seed' and 'default'. > > The Cloudstack endpoint is configured by three parameters in the > > configuration file: 'cloudstack_api_endpoint', 'cloudstack_api_key' and > > 'cloudstack_api_secret' > > By default, CloudstackSeedProvider fetches the IP address of the first > > interface; if another index should be used, the nic_index parameter will hold > it. > A typical configuration file would thus have: > {code:yaml} > seed_provider: > - class_name: org.apache.cassandra.locator.CloudstackSeedProvider > parameters: > - cloudstack_api_endpoint: "https://some.cloudstack.host" > cloudstack_api_key: "X" > cloudstack_api_secret: "X" > tag_value: "my_cluster_name" > {code} > This introduces no new dependency and together with CASSANDRA-7147 gives an > easy way of getting started on cloudstack platforms -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6178) Consider allowing timestamp at the protocol level ... and deprecating server side timestamps
[ https://issues.apache.org/jira/browse/CASSANDRA-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031203#comment-14031203 ] Tupshin Harper commented on CASSANDRA-6178: --- FWIW I am very negative on client-side timestamps ever being mandatory. > Consider allowing timestamp at the protocol level ... and deprecating server > side timestamps > > > Key: CASSANDRA-6178 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6178 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne > > Generating timestamps server side by default for CQL has been done for > convenience, so that end users don't have to provide one with every query. > However, doing it server side has the downside that updates made sequentially > by one single client (thread) are not guaranteed to have sequentially > increasing timestamps. Unless a client thread is always pinned to one > specific server connection, that is; but no good client driver out there > (including the thrift driver) does that, because it is contradictory to > abstracting fault tolerance away from the driver user (and goes against most > sane load balancing strategies). > Very concretely, this means that if you write a very trivial test program > that sequentially inserts a value and then erases it (or overwrites it), then, > if you let CQL pick timestamps server side, the deletion might not erase the > just-inserted value (because the delete might reach a different coordinator > than the insert and thus get a lower timestamp). From the user's point of view, > this is a very confusing behavior, and understandably so: if timestamps are > optional, you'd hope that they at least respect the sequentiality of > operations from a single client thread. > Of course we do support client-side assigned timestamps, so it's not like the > test above is not fixable. And you could argue that it's not a bug per se. 
> Still, it's a very confusing "default" behavior for something very simple, > which suggests it's not the best default. > You could also argue that inserting a value and deleting/overwriting it right > away (in the same thread) is not something real programs often do. And indeed, > it's likely that in practice server-side timestamps work fine for most real > applications. Still, it's too easy to get counter-intuitive behavior with > server-side timestamps and I think we should consider moving away from them. > So what I'd suggest is that we push the job of providing timestamps back to > the client side. But to make it easy for the driver to generate them (rather > than the end user), we should allow providing said timestamp at the protocol level. > As a side note, letting the client provide the timestamp would also have the > advantage of making it easy for the driver to retry failed operations with > their initial timestamp, so that retries are truly idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
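The driver-side generation the ticket proposes can be sketched as a monotonic timestamp source: microseconds since the epoch, bumped by one whenever the wall clock has not advanced, so a single client thread always emits strictly increasing timestamps. This is an illustrative sketch of the idea, not any particular driver's implementation.

```python
import time


class MonotonicTimestampGenerator:
    """Client-side timestamp source in microseconds since the epoch;
    never returns a value less than or equal to the previous one."""

    def __init__(self):
        self._last = 0

    def next(self):
        now = int(time.time() * 1_000_000)
        # If the clock stalled or went backwards, advance by one microsecond.
        self._last = max(now, self._last + 1)
        return self._last


gen = MonotonicTimestampGenerator()
ts_insert = gen.next()
ts_delete = gen.next()
# The delete from the same thread always carries the higher timestamp,
# so it reliably shadows the preceding insert.
assert ts_delete > ts_insert
```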
[jira] [Updated] (CASSANDRA-7362) Be able to selectively "mount" a snapshot of a table as a read-only version of that table
[ https://issues.apache.org/jira/browse/CASSANDRA-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-7362: -- Description: When doing batch jobs (thinking hive and shark as prominent examples) or repeated analysis of the same data, it can be challenging to get a consistent result if the data is changing under your feet. Rather than the low level CASSANDRA-2527, I propose that we add the capability to take a named snapshot (exact uuid in 2.1 and later), and be able to activate and deactivate it as a regular table (e.g. myks.mytable snapshot could be activated as myks.mytable-longuuid). That table would be queryable just like any other, but would not be writable. Any attempt to insert or update would throw an exception. (was: When doing batch jobs (thinking hive and shark as prominent examples) or repeated analysis of the same data, it can be challenging to get a consistent result if the data is changing under your feet. Rather than the low level CASSANDRA-2527, I propose that we add the capability to take a named snapshot (exact uuid in 2.1 and later), and be able to activate and deactivate it as a regular table (e.g. myks.mytable snapshot could be activated as myks.mytable-longuuid). That table would be queryable just like any other, but would not be writable. Any attempt to insert or update would throw an exception. Because it would ) > Be able to selectively "mount" a snapshot of a table as a read-only version > of that table > - > > Key: CASSANDRA-7362 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7362 > Project: Cassandra > Issue Type: New Feature > Components: Core, Tools >Reporter: Tupshin Harper >Priority: Minor > Fix For: 3.0 > > > When doing batch jobs (thinking hive and shark as prominent examples) or > repeated analysis of the same data, it can be challenging to get a consistent > result if the data is changing under your feet. 
Rather than the low level > CASSANDRA-2527, I propose that we add the capability to take a named > snapshot (exact uuid in 2.1 and later), and be able to activate and > deactivate it as a regular table (e.g. myks.mytable snapshot could be > activated as myks.mytable-longuuid). That table would be queryable just like > any other, but would not be writable. Any attempt to insert or update would > throw an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7362) Be able to selectively "mount" a snapshot of a table as a read-only version of that table
Tupshin Harper created CASSANDRA-7362: - Summary: Be able to selectively "mount" a snapshot of a table as a read-only version of that table Key: CASSANDRA-7362 URL: https://issues.apache.org/jira/browse/CASSANDRA-7362 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Tupshin Harper Priority: Minor Fix For: 3.0 When doing batch jobs (thinking hive and shark as prominent examples) or repeated analysis of the same data, it can be challenging to get a consistent result if the data is changing under your feet. Rather than the low level CASSANDRA-2527, I propose that we add the capability to take a named snapshot (exact uuid in 2.1 and later), and be able to activate and deactivate it as a regular table (e.g. myks.mytable snapshot could be activated as myks.mytable-longuuid). That table would be queryable just like any other, but would not be writable. Any attempt to insert or update would throw an exception. Because it would -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7306) Support "edge dcs" with more flexible gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010194#comment-14010194 ] Tupshin Harper commented on CASSANDRA-7306: --- #1 is definitely more ill-defined than it should be. The main thing I'd want to see is good overall cluster stability and behavior with 100s of spoke DCs that each could be offline up to 50% of the time (as a useful baseline). Until, and unless, that is formally tested, I don't have too much to add. > Support "edge dcs" with more flexible gossip > > > Key: CASSANDRA-7306 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7306 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > > As Cassandra clusters get bigger and bigger, and their topology becomes more > complex, there is more and more need for a notion of "hub" and "spoke" > datacenters. > One of the big obstacles to supporting hundreds (or thousands) of remote dcs, > is the assumption that all dcs need to talk to each other (and be connected > all the time). > This ticket is a vague placeholder with the goals of achieving: > 1) better behavioral support for occasionally disconnected datacenters > 2) explicit support for custom dc to dc routing. A simple approach would be > an optional per-dc annotation of which other DCs that DC could gossip with. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-7306) Support "edge dcs" with more flexible gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010194#comment-14010194 ] Tupshin Harper edited comment on CASSANDRA-7306 at 5/27/14 7:56 PM: #1 is definitely more ill-defined than it should be. The main thing I'd want to see is good overall cluster stability and behavior with 100s of spoke DCs, where each DC could be offline up to 50% of the time (as a useful baseline). Until, and unless, that is formally tested, I don't have too much to add. was (Author: tupshin): #1 is definitely more ill-defined than it should be. The main thing I'd want to see is good overall cluster stability and behavior with 100s of spoke DCs that each could be offline up to 50% of the time (as a useful baseline). Until, and unless, that is formally tested, I don't have too much to add. > Support "edge dcs" with more flexible gossip > > > Key: CASSANDRA-7306 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7306 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > > As Cassandra clusters get bigger and bigger, and their topology becomes more > complex, there is more and more need for a notion of "hub" and "spoke" > datacenters. > One of the big obstacles to supporting hundreds (or thousands) of remote dcs, > is the assumption that all dcs need to talk to each other (and be connected > all the time). > This ticket is a vague placeholder with the goals of achieving: > 1) better behavioral support for occasionally disconnected datacenters > 2) explicit support for custom dc to dc routing. A simple approach would be > an optional per-dc annotation of which other DCs that DC could gossip with. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7306) Support "edge dcs" with more flexible gossip
Tupshin Harper created CASSANDRA-7306: - Summary: Support "edge dcs" with more flexible gossip Key: CASSANDRA-7306 URL: https://issues.apache.org/jira/browse/CASSANDRA-7306 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Tupshin Harper As Cassandra clusters get bigger and bigger, and their topology becomes more complex, there is more and more need for a notion of "hub" and "spoke" datacenters. One of the big obstacles to supporting hundreds (or thousands) of remote dcs, is the assumption that all dcs need to talk to each other (and be connected all the time). This ticket is a vague placeholder with the goals of achieving: 1) better behavioral support for occasionally disconnected datacenters 2) explicit support for custom dc to dc routing. A simple approach would be an optional per-dc annotation of which other DCs that DC could gossip with. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7306) Support "edge dcs" with more flexible gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper updated CASSANDRA-7306: -- Labels: ponies (was: ) > Support "edge dcs" with more flexible gossip > > > Key: CASSANDRA-7306 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7306 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tupshin Harper > Labels: ponies > > As Cassandra clusters get bigger and bigger, and their topology becomes more > complex, there is more and more need for a notion of "hub" and "spoke" > datacenters. > One of the big obstacles to supporting hundreds (or thousands) of remote dcs, > is the assumption that all dcs need to talk to each other (and be connected > all the time). > This ticket is a vague placeholder with the goals of achieving: > 1) better behavioral support for occasionally disconnected datacenters > 2) explicit support for custom dc to dc routing. A simple approach would be > an optional per-dc annotation of which other DCs that DC could gossip with. -- This message was sent by Atlassian JIRA (v6.2#6252)
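The "per-dc annotation of which other DCs that DC could gossip with" in goal (2) amounts to a simple adjacency map. The hub-and-spoke topology below is purely hypothetical, as a sketch of what such an annotation could express.

```python
# Hypothetical per-DC gossip routing table: hubs may gossip with everyone,
# each spoke only with its hub, so spokes never need direct connectivity
# to each other.
gossip_peers = {
    "hub_east": ["hub_west", "spoke1", "spoke2"],
    "hub_west": ["hub_east"],
    "spoke1": ["hub_east"],
    "spoke2": ["hub_east"],
}


def may_gossip(src_dc, dst_dc):
    """True when src_dc is annotated as allowed to gossip with dst_dc."""
    return dst_dc in gossip_peers.get(src_dc, [])


assert may_gossip("spoke1", "hub_east")
assert not may_gossip("spoke1", "spoke2")  # spoke-to-spoke traffic is cut off
```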
[jira] [Commented] (CASSANDRA-7297) semi-immutable CQL rows
[ https://issues.apache.org/jira/browse/CASSANDRA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008902#comment-14008902 ] Tupshin Harper commented on CASSANDRA-7297: --- The functionality described in CASSANDRA-6412 would provide a super-set of this ticket. > semi-immutable CQL rows > --- > > Key: CASSANDRA-7297 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7297 > Project: Cassandra > Issue Type: Improvement > Components: API, Core >Reporter: Tupshin Harper > > There are many use cases, where data is immutable at the domain model level. > Most time-series/audit trail/logging applications fit this approach. > A relatively simple way to implement a bare-bones version of this would be to > have a table-level schema option for "first writer wins", so that in the > event of any conflict, the more recent version would be thrown on the floor. > Obviously, this is not failure proof in the face of inconsistent timestamps, > but that is a problem to be addressed outside of Cassandra. > Optional additional features could include logging any non-identical cells > discarded due to collision. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7297) semi-immutable CQL rows
Tupshin Harper created CASSANDRA-7297: - Summary: semi-immutable CQL rows Key: CASSANDRA-7297 URL: https://issues.apache.org/jira/browse/CASSANDRA-7297 Project: Cassandra Issue Type: Improvement Components: API, Core Reporter: Tupshin Harper There are many use cases, where data is immutable at the domain model level. Most time-series/audit trail/logging applications fit this approach. A relatively simple way to implement a bare-bones version of this would be to have a table-level schema option for "first writer wins", so that in the event of any conflict, the more recent version would be thrown on the floor. Obviously, this is not failure proof in the face of inconsistent timestamps, but that is a problem to be addressed outside of Cassandra. Optional additional features could include logging any non-identical cells discarded due to collision. -- This message was sent by Atlassian JIRA (v6.2#6252)
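The "first writer wins" reconcile rule described above (keep the earlier write, throw the later one on the floor, optionally logging non-identical discards) can be sketched as below. This is an illustrative model with cells as (timestamp, value) tuples, not Cassandra's actual reconciliation code, which keeps the *later* write.

```python
def reconcile_first_write_wins(existing, incoming):
    """On conflict keep the cell with the earlier timestamp. Returns the
    winning cell plus the discarded cell when it carried a different value
    (so non-identical collisions can optionally be logged), else None."""
    winner = existing if existing[0] <= incoming[0] else incoming
    loser = incoming if winner is existing else existing
    discarded = loser if loser[1] != winner[1] else None
    return winner, discarded


# A later conflicting write is discarded in favor of the original:
winner, discarded = reconcile_first_write_wins((100, "v1"), (200, "v2"))
print(winner, discarded)  # (100, 'v1') (200, 'v2')
```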
[jira] [Created] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
Tupshin Harper created CASSANDRA-7296: - Summary: Add CL.COORDINATOR_ONLY Key: CASSANDRA-7296 URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 Project: Cassandra Issue Type: Improvement Reporter: Tupshin Harper For reasons such as CASSANDRA-6340 and similar, it would be nice to have a read that never gets distributed, and only works if the coordinator you are talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-5394) Allow assigning disk quotas by keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008189#comment-14008189 ] Tupshin Harper commented on CASSANDRA-5394: --- Grouping together multitenant feature requests. There might be a "soft cap" approach to make this one viable. > Allow assigning disk quotas by keyspace > --- > > Key: CASSANDRA-5394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5394 > Project: Cassandra > Issue Type: New Feature >Reporter: J.B. Langston >Assignee: Tupshin Harper >Priority: Minor > > A customer is requesting this. They are implementing a multi-tenant Cassandra > Service offering. They want to limit the amount of diskspace that a user or > application can consume. They would also want to be able to modify the quota > after the keyspace is set up. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (CASSANDRA-5394) Allow assigning disk quotas by keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper reassigned CASSANDRA-5394: - Assignee: Tupshin Harper > Allow assigning disk quotas by keyspace > --- > > Key: CASSANDRA-5394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5394 > Project: Cassandra > Issue Type: New Feature >Reporter: J.B. Langston >Assignee: Tupshin Harper >Priority: Minor > > A customer is requesting this. They are implementing a multi-tenant Cassandra > Service offering. They want to limit the amount of diskspace that a user or > application can consume. They would also want to be able to modify the quota > after the keyspace is set up. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-841) Track statistics by user as well as CF
[ https://issues.apache.org/jira/browse/CASSANDRA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008188#comment-14008188 ] Tupshin Harper commented on CASSANDRA-841: -- Grouping together multitenant feature requests > Track statistics by user as well as CF > -- > > Key: CASSANDRA-841 > URL: https://issues.apache.org/jira/browse/CASSANDRA-841 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Assignee: Tupshin Harper >Priority: Minor > Fix For: 0.8 beta 1 > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (CASSANDRA-841) Track statistics by user as well as CF
[ https://issues.apache.org/jira/browse/CASSANDRA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper reassigned CASSANDRA-841: Assignee: Tupshin Harper > Track statistics by user as well as CF > -- > > Key: CASSANDRA-841 > URL: https://issues.apache.org/jira/browse/CASSANDRA-841 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Assignee: Tupshin Harper >Priority: Minor > Fix For: 0.8 beta 1 > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (CASSANDRA-2068) Improvements for Multi-tenant clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tupshin Harper reassigned CASSANDRA-2068: - Assignee: Tupshin Harper > Improvements for Multi-tenant clusters > -- > > Key: CASSANDRA-2068 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2068 > Project: Cassandra > Issue Type: Improvement >Reporter: Chris Goffinet >Assignee: Tupshin Harper >Priority: Minor > > It would be helpful if we could actually set some limits per CF to help > Multi-tenant clusters. Here are some ideas I was thinking: > (per CF) > 1. Set an upper bound (max) for count when slicing or multi/get calls > 2. Set an upper bound (max) for how much data in bytes can be returned > (64KB, 512KB, 1MB, etc) > This would introduce new exceptions that can be thrown. -- This message was sent by Atlassian JIRA (v6.2#6252)
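The per-CF caps proposed in CASSANDRA-2068 (a max result count and a max result size in bytes, with a new exception when exceeded) can be sketched as a bounded slice. The function name and exception type here are illustrative, not an actual Cassandra API.

```python
def bounded_slice(cells, max_count, max_bytes):
    """Return cells in order, aborting once either the result count or the
    accumulated byte size would exceed its per-CF bound."""
    out, size = [], 0
    for cell in cells:
        if len(out) + 1 > max_count or size + len(cell) > max_bytes:
            raise OverflowError("slice exceeds configured per-CF limit")
        out.append(cell)
        size += len(cell)
    return out


print(bounded_slice([b"ab", b"cd"], max_count=10, max_bytes=64))  # [b'ab', b'cd']
```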
[jira] [Commented] (CASSANDRA-6602) Compaction improvements to optimize time series data
[ https://issues.apache.org/jira/browse/CASSANDRA-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1401#comment-1401 ] Tupshin Harper commented on CASSANDRA-6602: --- Two comments: 1) Promising solution that I'd love to see validated and backported to at least 2.1 and, if at all possible, all the way to 2.0.x. 2) I don't want to end up closing the issue and losing track of the approaches Benedict and I were talking about, so one or the other should become a new ticket. > Compaction improvements to optimize time series data > > > Key: CASSANDRA-6602 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6602 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Tupshin Harper >Assignee: Björn Hegerfors > Labels: compaction, performance > Fix For: 3.0 > > Attachments: > cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy.txt, > cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy_v2.txt > > > There are some unique characteristics of many/most time series use cases that > both provide challenges, as well as provide unique opportunities for > optimizations. > One of the major challenges is in compaction. The existing compaction > strategies will tend to re-compact data on disk at least a few times over the > lifespan of each data point, greatly increasing the CPU and IO costs of that > write. > Compaction exists to > 1) ensure that there aren't too many files on disk > 2) ensure that data that should be contiguous (part of the same partition) is > laid out contiguously > 3) delete data due to TTLs or tombstones > The special characteristics of time series data allow us to optimize away all > three. 
> Time series data > 1) tends to be delivered in time order, with relatively constrained exceptions > 2) often has a pre-determined and fixed expiration date > 3) never gets deleted prior to TTL > 4) has relatively predictable ingestion rates > Note that I filed CASSANDRA-5561, and this ticket potentially replaces or > lowers the need for it. In that ticket, jbellis reasonably asks how that > compaction strategy is better than disabling compaction. > Taking that to heart, here is a compaction-strategy-less approach that could > be extremely efficient for time-series use cases that follow the above > pattern. > (For context, I'm thinking of an example use case involving lots of streams > of time-series data with a 5GB per day ingestion rate, and a 1000 day > retention with TTL, resulting in an eventual steady state of 5TB per node) > 1) You have an extremely large memtable (preferably off heap, if/when doable) > for the table, and that memtable is sized to be able to hold a lengthy window > of time. A typical period might be one day. At the end of that period, you > flush the contents of the memtable to an sstable and move to the next one. > This is basically identical to current behaviour, but with thresholds > adjusted so that you can ensure flushing at predictable intervals. (Open > question is whether predictable intervals is actually necessary, or whether > just waiting until the huge memtable is nearly full is sufficient) > 2) Combine the behaviour with CASSANDRA-5228 so that sstables will be > efficiently dropped once all of their columns have expired. (Another side note, it > might be valuable to have a modified version of CASSANDRA-3974 that doesn't > bother storing per-column TTL, since it is required that all columns have the > same TTL) > 3) Be able to mark column families as read/write only (no explicit deletes), > so no tombstones. 
> 4) Optionally add back an additional type of delete that would delete all > data earlier than a particular timestamp, resulting in immediate dropping of > obsoleted sstables. > The result is that for in-order delivered data, Every cell will be laid out > optimally on disk on the first pass, and over the course of 1000 days and 5TB > of data, there will "only" be 1000 5GB sstables, so the number of filehandles > will be reasonable. > For exceptions (out-of-order delivery), most cases will be caught by the > extended (24 hour+) memtable flush times and merged correctly automatically. > For those that were slightly askew at flush time, or were delivered so far > out of order that they go in the wrong sstable, there is relatively low > overhead to reading from two sstables for a time slice, instead of one, and > that overhead would be incurred relatively rarely unless out-of-order > delivery was the common case, in which case, this strategy should not be used. > Another possible optimization to address out-of-order would be to maintain > more than one time-centric memtables in memory at a time (e.g. two 12 hour > ones), and then you always ins
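The proposal above reduces to two small rules: all writes in the same fixed time window land in the same memtable/sstable, and a whole sstable is dropped once its newest cell has passed the shared TTL. The sketch below is an illustrative reduction of those rules, not the DateTieredCompactionStrategy patch attached to the ticket.

```python
DAY_US = 24 * 3600 * 1_000_000  # one day in microseconds


def flush_window(write_time_us, window_us=DAY_US):
    """Rule 1: writes in the same fixed window share a memtable/sstable."""
    return write_time_us // window_us


def sstable_droppable(max_write_time_us, ttl_us, now_us):
    """Rule 2: drop a whole sstable once its newest cell has expired
    (all cells share one TTL, so the newest cell decides)."""
    return max_write_time_us + ttl_us <= now_us


# Two writes a minute apart share a daily sstable; after the 1000-day TTL
# the file is dropped wholesale instead of being re-compacted.
assert flush_window(0) == flush_window(60 * 1_000_000)
assert sstable_droppable(DAY_US, 1000 * DAY_US, 1001 * DAY_US)
```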
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989553#comment-13989553 ] Tupshin Harper commented on CASSANDRA-6696: --- It does, thanks. > Drive replacement in JBOD can cause data to reappear. > -- > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Fix For: 3.0 > > > In JBOD, when someone gets a bad drive, the bad drive is replaced with a new > empty one and repair is run. > This can cause deleted data to come back in some cases. This is also true for > corrupt sstables, where we delete the corrupt sstable and run repair. > Here is an example: > Say we have 3 nodes A, B and C, with RF=3 and GC grace=10 days. > row=sankalp col=sankalp was written 20 days back and successfully went to all > three nodes. > Then a delete/tombstone was written successfully for the same row and column 15 > days back. > Since this tombstone is older than gc grace, it got compacted away in nodes A > and B together with the actual data. So there is no trace of this row and > column in nodes A and B. > Now in node C, say the original data is in drive1 and the tombstone is in drive2. > Compaction has not yet reclaimed the data and tombstone. > Drive2 becomes corrupt and is replaced with a new empty drive. > Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp > has come back to life. > Now after replacing the drive we run repair. This data will be propagated to > all nodes. > Note: This is still a problem even if we run repair every gc grace. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989543#comment-13989543 ] Tupshin Harper commented on CASSANDRA-6696: --- I may be misunderstanding, but this seems to be optimizing for compaction throughput/parallelization, but at the expense of doing more total compaction activity (number of compactions per mutation over the life of that mutation, a form of write-amplification) by starting with smaller tables. If that's not the case, then please ignore, but it is important to note that for the largest scale, highest velocity, longest retained use cases, it's the number of recompactions/write amplification that really hurts. > Drive replacement in JBOD can cause data to reappear. > -- > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Fix For: 3.0 > > > In JBOD, when someone gets a bad drive, the bad drive is replaced with a new > empty one and repair is run. > This can cause deleted data to come back in some cases. Also this is true for > corrupt stables in which we delete the corrupt stable and run repair. > Here is an example: > Say we have 3 nodes A,B and C and RF=3 and GC grace=10days. > row=sankalp col=sankalp is written 20 days back and successfully went to all > three nodes. > Then a delete/tombstone was written successfully for the same row column 15 > days back. > Since this tombstone is more than gc grace, it got compacted in Nodes A and B > since it got compacted with the actual data. So there is no trace of this row > column in node A and B. > Now in node C, say the original data is in drive1 and tombstone is in drive2. > Compaction has not yet reclaimed the data and tombstone. > Drive2 becomes corrupt and was replaced with new empty drive. 
> Due to the replacement, the tombstone in now gone and row=sankalp col=sankalp > has come back to life. > Now after replacing the drive we run repair. This data will be propagated to > all nodes. > Note: This is still a problem even if we run repair every gc grace. > -- This message was sent by Atlassian JIRA (v6.2#6252)
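The resurrection scenario in the ticket description can be replayed as a toy model: cells are (timestamp, kind) tuples, a replica is the set of cells it still stores, and repair streams the union everywhere. This is an illustration of the described failure, not Cassandra code.

```python
# Toy replay of the JBOD resurrection scenario.
data, tombstone = (0, "data"), (5, "tombstone")

node_a = set()                      # A and B compacted data + tombstone away
node_b = set()                      #   after gc grace: no trace of the row
node_c = {data, tombstone}          # C: data on drive1, tombstone on drive2

node_c.discard(tombstone)           # drive2 replaced with an empty drive

merged = node_a | node_b | node_c   # repair propagates the union to all nodes
print(merged)                       # {(0, 'data')}: the deleted row is live again
```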
[jira] [Commented] (CASSANDRA-7136) Change default paths to ~ instead of /var
[ https://issues.apache.org/jira/browse/CASSANDRA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987842#comment-13987842 ] Tupshin Harper commented on CASSANDRA-7136: --- +1 > Change default paths to ~ instead of /var > - > > Key: CASSANDRA-7136 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7136 > Project: Cassandra > Issue Type: Bug >Reporter: Jonathan Ellis >Assignee: Albert P Tobey > Fix For: 2.1.0 > > > Defaulting to /var makes it more difficult for both multi-user systems and > people unfamiliar with the command line. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987835#comment-13987835 ] Tupshin Harper commented on CASSANDRA-6696: --- They are basically splittable and resizable vnodes if you were to use shuffled vnodes with a byte-ordered partitioner, which makes them have more in common with CQL partitions than with vnodes, from a "range of data" point of view. Except that the sizes of the ranges don't vary with the data model like they do with Cassandra. > Drive replacement in JBOD can cause data to reappear. > -- > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Fix For: 3.0 > > > In JBOD, when someone gets a bad drive, the bad drive is replaced with a new > empty one and repair is run. > This can cause deleted data to come back in some cases. This is also true for > corrupt sstables, where we delete the corrupt sstable and run repair. > Here is an example: > Say we have 3 nodes A, B and C, with RF=3 and GC grace=10 days. > row=sankalp col=sankalp was written 20 days back and successfully went to all > three nodes. > Then a delete/tombstone was written successfully for the same row and column 15 > days back. > Since this tombstone is older than gc grace, it got compacted away in nodes A > and B together with the actual data. So there is no trace of this row and > column in nodes A and B. > Now in node C, say the original data is in drive1 and the tombstone is in drive2. > Compaction has not yet reclaimed the data and tombstone. > Drive2 becomes corrupt and is replaced with a new empty drive. > Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp > has come back to life. > Now after replacing the drive we run repair. This data will be propagated to > all nodes. 
> Note: This is still a problem even if we run repair every gc grace. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987837#comment-13987837 ] Tupshin Harper commented on CASSANDRA-6696: --- Hbase actually has pluggable compaction strategies these days. > Drive replacement in JBOD can cause data to reappear. > -- > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Fix For: 3.0 > > > In JBOD, when someone gets a bad drive, the bad drive is replaced with a new > empty one and repair is run. > This can cause deleted data to come back in some cases. Also this is true for > corrupt stables in which we delete the corrupt stable and run repair. > Here is an example: > Say we have 3 nodes A,B and C and RF=3 and GC grace=10days. > row=sankalp col=sankalp is written 20 days back and successfully went to all > three nodes. > Then a delete/tombstone was written successfully for the same row column 15 > days back. > Since this tombstone is more than gc grace, it got compacted in Nodes A and B > since it got compacted with the actual data. So there is no trace of this row > column in node A and B. > Now in node C, say the original data is in drive1 and tombstone is in drive2. > Compaction has not yet reclaimed the data and tombstone. > Drive2 becomes corrupt and was replaced with new empty drive. > Due to the replacement, the tombstone in now gone and row=sankalp col=sankalp > has come back to life. > Now after replacing the drive we run repair. This data will be propagated to > all nodes. > Note: This is still a problem even if we run repair every gc grace. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7136) Change default paths to ~ instead of /var
[ https://issues.apache.org/jira/browse/CASSANDRA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987799#comment-13987799 ]

Tupshin Harper commented on CASSANDRA-7136:
-------------------------------------------

$CASSANDRA_HOME, and if not set, extracted_location/data. That's the only right answer.

> Change default paths to ~ instead of /var
> -----------------------------------------
>
> Key: CASSANDRA-7136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7136
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jonathan Ellis
> Assignee: Albert P Tobey
> Fix For: 2.1.0
>
> Defaulting to /var makes it more difficult for both multi-user systems and people unfamiliar with the command line.
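The resolution rule proposed in the comment can be sketched in a few lines. This is a hypothetical helper, not Cassandra's startup code; the function name and the `<base>/data` layout are assumptions about what the comment intends.

```python
import os

def default_data_dir(extracted_location):
    """Sketch of the rule above: use $CASSANDRA_HOME as the base
    directory when set, otherwise fall back to the location the
    tarball was extracted to, and keep data under <base>/data."""
    base = os.environ.get("CASSANDRA_HOME") or extracted_location
    return os.path.join(base, "data")

# With CASSANDRA_HOME unset, the extracted location wins.
os.environ.pop("CASSANDRA_HOME", None)
assert default_data_dir("/opt/cassandra") == "/opt/cassandra/data"

# With CASSANDRA_HOME set, it takes precedence.
os.environ["CASSANDRA_HOME"] = "/home/alice/cassandra"
assert default_data_dir("/opt/cassandra") == "/home/alice/cassandra/data"
```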
[jira] [Commented] (CASSANDRA-3783) Add 'null' support to CQL 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978277#comment-13978277 ]

Tupshin Harper commented on CASSANDRA-3783:
-------------------------------------------

Hi Dmytro,

This ticket contracted from its original scope: it ended up covering only support for upserting a null, which actually performs a delete operation on the cell. There is currently no SELECT support for indexed nulls, and given the design of Cassandra, that is considered a difficult/prohibitive problem.

> Add 'null' support to CQL 3.0
> -----------------------------
>
> Key: CASSANDRA-3783
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3783
> Project: Cassandra
> Issue Type: Sub-task
> Components: API
> Reporter: Sylvain Lebresne
> Assignee: Michał Michalski
> Priority: Minor
> Labels: cql3
> Fix For: 1.2.4
>
> Attachments: 3783-v2.patch, 3783-v3.patch, 3783-v4.txt, 3783-v5.txt, 3783-wip-v1.patch
>
> Dense composites support adding records where only a prefix of the components specifying the key is defined. In other words, with:
> {noformat}
> CREATE TABLE connections (
>    userid int,
>    ip text,
>    port int,
>    protocol text,
>    time timestamp,
>    PRIMARY KEY (userid, ip, port, protocol)
> ) WITH COMPACT STORAGE
> {noformat}
> you can insert
> {noformat}
> INSERT INTO connections (userid, ip, port, time) VALUES (2, '192.168.0.1', 80, 123456789);
> {noformat}
> You cannot, however, select that column specifically (i.e., without also selecting column (2, '192.168.0.1', 80, 'http') for instance).
> This ticket proposes to allow that through 'null', i.e. to allow
> {noformat}
> SELECT * FROM connections WHERE userid = 2 AND ip = '192.168.0.1' AND port = 80 AND protocol = null;
> {noformat}
> It would then also make sense to support:
> {noformat}
> INSERT INTO connections (userid, ip, port, protocol, time) VALUES (2, '192.168.0.1', 80, null, 123456789);
> {noformat}
> as an equivalent to the insert query above.
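The semantics the comment settles on ("upserting a null performs a delete on the cell") can be modeled with a trivial sketch. This is plain Python rather than CQL, and the `upsert` helper and dict-as-row representation are illustrative assumptions, not Cassandra internals.

```python
def upsert(row, column, value):
    """Toy model: writing a null value to a cell removes the cell,
    mirroring the null-upsert-as-delete behavior described above."""
    if value is None:
        row.pop(column, None)  # null upsert acts as a delete
    else:
        row[column] = value

row = {}
upsert(row, "protocol", "http")
assert row == {"protocol": "http"}
upsert(row, "protocol", None)  # equivalent to deleting the cell
assert "protocol" not in row
```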
[jira] [Created] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
Tupshin Harper created CASSANDRA-7075:
--------------------------------------

Summary: Add the ability to automatically distribute your commitlogs across all data volumes
Key: CASSANDRA-7075
URL: https://issues.apache.org/jira/browse/CASSANDRA-7075
Project: Cassandra
Issue Type: New Feature
Components: Core
Environment: Given the prevalence of SSDs (no need to separate commitlog and data) and improved JBOD support, along with [#3578|https://issues.apache.org/jira/browse/CASSANDRA-3578], it seems like we should have an option to have one commitlog per data volume, to even out the load. I've been seeing more and more cases where there isn't an obvious "extra" volume to put the commitlog on, and sticking it on only one of the JBOD SSD volumes leads to IO imbalance.
Reporter: Tupshin Harper
Priority: Minor
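The proposed behavior could be sketched as a round-robin segment allocator. This is a hypothetical illustration, not Cassandra's implementation: the class name, volume paths, and the round-robin policy itself are all assumptions about how "one commitlog per data volume" might even out the load.

```python
import itertools

class CommitLogAllocator:
    """Sketch: instead of a single commitlog directory, rotate new
    commitlog segments across all configured data volumes so write
    load is spread evenly rather than pinned to one drive."""

    def __init__(self, volumes):
        self._cycle = itertools.cycle(volumes)

    def next_segment_volume(self):
        # Each new segment lands on the next volume in the rotation.
        return next(self._cycle)

alloc = CommitLogAllocator(["/data1", "/data2", "/data3"])
volumes = [alloc.next_segment_volume() for _ in range(4)]
assert volumes == ["/data1", "/data2", "/data3", "/data1"]
```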