[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563258#comment-15563258 ] Brian Hess commented on CASSANDRA-7296: Consistency Level does feel like the right approach. The THIS_ prefix is in line with LOCAL_ in that it would identify the locus of nodes that are available for consistency. With LOCAL_ONE, we need just one replica from the data center of this coordinator. If no replicas exist (as when RF=0 for that DC) then you get UnavailableException. Namely, you don't reach out to other nodes and proxy for another DC, etc. Also note that while the client can certainly see that it's talking to a DC with no RF by looking at system tables or driver API calls, we still throw the UnavailableException. In the THIS_ONE case, we are saying that the locus of available nodes for the consistency level is just the coordinator itself. If that node is not a replica, then it should also throw an UnavailableException. It should not silently go ask the actual replicas, just as in the LOCAL_ONE case we don't ask other DCs. While it is true that the client could know that this node is not a replica, it is the same situation as LOCAL_ONE with RF=0. > Add CL.COORDINATOR_ONLY > --- > > Key: CASSANDRA-7296 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper > > For reasons such as CASSANDRA-6340 and similar, it would be nice to have a > read that never gets distributed, and only works if the coordinator you are > talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
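[Editor's illustration] A minimal cqlsh-style sketch of the semantics argued for above, assuming a hypothetical THIS_ONE consistency level; none of this is implemented syntax, the names are purely illustrative:

{code}
CONSISTENCY THIS_ONE;
-- succeeds only if this coordinator is itself a replica for pk = 1:
SELECT * FROM ks.t WHERE pk = 1;
-- if the coordinator owns no replica of the row, the request fails with
-- UnavailableException, mirroring LOCAL_ONE against a DC with RF = 0,
-- rather than silently proxying to the actual replicas
{code}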
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562407#comment-15562407 ] Brian Hess commented on CASSANDRA-7296: Just my 2 cents, but having this be a per-table option is not really a great solution for the debugging issue. I'd be okay if we had that as a default, but we'd certainly need to support having this behavior even if that table option isn't set (it would be unfortunate to have to ALTER the table to get this behavior). > Add CL.COORDINATOR_ONLY > --- > > Key: CASSANDRA-7296 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper > > For reasons such as CASSANDRA-6340 and similar, it would be nice to have a > read that never gets distributed, and only works if the coordinator you are > talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12216) TTL Reading And Writing is Asymmetric
[ https://issues.apache.org/jira/browse/CASSANDRA-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384557#comment-15384557 ] Brian Hess commented on CASSANDRA-12216: - This looks like it should apply to previous versions, going back even to 2.1. The only behavior change is to an undocumented behavior: throwing an InvalidQueryException when providing a null TTL. > TTL Reading And Writing is Asymmetric > -- > > Key: CASSANDRA-12216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12216 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Russell Spitzer >Assignee: Russell Spitzer >Priority: Minor > Attachments: 12216-3.7-2.txt, 12216-3.7.txt > > > There is an inherent asymmetry in the way TTLs are read and written. > A `TTL` of 0, when written, becomes a `null` in C*. > When read, this `TTL` comes back as `null`. > The `null` cannot be written back to C* as a `TTL`. > This means that end users attempting to copy tables with TTLs have to do > manual mapping of the null TTL values to 0 to avoid NPEs. This is a bit > onerous when C* seems to have an internal logic that 0 == NULL. I don't think > C* should return values which are not directly insertable back into C*. > Even with the advent of CASSANDRA-7304 this still remains a problem that the > user needs to be aware of and take care of. > The following prepared statement > {code} > INSERT INTO test.table2 (k, v) VALUES (?, ?) USING TTL ? > {code} > will throw NPEs unless we specifically check that the value to be bound to > TTL is not null. > I think we should discuss whether `null` should be treated as 0 in TTL for > prepared statements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
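[Editor's illustration] A minimal sketch of the asymmetry this ticket describes; the table is from the ticket, and the TTL 0 / null readback behavior shown is the documented CQL behavior:

{code}
CREATE TABLE test.table2 (k int PRIMARY KEY, v int);
INSERT INTO test.table2 (k, v) VALUES (1, 10) USING TTL 0;  -- TTL 0 means "no TTL"
SELECT TTL(v) FROM test.table2 WHERE k = 1;                 -- returns null, not 0
-- so copying rows back requires mapping null -> 0 client-side before binding
-- the ? of a prepared INSERT ... USING TTL ?
{code}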
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335986#comment-15335986 ] Brian Hess commented on CASSANDRA-11873: - That largely covers my thoughts, with a couple of things: 1. Change "us" to "u" to be consistent with InfluxDB, and to keep the abbreviations consistently to one letter. 2. Change "ns" to "n" to keep all the abbreviations consistently to one letter. 3. Change "y" to "Y" so year and month are both capital. I'm not sure exactly, but I would be inclined to change "d" to "D" and "w" to "W" so units smaller than a day are lower-case and the others are upper-case, for user-friendliness. Two other suggestions/questions: a) Is it possible to change month or minute to something other than M/m so that we could make these units case insensitive? I don't have a great suggestion there (maybe M for month and N for minute? Like I said, not great). b) What CQL type has nanosecond resolution? Is there any way someone can do any arithmetic on a CQL type where it would do anything with "4ns"? now() is a timestamp, which has millisecond resolution. So, I actually don't see a data type that is fine-grained enough here. If that's the case, we can get rid of "ns". I'm not sure what has microsecond resolution either, now that I look at our docs, so perhaps the same goes for "u". > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
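[Editor's illustration] To make the suggestion concrete, here is what duration literals would look like under the proposed one-letter scheme; none of these abbreviations are settled, and the queries are hypothetical:

{code}
-- Y = year, M = month, W = week, D = day; h, m, s, ms, u, n below a day
SELECT * FROM trades WHERE time > now() - 2h30m;
SELECT * FROM trades WHERE time > now() - 1Y6M;
{code}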
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334052#comment-15334052 ] Brian Hess commented on CASSANDRA-11873: - I will save the discussion/debate on the relationship between CQL and SQL to another venue. The reason to bring it up is in the context of user/developer experience and usability. If SQL has an approach then we should consider it, but if we can do better then by all means we should do that instead (which I think nobody is debating). A few comments: 1. We should certainly consider the month and year durations. These are common uses and we should at least sketch out how we would support that (if not also implement it in this ticket - which I think we should do). 2. How would we abbreviate the example that Sylvain proposes "1 year 2 months 3 days 4 hours 5 minutes 6 seconds"? Specifically, what is the abbreviation for months and minutes? ISO 8601 has M for both, but the P/T format allows for disambiguation. 3. With respect to ISO 8601 that Postgres does also support, if someone bothers to read the CQL documentation on Date formats for Timestamp types he will find that it states "A timestamp type can be entered as an integer for CQL input, or as a string literal in any of the following ISO 8601 formats" (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/timestamp_type_r.html). So, C* already chose ISO 8601 for Date formats. For consistency with CQL itself we should seriously consider making the same choice for durations. 4. According to the C* documentation, the TIMESTAMP data type, which is what is returned from the now() call, is the "number of milliseconds since the standard base time known as the epoch". How are we going to support microseconds and nanoseconds? Even Version 1 UUIDs (UUID/TimeUUID format for C*) don't support nanosecond resolution. 5. If we choose to stick with the current bespoke syntax, I suggest moving at least to the Influx format. That leaves 2 items: a) change microseconds from "us" to "u", which is what Influx uses; b) support weeks with the "w" abbreviation. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
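[Editor's illustration] For reference, ISO 8601 duration literals use "P" (and "T" before the time part), which is exactly how the format disambiguates months from minutes:

{code}
P1Y2M3DT4H5M6S   -- 1 year 2 months 3 days 4 hours 5 minutes 6 seconds
P2W              -- 2 weeks
PT2H             -- 2 hours; an M after the T means minutes, before the T it means months
{code}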
[jira] [Comment Edited] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333529#comment-15333529 ] Brian Hess edited comment on CASSANDRA-11873 at 6/16/16 10:27 AM: --- Being subtly different on syntax is in some cases worse than being very different. So, if we are thinking we will go with the ISO 8601 format (an option that could make sense - it is a widely recognized format and present in more than a few systems (not just databases, I mean)) then we should make sure we include the "P" and the "T". While Postgres does support ISO 8601 formats (of course I bothered to read it), in that format the highest resolution is seconds. There is a good reason to want milliseconds and microseconds (and maybe nanoseconds). The standard Postgres format supports all of these (with the exception of nanoseconds, though that addition to their format would be straightforward to understand). If you want to shorten the Postgres format to save typing, what abbreviation do you propose for "minute" and "month"? I will certainly agree that the Oracle syntax is not user-friendly. I think arguing it is desirable is a stretch. I have a reservation on the Influx syntax here, though. Influx does not support month or year. They only go up to week (https://docs.influxdata.com/influxdb/v0.13/query_language/data_exploration/#relative-time). So, it is not possible to say "now() - 2 months" or "now() - 1 year". To do 1 year, what would you do? "now() - 365d"? What about leap years? What about going back one month? In fact, if one bothers to read https://docs.influxdata.com/influxdb/v0.13/query_language/data_exploration/#time-syntax-in-queries, he would find that this patch covers only a subset of Influx's supported format. I don't see a week unit. Moreover, Influx doesn't use "us", it uses just "u". So, our proposed syntax isn't even consistent (in subtle ways) with Influx's format. Not to mention that Influx's format is incomplete (specifically, no support for months and years). Of the formats I've seen here, the Postgres native format is the most user-friendly, and accomplishes the goals of durations for us. I'm (non-PMC, non-binding) -1 on the currently proposed format from a usability/product/developer POV. was (Author: brianmhess): Being subtly different on syntax is in some cases worse than being very different. So, if we are thinking we will go with the ISO 8601 format (an option that could make sense - it is a widely recognized format and present in more than a few systems (not just databases, I mean)) then we should make sure we include the "P" and the "T". While Postgres does support ISO 8601 formats (of course I bothered to read it), in that format the highest resolution is seconds. There is a good reason to want milliseconds and microseconds (and maybe nanoseconds). The standard Postgres format supports all of these (with the exception of nanoseconds, though that addition to their format would be straightforward to understand). If you want to shorten the Postgres format to save typing, what abbreviation do you propose for "minute" and "month"? I will certainly agree that the Oracle syntax is not user-friendly. I think arguing it is desirable is a stretch. I have a reservation on the Influx syntax here, though. Influx does not support month or year. They only go up to week (https://docs.influxdata.com/influxdb/v0.13/query_language/data_exploration/#relative-time). So, it is not possible to say "now() - 2 months" or "now() - 1 year".
To do 1 year, what would you do? "now() - 365d"? What about leap years? What about going back one month? In fact, this patch covers only a subset of Influx's supported format. I don't see a week unit. Moreover, Influx doesn't use "us", it uses just "u". So, our proposed syntax isn't even consistent (in subtle ways) with Influx's format. Not to mention that Influx's format is incomplete (specifically, no support for months and years). Of the formats I've seen here, the Postgres native format is the most user-friendly, and accomplishes the goals of durations for us. I'm (non-PMC, non-binding) -1 on the currently proposed format from a usability/product/developer POV. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333529#comment-15333529 ] Brian Hess commented on CASSANDRA-11873: - Being subtly different on syntax is in some cases worse than being very different. So, if we are thinking we will go with the ISO 8601 format (an option that could make sense - it is a widely recognized format and present in more than a few systems (not just databases, I mean)) then we should make sure we include the "P" and the "T". While Postgres does support ISO 8601 formats (of course I bothered to read it), in that format the highest resolution is seconds. There is a good reason to want milliseconds and microseconds (and maybe nanoseconds). The standard Postgres format supports all of these (with the exception of nanoseconds, though that addition to their format would be straightforward to understand). If you want to shorten the Postgres format to save typing, what abbreviation do you propose for "minute" and "month"? I will certainly agree that the Oracle syntax is not user-friendly. I think arguing it is desirable is a stretch. I have a reservation on the Influx syntax here, though. Influx does not support month or year. They only go up to week (https://docs.influxdata.com/influxdb/v0.13/query_language/data_exploration/#relative-time). So, it is not possible to say "now() - 2 months" or "now() - 1 year". To do 1 year, what would you do? "now() - 365d"? What about leap years? What about going back one month? In fact, this patch covers only a subset of Influx's supported format. I don't see a week unit. Moreover, Influx doesn't use "us", it uses just "u". So, our proposed syntax isn't even consistent (in subtle ways) with Influx's format. Not to mention that Influx's format is incomplete (specifically, no support for months and years). Of the formats I've seen here, the Postgres native format is the most user-friendly, and accomplishes the goals of durations for us. I'm (non-PMC, non-binding) -1 on the currently proposed format from a usability/product/developer POV. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332685#comment-15332685 ] Brian Hess commented on CASSANDRA-11873: - So, one question about syntax. SQL has a time interval type with its own syntax. Instead of inventing additional syntax, what about adopting the Postgres or Oracle/SQL-standard syntax? For example, Postgres uses "1 day ago" or "-1 day" (or hour, minute, etc): https://www.postgresql.org/docs/current/static/datatype-datetime.html Oracle uses "INTERVAL '1' HOUR": https://docs.oracle.com/cd/B28359_01/server.111/b28286/sql_elements003.htm#i38598 I suggest we choose something similar to one that exists, rather than create our own new syntax. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
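[Editor's illustration] For comparison, the two syntaxes referenced above; both forms shown here are real, documented SQL:

{code}
-- Postgres interval input:
SELECT now() - INTERVAL '1 day';
SELECT now() - INTERVAL '2 hours 30 minutes';
-- Oracle / SQL-standard interval literals:
SELECT systimestamp - INTERVAL '1' HOUR FROM dual;
SELECT systimestamp - INTERVAL '1-2' YEAR TO MONTH FROM dual;
{code}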
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330154#comment-15330154 ] Brian Hess commented on CASSANDRA-11873: - This raises an interesting question. Will you be able to bind to the duration? As in: session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > now() - ?") or session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > now() - ?h") Because you can do the following, right? session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > DateDiff(now, ?)") > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320976#comment-15320976 ] Brian Hess commented on CASSANDRA-7622: +1 > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320933#comment-15320933 ] Brian Hess commented on CASSANDRA-7622: For metrics, Postgres appears to use a SELECT syntax. For system settings they use SHOW/SET. > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
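[Editor's illustration] Concretely, in Postgres that looks like the following; all commands and views here are real Postgres:

{code}
SHOW work_mem;                                       -- read one setting
SET work_mem = '64MB';                               -- change it for the session
SELECT * FROM pg_settings WHERE name = 'work_mem';   -- settings also exposed as a queryable view
SELECT * FROM pg_stat_activity;                      -- metrics via plain SELECT
{code}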
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320913#comment-15320913 ] Brian Hess commented on CASSANDRA-7622: I think this is thinking a bit too narrowly. I can see many use cases for programmatic access to these values. If SHOW VARIABLES is useful in cqlsh, cqlsh could have special commands that convert into this. If you want to mix regular CQL and these, you'd need different Sessions, as you'd need separate Cluster objects (so you can have a WhiteListPolicy for the one node of interest). Or if you want to look at the metrics for multiple nodes, you would need one Cluster object per node. > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320876#comment-15320876 ] Brian Hess commented on CASSANDRA-7622: Jinx > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320874#comment-15320874 ] Brian Hess commented on CASSANDRA-7622: If the plan is to enable writing to these virtual tables, then we should think more about a SELECT type syntax. That would make the future UPDATE operations make a lot more sense. And if we are thinking that eventually we will enable user-defined VTs then INSERT and DELETE operations may make sense. There really is no need for special syntax here, so I'd caution against that. For that reason, I'd suggest we stick with SELECT syntax even in this first use case - in a forward thinking way. > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
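[Editor's illustration] A sketch of what that forward-compatible SELECT-first design could look like; the keyspace, table, and column names here are purely hypothetical:

{code}
SELECT * FROM system_virtual.metrics WHERE node = '1.2.3.4';
-- and, if virtual tables later become writable, the same model extends naturally:
UPDATE system_virtual.settings SET max_hint_window_in_ms = 1080 WHERE node = '1.2.3.4';
{code}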
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320846#comment-15320846 ] Brian Hess commented on CASSANDRA-7622: I'm still curious how you would get the JMX values (SHOW VARIABLES or whatever) for a particular node. That is, how would this look syntactically (WHERE clause? Special CQL?)? And if it's something like a WHERE clause, then how will the driver route the query correctly? Would it be through a custom LoadBalancingPolicy? If so, then it could be hard to mix normal CQL with these (as you'd want a different LBP for the normal CQL queries). > Implement virtual tables > > > Key: CASSANDRA-7622 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > Fix For: 3.x > > > There are a variety of reasons to want virtual tables, which would be any > table that would be backed by an API, rather than data explicitly managed and > stored as sstables. > One possible use case would be to expose JMX data through CQL as a > resurrection of CASSANDRA-3527. > Another is a more general framework to implement the ability to expose yaml > configuration information. So it would be an alternate approach to > CASSANDRA-7370. > A possible implementation would be in terms of CASSANDRA-7443, but I am not > presupposing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316875#comment-15316875 ] Brian Hess commented on CASSANDRA-7622: It appears there are a few different things that could be accomplished with Virtual Tables, and I'm curious which one this ticket will focus on - and I'd suggest that we consider breaking out the full set of things below into separate tickets, as I think some deserve more discussion/debate than others. 1. CQL access to internal metrics and management. This would be access to metrics (such as JMX) and management settings (such as cassandra.yaml settings). These could be covered by read-only access to built-in virtual tables (update/write access is below). 2. CQL access to updating management settings. This is different than management actions (such as `nodetool flush` or repair). 3. CQL access to management actions. This is the category of things such as `nodetool flush`, repair, drain, etc. 4. CQL access to arbitrary classes that implement user-defined behavior. I think that 1 is the least controversial, and clearly of high value. My suggestion is that this be built-in Virtual Tables, perhaps in system._vt_*. For example system._vt_metrics as SELECT access to the JMX metrics, or system._vt_settings as SELECT access to yaml/command-line/etc settings. These would be predefined virtual tables. I am concerned about how we specify which node's metrics we are querying, and how we route the query to that node (more on this below - it applies to all 4 scenarios (and more if we come up with more)). I also think that 2 is not very controversial. This would allow updating settings. That could be, for example, via a CQL UPDATE command (e.g., `UPDATE system._vt_settings SET max_hint_window_in_ms = 1080 WHERE node='1.2.3.4'` - for illustrative purposes). The reason why this is different than 3 is that after doing the update, a read will return the same value, since it was changed. Would we want to support `UPDATE system._vt_settings SET max_hint_window_in_ms = 1080` and update all nodes, which is not valid CQL since we don't specify a partition key? I'm not sure about 3, and I think it should be discussed further. The thing is, with something like `ALTER system_virtual.actions SET drain=true`, or really `UPDATE system_virtual.actions SET drain=true`, what would the value be if we SELECT it after the ALTER/UPDATE? It feels like it would be set back to FALSE after the drain, which semantically feels strange. I like [~snazy]'s suggestion of completely separate CQL syntax for these actions. Maybe something like `EXECUTE ACTION drain ON '1.2.3.4'` - again, syntax just for illustrative purposes. The semantics here is that the system is being altered, but outside a table, so a SELECT doesn't have the confusion described above. As [~snazy] pointed out, we would also need to make sure we are careful about access to this capability (perhaps user/role permissions is sufficient, and we just need to add new permission types). 4 is probably the one that causes the most pause, IMO, as there would be arbitrary user code running in the read and write path. We need to take care with that - as we have with UDFs. It isn't exactly clear to me what the goal is for arbitrary virtual tables. What are we trying to accomplish with this extension point? As I mentioned above, I'm curious how we will route the query (SELECT, UPDATE, EXECUTE, whatever) to the right node.
In the simple case of SELECTing JMX metrics, how would we query the JMX metrics of node 1.2.3.4? These queries/operations are per-node, not cluster-wide, so we need to think about that. Is node the partition key of the virtual table (`PRIMARY KEY ((node), ...)`)? How would we route that internally in CQL (and with awareness on the driver)? It sounds like it would require a special partitioner (like system tables have - or maybe ByteOrderedPartitioner could work with a custom replication strategy, but with Murmur3Partitioner things are hard (it's essentially hash inversion)) and possibly a replication strategy (I don't think that a replication strategy alone would cover this, right, [~jjordan]?). If these are fully user-defined (as [~jjirsa] suggests), then we need to have a way to specify partitioners at the table (or at least keyspace) level, as opposed to cluster-wide (which would actually be a nice change, IMO). If they are built-in, then we can cover this case like we do with system keyspace tables. Another approach may be to have the drivers deal with the routing. The Virtual Tables could be special cases for routing purposes and have an API call to specify which node to query - or the LoadBalancingPolicy could inspect the Statement and route appropriately, which could work for any/all partitioners and/or re
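[Editor's illustration] Gathering the illustrative commands from the comment above into one place; every name here, and the EXECUTE ACTION syntax itself, is hypothetical and for discussion only:

{code}
SELECT * FROM system._vt_metrics WHERE node = '1.2.3.4';                             -- scenario 1
UPDATE system._vt_settings SET max_hint_window_in_ms = 1080 WHERE node = '1.2.3.4';  -- scenario 2
EXECUTE ACTION drain ON '1.2.3.4';                                                   -- scenario 3
{code}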
[jira] [Commented] (CASSANDRA-10783) Allow literal value as parameter of UDF & UDA
[ https://issues.apache.org/jira/browse/CASSANDRA-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314193#comment-15314193 ] Brian Hess commented on CASSANDRA-10783: - [~slebresne] - okay, but let's open that ticket, then. With respect to function resolution with literals (or non-literals, for that matter) and the function overloading topic that you point to (with your Add example), Postgres has some guidance on what can/should be done. It would be better to match *something* than to create something completely different. > Allow literal value as parameter of UDF & UDA > - > > Key: CASSANDRA-10783 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10783 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: DOAN DuyHai >Assignee: Robert Stupp >Priority: Minor > Labels: CQL3, UDF, client-impacting, doc-impacting > Fix For: 3.x > > > I have defined the following UDF > {code:sql} > CREATE OR REPLACE FUNCTION maxOf(current int, testValue int) RETURNS NULL ON > NULL INPUT > RETURNS int > LANGUAGE java > AS 'return Math.max(current,testValue);' > CREATE TABLE maxValue(id int primary key, val int); > INSERT INTO maxValue(id, val) VALUES(1, 100); > SELECT maxOf(val, 101) FROM maxValue WHERE id=1; > {code} > I got the following error message: > {code} > SyntaxException: message="line 1:19 no viable alternative at input '101' (SELECT maxOf(val1, > [101]...)"> > {code} > It would be nice to allow literal values as parameters of UDFs and UDAs too. > I was thinking about a use case for a UDA groupBy() function where the end > user can *inject* at runtime a literal value to select which aggregation he > wants to display, something similar to GROUP BY ... HAVING -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10783) Allow literal value as parameter of UDF & UDA
[ https://issues.apache.org/jira/browse/CASSANDRA-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314150#comment-15314150 ] Brian Hess commented on CASSANDRA-10783: - So, I think it would be good to pick up this topic in this ticket. Furthermore, I think there is some direction to take from SQL in terms of implicit types. Looking at Postgres' documentation (here: https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS), they do the following {quote} A numeric constant that contains neither a decimal point nor an exponent is initially presumed to be type integer if its value fits in type integer (32 bits); otherwise it is presumed to be type bigint if its value fits in type bigint (64 bits); otherwise it is taken to be type numeric. Constants that contain decimal points and/or exponents are always initially presumed to be type numeric. {quote} As this pertains to functions and function overloading, the appropriate Postgres documentation is here: https://www.postgresql.org/docs/current/static/typeconv-func.html > Allow literal value as parameter of UDF & UDA > - > > Key: CASSANDRA-10783 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10783 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: DOAN DuyHai >Assignee: Robert Stupp >Priority: Minor > Labels: CQL3, UDF, client-impacting, doc-impacting > Fix For: 3.x > > > I have defined the following UDF > {code:sql} > CREATE OR REPLACE FUNCTION maxOf(current int, testValue int) RETURNS NULL ON > NULL INPUT > RETURNS int > LANGUAGE java > AS 'return Math.max(current,testValue);' > CREATE TABLE maxValue(id int primary key, val int); > INSERT INTO maxValue(id, val) VALUES(1, 100); > SELECT maxOf(val, 101) FROM maxValue WHERE id=1; > {code} > I got the following error message: > {code} > SyntaxException: message="line 1:19 no viable alternative at input '101' (SELECT maxOf(val1, > [101]...)"> > {code} > It would be nice to allow literal values as parameters of UDFs and UDAs too. > I was thinking about a use case for a UDA groupBy() function where the end > user can *inject* at runtime a literal value to select which aggregation he > wants to display, something similar to GROUP BY ... HAVING -- This message was sent by Atlassian JIRA (v6.3.4#6332)
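[Editor's illustration] Applied to the UDF in this ticket's description, the Postgres rule would resolve the literal as follows; the second overload is hypothetical, added only to show where overload resolution matters:

{code}
CREATE FUNCTION maxOf(current int, testValue int) RETURNS NULL ON NULL INPUT
  RETURNS int LANGUAGE java AS 'return Math.max(current, testValue);';
CREATE FUNCTION maxOf(current bigint, testValue bigint) RETURNS NULL ON NULL INPUT
  RETURNS bigint LANGUAGE java AS 'return Math.max(current, testValue);';
-- under the Postgres rule, 101 fits in 32 bits, so it is presumed int
-- and the (int, int) overload would be chosen:
SELECT maxOf(val, 101) FROM maxValue WHERE id = 1;
{code}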
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307722#comment-15307722 ] Brian Hess commented on CASSANDRA-11871: - I also agree with [~slebresne] that it would be a shame if only the one function was supported, and am +1 on adding a property to built-in and user-defined functions that indicates whether the function is monotonic. > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 3.x > > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP > BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the > SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same > order. As we can have to process a large amount of data, we want to try to > limit ourselves to the cases where we can build the groups on the fly (which > is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we > will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine. The problem is that if the user inverts the functions by mistake, like this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results. > The only way to prevent that would be to check the function order and make > sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), > second(time)}}). > In my opinion a function like {{floor(<time>, <duration>)}} will be > much better as it does not allow for this type of mistake and is much more > flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, m), count() FROM Trades WHERE symbol = 'AAPL' AND date = > '2016-01-11' GROUP BY floor(time, m);}} > An important aspect to keep in mind with a function like {{floor}} is the > starting point. For a query like: {{SELECT floor(time, 2h), count() FROM > Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' > AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the > result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307720#comment-15307720 ] Brian Hess commented on CASSANDRA-11871: - It may not be totally important here, but the following are not the same thing: 1. SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY minute(time); 2. SELECT floor(time, m), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY floor(time, m); In the first we would get exactly 60 groups (one for every minute of the hour, assuming data for every minute of the hour). That is, data for 12:01, 1:01, 2:01, etc. would all land in the same bucket. In the second we would get 1440 groups (one for every minute of the day, assuming data for every minute of the day). That is, data for 12:01, 1:01, and 2:01 would each be in its own group. Sometimes people will want to group by both the Hour and the Minute (and other combinations, but I'm just following the example here) and get 1440 groups, and other times they really will want to group by the Minute alone and get 60 groups. Both use cases are necessary. It is worth noting that the Minute (and Hour and Day, etc) functions are *not* monotonic. When they wrap around, the value drops from 59 to 0 (or 11 to 0, or 28/30/31 to 1, etc). The Floor function, however, is monotonic, because what you get is the full date/time truncated to the desired resolution (no wrapping). With respect to the starting ranges for things like "WHERE time > now - 10h GROUP BY Floor(time, 1h)", I think Benjamin is right in saying that if now is 12:21, the starting point is then 2:21 and the bucket starts at 2:00. The strange thing is that this first time bucket (2:00) will only be partially filled, since it will only have the data from 2:21-2:59, whereas all the other buckets (except the 12:00 one, which only has data for 12:00-12:21) will be full hours. Of course, the proper way to specify that is probably "WHERE time > Floor(now - 10h, 1h) GROUP BY Floor(time, 1h)" - will that be supported, [~blerer]? > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 3.x > > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP > BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the > SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same > order. As we can have to process a large amount of data, we want to try to > limit ourselves to the cases where we can build the groups on the fly (which > is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we > will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine.
The problem is that if the user inverts the functions by mistake, like this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results. > The only way to prevent that would be to check the function order and make > sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), > second(time)}}). > In my opinion a function like {{floor(<time>, <duration>)}} will be > much better as it does not allow for this type of mistake and is much more > flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, m), count() FROM Trades WHERE symbol = 'AAPL' AND date = > '2016-01-11' GROUP BY floor(time, m);}} > An important aspect to keep in mind with a function like {{floor}} is the > starting point. For a query like: {{SELECT floor(time, 2h), count() FROM > Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' > AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the > result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
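[Editor's illustration] A side-by-side sketch of the distinction drawn above; both minute() and floor(time, duration) are syntax proposed in this ticket, not existing CQL:

{code}
-- at most 60 groups: minute-of-hour, wrapping from 59 back to 0
SELECT minute(time), count() FROM Trades
WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY minute(time);
-- at most 1440 groups: the full time truncated to the minute, never wrapping
SELECT floor(time, m), count() FROM Trades
WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY floor(time, m);
{code}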
[jira] [Commented] (CASSANDRA-8488) Filter by UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264523#comment-15264523 ] Brian Hess commented on CASSANDRA-8488: There is a way around CASSANDRA-8273 for UDFs, but it might not be the best (though, then again, there doesn't appear to be a solution for 8273). We could simply not do replica-based filtering if the filter is a UDF. That is, you would bring all rows that meet the *other* filters back to the coordinator and do the UDF filtering on the coordinator after ConsistencyLevel is applied. > Filter by UDF > - > > Key: CASSANDRA-8488 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8488 > Project: Cassandra > Issue Type: New Feature > Components: CQL, Local Write-Read Paths >Reporter: Jonathan Ellis > Labels: client-impacting, cql, udf > Fix For: 3.x > > > Allow user-defined functions in WHERE clause with ALLOW FILTERING. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251904#comment-15251904 ] Brian Hess commented on CASSANDRA-8844: I think that restricting a user from being able to ALTER any table inside a keyspace that has CDC enabled is a bit too much. Additionally, I see use cases where the keyspace exists already and is in use and then a user layers in CDC - namely, that CDC is enabled after the keyspace exists. So, I'm -1 on saying that CDC needs to be there at keyspace creation time. One big reason is that if I want to add CDC later, there isn't a good way in Cassandra to create a new keyspace (with CDC enabled) and then copy all the data from the first one to the second one (no INSERT-SELECT in Cassandra). So, I'd be stuck. As for the durability question, I think we should throw an error if someone wants to set the keyspace to have DURABLE_WRITES=false and CDC enabled - or to alter the keyspace to put it in that position. I do not like the idea of automagically changing DURABLE_WRITES to true for the user. I don't like that surprise behavior. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC.
We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles w
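[Editor's illustration] A sketch of the conflicting combination discussed in the comment above, assuming the per-table CDC flag this ticket proposes; the cdc property name is illustrative here:

{code}
CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3}
  AND durable_writes = false;
-- per the comment above, this should fail with an error rather than
-- silently flipping durable_writes to true:
ALTER TABLE ks.t WITH cdc = true;
{code}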
[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235653#comment-15235653 ] Brian Hess commented on CASSANDRA-11521: - This is configurable, but the default for reads is LOCAL_ONE and on writes is LOCAL_QUORUM. > Implement streaming for bulk read requests > -- > > Key: CASSANDRA-11521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11521 > Project: Cassandra > Issue Type: Sub-task > Components: Local Write-Read Paths >Reporter: Stefania >Assignee: Stefania > Fix For: 3.x > > > Allow clients to stream data from a C* host, bypassing the coordination layer > and eliminating the need to query individual pages one by one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232771#comment-15232771 ] Brian Hess commented on CASSANDRA-11521: - Ah - that's a good point (about internal things for other CLs). > Implement streaming for bulk read requests > -- > > Key: CASSANDRA-11521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11521 > Project: Cassandra > Issue Type: Sub-task > Components: Local Write-Read Paths >Reporter: Stefania >Assignee: Stefania > Fix For: 3.x > > > Allow clients to stream data from a C* host, bypassing the coordination layer > and eliminating the need to query individual pages one by one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232715#comment-15232715 ] Brian Hess commented on CASSANDRA-11521: - [~slebresne] - I like this new approach better. I think it simplifies things a bit, and I'm worried about the server overpowering the client, which I think could be really easy to do (then we'd have to think about things like back-pressure, etc). There could be a way to tell the server that the client is going to ask for all (or a lot) of the pages, so keep this stuff ready to flow, etc. Additionally, we could have a setting that tells the server "if you haven't heard me ask for the next page (or given some heartbeat) in some amount of time X, then feel free to clean things up and throw an error if I ask for the next page later", or something, so that we don't have resources tied up even if the client dies. WRT CL, with this approach, I don't quite see why you would have to stick to CL_ONE here. That said, starting with CL_ONE and "growing" to other CLs is probably okay. Just not entirely sure what it gains given this new approach. > Implement streaming for bulk read requests > -- > > Key: CASSANDRA-11521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11521 > Project: Cassandra > Issue Type: Sub-task > Components: Local Write-Read Paths >Reporter: Stefania >Assignee: Stefania > Fix For: 3.x > > > Allow clients to stream data from a C* host, bypassing the coordination layer > and eliminating the need to query individual pages one by one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-2527) Add ability to snapshot data as input to hadoop jobs
[ https://issues.apache.org/jira/browse/CASSANDRA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202252#comment-15202252 ] Brian Hess commented on CASSANDRA-2527: [~iamaleksey] - this ticket isn't only applicable to Hadoop, but to any analytics engine, including Spark. I think we should either reopen this ticket and retarget to Spark (though the same mechanism would be interesting to Hadoop, Flink, etc) or open a new one with the word "Spark" replacing the word "Hadoop", but otherwise the exact same idea. > Add ability to snapshot data as input to hadoop jobs > > > Key: CASSANDRA-2527 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2527 > Project: Cassandra > Issue Type: New Feature >Reporter: Jeremy Hanna >Priority: Minor > Labels: hadoop > > It is desirable to have immutable inputs to hadoop jobs for the duration of > the job. That way re-execution of individual tasks do not alter the output. > One way to accomplish this would be to snapshot the data that is used as > input to a job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179992#comment-15179992 ] Brian Hess commented on CASSANDRA-10707: - A quick question/clarification on the GROUP BY and ORDER BY discussion from above. The following are valid SQL: {code:SQL} SELECT pkey, ccol1, Max(x) As maxx FROM myTable GROUP BY pkey, ccol1 ORDER BY ccol1, pkey; SELECT pkey, ccol1, Max(x) As maxx FROM myTable GROUP BY pkey, ccol1 ORDER BY maxx; {code} I think you are suggesting that the only real ordering that is allowed is the native ordering in the CQL tables. Specifically, these 2 queries would not be supported. Is that correct? I think the logic is more that the following CQL {code:SQL} SELECT pkey, ccol1, Max(x) AS maxx FROM myTable GROUP BY pkey, ccol1 ORDER BY pkey, ccol1 {code} turns into {code:SQL} SELECT pkey, ccol1, Max(x) AS maxx FROM ( SELECT pkey, ccol1, x FROM myTable ORDER BY pkey, ccol1 ) AS sub1 GROUP BY pkey, ccol1; {code} That means that the ORDER BY clause must work in that inner query as valid CQL. More generally: {code:SQL} SELECT [grouping columns], [aggregate function]([aggregate columns]) FROM [table] GROUP BY [grouping columns] ORDER BY [ordering columns] {code} must satisfy the transformation to: {code:SQL} SELECT [grouping columns], [aggregate function]([aggregate columns]) FROM ( SELECT [grouping columns], [aggregate columns] FROM [table] ORDER BY [ordering columns] ) AS sub1 GROUP BY [grouping columns] {code} And specifically that the inner sub-query is valid CQL, namely: {code:SQL} SELECT [grouping columns], [aggregate columns] FROM [table] ORDER BY [ordering columns] {code} That is certainly different than SQL, which does not have this restriction. I'm +0.5 on having the syntax be the same as SQL, as I think it is slightly better than the alternative. I'm just noting that the semantics really are a bit different and there are more restrictions on the ORDER BY clause in CQL with this ticket than in SQL. That nuance needs to be called out in the documentation or folks will certainly run into the error. I would also add that if someone uses an incorrect ORDER BY, the error should not only call out that it is an error, but also indicate what sorts of ORDER BY clauses are supported. > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > > Now that Cassandra supports aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11053) COPY FROM on large datasets: fix progress report and debug performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130553#comment-15130553 ] Brian Hess commented on CASSANDRA-11053: - [~Stefania] - I believe that is the correct invocation. However, that performance is about half of what I saw when I ran against the same setup (r3.2xlarge client writing to a cluster of 8 i2.2xlarge machines). The rate I had was 98K writes/sec (total time was 209 seconds). > COPY FROM on large datasets: fix progress report and debug performance > -- > > Key: CASSANDRA-11053 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11053 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Stefania >Assignee: Stefania > Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x > > Attachments: copy_from_large_benchmark.txt, > copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt, > worker_profiles.txt, worker_profiles_2.txt > > > Running COPY FROM on a large dataset (20G divided into 20M records) revealed > two issues: > * The progress report is incorrect: it is very slow until almost the end of > the test, at which point it catches up extremely quickly. > * The performance in rows per second is similar to running smaller tests with > a smaller cluster locally (approx 35,000 rows per second). As a comparison, > cassandra-stress manages 50,000 rows per second under the same set-up, > therefore running 1.5 times faster. > See attached file _copy_from_large_benchmark.txt_ for the benchmark details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111571#comment-15111571 ] Brian Hess commented on CASSANDRA-10707: - Can a UDF be used as the GROUP BY key, such as: SELECT partitionKey, clusteringCol1, Udf(clusteringCol2), Max( x ) FROM myData WHERE partitionKey=5 GROUP BY partitionKey, clusteringCol1, Udf(clusteringCol2); > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > > Now that Cassandra supports aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
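For reference, a sketch of what such a UDF might look like (hypothetical names; assumes UDFs are enabled, as introduced in Cassandra 2.2 - whether GROUP BY would accept the function call is exactly the open question above):
{code:SQL}
-- Hypothetical UDF that buckets a clustering column value:
CREATE FUNCTION bucket(value INT)
    RETURNS NULL ON NULL INPUT
    RETURNS INT
    LANGUAGE java
    AS 'return value / 10;';

-- The question above, restated with this UDF:
SELECT partitionKey, clusteringCol1, bucket(clusteringCol2), Max(x)
FROM myData
WHERE partitionKey = 5
GROUP BY partitionKey, clusteringCol1, bucket(clusteringCol2);
{code}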
[jira] [Comment Edited] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109181#comment-15109181 ] Brian Hess edited comment on CASSANDRA-10707 at 1/20/16 7:07 PM: -- Correct, what [~iamaleksey] said. In fact, pushing the aggregate computation to the replicas is troublesome at an RF>1. Quick follow-up - will this ticket also cover: SELECT clusterCol, Max( x ) FROM myData GROUP BY clusterCol; That is, you group on a clustering column, but not on a partition key? Second question - consider a table with schema myData(partitionKey INT, clusteringCol1 INT, clusteringCol2 INT, x INT, PRIMARY KEY ((partitionKey), clusteringCol1, clusteringCol2)). Now, will the following query be supported: SELECT partitionKey, clusteringCol2, Sum( x ) FROM myData GROUP BY partitionKey, clusteringCol2; The reason I ask is that the following is not supported: SELECT partitionKey, clusteringCol2, x FROM myData WHERE partitionKey=5 ORDER BY clusteringCol2; Because you cannot order by clusteringCol2, only clusteringCol1. So, the assumption that the data will be sorted when it arrives at the coordinator might not be true in all cases. > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > > Now that Cassandra supports aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
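Restating the second question with complete DDL (the table definition is reconstructed from the comment; the ORDER BY behavior shown is existing CQL, while the GROUP BY question remains open):
{code:SQL}
-- Table shape described in the comment above:
CREATE TABLE myData (
    partitionKey INT,
    clusteringCol1 INT,
    clusteringCol2 INT,
    x INT,
    PRIMARY KEY ((partitionKey), clusteringCol1, clusteringCol2)
);

-- Accepted today: the ordering is a prefix of the clustering order.
SELECT partitionKey, clusteringCol2, x FROM myData
WHERE partitionKey = 5
ORDER BY clusteringCol1, clusteringCol2;

-- Rejected today: clusteringCol2 alone is not a clustering prefix, so rows
-- for a (partitionKey, clusteringCol2) group do not arrive contiguously.
-- SELECT partitionKey, clusteringCol2, x FROM myData
--   WHERE partitionKey = 5 ORDER BY clusteringCol2;
{code}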
[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109181#comment-15109181 ] Brian Hess commented on CASSANDRA-10707: - Correct, what [~iamaleksey] said. In fact, pushing the aggregate computation to the replicas is troublesome at an RF>1. Quick follow-up - will this ticket also cover: SELECT clusterCol, Max(x) FROM myData GROUP BY clusterCol; That is, you group on a clustering column, but not on a partition key? Second question - consider a table with schema myData(partitionKey INT, clusteringCol1 INT, clusteringCol2 INT, x INT, PRIMARY KEY ((partitionKey), clusteringCol1, clusteringCol2)). Now, will the following query be supported: SELECT partitionKey, clusteringCol2, Sum(x) FROM myData GROUP BY partitionKey, clusteringCol2; The reason I ask is that the following is not supported: SELECT partitionKey, clusteringCol2, x FROM myData WHERE partitionKey=5 ORDER BY clusteringCol2; Because you cannot order by clusteringCol2, only clusteringCol1. > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > > Now that Cassandra supports aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107583#comment-15107583 ] Brian Hess commented on CASSANDRA-10707: - I think that supporting grouping by clustering column (or perhaps even a regular column) with a partition key predicate is a good idea. I think that supporting grouping by partition key (either in part, or in toto) is a bad idea. In that query, all the data in the cluster would stream to the coordinator, which would then be responsible for doing a *lot* of processing. In other distributed systems that do GROUP BY queries, the groups end up being split up among the nodes in the system and each node is responsible for rolling up the data for those groups it was assigned. This is a common way to get all the nodes in the system to help with a pretty significant computation - and the data streamed out (potentially via a single node in the system) to the client. However, in this approach, all the data is streaming to a single node and that node is doing all the work, for all the groups. This feels like either a ton of work to orchestrate the computation (starting to mimic other systems - e.g., Spark) or a lot of work that risks being very inefficient and slow. I am also concerned about what this would do in the face of QueryTimeoutException - would we really be able to do a GROUP BY partitionKey aggregate under the QTE limit? > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > > Now that Cassandra supports aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9778) CQL support for time series aggregation
[ https://issues.apache.org/jira/browse/CASSANDRA-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107407#comment-15107407 ] Brian Hess commented on CASSANDRA-9778: [~blerer] - that really isn't an example of window functions (A/K/A window aggregates, window aggregate functions, etc). That's really an example of a grouped aggregate with time functions (Floor, Minute, Hour, etc). The cardinality of the output of this query is that the number of rows equals the number of groups. Whereas, with window functions the cardinality of the output is that the number of rows equals the number of rows of input. Let me simplify your trades example to daily stock prices with a schema of (symbol TEXT, transDate DATE, closePrice DOUBLE). For each stock you'd like the sliding 3-day average of the stock closing prices. You would do that with the following SQL-99 syntax: SELECT symbol, transDate, closePrice, Avg(closePrice) OVER (PARTITION BY symbol ORDER BY transDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS threeDayAverage FROM stocks WHERE symbol = 'XYZ'; Here, each day will have a "window" of rows that stretches from 2 rows before (if they exist) to the current row, and the value is the average of the three closePrice values. Thus, there is an output for every row of input. For Thursday's threeDayAverage for stock XYZ we will use the closePrice from Tuesday, Wednesday, and Thursday. For Friday's threeDayAverage for stock XYZ we will use the closePrice from Wednesday, Thursday, and Friday. And so on. For what it's worth, this is not hard to do if there is a partition key predicate. We are simply doing a pass over the rows to return to the client and rolling things up. It is possible we need to sort the data depending on the ORDER BY clause, but otherwise the aggregation is a simple rollup. It should be noted that SQL allows for very flexible window specifications that can cause trouble, such as "OVER (PARTITION BY symbol ORDER BY transDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)" which would go from the current row to the end of the partition. That can be a tricky case. SQL99 also supports RANGE window specifications in addition to ROW specifications. That can also be tricky. That said, window functions would be a nice addition (especially with a partition key predicate). > CQL support for time series aggregation > --- > > Key: CASSANDRA-9778 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9778 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: Jonathan Ellis >Assignee: Benjamin Lerer > Fix For: 3.x > > > Along with MV (CASSANDRA-6477), time series aggregation or "rollups" are a > common design pattern in Cassandra applications. I'd like to add CQL support > for this along these lines: > {code} > CREATE MATERIALIZED VIEW stocks_by_hour AS > SELECT exchange, day, day_time(1h) AS hour, symbol, avg(price), sum(volume) > FROM stocks > GROUP BY exchange, day, symbol, hour > PRIMARY KEY ((exchange, day), hour, symbol); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
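To underline the cardinality difference described above, here are the two query shapes side by side - standard SQL-99 against the simplified stocks schema from this comment, not valid CQL today:
{code:SQL}
-- Grouped aggregate: output cardinality = number of groups.
-- One row comes back for symbol 'XYZ'.
SELECT symbol, Avg(closePrice) AS avgClose
FROM stocks
WHERE symbol = 'XYZ'
GROUP BY symbol;

-- Window aggregate: output cardinality = number of input rows.
-- Every trading day comes back, each with its own sliding 3-day average.
SELECT symbol, transDate, closePrice,
       Avg(closePrice) OVER (PARTITION BY symbol
                             ORDER BY transDate
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
         AS threeDayAverage
FROM stocks
WHERE symbol = 'XYZ';
{code}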
[jira] [Commented] (CASSANDRA-9552) COPY FROM times out after 110000 inserts
[ https://issues.apache.org/jira/browse/CASSANDRA-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102092#comment-15102092 ] Brian Hess commented on CASSANDRA-9552: [~slebresne] - it was different from the 9302, 9303, and 9304 improvements. This was a case where COPY failed. Do the improvements to COPY FROM also happen to fix this problem? Maybe [~Stefania] knows. > COPY FROM times out after 110000 inserts > > > Key: CASSANDRA-9552 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9552 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > Labels: cqlsh > Fix For: 2.1.x > > > I am trying to test out performance of COPY FROM on various schemas. I have > a 100-BIGINT-column table defined as: > {code} > CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': '3'} AND durable_writes = true; > CREATE TABLE test.test100 ( > pkey bigint,ccol bigint,col0 bigint,col1 bigint,col10 > bigint, > col11 bigint,col12 bigint,col13 bigint,col14 bigint,col15 > bigint, > col16 bigint,col17 bigint,col18 bigint,col19 bigint,col2 > bigint, > col20 bigint,col21 bigint,col22 bigint,col23 bigint,col24 > bigint, > col25 bigint,col26 bigint,col27 bigint,col28 bigint,col29 > bigint, > col3 bigint,col30 bigint,col31 bigint,col32 bigint,col33 > bigint, > col34 bigint,col35 bigint,col36 bigint,col37 bigint,col38 > bigint, > col39 bigint,col4 bigint,col40 bigint,col41 bigint,col42 > bigint, > col43 bigint,col44 bigint,col45 bigint,col46 bigint,col47 > bigint, > col48 bigint,col49 bigint,col5 bigint,col50 bigint,col51 > bigint, > col52 bigint,col53 bigint,col54 bigint,col55 bigint,col56 > bigint, > col57 bigint,col58 bigint,col59 bigint,col6 bigint,col60 > bigint, > col61 bigint,col62 bigint,col63 bigint,col64 bigint,col65 > bigint, > col66 bigint,col67 bigint,col68 bigint,col69 bigint,col7 > bigint, > col70 bigint,col71 bigint,col72 bigint,col73 bigint,col74 > bigint, > col75 bigint,col76 bigint,col77 bigint,col78 bigint,col79 > bigint, > col8 bigint,col80 bigint,col81 bigint,col82 bigint,col83 > bigint, > col84 bigint,col85 bigint,col86 bigint,col87 bigint,col88 > bigint, > col89 bigint,col9 bigint,col90 bigint,col91 bigint,col92 > bigint, > col93 bigint,col94 bigint,col95 bigint,col96 bigint,col97 > bigint, > PRIMARY KEY (pkey, ccol) > ) WITH CLUSTERING ORDER BY (ccol ASC) > AND bloom_filter_fp_chance = 0.01 > AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' > AND comment = '' > AND compaction = {'min_threshold': '4', 'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32'} > AND compression = {'sstable_compression': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99.0PERCENTILE'; > {code} > I then try to load the linked file of 120,000 rows of 100 BIGINT columns via: > {code} > cqlsh -e "COPY > test.test100(pkey,ccol,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33,col34,col35,col36,col37,col38,col39,col40,col41,col42,col43,col44,col45,col46,col47,col48,col49,col50,col51,col52,col53,col54,col55,col56,col57,col58,col59,col60,col61,col62,col63,col64,col65,col66,col67,col68,col69,col70,col71,col72,col73,col74,col75,col76,col77,col78,col79,col80,col81,col82,col83,col84,col85,col86,col87,col88,col89,col90,col91,col92,col93,col94,col95,col96,col97) > FROM 'data120K.csv'" > {code} > Data file here: > https://drive.google.com/file/d/0B87-Pevy14fuUVcxemFRcFFtRjQ/view?usp=sharing > After 110000 rows, it errors and hangs: > {code} > :1:110000 rows; Write: 19848.21 rows/s > Connection heartbeat failure > :1:Aborting import at record #1196. Previously inserted records are > still present, and some records after that may be present as well. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090643#comment-15090643 ] Brian Hess commented on CASSANDRA-8844: But this would allow you to have 2 CDC keyspaces, one with ERROR_ON_OVERFLOW and one with DISCARD_OLD_ON_OVERFLOW. As it's designed (as I understand it), mutations from both keyspaces end up in the same CDC log files. So, if you hit overflow on the keyspace that has DISCARD_OLD_ON_OVERFLOW you need to discard some files, but you probably shouldn't discard mutations from the keyspace that has ERROR_ON_OVERFLOW. That's the situation I'm thinking about. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]) propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order, even when
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090626#comment-15090626 ] Brian Hess commented on CASSANDRA-8844: As I understand it, making it per-keyspace could be problematic with DISCARD_OLD_ON_OVERFLOW since each CDC file will have data from all keyspaces. So, if you say to discard some but not others, then you would need to process the CDC file to only discard the ones that have the DISCARD_OLD_ON_OVERFLOW but keep the others. Is that feasible? The other 2 seem fine, though. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]) propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order, even when it is multiple logfiles behind in > processing > -- Be able to continuously "tail" t
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088257#comment-15088257 ] Brian Hess commented on CASSANDRA-8844: The semantics don't change. There are additional semantics - mutations have to be in the same keyspace, have the same partition, have the same CDC-enabled flag, and can be in different tables. That is the same as today, as all tables have the CDC-enabled flag set to false. Changing the whole keyspace would mean my CDC consumer is now processing a lot of unneeded data, just to get to the one table of interest. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]) propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order, even when it is multiple logfiles behind in > proce
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088210#comment-15088210 ] Brian Hess commented on CASSANDRA-8844: The only supported language for a CDC consumer is Java, then? Will there be an example consumer leveraging the CommitLogReplayer? Will that program need to have Cassandra running on that machine to process the logs? As in, will I be able to copy off the CDC logs to be processed elsewhere? > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]) propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order, even when it is multiple logfiles behind in > processing > -- Be able to continuously "tail" the most recent logfile and get > low-latency(ms?) access to the data as it is writ
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088208#comment-15088208 ] Brian Hess commented on CASSANDRA-8844: With respect to the atomicity, saying that you break into CDC and non-CDC commit logs would not change existing behavior, as essentially prior to this ticket all tables have CDC disabled. If I have a table in a non-CDC keyspace and I choose to enable CDC on that table, how would I do that here? Or vice versa. I expect that turning on CDC after the fact is likely to come up (possibly often). > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]) propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order, even when it is multiple logfiles behind in > processing > -- Be able to c
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088155#comment-15088155 ] Brian Hess commented on CASSANDRA-8844: A couple of comments/questions: 1. You seem to have changed from specifying CDC at the table level to the Keyspace level. Is that correct? That sort of contradicts one of the requirements in your document. 2. The CDC consumer needs to process a serialized mutation. How will a developer write that? Is there a commitlog serialized mutation format document that someone can use? Do they need to write that in Java and link to the full Cassandra jars (and dependencies)? > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]) propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A sophisticated daemon should be able to be written that could > -- Catch up, in written-order
[jira] [Commented] (CASSANDRA-8519) Mechanism to determine which nodes own which token ranges without Thrift
[ https://issues.apache.org/jira/browse/CASSANDRA-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083826#comment-15083826 ] Brian Hess commented on CASSANDRA-8519: [~snazy] I do still think this should be something you can query from the SYSTEM tables somewhere. I do agree that the Java driver does provide this information, though I'm not sure about the other drivers. It feels like a good thing to be able to simply look up the information in system tables, versus having each driver implement the same logic (and have to change if/when it changes). > Mechanism to determine which nodes own which token ranges without Thrift > > > Key: CASSANDRA-8519 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8519 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess >Priority: Minor > > Right now the only way to determine which nodes own which token ranges is via > the Thrift interface. There is not a Java/CQL driver mechanism to determine > this. Applications that make multiple connections to Cassandra to extract > data in parallel need this ability so they can split the data into pieces, > and it is reasonable to want those splits to be on token range boundaries. > Of course, once you split this way, you would want to route those queries to > nodes that own that token range / split, for efficiency. > This applies to both Hadoop and Spark, but other applications, too. Hadoop > and Spark currently use Thrift to determine this topology. > Additionally, different replication strategies and replication factors result > in different token range ownership, so there will have to be a different > answer based on which keyspace is used. > It would be useful if this data was stored in a CQL table and could be simply > queried. A suggestion would be to add a column to the > SYSTEM.SCHEMA_KEYSPACES table (maybe a complex Map of Host to a UDT that has > a List of (beginRange, endRange) pairs - as an example). This table would > need to be updated on an ALTER KEYSPACE command or on a topology change > event. This would allow the server(s) to hold this information and the > drivers could simply query it (as opposed to having each driver manage this > separately). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
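As a thought experiment, the suggestion in the description might look something like the following. This is entirely hypothetical - no such UDT or column exists, the names are invented, and users cannot actually ALTER system tables; it is pseudo-DDL for what the server could maintain and expose:
{code:SQL}
-- Hypothetical UDT describing one owned token range:
CREATE TYPE token_range (begin_range BIGINT, end_range BIGINT);

-- Hypothetical column the server would maintain per keyspace:
-- a map from endpoint to the list of ranges it owns.
ALTER TABLE system.schema_keyspaces
  ADD token_ownership map<inet, frozen<list<frozen<token_range>>>>;

-- Any driver (or plain CQL client) could then answer "who owns what"
-- with a single query:
SELECT token_ownership
FROM system.schema_keyspaces
WHERE keyspace_name = 'mykeyspace';
{code}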
[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076099#comment-15076099 ] Brian Hess commented on CASSANDRA-10707: - [~blerer] - will this GROUP BY require a partitionKey predicate? That is, will the following be supported: SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable GROUP BY partitionKey, clusteringColumn1; Or will it require a partitionKey predicate, like: SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY partitionKey, clusteringColumn1; Or will both be supported? > Add support for Group By to Select statement > > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > > Now that Cassandra supports aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-10050) Secondary Index Performance Dependent on TokenRange Searched in Analytics
[ https://issues.apache.org/jira/browse/CASSANDRA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hess updated CASSANDRA-10050: --- Comment: was deleted (was: I'm a little confused here. [~beobal], why would the performance get slower as the tokens got larger, though? Wouldn't it be just as inefficient to read the first bit of the token range rather than the last bit of the token range? Seems like something else is going on here, no?) > Secondary Index Performance Dependent on TokenRange Searched in Analytics > - > > Key: CASSANDRA-10050 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10050 > Project: Cassandra > Issue Type: Improvement > Environment: Single node, macbook, 2.1.8 >Reporter: Russell Alexander Spitzer > Fix For: 3.x > > > In doing some test work on the Spark Cassandra Connector I saw some odd > performance when pushing down range queries with Secondary Index filters. > When running the queries we see a huge amount of time when the C* server is not > doing any work and the queries seem to be hanging. This investigation led to > the work in this document > https://docs.google.com/spreadsheets/d/1aJg3KX7nPnY77RJ9ZT-IfaYADgJh0A--nAxItvC6hb4/edit#gid=0 > The Spark Cassandra Connector builds up token range specific queries and > allows the user to pushdown relevant fields to C*. Here we have two indexed > fields (size) and (color) being pushed down to C*. > {code} > SELECT count(*) FROM ks.tab WHERE token("store") > $min AND token("store") <= > $max AND color = 'red' AND size = 'P' ALLOW FILTERING;{code} > These queries will have different token ranges inserted and executed as > separate spark tasks. Spark tasks with token ranges near the Min(token) end > up executing much faster than those near Max(token), which also happen to > throw errors. > {code} > Coordinator node timed out waiting for replica nodes' responses] > message="Operation timed out - received only 0 responses." > info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} > {code} > I took the queries and ran them through CQLSH to see the difference in time. > A linear relationship is seen based on where the tokenRange being queried is, > starting with only 2 seconds for queries near the beginning of the full token > spectrum and over 12 seconds at the end of the spectrum. > The question is, can this behavior be improved? Or should we not recommend > using secondary indexes with Analytics workloads? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10050) Secondary Index Performance Dependent on TokenRange Searched in Analytics
[ https://issues.apache.org/jira/browse/CASSANDRA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075572#comment-15075572 ] Brian Hess commented on CASSANDRA-10050: - I'm a little confused here. [~beobal], why would the performance get slower as the tokens got larger, though? Wouldn't it be just as inefficient to read the first bit of the token range rather than the last bit of the token range? Seems like something else is going on here, no? > Secondary Index Performance Dependent on TokenRange Searched in Analytics > - > > Key: CASSANDRA-10050 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10050 > Project: Cassandra > Issue Type: Improvement > Environment: Single node, macbook, 2.1.8 >Reporter: Russell Alexander Spitzer > Fix For: 3.x > > > In doing some test work on the Spark Cassandra Connector I saw some odd > performance when pushing down range queries with Secondary Index filters. > When running the queries we see a huge amount of time when the C* server is not > doing any work and the queries seem to be hanging. This investigation led to > the work in this document > https://docs.google.com/spreadsheets/d/1aJg3KX7nPnY77RJ9ZT-IfaYADgJh0A--nAxItvC6hb4/edit#gid=0 > The Spark Cassandra Connector builds up token range specific queries and > allows the user to push down relevant fields to C*. Here we have two indexed > fields (size) and (color) being pushed down to C*. > {code} > SELECT count(*) FROM ks.tab WHERE token("store") > $min AND token("store") <= > $max AND color = 'red' AND size = 'P' ALLOW FILTERING;{code} > These queries will have different token ranges inserted and executed as > separate Spark tasks. Spark tasks with token ranges near the Min(token) end > up executing much faster than those near Max(token), which also happen to > throw errors. > {code} > Coordinator node timed out waiting for replica nodes' responses] > message="Operation timed out - received only 0 responses." > info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} > {code} > I took the queries and ran them through CQLSH to see the difference in time. > A linear relationship is seen based on where the queried tokenRange starts, with > only 2 seconds for queries near the beginning of the full token > spectrum and over 12 seconds at the end of the spectrum. > The question is: can this behavior be improved, or should we not recommend > using secondary indexes with Analytics workloads? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
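To make the reported behavior concrete, here is a sketch of two of the connector's per-task queries, identical except for their token bounds (the $min/$max placeholders from the description filled in with illustrative Murmur3 values):
{code}
-- Task near the start of the token spectrum (reportedly ~2 seconds)
SELECT count(*) FROM ks.tab
WHERE token("store") > -9223372036854775808
  AND token("store") <= -9000000000000000000
  AND color = 'red' AND size = 'P' ALLOW FILTERING;

-- Task near the end of the spectrum (reportedly 12+ seconds, or timing out)
SELECT count(*) FROM ks.tab
WHERE token("store") > 9000000000000000000
  AND token("store") <= 9223372036854775807
  AND color = 'red' AND size = 'P' ALLOW FILTERING;
{code}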
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071175#comment-15071175 ] Brian Hess commented on CASSANDRA-8844: I certainly think that Cassandra should not transparently "upgrade" or alter the requested consistency level of the write from the application. Doing so is "surprise behavior" and I think should be avoided, especially since the client requested a specific consistency level (which we'd then be ignoring and doing something different). So, I'm -1 on that approach (do we do anything like that anywhere else in the system?). I'm also -1 on the choice that the resiliency of the CDC log is only up to the original consistency level of the write. CDC is a layer that should be additive to a table without needing to alter the application that is writing to that table. The semantics of RF and CL for the CDC log itself should be governed by separate settings, though having the RF be equivalent to the base table's and the CL equivalent to the original write makes sense - however, the semantics should allow for the CDC log to achieve the full RF just as the base table does (e.g., via repair, etc). Essentially, I think I disagree with this Assumption in the design doc: bq. Don't have to worry about repair - guarantee is CL # of nodes have the update and it will be consumed by client within ms. at-least-once semantics and time horizon mean we don't have to worry about repairing CDC data. I think the second goal/requirement you laid out above (a record of all mutations seen by the cluster) is an important part of the purpose of this ticket (and the original intent I had when [~tupshin] and I discussed this). Many uses stem from this interpretation of CDC logs, especially when consumers other than Cassandra clusters are processing the CDC log, but in some cases even Cassandra clusters would need to see every mutation, not just the final delta (e.g., Cassandra triggers on the destination system that provide for side-effects to mutations). The issue of capturing every mutation in a durable/distributed/resilient way is intertwined with my belief that we should not have to process all RF copies of the CDC log. Needing to process all RF copies of the CDC log puts a processing burden on the consuming system that is at least undesirable (and at most prohibitive). There does not appear to be an approach that avoids processing all RF copies while still ensuring at-least-once delivery (rather than at-least-RF deliveries). Moreover, the resiliency approach depends on requiring the consumer to process all RF copies of the logs in order to ensure processing every mutation. In addition, processing all RF copies of the CDC log will incur a large processing overhead (200% extra in the RF=3 case, for example). So, I'm also -1 on needing to process all the CDC logs on all replicas. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. 
Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that
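For context, the per-table opt-in described in the design sketch would presumably look something like the following; the cdc property name here is illustrative, not settled syntax:
{code}
-- Hypothetical per-table opt-in to CDC logging
ALTER TABLE ks.orders WITH cdc = true;

-- Application writes proceed unchanged; the debate above is about what
-- consistency guarantees the resulting CDC log inherits from each write.
INSERT INTO ks.orders (id, amount) VALUES (uuid(), 42.0);
{code}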
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070494#comment-15070494 ] Brian Hess commented on CASSANDRA-8844: This makes the consistency level of the CDC log completely defined by the CL of the mutation as it is written. That impacts anyone using CDC, all the way up to the application. The CDC log should transparently be additive to the table. That is, the user should not have to write to the table in any special or prescribed way in order to get the mutations captured. Requiring the client to write at CL_QUORUM if CDC is enabled is not okay. The thing that concerns me is that the consistency/replication of the CDC log now has a completely different guarantee than the base table (system, actually). The fact that a user can use Cassandra with any CL on writes and the system will eventually get to a consistent state is a good thing. The CDC log should be similarly eventually consistent, via any CL for the mutations. The situation described above is one that Cassandra covers just fine and results in an eventually consistent state. However, it will result in CDC loss, which is not acceptable. The goal of CDC is not to get to snapshot differences, but to capture every mutation (including the intermediate mutations). The solution should tolerate nodes being offline and not lose any mutations, without requiring the application producing the mutations to be modified. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. 
Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non-JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to app
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068897#comment-15068897 ] Brian Hess commented on CASSANDRA-8844: [~JoshuaMcKenzie] can you explain what happens in the following case. I do a CL_ONE write and only one replica is available, so the other 2 (assume RF=3) are not available. The other 2 come back online and a repair operation happens to bring those nodes up to date. Then the original node that got the write goes down. At this point, I want to process the changes. But the mutation is not in the commitlog of either of the online nodes. Is that correct? I'm worried that there is a mutation that made it into this cluster but is now not in any CDC log and that the CDC log is not resilient in this case. Do I have this correct? > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. 
> - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non-JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters, > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would
[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068893#comment-15068893 ] Brian Hess commented on CASSANDRA-8844: I share [~rustyrazorblade]'s concerns here. It would be good to think about how to consume the CDC logs without having to process each mutation RF times. That is redundant work that may be prohibitive in some/many cases. At-least-once processing should be the goal. If there is a way to layer at-least-once semantics (but not at-least-RF) on top, then okay, but that is not clear here. As to [~tupshin]'s Kafka suggestion: you cannot specify the IDs/offsets when writing into the queue, so you would end up enqueueing RF times without really solving the problem. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. 
> - Written in a simple format designed to be consumed *directly* by daemons > written in non-JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred for a subsequent release, and to gauge > actual interest. > - Multiple logs per table. This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters, > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's d
[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047130#comment-15047130 ] Brian Hess commented on CASSANDRA-6696: Will this change do anything for the case where there is only one disk? Will each sstable contain a single token range (or at least not all token ranges)? > Partition sstables by token range > - > > Key: CASSANDRA-6696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6696 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Marcus Eriksson > Labels: compaction, correctness, dense-storage, > jbod-aware-compaction, performance > Fix For: 3.2 > > > In JBOD, when someone gets a bad drive, the bad drive is replaced with a new > empty one and repair is run. > This can cause deleted data to come back in some cases. This is also true for > corrupt sstables, where we delete the corrupt sstable and run repair. > Here is an example: > Say we have 3 nodes A, B and C with RF=3 and GC grace=10 days. > row=sankalp col=sankalp is written 20 days back and successfully went to all > three nodes. > Then a delete/tombstone was written successfully for the same row column 15 > days back. > Since this tombstone is more than gc grace, it got compacted in Nodes A and B > since it got compacted with the actual data. So there is no trace of this row > column in node A and B. > Now in node C, say the original data is in drive1 and tombstone is in drive2. > Compaction has not yet reclaimed the data and tombstone. > Drive2 becomes corrupt and was replaced with a new empty drive. > Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp > has come back to life. > Now after replacing the drive we run repair. This data will be propagated to > all nodes. > Note: This is still a problem even if we run repair every gc grace. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035771#comment-15035771 ] Brian Hess edited comment on CASSANDRA-9303 at 12/2/15 1:22 PM: - 3 questions: 1. If I set SKIPROWS to 10 say, and also set HEADER to true, will I skip 10 or 11 rows? 2. Why do you disable the ERRFILE for stdin? 3. If the MAXERRORS/MAXINSERTERRORS is >1, where do you keep the error around? Is it captured anywhere so someone can look back on what type of error occurred? NoHostAvailableException, WriteTimeoutException, bad date format, etc. was (Author: brianmhess): 2 questions: 1. If I set SKIPROWS to 10 say, and also set HEADER to true, will I skip 10 or 11 rows? 2. Why do you disable the ERRFILE for stdin? > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035771#comment-15035771 ] Brian Hess commented on CASSANDRA-9303: 2 questions: 1. If I set SKIPROWS to 10 say, and also set HEADER to true, will I skip 10 or 11 rows? 2. Why do you disable the ERRFILE for stdin? > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
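A sketch of the interaction question 1 is probing, assuming the option names land as proposed (HEADER, SKIPROWS, ERRFILE):
{code}
-- Does HEADER=true consume the header line before or after SKIPROWS
-- is applied (i.e., are 10 or 11 lines skipped in total)?
COPY ks.users (id, name, email) FROM 'users.csv'
WITH HEADER = true AND SKIPROWS = 10 AND ERRFILE = 'users.err';
{code}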
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014280#comment-15014280 ] Brian Hess commented on CASSANDRA-9303: I'm curious - how do you set the following in CQLSH COPY FROM: - numFutures (the number of concurrent asynchronous requests "in flight" at a time) - batchSize (the number of INSERTs to batch and send as one request) - queryTimeout (the amount of time to wait on queries) - numRetries (the number of times to retry failed/timed-out queries) - progressRate (the rate at which progress is reported) All of these are marked as "already available", but it isn't clear how to set them, nor is it clear from the documentation. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
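For comparison, a sketch of how some of these knobs could surface as COPY FROM options; the names below follow the cqlsh COPY option style, but their exact mapping to numFutures et al. is precisely the open question here:
{code}
COPY ks.tab FROM 'data.csv'
WITH MAXBATCHSIZE = 20       -- rows per batch (cf. batchSize)
 AND MAXATTEMPTS = 5         -- retries for failed/timed-out batches (cf. numRetries)
 AND REPORTFREQUENCY = 0.25; -- progress refresh interval (cf. progressRate)
{code}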
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014261#comment-15014261 ] Brian Hess commented on CASSANDRA-9303: Are there no plans to support loading a directory of files? I would say that that is one of the bigger options leveraged by users of cassandra-loader. I'm +1 on not doing the things that CQLSH already handles (username, password, etc). > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9304) COPY TO improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970961#comment-14970961 ] Brian Hess commented on CASSANDRA-9304: [~Stefania] have you re-run the benchmark that [~dkua] performed above (or something similar) - specifically, comparing the "old version" to this new version (and to the cassandra-unloader (https://github.com/brianmhess/cassandra-loader))? What performance improvement do we see? > COPY TO improvements > > > Key: CASSANDRA-9304 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9304 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Minor > Labels: cqlsh > Fix For: 3.x, 2.1.x, 2.2.x > > > COPY FROM has gotten a lot of love. COPY TO not so much. One obvious > improvement could be to parallelize reading and writing (write one page of > data while fetching the next). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8234) CTAS (CREATE TABLE AS SELECT)
[ https://issues.apache.org/jira/browse/CASSANDRA-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660640#comment-14660640 ] Brian Hess commented on CASSANDRA-8234: Another question: 6. If we include Spark in Cassandra, what are we going to do about the Spark Master? That is, a feature of Cassandra (CTAS) will depend on a system that does not have the same HA profile (namely, always-on) as Cassandra. Will we modify Spark to be "masterless"? Will this ticket have a dependency on a "masterless" OSS Spark? Or will Cassandra have this as a SPOF? > CTAS (CREATE TABLE AS SELECT) > - > > Key: CASSANDRA-8234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8234 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Robin Schumacher > Fix For: 3.x > > > A continuous request from users is the ability to do CREATE TABLE AS SELECT. > The simplest form would be copying the entire table. More advanced would > allow specifying the columns and UDFs to call, as well as filtering rows out in > WHERE. > More advanced still would be to get all the way to allowing JOIN, for which > we probably want to integrate Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8234) CTAS (CREATE TABLE AS SELECT)
[ https://issues.apache.org/jira/browse/CASSANDRA-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660443#comment-14660443 ] Brian Hess commented on CASSANDRA-8234: A few questions: 1. For OSS C* (as opposed to DSE), will the Spark Master be visible to users other than C* itself? As in, will Cassandra be the only process/user able to execute Spark jobs? Or will users be able to submit jobs, start the SparkSQL thriftserver, etc? 2. How current will Spark be kept with Cassandra? Will there be any guidance (or guarantees) about how stale the included Spark may be? Or how often Cassandra will be upgraded to incorporate a new Spark? Same for the OSS spark-cassandra-connector. 3. Will there be a load-sharing system in place so that multiple CTAS queries can run simultaneously (Spark in stand-alone mode will by default "reserve" all available cores and "lock out" another Spark job)? 4. Will there be some "sandboxing" of Spark so that C* and Spark play nicely (with respect to RAM, CPU, etc)? 5. My assumption is that "CREATE TABLE b(x INT, y INT, z INT) AS SELECT x, y, z FROM a WITH PRIMARY KEY ((x), y)" [syntax for illustrative purposes only] will be an asynchronous operation. That is, it will return "success" to the client, but the operation will be a background operation. First, is that correct? If so, I think there will have to be a status like in MVs and 2Is, correct? If not, what will we do about timing out of this query? > CTAS (CREATE TABLE AS SELECT) > - > > Key: CASSANDRA-8234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8234 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Robin Schumacher > Fix For: 3.x > > > A continuous request from users is the ability to do CREATE TABLE AS SELECT. > The simplest form would be copying the entire table. More advanced would > allow specifying the columns and UDFs to call, as well as filtering rows out in > WHERE. > More advanced still would be to get all the way to allowing JOIN, for which > we probably want to integrate Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
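Restating the illustrative grammar from question 5 as a block, with the caveat that none of this syntax is settled:
{code}
-- Hypothetical CTAS grammar, per the comment above
CREATE TABLE b (x INT, y INT, z INT)
  AS SELECT x, y, z FROM a
  WITH PRIMARY KEY ((x), y);
{code}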
[jira] [Comment Edited] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3
[ https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659385#comment-14659385 ] Brian Hess edited comment on CASSANDRA-9302 at 8/6/15 4:17 AM: There is also https://issues.apache.org/jira/browse/CASSANDRA-9048 That ticket has been tagged as "Later", but the work continues at https://github.com/brianmhess/cassandra-loader In addition to more options, the performance of cassandra-loader is 4x (or more) than COPY FROM in 2.1. Also, it is more stable and can handle wider rows (see https://issues.apache.org/jira/browse/CASSANDRA-9552) was (Author: brianmhess): There is also https://issues.apache.org/jira/browse/CASSANDRA-9048 That ticket has been tagged as "Later" for some reason, but the work continues at https://github.com/brianmhess/cassandra-loader In addition to more options, the performance of cassandra-loader is 4x (or more) than COPY FROM in 2.1. Also, it is more stable and can handle wider rows (see https://issues.apache.org/jira/browse/CASSANDRA-9552) > Optimize cqlsh COPY FROM, part 3 > > > Key: CASSANDRA-9302 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9302 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Jonathan Ellis >Assignee: David Kua > Fix For: 2.1.x > > > We've had some discussion moving to Spark CSV import for bulk load in 3.x, > but people need a good bulk load tool now. One option is to add a separate > Java bulk load tool (CASSANDRA-9048), but if we can match that performance > from cqlsh I would prefer to leave COPY FROM as the preferred option to which > we point people, rather than adding more tools that need to be supported > indefinitely. > Previous work on COPY FROM optimization was done in CASSANDRA-7405 and > CASSANDRA-8225. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3
[ https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hess updated CASSANDRA-9302: -- Comment: was deleted (was: There is also https://issues.apache.org/jira/browse/CASSANDRA-9048 That ticket has been tagged as "Later" for some reason, but the work continues at https://github.com/brianmhess/cassandra-loader In addition to more options, the performance of cassandra-loader is 4x (or more) than COPY FROM in 2.1. Also, it is more stable and can handle wider rows (see https://issues.apache.org/jira/browse/CASSANDRA-9552)) > Optimize cqlsh COPY FROM, part 3 > > > Key: CASSANDRA-9302 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9302 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Jonathan Ellis >Assignee: David Kua > Fix For: 2.1.x > > > We've had some discussion moving to Spark CSV import for bulk load in 3.x, > but people need a good bulk load tool now. One option is to add a separate > Java bulk load tool (CASSANDRA-9048), but if we can match that performance > from cqlsh I would prefer to leave COPY FROM as the preferred option to which > we point people, rather than adding more tools that need to be supported > indefinitely. > Previous work on COPY FROM optimization was done in CASSANDRA-7405 and > CASSANDRA-8225. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3
[ https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659387#comment-14659387 ] Brian Hess commented on CASSANDRA-9302: There is also https://issues.apache.org/jira/browse/CASSANDRA-9048 That ticket has been tagged as "Later" for some reason, but the work continues at https://github.com/brianmhess/cassandra-loader In addition to more options, the performance of cassandra-loader is 4x (or more) that of COPY FROM in 2.1. Also, it is more stable and can handle wider rows (see https://issues.apache.org/jira/browse/CASSANDRA-9552) > Optimize cqlsh COPY FROM, part 3 > > > Key: CASSANDRA-9302 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9302 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Jonathan Ellis >Assignee: David Kua > Fix For: 2.1.x > > > We've had some discussion about moving to Spark CSV import for bulk load in 3.x, > but people need a good bulk load tool now. One option is to add a separate > Java bulk load tool (CASSANDRA-9048), but if we can match that performance > from cqlsh I would prefer to leave COPY FROM as the preferred option to which > we point people, rather than adding more tools that need to be supported > indefinitely. > Previous work on COPY FROM optimization was done in CASSANDRA-7405 and > CASSANDRA-8225. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9892) Add support for unsandboxed UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642912#comment-14642912 ] Brian Hess commented on CASSANDRA-9892: Yeah - I agree with [~tjake] and [~snazy]. TRUSTED should mean that the function can be run without the extra safeguards. You trust the function. > Add support for unsandboxed UDF > --- > > Key: CASSANDRA-9892 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9892 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Robert Stupp >Priority: Minor > > From discussion on CASSANDRA-9402, > The approach postgresql takes is to distinguish between "trusted" (sandboxed) > and "untrusted" (anything goes) UDF languages. > Creating an untrusted language always requires superuser mode. Once that is > done, creating functions in it requires nothing special. > Personally I would be fine with this approach, but I think it would be more > useful to have the extra permission on creating the function, and also > wouldn't require adding explicit CREATE LANGUAGE. > So I'd suggest just providing different CQL permissions for trusted and > untrusted, i.e. if you have CREATE FUNCTION permission that allows you to > create sandboxed UDF, but you can only create unsandboxed if you have CREATE > UNTRUSTED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9892) Add support for unsandboxed UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640610#comment-14640610 ] Brian Hess commented on CASSANDRA-9892: I don't particularly like the change of the meaning of FENCED/UNFENCED, but we could use that. FENCED being safe and UNFENCED being unsafe. This is what DB2 and Netezza do. > Add support for unsandboxed UDF > --- > > Key: CASSANDRA-9892 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9892 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Robert Stupp >Priority: Minor > > From discussion on CASSANDRA-9402, > The approach postgresql takes is to distinguish between "trusted" (sandboxed) > and "untrusted" (anything goes) UDF languages. > Creating an untrusted language always requires superuser mode. Once that is > done, creating functions in it requires nothing special. > Personally I would be fine with this approach, but I think it would be more > useful to have the extra permission on creating the function, and also > wouldn't require adding explicit CREATE LANGUAGE. > So I'd suggest just providing different CQL permissions for trusted and > untrusted, i.e. if you have CREATE FUNCTION permission that allows you to > create sandboxed UDF, but you can only create unsandboxed if you have CREATE > UNTRUSTED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
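A sketch of what the permission split could look like in CQL; the UNTRUSTED keyword is hypothetical, echoing the proposal in the description:
{code}
-- Sandboxed UDF: requires only the CREATE FUNCTION permission
CREATE FUNCTION ks.plus(a int, b int)
  RETURNS NULL ON NULL INPUT
  RETURNS int LANGUAGE java AS 'return a + b;';

-- Hypothetical: unsandboxed UDF, gated behind the extra permission
CREATE UNTRUSTED FUNCTION ks.fast_plus(a int, b int)
  RETURNS NULL ON NULL INPUT
  RETURNS int LANGUAGE java AS 'return a + b;';
{code}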
[jira] [Commented] (CASSANDRA-9481) FENCED UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640437#comment-14640437 ] Brian Hess commented on CASSANDRA-9481: See 9892 > FENCED UDFs > --- > > Key: CASSANDRA-9481 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9481 > Project: Cassandra > Issue Type: New Feature >Reporter: Brian Hess > > Related to security/sandboxing of UDFs (CASSANDRA-9042) > Essentially, the UDF will run in a separate process when it is registered as > FENCED, and run in-process when it is registered as UNFENCED. > This doesn't necessarily remove all the issues, but it does help mitigate > some of them - especially since it would (optionally) run as another user. > This could look like the following with Cassandra: > - FENCED is a GRANTable privilege > - In cassandra.yaml you can specify the user to use when launching the > separate process (so that it is not the same user that is running the > database - or optionally is) > - This is good so that the UDF can't stop the database, delete database > files, etc. > - For FENCED UDFs, IPC would be used to transfer rows to the UDF and to > return results. We could use CQL rows for the data. This could be shared > memory or sockets (Unix or TCP - slight preference for sockets for some > follow-on ideas). > - Ideally, switching from FENCED to UNFENCED would be just a DDL change. That > is, the API would work such that a simple "ALTER FUNCTION myFunction(DOUBLE, > DOUBLE) UNFENCED" would change it. > - If you wanted, because this is a separate process you could use a separate > class loader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
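The DDL-only switch mentioned above might read as follows; FENCED/UNFENCED is hypothetical syntax borrowed from DB2/Netezza, as discussed:
{code}
-- Hypothetical: run the UDF out-of-process
CREATE FENCED FUNCTION ks.myFunction(a DOUBLE, b DOUBLE)
  RETURNS NULL ON NULL INPUT
  RETURNS DOUBLE LANGUAGE java AS 'return a * b;';

-- Hypothetical: flip to in-process execution with a DDL change only
ALTER FUNCTION ks.myFunction(DOUBLE, DOUBLE) UNFENCED;
{code}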
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632059#comment-14632059 ] Brian Hess commented on CASSANDRA-6477: Let's go back to the basic use case that this is supposed to replace/help/make better. The case where we want two query tables for the same data. That is, they have the same primary keys, but different partition keys (and clustering column orders). Today, I would do this by having a logged batch for the insert and that batch would insert into each of the two query tables. With this I get some data consistency guarantees. For example, if the client returns "success", I know that *both* of the inserts were accepted at the desired consistency level. So, if I did 2 writes at CL_QUORUM, and I receive a "success", then I know I can then do a CL_QUORUM read of *either* table and see the most recent data. However, with this "asynchronous" MV approach, I no longer get this behavior. I write to the base table at CL_QUORUM and get the "success" return. At that point, I can do a CL_QUORUM read from the base table and see the most recent insert. However, if I do a CL_QUORUM read from the MV, I have no guarantees at all. This approach does not address the basic situation that we are trying to cover. That concerns me greatly. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
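For reference, the dual-write pattern the comment describes (table and column names illustrative):
{code}
-- Today: a logged batch makes both query-table writes atomic, and a
-- successful CL_QUORUM batch means a CL_QUORUM read of either table
-- sees the most recent data.
BEGIN BATCH
  INSERT INTO users_by_userid (userid, email, name)
    VALUES ('bhess', 'bhess@company.whatever', 'Brian');
  INSERT INTO users_by_email (email, userid, name)
    VALUES ('bhess@company.whatever', 'bhess', 'Brian');
APPLY BATCH;
{code}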
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631267#comment-14631267 ] Brian Hess commented on CASSANDRA-6477: +1 I think that is the "promise" of the MV. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629870#comment-14629870 ] Brian Hess commented on CASSANDRA-6477: Just for clarity [~slebresne] - you'd have to make the first name the Partition Key and the rest of the Primary Keys of the base table as Clustering Columns, correct? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
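In other words, something like the following sketch, where the view's partition key is the new column and the base table's full primary key is carried along as clustering columns (assuming a users base table keyed by userid):
{code}
CREATE MATERIALIZED VIEW users_by_firstname AS
  SELECT * FROM users
  WHERE firstname IS NOT NULL AND userid IS NOT NULL
  PRIMARY KEY (firstname, userid);
{code}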
[jira] [Commented] (CASSANDRA-9778) CQL support for time series aggregation
[ https://issues.apache.org/jira/browse/CASSANDRA-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628427#comment-14628427 ] Brian Hess commented on CASSANDRA-9778: Did you mean to have DAY be an argument to these functions: "day_time(dtime, 1h)"? Or is it implied to be the "hour" of Now()? Do we want to support "day_time(dtime, 2h) as doublehour"? Or are the options really day_time(dtime, 1h), day_time(dtime, 1d), day_time(dtime, 1w), etc? If so, Oracle has syntax for extracting the parts of a date or datetime type: https://docs.oracle.com/cd/E17952_01/refman-5.1-en/date-and-time-functions.html#function_hour For example "Day(dtime)", "Hour(dtime)", etc. Postgres also has this, though their syntax is a bit fancier: http://www.postgresql.org/docs/8.2/static/functions-datetime.html For example "Extract(hour FROM dtime)", "Extract(day FROM dtime)", etc. So, while windowed aggregates are overkill here (and really, they are), if we want this functionality in Cassandra, we should consider syntax similar to Postgres's or Oracle's (or DB2's (Day(), Month(), etc), MySQL's (Day(), Month(), etc), SQL Server's (DatePart()), etc). > CQL support for time series aggregation > --- > > Key: CASSANDRA-9778 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9778 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis > Fix For: 3.x > > > Along with MV (CASSANDRA-6477), time series aggregation or "rollups" are a > common design pattern in Cassandra applications. I'd like to add CQL support > for this along these lines: > {code} > CREATE MATERIALIZED VIEW stocks_by_hour AS > SELECT exchange, day, day_time(1h) AS hour, symbol, avg(price), sum(volume) > FROM stocks > GROUP BY exchange, day, symbol, hour > PRIMARY KEY ((exchange, day), hour, symbol); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
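If CQL followed the DB2/MySQL style named above, the rollup might be expressed with plain date-part functions; day() and hour() are hypothetical here, as is grouping on function results:
{code}
-- Hypothetical date-part extraction, DB2/MySQL style
SELECT exchange, day(dtime) AS d, hour(dtime) AS h,
       symbol, avg(price), sum(volume)
FROM stocks
GROUP BY exchange, day(dtime), symbol, hour(dtime);
{code}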
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626668#comment-14626668 ] Brian Hess commented on CASSANDRA-6477: A question about the behavior here. If we do the following {noformat} INSERT INTO t (k, t, v) VALUES (0, 0, 0) USING TTL 1000; UPDATE t SET v = null WHERE k = 0 AND t = 0; INSERT INTO t (k, t) VALUES (1, 1) USING TTL 1000; {noformat} Then, doing a {noformat} SELECT * FROM t; {noformat} would result in getting both rows: (0, 0, null) and (1, 1, null). However, if the view also ends up with a null for the first row, so that it would be (0, null, 0), wouldn't we also expect to see a row for (1, null, 1)? The fact that a cell was non-null and is now null shouldn't really have any different behavior than if it was null from the start. Another point - I think the primary use case will be to mimic query tables. That is, the MV will have the same columns in its primary key as the base table's, but with different partition keys and different orders for clustering columns. I could be wrong, but that seems like the biggest use case. And in that use case the keys are all non-null. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
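For reference, a hypothetical definition of the base table and view assumed in that example, under the behavior being questioned, where rows with a null view-key value would still be materialized:
{code}
CREATE TABLE t (
    k int,
    t int,
    v int,
    PRIMARY KEY (k, t)
);

-- hypothetical: keyed on v, materializing rows even when v is null,
-- which is exactly the behavior the comment above is asking about
CREATE MATERIALIZED VIEW t_by_v AS
    SELECT v, k, t FROM t
    PRIMARY KEY (v, k, t);
{code}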
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626368#comment-14626368 ] Brian Hess commented on CASSANDRA-6477: Moreover, if you were to handle NULLs in the application tier (say by the application using the string "NULL" or the INT -1 or whatever) you end up with exactly the same hot spot. This is a known issue for distributed systems in general (not just Cassandra), so I would note it and move on - just like we note that GENDER is a bad partition key today. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT
[ https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616747#comment-14616747 ] Brian Hess commented on CASSANDRA-9415: We already silently rewrite queries for users when they use a secondary index. The user doesn't specify that C* should consult the secondary index when they query; it is implicit. > Implicit use of Materialized Views on SELECT > > > Key: CASSANDRA-9415 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9415 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > Labels: ponies > > CASSANDRA-6477 introduces Materialized Views. This greatly simplifies the > write path for the best-practice of "query tables". But it does not simplify > the read path as much as our users want/need. > We suggest to folks to create multiple copies of their base table optimized > for certain queries - hence "query table". For example, we may have a USER > table with two type of queries: lookup by userid and lookup by email address. > We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL. Both > would have the exact same schema, with the same PRIMARY KEY columns, but > different PARTITION KEY - the first would be USERID and the second would be > EMAIL. > One complicating thing with this approach is that the application now needs > to know that when it INSERT/UPDATE/DELETEs from the base table it needs to > INSERT/UPDATE/DELETE from all of the query tables as well. CASSANDRA-6477 > covers this nicely. > However, the other side of the coin is that the application needs to know > which query table to leverage based on the selection criteria. Using the > example above, if the query has a predicate such as "WHERE userid = 'bhess'", > then USERS_BY_USERID is the better table to use. Similarly, when the > predicate is "WHERE email = 'bhess@company.whatever'", USERS_BY_EMAIL is > appropriate. > On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" > to the collection of tables. You do operations just on the base table. It > is very attractive for the SELECT side as well. It would be very good to > allow an application to simply do "SELECT * FROM users WHERE userid = > 'bhess'" and have that query implicitly leverage the USERS_BY_USERID > materialized view. > For additional use cases, especially analytics use cases like in Spark, this > allows the Spark code to simply push down the query without having to know > about all of the MVs that have been set up. The system will route the query > appropriately. And if additional MVs are necessary to make a query run > better/faster, then those MVs can be set up and Spark will implicitly > leverage them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT
[ https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616746#comment-14616746 ] Brian Hess commented on CASSANDRA-9415: For #1, that would be part of choosing the MV. If it didn't contain all the columns in the projection list then you couldn't use that MV. For #2, that would also be understandable by C*. That is, you can see if the predicate of the query "matches" the predicate of the MV. For example, if the MV had a predicate like "WHERE x > 100", then if the query had a predicate like "WHERE x=200" then you know you could use the MV. As to the driver being able to route the query - in order for the driver to route the query you need to prepare the statement. In preparing the statement, you would choose the MV, and the driver gets that anyway. > Implicit use of Materialized Views on SELECT > > > Key: CASSANDRA-9415 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9415 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > Labels: ponies > > CASSANDRA-6477 introduces Materialized Views. This greatly simplifies the > write path for the best-practice of "query tables". But it does not simplify > the read path as much as our users want/need. > We suggest to folks to create multiple copies of their base table optimized > for certain queries - hence "query table". For example, we may have a USER > table with two type of queries: lookup by userid and lookup by email address. > We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL. Both > would have the exact same schema, with the same PRIMARY KEY columns, but > different PARTITION KEY - the first would be USERID and the second would be > EMAIL. > One complicating thing with this approach is that the application now needs > to know that when it INSERT/UPDATE/DELETEs from the base table it needs to > INSERT/UPDATE/DELETE from all of the query tables as well. CASSANDRA-6477 > covers this nicely. > However, the other side of the coin is that the application needs to know > which query table to leverage based on the selection criteria. Using the > example above, if the query has a predicate such as "WHERE userid = 'bhess'", > then USERS_BY_USERID is the better table to use. Similarly, when the > predicate is "WHERE email = 'bhess@company.whatever'", USERS_BY_EMAIL is > appropriate. > On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" > to the collection of tables. You do operations just on the base table. It > is very attractive for the SELECT side as well. It would be very good to > allow an application to simply do "SELECT * FROM users WHERE userid = > 'bhess'" and have that query implicitly leverage the USERS_BY_USERID > materialized view. > For additional use cases, especially analytics use cases like in Spark, this > allows the Spark code to simply push down the query without having to know > about all of the MVs that have been set up. The system will route the query > appropriately. And if additional MVs are necessary to make a query run > better/faster, then those MVs can be set up and Spark will implicitly > leverage them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
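A small sketch of that matching logic with a hypothetical schema; the prepared statement's predicate is fully covered by the view's partition key, so preparation could bind the statement to the view:
{code}
CREATE TABLE users (
    userid text PRIMARY KEY,
    email text,
    name text
);

CREATE MATERIALIZED VIEW users_by_email AS
    SELECT email, userid, name FROM users
    WHERE email IS NOT NULL AND userid IS NOT NULL
    PRIMARY KEY (email, userid);

-- preparing this against the base table could transparently route to
-- users_by_email, whose partition key matches the predicate:
SELECT * FROM users WHERE email = ?;
{code}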
[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT
[ https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616743#comment-14616743 ] Brian Hess commented on CASSANDRA-9415: Wouldn't the preparing of the statement handle that? > Implicit use of Materialized Views on SELECT > > > Key: CASSANDRA-9415 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9415 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > Labels: ponies > > CASSANDRA-6477 introduces Materialized Views. This greatly simplifies the > write path for the best-practice of "query tables". But it does not simplify > the read path as much as our users want/need. > We suggest to folks to create multiple copies of their base table optimized > for certain queries - hence "query table". For example, we may have a USER > table with two type of queries: lookup by userid and lookup by email address. > We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL. Both > would have the exact same schema, with the same PRIMARY KEY columns, but > different PARTITION KEY - the first would be USERID and the second would be > EMAIL. > One complicating thing with this approach is that the application now needs > to know that when it INSERT/UPDATE/DELETEs from the base table it needs to > INSERT/UPDATE/DELETE from all of the query tables as well. CASSANDRA-6477 > covers this nicely. > However, the other side of the coin is that the application needs to know > which query table to leverage based on the selection criteria. Using the > example above, if the query has a predicate such as "WHERE userid = 'bhess'", > then USERS_BY_USERID is the better table to use. Similarly, when the > predicate is "WHERE email = 'bhess@company.whatever'", USERS_BY_EMAIL is > appropriate. > On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" > to the collection of tables. You do operations just on the base table. It > is very attractive for the SELECT side as well. It would be very good to > allow an application to simply do "SELECT * FROM users WHERE userid = > 'bhess'" and have that query implicitly leverage the USERS_BY_USERID > materialized view. > For additional use cases, especially analytics use cases like in Spark, this > allows the Spark code to simply push down the query without having to know > about all of the MVs that have been set up. The system will route the query > appropriately. And if additional MVs are necessary to make a query run > better/faster, then those MVs can be set up and Spark will implicitly > leverage them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8986) Major cassandra-stress refactor
[ https://issues.apache.org/jira/browse/CASSANDRA-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608369#comment-14608369 ] Brian Hess commented on CASSANDRA-8986: It would be good if you could leverage asynchronous execution instead of spawning so many threads yourself. You can have flags/options to limit the number of futures in flight (see FutureManager at https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/futures/FutureManager.java for example) and the number of threads, in addition to limiting the rate (which is already an option). > Major cassandra-stress refactor > --- > > Key: CASSANDRA-8986 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8986 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Benedict >Assignee: Benedict > > We need a tool for both stressing _and_ validating more complex workloads > than stress currently supports. Stress needs a raft of changes, and I think > it would be easier to deliver many of these as a single major endeavour which > I think is justifiable given its audience. The rough behaviours I want stress > to support are: > * Ability to know exactly how many rows it will produce, for any clustering > prefix, without generating those prefixes > * Ability to generate an amount of data proportional to the amount it will > produce to the server (or consume from the server), rather than proportional > to the variation in clustering columns > * Ability to reliably produce near identical behaviour each run > * Ability to understand complex overlays of operation types (LWT, Delete, > Expiry, although perhaps not all implemented immediately, the framework for > supporting them easily) > * Ability to (with minimal internal state) understand the complete cluster > state through overlays of multiple procedural generations > * Ability to understand the in-flight state of in-progress operations (i.e. > if we're applying a delete, understand that the delete may have been applied, > and may not have been, for potentially multiple conflicting in flight > operations) > I think the necessary changes to support this would give us the _functional_ > base to support all the functionality I can currently envisage stress > needing. Before embarking on this (which I may attempt very soon), it would > be helpful to get input from others as to features missing from stress that I > haven't covered here that we will certainly want in the future, so that they > can be factored in to the overall design and hopefully avoid another refactor > one year from now, as its complexity is scaling each time, and each time it > is a higher sunk cost. [~jbellis] [~iamaleksey] [~slebresne] [~tjake] > [~enigmacurry] [~aweisberg] [~blambov] [~jshook] ... and @everyone else :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9550) Support fixed precision data type
[ https://issues.apache.org/jira/browse/CASSANDRA-9550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573703#comment-14573703 ] Brian Hess commented on CASSANDRA-9550: [~snazy] - I'm not sure what you mean by "you lose the value of the BigDecimal.precision field during serialization", but losing the precision is certainly an issue. > Support fixed precision data type > - > > Key: CASSANDRA-9550 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9550 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > > SQL databases support a fixed precision data type. When converting data > models from SQL DBs to Cassandra, the choices are not clear. DOUBLE or FLOAT > are possibilities, but with UDFs and Aggregates this is actually a lot less > clear, as you really do need fixed precision arithmetic. That is, one flaw > with floating-point math is that the order of the operations can change the > final result. That is not an issue with fixed precision arithmetic (though > other trade-offs are there). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9550) Support fixed precision data type
[ https://issues.apache.org/jira/browse/CASSANDRA-9550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573612#comment-14573612 ] Brian Hess commented on CASSANDRA-9550: There is a difference between arbitrary precision numbers (namely BigDecimal) and fixed precision numbers (namely SQL's NUMERIC). Are you suggesting that we use DECIMAL in C* instead of NUMERIC(p,s) in SQL systems? The fixed precision is frequently needed in SQL - especially in financial situations (which we specifically call out in the C* docs). Scaling by a factor in the app does not work well with the advent of UDFs and UDAs. For example, the scale changes on multiplication and division, and without NUMERIC(p,s) you would not be able to convey that the scale changed. > Support fixed precision data type > - > > Key: CASSANDRA-9550 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9550 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > > SQL databases support a fixed precision data type. When converting data > models from SQL DBs to Cassandra, the choices are not clear. DOUBLE or FLOAT > are possibilities, but with UDFs and Aggregates this is actually a lot less > clear, as you really do need fixed precision arithmetic. That is, one flaw > with floating-point math is that the order of the operations can change the > final result. That is not an issue with fixed precision arithmetic (though > other trade-offs are there). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9552) COPY FROM times out after 110000 inserts
[ https://issues.apache.org/jira/browse/CASSANDRA-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573605#comment-14573605 ] Brian Hess commented on CASSANDRA-9552: Ah - thank you - this is version 2.1.5.469 (according to CQLSH). > COPY FROM times out after 110000 inserts > > > Key: CASSANDRA-9552 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9552 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > Labels: cqlsh > > I am trying to test out performance of COPY FROM on various schemas. I have > a 100-BIGINT-column table defined as: > {code} > CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': '3'} AND durable_writes = true; > CREATE TABLE test.test100 ( > pkey bigint,ccol bigint,col0 bigint,col1 bigint,col10 > bigint, > col11 bigint,col12 bigint,col13 bigint,col14 bigint,col15 > bigint, > col16 bigint,col17 bigint,col18 bigint,col19 bigint,col2 > bigint, > col20 bigint,col21 bigint,col22 bigint,col23 bigint,col24 > bigint, > col25 bigint,col26 bigint,col27 bigint,col28 bigint,col29 > bigint, > col3 bigint,col30 bigint,col31 bigint,col32 bigint,col33 > bigint, > col34 bigint,col35 bigint,col36 bigint,col37 bigint,col38 > bigint, > col39 bigint,col4 bigint,col40 bigint,col41 bigint,col42 > bigint, > col43 bigint,col44 bigint,col45 bigint,col46 bigint,col47 > bigint, > col48 bigint,col49 bigint,col5 bigint,col50 bigint,col51 > bigint, > col52 bigint,col53 bigint,col54 bigint,col55 bigint,col56 > bigint, > col57 bigint,col58 bigint,col59 bigint,col6 bigint,col60 > bigint, > col61 bigint,col62 bigint,col63 bigint,col64 bigint,col65 > bigint, > col66 bigint,col67 bigint,col68 bigint,col69 bigint,col7 > bigint, > col70 bigint,col71 bigint,col72 bigint,col73 bigint,col74 > bigint, > col75 bigint,col76 bigint,col77 bigint,col78 bigint,col79 > bigint, > col8 bigint,col80 bigint,col81 bigint,col82 bigint,col83 > bigint, > col84 bigint,col85 bigint,col86 bigint,col87 bigint,col88 > bigint, > col89 bigint,col9 bigint,col90 bigint,col91 bigint,col92 > bigint, > col93 bigint,col94 bigint,col95 bigint,col96 bigint,col97 > bigint, > PRIMARY KEY (pkey, ccol) > ) WITH CLUSTERING ORDER BY (ccol ASC) > AND bloom_filter_fp_chance = 0.01 > AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' > AND comment = '' > AND compaction = {'min_threshold': '4', 'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32'} > AND compression = {'sstable_compression': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99.0PERCENTILE'; > {code} > I then try to load the linked file of 120,000 rows of 100 BIGINT columns via: > {code} > cqlsh -e "COPY >
test.test100(pkey,ccol,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33,col34,col35,col36,col37,col38,col39,col40,col41,col42,col43,col44,col45,col46,col47,col48,col49,col50,col51,col52,col53,col54,col55,col56,col57,col58,col59,col60,col61,col62,col63,col64,col65,col66,col67,col68,col69,col70,col71,col72,col73,col74,col75,col76,col77,col78,col79,col80,col81,col82,col83,col84,col85,col86,col87,col88,col89,col90,col91,col92,col93,col94,col95,col96,col97) > FROM 'data120K.csv'" > {code} > Data file here: > https://drive.google.com/file/d/0B87-Pevy14fuUVcxemFRcFFtRjQ/view?usp=sharing > After 110000 rows, it errors and hangs: > {code} > :1:110000 rows; Write: 19848.21 rows/s > Connection heartbeat failure > :1:Aborting import at record #1196. Previously inserted records are > still present, and some records after that may be present as well. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9552) COPY FROM times out after 110000 inserts
Brian Hess created CASSANDRA-9552: -- Summary: COPY FROM times out after 110000 inserts Key: CASSANDRA-9552 URL: https://issues.apache.org/jira/browse/CASSANDRA-9552 Project: Cassandra Issue Type: Improvement Reporter: Brian Hess I am trying to test out performance of COPY FROM on various schemas. I have a 100-BIGINT-column table defined as: {{ CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true; CREATE TABLE test.test100 ( pkey bigint,ccol bigint,col0 bigint,col1 bigint,col10 bigint, col11 bigint,col12 bigint,col13 bigint,col14 bigint,col15 bigint, col16 bigint,col17 bigint,col18 bigint,col19 bigint,col2 bigint, col20 bigint,col21 bigint,col22 bigint,col23 bigint,col24 bigint, col25 bigint,col26 bigint,col27 bigint,col28 bigint,col29 bigint, col3 bigint,col30 bigint,col31 bigint,col32 bigint,col33 bigint, col34 bigint,col35 bigint,col36 bigint,col37 bigint,col38 bigint, col39 bigint,col4 bigint,col40 bigint,col41 bigint,col42 bigint, col43 bigint,col44 bigint,col45 bigint,col46 bigint,col47 bigint, col48 bigint,col49 bigint,col5 bigint,col50 bigint,col51 bigint, col52 bigint,col53 bigint,col54 bigint,col55 bigint,col56 bigint, col57 bigint,col58 bigint,col59 bigint,col6 bigint,col60 bigint, col61 bigint,col62 bigint,col63 bigint,col64 bigint,col65 bigint, col66 bigint,col67 bigint,col68 bigint,col69 bigint,col7 bigint, col70 bigint,col71 bigint,col72 bigint,col73 bigint,col74 bigint, col75 bigint,col76 bigint,col77 bigint,col78 bigint,col79 bigint, col8 bigint,col80 bigint,col81 bigint,col82 bigint,col83 bigint, col84 bigint,col85 bigint,col86 bigint,col87 bigint,col88 bigint, col89 bigint,col9 bigint,col90 bigint,col91 bigint,col92 bigint, col93 bigint,col94 bigint,col95 bigint,col96 bigint,col97 bigint, PRIMARY KEY (pkey, ccol) ) WITH CLUSTERING ORDER BY (ccol ASC) AND bloom_filter_fp_chance = 0.01 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' AND comment = '' AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99.0PERCENTILE'; }} I then try to load the linked file of 120,000 rows of 100 BIGINT columns via: {{ cqlsh -e "COPY test.test100(pkey,ccol,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33,col34,col35,col36,col37,col38,col39,col40,col41,col42,col43,col44,col45,col46,col47,col48,col49,col50,col51,col52,col53,col54,col55,col56,col57,col58,col59,col60,col61,col62,col63,col64,col65,col66,col67,col68,col69,col70,col71,col72,col73,col74,col75,col76,col77,col78,col79,col80,col81,col82,col83,col84,col85,col86,col87,col88,col89,col90,col91,col92,col93,col94,col95,col96,col97) FROM 'data120K.csv'" }} Data file here: https://drive.google.com/file/d/0B87-Pevy14fuUVcxemFRcFFtRjQ/view?usp=sharing After 110000 rows, it errors and hangs: {{ :1:110000 rows; Write: 19848.21 rows/s Connection heartbeat failure :1:Aborting import at record #1196. Previously inserted records are still present, and some records after that may be present as well.
}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9550) Support fixed precision data type
Brian Hess created CASSANDRA-9550: -- Summary: Support fixed precision data type Key: CASSANDRA-9550 URL: https://issues.apache.org/jira/browse/CASSANDRA-9550 Project: Cassandra Issue Type: Improvement Reporter: Brian Hess SQL databases support a fixed precision data type. When converting data models from SQL DBs to Cassandra, the choices are not clear. DOUBLE or FLOAT are possibilities, but with UDFs and Aggregates this is actually a lot less clear, as you really do need fixed precision arithmetic. That is, one flaw with floating-point math is that the order of the operations can change the final result. That is not an issue with fixed precision arithmetic (though other trade-offs are there). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9402) Implement proper sandboxing for UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559420#comment-14559420 ] Brian Hess edited comment on CASSANDRA-9402 at 5/26/15 5:17 PM: - I do think that you may want to leverage FENCED UDFs to accomplish some of the objections/objectives here. Either way, I created CASSANDRA-9481 for FENCED UDFs was (Author: brianmhess): I'm do think that you may want to leverage FENCED UDFs to accomplish some of the objections/objectives here. Either way, I created CASSANDRA-9481 for FENCED UDFs > Implement proper sandboxing for UDFs > > > Key: CASSANDRA-9402 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9402 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani >Assignee: Robert Stupp >Priority: Critical > Labels: doc-impacting, security > Fix For: 2.2.0 rc1 > > > We want to avoid a security exploit for our users. We need to make sure we > ship 2.2 UDFs with good defaults so someone exposing it to the internet > accidentally doesn't open themselves up to having arbitrary code run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9402) Implement proper sandboxing for UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559420#comment-14559420 ] Brian Hess commented on CASSANDRA-9402: I do think that you may want to leverage FENCED UDFs to accomplish some of the objections/objectives here. Either way, I created CASSANDRA-9481 for FENCED UDFs. > Implement proper sandboxing for UDFs > > > Key: CASSANDRA-9402 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9402 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani >Assignee: Robert Stupp >Priority: Critical > Labels: doc-impacting, security > Fix For: 2.2.0 rc1 > > > We want to avoid a security exploit for our users. We need to make sure we > ship 2.2 UDFs with good defaults so someone exposing it to the internet > accidentally doesn't open themselves up to having arbitrary code run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9481) FENCED UDFs
Brian Hess created CASSANDRA-9481: -- Summary: FENCED UDFs Key: CASSANDRA-9481 URL: https://issues.apache.org/jira/browse/CASSANDRA-9481 Project: Cassandra Issue Type: New Feature Reporter: Brian Hess Related to security/sandboxing of UDFs (CASSANDRA-9402) Essentially, the UDF will run in a separate process when it is registered as FENCED, and run in-process when it is registered as UNFENCED. This doesn't necessarily remove all the issues, but it does help mitigate some of them - especially since it would (optionally) run as another user. This could look like the following with Cassandra: - FENCED is a GRANTable privilege - In cassandra.yaml you can specify the user to use when launching the separate process (so that it is not the same user that is running the database - or optionally is) - This is good so that the UDF can't stop the database, delete database files, etc. - For FENCED UDFs, IPC would be used to transfer rows to the UDF and to return results. We could use CQL rows for the data. This could be shared memory or sockets (Unix or TCP - slight preference for sockets for some follow-on ideas). - Ideally, switching from FENCED to UNFENCED would be just a DDL change. That is, the API would work such that a simple "ALTER FUNCTION myFunction(DOUBLE, DOUBLE) UNFENCED" would change it. - If you wanted, because this is a separate process you could use a separate class loader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9402) Implement proper sandboxing for UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559291#comment-14559291 ] Brian Hess commented on CASSANDRA-9402: There is another alternative - FENCED UDFs. DB2 and Netezza had these options. Essentially, the UDF will run in a separate process when it is registered as FENCED, and run in-process when it is registered as UNFENCED. This doesn't necessarily remove all the issues, but it does help mitigate some of them - especially since it would (optionally) run as another user. This could look like the following with Cassandra: - FENCED is a GRANTable privilege - In cassandra.yaml you can specify the user to use when launching the separate process (so that it is *not* the same user that is running the database - or optionally is) - This is good so that the UDF can't stop the database, delete database files, etc. - For FENCED UDFs, IPC would be used to transfer rows to the UDF and to return results. We could use CQL rows for the data. This could be shared memory or sockets (Unix or TCP - slight preference for sockets for some follow-on ideas). - Ideally, switching from FENCED to UNFENCED would be just a DDL change. That is, the API would work such that a simple "ALTER FUNCTION myFunction(DOUBLE, DOUBLE) UNFENCED" would change it. - If you wanted, because this is a separate process you could use a separate class loader. > Implement proper sandboxing for UDFs > > > Key: CASSANDRA-9402 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9402 > Project: Cassandra > Issue Type: Task >Reporter: T Jake Luciani >Assignee: Robert Stupp >Priority: Critical > Labels: doc-impacting, security > Fix For: 2.2.0 rc1 > > > We want to avoid a security exploit for our users. We need to make sure we > ship 2.2 UDFs with good defaults so someone exposing it to the internet > accidentally doesn't open themselves up to having arbitrary code run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
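Spelled out as DDL, the proposal could look like the sketch below. The FENCED/UNFENCED keywords and their placement are hypothetical syntax; only the ALTER form appears in the comment above.
{code}
-- hypothetical: register the UDF to run in a separate process
CREATE FUNCTION myFunction(a double, b double)
    RETURNS NULL ON NULL INPUT
    RETURNS double
    LANGUAGE java
    FENCED
    AS 'return Math.max(a, b);';

-- later, move it in-process with a DDL-only change:
ALTER FUNCTION myFunction(double, double) UNFENCED;
{code}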
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554333#comment-14554333 ] Brian Hess commented on CASSANDRA-6477: Can someone clarify whether, based on the current implementation, the Materialized View (or some of the Materialized Views, if there are more than one) can contain the mutation but the base table does not - or vice versa? Can you clarify what happens in each of the failure scenarios? It will be very important for us to know for sure when we discuss this with users/customers. > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549088#comment-14549088 ] Brian Hess commented on CASSANDRA-6477: I really don't need to re-ignite the nomenclature debate, but another name that might be more fitting with how we've talked about these tables in the past is DB2's name: Materialized Query Tables Anyone like or hate that name more? > Materialized Views (was: Global Indexes) > > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis >Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.x > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT
[ https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548608#comment-14548608 ] Brian Hess commented on CASSANDRA-9415: [~jbellis] Oracle, DB2, and SQL Server do this (at least - maybe others). In Oracle it is Materialized Views (see http://docs.oracle.com/cd/B28359_01/server.111/b28313/basicmv.htm): bq. The end user queries the tables and views at the detail data level. The query rewrite mechanism in the Oracle server automatically rewrites the SQL query to use the summary tables. This mechanism reduces response time for returning results from the query. Materialized views within the data warehouse are transparent to the end user or to the database application." In DB2 it is Materialized Query Tables (see http://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/) bq. Materialized query tables can significantly improve the performance of queries, especially complex queries. If the optimizer determines that a query or part of a query could be resolved using an MQT, the query might be rewritten to take advantage of the MQT. In SQL Server it is Indexed Views (see https://msdn.microsoft.com/en-us/library/dd171921(SQL.100).aspx): bq. The indexed view can be used in a query execution in two ways. The query can reference the indexed view directly, or, more importantly, the query optimizer can select the view if it determines that the view can be substituted for some or all of the query in the lowest-cost query plan. In the second case, the indexed view is used instead of the underlying tables and their ordinary indexes. The view does not need to be referenced in the query for the query optimizer to use it during query execution. This allows existing applications to benefit from the newly created indexed views without changing those applications. > Implicit use of Materialized Views on SELECT > > > Key: CASSANDRA-9415 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9415 > Project: Cassandra > Issue Type: Improvement >Reporter: Brian Hess > Labels: ponies > > CASSANDRA-6477 introduces Materialized Views. This greatly simplifies the > write path for the best-practice of "query tables". But it does not simplify > the read path as much as our users want/need. > We suggest to folks to create multiple copies of their base table optimized > for certain queries - hence "query table". For example, we may have a USER > table with two type of queries: lookup by userid and lookup by email address. > We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL. Both > would have the exact same schema, with the same PRIMARY KEY columns, but > different PARTITION KEY - the first would be USERID and the second would be > EMAIL. > One complicating thing with this approach is that the application now needs > to know that when it INSERT/UPDATE/DELETEs from the base table it needs to > INSERT/UPDATE/DELETE from all of the query tables as well. CASSANDRA-6477 > covers this nicely. > However, the other side of the coin is that the application needs to know > which query table to leverage based on the selection criteria. Using the > example above, if the query has a predicate such as "WHERE userid = 'bhess'", > then USERS_BY_USERID is the better table to use. Similarly, when the > predicate is "WHERE email = 'bhess@company.whatever'", USERS_BY_EMAIL is > appropriate. > On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" > to the collection of tables. 
You do operations just on the base table. It > is very attractive for the SELECT side as well. It would be very good to > allow an application to simply do "SELECT * FROM users WHERE userid = > 'bhess'" and have that query implicitly leverage the USERS_BY_USERID > materialized view. > For additional use cases, especially analytics use cases like in Spark, this > allows the Spark code to simply push down the query without having to know > about all of the MVs that have been set up. The system will route the query > appropriately. And if additional MVs are necessary to make a query run > better/faster, then those MVs can be set up and Spark will implicitly > leverage them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8234) CTAS for COPY
[ https://issues.apache.org/jira/browse/CASSANDRA-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548484#comment-14548484 ] Brian Hess commented on CASSANDRA-8234: What will the user-experience be for CTAS? Will it come under the same query timeout as with a normal SELECT query? Will it be synchronous or asynchronous? > CTAS for COPY > - > > Key: CASSANDRA-8234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8234 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Robin Schumacher > Fix For: 3.x > > > Continuous request from users is the ability to do CREATE TABLE AS SELECT... > The COPY command can be enhanced to perform simple and customized copies of > existing tables to satisfy the need. > - Simple copy is COPY table a TO new table b. > - Custom copy can mimic Postgres: (e.g. COPY (SELECT * FROM country WHERE > country_name LIKE 'A%') TO …) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9200) Sequences
[ https://issues.apache.org/jira/browse/CASSANDRA-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548436#comment-14548436 ] Brian Hess commented on CASSANDRA-9200: [~jjordan] - I was responding to this comment "numbers that a human should be able to read" from [~snazy] It would at least cover that space. Hence asking for the real customer need for increasing/decreasing values. > Sequences > - > > Key: CASSANDRA-9200 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9200 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Robert Stupp > Fix For: 3.x > > > UUIDs are usually the right choice for surrogate keys, but sometimes > application constraints dictate an increasing numeric value. > We could do this by using LWT to reserve "blocks" of the sequence for each > member of the cluster, which would eliminate paxos contention at the cost of > not being strictly increasing. > PostgreSQL syntax: > http://www.postgresql.org/docs/9.4/static/sql-createsequence.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT
Brian Hess created CASSANDRA-9415: -- Summary: Implicit use of Materialized Views on SELECT Key: CASSANDRA-9415 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415 Project: Cassandra Issue Type: Improvement Reporter: Brian Hess CASSANDRA-6477 introduces Materialized Views. This greatly simplifies the write path for the best-practice of "query tables". But it does not simplify the read path as much as our users want/need. We suggest to folks to create multiple copies of their base table optimized for certain queries - hence "query table". For example, we may have a USER table with two type of queries: lookup by userid and lookup by email address. We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL. Both would have the exact same schema, with the same PRIMARY KEY columns, but different PARTITION KEY - the first would be USERID and the second would be EMAIL. One complicating thing with this approach is that the application now needs to know that when it INSERT/UPDATE/DELETEs from the base table it needs to INSERT/UPDATE/DELETE from all of the query tables as well. CASSANDRA-6477 covers this nicely. However, the other side of the coin is that the application needs to know which query table to leverage based on the selection criteria. Using the example above, if the query has a predicate such as "WHERE userid = 'bhess'", then USERS_BY_USERID is the better table to use. Similarly, when the predicate is "WHERE email = 'bhess@company.whatever'", USERS_BY_EMAIL is appropriate. On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" to the collection of tables. You do operations just on the base table. It is very attractive for the SELECT side as well. It would be very good to allow an application to simply do "SELECT * FROM users WHERE userid = 'bhess'" and have that query implicitly leverage the USERS_BY_USERID materialized view. For additional use cases, especially analytics use cases like in Spark, this allows the Spark code to simply push down the query without having to know about all of the MVs that have been set up. The system will route the query appropriately. And if additional MVs are necessary to make a query run better/faster, then those MVs can be set up and Spark will implicitly leverage them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9200) Sequences
[ https://issues.apache.org/jira/browse/CASSANDRA-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548364#comment-14548364 ] Brian Hess commented on CASSANDRA-9200: Can you not get a lot of what you want with a UDF that maps a UUID to a BIGINT hash? That will not be increasing/decreasing numbers, but will be an 8-byte integer. Can someone explain more on how the increasing/decreasing quality is needed/leveraged/used by customers? > Sequences > - > > Key: CASSANDRA-9200 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9200 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Robert Stupp > Fix For: 3.x > > > UUIDs are usually the right choice for surrogate keys, but sometimes > application constraints dictate an increasing numeric value. > We could do this by using LWT to reserve "blocks" of the sequence for each > member of the cluster, which would eliminate paxos contention at the cost of > not being strictly increasing. > PostgreSQL syntax: > http://www.postgresql.org/docs/9.4/static/sql-createsequence.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
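A sketch of such a UDF using the 2.2 CQL function syntax (the function name and hash choice are illustrative):
{code}
-- collapses a UUID to an 8-byte integer; compact, but neither
-- increasing nor decreasing
CREATE FUNCTION uuid_to_bigint(id uuid)
    RETURNS NULL ON NULL INPUT
    RETURNS bigint
    LANGUAGE java
    AS 'return id.getMostSignificantBits() ^ id.getLeastSignificantBits();';
{code}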
[jira] [Created] (CASSANDRA-9259) Bulk Reading from Cassandra
Brian Hess created CASSANDRA-9259: -- Summary: Bulk Reading from Cassandra Key: CASSANDRA-9259 URL: https://issues.apache.org/jira/browse/CASSANDRA-9259 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brian Hess This ticket is following on from the 2015 NGCC. This ticket is designed to be a place for discussing and designing an approach to bulk reading. The goal is to have a bulk reading path for Cassandra. That is, a path optimized to grab a large portion of the data for a table (potentially all of it). This is a core element in the Spark integration with Cassandra, and the speed at which Cassandra can deliver bulk data to Spark is limiting the performance of Spark-plus-Cassandra operations. This is especially of importance as Cassandra will (likely) leverage Spark for internal operations (for example CASSANDRA-8234). The core CQL to consider is the following: SELECT a, b, c FROM myKs.myTable WHERE Token(partitionKey) > X AND Token(partitionKey) <= Y Here, we choose X and Y to be contained within one token range (perhaps considering the primary range of a node without vnodes, for example). This query pushes 50K-100K rows/sec, which is not very fast if we are doing bulk operations via Spark (or other processing frameworks - ETL, etc). There are a few causes (e.g., inefficient paging). There are a few approaches that could be considered. First, we consider a new "Streaming Compaction" approach. The key observation here is that a bulk read from Cassandra is a lot like a major compaction, though instead of outputting a new SSTable we would output CQL rows to a stream/socket/etc. This would be similar to a CompactionTask, but would strip out some unnecessary things in there (e.g., some of the indexing, etc). Predicates and projections could also be encapsulated in this new "StreamingCompactionTask", for example. Another approach would be an alternate storage format. For example, we might employ Parquet (just as an example) to store the same data as in the primary Cassandra storage (aka SSTables). This is akin to Global Indexes (an alternate storage of the same data optimized for a particular query). Then, Cassandra can choose to leverage this alternate storage for particular CQL queries (e.g., range scans). These are just 2 suggestions to get the conversation going. One thing to note is that it will be useful to have this storage segregated by token range so that when you extract via these mechanisms you do not get replications-factor numbers of copies of the data. That will certainly be an issue for some Spark operations (e.g., counting). Thus, we will want per-token-range storage (even for single disks), so this will likely leverage CASSANDRA-6696 (though, we'll want to also consider the single disk case). It is also worth discussing what the success criteria is here. It is unlikely to be as fast as EDW or HDFS performance (though, that is still a good goal), but being within some percentage of that performance should be set as success. For example, 2x as long as doing bulk operations on HDFS with similar node count/size/etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
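As a concrete instance of the query shape above, a two-way split of the Murmur3 ring, one scan per range; together the scans cover each row exactly once (the row at the exact minimum token is an edge case this sketch ignores):
{code}
-- first half of the ring
SELECT a, b, c FROM myKs.myTable
WHERE Token(partitionKey) > -9223372036854775808
  AND Token(partitionKey) <= 0;

-- second half
SELECT a, b, c FROM myKs.myTable
WHERE Token(partitionKey) > 0
  AND Token(partitionKey) <= 9223372036854775807;
{code}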
[jira] [Commented] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384764#comment-14384764 ] Brian Hess commented on CASSANDRA-9048: [~thobbs] - do you mean CASSANDRA-8234? That's CTAS for COPY. I think we all agree that using Spark for that use case is very appropriate. This is about bulk loading files from outside Cassandra. > Delimited File Bulk Loader > -- > > Key: CASSANDRA-9048 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9048 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Brian Hess > Fix For: 3.0 > > Attachments: CASSANDRA-9048.patch > > > There is a strong need for bulk loading data from delimited files into > Cassandra. Starting with delimited files means that the data is not > currently in the SSTable format, and therefore cannot immediately leverage > Cassandra's bulk loading tool, sstableloader, directly. > A tool supporting delimited files much closer matches the format of the data > more often than the SSTable format itself, and a tool that loads from > delimited files is very useful. > In order for this bulk loader to be more generally useful to customers, it > should handle a number of options at a minimum: > - support specifying the input file or to read the data from stdin (so other > command-line programs can pipe into the loader) > - supply the CQL schema for the input data > - support all data types other than collections (collections is a stretch > goal/need) > - an option to specify the delimiter > - an option to specify comma as the decimal delimiter (for international use > casese) > - an option to specify how NULL values are specified in the file (e.g., the > empty string or the string NULL) > - an option to specify how BOOLEAN values are specified in the file (e.g., > TRUE/FALSE or 0/1) > - an option to specify the Date and Time format > - an option to skip some number of rows at the beginning of the file > - an option to only read in some number of rows from the file > - an option to indicate how many parse errors to tolerate > - an option to specify a file that will contain all the lines that did not > parse correctly (up to the maximum number of parse errors) > - an option to specify the CQL port to connect to (with 9042 as the default). > Additional options would be useful, but this set of options/features is a > start. > A word on COPY. COPY comes via CQLSH which requires the client to be the > same version as the server (e.g., 2.0 CQLSH does not work with 2.1 Cassandra, > etc). This tool should be able to connect to any version of Cassandra > (within reason). For example, it should be able to handle 2.0.x and 2.1.x. > Moreover, CQLSH's COPY command does not support a number of the options > above. Lastly, the performance of COPY in 2.0.x is not high enough to be > considered a bulk ingest tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384448#comment-14384448 ] Brian Hess commented on CASSANDRA-9048: To answer [~jbellis]'s and [~iamaleksey]'s question about performance. I took the latest 2.1 (including [~thobbs]'s latest improvements to COPY) and set up a 5-node i2.xlarge cluster in EC2. I had another i2.xlarge node in the same EC2 data center be the client. I compared the COPY command with the java tool in the patch. There were 2 tests I did. The first was 13107200 rows of 10 BIGINTs (so about 1GB of data) from a single file (on disk about 630MB). COPY was able to load this in 16:46 (1006 seconds). The java tool did it in 8:24 (504 seconds). So, the java tool was twice as fast as COPY. The second test was 10M rows of 6 BIGINTs, 2 DOUBLEs, and 2 TEXT fields (random date strings). COPY failed to complete this load (I tried a number of times - it failed with "" - which is also weird since CQLSH defaults to CL_ONE), but the java tool completed it in 6:58 (418 seconds). I dropped the number of rows to 4M so COPY could complete. COPY was able to load in 5:17 (317 seconds). The java tool completed the load in 2:44 (164 seconds). Again, the java tool is about twice as fast as COPY. > Delimited File Bulk Loader > -- > > Key: CASSANDRA-9048 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9048 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Brian Hess > Fix For: 3.0 > > Attachments: CASSANDRA-9048.patch > > > There is a strong need for bulk loading data from delimited files into > Cassandra. Starting with delimited files means that the data is not > currently in the SSTable format, and therefore cannot immediately leverage > Cassandra's bulk loading tool, sstableloader, directly. > A tool supporting delimited files much closer matches the format of the data > more often than the SSTable format itself, and a tool that loads from > delimited files is very useful. > In order for this bulk loader to be more generally useful to customers, it > should handle a number of options at a minimum: > - support specifying the input file or to read the data from stdin (so other > command-line programs can pipe into the loader) > - supply the CQL schema for the input data > - support all data types other than collections (collections is a stretch > goal/need) > - an option to specify the delimiter > - an option to specify comma as the decimal delimiter (for international use > casese) > - an option to specify how NULL values are specified in the file (e.g., the > empty string or the string NULL) > - an option to specify how BOOLEAN values are specified in the file (e.g., > TRUE/FALSE or 0/1) > - an option to specify the Date and Time format > - an option to skip some number of rows at the beginning of the file > - an option to only read in some number of rows from the file > - an option to indicate how many parse errors to tolerate > - an option to specify a file that will contain all the lines that did not > parse correctly (up to the maximum number of parse errors) > - an option to specify the CQL port to connect to (with 9042 as the default). > Additional options would be useful, but this set of options/features is a > start. > A word on COPY. COPY comes via CQLSH which requires the client to be the > same version as the server (e.g., 2.0 CQLSH does not work with 2.1 Cassandra, > etc). 
This tool should be able to connect to any version of Cassandra > (within reason). For example, it should be able to handle 2.0.x and 2.1.x. > Moreover, CQLSH's COPY command does not support a number of the options > above. Lastly, the performance of COPY in 2.0.x is not high enough to be > considered a bulk ingest tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382724#comment-14382724 ] Brian Hess commented on CASSANDRA-9048: [~carlyeks] and [~jjirsa] - this tool (as implemented, at least), does not depend on the version of Cassandra. It leverages the Java driver, much like cassandra-stress and can be pointed at C* 2.0 or 2.1, etc. In similar lines, it aligns with the tree in the same way that cassandra-stress aligns with the tree. [~iamaleksey] - The use case here is that a client machine has a pile of delimited files that need to be loaded in bulk to Cassandra - a common use case we see. In the Spark-based tool, you would have to have Spark on the client (perhaps that's okay) but moreover it would be reading files from the local filesystem (or stdin), not from a distributed file system, so there would be no parallelism from Spark. I think that a Spark-based tool to do COPY for files in a distributed file system might make sense, but I don't think it applies for files on the one machine. Spark certainly also makes sense for the CREATE-TABLE-AS-SELECT (which is what CASSANDRA-8234 is about, not about loading). > Delimited File Bulk Loader > -- > > Key: CASSANDRA-9048 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9048 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Brian Hess > Fix For: 3.0 > > Attachments: CASSANDRA-9048.patch > > > There is a strong need for bulk loading data from delimited files into > Cassandra. Starting with delimited files means that the data is not > currently in the SSTable format, and therefore cannot immediately leverage > Cassandra's bulk loading tool, sstableloader, directly. > A tool supporting delimited files much closer matches the format of the data > more often than the SSTable format itself, and a tool that loads from > delimited files is very useful. > In order for this bulk loader to be more generally useful to customers, it > should handle a number of options at a minimum: > - support specifying the input file or to read the data from stdin (so other > command-line programs can pipe into the loader) > - supply the CQL schema for the input data > - support all data types other than collections (collections is a stretch > goal/need) > - an option to specify the delimiter > - an option to specify comma as the decimal delimiter (for international use > casese) > - an option to specify how NULL values are specified in the file (e.g., the > empty string or the string NULL) > - an option to specify how BOOLEAN values are specified in the file (e.g., > TRUE/FALSE or 0/1) > - an option to specify the Date and Time format > - an option to skip some number of rows at the beginning of the file > - an option to only read in some number of rows from the file > - an option to indicate how many parse errors to tolerate > - an option to specify a file that will contain all the lines that did not > parse correctly (up to the maximum number of parse errors) > - an option to specify the CQL port to connect to (with 9042 as the default). > Additional options would be useful, but this set of options/features is a > start. > A word on COPY. COPY comes via CQLSH which requires the client to be the > same version as the server (e.g., 2.0 CQLSH does not work with 2.1 Cassandra, > etc). This tool should be able to connect to any version of Cassandra > (within reason). For example, it should be able to handle 2.0.x and 2.1.x. 
> Moreover, CQLSH's COPY command does not support a number of the options > above. Lastly, the performance of COPY in 2.0.x is not high enough for it to be > considered a bulk ingest tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
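To make the driver-based approach described above concrete, here is a minimal sketch of a loader that reads comma-delimited rows from stdin and writes them with executeAsync(). This is not the attached patch; it assumes the DataStax Java driver 2.x, the keyspace/table/column names (test.mytable, k, v) are hypothetical, and the parsing is deliberately naive (no quoted-field handling):
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class StdinLoader {
    public static void main(String[] args) throws Exception {
        // Contact point is an assumption; 9042 is the default CQL port.
        Cluster cluster = Cluster.builder()
                                 .addContactPoint("127.0.0.1")
                                 .withPort(9042)
                                 .build();
        Session session = cluster.connect();

        // Hypothetical target table: CREATE TABLE test.mytable (k text PRIMARY KEY, v text)
        PreparedStatement ps = session.prepare(
                "INSERT INTO test.mytable (k, v) VALUES (?, ?)");

        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split(",", -1);              // naive split; a real tool handles quoting
            session.executeAsync(ps.bind(cols[0], cols[1]));  // asynchronous, non-blocking write
        }
        // NOTE: a real loader would wait for outstanding futures before closing.
        cluster.close();
    }
}
{code}
Firing executeAsync() with no throttling can exhaust client memory on large files; the comment below describes bounding the number of futures "in flight" for exactly this reason.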
[jira] [Commented] (CASSANDRA-9048) Delimited File Bulk Loader
[ https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382474#comment-14382474 ] Brian Hess commented on CASSANDRA-9048: I have created a version of this as a Java program using executeAsync(). Some testing has shown that for bulk writing to Cassandra, if you are starting with delimited files (not SSTables), the Java driver's executeAsync() is more efficient/performant than creating SSTables and then calling sstableloader. This implementation supports the options above, as well as a way to specify the parallelism of the asynchronous writing (the number of futures "in flight"). In addition to the Java implementation, I created a command-line utility a la cassandra-stress, called cassandra-loader, to invoke the Java classes with the appropriate CLASSPATH. As such, I also modified build.xml and tools/bin/cassandra.in.sh as appropriate. The patch is attached for review. The command-line usage statement is:
{code}
Usage: -f <filename> -host <ipaddress> -schema <schema> [OPTIONS]
OPTIONS:
  -delim <delimiter>            Delimiter to use [,]
  -delmInQuotes true            Set to 'true' if delimiter can be inside quoted fields [false]
  -dateFormat <dateFormat>      Date format [default for Locale.ENGLISH]
  -nullString <nullString>      String that signifies NULL [none]
  -skipRows <skipRows>          Number of rows to skip [0]
  -maxRows <maxRows>            Maximum number of rows to read (-1 means all) [-1]
  -maxErrors <maxErrors>        Maximum errors to endure [10]
  -badFile <badFilename>        Filename for where to place badly parsed rows [none]
  -port <portNumber>            CQL Port Number [9042]
  -numFutures <numFutures>      Number of CQL futures to keep in flight [1000]
  -decimalDelim <decimalDelim>  Decimal delimiter [.] Other option is ','
  -boolStyle <boolStyle>        Style for booleans [TRUE_FALSE]
{code}
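For concreteness, a hypothetical invocation might look like the following (the file path, host, and the exact form of the schema string are made up; the flags come from the usage statement above):
{code}
cassandra-loader -f /path/to/data.csv -host 10.200.1.1 -schema "test.mytable(k, v)" \
                 -delim "|" -nullString "NULL" -skipRows 1 -numFutures 500
{code}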
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
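As a rough illustration of the "futures in flight" throttling described in the comment above (a sketch, not the attached patch: it assumes the DataStax Java driver 2.x, whose executeAsync() returns a Guava ListenableFuture, and the class and method names are invented):
{code}
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.util.concurrent.Semaphore;

public class BoundedAsyncWriter {
    private final Session session;
    private final Semaphore inFlight;  // permits = max concurrent writes (the -numFutures knob)

    public BoundedAsyncWriter(Session session, int numFutures) {
        this.session = session;
        this.inFlight = new Semaphore(numFutures);
    }

    public void write(BoundStatement stmt) throws InterruptedException {
        inFlight.acquire();  // block the reader thread once numFutures writes are outstanding
        ResultSetFuture future = session.executeAsync(stmt);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet rs) {
                inFlight.release();  // free a slot as soon as the write completes
            }
            @Override
            public void onFailure(Throwable t) {
                inFlight.release();  // a real loader would also count this toward -maxErrors
            }
        });
    }
}
{code}
The effect is simple back-pressure: the file-reading thread stalls whenever the cluster falls behind, instead of queueing an unbounded number of futures in client memory.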