[jira] [Commented] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200445#comment-14200445 ] Todd Nine commented on CASSANDRA-8257: -- Already done. The opscenter tag is on the issue in this ticket. http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster > Opscenter Agent does not properly download target cassandra cluster > --- > > Key: CASSANDRA-8257 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8257 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: Opscenter 5.0.1, Cassandra 1.2.19 >Reporter: Todd Nine > > Rather than re-post the issue, it is outlined here. > http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster. > Note that when omitting the target Cassandra cluster, using the same cluster > as the agent works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200435#comment-14200435 ] Todd Nine edited comment on CASSANDRA-8257 at 11/6/14 4:53 PM: --- [~brandon.williams] I would love to, where is it? I'm assuming it's here. https://datastax.jira.com But I can't actually get in to create an issue. was (Author: tnine): [~brandon.williams] I would love to, where is it? > Opscenter Agent does not properly download target cassandra cluster > --- > > Key: CASSANDRA-8257 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8257 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: Opscenter 5.0.1, Cassandra 1.2.19 >Reporter: Todd Nine > > Rather than re-post the issue, it is outlined here. > http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster. > Note that when omitting the target Cassandra cluster, using the same cluster > as the agent works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200435#comment-14200435 ] Todd Nine commented on CASSANDRA-8257: -- [~brandon.williams] I would love to, where is it? > Opscenter Agent does not properly download target cassandra cluster > --- > > Key: CASSANDRA-8257 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8257 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: Opscenter 5.0.1, Cassandra 1.2.19 >Reporter: Todd Nine > > Rather than re-post the issue, it is outlined here. > http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster. > Note that when omitting the target Cassandra cluster, using the same cluster > as the agent works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-8257: - Environment: Opscenter 5.0.1, Cassandra 1.2.19 (was: Opscenter 5.0, Cassandra 1.2.19) > Opscenter Agent does not properly download target cassandra cluster > --- > > Key: CASSANDRA-8257 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8257 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: Opscenter 5.0.1, Cassandra 1.2.19 >Reporter: Todd Nine > > Rather than re-post the issue, it is outlined here. > http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster. > Note that when omitting the target Cassandra cluster, using the same cluster > as the agent works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster
Todd Nine created CASSANDRA-8257: Summary: Opscenter Agent does not properly download target cassandra cluster Key: CASSANDRA-8257 URL: https://issues.apache.org/jira/browse/CASSANDRA-8257 Project: Cassandra Issue Type: Bug Components: Tools Environment: Opscenter 5.0, Cassandra 1.2.19 Reporter: Todd Nine Rather than re-post the issue, it is outlined here. http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster. Note that when omitting the target Cassandra cluster, using the same cluster as the agent works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117519#comment-14117519 ] Todd Nine edited comment on CASSANDRA-7855 at 9/1/14 4:13 PM: -- I would argue that the syntax should always have the following format. {code} SELECT * FROM foo WHERE (k1, k2) IN ( (0, 1) , (1, 2) ) {code} Simply because in the use case provided in my ticket, you know all possible combinations of fields used to construct partition keys. By grouping them together within the parentheses, it is clear to both the user and the grammar that all terms within the parens comprise a partition key. Visually it is clear that (0, 1) is a partition key, as is (1, 2). was (Author: tnine): I would argue that the syntax should always have the following format. {code} SELECT * FROM foo WHERE (k1, k2) IN ( (0, 1) , (1, 2) ) {code} Simply because in the use case provided in my ticket, you know all possible combinations of fields used to construct partition keys. By grouping them together within the parentheses, it is clear to both the user and the grammar that all terms within the parens comprise a partition key. By reading the above it is clear that (0, 1) is a partition key, as is (1, 2). > Genralize use of IN for compound partition keys > --- > > Key: CASSANDRA-7855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7855 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Priority: Minor > Labels: cql > Fix For: 2.0.11 > > > When you have a compound partition key, we currently only support a > {{IN}} on the last column of that partition key. 
So given: > {noformat} > CREATE TABLE foo ( > k1 int, > k2 int, > v int, > PRIMARY KEY ((k1, k2)) > ) > {noformat} > we allow > {noformat} > SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2) > {noformat} > but not > {noformat} > SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2) > {noformat} > There is no particular reason for us not supporting the latter (to the best of > my knowledge) since it's reasonably straightforward, so we should fix it. > I'll note that using {{IN}} on a partition key is not necessarily a better > idea than parallelizing queries client side, so this syntax, when > introduced, should probably be used sparingly, but given we do support IN on > partition keys, I see no reason not to extend it to compound PKs properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7855) Genralize use of IN for compound partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117519#comment-14117519 ] Todd Nine commented on CASSANDRA-7855: -- I would argue that the syntax should always have the following format. {code} SELECT * FROM foo WHERE (k1, k2) IN ( (0, 1) , (1, 2) ) {code} Simply because in the use case provided in my ticket, you know all possible combinations of fields used to construct partition keys. By grouping them together within the parentheses, it is clear to both the user and the grammar that all terms within the parens comprise a partition key. By reading the above it is clear that (0, 1) is a partition key, as is (1, 2). > Genralize use of IN for compound partition keys > --- > > Key: CASSANDRA-7855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7855 > Project: Cassandra > Issue Type: Bug >Reporter: Sylvain Lebresne >Priority: Minor > Labels: cql > Fix For: 2.0.11 > > > When you have a compound partition key, we currently only support a > {{IN}} on the last column of that partition key. So given: > {noformat} > CREATE TABLE foo ( > k1 int, > k2 int, > v int, > PRIMARY KEY ((k1, k2)) > ) > {noformat} > we allow > {noformat} > SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2) > {noformat} > but not > {noformat} > SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2) > {noformat} > There is no particular reason for us not supporting the latter (to the best of > my knowledge) since it's reasonably straightforward, so we should fix it. > I'll note that using {{IN}} on a partition key is not necessarily a better > idea than parallelizing queries client side, so this syntax, when > introduced, should probably be used sparingly, but given we do support IN on > partition keys, I see no reason not to extend it to compound PKs properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
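The tuple-IN semantics proposed above can be sketched in plain Python (an illustrative model of the filtering, not Cassandra code; `tuple_in` and the sample rows are hypothetical names for the sketch):

```python
# Sketch of the proposed semantics for:
#   SELECT * FROM foo WHERE (k1, k2) IN ((0, 1), (1, 2))
# Each inner tuple is one complete partition key.
rows = [
    {"k1": 0, "k2": 1, "v": 10},
    {"k1": 0, "k2": 2, "v": 20},
    {"k1": 1, "k2": 2, "v": 30},
]

def tuple_in(rows, key_cols, keys):
    """Return rows whose key-column tuple appears in `keys`."""
    wanted = set(keys)
    return [r for r in rows if tuple(r[c] for c in key_cols) in wanted]

matches = tuple_in(rows, ("k1", "k2"), [(0, 1), (1, 2)])
print([r["v"] for r in matches])  # → [10, 30]
```

Grouping the values into tuples is what makes the grammar unambiguous: `(0, 1)` is one key, never two independent IN lists.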
[jira] [Created] (CASSANDRA-7854) Unable to select partition keys directly using IN keyword (no replacement for multi row multiget in thrift)
Todd Nine created CASSANDRA-7854: Summary: Unable to select partition keys directly using IN keyword (no replacement for multi row multiget in thrift) Key: CASSANDRA-7854 URL: https://issues.apache.org/jira/browse/CASSANDRA-7854 Project: Cassandra Issue Type: Bug Reporter: Todd Nine We're converting some old thrift CFs to CQL. We aren't looking to change the underlying physical structure, since it has proven effective in production. In order to migrate, we need a full equivalent of thrift's multiget select. In thrift, the format was as follows. (scopeId, scopeType, nodeId, nodeType){ 0x00, timestamp } where we have deliberately designed only 1 column per row. To translate this to CQL, I have defined the following table.
{code}
CREATE TABLE Graph_Marked_Nodes (
    scopeId uuid,
    scopeType varchar,
    nodeId uuid,
    nodeType varchar,
    timestamp bigint,
    PRIMARY KEY (scopeId, scopeType, nodeId, nodeType)
)
{code}
I then try to select using the IN keyword.
{code}
select timestamp from Graph_Marked_Nodes
WHERE (scopeId, scopeType, nodeId, nodeType) IN (
    (5a391596-3181-11e4-a87e-600308a690e2, 'organization', 5a3a2708-3181-11e4-a87e-600308a690e2, 'test'),
    (5a391596-3181-11e4-a87e-600308a690e2, 'organization', 5a3a2709-3181-11e4-a87e-600308a690e2, 'test'),
    (5a391596-3181-11e4-a87e-600308a690e2, 'organization', 5a39fff7-3181-11e4-a87e-600308a690e2, 'test')
)
{code}
This results in the following stack trace:
{code}
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Multi-column relations can only be applied to clustering columns: scopeid
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:367)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:584)
{code}
This is still possible via the thrift API. Apologies in advance if I've filed this erroneously; I can't find any examples of this type of query anywhere. Note that our data set grows far too large to fit in a single physical partition (row) if we use only scopeId and scopeType, so we need all 4 data elements to be part of our partition key to ensure we have the distribution we need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
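Until such a query is supported, the usual workaround is to issue one single-partition query per key and join the results client side. A minimal sketch, assuming a stand-in `fetch_one` in place of a real driver call (here it reads an in-memory dict so the sketch is self-contained; the table data and function names are hypothetical):

```python
# Hedged sketch of a client-side multiget: one query per full partition key,
# fanned out over a thread pool, results joined into a dict.
from concurrent.futures import ThreadPoolExecutor

TABLE = {
    ("scope-a", "organization", "node-1", "test"): 1000,
    ("scope-a", "organization", "node-2", "test"): 2000,
}

def fetch_one(key):
    # Stand-in for a real single-partition SELECT via a driver session.
    return TABLE.get(key)

def multiget(keys, workers=4):
    """Fetch each partition key concurrently; return key -> timestamp."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(fetch_one, keys)))

result = multiget(list(TABLE))
```

Each sub-query hits exactly one partition, so the coordinator load is the same as the IN form would impose; only the fan-out moves to the client.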
[jira] [Commented] (CASSANDRA-5062) Support CAS
[ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531169#comment-13531169 ] Todd Nine commented on CASSANDRA-5062: -- Can you elaborate on your timeout question with an example? I think we're on the same page with this, but wanted to be sure. There's currently a small window in which a client can think it has a lock when it actually doesn't (timeout/2). This is due to not having any way for the lock holder to receive a notification when its column has reached its TTL and is removed because the lock heartbeat failed. > Support CAS > --- > > Key: CASSANDRA-5062 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5062 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Jonathan Ellis > Fix For: 2.0 > > > "Strong" consistency is not enough to prevent race conditions. The classic > example is user account creation: we want to ensure usernames are unique, so > we only want to signal account creation success if nobody else has created > the account yet. But naive read-then-write allows clients to race and both > think they have a green light to create. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
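One reading of the timeout/2 window described above, sketched with hypothetical numbers (this is an interpretation of the comment, not code from the ticket): the lock column carries TTL = timeout and the holder refreshes it every timeout/2; if one refresh write is silently lost, the holder believes the column lives until (lost refresh time + timeout) while it actually expires at (last real refresh + timeout).

```python
# Model of the unsafe window: a lock column with TTL = timeout, refreshed
# every timeout / 2 by a heartbeat. One heartbeat write is lost in flight.
timeout = 10.0                      # lock column TTL, seconds (illustrative)
heartbeat_interval = timeout / 2    # refresh cadence

last_real_refresh = 0.0             # last heartbeat that reached the cluster
lost_refresh = last_real_refresh + heartbeat_interval  # write silently lost

actual_expiry = last_real_refresh + timeout    # column really disappears here
believed_expiry = lost_refresh + timeout       # what the holder assumes
unsafe_window = believed_expiry - actual_expiry
print(unsafe_window)  # → 5.0, i.e. timeout / 2
```

During that window the client acts as the lock holder while the column is already gone, which is exactly why an expiry notification (or a CAS primitive) is needed.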
[jira] [Created] (CASSANDRA-4771) Setting TTL to Integer.MAX causes columns to not be persisted.
Todd Nine created CASSANDRA-4771: Summary: Setting TTL to Integer.MAX causes columns to not be persisted. Key: CASSANDRA-4771 URL: https://issues.apache.org/jira/browse/CASSANDRA-4771 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.12 Reporter: Todd Nine Priority: Blocker When inserting columns via batch mutation, we have an edge case where columns will be set to Integer.MAX. When setting the column expiration time to Integer.MAX, the columns do not appear to be persisted. Fails: Integer.MAX Integer.MAX/2 Works: Integer.MAX/3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-4771) Setting TTL to Integer.MAX causes columns to not be persisted.
[ https://issues.apache.org/jira/browse/CASSANDRA-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-4771: - Description: When inserting columns via batch mutation, we have an edge case where columns will be set to Integer.MAX. When setting the column expiration time to Integer.MAX, the columns do not appear to be persisted. Fails: Integer.MAX_VALUE Integer.MAX_VALUE/2 Works: Integer.MAX_VALUE/3 was: When inserting columns via batch mutation, we have an edge case where columns will be set to Integer.MAX. When setting the column expiration time to Integer.MAX, the columns do not appear to be persisted. Fails: Integer.MAX Integer.MAX/2 Works: Integer.MAX/3 > Setting TTL to Integer.MAX causes columns to not be persisted. > -- > > Key: CASSANDRA-4771 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4771 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.0.12 >Reporter: Todd Nine >Priority: Blocker > > When inserting columns via batch mutation, we have an edge case where columns > will be set to Integer.MAX. When setting the column expiration time to > Integer.MAX, the columns do not appear to be persisted. > Fails: > Integer.MAX_VALUE > Integer.MAX_VALUE/2 > Works: > Integer.MAX_VALUE/3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
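A plausible mechanism for the fails/works pattern reported above (an assumption for illustration, not a confirmed diagnosis from the ticket): if the column's local expiration time is kept as a signed 32-bit integer holding (now-in-epoch-seconds + ttl), then large TTLs overflow to a negative value and the column looks as though it expired long ago.

```python
# Sketch: 32-bit wraparound of (now + ttl). The helper and constant names
# are illustrative; `now` approximates the epoch seconds around late 2012.
INT_MAX = 2**31 - 1
now = 1_349_000_000  # ~October 2012, epoch seconds

def to_int32(x):
    """Wrap a Python int to signed 32-bit, as a C int would."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

def local_expiration(ttl):
    return to_int32(now + ttl)

for ttl in (INT_MAX, INT_MAX // 2, INT_MAX // 3):
    exp = local_expiration(ttl)
    print(ttl, exp, "expired on arrival" if exp < now else "ok")
```

This matches the report: now + INT_MAX and now + INT_MAX/2 both exceed 2^31 - 1 and wrap negative (column discarded as expired), while now + INT_MAX/3 still fits and persists.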
[jira] [Updated] (CASSANDRA-3228) Add new range scan with clock
[ https://issues.apache.org/jira/browse/CASSANDRA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-3228: - Summary: Add new range scan with clock (was: Add new range scan with optional clock) > Add new range scan with clock > - > > Key: CASSANDRA-3228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3228 > Project: Cassandra > Issue Type: New Feature > Components: Core >Affects Versions: 0.8.5 >Reporter: Todd Nine >Priority: Minor > > Currently, it is not possible to specify a minimum clock time on columns when > performing range scans. In some situations, such as custom migration or > batch processing, it would be helpful to allow the client to specify a > minimum clock time. This would only return columns with a clock value >= the > specified minimum, > e.g. range scan (rowKey, startVal, endVal, reversed, min clock) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3228) Add new range scan with optional clock
Add new range scan with optional clock -- Key: CASSANDRA-3228 URL: https://issues.apache.org/jira/browse/CASSANDRA-3228 Project: Cassandra Issue Type: New Feature Components: Core Affects Versions: 0.8.5 Reporter: Todd Nine Priority: Minor Currently, it is not possible to specify a minimum clock time on columns when performing range scans. In some situations, such as custom migration or batch processing, it would be helpful to allow the client to specify a minimum clock time. This would only return columns with a clock value >= the specified minimum, e.g. range scan (rowKey, startVal, endVal, reversed, min clock) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
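The requested signature can be sketched in Python (a hypothetical model of the API, not Cassandra code): a range scan over a row's columns that additionally drops any column whose clock (write timestamp) is below the minimum.

```python
# Sketch of: range_scan(rowKey, startVal, endVal, reversed, min_clock)
# A row is modeled as {column_name: (value, clock)}.
def range_scan(row, start, end, reverse=False, min_clock=0):
    """Columns in [start, end], optionally reversed, with clock >= min_clock."""
    names = [n for n in sorted(row, reverse=reverse) if start <= n <= end]
    return [(n, row[n]) for n in names if row[n][1] >= min_clock]

row = {"a": ("v1", 100), "b": ("v2", 200), "c": ("v3", 300)}
print(range_scan(row, "a", "c", min_clock=200))
# → [('b', ('v2', 200)), ('c', ('v3', 300))]
```

The clock filter composes with the ordinary slice predicate, so batch-processing clients could resume from a known timestamp without re-reading older columns.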
[jira] [Commented] (CASSANDRA-3209) CLI does not display error when it is not possible to create a keyspace when schemas in cluster do not agree.
[ https://issues.apache.org/jira/browse/CASSANDRA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105038#comment-13105038 ] Todd Nine commented on CASSANDRA-3209: -- Unfortunately this was on a testing cluster which has since been shut down, so I don't have the system.log output. > CLI does not display error when it is not possible to create a keyspace when > schemas in cluster do not agree. > - > > Key: CASSANDRA-3209 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3209 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 0.8.1 > Environment: Latest brisk beta 2 deb on Ubuntu 10.10 server 64 bit. > (Using chef recipe to install) >Reporter: Todd Nine >Assignee: Pavel Yaskevich > Labels: cli > > Cluster: > 3 nodes. 2 online, 1 offline > describe cluster; displays 2 schema versions. 2 nodes are on 1 version, a > single node is on a different version. > Issue this command in the CLI. > create keyspace TestKeyspace with placement_strategy = > 'org.apache.cassandra.locator.NetworkTopologyStrategy' and > strategy_options=[{Brisk:3, Cassandra:0}]; > What should happen. > An error should be displayed when the keyspace cannot be created. > What actually happens. > The user is presented with "null" as the output. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3209) CLI does not display error when it is not possible to create a keyspace when schemas in cluster do not agree.
CLI does not display error when it is not possible to create a keyspace when schemas in cluster do not agree. - Key: CASSANDRA-3209 URL: https://issues.apache.org/jira/browse/CASSANDRA-3209 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Environment: Latest brisk beta 2 deb on Ubuntu 10.10 server 64 bit. (Using chef recipe to install) Reporter: Todd Nine Cluster: 3 nodes. 2 online, 1 offline describe cluster; displays 2 schema versions. 2 nodes are on 1 version, a single node is on a different version. Issue this command in the CLI. create keyspace TestKeyspace with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{Brisk:3, Cassandra:0}]; What should happen. An error should be displayed when the keyspace cannot be created. What actually happens. The user is presented with "null" as the output. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094116#comment-13094116 ] Todd Nine commented on CASSANDRA-2915: -- I agree that order by could be a performance killer for large data sets. In large data sets I think that users should make use of de-normalization and create their own secondary index for efficient querying. However, on small data sets, which seem to be very common in web systems (ours is about 80% of the data a user sees), order by semantics are very important. Most of our data the user sees has a very small result set, < 100 rows. I think explicitly prohibiting these features limits users too much. Shouldn't they be supported, leaving it ultimately up to the user to determine which approach to take when implementing indexes for their data? > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani >Assignee: Jason Rutherglen > Labels: secondary_index > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index; this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. 
> We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly; the big win is that once this is done we can perform > complex queries within a column, like wildcard searches. > The downside of this approach is we will need to read before write, since > documents in Lucene are written as complete documents. For random workloads > with lots of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093344#comment-13093344 ] Todd Nine edited comment on CASSANDRA-2915 at 8/30/11 2:13 AM: --- I think forcing users to install classes for common use cases would cause issues with adoption. What about creating new CQL commands to handle this? When creating an index in a db, you would define the fields and the manner in which they are indexed. Could we do something like the following? create index on [colname] in [colfamily] using [index type 1] as [indexFieldName], [index type 2] as [indexFieldName], [index type n] as [indexFieldName]? drop index [indexFieldName] in [colfamily] on [colname] This way clients such as JPA can update and create indexes, without the need to install custom classes on Cassandra itself. They also have the ability to directly reference the field name when using CQL queries. Assuming that the index class types exist in the Lucene classpath, you get the 1 to many mappings for column to indexing strategy. This would allow more advanced clients such as the JPA plugin to automatically add indexes to the document based on indexes defined on persistent fields, without generating any code the user has to install in the Cassandra runtime. If users want to install custom analyzers, they still have the option to do so, and would gain access to it via CQL. was (Author: tnine): I think forcing users to install classes for common use cases would cause issues with adoption. What about creating new CQL commands to handle this? When creating an index in a db, you would define the fields and the manner in which they are indexed. Could we do something like the following? create index [colname] in [colfamily] using [index type 1] as [indexFieldName], [index type 2] as [indexFieldName], [index type n] as [indexFieldName]? 
drop index [indexFieldName] in [colfamily] on [colname] This way clients such as JPA can update and create indexes, without the need to install custom classes on Cassandra itself. They also have the ability to directly reference the field name when using CQL queries. Assuming that the index class types exist in the Lucene classpath, you get the 1-to-many mapping from column to indexing strategy. This would allow more advanced clients such as the JPA plugin to automatically add indexes to the document based on indexes defined on persistent fields, without generating any code the user has to install in the Cassandra runtime. If users want to install custom analyzers, they still have the option to do so, and would gain access to them via CQL. > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani >Assignee: Jason Rutherglen > Labels: secondary_index > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index; this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. 
> We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly; the big win is that once this is done we can perform > complex queries within a column, like wildcard searches. > The downside of this approach is we will need to read before write, since > documents in Lucene are written as complete documents. For random workloads > with lots of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093344#comment-13093344 ] Todd Nine commented on CASSANDRA-2915: -- I think forcing users to install classes for common use cases would cause issues with adoption. What about creating new CQL commands to handle this? When creating an index in a db, you would define the fields and the manner in which they are indexed. Could we do something like the following? create index [colname] in [colfamily] using [index type 1] as [indexFieldName], [index type 2] as [indexFieldName], [index type n] as [indexFieldName]? drop index [indexFieldName] in [colfamily] on [colname] This way clients such as JPA can update and create indexes, without the need to install custom classes on Cassandra itself. They also have the ability to directly reference the field name when using CQL queries. Assuming that the index class types exist in the Lucene classpath, you get the 1 to many mappings for column to indexing strategy. This would allow more advanced clients such as the JPA plugin to automatically add indexes to the document based on indexes defined on persistent fields, without generating any code the user has to install in the Cassandra runtime. If users want to install custom analyzers, they still have the option to do so, and would gain access to it via CQL. 
> Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani >Assignee: Jason Rutherglen > Labels: secondary_index > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index; this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly; the big win is that once this is done we can perform > complex queries within a column, like wildcard searches. > The downside of this approach is we will need to read before write, since > documents in Lucene are written as complete documents. For random workloads > with lots of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093263#comment-13093263 ] Todd Nine commented on CASSANDRA-2915: -- Could we also use this feature as a standard way of building our lucene documents? This would accomplish what we want, as well as giving a hook for more user functionality. CASSANDRA-1311 > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani >Assignee: Jason Rutherglen > Labels: secondary_index > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index; this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly; the big win is that once this is done we can perform > complex queries within a column, like wildcard searches. > The downside of this approach is we will need to read before write, since > documents in Lucene are written as complete documents. For random workloads > with lots of indexed columns this means we need to read the document from > the index, update it and write it back. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
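The read-before-write cycle the ticket description warns about can be sketched in plain Java (this is an illustrative model only, not Cassandra's or Lucene's actual implementation; the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the per-CF Lucene index described in the ticket:
// each row key maps to one "document" of field -> value. Because Lucene
// documents are written whole, updating a single indexed column means
// reading the full document, merging, and rewriting it -- the
// read-before-write cost the description mentions.
public class DocumentUpdateSketch {
    private final Map<String, Map<String, String>> index = new HashMap<>();

    public void applyColumnUpdate(String rowKey, String column, String value) {
        // 1. read the existing document (empty if the row is new)
        Map<String, String> doc = new HashMap<>(index.getOrDefault(rowKey, Map.of()));
        // 2. update just the changed column
        doc.put(column, value);
        // 3. write the complete document back
        index.put(rowKey, doc);
    }

    public Map<String, String> document(String rowKey) {
        return index.getOrDefault(rowKey, Map.of());
    }
}
```

For random workloads this read-modify-write happens on every mutation of an indexed column, which is why the description flags it as the main downside.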
[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092606#comment-13092606 ] Todd Nine edited comment on CASSANDRA-2915 at 8/29/11 4:30 AM: --- I don't necessarily think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Has anyone looked at existing code in ElasticSearch to avoid some of the pitfalls they have already experienced in building something similar? http://www.elasticsearch.org/ Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? was (Author: tnine): I don't necessaryly think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. 
However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Has anyone looked at existing code in ElasticSearch to avoid some of the pitfalls they have already experienced in building something similar? http://www.elasticsearch.org/ Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani >Assignee: Jason Rutherglen > Labels: secondary_index > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using the Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. 
> Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly, the big win in once this is done we can perform > complex queries within a column like wildcard searches. > The downside of this approach is we will need to read before write since > documents in Lucene are written as complete documents. For random workloads > with lot's of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
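The 1-to-N column-to-field mapping argued for in the comment above (index an email column both as an exact lowercased term and as tokens for partial matching) can be sketched as follows; all names here are illustrative, not part of any Cassandra or Lucene API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch of mapping one column to N document fields: the same "email"
// value is indexed twice, once untokenized for equality lookups and once
// tokenized so support staff can do partial name matching.
public class ColumnFieldMapper {
    public static Map<String, List<String>> fieldsFor(String column, String value) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        String lower = value.toLowerCase(Locale.ROOT);
        // exact field: a single lowercased term
        fields.put(column + ".exact", List.of(lower));
        // tokenized field: split on non-alphanumeric characters
        fields.put(column + ".tokens", List.of(lower.split("[^a-z0-9]+")));
        return fields;
    }
}
```

In real Lucene terms the ".exact" field would be an untokenized keyword field and ".tokens" would go through an analyzer, but the 1-to-N shape is the same.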
[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092606#comment-13092606 ] Todd Nine edited comment on CASSANDRA-2915 at 8/29/11 4:29 AM: --- I don't necessarily think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Has anyone looked at existing code in ElasticSearch to avoid some of the pitfalls they have already experienced in building something similar? http://www.elasticsearch.org/ Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? was (Author: tnine): I don't necessarily think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. 
However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani >Assignee: Jason Rutherglen > Labels: secondary_index > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using the Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. 
> We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly, the big win in once this is done we can perform > complex queries within a column like wildcard searches. > The downside of this approach is we will need to read before write since > documents in Lucene are written as complete documents. For random workloads > with lot's of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092606#comment-13092606 ] Todd Nine commented on CASSANDRA-2915: -- I don't necessarily think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? 
> Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani >Assignee: Jason Rutherglen > Labels: secondary_index > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using the Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly, the big win in once this is done we can perform > complex queries within a column like wildcard searches. > The downside of this approach is we will need to read before write since > documents in Lucene are written as complete documents. For random workloads > with lot's of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1598) Add Boolean Expression to secondary querying
[ https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1598: - Issue Type: New Feature (was: Sub-task) Parent: (was: CASSANDRA-2915) > Add Boolean Expression to secondary querying > > > Key: CASSANDRA-1598 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1598 > Project: Cassandra > Issue Type: New Feature > Components: API >Affects Versions: 0.7 beta 3 >Reporter: Todd Nine > > Add boolean operators similar to Lucene style searches. Currently there is > implicit support for the && operator. It would be helpful to also add > support for ||/Union operators. I would envision this as the client would be > required to construct the expression tree and pass it via the thrift > interface. > BooleanExpression --> BooleanOrIndexExpression > --> BooleanOperator > --> BooleanOrIndexExpression > I'd like to take a crack at this since it will greatly improve my Datanucleus > plugin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
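The expression tree outlined in CASSANDRA-1598 (BooleanExpression composed of index clauses and operators) can be sketched as a small evaluator over sets of matching row keys. This is a hedged illustration of the proposed shape, not Cassandra's Thrift API; the factory-method names are hypothetical:

```java
import java.util.Set;
import java.util.TreeSet;

// Minimal sketch of the proposed expression tree: leaves hold the row
// keys matched by one index clause; inner nodes combine children with
// AND (set intersection, the currently implicit operator) or OR (set
// union, the requested addition).
public abstract class BooleanExpression {
    public abstract Set<String> evaluate();

    public static BooleanExpression index(Set<String> matches) {
        return new BooleanExpression() {
            public Set<String> evaluate() { return new TreeSet<>(matches); }
        };
    }

    public static BooleanExpression and(BooleanExpression l, BooleanExpression r) {
        return new BooleanExpression() {
            public Set<String> evaluate() {
                Set<String> result = l.evaluate();
                result.retainAll(r.evaluate()); // intersection
                return result;
            }
        };
    }

    public static BooleanExpression or(BooleanExpression l, BooleanExpression r) {
        return new BooleanExpression() {
            public Set<String> evaluate() {
                Set<String> result = l.evaluate();
                result.addAll(r.evaluate()); // union
                return result;
            }
        };
    }
}
```

In the proposal the client would build such a tree and ship it over Thrift; the server would evaluate leaves against its secondary indexes rather than against in-memory sets.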
[jira] [Updated] (CASSANDRA-1599) Add sort/order support for secondary indexing
[ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1599: - Issue Type: New Feature (was: Sub-task) Parent: (was: CASSANDRA-2915) > Add sort/order support for secondary indexing > - > > Key: CASSANDRA-1599 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1599 > Project: Cassandra > Issue Type: New Feature > Components: API >Reporter: Todd Nine >Assignee: Jonathan Ellis > Original Estimate: 32h > Remaining Estimate: 32h > > For a lot of users paging is a standard use case on many web applications. > It would be nice to allow paging as part of a Boolean Expression. > Page -> start index >-> end index >-> page timestamp >-> Sort Order > When sorting, is it possible to sort both ASC and DESC? > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
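The paging spec requested in CASSANDRA-1599 (start index, end index, sort order, including DESC) can be sketched as a client-side slice over a sorted result; this is purely illustrative and not a server-side implementation:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the Page spec from the ticket: sort the matched keys,
// optionally descending, then cut the [startIndex, endIndex) window.
public class PageSketch {
    public static List<String> page(List<String> keys, int start, int end, boolean descending) {
        Comparator<String> order = descending
                ? Comparator.<String>naturalOrder().reversed()
                : Comparator.naturalOrder();
        List<String> sorted = keys.stream().sorted(order).collect(Collectors.toList());
        return sorted.subList(start, Math.min(end, sorted.size()));
    }
}
```

Answering the ticket's closing question in this model: yes, ASC and DESC are both just a choice of comparator, though doing this server-side efficiently is the hard part.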
[jira] [Updated] (CASSANDRA-1598) Add Boolean Expression to secondary querying
[ https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1598: - Issue Type: Sub-task (was: New Feature) Parent: CASSANDRA-2915 > Add Boolean Expression to secondary querying > > > Key: CASSANDRA-1598 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1598 > Project: Cassandra > Issue Type: Sub-task > Components: API >Affects Versions: 0.7 beta 3 >Reporter: Todd Nine > > Add boolean operators similar to Lucene style searches. Currently there is > implicit support for the && operator. It would be helpful to also add > support for ||/Union operators. I would envision this as the client would be > required to construct the expression tree and pass it via the thrift > interface. > BooleanExpression --> BooleanOrIndexExpression > --> BooleanOperator > --> BooleanOrIndexExpression > I'd like to take a crack at this since it will greatly improve my Datanucleus > plugin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1599) Add sort/order support for secondary indexing
[ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1599: - Issue Type: Sub-task (was: New Feature) Parent: CASSANDRA-2915 > Add sort/order support for secondary indexing > - > > Key: CASSANDRA-1599 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1599 > Project: Cassandra > Issue Type: Sub-task > Components: API >Reporter: Todd Nine >Assignee: Jonathan Ellis > Original Estimate: 32h > Remaining Estimate: 32h > > For a lot of users paging is a standard use case on many web applications. > It would be nice to allow paging as part of a Boolean Expression. > Page -> start index >-> end index >-> page timestamp >-> Sort Order > When sorting, is it possible to sort both ASC and DESC? > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079741#comment-13079741 ] Todd Nine commented on CASSANDRA-2915: -- A couple of questions. 1. Will read after write be available? I.e., if your mutation for the row key returns to the client, then the row now has an entry in the Lucene index, which can immediately be queried to return the results. 2. What about durability? In the event Cassandra crashes, will the Lucene index retain these indexed values, or will they be lost if commit is not invoked on the index? > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani > Labels: secondary_index > Fix For: 1.0 > > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using the Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly, the big win in once this is done we can perform > complex queries within a column like wildcard searches. 
> The downside of this approach is we will need to read before write since > documents in Lucene are written as complete documents. For random workloads > with lot's of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079695#comment-13079695 ] Todd Nine commented on CASSANDRA-2915: -- I'm quite keen to contribute on this issue, as this will greatly enhance the functionality of the hector-jpa project. If I can contribute any work, please let me know. > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani > Labels: secondary_index > Fix For: 1.0 > > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using the Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly, the big win in once this is done we can perform > complex queries within a column like wildcard searches. > The downside of this approach is we will need to read before write since > documents in Lucene are written as complete documents. For random workloads > with lot's of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079653#comment-13079653 ] Todd Nine edited comment on CASSANDRA-2915 at 8/4/11 10:33 PM: --- Hey guys. We're doing something similar in the hector JPA plugin. Would using dynamic composites within cassandra alleviate the need for Lucene documents? We're using this in secondary indexing and it gives us order by semantics and AND (Union). The largest issue becomes iteration with OR clauses, AND clauses can be compressed into a single column for efficient range scans, we then use iterators to UNION the OR trees together with order clauses in the composites. The caveat is that the user must define indexes with order semantics up front. However this can easily be added to the existing secondary indexing clauses. was (Author: tnine): Hey guys. We're doing something similar in the hector JPA plugin. Would using dynamic composites within cassandra alleviate the need for Lucene documents? We're using this in secondary indexing and it gives us order by semantics and AND (Union). The largest issue becomes iteration with OR clauses, AND clauses can be compressed into a single column for efficient range scans, we then use iterators to UNION the OR trees together with order clauses in the composites. 
> Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani > Labels: secondary_index > Fix For: 1.0 > > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using the Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly, the big win in once this is done we can perform > complex queries within a column like wildcard searches. > The downside of this approach is we will need to read before write since > documents in Lucene are written as complete documents. For random workloads > with lot's of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079653#comment-13079653 ] Todd Nine commented on CASSANDRA-2915: -- Hey guys. We're doing something similar in the hector JPA plugin. Would using dynamic composites within cassandra alleviate the need for Lucene documents? We're using this in secondary indexing and it gives us order by semantics and AND (Union). The largest issue becomes iteration with OR clauses, AND clauses can be compressed into a single column for efficient range scans, we then use iterators to UNION the OR trees together with order clauses in the composites. > Lucene based Secondary Indexes > -- > > Key: CASSANDRA-2915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: T Jake Luciani > Labels: secondary_index > Fix For: 1.0 > > > Secondary indexes (of type KEYS) suffer from a number of limitations in their > current form: >- Multiple IndexClauses only work when there is a subset of rows under the > highest clause >- One new column family is created per index this means 10 new CFs for 10 > secondary indexes > This ticket will use the Lucene library to implement secondary indexes as one > index per CF, and utilize the Lucene query engine to handle multiple index > clauses. Also, by using the Lucene we get a highly optimized file format. > There are a few parallels we can draw between Cassandra and Lucene. > Lucene indexes segments in memory then flushes them to disk so we can sync > our memtable flushes to lucene flushes. Lucene also has optimize() which > correlates to our compaction process, so these can be sync'd as well. > We will also need to correlate column validators to Lucene tokenizers, so the > data can be stored properly, the big win in once this is done we can perform > complex queries within a column like wildcard searches. 
> The downside of this approach is we will need to read before write since > documents in Lucene are written as complete documents. For random workloads > with lot's of indexed columns this means we need to read the document from > the index, update it and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
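The "compress AND clauses into a single column for efficient range scans" idea from the comment above can be sketched with a sorted map standing in for a column family's columns. The string encoding here is purely illustrative; real dynamic composites use a binary, type-aware encoding:

```java
import java.util.List;
import java.util.TreeMap;

// Composite column names sort component-by-component, so an equality on
// a leading component becomes one contiguous slice of the row. TreeMap
// stands in for a CF's sorted columns; names are "status:unitId".
public class CompositeRangeSketch {
    public static List<String> scanStatus(TreeMap<String, String> columns, String status) {
        // status = X becomes a single range scan over ["X:", "X;"),
        // since ';' is the character immediately after ':'
        return List.copyOf(columns.subMap(status + ":", status + ";").values());
    }
}
```

OR clauses do not compress this way, which is why the comment falls back to iterators that union the results of separate range scans.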
[jira] [Commented] (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal
[ https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015273#comment-13015273 ] Todd Nine commented on CASSANDRA-2231: -- Hi Sylvain, I seem to have encountered a bug in the comparator. I'm using the composite to perform Cassandra-based intersections of fields during queries. For instance, say a user defines an index as this: status + unitId. The write would always contain values of +0++0 when using the -1, 0 and 1 fields. If the user enters a query such as this: status > 100 && status < 300 && unitId = 10, I would need to construct a column scan of the following to get correct result sets. start => 100+1+10+0 end => 300+0+10+1 However, in the validate function I'm receiving the error "Invalid bytes remaining after an end-of-component at component". This seems incorrect to me. We're ultimately attempting to apply any equality operand and transform it to a range scan for the given field in the composite. This means that -1 or 1 could appear after any component in the composite, not just the last one. Can you please add this functionality/remove this verification check? Thanks, Todd > Add CompositeType comparer to the comparers provided in > org.apache.cassandra.db.marshal > --- > > Key: CASSANDRA-2231 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2231 > Project: Cassandra > Issue Type: Improvement > Components: Contrib >Affects Versions: 0.7.3 >Reporter: Ed Anuff >Assignee: Sylvain Lebresne >Priority: Minor > Fix For: 0.7.5 > > Attachments: CompositeType-and-DynamicCompositeType.patch, > edanuff-CassandraCompositeType-1e253c4.zip > > > CompositeType is a custom comparer that makes it possible to create > comparable composite values out of the basic types that Cassandra currently > supports, such as Long, UUID, etc. 
This is very useful in both the creation > of custom inverted indexes using columns in a skinny row, where each column > name is a composite value, and also when using Cassandra's built-in secondary > index support, where it can be used to encode the values in the columns that > Cassandra indexes. One scenario for the usage of these is documented here: > http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html. Source for > contribution is attached and has been previously maintained on github here: > https://github.com/edanuff/CassandraCompositeType -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
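The end-of-component mechanics behind the "100+1+10+0 .. 300+0+10+1" bounds in the comment above can be modeled in a few lines. This is a simplified model of the comparison rule (compare by value, then by eoc flag), not Cassandra's binary CompositeType encoding:

```java
import java.util.List;

// Each component of a slice bound carries an end-of-component flag:
// -1 sorts before every column sharing that prefix, 0 matches exactly,
// +1 sorts after. Comparing value first and eoc second reproduces the
// strict/inclusive bound behavior the comment relies on.
public class CompositeBoundSketch {
    public record Component(long value, int eoc) {}

    public static int compare(List<Component> a, List<Component> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int byValue = Long.compare(a.get(i).value(), b.get(i).value());
            if (byValue != 0) return byValue;
            int byEoc = Integer.compare(a.get(i).eoc(), b.get(i).eoc());
            if (byEoc != 0) return byEoc;
        }
        return Integer.compare(a.size(), b.size());
    }
}
```

Under this rule the start bound (100,+1)(10,0) sorts after every column with status = 100, giving the strict "status > 100" semantics the comment wants, while the validation error it reports comes from rejecting non-zero eoc flags before the last component.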
[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal
[ https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008302#comment-13008302 ] Todd Nine commented on CASSANDRA-2231: -- Since this is primarily needed for ordering well-defined queries and collections, can we add a bitset to the composite type to represent sort ordering? A lot of queries need the semantics of "order by logindate desc, firstname asc, lastname asc". This would give us the ability to set the descending flag on any one of the composite types, allowing us to always have a correctly ordered result set stored in the column name. In this case I would simply set bit 0 to signal to the composite it needs to order the first field descending instead of the standard ascending. > Add CompositeType comparer to the comparers provided in > org.apache.cassandra.db.marshal > --- > > Key: CASSANDRA-2231 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2231 > Project: Cassandra > Issue Type: Improvement > Components: Contrib >Affects Versions: 0.7.3 >Reporter: Ed Anuff >Priority: Minor > Attachments: 0001-Add-compositeType-and-DynamicCompositeType.patch, > 0001-Add-compositeType.patch, edanuff-CassandraCompositeType-1e253c4.zip > > > CompositeType is a custom comparer that makes it possible to create > comparable composite values out of the basic types that Cassandra currently > supports, such as Long, UUID, etc. This is very useful in both the creation > of custom inverted indexes using columns in a skinny row, where each column > name is a composite value, and also when using Cassandra's built-in secondary > index support, where it can be used to encode the values in the columns that > Cassandra indexes. One scenario for the usage of these is documented here: > http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html. 
Source for > contribution is attached and has been previously maintained on github here: > https://github.com/edanuff/CassandraCompositeType -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
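The per-component descending flag proposed in the comment above ("order by logindate desc, firstname asc, lastname asc") amounts to a comparator that flips the sign of flagged components. A minimal sketch, with illustrative names:

```java
import java.util.Comparator;

// One boolean per composite position: true flips that component's
// comparison, so a single sorted column family can hold results in
// mixed ASC/DESC order, e.g. {true, false} = logindate desc, name asc.
public class OrderedCompositeSketch {
    public static Comparator<String[]> comparator(boolean[] descending) {
        return (a, b) -> {
            for (int i = 0; i < a.length; i++) {
                int cmp = a[i].compareTo(b[i]);
                if (cmp != 0) return descending[i] ? -cmp : cmp;
            }
            return 0;
        };
    }
}
```

Packing these booleans into a bitset carried alongside the composite type definition, as the comment suggests, would let the comparator be reconstructed on every node from the CF metadata.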
[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal
[ https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001770#comment-13001770 ] Todd Nine commented on CASSANDRA-2231: -- As per Ed's previous comment, I currently use over 30 different indexing schemes on our data in my JDO code. The ultimate goal for this feature is to support the implementation of a JPA framework that works similarly to GAE. Having the ability to build the indexes a user specifies without dynamically creating CFs is a must-have for us. There are a lot of issues surrounding the complexity of building the index itself in the plugin that are outside the scope of this issue. However, we don't really have a comparator mechanism to support these types of indexes. In all use cases we defined, our searches and therefore our indexes need an order by clause as well as query criteria to support the paging that most applications will require. This order could simply be a natural ordering of entity keys, or it could be on specific properties of the related entity. As applications grow in size, so will the complexity and number of indexes to support them; I'm concerned that creating this many CFs could cause serious issues. This doesn't have to be a well-advertised feature for the end user to create CFs with, but I feel very strongly that a dynamic type for CFs is a must in order to proceed with the JPA plugin we've been designing. 
[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal
[ https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001656#comment-13001656 ] Todd Nine commented on CASSANDRA-2231: -- I do see your point. However, as a user and a developer of a JPA plugin, take this use case as an example of our requirements. Entities: User -> Vehicles. A User has a collection of vehicles. These can be ordered in several ways; for this example, let's say by time created and by name. This would lead to several different column comparators. The properties on the user (first name, last name) would be UTF8 columns; the collections would be composite. I would end up with columns of the following definitions to allow us to quickly load or order them. UTF8(firstname):value UTF8(lastname):value UTF8(vehicletime) LONG(vehicleTime) TIMEUUID(VEHICLEID) UTF8(VehicleProp) : value UTF8(vehiclename) UTF8(vehicleName) TIMEUUID(VEHICLEID) UTF8(VehicleProp) : value In the event we move this index to an external CF, we lose the serialization scope and the guaranteed atomic write of using a single row key during a mutation. While this is a separate issue on Cassandra in general, we do need the ability to work around it until serialization scope can extend over multiple rows. Without this, to ensure that "lost writes" don't occur on indexing, a user would need to introduce an extra system such as Zookeeper. 
https://issues.apache.org/jira/browse/CASSANDRA-1684
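One plausible byte layout for composite column names like the ones listed above, close in spirit to the attached CompositeType contribution but illustrative rather than its actual code, is a 2-byte length prefix per component followed by the raw value and an end-of-component byte:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompositeName {
    // Pack components as: 2-byte length, raw value bytes, 1 end-of-component
    // byte. The trailing marker keeps byte-wise comparison aligned on
    // component boundaries, so a short component never bleeds into a longer
    // one that shares a prefix.
    static byte[] pack(byte[]... components) {
        int size = 0;
        for (byte[] c : components) size += 2 + c.length + 1;
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] c : components) {
            out.putShort((short) c.length);
            out.put(c);
            out.put((byte) 0); // end-of-component marker
        }
        return out.array();
    }

    public static void main(String[] args) {
        // e.g. a "vehiclename" index entry from the comment above.
        byte[] name = pack("vehiclename".getBytes(StandardCharsets.UTF_8),
                           "Truck".getBytes(StandardCharsets.UTF_8));
        System.out.println(name.length); // 22 = (2+11+1) + (2+5+1)
    }
}
```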
[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal
[ https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001615#comment-13001615 ] Todd Nine commented on CASSANDRA-2231: -- Enforcing all columns to be the same would break our indexing. Each row key is a different index, and the columns within that index are composed of different composite types. If this was enforced at the CF level, we would require a different CF for each index. Is it possible to allow both static and dynamic types by creating 2 composite index types?
[jira] Issue Comment Edited: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal
[ https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001615#comment-13001615 ] Todd Nine edited comment on CASSANDRA-2231 at 3/2/11 8:42 PM: -- Enforcing all columns to be the same would break our indexing. Each row key is a different index, and the columns within that index are composed of different composite types. If this was enforced at the CF level, we would require a different CF for each index. Is it possible to allow both static and dynamic types by creating 2 composite index types? I.e. StaticComposite using your patch and DynamicComposite using Ed's?
[jira] Updated: (CASSANDRA-2179) Secondary indexing of columns with duplicate values does not return all row keys
[ https://issues.apache.org/jira/browse/CASSANDRA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-2179: - Attachment: test.patch Please ignore previous patch, was incorrect version. > Secondary indexing of columns with duplicate values does not return all row > keys > > > Key: CASSANDRA-2179 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2179 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 0.7.1 > Environment: Java 1.6 64 bit Ubuntu >Reporter: Todd Nine > Attachments: test.patch, test.patch > > > Create a CF test with a column "value" and "holder". Create an index on the > Value column of type UTF8 and a Keys index. Insert the following values into > the cf > new UUID():{ value: "test", holder: 0x00} > new UUID():{ value: "test", holder: 0x00} > new UUID():{ value: "test", holder: 0x00} > Query the secondary index where value EQ "test" and select column "holder" > You should be returned 3 rows. Instead you are returned one. It seems that > the last row written with the column value is the only row that is returned > when the column value contains duplicates. I'll attempt to create a python > client test that demonstrates the issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (CASSANDRA-2179) Secondary indexing of columns with duplicate values does not return all row keys
[ https://issues.apache.org/jira/browse/CASSANDRA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-2179: - Attachment: test.patch Patch for test case
[jira] Created: (CASSANDRA-2179) Secondary indexing of columns with duplicate values does not return all row keys
Secondary indexing of columns with duplicate values does not return all row keys Key: CASSANDRA-2179 URL: https://issues.apache.org/jira/browse/CASSANDRA-2179 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.1 Environment: Java 1.6 64 bit Ubuntu Reporter: Todd Nine Create a CF test with a column "value" and "holder". Create an index on the Value column of type UTF8 and a Keys index. Insert the following values into the cf new UUID():{ value: "test", holder: 0x00} new UUID():{ value: "test", holder: 0x00} new UUID():{ value: "test", holder: 0x00} Query the secondary index where value EQ "test" and select column "holder" You should be returned 3 rows. Instead you are returned one. It seems that the last row written with the column value is the only row that is returned when the column value contains duplicates. I'll attempt to create a python client test that demonstrates the issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
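A conceptual illustration of the reported behavior (not Cassandra's actual index code): if a keys index effectively maps an indexed value to a single row key, each duplicate insert overwrites the previous entry and a query returns one row, whereas a value-to-set-of-keys mapping returns all three:

```java
import java.util.*;

public class KeysIndexDemo {
    // Broken shape: value -> single row key; duplicate values clobber
    // earlier row keys, so only the last-written row is found.
    static Map<String, String> lastWriteWins(List<String> rowKeys) {
        Map<String, String> idx = new HashMap<>();
        for (String k : rowKeys) idx.put("test", k);
        return idx;
    }

    // Fixed shape: value -> set of row keys; duplicates accumulate.
    static Map<String, Set<String>> multiValued(List<String> rowKeys) {
        Map<String, Set<String>> idx = new HashMap<>();
        for (String k : rowKeys) idx.computeIfAbsent("test", v -> new TreeSet<>()).add(k);
        return idx;
    }

    public static void main(String[] args) {
        List<String> rowKeys = List.of("uuid-1", "uuid-2", "uuid-3");
        System.out.println("value->key finds row: " + lastWriteWins(rowKeys).get("test"));
        System.out.println("value->keys finds " + multiValued(rowKeys).get("test").size() + " rows");
    }
}
```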
[jira] Updated: (CASSANDRA-1646) Using QUORUM and replication factor of 1 now causes a timeout exception
[ https://issues.apache.org/jira/browse/CASSANDRA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1646: - Attachment: test_quorum.patch Patch file to update function tests. > Using QUORUM and replication factor of 1 now causes a timeout exception > --- > > Key: CASSANDRA-1646 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1646 > Project: Cassandra > Issue Type: Bug > Components: API >Affects Versions: 0.7 beta 2 >Reporter: Todd Nine > Fix For: 0.7.0 > > Attachments: test_quorum.patch > > > See the attached patch to the python thrift tests. On the source from > 2010-10-14 20:00 UTC this passed. From the latest HEAD as of today this > fails. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CASSANDRA-1646) Using QUORUM and replication factor of 1 now causes a timeout exception
Using QUORUM and replication factor of 1 now causes a timeout exception --- Key: CASSANDRA-1646 URL: https://issues.apache.org/jira/browse/CASSANDRA-1646 Project: Cassandra Issue Type: Bug Components: API Affects Versions: 0.7 beta 2 Reporter: Todd Nine Fix For: 0.7.0 See the attached patch to the python thrift tests. On the source from 2010-10-14 20:00 UTC this passed. From the latest HEAD as of today this fails. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing
[ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919687#action_12919687 ] Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM: Consider a query similar to the following: email == 'b...@gmail.com' && (lastlogindate > today - 5 days || newmessagedate > today - 1 day). Which start key do I advance, one or both? As a client I would have to iterate over every field in the expression tree to determine what my start key should be for two index clauses. While this is not impossible, it becomes very complex for large boolean operand trees. As a user, this functionality would provide a clean interface that abstracts the user from the need to perform an analysis of the previous result set and "diff" it with the expression tree provided. I'm not saying it's an absolute must-have, but it would certainly provide a lot of appeal to users that are utilizing Cassandra as an eventually consistent storage mechanism for web-based applications once unions and intersections are implemented in Cassandra. 
> Add paging support for secondary indexing > - > > Key: CASSANDRA-1599 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1599 > Project: Cassandra > Issue Type: New Feature >Reporter: Todd Nine > Fix For: 0.7.0 > > > For a lot of users paging is a standard use case on many web applications. > It would be nice to allow paging as part of a Boolean Expression. > Page -> start index >-> end index >-> page timestamp >-> Sort Order > When sorting, is it possible to sort both ASC and DESC? > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing
[ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919687#action_12919687 ] Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM: Consider a query similar to the following: email == 'b...@gmail.com' && (lastlogindate > today - 5 days || newmessagedate > today - 1 day). Which start key do I advance, one or both? As a client I would have to iterate over every field in the expression tree to determine what my start key should be for two index clauses. While this is not impossible, it becomes very complex for large boolean operand trees. As a user, this functionality would provide a clean interface that abstracts the user from the need to perform an analysis of the previous result set and "diff" it with the expression tree provided. I'm not saying it's an absolute must-have, but it would certainly provide a lot of appeal to users that are utilizing Cassandra as an eventually consistent storage mechanism for web-based applications once unions and intersections are implemented server side. 
[jira] Created: (CASSANDRA-1599) Add paging support for secondary indexing
Add paging support for secondary indexing - Key: CASSANDRA-1599 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599 Project: Cassandra Issue Type: New Feature Reporter: Todd Nine Fix For: 0.7.0 For a lot of users paging is a standard use case on many web applications. It would be nice to allow paging as part of a Boolean Expression. Page -> start index -> end index -> page timestamp -> Sort Order When sorting, is it possible to sort both ASC and DESC? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
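The page parameters requested above (start index, end index, sort order) could be sketched roughly as follows; the names and the List-based scan are illustrative only, not a proposed thrift API:

```java
import java.util.*;

public class IndexPage {
    // Slice [start, end) out of an already-sorted index result set,
    // optionally reversed for DESC ordering; bounds are clamped.
    static List<String> page(List<String> sorted, int start, int end, boolean desc) {
        List<String> rows = new ArrayList<>(sorted);
        if (desc) Collections.reverse(rows);
        int hi = Math.min(end, rows.size());
        int lo = Math.min(start, hi);
        return rows.subList(lo, hi);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e");
        System.out.println(page(rows, 1, 3, false)); // [b, c]
        System.out.println(page(rows, 0, 2, true));  // [e, d]
    }
}
```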
[jira] Created: (CASSANDRA-1598) Add Boolean Expression to secondary querying
Add Boolean Expression to secondary querying Key: CASSANDRA-1598 URL: https://issues.apache.org/jira/browse/CASSANDRA-1598 Project: Cassandra Issue Type: New Feature Components: Core Affects Versions: 0.7.0 Reporter: Todd Nine Add boolean operators similar to Lucene-style searches. Currently there is implicit support for the && operator. It would be helpful to also add support for the ||/union operator. I would envision that the client would be required to construct the expression tree and pass it via the thrift interface. BooleanExpression --> BooleanOrIndexExpression --> BooleanOperator --> BooleanOrIndexExpression I'd like to take a crack at this since it will greatly improve my Datanucleus plugin. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
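The client-constructed expression tree described above could be sketched as follows; the Node interface and eval method are hypothetical, purely to show the shape of the tree, and are not a proposed thrift schema:

```java
import java.util.Map;

public class BoolExpr {
    // A leaf tests one indexed column; And/Or nodes combine subtrees,
    // generalizing today's implicit && support with a ||/union operator.
    interface Node { boolean eval(Map<String, String> row); }

    static Node eq(String col, String value) {
        return row -> value.equals(row.get(col));
    }
    static Node and(Node l, Node r) { return row -> l.eval(row) && r.eval(row); }
    static Node or(Node l, Node r)  { return row -> l.eval(row) || r.eval(row); }

    public static void main(String[] args) {
        // state == "active" && (plan == "gold" || plan == "trial")
        Node q = and(eq("state", "active"),
                     or(eq("plan", "gold"), eq("plan", "trial")));
        System.out.println(q.eval(Map.of("state", "active", "plan", "trial"))); // true
        System.out.println(q.eval(Map.of("state", "active", "plan", "free")));  // false
    }
}
```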
[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886429#action_12886429 ] Todd Nine commented on CASSANDRA-1235: -- While I'm in agreement with Uwe, my bigger concern is that two tests that are functionally equivalent return different results depending on the mutation operation. Performing a batch mutate with the same insertion data as a single write should insert the same bytes. Unfortunately, batch mutate appears to be randomly dropping bytes. If it were a true UTF-8 issue, wouldn't it drop bytes on the single column writes as well as on batch mutate? > BytesType and batch mutate causes encoded bytes of non-printable characters > to be dropped > - > > Key: CASSANDRA-1235 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1235 > Project: Cassandra > Issue Type: Bug >Affects Versions: 0.6 > Environment: Java 1.6 sun JDK > Java(TM) SE Runtime Environment (build 1.6.0_20-b02) > Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, > Ubuntu 10.04 64 bit >Reporter: Todd Nine >Priority: Critical > Fix For: 0.6.4 > > Attachments: TestEncodedKeys.java > > > When running the two tests, individual column insert works with the values > generated. However, batch insert with the same values causes an encoding > failure on the key. It appears bytes are dropped from the end of the byte > array that represents the key value. See the attached unit test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883261#action_12883261 ] Todd Nine edited comment on CASSANDRA-1235 at 6/28/10 3:42 PM: --- No worries, sorry about that; I just realized the affected version was incorrect. Where can I look to begin fixing this? Unfortunately this issue has brought our development to a halt, since we depend on the functionality of numeric range queries in Lucene/Lucandra. Ideally I'd like to create a patch that applies to 0.6.2 so we can roll our own build with the patch and get running again. I'm assuming it's an issue with the thrift server, but I don't want to start tweaking things without a good idea of where I should be looking for this issue. Here's an example in hex. The left is what I pass as bytes in UTF-8 for the key; the right is what I get back during get_range_slice. http://pastebin.com/KM8Ze794 
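As an illustration of the suspected failure mode (the general hazard, not the actual thrift-server code path): any step that decodes raw BytesType key bytes as UTF-8 and re-encodes them is lossy for byte sequences that are not valid UTF-8, since invalid sequences are replaced rather than preserved:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    // Decode raw bytes as UTF-8, then re-encode. Invalid sequences become
    // U+FFFD replacement characters, so the output bytes differ from the
    // input whenever the key is not valid UTF-8.
    static byte[] roundTrip(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8)
                   .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = {(byte) 0xC8, 0x01, (byte) 0x80, 0x41}; // not valid UTF-8
        System.out.println(Arrays.equals(key, roundTrip(key))); // false: key was corrupted
    }
}
```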
[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1235: - Priority: Blocker (was: Critical)
[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1235: - Affects Version/s: 0.6.2 (was: 0.6)
[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated CASSANDRA-1235:
---------------------------------
    Fix Version/s: (was: 0.6.4)
[jira] Created: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
-----------------------------------------------------------------------------------------

                 Key: CASSANDRA-1235
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.6.2
         Environment: Java 1.6 Sun JDK
                      Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
                      Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01)
                      Ubuntu 10.04 64-bit
            Reporter: Todd Nine
            Priority: Blocker

When running the two tests, individual column insert works with the values generated. However, batch insert with the same values causes an encoding failure on the key. It appears bytes are dropped from the end of the byte array that represents the key value. See the attached unit test.
[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
[ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated CASSANDRA-1235:
---------------------------------
    Attachment: TestEncodedKeys.java

This file demonstrates the broken input. Notice that the first test passes with clean input; the second one fails when using batch write for the same input keys.
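The attached TestEncodedKeys.java is not reproduced here, but the failure described (bytes of non-printable characters dropped from keys on the batch path) is the classic symptom of arbitrary key bytes being passed through a String encode/decode step. Whether that is the actual cause in the 0.6 batch_mutate path is an assumption; the following self-contained, pure-JDK sketch only demonstrates why such a round trip is lossy:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyKeyRoundTrip {
    public static void main(String[] args) {
        // A key containing non-printable bytes that are not valid UTF-8
        // (hypothetical example value, not taken from the attached test).
        byte[] key = new byte[] { 0x00, (byte) 0xC0, (byte) 0x80, 0x7F, (byte) 0xFF };

        // Decoding arbitrary bytes as UTF-8 replaces invalid sequences with U+FFFD...
        String asString = new String(key, StandardCharsets.UTF_8);
        // ...so re-encoding cannot recover the original bytes.
        byte[] roundTripped = asString.getBytes(StandardCharsets.UTF_8);

        System.out.println("original:      " + Arrays.toString(key));
        System.out.println("round-tripped: " + Arrays.toString(roundTripped));
        System.out.println("lossless? " + Arrays.equals(key, roundTripped)); // prints false
    }
}
```

Any code path that treats BytesType keys as Strings will corrupt exactly this class of input while leaving printable ASCII keys untouched, which matches the clean-input test passing and the batch test failing.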
[jira] Created: (CASSANDRA-1206) Create the functional equivalent of the != op
Create the functional equivalent of the != op
---------------------------------------------

                 Key: CASSANDRA-1206
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1206
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Todd Nine

Currently, the <, <=, ==, >, and >= operators can be expressed using KeyRanges. However, it is not possible to execute a query that returns all keys except key K with a single call. Please implement this so clients can avoid making two calls, one with a KeyRange less than and one with a KeyRange greater than the given value.
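The two-call workaround the ticket wants to avoid can be sketched over any sorted key space. Here a TreeMap stands in for an ordered row-key space; the class and the allKeysExcept helper are hypothetical illustrations, not part of the Thrift API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class NotEqualWorkaround {
    // Emulates the two-range workaround: "all keys != k" becomes
    // one scan of keys < k plus one scan of keys > k.
    static List<String> allKeysExcept(TreeMap<String, String> rows, String k) {
        // headMap(k) covers keys strictly less than k ("KeyRange less than").
        List<String> result = new ArrayList<>(rows.headMap(k).keySet());
        // tailMap(k, false) covers keys strictly greater than k ("KeyRange greater than").
        result.addAll(rows.tailMap(k, false).keySet());
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("a", "1");
        rows.put("b", "2");
        rows.put("c", "3");
        System.out.println(allKeysExcept(rows, "b")); // prints [a, c]
    }
}
```

A native != operator would let the server return this union in a single request instead of forcing the client to issue and merge the two range scans itself.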