[jira] [Commented] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster

2014-11-06 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200435#comment-14200435
 ] 

Todd Nine commented on CASSANDRA-8257:
--

[~brandon.williams] I would love to, where is it?

 Opscenter Agent does not properly download target cassandra cluster
 ---

 Key: CASSANDRA-8257
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8257
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: Opscenter 5.0.1, Cassandra 1.2.19
Reporter: Todd Nine

 Rather than re-post the issue, it is outlined here.
 http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster.
 Note that when omitting the target Cassandra cluster, using the same cluster 
 as the agent works correctly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster

2014-11-06 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200435#comment-14200435
 ] 

Todd Nine edited comment on CASSANDRA-8257 at 11/6/14 4:53 PM:
---

[~brandon.williams] I would love to, where is it?  I'm assuming it's here.

https://datastax.jira.com

But I can't actually get in to create an issue.


was (Author: tnine):
[~brandon.williams] I would love to, where is it?

 Opscenter Agent does not properly download target cassandra cluster
 ---

 Key: CASSANDRA-8257
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8257
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: Opscenter 5.0.1, Cassandra 1.2.19
Reporter: Todd Nine

 Rather than re-post the issue, it is outlined here.
 http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster.
 Note that when omitting the target Cassandra cluster, using the same cluster 
 as the agent works correctly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster

2014-11-06 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200445#comment-14200445
 ] 

Todd Nine commented on CASSANDRA-8257:
--

Already done.  The opscenter tag is on the Stack Overflow question linked in this ticket.

http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster


 Opscenter Agent does not properly download target cassandra cluster
 ---

 Key: CASSANDRA-8257
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8257
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: Opscenter 5.0.1, Cassandra 1.2.19
Reporter: Todd Nine

 Rather than re-post the issue, it is outlined here.
 http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster.
 Note that when omitting the target Cassandra cluster, using the same cluster 
 as the agent works correctly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster

2014-11-05 Thread Todd Nine (JIRA)
Todd Nine created CASSANDRA-8257:


 Summary: Opscenter Agent does not properly download target 
cassandra cluster
 Key: CASSANDRA-8257
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8257
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: Opscenter 5.0, Cassandra 1.2.19
Reporter: Todd Nine


Rather than re-post the issue, it is outlined here.

http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster.

Note that when omitting the target Cassandra cluster, using the same cluster as 
the agent works correctly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8257) Opscenter Agent does not properly download target cassandra cluster

2014-11-05 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-8257:
-
Environment: Opscenter 5.0.1, Cassandra 1.2.19  (was: Opscenter 5.0, 
Cassandra 1.2.19)

 Opscenter Agent does not properly download target cassandra cluster
 ---

 Key: CASSANDRA-8257
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8257
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: Opscenter 5.0.1, Cassandra 1.2.19
Reporter: Todd Nine

 Rather than re-post the issue, it is outlined here.
 http://stackoverflow.com/questions/26722154/opscenter-wont-use-a-separate-cassandra-cluster.
 Note that when omitting the target Cassandra cluster, using the same cluster 
 as the agent works correctly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7855) Genralize use of IN for compound partition keys

2014-09-01 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117519#comment-14117519
 ] 

Todd Nine commented on CASSANDRA-7855:
--

I would argue that the syntax should always have the following format.

{code}
SELECT * FROM foo WHERE (k1, k2) IN ( (0, 1) , (1, 2) )
{code}

Simply because, in the use case provided in my ticket, you know all possible 
combinations of fields used to construct the partition keys.  By grouping them 
together within the parentheses, it is clear to both the user and the grammar 
that all terms within the parens comprise a partition key.  By reading the 
above, it is clear that (0, 1) is a partition key, as is (1, 2).
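
To make the comparison concrete, here is a small CQL sketch combining the table from the ticket description (quoted below) with the form supported at the time and the tuple form proposed in this comment; the tuple form is the proposal, not syntax the grammar accepted then.

{code}
-- table from the ticket description below
CREATE TABLE foo (
    k1 int,
    k2 int,
    v int,
    PRIMARY KEY ((k1, k2))
);

-- supported at the time: IN only on the last partition key column
SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2);

-- proposed tuple form from this comment: each inner tuple is one complete partition key
SELECT * FROM foo WHERE (k1, k2) IN ((0, 1), (1, 2));
{code}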




 Genralize use of IN for compound partition keys
 ---

 Key: CASSANDRA-7855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7855
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Priority: Minor
  Labels: cql
 Fix For: 2.0.11


 When you have a compount partition key, we currently only support to have a 
 {{IN}} on the last column of that partition key. So given:
 {noformat}
 CREATE TABLE foo (
 k1 int,
 k2 int,
 v int,
 PRIMARY KEY ((k1, k2))
 )
 {noformat}
 we allow
 {noformat}
 SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2)
 {noformat}
 but not
 {noformat}
 SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2)
 {noformat}
 There is no particular reason for us not supporting the later (to the best of 
 my knowledge) since it's reasonably straighforward, so we should fix it.
 I'll note that using {{IN}} on a partition key is not necessarily a better 
 idea than parallelizing queries server client side so this syntax, when 
 introduced, should probably be used sparingly, but given we do support IN on 
 partition keys, I see no reason not to extend it to compound PK properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7855) Genralize use of IN for compound partition keys

2014-09-01 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117519#comment-14117519
 ] 

Todd Nine edited comment on CASSANDRA-7855 at 9/1/14 4:13 PM:
--

I would argue that the syntax should always have the following format.

{code}
SELECT * FROM foo WHERE (k1, k2) IN ( (0, 1) , (1, 2) )
{code}

Simply because, in the use case provided in my ticket, you know all possible 
combinations of fields used to construct the partition keys.  By grouping them 
together within the parentheses, it is clear to both the user and the grammar 
that all terms within the parens comprise a partition key.  Visually, it is 
clear that (0, 1) is a partition key, as is (1, 2).





was (Author: tnine):
I would argue that the syntax should always have the following format.

{code}
SELECT * FROM foo WHERE (k1, k2) IN ( (0, 1) , (1, 2) )
{code}

Simply because in the use case provided in my ticket, you know all possible 
combinations of fields to construct partition keys.  By grouping them together 
within the parenthesis, it is clear to both the user and the grammar that all 
terms within the parens comprise a partition key.  By reading the above it is 
clear that (0, 1) is a partition key, as is (1, 2)




 Genralize use of IN for compound partition keys
 ---

 Key: CASSANDRA-7855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7855
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Priority: Minor
  Labels: cql
 Fix For: 2.0.11


 When you have a compount partition key, we currently only support to have a 
 {{IN}} on the last column of that partition key. So given:
 {noformat}
 CREATE TABLE foo (
 k1 int,
 k2 int,
 v int,
 PRIMARY KEY ((k1, k2))
 )
 {noformat}
 we allow
 {noformat}
 SELECT * FROM foo WHERE k1 = 0 AND k2 IN (1, 2)
 {noformat}
 but not
 {noformat}
 SELECT * FROM foo WHERE k1 IN (0, 1) AND k2 IN (1, 2)
 {noformat}
 There is no particular reason for us not supporting the later (to the best of 
 my knowledge) since it's reasonably straighforward, so we should fix it.
 I'll note that using {{IN}} on a partition key is not necessarily a better 
 idea than parallelizing queries server client side so this syntax, when 
 introduced, should probably be used sparingly, but given we do support IN on 
 partition keys, I see no reason not to extend it to compound PK properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-7854) Unable to select partition keys directly using IN keyword (no replacement for multi row multiget in thrift)

2014-08-31 Thread Todd Nine (JIRA)
Todd Nine created CASSANDRA-7854:


 Summary: Unable to select partition keys directly using IN keyword 
(no replacement for multi row multiget in thrift)
 Key: CASSANDRA-7854
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7854
 Project: Cassandra
  Issue Type: Bug
Reporter: Todd Nine


We're converting some old thrift CFs to CQL.  We aren't looking to change the 
underlying physical structure, since this has proven effective in production.  
In order to migrate, we need a full equivalent of the thrift multiget select.  
In thrift, the format was as follows.

(scopeId, scopeType, nodeId, nodeType){ 0x00, timestamp }

Where we have deliberately designed only 1 column per row.  To translate this 
to CQL, I have defined the following table.

{code}
CREATE TABLE Graph_Marked_Nodes ( 
 scopeId uuid,
 scopeType varchar,
 nodeId uuid,
 nodeType varchar,
 timestamp bigint,
 PRIMARY KEY(scopeId, scopeType, nodeId, nodeType)
)
{code}

I then try to select using the IN keyword.

{code}
select timestamp from Graph_Marked_Nodes WHERE (scopeId , scopeType , nodeId , 
nodeType)  IN ( (5a391596-3181-11e4-a87e-600308a690e2, 'organization', 
5a3a2708-3181-11e4-a87e-600308a690e2 ,'test' 
),(5a391596-3181-11e4-a87e-600308a690e2, 'organization', 
5a3a2709-3181-11e4-a87e-600308a690e2 ,'test' 
),(5a391596-3181-11e4-a87e-600308a690e2, 'organization', 
5a39fff7-3181-11e4-a87e-600308a690e2 ,'test' ) )
{code}

Which results in the following stack trace

{code}
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: 
Multi-column relations can only be applied to clustering columns: scopeid
at 
com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
at 
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110)
at 
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
at 
com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:367)
at 
com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:584)
{code}


This is still possible via the thrift API.  Apologies in advance if I've filed 
this erroneously.  I can't find any examples of this type of query anywhere.

Note that our size grows far too large to fit in a single physical partition 
(row) if we use only scopeId and scopeType, so we need all 4 data elements to 
be part of our partition key to ensure we have the distribution we need.
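
As a stopgap, a minimal workaround sketch (using the same keys as the failing query above) is to issue one fully specified query per partition key and merge the results client side; the parallel execution of these statements is left to the client or driver.

{code}
-- one query per complete partition key, executed in parallel client side
select timestamp from Graph_Marked_Nodes
  where scopeId = 5a391596-3181-11e4-a87e-600308a690e2
    and scopeType = 'organization'
    and nodeId = 5a3a2708-3181-11e4-a87e-600308a690e2
    and nodeType = 'test';

select timestamp from Graph_Marked_Nodes
  where scopeId = 5a391596-3181-11e4-a87e-600308a690e2
    and scopeType = 'organization'
    and nodeId = 5a3a2709-3181-11e4-a87e-600308a690e2
    and nodeType = 'test';

select timestamp from Graph_Marked_Nodes
  where scopeId = 5a391596-3181-11e4-a87e-600308a690e2
    and scopeType = 'organization'
    and nodeId = 5a39fff7-3181-11e4-a87e-600308a690e2
    and nodeType = 'test';
{code}

This trades one statement for several, which matches the parallelize-queries-client-side note in the CASSANDRA-7855 description.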






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5062) Support CAS

2012-12-13 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531169#comment-13531169
 ] 

Todd Nine commented on CASSANDRA-5062:
--

Can you elaborate on your timeout question with an example?  I think we're on 
the same page with this, but wanted to be sure.  There's currently a small 
window in which a client can think it has a lock when it actually doesn't 
(timeout/2).  This is due to not having any way for the lock to receive a 
notification when its column has reached its TTL and is removed because the 
lock heartbeat failed.
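
For context, a minimal sketch of how the compare-and-set support this ticket introduced removes the read-then-write race described below, assuming the CQL lightweight-transaction syntax that shipped in 2.0 and a hypothetical users table:

{code}
-- hypothetical table for the account-creation example
CREATE TABLE users (
    username text PRIMARY KEY,
    email text
);

-- applied only if no row for this username exists yet; the result set's
-- [applied] column reports whether the insert won the race
INSERT INTO users (username, email)
VALUES ('tnine', 'todd@example.com')
IF NOT EXISTS;
{code}

The conditional write is arbitrated on the server, so no client-side lock or TTL heartbeat is needed.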

 Support CAS
 ---

 Key: CASSANDRA-5062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
 Fix For: 2.0


 Strong consistency is not enough to prevent race conditions.  The classic 
 example is user account creation: we want to ensure usernames are unique, so 
 we only want to signal account creation success if nobody else has created 
 the account yet.  But naive read-then-write allows clients to race and both 
 think they have a green light to create.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-4771) Setting TTL to Integer.MAX causes columns to not be persisted.

2012-10-05 Thread Todd Nine (JIRA)
Todd Nine created CASSANDRA-4771:


 Summary: Setting TTL to Integer.MAX causes columns to not be 
persisted.
 Key: CASSANDRA-4771
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4771
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.12
Reporter: Todd Nine
Priority: Blocker


When inserting columns via batch mutation, we have an edge case where the 
column TTL will be set to Integer.MAX.  When setting the column expiration time 
to Integer.MAX, the columns do not appear to be persisted.

Fails:

Integer.MAX 
Integer.MAX/2

Works:
Integer.MAX/3
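
A CQL-equivalent illustration of the failing and working values; the original report used thrift batch mutation, and the table here is hypothetical. Integer.MAX is 2147483647, Integer.MAX/2 is 1073741823, and Integer.MAX/3 is 715827882.

{code}
-- hypothetical table; the original report used thrift batch mutation
CREATE TABLE events (key text PRIMARY KEY, value text);

-- fails as reported: TTL of Integer.MAX (or Integer.MAX/2); the column is not persisted
INSERT INTO events (key, value) VALUES ('a', 'x') USING TTL 2147483647;

-- works as reported: TTL of Integer.MAX/3
INSERT INTO events (key, value) VALUES ('a', 'x') USING TTL 715827882;
{code}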



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-3228) Add new range scan with clock

2011-09-18 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-3228:
-

Summary: Add new range scan with clock  (was: Add new range scan with 
optional clock)

 Add new range scan with clock
 -

 Key: CASSANDRA-3228
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3228
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.8.5
Reporter: Todd Nine
Priority: Minor

 Currently, it is not possible to specify a minimum clock time on columns when 
  performing range scans.  In some situations, such as custom migration or 
  batch processing, it would be helpful to allow the client to specify a 
  minimum clock time.  This would only return columns with a clock value >= the 
  specified value.
  i.e.
  range scan (rowKey, startVal, endVal, reversed, min clock)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3228) Add new range scan with optional clock

2011-09-18 Thread Todd Nine (JIRA)
Add new range scan with optional clock
--

 Key: CASSANDRA-3228
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3228
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.8.5
Reporter: Todd Nine
Priority: Minor


Currently, it is not possible to specify a minimum clock time on columns when 
performing range scans.  In some situations, such as custom migration or batch 
processing, it would be helpful to allow the client to specify a minimum clock 
time.  This would only return columns with a clock value >= the specified value.

i.e.

range scan (rowKey, startVal, endVal, reversed, min clock)



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3209) CLI does not display error when it is not possible to create a keyspace when schemas in cluster do not agree.

2011-09-14 Thread Todd Nine (JIRA)
CLI does not display error when it is not possible to create a keyspace when 
schemas in cluster do not agree.
-

 Key: CASSANDRA-3209
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3209
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1
 Environment: Latest brisk beta 2 deb on Ubuntu 10.10 server 64 bit.  
(Using chef recipe to install)
Reporter: Todd Nine


Cluster:

3 nodes.  2 online, 1 offline

describe cluster; displays 2 schema versions.  2 nodes are on 1 version, a 
single node is on a different version.

Issue this command in the CLI.

create keyspace TestKeyspace with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy' and 
strategy_options=[{Brisk:3, Cassandra:0}];

What should happen.

An error should be displayed when the keyspace cannot be created.

What actually happens.

The user is presented with null as the output.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3209) CLI does not display error when it is not possible to create a keyspace when schemas in cluster do not agree.

2011-09-14 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105038#comment-13105038
 ] 

Todd Nine commented on CASSANDRA-3209:
--

Unfortunately this was on a testing cluster which has since been shut down, so 
I don't have the system.log output.

 CLI does not display error when it is not possible to create a keyspace when 
 schemas in cluster do not agree.
 -

 Key: CASSANDRA-3209
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3209
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1
 Environment: Latest brisk beta 2 deb on Ubuntu 10.10 server 64 bit.  
 (Using chef recipe to install)
Reporter: Todd Nine
Assignee: Pavel Yaskevich
  Labels: cli

 Cluster:
 3 nodes.  2 online, 1 offline
 describe cluster; displays 2 schema versions.  2 nodes are on 1 version, a 
 single node is on a different version.
 Issue this command in the CLI.
 create keyspace TestKeyspace with placement_strategy = 
 'org.apache.cassandra.locator.NetworkTopologyStrategy' and 
 strategy_options=[{Brisk:3, Cassandra:0}];
 What should happen.
 An error should be displayed when the keyspace cannot be created.
 What actually happens.
 The user is presented with null as the output.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-30 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094116#comment-13094116
 ] 

Todd Nine commented on CASSANDRA-2915:
--

I agree that order by could be a performance killer for large data sets.  For 
large data sets I think that users should make use of de-normalization and 
create their own secondary index for efficient querying.  However, on small 
data sets, which seem to be very common in web systems (ours is about 80% of 
the data a user sees), order by semantics are very important.  Most of the data 
the user sees has a very small result set, < 100 rows.  I think explicitly 
prohibiting these features limits the user too much.  Shouldn't they be 
supported, with it ultimately up to the user to determine which approach they 
take in implementing indexes for their data?

 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-29 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093263#comment-13093263
 ] 

Todd Nine commented on CASSANDRA-2915:
--

Could we also use this feature as a standard way for building our lucene 
documents?  This would accomplish what we want, as well as giving a hook for 
more user functionality.

CASSANDRA-1311


 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-29 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093344#comment-13093344
 ] 

Todd Nine commented on CASSANDRA-2915:
--

I think forcing users to install classes for common use cases would cause 
issues with adoption.  What about creating new CQL commands to handle this?  
When creating an index in a db, you would define the fields and the manner in 
which they are indexed.  Could we do something like the following?


create index [colname] in [colfamily] using [index type 1] as [indexFieldName], 
[index type 2] as [indexFieldName], [index type n] as [indexFieldName]?

drop index [indexFieldName] in [colfamily] on [colname]



This way clients such as JPA can update and create indexes, without the need to 
install custom classes on Cassandra itself.  They also have the ability to 
directly reference the field name when using CQL queries.

Assuming that the index class types exist in the Lucene classpath, you get the 
1 to many mappings for column to indexing strategy.  This would allow more 
advanced clients such as the JPA plugin to automatically add indexes to the 
document based on indexes defined on persistent fields, without generating any 
code the user has to install in the Cassandra runtime.  If users want to 
install custom analyzers, they still have the option to do so, and would gain 
access to it via CQL.
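
Purely as an illustration of the proposed syntax above (which was never adopted), with a hypothetical Users column family and hypothetical analyzer names standing in for the index types:

{noformat}
create index email in Users using StandardAnalyzer as email_tokens,
    KeywordAnalyzer as email_exact;

drop index email_tokens in Users on email;
{noformat}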

 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-29 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093344#comment-13093344
 ] 

Todd Nine edited comment on CASSANDRA-2915 at 8/30/11 2:13 AM:
---

I think forcing users to install classes for common use cases would cause 
issues with adoption.  What about creating new CQL commands to handle this?  
When creating an index in a db, you would define the fields and the manner in 
which they are indexed.  Could we do something like the following?


create index on [colname] in [colfamily] using [index type 1] as 
[indexFieldName], [index type 2] as [indexFieldName], [index type n] as 
[indexFieldName]?

drop index [indexFieldName] in [colfamily] on [colname]



This way clients such as JPA can update and create indexes, without the need to 
install custom classes on Cassandra itself.  They also have the ability to 
directly reference the field name when using CQL queries.

Assuming that the index class types exist in the Lucene classpath, you get the 
1 to many mappings for column to indexing strategy.  This would allow more 
advanced clients such as the JPA plugin to automatically add indexes to the 
document based on indexes defined on persistent fields, without generating any 
code the user has to install in the Cassandra runtime.  If users want to 
install custom analyzers, they still have the option to do so, and would gain 
access to it via CQL.

  was (Author: tnine):
I think forcing users to install classes for common use cases would cause 
issues with adoption.  What about creating new CQL commands to handle this?  
When creating an index in a db, you would define the fields and the manner in 
which they are indexed.  Could we do something like the following?


create index [colname] in [colfamily] using [index type 1] as [indexFieldName], 
[index type 2] as [indexFieldName], [index type n] as [indexFieldName]?

drop index [indexFieldName] in [colfamily] on [colname]



This way clients such as JPA can update and create indexes, without the need to 
install custom classes on Cassandra itself.  They also have the ability to 
directly reference the field name when using CQL queries.

Assuming that the index class types exist in the Lucene classpath, you get the 
1 to many mappings for column to indexing strategy.  This would allow more 
advanced clients such as the JPA plugin to automatically add indexes to the 
document based on indexes defined on persistent fields, without generating any 
code the user has to install in the Cassandra runtime.  If users want to 
install custom analyzers, they still have the option to do so, and would gain 
access to it via CQL.
  
 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1599) Add sort/order support for secondary indexing

2011-08-28 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-1599:
-

Issue Type: Sub-task  (was: New Feature)
Parent: CASSANDRA-2915

 Add sort/order support for secondary indexing
 -

 Key: CASSANDRA-1599
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Todd Nine
Assignee: Jonathan Ellis
   Original Estimate: 32h
  Remaining Estimate: 32h

 For a lot of users paging is a standard use case on many web applications.  
 It would be nice to allow paging as part of a Boolean Expression.
 Page - start index
- end index
- page timestamp 
- Sort Order
 When sorting, is it possible to sort both ASC and DESC? 
 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1598) Add Boolean Expression to secondary querying

2011-08-28 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-1598:
-

Issue Type: Sub-task  (was: New Feature)
Parent: CASSANDRA-2915

 Add Boolean Expression to secondary querying
 

 Key: CASSANDRA-1598
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1598
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Affects Versions: 0.7 beta 3
Reporter: Todd Nine

 Add boolean operators similar to Lucene style searches.  Currently there is 
 implicit support for the && operator.  It would be helpful to also add 
 support for ||/Union operators.  I would envision this as the client would be 
 required to construct the expression tree and pass it via the thrift 
 interface.
 BooleanExpression -- BooleanOrIndexExpression
  -- BooleanOperator
  -- BooleanOrIndexExpression
 I'd like to take a crack at this since it will greatly improve my Datanucleus 
 plugin

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1598) Add Boolean Expression to secondary querying

2011-08-28 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-1598:
-

Issue Type: New Feature  (was: Sub-task)
Parent: (was: CASSANDRA-2915)

 Add Boolean Expression to secondary querying
 

 Key: CASSANDRA-1598
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1598
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Affects Versions: 0.7 beta 3
Reporter: Todd Nine

 Add boolean operators similar to Lucene style searches.  Currently there is 
 implicit support for the && operator.  It would be helpful to also add 
 support for ||/Union operators.  I would envision this as the client would be 
 required to construct the expression tree and pass it via the thrift 
 interface.
 BooleanExpression -- BooleanOrIndexExpression
  -- BooleanOperator
  -- BooleanOrIndexExpression
 I'd like to take a crack at this since it will greatly improve my Datanucleus 
 plugin

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-28 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092606#comment-13092606
 ] 

Todd Nine commented on CASSANDRA-2915:
--


I don't necessarily think there is a 1 to 1 relationship between a column and a 
Lucene document field.  In our case we have the need to index fields in more 
than one manner.  For instance, we index users as straight strings (lowercased) 
with email, first name and last name columns.  However we also want to tokenize 
the email, first and last name columns to allow our customer support people to 
perform partial name matching.  I think a 1 to N mapping is required for column 
to document field to allow this sort of functionality.

As far as expiration on columns, is there a system event that we can hook into 
to just force a document reindex when a column expires rather than add an 
additional field that will need to be sorted from?

As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, 
LIKE etc are a must.  Most users have become accustomed to this functionality 
with RDBMS.  If they cause potential performance problems, I think this should 
be documented so that users have enough information to determine if they can 
rely on the Lucene index or should build their own index directly.


Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help?



 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-28 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092606#comment-13092606
 ] 

Todd Nine edited comment on CASSANDRA-2915 at 8/29/11 4:29 AM:
---

I don't necessarily think there is a 1 to 1 relationship between a column and a 
Lucene document field.  In our case we have the need to index fields in more 
than one manner.  For instance, we index users as straight strings (lowercased) 
with email, first name and last name columns.  However we also want to tokenize 
the email, first and last name columns to allow our customer support people to 
perform partial name matching.  I think a 1 to N mapping is required for column 
to document field to allow this sort of functionality.

As far as expiration on columns, is there a system event that we can hook into 
to just force a document reindex when a column expires rather than add an 
additional field that will need to be sorted from?

As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, 
LIKE etc are a must.  Most users have become accustomed to this functionality 
with RDBMS.  If they cause potential performance problems, I think this should 
be documented so that users have enough information to determine if they can 
rely on the Lucene index or should build their own index directly.


Has anyone looked at existing code in ElasticSearch to avoid some of the 
pitfalls they have already experienced in building something similar?

http://www.elasticsearch.org/


Lastly, this is a huge feature for the hector-jpa plugin, what can I do to 
help?  



  was (Author: tnine):

I don't necessaryly think there is a 1 to 1 relationship between a column and a 
Lucene document field.  In our case we have the need to index fields in more 
than one manner.  For instance, we index users as straight strings (lowercased) 
with email, first name and last name columns.  However we also want to tokenize 
the email, first and last name columns to allow our customer support people to 
perform partial name matching.  I think a 1 to N mapping is required for column 
to document field to allow this sort of functionality.

As far as expiration on columns, is there a system event that we can hook into 
to just force a document reindex when a column expires rather than add an 
additional field that will need to be sorted from?

As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, 
LIKE etc are a must.  Most users have become accustomed to this functionality 
with RDBMS.  If they cause potential performance problems, I think this should 
be documented so that users have enough information to determine if they can 
rely on the Lucene index or should build their own index directly.


Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help?


  
 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-28 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092606#comment-13092606
 ] 

Todd Nine edited comment on CASSANDRA-2915 at 8/29/11 4:30 AM:
---

I don't necessarily think there is a 1 to 1 relationship between a column and a 
Lucene document field.  In our case we have the need to index fields in more 
than one manner.  For instance, we index users as straight strings (lowercased) 
with email, first name and last name columns.  However we also want to tokenize 
the email, first and last name columns to allow our customer support people to 
perform partial name matching.  I think a 1 to N mapping is required for column 
to document field to allow this sort of functionality.

As far as expiration on columns, is there a system event that we can hook into 
to just force a document reindex when a column expires rather than add an 
additional field that will need to be sorted from?

As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, 
LIKE etc are a must.  Most users have become accustomed to this functionality 
with RDBMS.  If they cause potential performance problems, I think this should 
be documented so that users have enough information to determine if they can 
rely on the Lucene index or should build their own index directly.


Has anyone looked at existing code in ElasticSearch to avoid some of the 
pitfalls they have already experienced in building something similar?

http://www.elasticsearch.org/


Lastly, this is a huge feature for the hector-jpa plugin, what can I do to 
help?  



  was (Author: tnine):
I don't necessaryly think there is a 1 to 1 relationship between a column 
and a Lucene document field.  In our case we have the need to index fields in 
more than one manner.  For instance, we index users as straight strings 
(lowercased) with email, first name and last name columns.  However we also 
want to tokenize the email, first and last name columns to allow our customer 
support people to perform partial name matching.  I think a 1 to N mapping is 
required for column to document field to allow this sort of functionality.

As far as expiration on columns, is there a system event that we can hook into 
to just force a document reindex when a column expires rather than add an 
additional field that will need to be sorted from?

As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, 
LIKE etc are a must.  Most users have become accustomed to this functionality 
with RDBMS.  If they cause potential performance problems, I think this should 
be documented so that users have enough information to determine if they can 
rely on the Lucene index or should build their own index directly.


Has anyone looked at existing code in ElasticSearch to avoid some of the 
pitfalls they have already experienced in building something similar?

http://www.elasticsearch.org/


Lastly, this is a huge feature for the hector-jpa plugin, what can I do to 
help?  


  
 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079653#comment-13079653
 ] 

Todd Nine edited comment on CASSANDRA-2915 at 8/4/11 10:33 PM:
---

Hey guys.  We're doing something similar in the hector JPA plugin.

Would using dynamic composites within Cassandra alleviate the need for Lucene 
documents?  We're using this in secondary indexing and it gives us order by 
semantics and AND (Union).  The largest issue becomes iteration with OR 
clauses; AND clauses can be compressed into a single column for efficient range 
scans, and we then use iterators to UNION the OR trees together with order 
clauses in the composites.  The caveat is that the user must define indexes 
with order semantics up front.  However, this can easily be added to the 
existing secondary indexing clauses.

  was (Author: tnine):
Hey guys.  We're doing something similar in the hector JPA plugin. 

Would using dynamic composites within cassandra alleviate the need for Lucene 
documents?  We're using this in secondary indexing and it gives us order by 
semantics and AND (Union).  The largest issue becomes iteration with OR 
clauses, AND clauses can be compressed into a single column for efficient range 
scans, we then use iterators to UNION the OR trees together with order clauses 
in the composites.  
  
 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
  Labels: secondary_index
 Fix For: 1.0


 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079695#comment-13079695
 ] 

Todd Nine commented on CASSANDRA-2915:
--

I'm quite keen to contribute on this issue, as this will greatly enhance the 
functionality of the hector-jpa project.  If I can contribute any work, please 
let me know.

 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
  Labels: secondary_index
 Fix For: 1.0


 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using the Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory then flushes them to disk so we can sync 
 our memtable flushes to lucene flushes. Lucene also has optimize() which 
 correlates to our compaction process, so these can be sync'd as well.
 We will also need to correlate column validators to Lucene tokenizers, so the 
 data can be stored properly, the big win in once this is done we can perform 
 complex queries within a column like wildcard searches.
 The downside of this approach is we will need to read before write since 
 documents in Lucene are written as complete documents. For random workloads 
 with lot's of indexed columns this means we need to read the document from 
 the index, update it and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-04-03 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015273#comment-13015273
 ] 

Todd Nine commented on CASSANDRA-2231:
--

Hi Sylvain,
  I seem to have encountered a bug in the comparator.  I'm using the composite 
to perform Cassandra based intersections of fields during queries.  For 
instance, say a user defines an index as this.

status + unitId.


The write would always contain values of status+0+unitId+0 when using the 
-1, 0 and 1 fields.  If the user enters a query such as this


status > 100 && status < 300 && unitId = 10

I would need to construct a column scan of the following to get  correct result 
sets.

start = 100+1+10+0

end = 300+0+10+1


However, in the validate function I'm receiving the error Invalid bytes 
remaining after an end-of-component at component.  This seems incorrect to me. 
 We're ultimately attempting to apply any equality operand and transform it to 
a range scan for the given field in the composite.  This means that -1, or 1 
could appear after any component in the composite, not just the last one.  Can 
you please add this functionality/remove this verification check?

Thanks,
Todd

 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.7.3
Reporter: Ed Anuff
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.7.5

 Attachments: CompositeType-and-DynamicCompositeType.patch, 
 edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-03-17 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008302#comment-13008302
 ] 

Todd Nine commented on CASSANDRA-2231:
--

Since this is primarily needed for ordering well defined queries and 
collections, can we add a bitset to the composite type to represent sort 
ordering?  A lot of queries need the semantics of "order by logindate desc, 
firstname asc, lastname asc".  This would give us the ability to set the 
descending flag on any one of the composite types, allowing us to always have a 
correctly ordered result set stored in the column name.  In this case I would 
simply set bit 0 to signal to the composite that it needs to order the first 
field descending instead of the standard ascending.
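
A rough sketch of what that per-component flag could look like (hypothetical 
types and names, not an actual Cassandra class): bit i of the bitset set means 
component i sorts descending, which a comparator can honour by flipping the 
sign of that component's comparison.

import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

final class OrderedCompositeComparator implements Comparator<List<byte[]>> {
    private final List<Comparator<byte[]>> componentComparators;
    private final BitSet descending;

    OrderedCompositeComparator(List<Comparator<byte[]>> componentComparators,
                               BitSet descending) {
        this.componentComparators = componentComparators;
        this.descending = descending;
    }

    @Override
    public int compare(List<byte[]> a, List<byte[]> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int cmp = componentComparators.get(i).compare(a.get(i), b.get(i));
            if (cmp != 0) {
                // flip the ordering for components flagged as descending
                return descending.get(i) ? -cmp : cmp;
            }
        }
        return Integer.compare(a.size(), b.size());
    }
}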



 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.7.3
Reporter: Ed Anuff
Priority: Minor
 Attachments: 0001-Add-compositeType-and-DynamicCompositeType.patch, 
 0001-Add-compositeType.patch, edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-03-02 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13001615#comment-13001615
 ] 

Todd Nine commented on CASSANDRA-2231:
--

Enforcing all columns to be the same would break our indexing.  Each row key is 
a different index, and the columns within that index are composed of different 
composite types.  If this was enforced at the CF level, we would require a 
different CF for each index.  Is it possible to allow both static and dynamic 
types by creating 2 composite index types?
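
As a hypothetical illustration of the distinction (these types are mine, not 
part of either patch): a static composite fixes the component types once for 
the whole column family, while a dynamic composite tags every component with 
its own type so columns in the same row can use different layouts.

import java.util.List;

// one tagged component of a dynamic composite name, e.g. ("UTF8Type", bytes)
record Component(String typeAlias, byte[] value) {}

// a column name whose component types can vary column by column
record DynamicCompositeName(List<Component> components) {}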

 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.7.3
Reporter: Ed Anuff
Priority: Minor
 Attachments: 0001-Add-compositeType.patch, 
 edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Issue Comment Edited: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-03-02 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13001615#comment-13001615
 ] 

Todd Nine edited comment on CASSANDRA-2231 at 3/2/11 8:42 PM:
--

Enforcing all columns to be the same would break our indexing.  Each row key is 
a different index, and the columns within that index are composed of different 
composite types.  If this was enforced at the CF level, we would require a 
different CF for each index.  Is it possible to allow both static and dynamic 
types by creating 2 composite index types?  I.e. StaticComposite using your 
patch and DynamicComposite using Ed's?

  was (Author: tnine):
Enforcing all columns to be the same would break our indexing.  Each row 
key is a different index, and the columns within that index are composed of 
different composite types.  If this was enforced at the CF level, we would 
require a different CF for each index.  Is it possible to allow both static and 
dynamic types by creating 2 composite index types?
  
 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.7.3
Reporter: Ed Anuff
Priority: Minor
 Attachments: 0001-Add-compositeType.patch, 
 edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-03-02 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13001656#comment-13001656
 ] 

Todd Nine commented on CASSANDRA-2231:
--

I do see your point.  However, as a user and a developer of a JPA plugin, take 
this use case as an example of our requirements.

Entities: User -> Vehicles

A User has a collection of vehicles.  These can be ordered in several ways.  
For this example, let's say by time created and by name.  This would lead to 
several different column comparators.  Properties on the user such as first 
name and last name would be UTF8 columns; the collections would be composite 
columns.  I would end up with columns of the following definitions to allow us 
to quickly load or query them.

UTF8(firstname):value
UTF8(lastname):value
UTF8(vehicletime) LONG(vehicleTime) TIMEUUID(VEHICLEID) UTF8(VehicleProp) : 
value
UTF8(vehiclename) UTF8(vehicleName) TIMEUUID(VEHICLEID) UTF8(VehicleProp) : 
value

In the event we move this index to an external CF, we lose the serialization 
scope and guaranteed atomic write of using a single row key during a mutation.  
While this is a separate issue for Cassandra in general, we do need the 
ability to work around this until serialization scope can extend over multiple 
rows.  Without this, to ensure that lost writes don't occur on indexing, a 
user would need to introduce an extra system such as Zookeeper.

https://issues.apache.org/jira/browse/CASSANDRA-1684
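
To make the single-row layout concrete, here is an illustrative sketch (names 
and the string encoding of the composite names are made up; a real 
implementation would use the binary composite format) of how one user row could 
carry both the plain property columns and the two collection indexes, so that a 
single mutation against one row key covers all of them.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

class UserRowSketch {

    // Build every column for a single user row: plain properties plus the
    // time-ordered and name-ordered vehicle index entries.
    static Map<String, String> buildColumns(String firstName, String lastName,
                                            long vehicleCreated, String vehicleName,
                                            UUID vehicleId) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("firstname", firstName);
        columns.put("lastname", lastName);
        columns.put("vehicletime|" + vehicleCreated + "|" + vehicleId + "|name", vehicleName);
        columns.put("vehiclename|" + vehicleName + "|" + vehicleId + "|name", vehicleName);
        return columns;
    }

    public static void main(String[] args) {
        buildColumns("Ada", "Lovelace", 1299100000L, "roadster", UUID.randomUUID())
                .forEach((name, value) -> System.out.println(name + " : " + value));
    }
}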



 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.7.3
Reporter: Ed Anuff
Priority: Minor
 Attachments: 0001-Add-compositeType.patch, 
 edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-03-02 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13001770#comment-13001770
 ] 

Todd Nine commented on CASSANDRA-2231:
--

As per Ed's previous comment, I use over 30 different indexing schemes on our 
data currently in my JDO code.  The ultimate goal for this feature is to 
support the implementation of a JPA framework that works similarly to GAE.  

Having the ability to build the indexes that a user specifies, without 
dynamically creating CFs, is a must-have for us.  There are a lot of issues 
surrounding the complexity of building the index itself in the plugin that are 
outside the scope of this issue.  However, we don't really have a comparator 
mechanism to support these types of indexes.  

In all the use cases we defined, our searches, and therefore our indexes, need 
an order by clause as well as query criteria to support the paging that most 
applications will require.  This order could simply be a natural ordering of 
entity keys, or it could be on specific properties of the related entity.  As 
applications grow in size, so will the complexity and number of indexes needed 
to support them; I'm concerned that creating this many CFs could cause serious 
issues.  This doesn't have to be a well-advertised feature for end users to 
create CFs with, but I feel very strongly that a dynamic type for CFs is a must 
in order to proceed with the JPA plugin we've been designing.

 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.7.3
Reporter: Ed Anuff
Priority: Minor
 Attachments: 0001-Add-compositeType.patch, 
 edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-2179) Secondary indexing of columns with duplicate values does not return all row keys

2011-02-17 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-2179:
-

Attachment: test.patch

Patch for test case

 Secondary indexing of columns with duplicate values does not return all row 
 keys
 

 Key: CASSANDRA-2179
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2179
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.1
 Environment: Java 1.6 64 bit Ubuntu
Reporter: Todd Nine
 Attachments: test.patch


 Create a CF test with a column value and holder.  Create an index on the 
 Value column of type UTF8 and a Keys index.  Insert the following values into 
 the cf
 new UUID():{ value: test, holder: 0x00}
 new UUID():{ value: test, holder: 0x00}
 new UUID():{ value: test, holder: 0x00}
 Query the secondary index where value EQ test and select column holder
 You should be returned 3 rows.  Instead you are returned one.  It seems that 
 the last row written with the column value is the only row that is returned 
 when the column value contains duplicates.  I'll attempt to create a python 
 client test that demonstrates the issue.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-2179) Secondary indexing of columns with duplicate values does not return all row keys

2011-02-17 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-2179:
-

Attachment: test.patch

Please ignore previous patch, was incorrect version.

 Secondary indexing of columns with duplicate values does not return all row 
 keys
 

 Key: CASSANDRA-2179
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2179
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.1
 Environment: Java 1.6 64 bit Ubuntu
Reporter: Todd Nine
 Attachments: test.patch, test.patch


 Create a CF test with a column value and holder.  Create an index on the 
 Value column of type UTF8 and a Keys index.  Insert the following values into 
 the cf
 new UUID():{ value: test, holder: 0x00}
 new UUID():{ value: test, holder: 0x00}
 new UUID():{ value: test, holder: 0x00}
 Query the secondary index where value EQ test and select column holder
 You should be returned 3 rows.  Instead you are returned one.  It seems that 
 the last row written with the column value is the only row that is returned 
 when the column value contains duplicates.  I'll attempt to create a python 
 client test that demonstrates the issue.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (CASSANDRA-2179) Secondary indexing of columns with duplicate values does not return all row keys

2011-02-16 Thread Todd Nine (JIRA)
Secondary indexing of columns with duplicate values does not return all row keys


 Key: CASSANDRA-2179
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2179
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.1
 Environment: Java 1.6 64 bit Ubuntu
Reporter: Todd Nine


Create a CF test with a column value and holder.  Create an index on the 
Value column of type UTF8 and a Keys index.  Insert the following values into 
the cf

new UUID():{ value: test, holder: 0x00}
new UUID():{ value: test, holder: 0x00}
new UUID():{ value: test, holder: 0x00}

Query the secondary index where value EQ test and select column holder

You should be returned 3 rows.  Instead you are returned one.  It seems that 
the last row written with the column value is the only row that is returned 
when the column value contains duplicates.  I'll attempt to create a python 
client test that demonstrates the issue.
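
For anyone wanting to reproduce this from Java, the sketch below queries the 
index through the 0.7-era Thrift interface, written from memory; the keyspace, 
column family, and host are placeholders, and exact class or method names may 
need adjusting against the generated Thrift bindings.

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class DuplicateIndexValueRepro {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("Keyspace1");

        // where value EQ "test"
        IndexExpression expression = new IndexExpression(
                ByteBuffer.wrap("value".getBytes("UTF-8")),
                IndexOperator.EQ,
                ByteBuffer.wrap("test".getBytes("UTF-8")));
        IndexClause clause = new IndexClause(
                Arrays.asList(expression), ByteBuffer.wrap(new byte[0]), 100);

        // select a handful of columns per matching row
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(
                ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 10));

        List<KeySlice> rows = client.get_indexed_slices(
                new ColumnParent("test"), clause, predicate, ConsistencyLevel.ONE);

        // three rows were inserted with value == "test"; the bug returns only one
        System.out.println("rows returned: " + rows.size());
        transport.close();
    }
}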

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (CASSANDRA-1598) Add Boolean Expression to secondary querying

2010-10-10 Thread Todd Nine (JIRA)
Add Boolean Expression to secondary querying


 Key: CASSANDRA-1598
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1598
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.7.0
Reporter: Todd Nine


Add boolean operators similar to Lucene-style searches.  Currently there is 
implicit support for the && operator.  It would be helpful to also add support 
for ||/Union operators.  I would envision this as: the client would be required 
to construct the expression tree and pass it via the thrift interface.



BooleanExpression --> BooleanOrIndexExpression
                  --> BooleanOperator
                  --> BooleanOrIndexExpression


I'd like to take a crack at this since it will greatly improve my Datanucleus 
plugin.
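
A hypothetical sketch of what such an expression tree could look like on the 
client side; none of these types exist in Cassandra, they only illustrate the 
shape a client might build and serialize over thrift.

interface BooleanOrIndexExpression {}

enum BooleanOperator { AND, OR }

// a leaf comparison against a single indexed column, e.g. value EQ "test"
record IndexExpressionLeaf(String column, String op, byte[] value)
        implements BooleanOrIndexExpression {}

// a composite node combining two sub-expressions with && or ||
record BooleanExpression(BooleanOrIndexExpression left,
                         BooleanOperator operator,
                         BooleanOrIndexExpression right)
        implements BooleanOrIndexExpression {}

class ExpressionTreeExample {
    public static void main(String[] args) {
        BooleanOrIndexExpression tree = new BooleanExpression(
                new IndexExpressionLeaf("lastname", "EQ", "nine".getBytes()),
                BooleanOperator.OR,
                new IndexExpressionLeaf("firstname", "EQ", "todd".getBytes()));
        System.out.println(tree);
    }
}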


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Todd Nine (JIRA)
Add paging support for secondary indexing
-

 Key: CASSANDRA-1599
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599
 Project: Cassandra
  Issue Type: New Feature
Reporter: Todd Nine
 Fix For: 0.7.0


For a lot of users, paging is a standard use case in many web applications.  It 
would be nice to allow paging as part of a Boolean Expression.

Page - start index
   - end index
   - page timestamp 
   - Sort Order


When sorting, is it possible to sort both ASC and DESC? 
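
A minimal sketch of the paging parameters listed above, with hypothetical field 
names of my own choosing:

enum SortOrder { ASC, DESC }

// the page descriptor a client would attach to a boolean expression query
record Page(int startIndex, int endIndex, long pageTimestamp, SortOrder sortOrder) {}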






-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919687#action_12919687
 ] 

Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM:


Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days || newmessagedate 
> today - 1 day). 

Which start key do I advance, one or both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and diff it with the expression tree 
provided.  I'm not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications once union and 
intersections are implemented server side.  

  was (Author: tnine):
Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days || newmessagedate 
> today - 1 day). 

Which start key do I advance, one or both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and diff it with the expression tree 
provided.  Not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications.
  
 Add paging support for secondary indexing
 -

 Key: CASSANDRA-1599
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599
 Project: Cassandra
  Issue Type: New Feature
Reporter: Todd Nine
 Fix For: 0.7.0


 For a lot of users paging is a standard use case on many web applications.  
 It would be nice to allow paging as part of a Boolean Expression.
 Page - start index
- end index
- page timestamp 
- Sort Order
 When sorting, is it possible to sort both ASC and DESC? 
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919687#action_12919687
 ] 

Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM:


Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days || newmessagedate 
> today - 1 day). 

Which start key do I advance, one or both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and diff it with the expression tree 
provided.  I'm not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications once union and 
intersections are implemented in Cassandra.  

  was (Author: tnine):
Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days || newmessagedate 
> today - 1 day). 

Which start key do I advance, one or both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and diff it with the expression tree 
provided.  I'm not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications once union and 
intersections are implemented server side.  
  
 Add paging support for secondary indexing
 -

 Key: CASSANDRA-1599
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599
 Project: Cassandra
  Issue Type: New Feature
Reporter: Todd Nine
 Fix For: 0.7.0


 For a lot of users paging is a standard use case on many web applications.  
 It would be nice to allow paging as part of a Boolean Expression.
 Page - start index
- end index
- page timestamp 
- Sort Order
 When sorting, is it possible to sort both ASC and DESC? 
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

2010-07-08 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886429#action_12886429
 ] 

Todd Nine commented on CASSANDRA-1235:
--

While I'm in agreement with Uwe, my bigger concern is that two tests that are 
functionally equivalent return different results based on the mutation 
operations.  Performing a batch mutate with the same insertion data as a single 
write should insert the same bytes.  Unfortunately, batch mutate appears to 
be randomly dropping bytes.  If it were a true UTF8 issue, wouldn't it drop 
bytes on the single column writes as well as on batch mutate?

 BytesType and batch mutate causes encoded bytes of non-printable characters 
 to be dropped
 -

 Key: CASSANDRA-1235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6
 Environment: Java 1.6 sun JDK 
 Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
 Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
 Ubuntu 10.04 64 bit
Reporter: Todd Nine
Priority: Critical
 Fix For: 0.6.4

 Attachments: TestEncodedKeys.java


 When running the two tests, individual column insert works with the values 
 generated.  However, batch insert with the same values causes an encoding 
 failure on the key.  It appears bytes are dropped from the end of the byte 
 array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

2010-06-28 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-1235:
-

Fix Version/s: (was: 0.6.4)

 BytesType and batch mutate causes encoded bytes of non-printable characters 
 to be dropped
 -

 Key: CASSANDRA-1235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6.2
 Environment: Java 1.6 sun JDK 
 Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
 Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
 Ubuntu 10.04 64 bit
Reporter: Todd Nine
Priority: Critical
 Attachments: TestEncodedKeys.java


 When running the two tests, individual column insert works with the values 
 generated.  However, batch insert with the same values causes an encoding 
 failure on the key.  It appears bytes are dropped from the end of the byte 
 array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

2010-06-28 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-1235:
-

Priority: Blocker  (was: Critical)

 BytesType and batch mutate causes encoded bytes of non-printable characters 
 to be dropped
 -

 Key: CASSANDRA-1235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6.2
 Environment: Java 1.6 sun JDK 
 Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
 Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
 Ubuntu 10.04 64 bit
Reporter: Todd Nine
Priority: Blocker
 Attachments: TestEncodedKeys.java


 When running the two tests, individual column insert works with the values 
 generated.  However, batch insert with the same values causes an encoding 
 failure on the key.  It appears bytes are dropped from the end of the byte 
 array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

2010-06-28 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-1235:
-

Affects Version/s: 0.6.2
   (was: 0.6)

 BytesType and batch mutate causes encoded bytes of non-printable characters 
 to be dropped
 -

 Key: CASSANDRA-1235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6.2
 Environment: Java 1.6 sun JDK 
 Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
 Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
 Ubuntu 10.04 64 bit
Reporter: Todd Nine
Priority: Blocker
 Attachments: TestEncodedKeys.java


 When running the two tests, individual column insert works with the values 
 generated.  However, batch insert with the same values causes an encoding 
 failure on the key.  It appears bytes are dropped from the end of the byte 
 array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

2010-06-28 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883261#action_12883261
 ] 

Todd Nine commented on CASSANDRA-1235:
--

No worries, sorry about that, I just realized the affected version was 
incorrect.  Where can I look to begin fixing this?  Unfortunately this issue has 
brought our development to a halt, since we depend on the functionality of 
numeric range queries in Lucene/Lucandra.  Ideally I'd like to create a patch 
that applies to 0.6.2 so we can roll our own build with the patch and get 
running again.  I'm assuming it's an issue with the thrift server, but I don't 
want to start tweaking things without a good idea of where I should be looking 
for this issue.

 BytesType and batch mutate causes encoded bytes of non-printable characters 
 to be dropped
 -

 Key: CASSANDRA-1235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6
 Environment: Java 1.6 sun JDK 
 Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
 Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
 Ubuntu 10.04 64 bit
Reporter: Todd Nine
Priority: Critical
 Fix For: 0.6.4

 Attachments: TestEncodedKeys.java


 When running the two tests, individual column insert works with the values 
 generated.  However, batch insert with the same values causes an encoding 
 failure on the key.  It appears bytes are dropped from the end of the byte 
 array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

2010-06-27 Thread Todd Nine (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Nine updated CASSANDRA-1235:
-

Attachment: TestEncodedKeys.java

This file demonstrates the broken input.  Notice that the first test passes 
with clean input.  The second one fails utilizing batch write for the same 
input keys.

 BytesType and batch mutate causes encoded bytes of non-printable characters 
 to be dropped
 -

 Key: CASSANDRA-1235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6.2
 Environment: Java 1.6 sun JDK 
 Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
 Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
 Ubuntu 10.04 64 bit
Reporter: Todd Nine
Priority: Blocker
 Attachments: TestEncodedKeys.java


 When running the two tests, individual column insert works with the values 
 generated.  However, batch insert with the same values causes an encoding 
 failure on the key.  It appears bytes are dropped from the end of the byte 
 array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

2010-06-27 Thread Todd Nine (JIRA)
BytesType and batch mutate causes encoded bytes of non-printable characters to 
be dropped
-

 Key: CASSANDRA-1235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6.2
 Environment: Java 1.6 sun JDK 
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 

Ubuntu 10.04 64 bit
Reporter: Todd Nine
Priority: Blocker


When running the two tests, individual column insert works with the values 
generated.  However, batch insert with the same values causes an encoding 
failure on the key.  It appears bytes are dropped from the end of the byte 
array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1206) Create the functional equivalent of the != op

2010-06-17 Thread Todd Nine (JIRA)
Create the functional equivalent of the != op
-

 Key: CASSANDRA-1206
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1206
 Project: Cassandra
  Issue Type: New Feature
Reporter: Todd Nine


Currently, the <, <=, ==, >, >= operands can be used utilizing KeyRanges.  
However, it is not possible to execute a query which provides all keys except 
key K with a single call.  Please implement this to avoid having to make 2 
calls, one with a KeyRange less than and one with a KeyRange greater than the 
given value.
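
A self-contained illustration of the two-range workaround described above (this 
uses a local sorted map rather than the thrift API, purely to show the shape of 
the two calls and how their results are concatenated):

import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class NotEqualsWorkaround {

    // all keys strictly below k ("KeyRange less") plus all keys strictly above k
    // ("KeyRange greater"), i.e. every key except k
    static List<String> allKeysExcept(NavigableMap<String, String> rows, String k) {
        List<String> result = new ArrayList<>(rows.headMap(k, false).keySet());
        result.addAll(rows.tailMap(k, false).keySet());
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("a", "1");
        rows.put("b", "2");
        rows.put("c", "3");
        System.out.println(allKeysExcept(rows, "b"));  // prints [a, c]
    }
}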



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.