[jira] Commented: (CASSANDRA-1601) Refactor index definitions

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919726#action_12919726
 ] 

Stu Hood commented on CASSANDRA-1601:
-

Trippy realization: validators, as implemented in trunk, are already a very 
specific type of UDF. The input is a single untyped column, and the output is a 
single typed column. The content of the index must be typed, so UDFs can 
consume arbitrary input, and will always output typed data.
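
To make the comparison concrete, here is a minimal Python sketch of a validator treated as a UDF. Everything in it is a hypothetical illustration rather than code from trunk: the `Row` alias, `long_validator`, `index_entries`, and the assumed 8-byte big-endian encoding.

```python
from typing import Callable, Dict, List

Row = Dict[str, bytes]                     # hypothetical: column name -> untyped bytes
Selector = Callable[[Row], List[bytes]]    # picks the untyped input
Udf = Callable[[List[bytes]], List[int]]   # turns it into typed output


def long_validator(values: List[bytes]) -> List[int]:
    """A validator viewed as a UDF: untyped bytes in, typed longs out."""
    typed = []
    for raw in values:
        if len(raw) != 8:
            raise ValueError("expected an 8-byte big-endian long")
        typed.append(int.from_bytes(raw, "big", signed=True))
    return typed


def index_entries(row: Row, selector: Selector, udf: Udf) -> List[int]:
    """Typed index content is whatever the UDF produces for the selected input."""
    return udf(selector(row))
```

A more general UDF would keep the same shape and only change what the function does between the untyped input and the typed output.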

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1601) Refactor index definitions

2010-10-10 Thread Stu Hood (JIRA)
Refactor index definitions
--

 Key: CASSANDRA-1601
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Stu Hood
Priority: Critical
 Fix For: 0.7.0


h3. Overview
There are a few considerations for defining secondary indexes and row 
validation that I don't think have been brought up yet. While the interface is 
still malleable pre 0.7.0, we should attempt to make changes that allow for 
forwards compatibility of index/validator schemas. This is an umbrella ticket 
for suggesting/debating the changes: other tickets should be opened for quick 
improvements that can be made before 0.7.0.



h3. Index output types
The output (queryable) data from an indexing operation is what actually goes in 
the index. For a particular row, the output can be either _single-valued_, 
_multi-valued_ or _compound_:
* Single-valued
** Implemented in trunk (special case of multi-valued)
* Multi-valued
** Multiple index values _of the same type_ can match a single row
** Row probably contains a list/set (perhaps in a supercolumn)
* Compound
** Multiple base properties concatenated as one index entry 
** Different validators/comparators for each component
** (Given the simplicity of performing boolean operations on CASSANDRA-1472 
indexes, compound local indexes are unlikely to ever be worthwhile, but 
compound distributed indexes will be: see comments on CASSANDRA-1599)
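
As a rough illustration of the three output types, the Python sketch below derives index entries from a row modelled as a plain dict. The function names and the row layout are assumptions made for the example and do not correspond to anything in trunk.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]  # hypothetical: column name -> value


def single_valued(row: Row, column: str) -> List[Any]:
    """One index entry per row: the multi-valued case with exactly one value."""
    return [row[column]] if column in row else []


def multi_valued(row: Row, column: str) -> List[Any]:
    """Several index entries of the same type can point at one row."""
    return sorted(set(row.get(column, [])))


def compound(row: Row, columns: List[str]) -> List[Tuple[Any, ...]]:
    """Several base properties concatenated into a single index entry."""
    if not all(c in row for c in columns):
        return []
    return [tuple(row[c] for c in columns)]
```

In this sketch each component of a compound entry keeps its own type, which is where the per-component validators/comparators would come in.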

h3. Index input types
The other end of indexing is selection of values from a row to be indexed. 
Selection can correspond directly to our current {{db.filter.*}} 
implementations, and may be best implemented by specifying the validator/index 
using the same Thrift objects you would use for a similar query:
* Name selection
** Implemented in trunk, but should probably just be a special case of list 
selection below
** Corresponds to db.filter.NamesQueryFilter of size 1
* List selection
** Should specify a list of columns of which all values must be of the same 
type, as defined by the Validator
** Corresponds to db.filter.NamesQueryFilter
* Range (prefix?) selection
** Subsets of a row may be interesting for indexing
** Range corresponds to db.filter.SliceQueryFilter
*** (A Prefix might actually be more useful for indexing, but is better 
implemented by indexing an arbitrarily nested row)
** Open question: might the ability to index only the 'top N values' from a row 
be useful? If so, then this selector should allow N to be specified like it 
would be for a slice
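
The selection types above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a row is a dict of column name to value, range bounds are inclusive, and an empty string means "unbounded". The helper names are hypothetical and are not the db.filter classes themselves.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]  # hypothetical: column name -> value


def select_names(row: Row, names: List[str]) -> List[Tuple[str, str]]:
    """List selection; name selection is the special case len(names) == 1."""
    return [(n, row[n]) for n in sorted(names) if n in row]


def select_range(row: Row, start: str, finish: str,
                 count: Optional[int] = None) -> List[Tuple[str, str]]:
    """Range selection over column names in sorted order.

    Bounds are inclusive and "" means unbounded (an assumption of this sketch).
    `count` models the open 'top N values' question.
    """
    cols = [(n, v) for n, v in sorted(row.items())
            if (start == "" or n >= start) and (finish == "" or n <= finish)]
    return cols if count is None else cols[:count]
```

Treating name selection as a one-element list keeps a single code path for both cases, which is the simplification suggested above.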

h3. Supercolumns/arbitrary-nesting
Another consideration is that we should be able to support indexing and 
validation of supercolumns (and hence, arbitrarily nested rows). Since the 
selection of columns to index is essentially the same as the selection of 
columns to return for a query, this can probably mirror (and suggest 
improvements to) our query API.

h3. UDFs
This is obviously still an open area, but user defined indexing functions are 
essentially a transform between the _input_ and _output_ (as defined above), 
which would normally have equal structures. Leaving room for UDFs in our index 
definitions makes sense, and will likely lead to a much more general and 
elegant design.



[jira] Commented: (CASSANDRA-1598) Add Boolean Expression to secondary querying

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919707#action_12919707
 ] 

Stu Hood commented on CASSANDRA-1598:
-

I don't see any ways to add OR backwards compatibly, which is probably a reason 
to add it sooner rather than later: perhaps in 0.7.

> Add Boolean Expression to secondary querying
> 
>
> Key: CASSANDRA-1598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1598
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API
>Affects Versions: 0.7.0
>Reporter: Todd Nine
> Fix For: 0.8
>
>
> Add boolean operators similar to Lucene style searches.  Currently there is 
> implicit support for the && operator.  It would be helpful to also add 
> support for ||/Union operators.  I would envision this as the client would be 
> required to construct the expression tree and pass it via the thrift 
> interface.
> BooleanExpression --> BooleanOrIndexExpression
>                   --> BooleanOperator
>                   --> BooleanOrIndexExpression
> I'd like to take a crack at this since it will greatly improve my Datanucleus 
> plugin
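
For illustration only, the expression tree described in the ticket might look like the Python sketch below. The class and field names are hypothetical stand-ins, not the Thrift definitions, and `matches` evaluates the tree against a single row held in memory rather than against an index.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, Union

_OPS = {"EQ": operator.eq, "GT": operator.gt, "GTE": operator.ge,
        "LT": operator.lt, "LTE": operator.le}


@dataclass
class IndexExpression:
    column: str
    op: str          # one of the keys of _OPS
    value: Any


@dataclass
class BooleanExpression:
    left: "Node"
    boolean_operator: str   # "AND" or "OR"
    right: "Node"


Node = Union[IndexExpression, BooleanExpression]


def matches(node: Node, row: Dict[str, Any]) -> bool:
    """Recursively evaluate the expression tree against one row."""
    if isinstance(node, IndexExpression):
        return node.column in row and _OPS[node.op](row[node.column], node.value)
    left, right = matches(node.left, row), matches(node.right, row)
    if node.boolean_operator == "AND":
        return left and right
    if node.boolean_operator == "OR":
        return left or right
    raise ValueError("unknown operator: " + node.boolean_operator)
```

Because each side of a `BooleanExpression` may itself be a `BooleanExpression`, the client can nest AND and OR to any depth before passing the tree over the wire.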



[jira] Updated: (CASSANDRA-1339) add support for GT, GTE index expressions

2010-10-10 Thread Stu Hood (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-1339:


Component/s: (was: Core)
 API

Marking as API to get all secondary index issues together in one view.

> add support for GT, GTE index expressions
> -
>
> Key: CASSANDRA-1339
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1339
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API
>Reporter: Jonathan Ellis
> Fix For: 0.7.1
>
>
> this will require hitting every node in the cluster and merging results, 
> unlike with EQ.
> For instance, say we have the following index rows for some hypothetical 
> column C1:
> Node 1, ('' - M]
> 4: A G K 
> 5: B F J M
> Node 2, (M - '']
> 4: N P X
> 5: Q R T
> Because we store the index columns sorted in partitioner order, queries for 
> C1=4 can scan node 1 first and then, if insufficient data is found, proceed to 
> node 2.  But for GT or GTE queries we have to scan every node and merge, since 
> we don't know what the next value after 4 is.  (An alternative would be for 
> each node to send back, along with the data for the first row, the row key 
> that comes next, but this would be very messy.)
> Note that since we don't yet support range scans backwards, we can't support 
> LT or LTE queries.  The easiest workaround there would be to add a way to 
> specify that you want to create an index on the comparator, reversed.  This 
> is also worth doing for optimizations within normal columns -- see 
> CASSANDRA-1338.



[jira] Issue Comment Edited: (CASSANDRA-1599) Add sort/order support for secondary indexing

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919700#action_12919700
 ] 

Stu Hood edited comment on CASSANDRA-1599 at 10/10/10 11:20 PM:


Another issue with local indexes is that implementing sorting would involve a 
clusterwide merge sort. A distributed index is required to efficiently return 
the data in index order. I think this issue should be delayed for 0.8.0 when we 
have distributed indexes available: the indexes available in 0.7.0 are intended 
for filtering data.

As a multi-part solution, (imo) we should:
 # (optionally) Rename local indexes to "filter_indexes" or "filters"
 # Expose 0.8.0 distributed indexes as readonly column families which are 
sorted by the index value, and which are queried using get_range_slices
 # Implement LT/LTE/GT/GTE operations for the key-range in get_range_slices

Outcomes:
 * Your "primary" index expression would be consistently queried using the 
"range" parameter in get_range_slices and would define the sort order
 * "filters" (0.7.0 secondary indexes) would be applied using the IndexClause 
argument as described on CASSANDRA-1600

I'm going to open another ticket to suggest some changes to index definitions 
to make this consistent.
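
A small Python sketch of how the two pieces could fit together, assuming the distributed index is just a list of (index value, row key) pairs kept sorted by value. The function name, the callable "filters", and the in-memory data layout are all hypothetical and only illustrate the intended split between the primary range and the filters.

```python
import bisect
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def range_then_filter(index: List[Tuple[int, str]], rows: Dict[str, Row],
                      start: int, finish: int,
                      filters: List[Callable[[Row], bool]],
                      count: int) -> List[str]:
    """Walk the sorted index over [start, finish], applying filters per row."""
    matched: List[str] = []
    # (start,) sorts before any (start, key), so this finds the first value >= start.
    i = bisect.bisect_left(index, (start,))
    while i < len(index) and index[i][0] <= finish and len(matched) < count:
        _, key = index[i]
        if all(f(rows[key]) for f in filters):
            matched.append(key)
        i += 1
    return matched
```

The result comes back in index order because the range walk defines the order; the filters only drop rows and never reorder them.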


> Add sort/order support for secondary indexing
> -
>
> Key: CASSANDRA-1599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1599
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API
>Reporter: Todd Nine
> Fix For: 0.8
>
>
> For a lot of users paging is a standard use case on many web applications.  
> It would be nice to allow paging as part of a Boolean Expression.
> Page -> start index
>      -> end index
>      -> page timestamp 
>      -> Sort Order
> When sorting, is it possible to sort both ASC and DESC? 
> 



[jira] Updated: (CASSANDRA-1599) Add sort/order support for secondary indexing

2010-10-10 Thread Stu Hood (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-1599:


Fix Version/s: (was: 0.7.0)
   0.8
  Summary: Add sort/order support for secondary indexing  (was: Add 
paging support for secondary indexing)




[jira] Commented: (CASSANDRA-1339) add support for GT, GTE index expressions

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919698#action_12919698
 ] 

Stu Hood commented on CASSANDRA-1339:
-

Actually, I'm not sure why a KEYS index would need to query more nodes for 
GT/GTE/LT/LTE than it does for EQ: locally, the operation is a merge of all 
index values that match the predicate, and the merged values should be 
completely consumed before querying the next node, right?
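
If I read this comment correctly, the idea can be sketched in Python using the example index rows from the ticket description. The `scan` helper and the in-memory layout are hypothetical, and the sketch assumes each index row is already sorted in partitioner order.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# One dict per node: index value -> row keys in partitioner order (ticket example).
NODES: List[Dict[int, List[str]]] = [
    {4: ["A", "G", "K"], 5: ["B", "F", "J", "M"]},   # Node 1, ('' - M]
    {4: ["N", "P", "X"], 5: ["Q", "R", "T"]},        # Node 2, (M - '']
]


def scan(nodes: List[Dict[int, List[str]]],
         predicate: Callable[[int], bool],
         count: int) -> Tuple[List[str], int]:
    """Return up to `count` row keys and the number of nodes contacted."""
    results: List[str] = []
    contacted = 0
    for index_rows in nodes:
        if len(results) >= count:
            break
        contacted += 1
        # Local step: merge every index row whose value matches the predicate.
        matching = [keys for value, keys in index_rows.items() if predicate(value)]
        for key in heapq.merge(*matching):
            if len(results) >= count:
                break
            results.append(key)
    return results, contacted
```

With this data, a GTE 4 query asking for five rows is satisfied by node 1 alone, which is the same node-by-node behaviour as the EQ case.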



[jira] Commented: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919695#action_12919695
 ] 

Stu Hood commented on CASSANDRA-1599:
-

This ticket should probably be titled "allow sorting by index value", since 
that is not yet possible, and the paging concerns are not valid until it is 
implemented.



[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919695#action_12919695
 ] 

Stu Hood edited comment on CASSANDRA-1599 at 10/10/10 10:44 PM:


This ticket should probably be titled "allow sorting by index value", since 
that is not yet possible, and the paging concerns are not valid until it is 
implemented.

  was (Author: stuhood):
This ticket should probably by titled "allow sorting by index value", since 
that is not yet possible, and the paging concerns are not valid until it is 
implemented.
  


[jira] Created: (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices

2010-10-10 Thread Stu Hood (JIRA)
Merge get_indexed_slices with get_range_slices
--

 Key: CASSANDRA-1600
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Stu Hood
Priority: Critical
 Fix For: 0.7.0


From a comment on 1157:
{quote}
IndexClause only has a start key for get_indexed_slices, but it would seem that 
the reasoning behind using 'KeyRange' for get_range_slices applies there as 
well, since if you know the range you care about in the primary index, you 
don't want to continue scanning until you exhaust 'count' (or the cluster).

Since it would appear that get_indexed_slices would benefit from a KeyRange, 
why not smash get_(range|indexed)_slices together, and make IndexClause an 
optional field on KeyRange?
{quote}
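
A minimal Python sketch of what the merged call could look like. The `KeyRange` class here is only loosely modelled on the Thrift struct; the `index_clause` field is the hypothetical addition, expressions are represented as plain callables, and the sketch assumes row keys are stored in sorted order.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Expression = Callable[[Row], bool]   # stand-in for an index expression


@dataclass
class KeyRange:
    start_key: str = ""      # "" means unbounded in this sketch
    end_key: str = ""
    count: int = 100
    index_clause: Optional[List[Expression]] = None   # hypothetical optional field


def get_range_slices(rows: Dict[str, Row], key_range: KeyRange) -> List[str]:
    """Scan the key range, applying the index clause when one is given."""
    matched: List[str] = []
    for key in sorted(rows):
        if len(matched) >= key_range.count:
            break
        if key < key_range.start_key:
            continue
        if key_range.end_key and key > key_range.end_key:
            break   # stop at the end of the range instead of exhausting count
        clause = key_range.index_clause or []
        if all(expr(rows[key]) for expr in clause):
            matched.append(key)
    return matched
```

Leaving `index_clause` unset gives the plain range scan, so one call covers both existing use cases.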



[jira] Commented: (CASSANDRA-1339) add support for GT, GTE index expressions

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919694#action_12919694
 ] 

Stu Hood commented on CASSANDRA-1339:
-

Note that CASSANDRA-1472 already implements LT/LTE/GT/GTE locally, and can 
_very_ easily perform boolean operations between them.



[jira] Updated: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Stu Hood (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-1599:


Component/s: API



[jira] Updated: (CASSANDRA-1598) Add Boolean Expression to secondary querying

2010-10-10 Thread Stu Hood (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-1598:


Component/s: (was: Core)
 API



[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919687#action_12919687
 ] 

Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM:


Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days
    || newmessagedate > today - 1 day). 

Which start key do I advance, one, both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and "diff" it with the expression tree 
provided.  I'm not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications once union and 
intersections are implemented in Cassandra.  

  was (Author: tnine):
Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days
    || newmessagedate > today - 1 day). 

Which start key do I advance, one, both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and "diff" it with the expression tree 
provided.  I'm not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications once union and 
intersections are implemented server side.  
  


[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Todd Nine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919687#action_12919687
 ] 

Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM:


Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days
    || newmessagedate > today - 1 day). 

Which start key do I advance, one, both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and "diff" it with the expression tree 
provided.  I'm not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications once union and 
intersections are implemented server side.  

  was (Author: tnine):
Consider a query similar to the following. 


email == 'b...@gmail.com' && (lastlogindate > today - 5 days
    || newmessagedate > today - 1 day). 

Which start key do I advance, one, both?  As a client I would have to iterate 
over every field in the expression tree to determine what my start key should 
be for two index clauses.  While this is not impossible, this becomes very 
complex for large boolean operand trees.  As a user, this functionality would 
provide a clean interface that abstracts the user from the need to perform an 
analysis of the previous result set and "diff" it with the expression tree 
provided.  Not saying it's an absolute must have, but it would certainly 
provide a lot of appeal to users that are utilizing Cassandra as an eventually 
consistent storage mechanism for web based applications.
  


[jira] Commented: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919684#action_12919684
 ] 

Jonathan Ellis commented on CASSANDRA-1599:
---

how is this different from IndexClause.start_key?



[jira] Updated: (CASSANDRA-1598) Add Boolean Expression to secondary querying

2010-10-10 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1598:
--

Fix Version/s: 0.8

Tagging this fix for 0.8 since it's starting to get late in the game to break 
clients again for 0.7 and OR is trivial to do client-side.
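
To illustrate the client-side OR mentioned above, here is a minimal Python sketch. It assumes a hypothetical `run_query` callable that executes one AND-only index clause and returns (key, columns) pairs. It is not the real client API. The OR is just the union of the individual result sets, de-duplicated by row key.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape: (row key, columns).
Row = Tuple[bytes, Dict[str, object]]


def client_side_or(run_query: Callable[[object], Iterable[Row]],
                   clauses: List[object]) -> Dict[bytes, Dict[str, object]]:
    """Run each clause separately and merge the results by row key."""
    merged: Dict[bytes, Dict[str, object]] = {}
    for clause in clauses:
        for key, columns in run_query(clause):
            # Keep the first copy of a row that matches several clauses.
            merged.setdefault(key, columns)
    return merged
```

The cost of this approach is one round trip per clause, and any paging across the merged result has to be handled by the client as well.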

> Add Boolean Expression to secondary querying
> 
>
> Key: CASSANDRA-1598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1598
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
> Affects Versions: 0.7.0
> Reporter: Todd Nine
> Fix For: 0.8
>
>
> Add boolean operators similar to Lucene-style searches.  Currently there is 
> implicit support for the && operator.  It would be helpful to also add 
> support for ||/Union operators.  I envision that the client would be 
> required to construct the expression tree and pass it via the Thrift 
> interface.
> BooleanExpression --> BooleanOrIndexExpression
>                   --> BooleanOperator
>                   --> BooleanOrIndexExpression
> I'd like to take a crack at this since it will greatly improve my Datanucleus 
> plugin




[jira] Created: (CASSANDRA-1599) Add paging support for secondary indexing

2010-10-10 Thread Todd Nine (JIRA)
Add paging support for secondary indexing
-

 Key: CASSANDRA-1599
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599
 Project: Cassandra
  Issue Type: New Feature
Reporter: Todd Nine
 Fix For: 0.7.0


For a lot of users paging is a standard use case on many web applications.  It 
would be nice to allow paging as part of a Boolean Expression.

Page -> start index
   -> end index
   -> page timestamp 
   -> Sort Order


When sorting, is it possible to sort both ASC and DESC? 









[jira] Created: (CASSANDRA-1598) Add Boolean Expression to secondary querying

2010-10-10 Thread Todd Nine (JIRA)
Add Boolean Expression to secondary querying


 Key: CASSANDRA-1598
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1598
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.7.0
Reporter: Todd Nine


Add boolean operators similar to Lucene-style searches.  Currently there is 
implicit support for the && operator.  It would be helpful to also add support 
for ||/Union operators.  I envision that the client would be required to 
construct the expression tree and pass it via the Thrift interface.



BooleanExpression --> BooleanOrIndexExpression
                  --> BooleanOperator
                  --> BooleanOrIndexExpression


I'd like to take a crack at this since it will greatly improve my Datanucleus 
plugin
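
One way to read the tree outlined above, as a minimal Python sketch. All names besides BooleanExpression and BooleanOperator are hypothetical: `LeafExpression` stands in for a single index expression (equality only), and `matches` is an illustrative in-memory evaluator, not anything that exists in Cassandra or its Thrift interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Union


class BooleanOperator(Enum):
    AND = "AND"
    OR = "OR"


@dataclass(frozen=True)
class LeafExpression:
    """Hypothetical stand-in for one index expression (column == value)."""
    column: str
    value: object


@dataclass(frozen=True)
class BooleanExpression:
    """Two operands joined by an operator; each operand may be a subtree."""
    left: "Node"
    op: BooleanOperator
    right: "Node"


# The "BooleanOrIndexExpression" of the ticket: either a subtree or a leaf.
Node = Union[LeafExpression, BooleanExpression]


def matches(node: Node, row: Dict[str, object]) -> bool:
    """Evaluate the tree against one row held as a plain dict."""
    if isinstance(node, LeafExpression):
        return row.get(node.column) == node.value
    left = matches(node.left, row)
    if node.op is BooleanOperator.AND:
        return left and matches(node.right, row)
    return left or matches(node.right, row)
```

A client would build the tree bottom-up, for example `BooleanExpression(LeafExpression("state", "UT"), BooleanOperator.OR, LeafExpression("state", "TX"))`, and a server-side implementation would walk it against the indexes instead of against a dict.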





[jira] Commented: (CASSANDRA-1572) Faster index sampling

2010-10-10 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919640#action_12919640
 ] 

Stu Hood commented on CASSANDRA-1572:
-

> this raises the question of whether key cache saving should be on by default 
> (which it currently is)

I didn't test which portion of the gains came from (1) not allocating the byte 
array for the key, (2) not decoding/decorating the key, or (3) the 
pre-allocation. If we decorated the members of the key-cache set as ByteBuffers 
rather than DecoratedKeys, we could still take advantage of (forms of) (1) and 
(2).

> Faster index sampling
> -
>
> Key: CASSANDRA-1572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1572
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Reporter: Jonathan Ellis
> Assignee: Stu Hood
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: 
> 0001-Split-IndexSummary.maybeAddEntry-into-shouldAddEntry.patch, 
> 0002-Conditionally-decode-row-keys-during-SSTableReader.o.patch, 
> 0003-Add-FBUtilities.skipShortByteArray-and-use-to-minimi.patch, 
> 0004-Pre-allocate-indexPositions-to-minimize-resizing.patch, 
> 0005-Better-key-estimate-for-SSTable-load.patch
>
>
> some discussion on CASSANDRA-1526




[jira] Commented: (CASSANDRA-1576) Improve the I/O subsystem for ROW-READ stage

2010-10-10 Thread Chris Goffinet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919580#action_12919580
 ] 

Chris Goffinet commented on CASSANDRA-1576:
---

Apologies, I typo'd! I meant, asynchronous and synchronous. Long day when I 
wrote this.

I observed within the read stage. Yes on (2). I propose we look at using libaio 
so we can send batches of requests as they come in.

http://www.kernel.org/pub/linux/kernel/people/suparna/aio-linux.pdf
http://lse.sourceforge.net/io/aio.html

> Improve the I/O subsystem for ROW-READ stage
> 
>
> Key: CASSANDRA-1576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Affects Versions: 0.6.5, 0.7 beta 2
> Reporter: Chris Goffinet
>
> I did some profiling a while ago, and noticed that there is quite a bit of 
> overhead that is happening in the ROW-READ stage of Cassandra. My testing was 
> on 0.6 branch. Jonathan mentioned there is endpoint snitch caching in 0.7. 
> One of the pain points is that we do synchronize I/O in our threads. I have 
> observed through profiling and other benchmarks, that even having a very 
> powerful machine (16-core Nehalem, 32GB of RAM), the amount of overhead of 
> going through to the page cache can still be between 2-3ms (with mmap). I 
> observed at least 800 microseconds more overhead if not using mmap. There is 
> definitely overhead in this stage. I propose we seriously consider moving to 
> doing Asynchronous I/O in each of these threads instead. 
> Imagine the following scenario:
> 3ms with mmap to read from page cache + 1.1ms of function call overhead 
> (observed google iterators in 0.6, could be much better in 0.7)
> That's 4.1ms per message. With 32 threads, at best the machine is only going 
> to be able to serve:
> 7,804 messages/s. 
> This number also means that all your data has to be in page cache. If you 
> start to dip into any set of data that isn't in cache, this number is going 
> to drop substantially, even if your hit rate was 99%.
> Anyone with a serious data set that is greater than the total page cache of 
> the cluster is going to fall victim to major slowdowns as soon as any requests 
> come in needing to fetch data not in cache. If you run without the Direct I/O 
> patch, and you actually have a pretty good write load, you can expect your 
> cluster to fall victim even more to page cache thrashing as new SSTables 
> are read/written by compaction.
> All of the scenarios mentioned above were seen at Digg with a 45-node 
> cluster of 16-core machines and a dataset larger than total page cache.
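
The 7,804 messages/s figure quoted above follows from simple arithmetic, shown here as a small Python sketch. The function name is made up for illustration, and the model assumes each thread handles one message at a time with a fixed per-message latency, which is the simplification the ticket itself uses.

```python
def max_messages_per_second(threads: int, latency_ms: float) -> float:
    """Upper bound on throughput when each thread blocks for latency_ms
    on every message (1000 ms per second divided by the latency, times
    the number of threads)."""
    return threads * 1000.0 / latency_ms


# 3 ms page-cache read (mmap) + 1.1 ms function call overhead = 4.1 ms
per_message_ms = 3.0 + 1.1
best_case = max_messages_per_second(32, per_message_ms)
print(int(best_case))  # 32 * 1000 / 4.1 is about 7804.9, truncated to 7804
```

Any latency added by reads that miss the page cache increases `latency_ms` and lowers this bound. The sketch does not try to model how large that effect is.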
