Re: Indexes on heterogeneous rows

2011-04-17 Thread David Boxenhorn
Thanks, Jonathan. I think I understand now.

To sum up: Everything would work, but if your only equality is on type
(all the rest inequalities), it could be very inefficient.

Is that right?

On Thu, Apr 14, 2011 at 7:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Thu, Apr 14, 2011 at 6:48 AM, David Boxenhorn da...@taotown.com
 wrote:
  The reason why I put type first is that queries on type will
  always be an exact match, whereas the other clauses might be
 inequalities.

 Expression order doesn't matter, but as you imply, non-equalities
 can't be used in an index lookup and have to be checked in a nested
 loop phase afterwards.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Indexes on heterogeneous rows

2011-04-17 Thread Jonathan Ellis
Right.

On Sun, Apr 17, 2011 at 4:23 AM, David Boxenhorn da...@taotown.com wrote:
 Thanks, Jonathan. I think I understand now.

 To sum up: Everything would work, but if your only equality is on type
 (all the rest inequalities), it could be very inefficient.

 Is that right?

 On Thu, Apr 14, 2011 at 7:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Thu, Apr 14, 2011 at 6:48 AM, David Boxenhorn da...@taotown.com
 wrote:
  The reason why I put type first is that queries on type will
  always be an exact match, whereas the other clauses might be
  inequalities.

 Expression order doesn't matter, but as you imply, non-equalities
 can't be used in an index lookup and have to be checked in a nested
 loop phase afterwards.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Indexes on heterogeneous rows

2011-04-15 Thread Wangpei (Peter)
Does the get_indexed_slice in 0.7.4 version already do thing that way?
It seems always take the 1st indexed column with EQ.
Or is it a new feature of coming 0.7.5 or 0.8?

-邮件原件-
发件人: Jonathan Ellis [mailto:jbel...@gmail.com] 
发送时间: 2011年4月15日 0:21
收件人: user@cassandra.apache.org
抄送: David Boxenhorn; aaron morton
主题: Re: Indexes on heterogeneous rows

This should work reasonably well w/ 0.7 indexes. Cassandra tracks
statistics on index selectivity, so it would plan that query as index
lookup on e=5, then iterate over those results and return only rows
that also have type=2.

On Thu, Apr 14, 2011 at 5:33 AM, David Boxenhorn da...@taotown.com wrote:
 Thank you for your answer, and sorry about the sloppy terminology.

 I'm thinking of the scenario where there are a small number of results in
 the result set, but there are billions of rows in the first of your
 secondary indexes.

 That is, I want to do something like (not sure of the CQL syntax):

 select * where type=2 and e=5

 where there are billions of rows of type 2, but some manageable number of
 those rows have e=5.

 As I understand it, secondary indexes are like column families, where each
 value is a column. So the billions of rows where type=2 would go into a
 single row of the secondary index. This sounds like a problem to me, is it?

 I'm assuming that the billions of rows that don't have column e at all
 (those rows of other types) are not a problem at all...

 On Thu, Apr 14, 2011 at 12:12 PM, aaron morton aa...@thelastpickle.com
 wrote:

 Need to clear up some terminology here.
 Rows have a key and can be retrieved by key. This is *sort of* the primary
 index, but not primary in the normal RDBMS sense.
 Rows can have different columns and the column names are sorted and can be
 efficiently selected.
 There are secondary indexes in cassandra 0.7 based on column
 values http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
 So you could create secondary indexes on the a,e, and h columns and get
 rows that have specific values. There are some limitations to secondary
 indexes, read the linked article.
 Or you can make your own secondary indexes using row keys as the index
 values.
 If you have billions of rows, how many do you need to read back at once?
 Hope that helps
 Aaron

 On 14 Apr 2011, at 04:23, David Boxenhorn wrote:

 Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
 different sets of columns?

 For example, let's say you have three types of objects (1, 2, 3) which
 each had three members. If your rows had the following pattern

 type=1 a=? b=? c=?
 type=2 d=? e=? f=?
 type=3 g=? h=? i=?

 could you index type as your primary index, and also index a, e, h
 as secondary indexes, to get the objects of that type that you are looking
 for?

 Would it work if you had billions of rows of each type?






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Indexes on heterogeneous rows

2011-04-14 Thread David Boxenhorn
Thank you for your answer, and sorry about the sloppy terminology.

I'm thinking of the scenario where there are a small number of results in
the result set, but there are billions of rows in the first of your
secondary indexes.

That is, I want to do something like (not sure of the CQL syntax):

select * where type=2 and e=5

where there are billions of rows of type 2, but some manageable number of
those rows have e=5.

As I understand it, secondary indexes are like column families, where each
value is a column. So the billions of rows where type=2 would go into a
single row of the secondary index. This sounds like a problem to me, is it?


I'm assuming that the billions of rows that don't have column e at all
(those rows of other types) are not a problem at all...

On Thu, Apr 14, 2011 at 12:12 PM, aaron morton aa...@thelastpickle.comwrote:

 Need to clear up some terminology here.

 Rows have a key and can be retrieved by key. This is *sort of* the primary
 index, but not primary in the normal RDBMS sense.
 Rows can have different columns and the column names are sorted and can be
 efficiently selected.
 There are secondary indexes in cassandra 0.7 based on column values
 http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

 So you could create secondary indexes on the a,e, and h columns and get
 rows that have specific values. There are some limitations to secondary
 indexes, read the linked article.

 Or you can make your own secondary indexes using row keys as the index
 values.

 If you have billions of rows, how many do you need to read back at once?

 Hope that helps
 Aaron

 On 14 Apr 2011, at 04:23, David Boxenhorn wrote:

 Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
 different sets of columns?

 For example, let's say you have three types of objects (1, 2, 3) which each
 had three members. If your rows had the following pattern

 type=1 a=? b=? c=?
 type=2 d=? e=? f=?
 type=3 g=? h=? i=?

 could you index type as your primary index, and also index a, e, h
 as secondary indexes, to get the objects of that type that you are looking
 for?

 Would it work if you had billions of rows of each type?





Indexes on heterogeneous rows

2011-04-13 Thread David Boxenhorn
Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
different sets of columns?

For example, let's say you have three types of objects (1, 2, 3) which each
had three members. If your rows had the following pattern

type=1 a=? b=? c=?
type=2 d=? e=? f=?
type=3 g=? h=? i=?

could you index type as your primary index, and also index a, e, h
as secondary indexes, to get the objects of that type that you are looking
for?

Would it work if you had billions of rows of each type?