Re: question about secondary index or not

2014-01-30 Thread Edward Capriolo
There is a aubtle difference between work well amd efficient design.

Say you add this index, that is a huge cost on disk just because cql may
not allow the where clause you want.

Shameless plug but this is why i worked on intravert...server side paging
may be the right answer here. I plan on opening that work all up again and
finding a way to get it merged into cassandra.

On Wednesday, January 29, 2014, Mullen, Robert robert.mul...@pearson.com
wrote:
 Thanks for that info ondrej, I've never tested out secondary indexes as
I've avoided them because of all the uncertainty around them, and your
statement just adds to the uncertainty.  Everything I had read said that
secondary indexes were supposed to work well for columns with low
cardinality, but I guess that's not always the case.
 peace,
 Rob

 On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote:

 Hi,
 we had a similar use case. Just do the filtering client-side, the #2
example performs horribly, secondary indexes on something dividing the set
into two roughly the same size subsets just don't work.
 Give it a try on localhost with just a couple of records (150.000), you
will see.
 regards,
 ondrej

 On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 in my #2 example:
 select * from people where company_id='xxx' and gender='male'
 I already specify the first part of the primary key(row key) in my
where clause, so how does the secondary indexed column gender='male help
determine which row to return? It is more like filtering a list of column
from a row(which is exactly I can do that in #1 example).
 But then if I don't create index first, the cql statement will run into
syntax error.



 On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
robert.mul...@pearson.com wrote:

 I would do #2.   Take a look at this blog which talks about secondary
indexes, cardinality, and what it means for cassandra.   Secondary indexes
in cassandra are a different beast, so often old rules of thumb about
indexes don't apply.   http://www.wentnet.com/blog/?p=77

 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo 
edlinuxg...@gmail.com wrote:

 Generally indexes on binary fields true/false male/female are not
terrible effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com
wrote:

 I have a simple column family like the following
 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );
 if I want to find out all the male employee given a company id, I
can do
 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employee who has
gender column value equal to male
 2/
 add a seconday index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'

 I though #2 seems more appropriate, but I also thought the secondary
index is helping only locating the primary row key, with the select clause
in #2, is it more efficient than #1 where application responsible loop
through the result and filter the right content?
 (
 It totally make sense if I only need to find out all the male
employee(and not within a company) by using
 select * from people where gender='male
 )
 thanks






-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: question about secondary index or not

2014-01-30 Thread Narendra Sharma
I am sure there will be other attributes associated with employee. Reading
and throwing away records on the client is not good.

Better maintain another column family that holds reference to only male
employees. This will make your pagination logic simple on the client side
without wasting resources on server or client side.

My experience with secondary indexes was also not good. My own index CF
gave 100% better performance than secondary index for the same usecase and
result.




On Thu, Jan 30, 2014 at 6:41 AM, Edward Capriolo edlinuxg...@gmail.comwrote:

 There is a aubtle difference between work well amd efficient design.

 Say you add this index, that is a huge cost on disk just because cql may
 not allow the where clause you want.

 Shameless plug but this is why i worked on intravert...server side paging
 may be the right answer here. I plan on opening that work all up again and
 finding a way to get it merged into cassandra.


 On Wednesday, January 29, 2014, Mullen, Robert robert.mul...@pearson.com
 wrote:
  Thanks for that info ondrej, I've never tested out secondary indexes as
 I've avoided them because of all the uncertainty around them, and your
 statement just adds to the uncertainty.  Everything I had read said that
 secondary indexes were supposed to work well for columns with low
 cardinality, but I guess that's not always the case.
  peace,
  Rob
 
  On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com
 wrote:
 
  Hi,
  we had a similar use case. Just do the filtering client-side, the #2
 example performs horribly, secondary indexes on something dividing the set
 into two roughly the same size subsets just don't work.
  Give it a try on localhost with just a couple of records (150.000), you
 will see.
  regards,
  ondrej
 
  On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com
 wrote:
 
  in my #2 example:
  select * from people where company_id='xxx' and gender='male'
  I already specify the first part of the primary key(row key) in my
 where clause, so how does the secondary indexed column gender='male help
 determine which row to return? It is more like filtering a list of column
 from a row(which is exactly I can do that in #1 example).
  But then if I don't create index first, the cql statement will run
 into syntax error.
 
 
 
  On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:
 
  I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77
 
  On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:
 
  Generally indexes on binary fields true/false male/female are not
 terrible effective.
 
 
  On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com
 wrote:
 
  I have a simple column family like the following
  create table people(
  company_id text,
  employee_id text,
  gender text,
  primary key(company_id, employee_id)
  );
  if I want to find out all the male employee given a company id, I
 can do
  1/
  select * from people where company_id='
  and loop through the result efficiently to pick the employee who
 has gender column value equal to male
  2/
  add a seconday index
  create index gender_index on people(gender)
  select * from people where company_id='xxx' and gender='male'
 
  I though #2 seems more appropriate, but I also thought the
 secondary index is helping only locating the primary row key, with the
 select clause in #2, is it more efficient than #1 where application
 responsible loop through the result and filter the right content?
  (
  It totally make sense if I only need to find out all the male
 employee(and not within a company) by using
  select * from people where gender='male
  )
  thanks
 
 
 
 
 

 --
 Sorry this was sent from mobile. Will do less grammar and spell check than
 usual.




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com http://www.aeris.com*
*http://narendrasharma.blogspot.com/ http://narendrasharma.blogspot.com/*


Re: question about secondary index or not

2014-01-29 Thread Ondřej Černoš
Hi,

we had a similar use case. Just do the filtering client-side, the #2
example performs horribly, secondary indexes on something dividing the set
into two roughly the same size subsets just don't work.

Give it a try on localhost with just a couple of records (150.000), you
will see.

regards,

ondrej


On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 in my #2 example:
 select * from people where company_id='xxx' and gender='male'

 I already specify the first part of the primary key(row key) in my where
 clause, so how does the secondary indexed column gender='male help
 determine which row to return? It is more like filtering a list of column
 from a row(which is exactly I can do that in #1 example).
 But then if I don't create index first, the cql statement will run into
 syntax error.




 On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77


 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo 
 edlinuxg...@gmail.comwrote:

 Generally indexes on binary fields true/false male/female are not
 terrible effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employee given a company id, I can
 do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employee who has
 gender column value equal to male

 2/
 add a seconday index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I though #2 seems more appropriate, but I also thought the secondary
 index is helping only locating the primary row key, with the select clause
 in #2, is it more efficient than #1 where application responsible loop
 through the result and filter the right content?

 (
 It totally make sense if I only need to find out all the male
 employee(and not within a company) by using
 select * from people where gender='male
 )

 thanks







Re: question about secondary index or not

2014-01-29 Thread Mullen, Robert
Thanks for that info ondrej, I've never tested out secondary indexes as
I've avoided them because of all the uncertainty around them, and your
statement just adds to the uncertainty.  Everything I had read said that
secondary indexes were supposed to work well for columns with low
cardinality, but I guess that's not always the case.

peace,
Rob


On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote:

 Hi,

 we had a similar use case. Just do the filtering client-side, the #2
 example performs horribly, secondary indexes on something dividing the set
 into two roughly the same size subsets just don't work.

 Give it a try on localhost with just a couple of records (150.000), you
 will see.

 regards,

 ondrej


 On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 in my #2 example:
 select * from people where company_id='xxx' and gender='male'

 I already specify the first part of the primary key(row key) in my where
 clause, so how does the secondary indexed column gender='male help
 determine which row to return? It is more like filtering a list of column
 from a row(which is exactly I can do that in #1 example).
 But then if I don't create index first, the cql statement will run into
 syntax error.




 On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77


 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com
  wrote:

 Generally indexes on binary fields true/false male/female are not
 terrible effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employee given a company id, I
 can do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employee who has
 gender column value equal to male

 2/
 add a seconday index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I though #2 seems more appropriate, but I also thought the secondary
 index is helping only locating the primary row key, with the select clause
 in #2, is it more efficient than #1 where application responsible loop
 through the result and filter the right content?

 (
 It totally make sense if I only need to find out all the male
 employee(and not within a company) by using
 select * from people where gender='male
 )

 thanks








Re: question about secondary index or not

2014-01-28 Thread Edward Capriolo
Generally indexes on binary fields true/false male/female are not terrible
effective.


On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employee given a company id, I can do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employee who has
 gender column value equal to male

 2/
 add a seconday index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I though #2 seems more appropriate, but I also thought the secondary index
 is helping only locating the primary row key, with the select clause in #2,
 is it more efficient than #1 where application responsible loop through the
 result and filter the right content?

 (
 It totally make sense if I only need to find out all the male employee(and
 not within a company) by using
 select * from people where gender='male
 )

 thanks



question about secondary index or not

2014-01-28 Thread Jimmy Lin
I have a simple column family like the following

create table people(
company_id text,
employee_id text,
gender text,
primary key(company_id, employee_id)
);

if I want to find out all the male employee given a company id, I can do

1/
select * from people where company_id='
and loop through the result efficiently to pick the employee who has gender
column value equal to male

2/
add a seconday index
create index gender_index on people(gender)
select * from people where company_id='xxx' and gender='male'


I though #2 seems more appropriate, but I also thought the secondary index
is helping only locating the primary row key, with the select clause in #2,
is it more efficient than #1 where application responsible loop through the
result and filter the right content?

(
It totally make sense if I only need to find out all the male employee(and
not within a company) by using
select * from people where gender='male
)

thanks


Re: question about secondary index or not

2014-01-28 Thread Mullen, Robert
I would do #2.   Take a look at this blog which talks about secondary
indexes, cardinality, and what it means for cassandra.   Secondary indexes
in cassandra are a different beast, so often old rules of thumb about
indexes don't apply.   http://www.wentnet.com/blog/?p=77


On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.comwrote:

 Generally indexes on binary fields true/false male/female are not terrible
 effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employee given a company id, I can do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employee who has
 gender column value equal to male

 2/
 add a seconday index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I though #2 seems more appropriate, but I also thought the secondary
 index is helping only locating the primary row key, with the select clause
 in #2, is it more efficient than #1 where application responsible loop
 through the result and filter the right content?

 (
 It totally make sense if I only need to find out all the male
 employee(and not within a company) by using
 select * from people where gender='male
 )

 thanks





Re: question about secondary index or not

2014-01-28 Thread Jimmy Lin
in my #2 example:
select * from people where company_id='xxx' and gender='male'

I already specify the first part of the primary key(row key) in my where
clause, so how does the secondary indexed column gender='male help
determine which row to return? It is more like filtering a list of column
from a row(which is exactly I can do that in #1 example).
But then if I don't create index first, the cql statement will run into
syntax error.




On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert
robert.mul...@pearson.comwrote:

 I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77


 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo 
 edlinuxg...@gmail.comwrote:

 Generally indexes on binary fields true/false male/female are not
 terrible effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employee given a company id, I can
 do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employee who has
 gender column value equal to male

 2/
 add a seconday index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I though #2 seems more appropriate, but I also thought the secondary
 index is helping only locating the primary row key, with the select clause
 in #2, is it more efficient than #1 where application responsible loop
 through the result and filter the right content?

 (
 It totally make sense if I only need to find out all the male
 employee(and not within a company) by using
 select * from people where gender='male
 )

 thanks