Re: question about secondary index or not
There is a aubtle difference between work well amd efficient design. Say you add this index, that is a huge cost on disk just because cql may not allow the where clause you want. Shameless plug but this is why i worked on intravert...server side paging may be the right answer here. I plan on opening that work all up again and finding a way to get it merged into cassandra. On Wednesday, January 29, 2014, Mullen, Robert robert.mul...@pearson.com wrote: Thanks for that info ondrej, I've never tested out secondary indexes as I've avoided them because of all the uncertainty around them, and your statement just adds to the uncertainty. Everything I had read said that secondary indexes were supposed to work well for columns with low cardinality, but I guess that's not always the case. peace, Rob On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote: Hi, we had a similar use case. Just do the filtering client-side, the #2 example performs horribly, secondary indexes on something dividing the set into two roughly the same size subsets just don't work. Give it a try on localhost with just a couple of records (150.000), you will see. regards, ondrej On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: in my #2 example: select * from people where company_id='xxx' and gender='male' I already specify the first part of the primary key(row key) in my where clause, so how does the secondary indexed column gender='male help determine which row to return? It is more like filtering a list of column from a row(which is exactly I can do that in #1 example). But then if I don't create index first, the cql statement will run into syntax error. On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote: I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Generally indexes on binary fields true/false male/female are not terrible effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks -- Sorry this was sent from mobile. Will do less grammar and spell check than usual.
Re: question about secondary index or not
I am sure there will be other attributes associated with employee. Reading and throwing away records on the client is not good. Better maintain another column family that holds reference to only male employees. This will make your pagination logic simple on the client side without wasting resources on server or client side. My experience with secondary indexes was also not good. My own index CF gave 100% better performance than secondary index for the same usecase and result. On Thu, Jan 30, 2014 at 6:41 AM, Edward Capriolo edlinuxg...@gmail.comwrote: There is a aubtle difference between work well amd efficient design. Say you add this index, that is a huge cost on disk just because cql may not allow the where clause you want. Shameless plug but this is why i worked on intravert...server side paging may be the right answer here. I plan on opening that work all up again and finding a way to get it merged into cassandra. On Wednesday, January 29, 2014, Mullen, Robert robert.mul...@pearson.com wrote: Thanks for that info ondrej, I've never tested out secondary indexes as I've avoided them because of all the uncertainty around them, and your statement just adds to the uncertainty. Everything I had read said that secondary indexes were supposed to work well for columns with low cardinality, but I guess that's not always the case. peace, Rob On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote: Hi, we had a similar use case. Just do the filtering client-side, the #2 example performs horribly, secondary indexes on something dividing the set into two roughly the same size subsets just don't work. Give it a try on localhost with just a couple of records (150.000), you will see. regards, ondrej On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: in my #2 example: select * from people where company_id='xxx' and gender='male' I already specify the first part of the primary key(row key) in my where clause, so how does the secondary indexed column gender='male help determine which row to return? It is more like filtering a list of column from a row(which is exactly I can do that in #1 example). But then if I don't create index first, the cql statement will run into syntax error. On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote: I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Generally indexes on binary fields true/false male/female are not terrible effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks -- Sorry this was sent from mobile. Will do less grammar and spell check than usual. -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.aeris.com* *http://narendrasharma.blogspot.com/ http://narendrasharma.blogspot.com/*
Re: question about secondary index or not
Hi, we had a similar use case. Just do the filtering client-side, the #2 example performs horribly, secondary indexes on something dividing the set into two roughly the same size subsets just don't work. Give it a try on localhost with just a couple of records (150.000), you will see. regards, ondrej On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: in my #2 example: select * from people where company_id='xxx' and gender='male' I already specify the first part of the primary key(row key) in my where clause, so how does the secondary indexed column gender='male help determine which row to return? It is more like filtering a list of column from a row(which is exactly I can do that in #1 example). But then if I don't create index first, the cql statement will run into syntax error. On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote: I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.comwrote: Generally indexes on binary fields true/false male/female are not terrible effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks
Re: question about secondary index or not
Thanks for that info ondrej, I've never tested out secondary indexes as I've avoided them because of all the uncertainty around them, and your statement just adds to the uncertainty. Everything I had read said that secondary indexes were supposed to work well for columns with low cardinality, but I guess that's not always the case. peace, Rob On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote: Hi, we had a similar use case. Just do the filtering client-side, the #2 example performs horribly, secondary indexes on something dividing the set into two roughly the same size subsets just don't work. Give it a try on localhost with just a couple of records (150.000), you will see. regards, ondrej On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: in my #2 example: select * from people where company_id='xxx' and gender='male' I already specify the first part of the primary key(row key) in my where clause, so how does the secondary indexed column gender='male help determine which row to return? It is more like filtering a list of column from a row(which is exactly I can do that in #1 example). But then if I don't create index first, the cql statement will run into syntax error. On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote: I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Generally indexes on binary fields true/false male/female are not terrible effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks
Re: question about secondary index or not
Generally indexes on binary fields true/false male/female are not terrible effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks
question about secondary index or not
I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks
Re: question about secondary index or not
I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.comwrote: Generally indexes on binary fields true/false male/female are not terrible effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks
Re: question about secondary index or not
in my #2 example: select * from people where company_id='xxx' and gender='male' I already specify the first part of the primary key(row key) in my where clause, so how does the secondary indexed column gender='male help determine which row to return? It is more like filtering a list of column from a row(which is exactly I can do that in #1 example). But then if I don't create index first, the cql statement will run into syntax error. On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.comwrote: I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.comwrote: Generally indexes on binary fields true/false male/female are not terrible effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through the result efficiently to pick the employee who has gender column value equal to male 2/ add a seconday index create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I though #2 seems more appropriate, but I also thought the secondary index is helping only locating the primary row key, with the select clause in #2, is it more efficient than #1 where application responsible loop through the result and filter the right content? ( It totally make sense if I only need to find out all the male employee(and not within a company) by using select * from people where gender='male ) thanks