Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Michael Segel
Whoa!

BAD BOY. This isn’t a good idea for a secondary index.

You have a row key (primary index) which is time. 
The secondary is a filter… with 3 choices. 

HINT: Do you really want a secondary index based on a field that only has 3 
choices for a value? 

What are they teaching in school these days? 

How about applying a server side filter?  ;-) 



On May 18, 2014, at 12:33 PM, John Hancock jhancock1...@gmail.com wrote:

 Shushant,
 
 Here's one idea; there might be better ways.
 
 Take a look at Phoenix; it supports secondary indexing:
 http://phoenix.incubator.apache.org/secondary_indexing.html
 
 -John
 
 
 On Sat, May 17, 2014 at 8:34 AM, Shushant Arora
 shushantaror...@gmail.com wrote:
 
 Hi,
 
 I have a requirement to query my data based on date and user category.
 The user category can be Supreme, Normal, or Medium.
 
 I want to query how many new users there are in my table for the date range
 2014-01-01 to 2014-05-16, broken down by category.
 
 Another requirement is to query how many users of the Supreme category are
 in my table, broken down by the month in which they joined.
 
 What should my key be?
 1. If I take the key as date#category, I cannot query based on category.
 2. If I take the key as category#date, I cannot query based on date.
 
 
 Thanks
 Shushant.
 



Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Shushant Arora
I cannot apply a server-side filter.
The second requirement is not just to get the users in the Supreme category,
but the distribution of users by category:

1. How many Supreme, how many Normal, and how many Medium users there are to
date.




Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Michael Segel
The point is that a field with a small, finite set of values is not a good
candidate for indexing with an inverted table, a B-tree, etc.

I’d say that you’re actually going to be better off using a scan with a start
and stop row, then doing the counts on the client side.

So as you get back your result set, you process the data (either in an M/R job
or in a single client thread).
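
A minimal sketch of that approach, written in Python rather than against the
real HBase client so that it runs on its own. The row key layout
"<date>#<category>#<user id>", the sample rows, and the helper names are all
assumptions made for illustration; a sorted list stands in for HBase's
lexicographically ordered rows.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical row keys "<date>#<category>#<user id>", kept sorted the way
# HBase keeps rows in lexicographic key order.
ROWS = sorted([
    "2013-12-30#Normal#u01",
    "2014-01-01#Supreme#u02",
    "2014-01-15#Normal#u03",
    "2014-02-03#Medium#u04",
    "2014-05-16#Supreme#u05",
    "2014-05-17#Normal#u06",
])


def scan(rows, start_row, stop_row):
    """Return keys in [start_row, stop_row), like a scan with start and stop rows."""
    return rows[bisect_left(rows, start_row):bisect_left(rows, stop_row)]


def in_range(rows, first_day, last_day):
    """Yield (day, category, user) for an inclusive date range."""
    # "~" sorts after "#" in ASCII, so this stop row lies just past every
    # key that begins with last_day + "#".
    for key in scan(rows, first_day, last_day + "~"):
        yield key.split("#")


# Requirement 1: new users per category in the date range, counted client side.
counts = Counter(
    cat for _day, cat, _user in in_range(ROWS, "2014-01-01", "2014-05-16")
)

# Requirement 2: Supreme users broken down by month (first 7 chars of the date).
monthly = Counter(
    day[:7]
    for day, cat, _user in in_range(ROWS, "2014-01-01", "2014-05-16")
    if cat == "Supreme"
)

print(dict(counts))
print(dict(monthly))
```

With only three categories, the client reads every row in the date range
anyway, which is why a separate index on category buys little here.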

HTH



Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Shushant Arora
OK, but what if I have two multi-valued dimensions on which I have to analyse
the number of users? Say category can have 50 values and another dimension is
the user's country (say 100+ values). I need a weekly count by category and
country, and I also need an overall distinct user count by category and country.

How can I achieve this in HBase?




Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread James Taylor
If you use Phoenix, queries would leverage our Skip Scan:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html

Assuming a row key made up of a low cardinality first value (like a byte
representing an enum), followed by a high cardinality second value (like a
date/time value) you'd get a large benefit from the skip scan when you're
only looking at a small sliver of your time range.

Another option would be to create a secondary index over your second+first
column: http://phoenix.incubator.apache.org/secondary_indexing.html
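
To illustrate the idea behind a skip scan (this is a conceptual sketch, not
Phoenix's implementation), the Python below assumes a hypothetical key layout
"<category code>#<date>#<user id>" and uses a sorted list in place of an HBase
table. Instead of reading the whole table, it seeks once per category value
straight to the requested date slice.

```python
from bisect import bisect_left

# Hypothetical keys "<category code>#<date>#<user id>": the low-cardinality
# value comes first, the high-cardinality date second.
ROWS = sorted([
    "M#2014-02-03#u04",
    "N#2013-12-30#u01",
    "N#2014-01-15#u03",
    "N#2014-05-17#u06",
    "S#2014-01-01#u02",
    "S#2014-05-16#u05",
])
CATEGORIES = ["M", "N", "S"]  # assumed codes for Medium, Normal, Supreme


def skip_scan(rows, categories, first_day, last_day):
    """Collect keys whose date is in [first_day, last_day] for every category."""
    hits = []
    for cat in categories:
        # Seek directly to this category's slice of the date range and
        # skip everything between slices.
        lo = bisect_left(rows, f"{cat}#{first_day}")
        hi = bisect_left(rows, f"{cat}#{last_day}~")  # "~" sorts after "#"
        hits.extend(rows[lo:hi])
    return hits


hits = skip_scan(ROWS, CATEGORIES, "2014-01-01", "2014-01-31")
print(hits)
```

The cost is one seek per leading value, so this only pays off when the leading
column has few distinct values and the date slice is narrow.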

Thanks,
James


RE: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Vladimir Rodionov
 I cannot apply a server-side filter.

Why is that? Are you using stock HBase or some other API-compatible
product?


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


Confidentiality Notice:  The information contained in this message, including 
any attachments hereto, may be confidential and is intended to be read only by 
the individual or entity to whom this message is addressed. If the reader of 
this message is not the intended recipient or an agent or designee of the 
intended recipient, please note that any review, use, disclosure or 
distribution of this message or its attachments, in any form, is strictly 
prohibited.  If you have received this message in error, please immediately 
notify the sender and/or notificati...@carrieriq.com and delete or destroy any 
copy of this message and its attachments.


Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Shushant Arora
By server-side filter, do you mean partitioning the data across multiple HBase
tables, one for each category, or something else?



RE: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Vladimir Rodionov
Nope. A Filter lets you customize a Scan or Get operation. See the HBase
Javadoc for the org.apache.hadoop.hbase.filter.Filter class.
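
As a rough illustration of what "server side" means here, the Python sketch
below simulates a scan that evaluates a predicate before any row is handed
back to the client. It does not use the HBase API: the table contents, the
"category" column, and the function names are assumptions. In the Java client
the corresponding step would be attaching a Filter (for example a
SingleColumnValueFilter) to the Scan.

```python
# Hypothetical table: row key "<date>#<user id>" -> {column: value}.
TABLE = {
    "2014-01-01#u02": {"category": "Supreme"},
    "2014-01-15#u03": {"category": "Normal"},
    "2014-02-03#u04": {"category": "Medium"},
    "2014-05-16#u05": {"category": "Supreme"},
    "2014-05-17#u06": {"category": "Normal"},
}


def server_scan(table, start_row, stop_row, row_filter=None):
    """Stand-in for the server: walk [start_row, stop_row) in key order and
    apply the filter before a row is returned to the caller."""
    for key in sorted(table):
        if start_row <= key < stop_row:
            row = table[key]
            if row_filter is None or row_filter(key, row):
                yield key, row


def category_equals(wanted):
    """Build a predicate that keeps rows whose category column equals `wanted`."""
    return lambda key, row: row.get("category") == wanted


# Only matching rows reach the client; the table is not partitioned in any way.
supreme = list(
    server_scan(TABLE, "2014-01-01", "2014-05-17", category_equals("Supreme"))
)
print([key for key, _row in supreme])
```

The data stays in one table; the filter only narrows what a single scan
returns.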

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com




hbase key design to efficient query on base of 2 or more column

2014-05-17 Thread Shushant Arora
Hi,

I have a requirement to query my data based on date and user category.
The user category can be Supreme, Normal, or Medium.

I want to query how many new users there are in my table for the date range
2014-01-01 to 2014-05-16, broken down by category.

Another requirement is to query how many users of the Supreme category are
in my table, broken down by the month in which they joined.

What should my key be?
1. If I take the key as date#category, I cannot query based on category.
2. If I take the key as category#date, I cannot query based on date.


Thanks,
Shushant