Re: Ask performance advice

2017-04-06 Thread NaHeon Kim
Thank you all.

1.
The other query is:

select keyword, sum(cnt) as count
from my_table
where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <=
to_timestamp('2017-04-03 23:00')
and *obj_id in ( obj_ids comes here... )*
and obj_grp = '100'
group by keyword
order by count desc;

This query is pretty fast because of the first two PKs.
The reason obj_id comes at second PK is, we first assumed 'select with
obj_id' will be called much more often, and then 'select with obj_grp' is
also required.

2.
4.8.0-HBase-1.1 version is used.
I'm testing on HBase server for our product, so that upgrading looks not
easy. : (
If there definitely exist index bug fixes, I could try though.

3.
With Jonathan's reply and optional PK, this schema looks worthy to try:
create table my_table (
   obj_grp varchar(50),
   obj_id varchar(50) not null,
   regymdt timestamp not null,
   keyword varchar not null,
   cnt integer,
   post_tp varchar(20)
   constraint pk primary key (obj_grp, obj_id, regymdt, keyword)
)

Thanks,
NaHeon




2017-04-05 15:17 GMT+09:00 Jonathan Leech :

> Also, your schema as defined seems to have a pretty common hbase
> anti-pattern, using a steadily increasing value as the first part of the
> rowkey (assuming your data is also loaded in order of time). This will lead
> to region hotspotting on the load, region splitting, etc. Additionally,
> your queries may not take advantage of the parallelism of the cluster if
> they end up on a small number of regions. If the queries are more likely to
> be on the most recent data, as is often the case, you'll end up with a
> single server doing the bulk the work in the cluster.
>
> - Jonathan
>
> On Apr 4, 2017, at 10:10 PM, James Taylor  wrote:
>
> What other queries do you want to be fast? What version of Phoenix are you
> using? Have you seen our new Tuning Guide [1]?
>
> You could try moving OBJ_ID to end of PK and adding OBJ_GRP as the
> leading PK column (it's fine to include it in PK even if it's optional).
>
> Your index should be used, though. Can you try doing an explain on the
> query with 4.10 and file a  JIRA if the index isn't being used?
>
> Thanks,
> James
>
> [1] http://phoenix.apache.org/tuning_guide.html
>
> On Tue, Apr 4, 2017 at 8:01 PM NaHeon Kim  wrote:
>
>> Hi all,
>>
>> My team has a Phoenix table containing over 30,000,000 rows and try to
>> speed up its query performance.
>> It'll be perfect if all queries could be done within 1~2 seconds.
>>
>> - table schema -
>>
>> CREATE TABLE MY_TABLE (
>>REGYMDT timestamp not null,
>>OBJ_ID varchar(50) not null,
>>KEYWORD varchar not null,
>>OBJ_GRP  varchar(50),
>>CNT integer,
>>POST_TP varchar(20),
>>CONSTRAINT PK PRIMARY KEY (REGYMDT, OBJ_ID, KEYWORD)
>> ) IMMUTABLE_ROWS = true;
>>
>> create index objgrp_reg_kwd on my_table(obj_grp, regymdt, keyword)
>> include (cnt);
>>
>> - tuning-needed query -
>>
>> select keyword, sum(cnt) as count
>> from my_table
>> where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <=
>> to_timestamp('2017-04-03 23:00')
>> and obj_grp = '100'
>> group by keyword
>> order by count desc;
>>
>>
>> The reason why obj_grp isn't PK is that it's optional value.
>> I could use obj_grp's default value and propagate it to PK, but still not
>> think it's desired pattern.
>>
>> The query above takes more than 20 seconds!
>> It does *not use* objgrp_reg_kwd index, doing just regymdt range scan.
>>
>> Thanks in advance!
>> NaHeon
>>
>>


Re: Ask performance advice

2017-04-04 Thread James Taylor
What other queries do you want to be fast? What version of Phoenix are you
using? Have you seen our new Tuning Guide [1]?

You could try moving OBJ_ID to end of PK and adding OBJ_GRP as the leading
PK column (it's fine to include it in PK even if it's optional).

Your index should be used, though. Can you try doing an explain on the
query with 4.10 and file a  JIRA if the index isn't being used?

Thanks,
James

[1] http://phoenix.apache.org/tuning_guide.html

On Tue, Apr 4, 2017 at 8:01 PM NaHeon Kim  wrote:

> Hi all,
>
> My team has a Phoenix table containing over 30,000,000 rows and try to
> speed up its query performance.
> It'll be perfect if all queries could be done within 1~2 seconds.
>
> - table schema -
>
> CREATE TABLE MY_TABLE (
>REGYMDT timestamp not null,
>OBJ_ID varchar(50) not null,
>KEYWORD varchar not null,
>OBJ_GRP  varchar(50),
>CNT integer,
>POST_TP varchar(20),
>CONSTRAINT PK PRIMARY KEY (REGYMDT, OBJ_ID, KEYWORD)
> ) IMMUTABLE_ROWS = true;
>
> create index objgrp_reg_kwd on my_table(obj_grp, regymdt, keyword) include
> (cnt);
>
> - tuning-needed query -
>
> select keyword, sum(cnt) as count
> from my_table
> where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <=
> to_timestamp('2017-04-03 23:00')
> and obj_grp = '100'
> group by keyword
> order by count desc;
>
>
> The reason why obj_grp isn't PK is that it's optional value.
> I could use obj_grp's default value and propagate it to PK, but still not
> think it's desired pattern.
>
> The query above takes more than 20 seconds!
> It does *not use* objgrp_reg_kwd index, doing just regymdt range scan.
>
> Thanks in advance!
> NaHeon
>
>


Ask performance advice

2017-04-04 Thread NaHeon Kim
Hi all,

My team has a Phoenix table containing over 30,000,000 rows and try to
speed up its query performance.
It'll be perfect if all queries could be done within 1~2 seconds.

- table schema -

CREATE TABLE MY_TABLE (
   REGYMDT timestamp not null,
   OBJ_ID varchar(50) not null,
   KEYWORD varchar not null,
   OBJ_GRP varchar(50),
   CNT integer,
   POST_TP varchar(20),
   CONSTRAINT PK PRIMARY KEY (REGYMDT, OBJ_ID, KEYWORD)
) IMMUTABLE_ROWS = true;

create index objgrp_reg_kwd on my_table(obj_grp, regymdt, keyword) include
(cnt);

- tuning-needed query -

select keyword, sum(cnt) as count
from my_table
where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <=
to_timestamp('2017-04-03 23:00')
and obj_grp = '100'
group by keyword
order by count desc;


The reason why obj_grp isn't PK is that it's optional value.
I could use obj_grp's default value and propagate it to PK, but still not
think it's desired pattern.

The query above takes more than 20 seconds!
It does *not use* objgrp_reg_kwd index, doing just regymdt range scan.

Thanks in advance!
NaHeon