Re: Ask performance advice

2017-04-06 Thread NaHeon Kim
Thank you all.

1.
The other query is:

select keyword, sum(cnt) as count
from my_table
where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <=
to_timestamp('2017-04-03 23:00')
and *obj_id in ( obj_ids comes here... )*
and obj_grp = '100'
group by keyword
order by count desc;

This query is pretty fast because of the first two PKs.
The reason obj_id comes at second PK is, we first assumed 'select with
obj_id' will be called much more often, and then 'select with obj_grp' is
also required.

2.
4.8.0-HBase-1.1 version is used.
I'm testing on HBase server for our product, so that upgrading looks not
easy. : (
If there definitely exist index bug fixes, I could try though.

3.
With Jonathan's reply and optional PK, this schema looks worthy to try:
create table my_table (
   obj_grp varchar(50),
   obj_id varchar(50) not null,
   regymdt timestamp not null,
   keyword varchar not null,
   cnt integer,
   post_tp varchar(20)
   constraint pk primary key (obj_grp, obj_id, regymdt, keyword)
)

Thanks,
NaHeon




2017-04-05 15:17 GMT+09:00 Jonathan Leech :

> Also, your schema as defined seems to have a pretty common hbase
> anti-pattern, using a steadily increasing value as the first part of the
> rowkey (assuming your data is also loaded in order of time). This will lead
> to region hotspotting on the load, region splitting, etc. Additionally,
> your queries may not take advantage of the parallelism of the cluster if
> they end up on a small number of regions. If the queries are more likely to
> be on the most recent data, as is often the case, you'll end up with a
> single server doing the bulk the work in the cluster.
>
> - Jonathan
>
> On Apr 4, 2017, at 10:10 PM, James Taylor  wrote:
>
> What other queries do you want to be fast? What version of Phoenix are you
> using? Have you seen our new Tuning Guide [1]?
>
> You could try moving OBJ_ID to end of PK and adding OBJ_GRP as the
> leading PK column (it's fine to include it in PK even if it's optional).
>
> Your index should be used, though. Can you try doing an explain on the
> query with 4.10 and file a  JIRA if the index isn't being used?
>
> Thanks,
> James
>
> [1] http://phoenix.apache.org/tuning_guide.html
>
> On Tue, Apr 4, 2017 at 8:01 PM NaHeon Kim  wrote:
>
>> Hi all,
>>
>> My team has a Phoenix table containing over 30,000,000 rows and try to
>> speed up its query performance.
>> It'll be perfect if all queries could be done within 1~2 seconds.
>>
>> - table schema -
>>
>> CREATE TABLE MY_TABLE (
>>REGYMDT timestamp not null,
>>OBJ_ID varchar(50) not null,
>>KEYWORD varchar not null,
>>OBJ_GRP  varchar(50),
>>CNT integer,
>>POST_TP varchar(20),
>>CONSTRAINT PK PRIMARY KEY (REGYMDT, OBJ_ID, KEYWORD)
>> ) IMMUTABLE_ROWS = true;
>>
>> create index objgrp_reg_kwd on my_table(obj_grp, regymdt, keyword)
>> include (cnt);
>>
>> - tuning-needed query -
>>
>> select keyword, sum(cnt) as count
>> from my_table
>> where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <=
>> to_timestamp('2017-04-03 23:00')
>> and obj_grp = '100'
>> group by keyword
>> order by count desc;
>>
>>
>> The reason why obj_grp isn't PK is that it's optional value.
>> I could use obj_grp's default value and propagate it to PK, but still not
>> think it's desired pattern.
>>
>> The query above takes more than 20 seconds!
>> It does *not use* objgrp_reg_kwd index, doing just regymdt range scan.
>>
>> Thanks in advance!
>> NaHeon
>>
>>


Re: Unexpected dynamic column issues

2017-04-06 Thread Samarth Jain
Thanks for reporting the issue, Dave. This has to do with the new column
mapping feature that we rolled out in 4.10. To disable it for your table,
please create your table like this:

create table TMP_SNACKS(k bigint primary key, c1 varchar)
COLUMN_ENCODED_BYTES=0;

I will file a JIRA and get a fix out in the next patch release.

On Wed, Apr 5, 2017 at 11:06 AM, Dave  wrote:

> Hello,
>
> I'm testing Phoenix 4.10 on HBase 1.2.5 and I'm confused about some
> behavior of dynamic columns.
>
> I am seeing different behavior with views than this post <
> https://lists.apache.org/thread.html/a3f16f3fd0c125775b48e7
> f8bab3af161fe3d6c2a0b1a9635780b49e@1456433748@%3Cuser.
> phoenix.apache.org%3E> which use Phoenix 4.7. I expected to be able to
> see the value for "page_title" in the view, but it is blank unless I UPSERT
> directly into the view:
>
> 0: jdbc:phoenix:localhost> create table TMP_SNACKS(k bigint primary key,
> c1 varchar);
> No rows affected (1.304 seconds)
>
> 0: jdbc:phoenix:localhost> upsert into TMP_SNACKS(k, c1, "page_title"
> varchar) values(1,'a','b');
> 1 row affected (0.048 seconds)
>
> 0: jdbc:phoenix:localhost> select * from TMP_SNACKS;
> ++-+
> | K  | C1  |
> ++-+
> | 1  | a   |
> ++-+
> 1 row selected (0.039 seconds)
>
> 0: jdbc:phoenix:localhost> select * from TMP_SNACKS("page_title" varchar);
> ++-+-+
> | K  | C1  | page_title  |
> ++-+-+
> | 1  | a   | b   |
> ++-+-+
> 1 row selected (0.061 seconds)
>
> 0: jdbc:phoenix:localhost> create view MY_VIEW("page_title" varchar) as
> select * from TMP_SNACKS;
> No rows affected (0.11 seconds)
>
> 0: jdbc:phoenix:localhost> select * from MY_VIEW;
> ++-+-+
> | K  | C1  | page_title  |
> ++-+-+
> | 1  | a   | |
> ++-+-+
> 1 row selected (0.056 seconds)
>
> Unexpected ---^
>
> 0: jdbc:phoenix:localhost> upsert into TMP_SNACKS(k, c1, "page_title"
> varchar) values(2,'a','c');
> 1 row affected (0.008 seconds)
>
> 0: jdbc:phoenix:localhost> select * from MY_VIEW;
> ++-+-+
> | K  | C1  | page_title  |
> ++-+-+
> | 1  | a   | |
> | 2  | a   | |
> ++-+-+
> 2 rows selected (0.048 seconds)
>
> Unexpected ---^
>
> 0: jdbc:phoenix:localhost> upsert into MY_VIEW(k, c1, "page_title")
> values(3,'b','c');
> 1 row affected (0.009 seconds)
>
> 0: jdbc:phoenix:localhost> select * from MY_VIEW;
> ++-+-+
> | K  | C1  | page_title  |
> ++-+-+
> | 1  | a   | |
> | 2  | a   | |
> | 3  | b   | c   |
> ++-+-+
> 3 rows selected (0.058 seconds)
>
> Unexpected ---^
>
> 0: jdbc:phoenix:localhost> select * from TMP_SNACKS("page_title" varchar);
> ++-+-+
> | K  | C1  | page_title  |
> ++-+-+
> | 1  | a   | c   |
> | 2  | a   | c   |
> | 3  | b   | |
> ++-+-+
> 3 rows selected (0.048 seconds)
>
> Unexpected ---^
>
>
> Is this the expected behavior for dynamic columns with views?
>
>
> Thanks!
>
> --
>
> Dave
>
>