[
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113186#comment-14113186
]
Maryann Xue commented on PHOENIX-852:
-------------------------------------
Did a local performance test:
*Group1:*
CREATE TABLE T_1000 (
mypk VARCHAR(10) NOT NULL PRIMARY KEY,
CF.column1 varchar(10),
CF.column2 varchar(10),
CF.column3 varchar(10));
CREATE TABLE T_5000000 (
mypk VARCHAR(10) NOT NULL PRIMARY KEY,
CF.column1 varchar(10),
CF.column2 varchar(10),
CF.column3 varchar(10));
T_1000:
mypk: string value of 0 ~ 999
column3: string value of random int between 0 ~ 9999
T_5000000:
mypk: string value of 0 ~ 4999999
||Query||Before||After||Scan Type
|select count(\*) from T_5000000 t1 join T_1000 t2 on t1.mypk =
t2.column3;|8.1|8.2|RANGE|
|select count(\*) from T_5000000 t1 join T_1000 t2 on t1.mypk = t2.column3 and
t2.column3 != '9859';|8.1|0.40|SKIP|
|select count(\*) from T_5000000 t1 join T_1000 t2 on t1.mypk = t2.column3 and
t2.column3 = '9859'; |7.9|0.23|SKIP|
*Group2:*
CREATE TABLE T_1000_INT (
mypk INTEGER NOT NULL PRIMARY KEY,
CF.column1 INTEGER,
CF.column2 INTEGER,
CF.column3 INTEGER);
CREATE TABLE T_5000000_INT (
mypk INTEGER NOT NULL PRIMARY KEY,
CF.column1 INTEGER,
CF.column2 INTEGER,
CF.column3 INTEGER);
T_1000:
mypk: 0 ~ 999
column3: random int between 0 ~ 9999
T_5000000:
mypk: 0 ~ 4999999
||Query||Before||After||Scan Type||
|select count(\*) from T_5000000_INT t1 join T_1000_INT t2 on t1.mypk =
t2.column3;|8.1|0.28|RANGE|
|select count(\*) from T_5000000_INT t1 join T_1000_INT t2 on t1.mypk =
t2.column3 and t2.column3 != 6768;|8.1|0.41|SKIP|
|select count(\*) from T_5000000_INT t1 join T_1000_INT t2 on t1.mypk =
t2.column3 and t2.column3 = 6768;|7.9|0.23|SKIP|
The reason why I did two groups here is, as you can see, the first query in
group 1 did not benefit from the optimization because by default this query
goes range scan optimization by default (due to no filters) and the range turns
out to be close to full range in string order. But with group two, where we
have a natural integer order, the range is narrowed down to just ~10000 rows.
> Optimize child/parent foreign key joins
> ---------------------------------------
>
> Key: PHOENIX-852
> URL: https://issues.apache.org/jira/browse/PHOENIX-852
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Maryann Xue
> Attachments: 852-2.patch, 852.patch, PHOENIX-852.patch
>
>
> Often times a join will occur from a child to a parent. Our current algorithm
> would do a full scan of one side or the other. We can do much better than
> that if the HashCache contains the PK (or even part of the PK) from the table
> being joined to. In these cases, we should drive the second scan through a
> skip scan on the server side.
--
This message was sent by Atlassian JIRA
(v6.2#6252)