[
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866083#comment-15866083
]
Hadoop QA commented on PHOENIX-3670:
------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12852592/PHOENIX-3670_v1.patch
against master branch at commit 7567fcd6d569a2ece7556c4e3a966a1baf34c3a5.
ATTACHMENT ID: 12852592
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new
or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated
43 warning messages.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines
longer than 100:
+ result = Bytes.BYTES_COMPARATOR.compare(rowKeyRange1.getUpperRange(), rowKeyRange2.getUpperRange());
+ public static List<KeyRange> intersect(List<KeyRange> rowKeyRanges1, List<KeyRange> rowKeyRanges2) {
+ private void doTestListIntersectWithOneResultRange(int start1,int end1,int step1,int start2,int end2,int step2,boolean addEmptyRange) throws Exception {
+ PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(i), true, PInteger.INSTANCE.toBytes(i+step1), true));
+ PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(i), true, PInteger.INSTANCE.toBytes(i+step2), true));
+ private void doTestListIntersectWithMultiResultRange(int start1,int count1,int step1,int start2,int count2,int step2,boolean addEmptyRange) throws Exception {
+ listIntersectAndAssert(Arrays.asList(KeyRange.EMPTY_RANGE),Arrays.asList(KeyRange.EVERYTHING_RANGE),Arrays.asList(KeyRange.EMPTY_RANGE));
+ private static void listIntersectAndAssert(List<KeyRange> rowKeyRanges1,List<KeyRange> rowKeyRanges2,List<KeyRange> expected) {
{color:red}-1 core tests{color}. The patch failed these unit tests:
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT
{color:red}-1 core zombie tests{color}. There are 12 zombie test(s):
Test results:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/770//testReport/
Javadoc warnings:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/770//artifact/patchprocess/patchJavadocWarnings.txt
Console output:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/770//console
This message is automatically generated.
> KeyRange.intersect(List<KeyRange> , List<KeyRange>) is inefficient,especially
> for join dynamic filter
> -----------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.9.0
> Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system there is the following join SQL (simplified), where
> fact_table is a fact table joined with the dimension tables dim_table1 and
> dim_table2:
> {code:borderStyle=solid}
> select /*+ SKIP_SCAN */ sum(t.click) from fact_table t join dim_table1 d1 on
> t.cust_id=d1.id join dim_table2 d2 on t.cust_id =d2.id where t.date
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code}
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For small
> datasets the SQL executes quickly, but it becomes very slow as the dataset
> grows: when fact_table has 30 million rows, dim_table1 has 300 thousand rows,
> and dim_table2 has 100 thousand rows, the above query takes 17s.
> When I debugged the SQL execution, I found that RHS1 returns 5523 rows:
> {code:borderStyle=solid}
> select d1.id from dim_table1 d1 where d1.code = 2008
> {code}
> and RHS2 returns 23881 rows:
> {code:borderStyle=solid}
> select d2.id from dim_table2 d2 where d2.region='us'
> {code}
> HashJoinPlan then uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>)
> method to compute the intersection of RHS1 and RHS2 for the join dynamic
> filter, narrowing down the values that fact_table.cust_id can take.
> Surprisingly, the KeyRange.intersect method takes 11s, even though the whole
> SQL execution takes only 17s. After reading the code of the
> KeyRange.intersect method, I found the following two problems:
> (1) The double loop at lines 521 and 522 is inefficient: when keyRanges has
> size M and keyRanges2 has size N, the time complexity is O(M*N), which for my
> example is 5523*23881 (roughly 132 million pairwise intersections):
> {code:borderStyle=solid}
> 519     public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520         List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521         for (KeyRange r1 : keyRanges) {
> 522             for (KeyRange r2 : keyRanges2) {
> 523                 KeyRange r = r1.intersect(r2);
> 524                 if (EMPTY_RANGE != r) {
> 525                     tmp.add(r);
> 526                 }
> 527             }
> 528         }
> {code}
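One way to avoid the O(M*N) double loop described in (1), assuming both input lists are already sorted by lower bound and contain no overlapping ranges, is a two-pointer merge that visits each range once and runs in O(M+N). The sketch below is illustrative only: IntRange is a hypothetical stand-in for KeyRange with inclusive int bounds, the class and method names are made up for this example, and it is not the code from PHOENIX-3670_v1.patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortedRangeIntersect {
    /** Hypothetical stand-in for KeyRange: a closed interval [lower, upper] over ints. */
    static final class IntRange {
        final int lower;
        final int upper;

        IntRange(int lower, int upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public String toString() {
            return "[" + lower + "," + upper + "]";
        }
    }

    /**
     * Intersects two lists of ranges in O(M+N).
     * Assumption: each list is sorted by lower bound and its ranges do not overlap.
     */
    static List<IntRange> intersect(List<IntRange> a, List<IntRange> b) {
        List<IntRange> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            IntRange r1 = a.get(i);
            IntRange r2 = b.get(j);
            int lo = Math.max(r1.lower, r2.lower);
            int hi = Math.min(r1.upper, r2.upper);
            if (lo <= hi) {
                result.add(new IntRange(lo, hi)); // non-empty overlap
            }
            // Advance the range that ends first: it cannot overlap
            // anything further along in the other list.
            if (r1.upper < r2.upper) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<IntRange> a = Arrays.asList(new IntRange(1, 5), new IntRange(10, 20));
        List<IntRange> b = Arrays.asList(new IntRange(3, 12), new IntRange(18, 30));
        System.out.println(intersect(a, b)); // prints [[3,5], [10,12], [18,20]]
    }
}
```

If the inputs are not already sorted and coalesced, sorting them first adds O(M log M + N log N), which for the sizes quoted above is still far below the roughly 132 million pairwise comparisons of the nested loop.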
> (2) Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the
> KeyRange.coalesce method does:
> {code:borderStyle=solid}
> 532         Collections.sort(tmp, KeyRange.COMPARATOR);
> 533         List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534         KeyRange r = tmp.get(0);
> 535         for (int i=1; i<tmp.size(); i++) {
> 536             if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                 tmp2.add(r);
> 538                 r = tmp.get(i);
> 539             } else {
> 540                 r = r.intersect(tmp.get(i));
> 541             }
> 542         }
> {code}
> It also seems that there are no unit tests for this
> KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.
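To illustrate point (2), here is a minimal sketch of coalescing a sorted list of ranges by taking the union of overlapping neighbours. It uses a hypothetical IntRange type with inclusive int bounds rather than Phoenix's KeyRange, and the names are assumptions made for the example. With the input [1,5], [3,8], [10,12], merging by union keeps every key and yields [1,8], [10,12]; merging the first two by intersection instead would shrink them to [3,5] and drop keys 1, 2, 6, 7 and 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CoalesceSketch {
    /** Hypothetical stand-in for KeyRange: a closed interval [lower, upper] over ints. */
    static final class IntRange {
        final int lower;
        final int upper;

        IntRange(int lower, int upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public String toString() {
            return "[" + lower + "," + upper + "]";
        }
    }

    /** Sorts the ranges by lower bound and merges overlapping ones by union. */
    static List<IntRange> coalesce(List<IntRange> ranges) {
        List<IntRange> out = new ArrayList<>();
        if (ranges.isEmpty()) {
            return out; // nothing to merge
        }
        List<IntRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt((IntRange r) -> r.lower));
        IntRange current = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            IntRange next = sorted.get(i);
            if (next.lower > current.upper) {
                // Disjoint: emit the finished range and start a new one.
                out.add(current);
                current = next;
            } else {
                // Overlapping: widen to the union, not the intersection.
                current = new IntRange(current.lower, Math.max(current.upper, next.upper));
            }
        }
        out.add(current); // emit the last pending range
        return out;
    }

    public static void main(String[] args) {
        List<IntRange> input = Arrays.asList(
                new IntRange(3, 8), new IntRange(1, 5), new IntRange(10, 12));
        System.out.println(coalesce(input)); // prints [[1,8], [10,12]]
    }
}
```

A unit test for the list-based intersect could assert against small hand-computed cases like this one, including empty inputs.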
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)