[ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110890#comment-14110890 ]

James Taylor commented on PHOENIX-852:
--------------------------------------

Instead of the code below, it'd be better to call into WhereOptimizer 
to get this information. There are cases beyond a simple column reference that 
we'd be able to optimize (like SUBSTR(rowKeyCol, 1, 3), or even a row value 
constructor):

{code}
+    private Pair<Expression, Expression> extractKeyRangeExpressions(StatementContext context, PTable table, JoinType type, final List<Expression> joinExpressions, final List<Expression> hashExpressions) {
+        if (type != JoinType.Inner)
+            return new Pair<Expression, Expression>(null, null);
+        
+        List<Integer> rowkeyColumnIndexes = Lists.newArrayList();
+        for (int i = 0; i < joinExpressions.size(); i++) {
+            Expression joinExpression = joinExpressions.get(i);
+            if (joinExpression instanceof RowKeyColumnExpression) {
+                rowkeyColumnIndexes.add(i);
+            }
+        }
+        Collections.sort(rowkeyColumnIndexes, new Comparator<Integer>() {
+            @Override
+            public int compare(Integer l, Integer r) {
+                return ((RowKeyColumnExpression) joinExpressions.get(l)).getPosition() - ((RowKeyColumnExpression) joinExpressions.get(r)).getPosition();
+            }
+        });
+        int positionOffset = (table.getBucketNum() == null ? 0 : 1) + (context.getConnection().getTenantId() != null && table.isMultiTenant() ? 1 : 0) + (table.getViewIndexId() == null ? 0 : 1);
+        int position = 0;
+        for (Integer index : rowkeyColumnIndexes) {
+            RowKeyColumnExpression exp = (RowKeyColumnExpression) joinExpressions.get(index);
+            if (exp.getPosition() != position + positionOffset) {
+                break;
+            }
+            position++;
+        }
+        
+        if (position == 0)
+            return new Pair<Expression, Expression>(null, null);
+        
+        if (position == 1)
+            return new Pair<Expression, Expression>(joinExpressions.get(rowkeyColumnIndexes.get(0)), hashExpressions.get(rowkeyColumnIndexes.get(0)));
+        
+        List<Expression> lChildren = Lists.newArrayList();
+        List<Expression> rChildren = Lists.newArrayList();
+        for (int i = 0; i < position; i++) {
+            Integer index = rowkeyColumnIndexes.get(i);
+            lChildren.add(joinExpressions.get(index));
+            rChildren.add(hashExpressions.get(index));
+        }
+        
+        return new Pair<Expression, Expression>(new RowValueConstructorExpression(lChildren, false), new RowValueConstructorExpression(rChildren, false));
+    }
+    
{code}
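
For readers following the loop in the patch above, here is a minimal standalone sketch of the check it performs: how many consecutive leading PK positions are covered by the join keys once the salt byte, tenant id, and view index id slots are skipped. This is not Phoenix code. The class name LeadingPkPrefix and the method leadingPrefixLength are hypothetical, and plain integers stand in for RowKeyColumnExpression positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of Phoenix.
final class LeadingPkPrefix {
    /**
     * Returns how many consecutive PK positions, starting at positionOffset,
     * are covered by the given join-key PK positions.
     */
    static int leadingPrefixLength(List<Integer> pkPositions, int positionOffset) {
        // Sort a copy so the caller's list is left untouched.
        List<Integer> sorted = new ArrayList<>(pkPositions);
        Collections.sort(sorted);
        int matched = 0;
        for (int pos : sorted) {
            if (pos != matched + positionOffset) {
                break; // gap in the PK prefix: later positions cannot be used
            }
            matched++;
        }
        return matched;
    }
}
```

With join keys at PK positions 2 and 1 and an offset of 1 (for example, a salted table), the result is 2, so both columns could feed a row value constructor. With positions 1 and 3 and an offset of 0, the result is 0, because the leading PK column is not part of the join.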

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>         Attachments: 852-2.patch, 852.patch, PHOENIX-852.patch
>
>
> Often a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.
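
The difference between a full scan and a key-driven scan described above can be sketched roughly as follows. This is only an analogy, assuming a java.util.TreeMap as a stand-in for the sorted table and a sorted set of PK values as a stand-in for the HashCache contents. The class SkipScanSketch and its methods are hypothetical, and Phoenix's real skip scan runs as a server-side filter rather than as client-side lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;

// Hypothetical illustration only; not part of Phoenix.
final class SkipScanSketch {
    /** Full scan: visit every row and probe the cached keys. */
    static List<String> fullScanJoin(NavigableMap<String, String> table,
                                     SortedSet<String> cachedKeys) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (cachedKeys.contains(row.getKey())) {
                out.add(row.getValue());
            }
        }
        return out;
    }

    /** Key-driven scan: seek directly to each cached key, in sorted order. */
    static List<String> seekJoin(NavigableMap<String, String> table,
                                 SortedSet<String> cachedKeys) {
        List<String> out = new ArrayList<>();
        for (String key : cachedKeys) {
            String row = table.get(key); // point lookup instead of visiting all rows
            if (row != null) {
                out.add(row);
            }
        }
        return out;
    }
}
```

Both methods return the same rows. The second touches only as many keys as the cache holds, which is where the benefit comes from when the cached side is small relative to the table being joined to.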



--
This message was sent by Atlassian JIRA
(v6.2#6252)
