Re: Is it possible to delegate data joins and filtering to the datasource ?

Muhammad Gelbana Wed, 12 Apr 2017 03:39:45 -0700

I have done it. Thanks a lot Weijie and all of you for your time.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


On Thu, Apr 6, 2017 at 3:15 PM, weijie tong <[email protected]> wrote:

> Some tips:
> 1. You need to know how the RexInputRef indexes of the JoinRel relate to
> those of its inputs. The join's output fields are the left input's fields
> followed by the right input's fields:
>
> join (1,2,3,4,5)
>
> left input (1,2,3), right input (1,2)
>
> 1,2,3 ===> left input (1,2,3)
>
> 4,5 ===> right input (1,2)
>
> 2. Capture this index mapping when you visit the JoinRel node in your
> rule (CartesianProductJoinRule), and store it in your BGroupScan (the
> name used in my last example).
> The mapping structure could be: destination index -------------> (source
> ScanRel : source index).
> For the example in tip 1, the structure would be:
> 1 ==> (left scan1 : 1)
> 2 ==> (left scan1 : 2)
> 3 ==> (left scan1 : 3)
> 4 ==> (right scan2 : 1)
> 5 ==> (right scan2 : 2)
>
> 3. Define another rule (matching the Project RelNode) that depends on the
> index mapping from the previous step. In this rule, take each index in the
> final output Project, look up its mapped index in the mapping structure,
> and from that find the final output column name and its table.
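
The index mapping described in the tips above could be sketched roughly as follows. This is only an illustration in plain Java with no Drill or Calcite dependency; the class and method names (JoinFieldMapping, Source, build) are hypothetical and are not part of either API. Note that Calcite's RexInputRef indexes are 0-based, so the sketch counts from 0, whereas the example above counts from 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps each join output index to (input side, index in that input). */
final class JoinFieldMapping {

    /** One mapping entry: the input a field comes from and its 0-based index there. */
    static final class Source {
        final String input; // "left" or "right"
        final int index;    // 0-based index within that input

        Source(String input, int index) {
            this.input = input;
            this.index = index;
        }

        @Override
        public String toString() {
            return input + ":" + index;
        }
    }

    /**
     * Builds the mapping for a join whose output row is the left input's
     * fields followed by the right input's fields.
     */
    static List<Source> build(int leftFieldCount, int rightFieldCount) {
        List<Source> mapping = new ArrayList<>();
        for (int i = 0; i < leftFieldCount; i++) {
            mapping.add(new Source("left", i));
        }
        for (int i = 0; i < rightFieldCount; i++) {
            mapping.add(new Source("right", i));
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Left input has 3 fields, right input has 2, as in the example above.
        List<Source> mapping = build(3, 2);
        for (int i = 0; i < mapping.size(); i++) {
            System.out.println(i + " ==> " + mapping.get(i));
        }
    }
}
```

A Project rule could then resolve each of its output indexes with mapping.get(index) to find the originating scan and the field position within it.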
>
>
>
>
> On Tue, Apr 4, 2017 at 1:51 AM, Muhammad Gelbana <[email protected]>
> wrote:
>
> > I've succeeded, theoretically, in what I wanted to do because I had to send
> > the selected columns manually to my datasource. Would someone please tell
> > me how I can identify the selected columns in the join? I searched a lot
> > without success.
> >
> > *---------------------*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Sat, Apr 1, 2017 at 1:43 AM, Muhammad Gelbana <[email protected]>
> > wrote:
> >
> > > So I intend to use this constructor for the new *RelNode*:
> > > *org.apache.drill.exec.planner.logical.DrillScanRel.DrillScanRel(RelOptCluster,
> > > RelTraitSet, RelOptTable, GroupScan, RelDataType, List<SchemaPath>)*
> > >
> > > How can I provide its parameters?
> > >
> > >    1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()*?
> > >
> > >    2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()*?
> > >
> > >    3. *RelOptTable*: I assume I can use this factory method
> > >    (*org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
> > >    RelDataType, Table, Path)*). Any hints on how I can provide these
> > >    parameters too? Should I just go ahead and manually create a new
> > >    instance of each parameter?
> > >
> > >    4. *GroupScan*: I understand I have to create a new implementation
> > >    class for this one, so no questions here so far.
> > >
> > >    5. *RelDataType*: This one is confusing. I understand that for
> > >    *RelOptRuleCall.transformTo(newRel)* to work, I have to provide a
> > >    *newRel* instance that has a *RelDataType* instance with the same
> > >    number of fields and compatible types (i.e. this is mandated by
> > >    *org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
> > >    RelNode, Object)*). Why couldn't I provide a *RelDataType* with
> > >    a different set of fields? How can I resolve this?
> > >
> > >    6. *List<SchemaPath>*: I assume I can call this method and pass my
> > >    column names to it, one by one (i.e.
> > >    *org.apache.drill.common.expression.SchemaPath.getCompoundPath(String...)*).
> > >
> > > Thanks.
> > >
> > > *---------------------*
> > > *Muhammad Gelbana*
> > > http://www.linkedin.com/in/mgelbana
> > >
> > > On Fri, Mar 31, 2017 at 1:59 PM, weijie tong <[email protected]>
> > > wrote:
> > >
> > >> Your code seems right; what remains is to implement the
> > >> 'call.transformTo()' call. As for the remaining details, I may not be
> > >> able to express them very precisely; as @Paul Rogers mentioned, the
> > >> plugin details are a little trivial.
> > >>
> > >> 1. Use drillScanRel.getGroupScan() to obtain each scan's GroupScan.
> > >> 2. You need to extend AbstractGroupScan and let it hold some
> > >> information about your storage. Call this GroupScan AGroupScan; it
> > >> corresponds to one of the joined scan RelNodes. Then define another
> > >> GroupScan, called BGroupScan, which extends AGroupScan. The BGroupScan
> > >> acts as an aggregate container that holds the two joined AGroupScans.
> > >> 3. The new DrillScanRel has the same RowType as the JoinRel. The
> > >> requirements for, and examples of, transforming between two different
> > >> RelNodes can be found in other code. This DrillScanRel's GroupScan is
> > >> the BGroupScan. This new DrillScanRel is the one you pass to
> > >> `call.transformTo(xxxx)`.
> > >>
> > >> Maybe the picture below will help you understand my idea.
> > >>
> > >> Suppose the initial RelNode tree is:
> > >>
> > >>                     ---Scan (AGroupScan)
> > >> Project ----Join --|
> > >>                     ---Scan (AGroupScan)
> > >>
> > >>          |
> > >>         \|/
> > >>
> > >> After applying this rule, the final tree is:
> > >>
> > >> Project-----Scan (BGroupScan (List(AGroupScan, AGroupScan)))
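
The AGroupScan / BGroupScan arrangement described above might look roughly like the sketch below. It is plain Java so that it runs on its own: in a real plugin AGroupScan would extend Drill's AbstractGroupScan and carry the storage plugin details, none of which is modelled here. The fields, constructors, getters and the "_x_" naming are all hypothetical choices made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a per-table group scan (AGroupScan in the thread). */
class AGroupScan {
    private final String tableName;
    private final List<String> columns;

    AGroupScan(String tableName, List<String> columns) {
        this.tableName = tableName;
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    String getTableName() {
        return tableName;
    }

    List<String> getColumns() {
        return columns;
    }
}

/** Hypothetical container holding the two joined scans (BGroupScan in the thread). */
class BGroupScan extends AGroupScan {
    private final List<AGroupScan> joinedScans;

    BGroupScan(AGroupScan left, AGroupScan right) {
        // The combined column list mirrors the join's row type:
        // left columns first, then right columns.
        super(left.getTableName() + "_x_" + right.getTableName(), concat(left, right));
        this.joinedScans = Collections.unmodifiableList(Arrays.asList(left, right));
    }

    /** Qualifies each column with its table name and concatenates left then right. */
    private static List<String> concat(AGroupScan left, AGroupScan right) {
        List<String> all = new ArrayList<>();
        for (String c : left.getColumns()) {
            all.add(left.getTableName() + "." + c);
        }
        for (String c : right.getColumns()) {
            all.add(right.getTableName() + "." + c);
        }
        return all;
    }

    List<AGroupScan> getJoinedScans() {
        return joinedScans;
    }
}
```

Keeping the original scans inside the container means the information each one holds about its table stays available when the combined scan is later turned into a single request to the datasource.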
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <[email protected]>
> > >> wrote:
> > >>
> > >> > *This is my rule class*
> > >> >
> > >> > import java.util.List;
> > >> >
> > >> > import org.apache.calcite.plan.RelOptRule;
> > >> > import org.apache.calcite.plan.RelOptRuleCall;
> > >> > import org.apache.calcite.plan.hep.HepRelVertex;
> > >> > import org.apache.calcite.rel.RelNode;
> > >> > import org.apache.calcite.rel.core.JoinRelType;
> > >> > import org.apache.calcite.rel.type.RelDataTypeField;
> > >> > import org.apache.drill.exec.planner.logical.DrillJoinRel;
> > >> >
> > >> > public class CartesianProductJoinRule extends RelOptRule {
> > >> >
> > >> >     public static final CartesianProductJoinRule INSTANCE =
> > >> >             new CartesianProductJoinRule(DrillJoinRel.class);
> > >> >
> > >> >     // Matches a join together with its two inputs.
> > >> >     public CartesianProductJoinRule(Class<DrillJoinRel> clazz) {
> > >> >         super(operand(clazz, operand(RelNode.class, any()), operand(RelNode.class, any())),
> > >> >                 "CartesianProductJoin");
> > >> >     }
> > >> >
> > >> >     // Only fire for inner joins whose condition is always true (Cartesian products).
> > >> >     @Override
> > >> >     public boolean matches(RelOptRuleCall call) {
> > >> >         DrillJoinRel drillJoin = call.rel(0);
> > >> >         return drillJoin.getJoinType() == JoinRelType.INNER
> > >> >                 && drillJoin.getCondition().isAlwaysTrue();
> > >> >     }
> > >> >
> > >> >     @Override
> > >> >     public void onMatch(RelOptRuleCall call) {
> > >> >         DrillJoinRel join = call.rel(0);
> > >> >         RelNode firstRel = call.rel(1);
> > >> >         RelNode secondRel = call.rel(2);
> > >> >         HepRelVertex right = (HepRelVertex) join.getRight();
> > >> >         HepRelVertex left = (HepRelVertex) join.getLeft();
> > >> >
> > >> >         List<RelDataTypeField> firstFields = firstRel.getRowType().getFieldList();
> > >> >         List<RelDataTypeField> secondFields = secondRel.getRowType().getFieldList();
> > >> >
> > >> >         // Unwrap the HepPlanner vertices to reach the underlying nodes.
> > >> >         RelNode firstTable = ((HepRelVertex) firstRel.getInput(0)).getCurrentRel();
> > >> >         RelNode secondTable = ((HepRelVertex) secondRel.getInput(0)).getCurrentRel();
> > >> >
> > >> >         //call.transformTo(???);
> > >> >     }
> > >> > }
> > >> >
> > >> > *To register the rule*, I overrode the *getOptimizerRules* method in my
> > >> > storage plugin class
> > >> >
> > >> > public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext optimizerContext,
> > >> >         PlannerPhase phase) {
> > >> >     switch (phase) {
> > >> >     case LOGICAL_PRUNE_AND_JOIN:
> > >> >     case LOGICAL_PRUNE:
> > >> >     case LOGICAL:
> > >> >         return getLogicalOptimizerRules(optimizerContext);
> > >> >     case PHYSICAL:
> > >> >         return getPhysicalOptimizerRules(optimizerContext);
> > >> >     case PARTITION_PRUNING:
> > >> >     case JOIN_PLANNING:
> > >> >         // This is the line I added to register the new rule.
> > >> >         return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);
> > >> >     default:
> > >> >         return ImmutableSet.of();
> > >> >     }
> > >> >
> > >> > }
> > >> >
> > >> > The rule is firing as expected but I'm lost when it comes to the
> > >> > conversion. Earlier, you said "the new equivalent ScanRel is to have the
> > >> > joined ScanRel nodes's GroupScans", so
> > >> >
> > >> >    1. How can I obtain the left and right tables' group scans?
> > >> >    2. What exactly do you mean by joining them? Is there a utility method
> > >> >    to do so? Or should I manually create a new single group scan and add
> > >> >    the information I need there? Looking into other *GroupScan*
> > >> >    implementations, I found that they have references to some runtime
> > >> >    objects such as the storage plugin and the storage plugin configuration.
> > >> >    At this stage, I don't know how to obtain those!
> > >> >    3. Precisely, what kind of object should I use to represent a *RelNode*
> > >> >    that represents the whole join? I understand that I need to use an
> > >> >    object that implements the *RelNode* interface. Then I should add the
> > >> >    created *GroupScan* to that *RelNode* instance and call
> > >> >    *call.transformTo(newRelNode)*, correct?
> > >> >
> > >> >
> > >> > *---------------------*
> > >> > *Muhammad Gelbana*
> > >> > http://www.linkedin.com/in/mgelbana
> > >> >
> > >> > On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <[email protected]>
> > >> > wrote:
> > >> >
> > >> > > I mean the rule you write could be placed in
> > >> > > PlannerPhase.JOIN_PLANNING, which uses the HepPlanner. This phase
> > >> > > handles the logical RelNodes.
> > >> > > Hope this helps.
> > >> > > On Thu, Mar 30, 2017 at 12:07 AM, Muhammad Gelbana <[email protected]> wrote:
> > >> > >
> > >> > > > Thanks a lot Weijie, I believe I'm very close now. I hope you don't
> > >> > > > mind a few more questions please:
> > >> > > >
> > >> > > >
> > >> > > >    1. Is the new rule you are mentioning a physical rule? So I should
> > >> > > >    implement the Prel interface?
> > >> > > >    2. By "traversing the join to find the ScanRel"
> > >> > > >       - This sounds like I have to "search" for something. Shouldn't I
> > >> > > >       just work on transforming the left (i.e. DrillJoinRel's getLeft()
> > >> > > >       method) and right (i.e. DrillJoinRel's getRight() method) join
> > >> > > >       objects?
> > >> > > >       - The "left" and "right" elements of the DrillJoinRel object are
> > >> > > >       of type RelSubset, not *ScanRel*, and I can't find a type called
> > >> > > >       *ScanRel*. I suppose you meant *ScanPrel*, especially because it
> > >> > > >       implements the *Prel* interface that provides the
> > >> > > >       *getPhysicalOperator* method.
> > >> > > >    3. What if multiple physical or logical rules match a single node?
> > >> > > >    What decides which rule will be applied and which will be rejected?
> > >> > > >    Is it the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method?
> > >> > > >    What if more than one rule produces the same cost?
> > >> > > >
> > >> > > > I'll go ahead and see what I can do for now, and hopefully you can
> > >> > > > offer more guidance. THANKS A LOT.
> > >> > > >
> > >> > > > *---------------------*
> > >> > > > *Muhammad Gelbana*
> > >> > > > http://www.linkedin.com/in/mgelbana
> > >> > > >
> > >> > > > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <[email protected]>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > To avoid misunderstanding: the new equivalent ScanRel should hold the
> > >> > > > > GroupScans of the joined ScanRel nodes, as the GroupScans indirectly
> > >> > > > > hold the underlying storage information.
> > >> > > > >
> > >> > > > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <[email protected]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > >
> > >> > > > > > My suggestion is that you define a rule which matches the
> > >> > > > > > DrillJoinRel RelNode; then, in the onMatch method, you traverse the
> > >> > > > > > join's children to find the ScanRel nodes. You define a new ScanRel
> > >> > > > > > which includes the ScanRel nodes you found in the last step. Then
> > >> > > > > > transform the JoinRel to this equivalent new ScanRel.
> > >> > > > > > Finally, the plan tree will not have the JoinRel but the ScanRel.
> > >> > > > > > You can put your join plan rule in PlannerPhase.JOIN_PLANNING.
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
