Re: How to generate hash code for each build side one of the hash join columns

weijie tong Fri, 01 Jun 2018 00:36:55 -0700

Some questions about SelectionVector2 and SelectionVector4:

I want to create SelectionVector4 or SelectionVector2 to represent the
filtered ScanBatch to avoid memory copy. But I found the ProjectBatch does
not support SelectVector4 . And the SelectionVector2's record count size is
char type size .  So why SelectionVector4 is not supported by the
ProjectBatch ? The same question is to the FilterBatch's SelectVector2
which also only support the 2 Byte size record count.


On Fri, Jun 1, 2018 at 1:40 PM weijie tong <tongweijie...@gmail.com> wrote:

> Hi Boaz:
>
>   Your propose is valuable though I have implemented the dynamic
> generating code logic.  If a  ``` long hash64(int index, long seed) ```
> method is added to the ValueVector , it will also benefit others to
> implement specific storage plugin's filter logic by using the pushed down
> bloom filter.  To HashJoin and HashAggregate , methods ```double
> hash32AsDouble(int index, int seed) ``` and ```int hash32(int index, int
> seed)```  will also be needed to the ValueVector.  If no one else gives
> objection , I will be pleasure to take this work.
>
>    Btw, I will share my thought about the scan side's filter logic by the
> BloomFilter. The scan side filter logic here I supposed to do is to filter
> the materialized ValueVector ,not at the process to construct the
> ValueVector from the original storage format data. The reason is the
> checking logic will break down the performance to materialize the original
> deep storage format data to ValueVector.
>
> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bben-...@mapr.com> wrote:
>
>>  Hi Weijie,
>>
>>     Another option is to totally avoid the generated code.
>> We were considering the idea of replacing the generated code used for
>> computing hash values with “real java” code.
>>
>> This idea is analogous to the usage of the copyEntry() method in the
>> ValueVector interface (that Paul added last year).
>> See an example of using the copyEntry() (via the appendRow() in
>> VectorContainer) in the new Hash-Join-Spill code.
>> Basically no need to generate “type specific” code, as the virtual
>> copyEntry() method does the “type specific” work.
>>
>> Similarly we could have a hash64() method in ValueVector, which would
>> perform the “type specific” computation.
>> (One difference from copyEntry() – the hash64() would also need to take
>> the “seed” parameter, which is the hash value produced by the previous
>> hash).
>> And similar to appendRow(), there would be evalHash() iterating over the
>> key columns.
>> (And one difference from appendRow() – need to iterate only on the key
>> columns; these are the first columns; their number can be found from the
>> config: e.g., htConfig.getKeyExprsBuild().size() )
>>
>>    With such implementation, that evalHash() could be used anywhere
>> (e.g., to match the Bloom filters on the left side of the join).
>>
>>        Thanks,
>>
>>              Boaz
>>
>>
>> On 5/30/18, 7:49 PM, "weijie tong" <tongweijie...@gmail.com> wrote:
>>
>>     Hi Aman:
>>
>>       Thanks for your tips. I have rebased the latest code from the master
>>     branch . Yes, the spill-to-disk feature does changed the original
>>     implementation. I have adjusted my implementation according to the new
>>     feature. But as you say, it will take some challenge to integration
>> as I
>>     noticed the spill-to-disk feature will continue to tune its
>> implementation
>>     performance.
>>
>>       The BloomFilter was implemented natively in Drill , not an external
>>     library. It's implemented the algorithm of the paper which was
>> mentioned by
>>     you.
>>
>>
>>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <amansi...@apache.org>
>> wrote:
>>
>>     > Hi Weijie,
>>     > I was hoping you could leverage the existing methods..so its good
>> that you
>>     > found the ones that work for your use case.
>>     > One thing I want to point out (maybe you're already aware) .. the
>> Hash Join
>>     > code has changed significantly in the master branch due to the
>>     > spill-to-disk feature.
>>     > So, this may pose some integration challenges for your run-time join
>>     > pushdown feature.
>>     > Also, one other question/clarification:  for the bloom filter
>> itself are
>>     > you implementing it natively in Drill or using an external library ?
>>     >
>>     > -Aman
>>     >
>>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <
>> tongweijie...@gmail.com>
>>     > wrote:
>>     >
>>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and
>>     > unNestEvalBlock()
>>     > > which has the same effect to what I change to the ClassGenerator.
>> So I
>>     > give
>>     > > up what I change to the ClassGenerator and hope this can help
>> someone
>>     > else.
>>     > >
>>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <
>> tongweijie...@gmail.com>
>>     > > wrote:
>>     > >
>>     > > > The code formatting is not nice. Put them again:
>>     > > >
>>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>     > > MappingSet
>>     > > > incomingMapping, VectorAccessible batch, LogicalExpression[]
>> keyExprs,
>>     > > > TypedFieldId[] buildKeyFieldIds)
>>     > > > throws SchemaChangeException {
>>     > > > cg.setMappingSet(incomingMapping);
>>     > > > if (keyExprs == null || keyExprs.length == 0) {
>>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > > }
>>     > > > String seedValue = "seedValue";
>>     > > > String fieldId = "fieldId";
>>     > > > LogicalExpression seed =
>>     > > > ValueExpressions.getParameterExpression(seedValue,
>> Types.required(
>>     > > > TypeProtos.MinorType.INT));
>>     > > >
>>     > > > LogicalExpression fieldIdParamExpr =
>>     > > > ValueExpressions.getParameterExpression(fieldId, Types.required(
>>     > > > TypeProtos.MinorType.INT) );
>>     > > > HoldingContainer fieldIdParamHolder =
>> cg.addExpr(fieldIdParamExpr);
>>     > > > int i = 0;
>>     > > >  for (LogicalExpression expr : keyExprs) {
>>     > > >      TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>     > > >      ValueExpressions.IntExpression targetBuildFieldIdExp = new
>>     > > >
>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>     > > > ExpressionPosition.UNKNOWN);
>>     > > >
>>     > > >     JFieldRef targetBuildSideFieldId =
>>     > cg.addExpr(targetBuildFieldIdExp,
>>     > > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>     > > >     JBlock ifBlock =
>>     > > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().
>>     > > eq(targetBuildSideFieldId))._then();
>>     > > >     //specify a special JBlock which is a inner one of the eval
>> block
>>     > to
>>     > > > the ClassGenerator to substitute the returned JBlock of
>> getEvalBlock()
>>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>>     > > >     LogicalExpression hashExpression =
>>     > > > HashPrelUtil.getHashExpression(expr, seed, incomingProbe !=
>> null);
>>     > > >     LogicalExpression materializedExpr =
>>     > > >
>> ExpressionTreeMaterializer.materializeAndCheckErrors(hashExpression,
>>     > > batch,
>>     > > > context.getFunctionRegistry());
>>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>>     > > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>     > > >     ifBlock._return(hash.getValue());
>>     > > >     //reset the customized block to null ,so the getEvalBlock()
>> return
>>     > > the
>>     > > > truly eval JBlock
>>     > > >     cg.setCustomizedEvalInnerBlock(null);
>>     > > >     i++;
>>     > > >  }
>>     > > > cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > > }
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int
>> seedValue,
>>     > > int
>>     > > > fieldId)
>>     > > > throws SchemaChangeException
>>     > > > {
>>     > > > {
>>     > > > IntHolder fieldId12 = new IntHolder();
>>     > > > fieldId12 .value = fieldId;
>>     > > > if (fieldId12 .value == constant14 .value) {
>>     > > >    IntHolder out18 = new IntHolder();
>>     > > >    {
>>     > > >      out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>     > > >    }
>>     > > >    IntHolder seedValue19 = new IntHolder();
>>     > > >    seedValue19 .value = seedValue;
>>     > > >    //---- start of eval portion of hash32AsDouble function.
>> ----//
>>     > > >    IntHolder out20 = new IntHolder();
>>     > > >   {
>>     > > >       final IntHolder out = new IntHolder();
>>     > > >       IntHolder in = out18;
>>     > > >       IntHolder seed = seedValue19;
>>     > > >
>>     > > >       Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >       out.value =
>>     > > > org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>> in.value,
>>     > > > seed.value);
>>     > > >    }
>>     > > >
>>     > > >    out20 = out;
>>     > > > }
>>     > > > //---- end of eval portion of hash32AsDouble function. ----//
>>     > > > return out20 .value;
>>     > > > }
>>     > > > return 0;
>>     > > > }
>>     > > > }
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <
>> tongweijie...@gmail.com>
>>     > > > wrote:
>>     > > >
>>     > > >> HI Paul:
>>     > > >>
>>     > > >>  Thanks for your enthusiasm. I have managed this skill as you
>> ever
>>     > > >> mentioned me at another mail thread. It's really helpful
>> ,thanks for
>>     > > your
>>     > > >> valuable work.
>>     > > >>
>>     > > >>   Now I have solved this tough problem by adding a customized
>> JBlock
>>     > > >> member field to the ClassGenerator. So once you want the
>>     > getEvalBlock()
>>     > > of
>>     > > >> the ClassGenerator to return a inner customized JBlock , then
>> you set
>>     > > this
>>     > > >> member, if you want the method to return eval self JBlock ,
>> you reset
>>     > > this
>>     > > >> member to null.
>>     > > >>
>>     > > >>   Here is my changed setup method :
>>     > > >>
>>     > > >>
>>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>     > > MappingSet incomingMapping, VectorAccessible batch,
>> LogicalExpression[]
>>     > > keyExprs, TypedFieldId[] buildKeyFieldIds)
>>     > > >>   throws SchemaChangeException {
>>     > > >>   cg.setMappingSet(incomingMapping);
>>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >>   }
>>     > > >>   String seedValue = "seedValue";
>>     > > >>   String fieldId = "fieldId";
>>     > > >>   LogicalExpression seed =
>>     > ValueExpressions.getParameterExpression(seedValue,
>>     > > Types.required(TypeProtos.MinorType.INT));
>>     > > >>
>>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.
>>     > > getParameterExpression(fieldId, Types.required(
>> TypeProtos.MinorType.INT)
>>     > > );
>>     > > >>   HoldingContainer fieldIdParamHolder =
>> cg.addExpr(fieldIdParamExpr);
>>     > > >>   int i = 0;
>>     > > >>   for (LogicalExpression expr : keyExprs) {
>>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new
>>     > > ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>     > > ExpressionPosition.UNKNOWN);
>>     > > >>
>>     > > >>     JFieldRef targetBuildSideFieldId =
>>     > cg.addExpr(targetBuildFieldIdExp,
>>     > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>     > > >>     JBlock ifBlock = cg.getEvalBlock()._if(
>>     > > fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>     > > >>     //specify a special JBlock which is a inner one of the
>> eval block
>>     > > to the ClassGenerator to substitute the returned JBlock of
>> getEvalBlock()
>>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>>     > > >>     LogicalExpression hashExpression =
>>     > HashPrelUtil.getHashExpression(expr,
>>     > > seed, incomingProbe != null);
>>     > > >>     LogicalExpression materializedExpr =
>> ExpressionTreeMaterializer.
>>     > > materializeAndCheckErrors(hashExpression, batch,
>>     > > context.getFunctionRegistry());
>>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>     > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>     > > >>     ifBlock._return(hash.getValue());
>>     > > >>     //reset the customized block to null ,so the
>> getEvalBlock() return
>>     > > the truly eval JBlock
>>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>>     > > >>     i++;
>>     > > >>   }
>>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >> }
>>     > > >>
>>     > > >>
>>     > > >>  The corresponding generated codes :
>>     > > >>
>>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int
>>     > > seedValue, int fieldId)
>>     > > >>         throws SchemaChangeException
>>     > > >>     {
>>     > > >>         {
>>     > > >>             IntHolder fieldId12 = new IntHolder();
>>     > > >>             fieldId12 .value = fieldId;
>>     > > >>             if (fieldId12 .value == constant14 .value) {
>>     > > >>                 IntHolder out18 = new IntHolder();
>>     > > >>                 {
>>     > > >>                     out18 .value = vv15 .getAccessor().get((
>>     > > incomingRowIdx));
>>     > > >>                 }
>>     > > >>                 IntHolder seedValue19 = new IntHolder();
>>     > > >>                 seedValue19 .value = seedValue;
>>     > > >>                 //---- start of eval portion of hash32AsDouble
>>     > > function. ----//
>>     > > >>                 IntHolder out20 = new IntHolder();
>>     > > >>                 {
>>     > > >>                     final IntHolder out = new IntHolder();
>>     > > >>                     IntHolder in = out18;
>>     > > >>                     IntHolder seed = seedValue19;
>>     > > >>
>>     > > >> Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >>     out.value =
>>     > org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>>     > > in.value, seed.value);
>>     > > >> }
>>     > > >>
>>     > > >>                     out20 = out;
>>     > > >>                 }
>>     > > >>                 //---- end of eval portion of hash32AsDouble
>> function.
>>     > > ----//
>>     > > >>                 return out20 .value;
>>     > > >>             }
>>     > > >>             return  0;
>>     > > >>         }
>>     > > >>     }
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>   Some other explanation:
>>     > > >>   1st : The if checking won't hurt the performance , as I
>> invoke this
>>     > > >> method column by column , so it's branch predication friendly.
>>     > > >>   2nd: I will use the murmur3_64 not the murmur3_32 ，since the
>>     > efficient
>>     > > >> bloom filter algorithm needs the 64 bit hash code to avoid the
>>     > conflict.
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers
>>     > <par0...@yahoo.com.invalid
>>     > > >
>>     > > >> wrote:
>>     > > >>
>>     > > >>> Hi Weijie,
>>     > > >>>
>>     > > >>> Seeing the discussion about the details of JCodeModel
>> suggests you
>>     > may
>>     > > >>> be trying to debug your generated code at the level of the
>> code
>>     > > generator.
>>     > > >>>
>>     > > >>> Some time ago we added the ability to step through the
>> generated
>>     > code.
>>     > > >>> Look for the following line in the generator code:
>>     > > >>>
>>     > > >>>
>>     > > >>>     // Uncomment out this line to debug the generated code.
>>     > > >>>
>>     > > >>> //    cg.saveCodeForDebugging(true);
>>     > > >>>
>>     > > >>>
>>     > > >>> Uncomment the code line and Drill will save each generated
>> file to a
>>     > > >>> configured location (which, if I recall correctly, is
>>     > > /tmp/drill/codegen,
>>     > > >>> though it may have changed after Tim's test directory
>> changes.)
>>     > > >>>
>>     > > >>> Then, set a breakpoint in the template setup() method and you
>> can
>>     > step
>>     > > >>> directly into the generated doSetup() method. Same for the
>> eval()
>>     > > method.
>>     > > >>>
>>     > > >>> This way, you can not only see the generated code, you can
>> step
>>     > through
>>     > > >>> it. I've found this to be a far easier way to understand the
>>     > generated
>>     > > code
>>     > > >>> than the older techniques folks have used (look at byte
>> codes, use
>>     > > print
>>     > > >>> statements, brute force reasoning, etc.)
>>     > > >>>
>>     > > >>> Tim, Boaz and others have used this technique more recently
>> and can
>>     > > >>> probably give you additional pointers.
>>     > > >>>
>>     > > >>> Thanks,
>>     > > >>> - Paul
>>     > > >>>
>>     > > >>>
>>     > > >>>
>>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <
>>     > > >>> tongweijie...@gmail.com> wrote:
>>     > > >>>
>>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an
>>     > _else()
>>     > > >>> block
>>     > > >>> also ?"  I give a default return logic at the method, so I
>> don't need
>>     > > the
>>     > > >>> _else() block.  I have noticed the IfExpression's evaluation
>> method
>>     > at
>>     > > >>> EvaluationVisitor which also uses the JConditional . But that
>> also
>>     > > >>> doesn't
>>     > > >>> match my requirement. I think the key point here is the
>>     > > >>> FunctionHolderExpression and ValueVectorReadExpression will
>> put their
>>     > > >>> corresponding generated codes to the eval method's JBlock ,
>> not our
>>     > > >>> specific IfBlock which is a inner block of the eval method's
>> JBlock .
>>     > > >>>
>>     > > >>> So it seems I should make some changes to the ClassGenerator
>> to let
>>     > the
>>     > > >>> getEvalBlock return the IfBlock (maybe accurately the
>> JConditional's
>>     > > then
>>     > > >>> block) or implement some special FunctionHolderExpression
>>     > > >>> 、ValueVectorReadExpression and corresponding visiting methods
>> at the
>>     > > >>> EvaluationVisitor to generate the special code blocks. Hope
>> someone
>>     > who
>>     > > >>> are
>>     > > >>> familiar with these part of codes to point out whether there
>> are more
>>     > > >>> easy
>>     > > >>> or different choices to achieve the target.
>>     > > >>>
>>     > > >>> To make discussion more accurate, I put the generated codes
>> of the
>>     > > >>> previous
>>     > > >>> setupGetBuild64Hash method here:
>>     > > >>>
>>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx,
>> int
>>     > > >>> seedValue, int fieldId)
>>     > > >>>         throws SchemaChangeException
>>     > > >>>     {
>>     > > >>>         {
>>     > > >>>             IntHolder fieldId16 = new IntHolder();
>>     > > >>>             fieldId16 .value = fieldId;
>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>     > > >>>                 return out24 .value;
>>     > > >>>             }
>>     > > >>>             IntHolder out22 = new IntHolder();
>>     > > >>>             {
>>     > > >>>                 out22 .value = vv19 .getAccessor().get((
>>     > > incomingRowIdx));
>>     > > >>>             }
>>     > > >>>             IntHolder seedValue23 = new IntHolder();
>>     > > >>>             seedValue23 .value = seedValue;
>>     > > >>>             //---- start of eval portion of hash32AsDouble
>> function.
>>     > > >>> ----//
>>     > > >>>             IntHolder out24 = new IntHolder();
>>     > > >>>             {
>>     > > >>>                 final IntHolder out = new IntHolder();
>>     > > >>>                 IntHolder in = out22;
>>     > > >>>                 IntHolder seed = seedValue23;
>>     > > >>>
>>     > > >>> Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >>>     out.value =
>>     > > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>>     > > >>> in.value, seed.value);
>>     > > >>> }
>>     > > >>>
>>     > > >>>                 out24 = out;
>>     > > >>>             }
>>     > > >>>             //---- end of eval portion of hash32AsDouble
>> function.
>>     > > ----//
>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>     > > >>>                 return out26 .value;
>>     > > >>>             }
>>     > > >>>             IntHolder seedValue25 = new IntHolder();
>>     > > >>>             seedValue25 .value = seedValue;
>>     > > >>>             //---- start of eval portion of hash32AsDouble
>> function.
>>     > > >>> ----//
>>     > > >>>             IntHolder out26 = new IntHolder();
>>     > > >>>             {
>>     > > >>>                 final IntHolder out = new IntHolder();
>>     > > >>>                 IntHolder in = out22;
>>     > > >>>                 IntHolder seed = seedValue25;
>>     > > >>>
>>     > > >>> Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >>>     out.value =
>>     > > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>>     > > >>> in.value, seed.value);
>>     > > >>> }
>>     > > >>>
>>     > > >>>                 out26 = out;
>>     > > >>>             }
>>     > > >>>             //---- end of eval portion of hash32AsDouble
>> function.
>>     > > ----//
>>     > > >>>             return  0;
>>     > > >>>         }
>>     > > >>>     }
>>     > > >>>
>>     > > >>>
>>     > > >>>
>>     > > >>>
>>     > > >>>
>>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <
>> amansi...@apache.org>
>>     > > >>> wrote:
>>     > > >>>
>>     > > >>> > sorry, the previous email is incomplete.
>>     > > >>> > For the ifBlock, do you need an _else() block also ?
>>     > > >>> >
>>     > > >>> > I have sometimes found that 'JConditional' is a good way to
>> break
>>     > > down
>>     > > >>> the
>>     > > >>> > logic further.  Please see example usages of JConditional
>> here [1].
>>     > > >>> >
>>     > > >>> > -Aman
>>     > > >>> >
>>     > > >>> > [1]
>>     > > >>> >
>>     > > >>> >
>>     > > >>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.programcreek.com_java-2Dapi-2Dexamples_-3Fapi-3Dcom&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=O2Th00tVjOSHTLlOn_lFp8JiUlh_FueCbHs8giRVS3k&e=
>> .
>>     > > sun.codemodel.JBlock
>>     > > >>> >
>>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <
>> amansi...@apache.org>
>>     > > >>> wrote:
>>     > > >>> >
>>     > > >>> > > Hi Weijie,
>>     > > >>> > > It would be a little cumbersome to debug such issues over
>> email
>>     > > >>> since one
>>     > > >>> > > has to look at the generated code output and iteratively
>> debug.
>>     > > >>> > > Couple of thoughts I have that might help:
>>     > > >>> > >
>>     > > >>> > > For this particular if-then block, should you also
>>     > > >>> > > JBlock ifBlock =
>>     > > >>> > >
>> cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe
>>     > > >>> > > tBuildSideFieldId))._then();
>>     > > >>> > >
>>     > > >>> > >
>>     > > >>> > >
>>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <
>>     > > >>> tongweijie...@gmail.com>
>>     > > >>> > > wrote:
>>     > > >>> > >
>>     > > >>> > >> HI All:
>>     > > >>> > >>  Through implementing the JPPD feature (
>>     > > >>> > >>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6385&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=FIkIkgR6E_qJADP1J55y11SgJZD8NyPaNv_AeTabiaY&e=)
>> , I was
>>     > blocked
>>     > > >>> by
>>     > > >>> > the
>>     > > >>> > >> problem: how to get the hash code of each build side of
>> the hash
>>     > > >>> join
>>     > > >>> > >> columns through the dynamic generated java code. Hope
>> someone
>>     > can
>>     > > >>> give
>>     > > >>> > >> some
>>     > > >>> > >> advice.
>>     > > >>> > >>
>>     > > >>> > >>    I supposed to add methods as below to the
>> HashTableTemplate :
>>     > > >>> > >>
>>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int
>>     > seedValue,
>>     > > >>> int
>>     > > >>> > >> fieldId) throws SchemaChangeException{
>>     > > >>> > >>    return getBuild64HashCodeInner(incomingRowIdx,
>> seedValue,
>>     > > >>> fieldId);
>>     > > >>> > >> }
>>     > > >>> > >>
>>     > > >>> > >> protected abstract long
>>     > > >>> > >> getBuild64HashCodeInner(@Named("incomingRowIdx") int
>>     > > incomingRowIdx,
>>     > > >>> > >> @Named("seedValue") int seedValue, @Named("fieldId") int
>>     > fieldId)
>>     > > >>> > >> throws SchemaChangeException;
>>     > > >>> > >>
>>     > > >>> > >>
>>     > > >>> > >>    The high level code to invoke the getBuild64HashCode
>> method
>>     > is
>>     > > >>> at the
>>     > > >>> > >> HashJoinBatch's executeBuildPhase() :
>>     > > >>> > >>
>>     > > >>> > >> //create runtime filter
>>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>>     > > >>> > >>  //create runtime filter and send out async
>>     > > >>> > >>  int condFieldIndex = 0;
>>     > > >>> > >>  for (BloomFilter bloomFilter : bloomFilters) {
>>     > > >>> > >>    //VV
>>     > > >>> > >>    for (int ind = 0; ind < currentRecordCount; ind++) {
>>     > > >>> > >>      long hashCode =
>> partitions[0].getBuild64HashCode(ind,
>>     > > >>> > >> condFieldIndex);
>>     > > >>> > >>      bloomFilter.insert(hashCode);
>>     > > >>> > >>    }
>>     > > >>> > >>    condFieldIndex++;
>>     > > >>> > >>  }
>>     > > >>> > >>  //TODO sered out async
>>     > > >>> > >> }
>>     > > >>> > >>
>>     > > >>> > >>
>>     > > >>> > >>  As you know, the abstract method
>> getBuild64HashCodeInner needs
>>     > to
>>     > > >>> > >> calculate the hash codes of each build side column by the
>>     > fieldId
>>     > > >>> input
>>     > > >>> > >> parameter. In order to achieve this target, I plan to
>> have
>>     > > different
>>     > > >>> > >> solving parts corresponding to different column
>> ValueVector ,
>>     > > using
>>     > > >>> the
>>     > > >>> > if
>>     > > >>> > >> statement to distinguish different solving parts through
>> the id
>>     > of
>>     > > >>> the
>>     > > >>> > >> column.  The corresponding method to generate the
>> dynamic codes
>>     > > is
>>     > > >>> as
>>     > > >>> > >> below:
>>     > > >>> > >>
>>     > > >>> > >> private void
>> setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>     > > >>> > >> MappingSet incomingMapping, VectorAccessible batch,
>>     > > >>> > >> LogicalExpression[] keyExprs, TypedFieldId[]
>> buildKeyFieldIds)
>>     > > >>> > >>  throws SchemaChangeException {
>>     > > >>> > >>  cg.setMappingSet(incomingMapping);
>>     > > >>> > >>  if (keyExprs == null || keyExprs.length == 0) {
>>     > > >>> > >>    cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >>> > >>  }
>>     > > >>> > >>  String seedValue = "seedValue";
>>     > > >>> > >>  String fieldId = "fieldId";
>>     > > >>> > >>  LogicalExpression seed =
>>     > > >>> > >> ValueExpressions.getParameterExpression(seedValue,
>>     > > >>> > >> Types.required(TypeProtos.MinorType.INT));
>>     > > >>> > >>
>>     > > >>> > >>  LogicalExpression fieldIdParamExpr =
>>     > > >>> > >> ValueExpressions.getParameterExpression(fieldId,
>>     > > >>> > >> Types.required(TypeProtos.MinorType.INT) );
>>     > > >>> > >>  HoldingContainer fieldIdParamHolder =
>>     > > cg.addExpr(fieldIdParamExpr);
>>     > > >>> > >>  int i = 0;
>>     > > >>> > >>  for (LogicalExpression expr : keyExprs) {
>>     > > >>> > >>    TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>     > > >>> > >>    ValueExpressions.IntExpression targetBuildFieldIdExp
>> = new
>>     > > >>> > >>
>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds(
>>     > > )[0],
>>     > > >>> > >> ExpressionPosition.UNKNOWN);
>>     > > >>> > >>    JFieldRef targetBuildSideFieldId =
>>     > > >>> > >> cg.addExpr(targetBuildFieldIdExp,
>>     > > >>> > >> ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>     > > >>> > >>    JBlock ifBlock =
>>     > > >>> > >>
>> cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe
>>     > > >>> > >> tBuildSideFieldId))._then();
>>     > > >>> > >>
>>     > > >>> > >>    LogicalExpression hashExpression =
>>     > > >>> > >> HashPrelUtil.getHashExpression(expr, seed, incomingProbe
>> !=
>>     > > null);
>>     > > >>> > >>    LogicalExpression materializedExpr =
>>     > > >>> > >> ExpressionTreeMaterializer.materializeAndCheckErrors(
>>     > > hashExpression,
>>     > > >>> > >> batch, context.getFunctionRegistry());
>>     > > >>> > >>    HoldingContainer hash = cg.addExpr(materializedExpr,
>>     > > >>> > >> ClassGenerator.BlkCreateMode.FALSE);
>>     > > >>> > >>
>>     > > >>> > >>
>>     > > >>> > >>    ifBlock._return(hash.getValue());
>>     > > >>> > >>    i++;
>>     > > >>> > >>  }
>>     > > >>> > >>  cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >>> > >>
>>     > > >>> > >> }
>>     > > >>> > >>
>>     > > >>> > >> But unfortunately, the generated codes are not what I
>> expected.
>>     > > The
>>     > > >>> > codes
>>     > > >>> > >> to read ValueVector , calculate hash code of the read
>> value do
>>     > not
>>     > > >>> stay
>>     > > >>> > in
>>     > > >>> > >> the if block.  So how can I let the related codes stay
>> in the if
>>     > > >>> block ?
>>     > > >>> > >>
>>     > > >>> > >
>>     > > >>> > >
>>     > > >>> >
>>     > > >>
>>     > > >>
>>     > >
>>     >
>>
>>
>>

Re: How to generate hash code for each build side one of the hash join columns

Reply via email to