Re: How to generate hash code for each build side one of the hash join columns

weijie tong Fri, 01 Jun 2018 02:14:40 -0700

I found the answer: a RecordBatch holds at most 2^16 records, as defined by
RecordBatch's MAX_BATCH_SIZE.
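
The 2^16 limit lines up with a two-byte selection vector entry: a Java char is an unsigned 16-bit value, so it can index at most 65,536 rows. A minimal sketch of that idea, using a plain char[] as a stand-in (Sv2Sketch and its members are hypothetical names for illustration, not Drill's SelectionVector2 API):

```java
// Hypothetical stand-in for a two-byte selection vector; not Drill's SelectionVector2.
final class Sv2Sketch {
  // A char is an unsigned 16-bit value, so it can address rows 0..65535.
  static final int MAX_ADDRESSABLE_ROWS = Character.MAX_VALUE + 1; // 2^16

  // Returns the indices of rows whose value is >= threshold, without copying the values.
  static char[] select(int[] values, int threshold) {
    if (values.length > MAX_ADDRESSABLE_ROWS) {
      throw new IllegalArgumentException("batch too large for 2-byte indices");
    }
    char[] tmp = new char[values.length];
    int count = 0;
    for (int i = 0; i < values.length; i++) {
      if (values[i] >= threshold) {
        tmp[count++] = (char) i; // store the row index, not the row itself
      }
    }
    return java.util.Arrays.copyOf(tmp, count);
  }
}
```

Downstream code would then read only the rows listed in the returned index array, which is how a selection vector avoids a memory copy.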


On Fri, Jun 1, 2018 at 3:36 PM weijie tong <[email protected]> wrote:

> Some questions about SelectionVector2 and SelectionVector4:
>
> I want to create a SelectionVector4 or SelectionVector2 to represent the
> filtered ScanBatch and avoid a memory copy. But I found that ProjectBatch
> does not support SelectionVector4, and that SelectionVector2's record count
> is limited to the size of a char (2 bytes). So why is SelectionVector4 not
> supported by ProjectBatch? The same question applies to FilterBatch's
> SelectionVector2, which also only supports a 2-byte record count.
>
> On Fri, Jun 1, 2018 at 1:40 PM weijie tong <[email protected]>
> wrote:
>
>> Hi Boaz:
>>
>>   Your proposal is valuable, though I have already implemented the dynamic
>> code generation logic.  If a  ``` long hash64(int index, long seed) ```
>> method is added to ValueVector, it will also help others implement a
>> specific storage plugin's filter logic using the pushed-down
>> bloom filter.  For HashJoin and HashAggregate, the methods ```double
>> hash32AsDouble(int index, int seed) ``` and ```int hash32(int index, int
>> seed)```  will also need to be added to ValueVector.  If no one
>> objects, I will be glad to take on this work.
>>
>>    Btw, here are my thoughts on the scan-side filter logic that uses the
>> BloomFilter. I intend to apply the filter to the materialized
>> ValueVector, not during the process of constructing the
>> ValueVector from the original storage format data. The reason is that the
>> checking logic would degrade the performance of materializing the original
>> deep storage format data into ValueVectors.
>>
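
A minimal sketch of filtering an already materialized column, assuming the pushed-down filter can be modeled as a membership test on 64-bit hash codes. MembershipTest, ScanSideFilterSketch, and placeholderHash64 are hypothetical names used only for illustration; none of them are Drill APIs, and the hash is a simple placeholder rather than a real murmur3_64.

```java
// Hypothetical membership test standing in for a pushed-down bloom filter.
interface MembershipTest {
  boolean mightContain(long hash);
}

final class ScanSideFilterSketch {
  // Placeholder multiplicative hash; a real implementation would use a proper 64-bit hash.
  static long placeholderHash64(int value, long seed) {
    return (Integer.toUnsignedLong(value) ^ seed) * 0x9E3779B97F4A7C15L;
  }

  // Runs after the column has been fully materialized and returns the surviving row indices.
  static int[] filter(int[] column, long seed, MembershipTest filter) {
    int[] kept = new int[column.length];
    int count = 0;
    for (int row = 0; row < column.length; row++) {
      if (filter.mightContain(placeholderHash64(column[row], seed))) {
        kept[count++] = row;
      }
    }
    return java.util.Arrays.copyOf(kept, count);
  }
}
```

Because the check happens in a separate pass over the finished vector, the decoding loop that builds the vector stays free of per-value branching.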
>> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <[email protected]> wrote:
>>
>>>  Hi Weijie,
>>>
>>>     Another option is to totally avoid the generated code.
>>> We were considering the idea of replacing the generated code used for
>>> computing hash values with “real java” code.
>>>
>>> This idea is analogous to the usage of the copyEntry() method in the
>>> ValueVector interface (that Paul added last year).
>>> See an example of using the copyEntry() (via the appendRow() in
>>> VectorContainer) in the new Hash-Join-Spill code.
>>> Basically no need to generate “type specific” code, as the virtual
>>> copyEntry() method does the “type specific” work.
>>>
>>> Similarly we could have a hash64() method in ValueVector, which would
>>> perform the “type specific” computation.
>>> (One difference from copyEntry() – the hash64() would also need to take
>>> the “seed” parameter, which is the hash value produced by the previous
>>> hash).
>>> And similar to appendRow(), there would be evalHash() iterating over the
>>> key columns.
>>> (And one difference from appendRow() – need to iterate only on the key
>>> columns; these are the first columns; their number can be found from the
>>> config: e.g., htConfig.getKeyExprsBuild().size() )
>>>
>>>    With such implementation, that evalHash() could be used anywhere
>>> (e.g., to match the Bloom filters on the left side of the join).
>>>
>>>        Thanks,
>>>
>>>              Boaz
>>>
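
The hash64()/evalHash() idea described above can be sketched as follows. This is only an illustration under stated assumptions: HashableVector, IntColumn, and EvalHashSketch are hypothetical names, the mixing function is a placeholder rather than Drill's hash, and the real ValueVector interface is not shown here.

```java
// Hypothetical interface sketching the proposed per-vector hash method; not Drill's ValueVector.
interface HashableVector {
  long hash64(int index, long seed);
}

// Toy int column; each vector type would supply its own "type specific" hash64().
final class IntColumn implements HashableVector {
  private final int[] values;

  IntColumn(int... values) {
    this.values = values;
  }

  @Override
  public long hash64(int index, long seed) {
    // Placeholder 64-bit mixing step, not Drill's hash function.
    long z = values[index] + seed + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}

final class EvalHashSketch {
  // Hashes only the first numKeys columns; each column's result seeds the next one.
  static long evalHash(HashableVector[] columns, int numKeys, int row, long seed) {
    long hash = seed;
    for (int i = 0; i < numKeys; i++) {
      hash = columns[i].hash64(row, hash);
    }
    return hash;
  }
}
```

The virtual call replaces the generated type-specific code, and columns after the first numKeys never influence the result.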
>>>
>>> On 5/30/18, 7:49 PM, "weijie tong" <[email protected]> wrote:
>>>
>>>     Hi Aman:
>>>
>>>       Thanks for your tips. I have rebased onto the latest code from the
>>>     master branch. Yes, the spill-to-disk feature did change the original
>>>     implementation, and I have adjusted my implementation to the new
>>>     feature. But as you say, integration will be a challenge, since I
>>>     noticed that the spill-to-disk feature is still tuning its
>>>     implementation performance.
>>>
>>>       The BloomFilter is implemented natively in Drill, not with an
>>>     external library. It implements the algorithm from the paper you
>>>     mentioned.
>>>
>>>
>>>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <[email protected]>
>>> wrote:
>>>
>>>     > Hi Weijie,
>>>     > I was hoping you could leverage the existing methods, so it's good
>>>     > that you found the ones that work for your use case.
>>>     > One thing I want to point out (maybe you're already aware): the
>>>     > Hash Join code has changed significantly in the master branch due to
>>>     > the spill-to-disk feature.
>>>     > So, this may pose some integration challenges for your run-time join
>>>     > pushdown feature.
>>>     > Also, one other question/clarification: for the bloom filter itself,
>>>     > are you implementing it natively in Drill or using an external
>>>     > library?
>>>     >
>>>     > -Aman
>>>     >
>>>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <
>>> [email protected]>
>>>     > wrote:
>>>     >
>>>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and
>>>     > > unNestEvalBlock(), which have the same effect as the change I made
>>>     > > to the ClassGenerator. So I am dropping my change to the
>>>     > > ClassGenerator, and I hope this helps someone else.
>>>     > >
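
The nest/un-nest idea can be illustrated with a toy model in which emitted statements always go to the block on top of a stack. This is a self-contained sketch with hypothetical names (BlockStackSketch, emit, nest, unNest); it imitates the concept only and does not use or reproduce Drill's ClassGenerator or JCodeModel.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an "eval block" stack; it only imitates the idea, not Drill's ClassGenerator.
final class BlockStackSketch {
  private final StringBuilder root = new StringBuilder();
  private final Deque<StringBuilder> stack = new ArrayDeque<>();

  BlockStackSketch() {
    stack.push(root);
  }

  // Statements always go to whichever block is currently on top of the stack.
  void emit(String statement) {
    stack.peek().append(statement).append('\n');
  }

  // Redirects subsequent emit() calls into a nested block (e.g. the body of an if).
  void nest(StringBuilder inner) {
    stack.push(inner);
  }

  // Restores the enclosing block as the emit target.
  void unNest() {
    if (stack.size() == 1) {
      throw new IllegalStateException("no nested block to leave");
    }
    stack.pop();
  }

  String rootSource() {
    return root.toString();
  }
}
```

With this shape, code emitted between nest(ifBody) and unNest() lands in the if body, while the default return emitted afterwards lands in the outer block.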
>>>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <
>>> [email protected]>
>>>     > > wrote:
>>>     > >
>>>     > > > The code formatting came out badly. Here it is again:
>>>     > > >
>>>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>     > > MappingSet
>>>     > > > incomingMapping, VectorAccessible batch, LogicalExpression[]
>>> keyExprs,
>>>     > > > TypedFieldId[] buildKeyFieldIds)
>>>     > > > throws SchemaChangeException {
>>>     > > > cg.setMappingSet(incomingMapping);
>>>     > > > if (keyExprs == null || keyExprs.length == 0) {
>>>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >   return; // nothing to hash: skip the loop and the second return below
>>>     > > > }
>>>     > > > String seedValue = "seedValue";
>>>     > > > String fieldId = "fieldId";
>>>     > > > LogicalExpression seed =
>>>     > > > ValueExpressions.getParameterExpression(seedValue,
>>> Types.required(
>>>     > > > TypeProtos.MinorType.INT));
>>>     > > >
>>>     > > > LogicalExpression fieldIdParamExpr =
>>>     > > > ValueExpressions.getParameterExpression(fieldId,
>>> Types.required(
>>>     > > > TypeProtos.MinorType.INT) );
>>>     > > > HoldingContainer fieldIdParamHolder =
>>> cg.addExpr(fieldIdParamExpr);
>>>     > > > int i = 0;
>>>     > > >  for (LogicalExpression expr : keyExprs) {
>>>     > > >      TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>     > > >      ValueExpressions.IntExpression targetBuildFieldIdExp = new
>>>     > > >
>>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>     > > > ExpressionPosition.UNKNOWN);
>>>     > > >
>>>     > > >     JFieldRef targetBuildSideFieldId =
>>>     > cg.addExpr(targetBuildFieldIdExp,
>>>     > > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>     > > >     JBlock ifBlock =
>>>     > > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().
>>>     > > eq(targetBuildSideFieldId))._then();
>>>     > > >     //specify a special JBlock which is a inner one of the
>>> eval block
>>>     > to
>>>     > > > the ClassGenerator to substitute the returned JBlock of
>>> getEvalBlock()
>>>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>     > > >     LogicalExpression hashExpression =
>>>     > > > HashPrelUtil.getHashExpression(expr, seed, incomingProbe !=
>>> null);
>>>     > > >     LogicalExpression materializedExpr =
>>>     > > >
>>> ExpressionTreeMaterializer.materializeAndCheckErrors(hashExpression,
>>>     > > batch,
>>>     > > > context.getFunctionRegistry());
>>>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>     > > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>     > > >     ifBlock._return(hash.getValue());
>>>     > > >     //reset the customized block to null ,so the
>>> getEvalBlock() return
>>>     > > the
>>>     > > > truly eval JBlock
>>>     > > >     cg.setCustomizedEvalInnerBlock(null);
>>>     > > >     i++;
>>>     > > >  }
>>>     > > > cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > > }
>>>     > > >
>>>     > > >
>>>     > > >
>>>     > > >
>>>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int
>>> seedValue,
>>>     > > int
>>>     > > > fieldId)
>>>     > > > throws SchemaChangeException
>>>     > > > {
>>>     > > > {
>>>     > > > IntHolder fieldId12 = new IntHolder();
>>>     > > > fieldId12 .value = fieldId;
>>>     > > > if (fieldId12 .value == constant14 .value) {
>>>     > > >    IntHolder out18 = new IntHolder();
>>>     > > >    {
>>>     > > >      out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>>     > > >    }
>>>     > > >    IntHolder seedValue19 = new IntHolder();
>>>     > > >    seedValue19 .value = seedValue;
>>>     > > >    //---- start of eval portion of hash32AsDouble function.
>>> ----//
>>>     > > >    IntHolder out20 = new IntHolder();
>>>     > > >   {
>>>     > > >       final IntHolder out = new IntHolder();
>>>     > > >       IntHolder in = out18;
>>>     > > >       IntHolder seed = seedValue19;
>>>     > > >
>>>     > > >       Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >       out.value =
>>>     > > > org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>>> in.value,
>>>     > > > seed.value);
>>>     > > >    }
>>>     > > >
>>>     > > >    out20 = out;
>>>     > > > }
>>>     > > > //---- end of eval portion of hash32AsDouble function. ----//
>>>     > > > return out20 .value;
>>>     > > > }
>>>     > > > return 0;
>>>     > > > }
>>>     > > > }
>>>     > > >
>>>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <
>>> [email protected]>
>>>     > > > wrote:
>>>     > > >
>>>     > > >> HI Paul:
>>>     > > >>
>>>     > > >>  Thanks for your enthusiasm. I have already picked up this
>>>     > > >> technique, since you mentioned it to me in another mail thread.
>>>     > > >> It's really helpful, thanks for your valuable work.
>>>     > > >>
>>>     > > >>   Now I have solved this tough problem by adding a customized
>>>     > > >> JBlock member field to the ClassGenerator. When you want the
>>>     > > >> ClassGenerator's getEvalBlock() to return a customized inner
>>>     > > >> JBlock, you set this member; when you want the method to return
>>>     > > >> the eval JBlock itself, you reset this member to null.
>>>     > > >>
>>>     > > >>   Here is my changed setup method :
>>>     > > >>
>>>     > > >>
>>>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>     > > MappingSet incomingMapping, VectorAccessible batch,
>>> LogicalExpression[]
>>>     > > keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>     > > >>   throws SchemaChangeException {
>>>     > > >>   cg.setMappingSet(incomingMapping);
>>>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>>>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >>     return; // nothing to hash: skip the loop and the second return below
>>>     > > >>   }
>>>     > > >>   String seedValue = "seedValue";
>>>     > > >>   String fieldId = "fieldId";
>>>     > > >>   LogicalExpression seed =
>>>     > ValueExpressions.getParameterExpression(seedValue,
>>>     > > Types.required(TypeProtos.MinorType.INT));
>>>     > > >>
>>>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.
>>>     > > getParameterExpression(fieldId, Types.required(
>>> TypeProtos.MinorType.INT)
>>>     > > );
>>>     > > >>   HoldingContainer fieldIdParamHolder =
>>> cg.addExpr(fieldIdParamExpr);
>>>     > > >>   int i = 0;
>>>     > > >>   for (LogicalExpression expr : keyExprs) {
>>>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new
>>>     > >
>>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>     > > ExpressionPosition.UNKNOWN);
>>>     > > >>
>>>     > > >>     JFieldRef targetBuildSideFieldId =
>>>     > cg.addExpr(targetBuildFieldIdExp,
>>>     > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>     > > >>     JBlock ifBlock = cg.getEvalBlock()._if(
>>>     > >
>>> fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>     > > >>     //specify a special JBlock which is a inner one of the
>>> eval block
>>>     > > to the ClassGenerator to substitute the returned JBlock of
>>> getEvalBlock()
>>>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>     > > >>     LogicalExpression hashExpression =
>>>     > HashPrelUtil.getHashExpression(expr,
>>>     > > seed, incomingProbe != null);
>>>     > > >>     LogicalExpression materializedExpr =
>>> ExpressionTreeMaterializer.
>>>     > > materializeAndCheckErrors(hashExpression, batch,
>>>     > > context.getFunctionRegistry());
>>>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>     > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>     > > >>     ifBlock._return(hash.getValue());
>>>     > > >>     //reset the customized block to null ,so the
>>> getEvalBlock() return
>>>     > > the truly eval JBlock
>>>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>>>     > > >>     i++;
>>>     > > >>   }
>>>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >> }
>>>     > > >>
>>>     > > >>
>>>     > > >>  The corresponding generated codes :
>>>     > > >>
>>>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx,
>>> int
>>>     > > seedValue, int fieldId)
>>>     > > >>         throws SchemaChangeException
>>>     > > >>     {
>>>     > > >>         {
>>>     > > >>             IntHolder fieldId12 = new IntHolder();
>>>     > > >>             fieldId12 .value = fieldId;
>>>     > > >>             if (fieldId12 .value == constant14 .value) {
>>>     > > >>                 IntHolder out18 = new IntHolder();
>>>     > > >>                 {
>>>     > > >>                     out18 .value = vv15 .getAccessor().get((
>>>     > > incomingRowIdx));
>>>     > > >>                 }
>>>     > > >>                 IntHolder seedValue19 = new IntHolder();
>>>     > > >>                 seedValue19 .value = seedValue;
>>>     > > >>                 //---- start of eval portion of hash32AsDouble
>>>     > > function. ----//
>>>     > > >>                 IntHolder out20 = new IntHolder();
>>>     > > >>                 {
>>>     > > >>                     final IntHolder out = new IntHolder();
>>>     > > >>                     IntHolder in = out18;
>>>     > > >>                     IntHolder seed = seedValue19;
>>>     > > >>
>>>     > > >> Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >>     out.value =
>>>     > org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>>>     > > in.value, seed.value);
>>>     > > >> }
>>>     > > >>
>>>     > > >>                     out20 = out;
>>>     > > >>                 }
>>>     > > >>                 //---- end of eval portion of hash32AsDouble
>>> function.
>>>     > > ----//
>>>     > > >>                 return out20 .value;
>>>     > > >>             }
>>>     > > >>             return  0;
>>>     > > >>         }
>>>     > > >>     }
>>>     > > >>
>>>     > > >>
>>>     > > >>
>>>     > > >>   Some other explanations:
>>>     > > >>   1st: The if check won't hurt performance, because I invoke
>>>     > > >> this method column by column, so it's branch-prediction friendly.
>>>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient
>>>     > > >> bloom filter algorithm needs a 64-bit hash code to avoid
>>>     > > >> collisions.
>>>     > > >>
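
One common way a bloom filter consumes a 64-bit hash code is to split it into two 32-bit halves and derive every probe position from them (double hashing). The sketch below illustrates that general idea only. It is an assumption made for illustration, not necessarily the algorithm used in the implementation discussed in this thread, and BloomSketch and its members are hypothetical names.

```java
// Illustrative bloom filter using double hashing on one 64-bit hash code; not Drill's BloomFilter.
final class BloomSketch {
  private final long[] bits;
  private final int numBits;
  private final int numProbes;

  BloomSketch(int numBits, int numProbes) {
    this.numBits = numBits;
    this.numProbes = numProbes;
    this.bits = new long[(numBits + 63) / 64];
  }

  // Probe i lands at (h1 + i * h2) mod numBits, where h1/h2 are the low/high halves of the hash.
  private int position(long hash, int i) {
    int h1 = (int) hash;
    int h2 = (int) (hash >>> 32);
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void insert(long hash) {
    for (int i = 0; i < numProbes; i++) {
      int p = position(hash, i);
      bits[p >>> 6] |= 1L << (p & 63);
    }
  }

  boolean mightContain(long hash) {
    for (int i = 0; i < numProbes; i++) {
      int p = position(hash, i);
      if ((bits[p >>> 6] & (1L << (p & 63))) == 0) {
        return false; // at least one probe bit is unset: definitely absent
      }
    }
    return true; // all probe bits set: present, or a false positive
  }
}
```

A 32-bit hash would give only one independent half, which is one reason a 64-bit hash code is convenient for this scheme.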
>>>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers
>>>     > <[email protected]
>>>     > > >
>>>     > > >> wrote:
>>>     > > >>
>>>     > > >>> Hi Weijie,
>>>     > > >>>
>>>     > > >>> Seeing the discussion about the details of JCodeModel
>>> suggests you
>>>     > may
>>>     > > >>> be trying to debug your generated code at the level of the
>>> code
>>>     > > generator.
>>>     > > >>>
>>>     > > >>> Some time ago we added the ability to step through the
>>> generated
>>>     > code.
>>>     > > >>> Look for the following line in the generator code:
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>     // Uncomment out this line to debug the generated code.
>>>     > > >>>
>>>     > > >>> //    cg.saveCodeForDebugging(true);
>>>     > > >>>
>>>     > > >>>
>>>     > > >>> Uncomment the code line and Drill will save each generated
>>> file to a
>>>     > > >>> configured location (which, if I recall correctly, is
>>>     > > /tmp/drill/codegen,
>>>     > > >>> though it may have changed after Tim's test directory
>>> changes.)
>>>     > > >>>
>>>     > > >>> Then, set a breakpoint in the template setup() method and
>>> you can
>>>     > step
>>>     > > >>> directly into the generated doSetup() method. Same for the
>>> eval()
>>>     > > method.
>>>     > > >>>
>>>     > > >>> This way, you can not only see the generated code, you can
>>> step
>>>     > through
>>>     > > >>> it. I've found this to be a far easier way to understand the
>>>     > generated
>>>     > > code
>>>     > > >>> than the older techniques folks have used (look at byte
>>> codes, use
>>>     > > print
>>>     > > >>> statements, brute force reasoning, etc.)
>>>     > > >>>
>>>     > > >>> Tim, Boaz and others have used this technique more recently
>>> and can
>>>     > > >>> probably give you additional pointers.
>>>     > > >>>
>>>     > > >>> Thanks,
>>>     > > >>> - Paul
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <
>>>     > > >>> [email protected]> wrote:
>>>     > > >>>
>>>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an
>>>     > > >>> _else() block also ?"  I provide a default return at the end of
>>>     > > >>> the method, so I don't need the _else() block.  I have noticed
>>>     > > >>> the IfExpression evaluation method in EvaluationVisitor, which
>>>     > > >>> also uses JConditional, but that doesn't match my requirement
>>>     > > >>> either. I think the key point here is that
>>>     > > >>> FunctionHolderExpression and ValueVectorReadExpression put their
>>>     > > >>> generated code into the eval method's JBlock, not into our
>>>     > > >>> specific ifBlock, which is an inner block of the eval method's
>>>     > > >>> JBlock.
>>>     > > >>>
>>>     > > >>> So it seems I should either change the ClassGenerator so that
>>>     > > >>> getEvalBlock() returns the ifBlock (more precisely, the
>>>     > > >>> JConditional's then block), or implement a special
>>>     > > >>> FunctionHolderExpression and ValueVectorReadExpression, with
>>>     > > >>> corresponding visit methods in the EvaluationVisitor, to
>>>     > > >>> generate the special code blocks. I hope someone who is familiar
>>>     > > >>> with this part of the code can point out whether there are
>>>     > > >>> easier or different ways to achieve this.
>>>     > > >>>
>>>     > > >>> To make the discussion more precise, here is the code generated
>>>     > > >>> by the previous setupGetBuild64Hash method:
>>>     > > >>>
>>>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx,
>>> int
>>>     > > >>> seedValue, int fieldId)
>>>     > > >>>         throws SchemaChangeException
>>>     > > >>>     {
>>>     > > >>>         {
>>>     > > >>>             IntHolder fieldId16 = new IntHolder();
>>>     > > >>>             fieldId16 .value = fieldId;
>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>     > > >>>                 return out24 .value;
>>>     > > >>>             }
>>>     > > >>>             IntHolder out22 = new IntHolder();
>>>     > > >>>             {
>>>     > > >>>                 out22 .value = vv19 .getAccessor().get((
>>>     > > incomingRowIdx));
>>>     > > >>>             }
>>>     > > >>>             IntHolder seedValue23 = new IntHolder();
>>>     > > >>>             seedValue23 .value = seedValue;
>>>     > > >>>             //---- start of eval portion of hash32AsDouble
>>> function.
>>>     > > >>> ----//
>>>     > > >>>             IntHolder out24 = new IntHolder();
>>>     > > >>>             {
>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>     > > >>>                 IntHolder in = out22;
>>>     > > >>>                 IntHolder seed = seedValue23;
>>>     > > >>>
>>>     > > >>> Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >>>     out.value =
>>>     > > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>>>     > > >>> in.value, seed.value);
>>>     > > >>> }
>>>     > > >>>
>>>     > > >>>                 out24 = out;
>>>     > > >>>             }
>>>     > > >>>             //---- end of eval portion of hash32AsDouble
>>> function.
>>>     > > ----//
>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>     > > >>>                 return out26 .value;
>>>     > > >>>             }
>>>     > > >>>             IntHolder seedValue25 = new IntHolder();
>>>     > > >>>             seedValue25 .value = seedValue;
>>>     > > >>>             //---- start of eval portion of hash32AsDouble
>>> function.
>>>     > > >>> ----//
>>>     > > >>>             IntHolder out26 = new IntHolder();
>>>     > > >>>             {
>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>     > > >>>                 IntHolder in = out22;
>>>     > > >>>                 IntHolder seed = seedValue25;
>>>     > > >>>
>>>     > > >>> Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >>>     out.value =
>>>     > > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double)
>>>     > > >>> in.value, seed.value);
>>>     > > >>> }
>>>     > > >>>
>>>     > > >>>                 out26 = out;
>>>     > > >>>             }
>>>     > > >>>             //---- end of eval portion of hash32AsDouble
>>> function.
>>>     > > ----//
>>>     > > >>>             return  0;
>>>     > > >>>         }
>>>     > > >>>     }
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <
>>> [email protected]>
>>>     > > >>> wrote:
>>>     > > >>>
>>>     > > >>> > sorry, the previous email is incomplete.
>>>     > > >>> > For the ifBlock, do you need an _else() block also ?
>>>     > > >>> >
>>>     > > >>> > I have sometimes found that 'JConditional' is a good way
>>> to break
>>>     > > down
>>>     > > >>> the
>>>     > > >>> > logic further.  Please see example usages of JConditional
>>> here [1].
>>>     > > >>> >
>>>     > > >>> > -Aman
>>>     > > >>> >
>>>     > > >>> > [1]
>>>     > > >>> >
>>>     > > >>> >
>>>     > > >>>
>>>     > > >>> > https://www.programcreek.com/java-api-examples/?api=com.sun.codemodel.JBlock
>>>     > > >>> >
>>>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <
>>> [email protected]>
>>>     > > >>> wrote:
>>>     > > >>> >
>>>     > > >>> > > Hi Weijie,
>>>     > > >>> > > It would be a little cumbersome to debug such issues
>>> over email
>>>     > > >>> since one
>>>     > > >>> > > has to look at the generated code output and iteratively
>>> debug.
>>>     > > >>> > > Couple of thoughts I have that might help:
>>>     > > >>> > >
>>>     > > >>> > > For this particular if-then block, should you also
>>>     > > >>> > > JBlock ifBlock =
>>>     > > >>> > >
>>> cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe
>>>     > > >>> > > tBuildSideFieldId))._then();
>>>     > > >>> > >
>>>     > > >>> > >
>>>     > > >>> > >
>>>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <
>>>     > > >>> [email protected]>
>>>     > > >>> > > wrote:
>>>     > > >>> > >
>>>     > > >>> > >> Hi All:
>>>     > > >>> > >>  While implementing the JPPD feature
>>>     > > >>> > >> (https://issues.apache.org/jira/browse/DRILL-6385), I got
>>>     > > >>> > >> blocked by this problem: how to get the hash code of each
>>>     > > >>> > >> build-side hash join column through dynamically generated
>>>     > > >>> > >> Java code. I hope someone can give some advice.
>>>     > > >>> > >>
>>>     > > >>> > >>    I plan to add the methods below to HashTableTemplate:
>>>     > > >>> > >>
>>>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int
>>>     > seedValue,
>>>     > > >>> int
>>>     > > >>> > >> fieldId) throws SchemaChangeException{
>>>     > > >>> > >>    return getBuild64HashCodeInner(incomingRowIdx,
>>> seedValue,
>>>     > > >>> fieldId);
>>>     > > >>> > >> }
>>>     > > >>> > >>
>>>     > > >>> > >> protected abstract long
>>>     > > >>> > >> getBuild64HashCodeInner(@Named("incomingRowIdx") int
>>>     > > incomingRowIdx,
>>>     > > >>> > >> @Named("seedValue") int seedValue, @Named("fieldId") int
>>>     > fieldId)
>>>     > > >>> > >> throws SchemaChangeException;
>>>     > > >>> > >>
>>>     > > >>> > >>
>>>     > > >>> > >>    The high-level code that invokes the getBuild64HashCode
>>>     > > >>> > >> method is in HashJoinBatch's executeBuildPhase():
>>>     > > >>> > >>
>>>     > > >>> > >> //create runtime filter
>>>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>>>     > > >>> > >>  //create runtime filter and send out async
>>>     > > >>> > >>  int condFieldIndex = 0;
>>>     > > >>> > >>  for (BloomFilter bloomFilter : bloomFilters) {
>>>     > > >>> > >>    //VV
>>>     > > >>> > >>    for (int ind = 0; ind < currentRecordCount; ind++) {
>>>     > > >>> > >>      long hashCode =
>>> partitions[0].getBuild64HashCode(ind,
>>>     > > >>> > >> condFieldIndex);
>>>     > > >>> > >>      bloomFilter.insert(hashCode);
>>>     > > >>> > >>    }
>>>     > > >>> > >>    condFieldIndex++;
>>>     > > >>> > >>  }
>>>     > > >>> > >>  //TODO send out async
>>>     > > >>> > >> }
>>>     > > >>> > >>
>>>     > > >>> > >>
>>>     > > >>> > >>  As you can see, the abstract method getBuild64HashCodeInner
>>>     > > >>> > >> needs to calculate the hash code of the build-side column
>>>     > > >>> > >> selected by the fieldId input parameter. To achieve this, I
>>>     > > >>> > >> plan to generate a separate section for each column's
>>>     > > >>> > >> ValueVector and use an if statement on the column id to
>>>     > > >>> > >> choose between the sections. The method that generates the
>>>     > > >>> > >> dynamic code is below:
>>>     > > >>> > >>
>>>     > > >>> > >> private void
>>> setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>     > > >>> > >> MappingSet incomingMapping, VectorAccessible batch,
>>>     > > >>> > >> LogicalExpression[] keyExprs, TypedFieldId[]
>>> buildKeyFieldIds)
>>>     > > >>> > >>  throws SchemaChangeException {
>>>     > > >>> > >>  cg.setMappingSet(incomingMapping);
>>>     > > >>> > >>  if (keyExprs == null || keyExprs.length == 0) {
>>>     > > >>> > >>    cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >>> > >>    return; // nothing to hash: skip the loop and the second return below
>>>     > > >>> > >>  }
>>>     > > >>> > >>  String seedValue = "seedValue";
>>>     > > >>> > >>  String fieldId = "fieldId";
>>>     > > >>> > >>  LogicalExpression seed =
>>>     > > >>> > >> ValueExpressions.getParameterExpression(seedValue,
>>>     > > >>> > >> Types.required(TypeProtos.MinorType.INT));
>>>     > > >>> > >>
>>>     > > >>> > >>  LogicalExpression fieldIdParamExpr =
>>>     > > >>> > >> ValueExpressions.getParameterExpression(fieldId,
>>>     > > >>> > >> Types.required(TypeProtos.MinorType.INT) );
>>>     > > >>> > >>  HoldingContainer fieldIdParamHolder =
>>>     > > cg.addExpr(fieldIdParamExpr);
>>>     > > >>> > >>  int i = 0;
>>>     > > >>> > >>  for (LogicalExpression expr : keyExprs) {
>>>     > > >>> > >>    TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>     > > >>> > >>    ValueExpressions.IntExpression targetBuildFieldIdExp
>>> = new
>>>     > > >>> > >>
>>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds(
>>>     > > )[0],
>>>     > > >>> > >> ExpressionPosition.UNKNOWN);
>>>     > > >>> > >>    JFieldRef targetBuildSideFieldId =
>>>     > > >>> > >> cg.addExpr(targetBuildFieldIdExp,
>>>     > > >>> > >> ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>     > > >>> > >>    JBlock ifBlock =
>>>     > > >>> > >>
>>> cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe
>>>     > > >>> > >> tBuildSideFieldId))._then();
>>>     > > >>> > >>
>>>     > > >>> > >>    LogicalExpression hashExpression =
>>>     > > >>> > >> HashPrelUtil.getHashExpression(expr, seed,
>>> incomingProbe !=
>>>     > > null);
>>>     > > >>> > >>    LogicalExpression materializedExpr =
>>>     > > >>> > >> ExpressionTreeMaterializer.materializeAndCheckErrors(
>>>     > > hashExpression,
>>>     > > >>> > >> batch, context.getFunctionRegistry());
>>>     > > >>> > >>    HoldingContainer hash = cg.addExpr(materializedExpr,
>>>     > > >>> > >> ClassGenerator.BlkCreateMode.FALSE);
>>>     > > >>> > >>
>>>     > > >>> > >>
>>>     > > >>> > >>    ifBlock._return(hash.getValue());
>>>     > > >>> > >>    i++;
>>>     > > >>> > >>  }
>>>     > > >>> > >>  cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >>> > >>
>>>     > > >>> > >> }
>>>     > > >>> > >>
>>>     > > >>> > >> But unfortunately, the generated code is not what I
>>>     > > >>> > >> expected. The code that reads the ValueVector and calculates
>>>     > > >>> > >> the hash code of the value read does not stay inside the if
>>>     > > >>> > >> block. So how can I keep that code inside the if block?
>>>     > > >>> > >>
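
For reference, the shape the generated method is meant to have for two key columns can be written by hand as below. This is a hand-written sketch with hypothetical names (TargetShapeSketch, col0, col1, placeholderHash); real generated code would use holders, value vector accessors, and Drill's hash functions instead.

```java
// Hand-written illustration of the intended per-column dispatch; not actual generated code.
final class TargetShapeSketch {
  private final int[] col0;
  private final int[] col1;

  TargetShapeSketch(int[] col0, int[] col1) {
    this.col0 = col0;
    this.col1 = col1;
  }

  // Placeholder hash used only to keep the sketch self-contained.
  static long placeholderHash(int value, int seed) {
    return (value * 31L) ^ seed;
  }

  long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId) {
    if (fieldId == 0) {
      // The read and the hash for column 0 both live inside this branch.
      return placeholderHash(col0[incomingRowIdx], seedValue);
    }
    if (fieldId == 1) {
      // Same for column 1: nothing is read or hashed outside its own branch.
      return placeholderHash(col1[incomingRowIdx], seedValue);
    }
    return 0; // default when fieldId matches no key column
  }
}
```

The point of the question is the placement: each column's read and hash statements belong inside its own if body, with only the default return left in the outer block.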
>>>     > > >>> > >
>>>     > > >>> > >
>>>     > > >>> >
>>>     > > >>
>>>     > > >>
>>>     > >
>>>     >
>>>
>>>
>>>
