Hi Weijie, I was hoping you could leverage the existing methods..so its good that you found the ones that work for your use case. One thing I want to point out (maybe you're already aware) .. the Hash Join code has changed significantly in the master branch due to the spill-to-disk feature. So, this may pose some integration challenges for your run-time join pushdown feature. Also, one other question/clarification: for the bloom filter itself are you implementing it natively in Drill or using an external library ?
-Aman On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie...@gmail.com> wrote: > I found ClassGenerator's nestEvalBlock(JBlock block) and unNestEvalBlock() > which has the same effect to what I change to the ClassGenerator. So I give > up what I change to the ClassGenerator and hope this can help someone else. > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie...@gmail.com> > wrote: > > > The code formatting is not nice. Put them again: > > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, > MappingSet > > incomingMapping, VectorAccessible batch, LogicalExpression[] keyExprs, > > TypedFieldId[] buildKeyFieldIds) > > throws SchemaChangeException { > > cg.setMappingSet(incomingMapping); > > if (keyExprs == null || keyExprs.length == 0) { > > cg.getEvalBlock()._return(JExpr.lit(0)); > > } > > String seedValue = "seedValue"; > > String fieldId = "fieldId"; > > LogicalExpression seed = > > ValueExpressions.getParameterExpression(seedValue, Types.required( > > TypeProtos.MinorType.INT)); > > > > LogicalExpression fieldIdParamExpr = > > ValueExpressions.getParameterExpression(fieldId, Types.required( > > TypeProtos.MinorType.INT) ); > > HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr); > > int i = 0; > > for (LogicalExpression expr : keyExprs) { > > TypedFieldId targetTypeFieldId = buildKeyFieldIds[i]; > > ValueExpressions.IntExpression targetBuildFieldIdExp = new > > ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0], > > ExpressionPosition.UNKNOWN); > > > > JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp, > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue(); > > JBlock ifBlock = > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue(). > eq(targetBuildSideFieldId))._then(); > > //specify a special JBlock which is a inner one of the eval block to > > the ClassGenerator to substitute the returned JBlock of getEvalBlock() > > cg.setCustomizedEvalInnerBlock(ifBlock); > > LogicalExpression hashExpression = > > HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null); > > LogicalExpression materializedExpr = > > ExpressionTreeMaterializer.materializeAndCheckErrors(hashExpression, > batch, > > context.getFunctionRegistry()); > > HoldingContainer hash = cg.addExpr(materializedExpr, > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND); > > ifBlock._return(hash.getValue()); > > //reset the customized block to null ,so the getEvalBlock() return > the > > truly eval JBlock > > cg.setCustomizedEvalInnerBlock(null); > > i++; > > } > > cg.getEvalBlock()._return(JExpr.lit(0)); > > } > > > > > > > > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, > int > > fieldId) > > throws SchemaChangeException > > { > > { > > IntHolder fieldId12 = new IntHolder(); > > fieldId12 .value = fieldId; > > if (fieldId12 .value == constant14 .value) { > > IntHolder out18 = new IntHolder(); > > { > > out18 .value = vv15 .getAccessor().get((incomingRowIdx)); > > } > > IntHolder seedValue19 = new IntHolder(); > > seedValue19 .value = seedValue; > > //---- start of eval portion of hash32AsDouble function. ----// > > IntHolder out20 = new IntHolder(); > > { > > final IntHolder out = new IntHolder(); > > IntHolder in = out18; > > IntHolder seed = seedValue19; > > > > Hash32WithSeedAsDouble$IntHash_eval: { > > out.value = > > org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, > > seed.value); > > } > > > > out20 = out; > > } > > //---- end of eval portion of hash32AsDouble function. ----// > > return out20 .value; > > } > > return 0; > > } > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie...@gmail.com> > > wrote: > > > >> HI Paul: > >> > >> Thanks for your enthusiasm. I have managed this skill as you ever > >> mentioned me at another mail thread. It's really helpful ,thanks for > your > >> valuable work. > >> > >> Now I have solved this tough problem by adding a customized JBlock > >> member field to the ClassGenerator. So once you want the getEvalBlock() > of > >> the ClassGenerator to return a inner customized JBlock , then you set > this > >> member, if you want the method to return eval self JBlock , you reset > this > >> member to null. > >> > >> Here is my changed setup method : > >> > >> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, > MappingSet incomingMapping, VectorAccessible batch, LogicalExpression[] > keyExprs, TypedFieldId[] buildKeyFieldIds) > >> throws SchemaChangeException { > >> cg.setMappingSet(incomingMapping); > >> if (keyExprs == null || keyExprs.length == 0) { > >> cg.getEvalBlock()._return(JExpr.lit(0)); > >> } > >> String seedValue = "seedValue"; > >> String fieldId = "fieldId"; > >> LogicalExpression seed = > >> ValueExpressions.getParameterExpression(seedValue, > Types.required(TypeProtos.MinorType.INT)); > >> > >> LogicalExpression fieldIdParamExpr = ValueExpressions. > getParameterExpression(fieldId, Types.required(TypeProtos.MinorType.INT) > ); > >> HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr); > >> int i = 0; > >> for (LogicalExpression expr : keyExprs) { > >> TypedFieldId targetTypeFieldId = buildKeyFieldIds[i]; > >> ValueExpressions.IntExpression targetBuildFieldIdExp = new > ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0], > ExpressionPosition.UNKNOWN); > >> > >> JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp, > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue(); > >> JBlock ifBlock = cg.getEvalBlock()._if( > fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then(); > >> //specify a special JBlock which is a inner one of the eval block > to the ClassGenerator to substitute the returned JBlock of getEvalBlock() > >> cg.setCustomizedEvalInnerBlock(ifBlock); > >> LogicalExpression hashExpression = HashPrelUtil.getHashExpression(expr, > seed, incomingProbe != null); > >> LogicalExpression materializedExpr = ExpressionTreeMaterializer. > materializeAndCheckErrors(hashExpression, batch, > context.getFunctionRegistry()); > >> HoldingContainer hash = cg.addExpr(materializedExpr, > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND); > >> ifBlock._return(hash.getValue()); > >> //reset the customized block to null ,so the getEvalBlock() return > the truly eval JBlock > >> cg.setCustomizedEvalInnerBlock(null); > >> i++; > >> } > >> cg.getEvalBlock()._return(JExpr.lit(0)); > >> } > >> > >> > >> The corresponding generated codes : > >> > >> public long getBuild64HashCodeInner(int incomingRowIdx, int > seedValue, int fieldId) > >> throws SchemaChangeException > >> { > >> { > >> IntHolder fieldId12 = new IntHolder(); > >> fieldId12 .value = fieldId; > >> if (fieldId12 .value == constant14 .value) { > >> IntHolder out18 = new IntHolder(); > >> { > >> out18 .value = vv15 .getAccessor().get(( > incomingRowIdx)); > >> } > >> IntHolder seedValue19 = new IntHolder(); > >> seedValue19 .value = seedValue; > >> //---- start of eval portion of hash32AsDouble > function. ----// > >> IntHolder out20 = new IntHolder(); > >> { > >> final IntHolder out = new IntHolder(); > >> IntHolder in = out18; > >> IntHolder seed = seedValue19; > >> > >> Hash32WithSeedAsDouble$IntHash_eval: { > >> out.value = > >> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) > in.value, seed.value); > >> } > >> > >> out20 = out; > >> } > >> //---- end of eval portion of hash32AsDouble function. > ----// > >> return out20 .value; > >> } > >> return 0; > >> } > >> } > >> > >> > >> > >> Some other explanation: > >> 1st : The if checking won't hurt the performance , as I invoke this > >> method column by column , so it's branch predication friendly. > >> 2nd: I will use the murmur3_64 not the murmur3_32 ,since the efficient > >> bloom filter algorithm needs the 64 bit hash code to avoid the conflict. > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0...@yahoo.com.invalid > > > >> wrote: > >> > >>> Hi Weijie, > >>> > >>> Seeing the discussion about the details of JCodeModel suggests you may > >>> be trying to debug your generated code at the level of the code > generator. > >>> > >>> Some time ago we added the ability to step through the generated code. > >>> Look for the following line in the generator code: > >>> > >>> > >>> // Uncomment out this line to debug the generated code. > >>> > >>> // cg.saveCodeForDebugging(true); > >>> > >>> > >>> Uncomment the code line and Drill will save each generated file to a > >>> configured location (which, if I recall correctly, is > /tmp/drill/codegen, > >>> though it may have changed after Tim's test directory changes.) > >>> > >>> Then, set a breakpoint in the template setup() method and you can step > >>> directly into the generated doSetup() method. Same for the eval() > method. > >>> > >>> This way, you can not only see the generated code, you can step through > >>> it. I've found this to be a far easier way to understand the generated > code > >>> than the older techniques folks have used (look at byte codes, use > print > >>> statements, brute force reasoning, etc.) > >>> > >>> Tim, Boaz and others have used this technique more recently and can > >>> probably give you additional pointers. > >>> > >>> Thanks, > >>> - Paul > >>> > >>> > >>> > >>> On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong < > >>> tongweijie...@gmail.com> wrote: > >>> > >>> @aman thanks for your reply. "For the ifBlock, do you need an _else() > >>> block > >>> also ?" I give a default return logic at the method, so I don't need > the > >>> _else() block. I have noticed the IfExpression's evaluation method at > >>> EvaluationVisitor which also uses the JConditional . But that also > >>> doesn't > >>> match my requirement. I think the key point here is the > >>> FunctionHolderExpression and ValueVectorReadExpression will put their > >>> corresponding generated codes to the eval method's JBlock , not our > >>> specific IfBlock which is a inner block of the eval method's JBlock . > >>> > >>> So it seems I should make some changes to the ClassGenerator to let the > >>> getEvalBlock return the IfBlock (maybe accurately the JConditional's > then > >>> block) or implement some special FunctionHolderExpression > >>> 、ValueVectorReadExpression and corresponding visiting methods at the > >>> EvaluationVisitor to generate the special code blocks. Hope someone who > >>> are > >>> familiar with these part of codes to point out whether there are more > >>> easy > >>> or different choices to achieve the target. > >>> > >>> To make discussion more accurate, I put the generated codes of the > >>> previous > >>> setupGetBuild64Hash method here: > >>> > >>> public long getBuild64HashCodeInner(int incomingRowIdx, int > >>> seedValue, int fieldId) > >>> throws SchemaChangeException > >>> { > >>> { > >>> IntHolder fieldId16 = new IntHolder(); > >>> fieldId16 .value = fieldId; > >>> if (fieldId16 .value == constant18 .value) { > >>> return out24 .value; > >>> } > >>> IntHolder out22 = new IntHolder(); > >>> { > >>> out22 .value = vv19 .getAccessor().get(( > incomingRowIdx)); > >>> } > >>> IntHolder seedValue23 = new IntHolder(); > >>> seedValue23 .value = seedValue; > >>> //---- start of eval portion of hash32AsDouble function. > >>> ----// > >>> IntHolder out24 = new IntHolder(); > >>> { > >>> final IntHolder out = new IntHolder(); > >>> IntHolder in = out22; > >>> IntHolder seed = seedValue23; > >>> > >>> Hash32WithSeedAsDouble$IntHash_eval: { > >>> out.value = > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) > >>> in.value, seed.value); > >>> } > >>> > >>> out24 = out; > >>> } > >>> //---- end of eval portion of hash32AsDouble function. > ----// > >>> if (fieldId16 .value == constant18 .value) { > >>> return out26 .value; > >>> } > >>> IntHolder seedValue25 = new IntHolder(); > >>> seedValue25 .value = seedValue; > >>> //---- start of eval portion of hash32AsDouble function. > >>> ----// > >>> IntHolder out26 = new IntHolder(); > >>> { > >>> final IntHolder out = new IntHolder(); > >>> IntHolder in = out22; > >>> IntHolder seed = seedValue25; > >>> > >>> Hash32WithSeedAsDouble$IntHash_eval: { > >>> out.value = > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) > >>> in.value, seed.value); > >>> } > >>> > >>> out26 = out; > >>> } > >>> //---- end of eval portion of hash32AsDouble function. > ----// > >>> return 0; > >>> } > >>> } > >>> > >>> > >>> > >>> > >>> > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansi...@apache.org> > >>> wrote: > >>> > >>> > sorry, the previous email is incomplete. > >>> > For the ifBlock, do you need an _else() block also ? > >>> > > >>> > I have sometimes found that 'JConditional' is a good way to break > down > >>> the > >>> > logic further. Please see example usages of JConditional here [1]. > >>> > > >>> > -Aman > >>> > > >>> > [1] > >>> > > >>> > > >>> https://www.programcreek.com/java-api-examples/?api=com. > sun.codemodel.JBlock > >>> > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansi...@apache.org> > >>> wrote: > >>> > > >>> > > Hi Weijie, > >>> > > It would be a little cumbersome to debug such issues over email > >>> since one > >>> > > has to look at the generated code output and iteratively debug. > >>> > > Couple of thoughts I have that might help: > >>> > > > >>> > > For this particular if-then block, should you also > >>> > > JBlock ifBlock = > >>> > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe > >>> > > tBuildSideFieldId))._then(); > >>> > > > >>> > > > >>> > > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong < > >>> tongweijie...@gmail.com> > >>> > > wrote: > >>> > > > >>> > >> HI All: > >>> > >> Through implementing the JPPD feature ( > >>> > >> https://issues.apache.org/jira/browse/DRILL-6385) , I was blocked > >>> by > >>> > the > >>> > >> problem: how to get the hash code of each build side of the hash > >>> join > >>> > >> columns through the dynamic generated java code. Hope someone can > >>> give > >>> > >> some > >>> > >> advice. > >>> > >> > >>> > >> I supposed to add methods as below to the HashTableTemplate : > >>> > >> > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue, > >>> int > >>> > >> fieldId) throws SchemaChangeException{ > >>> > >> return getBuild64HashCodeInner(incomingRowIdx, seedValue, > >>> fieldId); > >>> > >> } > >>> > >> > >>> > >> protected abstract long > >>> > >> getBuild64HashCodeInner(@Named("incomingRowIdx") int > incomingRowIdx, > >>> > >> @Named("seedValue") int seedValue, @Named("fieldId") int fieldId) > >>> > >> throws SchemaChangeException; > >>> > >> > >>> > >> > >>> > >> The high level code to invoke the getBuild64HashCode method is > >>> at the > >>> > >> HashJoinBatch's executeBuildPhase() : > >>> > >> > >>> > >> //create runtime filter > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) { > >>> > >> //create runtime filter and send out async > >>> > >> int condFieldIndex = 0; > >>> > >> for (BloomFilter bloomFilter : bloomFilters) { > >>> > >> //VV > >>> > >> for (int ind = 0; ind < currentRecordCount; ind++) { > >>> > >> long hashCode = partitions[0].getBuild64HashCode(ind, > >>> > >> condFieldIndex); > >>> > >> bloomFilter.insert(hashCode); > >>> > >> } > >>> > >> condFieldIndex++; > >>> > >> } > >>> > >> //TODO sered out async > >>> > >> } > >>> > >> > >>> > >> > >>> > >> As you know, the abstract method getBuild64HashCodeInner needs to > >>> > >> calculate the hash codes of each build side column by the fieldId > >>> input > >>> > >> parameter. In order to achieve this target, I plan to have > different > >>> > >> solving parts corresponding to different column ValueVector , > using > >>> the > >>> > if > >>> > >> statement to distinguish different solving parts through the id of > >>> the > >>> > >> column. The corresponding method to generate the dynamic codes > is > >>> as > >>> > >> below: > >>> > >> > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, > >>> > >> MappingSet incomingMapping, VectorAccessible batch, > >>> > >> LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds) > >>> > >> throws SchemaChangeException { > >>> > >> cg.setMappingSet(incomingMapping); > >>> > >> if (keyExprs == null || keyExprs.length == 0) { > >>> > >> cg.getEvalBlock()._return(JExpr.lit(0)); > >>> > >> } > >>> > >> String seedValue = "seedValue"; > >>> > >> String fieldId = "fieldId"; > >>> > >> LogicalExpression seed = > >>> > >> ValueExpressions.getParameterExpression(seedValue, > >>> > >> Types.required(TypeProtos.MinorType.INT)); > >>> > >> > >>> > >> LogicalExpression fieldIdParamExpr = > >>> > >> ValueExpressions.getParameterExpression(fieldId, > >>> > >> Types.required(TypeProtos.MinorType.INT) ); > >>> > >> HoldingContainer fieldIdParamHolder = > cg.addExpr(fieldIdParamExpr); > >>> > >> int i = 0; > >>> > >> for (LogicalExpression expr : keyExprs) { > >>> > >> TypedFieldId targetTypeFieldId = buildKeyFieldIds[i]; > >>> > >> ValueExpressions.IntExpression targetBuildFieldIdExp = new > >>> > >> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds( > )[0], > >>> > >> ExpressionPosition.UNKNOWN); > >>> > >> JFieldRef targetBuildSideFieldId = > >>> > >> cg.addExpr(targetBuildFieldIdExp, > >>> > >> ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue(); > >>> > >> JBlock ifBlock = > >>> > >> cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe > >>> > >> tBuildSideFieldId))._then(); > >>> > >> > >>> > >> LogicalExpression hashExpression = > >>> > >> HashPrelUtil.getHashExpression(expr, seed, incomingProbe != > null); > >>> > >> LogicalExpression materializedExpr = > >>> > >> ExpressionTreeMaterializer.materializeAndCheckErrors( > hashExpression, > >>> > >> batch, context.getFunctionRegistry()); > >>> > >> HoldingContainer hash = cg.addExpr(materializedExpr, > >>> > >> ClassGenerator.BlkCreateMode.FALSE); > >>> > >> > >>> > >> > >>> > >> ifBlock._return(hash.getValue()); > >>> > >> i++; > >>> > >> } > >>> > >> cg.getEvalBlock()._return(JExpr.lit(0)); > >>> > >> > >>> > >> } > >>> > >> > >>> > >> But unfortunately, the generated codes are not what I expected. > The > >>> > codes > >>> > >> to read ValueVector , calculate hash code of the read value do not > >>> stay > >>> > in > >>> > >> the if block. So how can I let the related codes stay in the if > >>> block ? > >>> > >> > >>> > > > >>> > > > >>> > > >> > >> >