I find the answer that RecordBatch's max size is 2^16 which is defined at RecordBatch's MAX_BATCH_SIZE.
On Fri, Jun 1, 2018 at 3:36 PM weijie tong <[email protected]> wrote: > Some questions about SelectionVector2 and SelectionVector4: > > I want to create SelectionVector4 or SelectionVector2 to represent the > filtered ScanBatch to avoid memory copy. But I found the ProjectBatch does > not support SelectVector4 . And the SelectionVector2's record count size is > char type size . So why SelectionVector4 is not supported by the > ProjectBatch ? The same question is to the FilterBatch's SelectVector2 > which also only support the 2 Byte size record count. > > On Fri, Jun 1, 2018 at 1:40 PM weijie tong <[email protected]> > wrote: > >> Hi Boaz: >> >> Your propose is valuable though I have implemented the dynamic >> generating code logic. If a ``` long hash64(int index, long seed) ``` >> method is added to the ValueVector , it will also benefit others to >> implement specific storage plugin's filter logic by using the pushed down >> bloom filter. To HashJoin and HashAggregate , methods ```double >> hash32AsDouble(int index, int seed) ``` and ```int hash32(int index, int >> seed)``` will also be needed to the ValueVector. If no one else gives >> objection , I will be pleasure to take this work. >> >> Btw, I will share my thought about the scan side's filter logic by the >> BloomFilter. The scan side filter logic here I supposed to do is to filter >> the materialized ValueVector ,not at the process to construct the >> ValueVector from the original storage format data. The reason is the >> checking logic will break down the performance to materialize the original >> deep storage format data to ValueVector. >> >> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <[email protected]> wrote: >> >>> Hi Weijie, >>> >>> Another option is to totally avoid the generated code. >>> We were considering the idea of replacing the generated code used for >>> computing hash values with “real java” code. >>> >>> This idea is analogous to the usage of the copyEntry() method in the >>> ValueVector interface (that Paul added last year). >>> See an example of using the copyEntry() (via the appendRow() in >>> VectorContainer) in the new Hash-Join-Spill code. >>> Basically no need to generate “type specific” code, as the virtual >>> copyEntry() method does the “type specific” work. >>> >>> Similarly we could have a hash64() method in ValueVector, which would >>> perform the “type specific” computation. >>> (One difference from copyEntry() – the hash64() would also need to take >>> the “seed” parameter, which is the hash value produced by the previous >>> hash). >>> And similar to appendRow(), there would be evalHash() iterating over the >>> key columns. >>> (And one difference from appendRow() – need to iterate only on the key >>> columns; these are the first columns; their number can be found from the >>> config: e.g., htConfig.getKeyExprsBuild().size() ) >>> >>> With such implementation, that evalHash() could be used anywhere >>> (e.g., to match the Bloom filters on the left side of the join). >>> >>> Thanks, >>> >>> Boaz >>> >>> >>> On 5/30/18, 7:49 PM, "weijie tong" <[email protected]> wrote: >>> >>> Hi Aman: >>> >>> Thanks for your tips. I have rebased the latest code from the >>> master >>> branch . Yes, the spill-to-disk feature does changed the original >>> implementation. I have adjusted my implementation according to the >>> new >>> feature. But as you say, it will take some challenge to integration >>> as I >>> noticed the spill-to-disk feature will continue to tune its >>> implementation >>> performance. >>> >>> The BloomFilter was implemented natively in Drill , not an external >>> library. It's implemented the algorithm of the paper which was >>> mentioned by >>> you. >>> >>> >>> On Thu, May 31, 2018 at 1:56 AM Aman Sinha <[email protected]> >>> wrote: >>> >>> > Hi Weijie, >>> > I was hoping you could leverage the existing methods..so its good >>> that you >>> > found the ones that work for your use case. >>> > One thing I want to point out (maybe you're already aware) .. the >>> Hash Join >>> > code has changed significantly in the master branch due to the >>> > spill-to-disk feature. >>> > So, this may pose some integration challenges for your run-time >>> join >>> > pushdown feature. >>> > Also, one other question/clarification: for the bloom filter >>> itself are >>> > you implementing it natively in Drill or using an external library >>> ? >>> > >>> > -Aman >>> > >>> > On Tue, May 29, 2018 at 8:23 PM, weijie tong < >>> [email protected]> >>> > wrote: >>> > >>> > > I found ClassGenerator's nestEvalBlock(JBlock block) and >>> > unNestEvalBlock() >>> > > which has the same effect to what I change to the >>> ClassGenerator. So I >>> > give >>> > > up what I change to the ClassGenerator and hope this can help >>> someone >>> > else. >>> > > >>> > > On Tue, May 29, 2018 at 1:53 PM weijie tong < >>> [email protected]> >>> > > wrote: >>> > > >>> > > > The code formatting is not nice. Put them again: >>> > > > >>> > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, >>> > > MappingSet >>> > > > incomingMapping, VectorAccessible batch, LogicalExpression[] >>> keyExprs, >>> > > > TypedFieldId[] buildKeyFieldIds) >>> > > > throws SchemaChangeException { >>> > > > cg.setMappingSet(incomingMapping); >>> > > > if (keyExprs == null || keyExprs.length == 0) { >>> > > > cg.getEvalBlock()._return(JExpr.lit(0)); >>> > > > } >>> > > > String seedValue = "seedValue"; >>> > > > String fieldId = "fieldId"; >>> > > > LogicalExpression seed = >>> > > > ValueExpressions.getParameterExpression(seedValue, >>> Types.required( >>> > > > TypeProtos.MinorType.INT)); >>> > > > >>> > > > LogicalExpression fieldIdParamExpr = >>> > > > ValueExpressions.getParameterExpression(fieldId, >>> Types.required( >>> > > > TypeProtos.MinorType.INT) ); >>> > > > HoldingContainer fieldIdParamHolder = >>> cg.addExpr(fieldIdParamExpr); >>> > > > int i = 0; >>> > > > for (LogicalExpression expr : keyExprs) { >>> > > > TypedFieldId targetTypeFieldId = buildKeyFieldIds[i]; >>> > > > ValueExpressions.IntExpression targetBuildFieldIdExp = new >>> > > > >>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0], >>> > > > ExpressionPosition.UNKNOWN); >>> > > > >>> > > > JFieldRef targetBuildSideFieldId = >>> > cg.addExpr(targetBuildFieldIdExp, >>> > > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue(); >>> > > > JBlock ifBlock = >>> > > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue(). >>> > > eq(targetBuildSideFieldId))._then(); >>> > > > //specify a special JBlock which is a inner one of the >>> eval block >>> > to >>> > > > the ClassGenerator to substitute the returned JBlock of >>> getEvalBlock() >>> > > > cg.setCustomizedEvalInnerBlock(ifBlock); >>> > > > LogicalExpression hashExpression = >>> > > > HashPrelUtil.getHashExpression(expr, seed, incomingProbe != >>> null); >>> > > > LogicalExpression materializedExpr = >>> > > > >>> ExpressionTreeMaterializer.materializeAndCheckErrors(hashExpression, >>> > > batch, >>> > > > context.getFunctionRegistry()); >>> > > > HoldingContainer hash = cg.addExpr(materializedExpr, >>> > > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND); >>> > > > ifBlock._return(hash.getValue()); >>> > > > //reset the customized block to null ,so the >>> getEvalBlock() return >>> > > the >>> > > > truly eval JBlock >>> > > > cg.setCustomizedEvalInnerBlock(null); >>> > > > i++; >>> > > > } >>> > > > cg.getEvalBlock()._return(JExpr.lit(0)); >>> > > > } >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > public long getBuild64HashCodeInner(int incomingRowIdx, int >>> seedValue, >>> > > int >>> > > > fieldId) >>> > > > throws SchemaChangeException >>> > > > { >>> > > > { >>> > > > IntHolder fieldId12 = new IntHolder(); >>> > > > fieldId12 .value = fieldId; >>> > > > if (fieldId12 .value == constant14 .value) { >>> > > > IntHolder out18 = new IntHolder(); >>> > > > { >>> > > > out18 .value = vv15 .getAccessor().get((incomingRowIdx)); >>> > > > } >>> > > > IntHolder seedValue19 = new IntHolder(); >>> > > > seedValue19 .value = seedValue; >>> > > > //---- start of eval portion of hash32AsDouble function. >>> ----// >>> > > > IntHolder out20 = new IntHolder(); >>> > > > { >>> > > > final IntHolder out = new IntHolder(); >>> > > > IntHolder in = out18; >>> > > > IntHolder seed = seedValue19; >>> > > > >>> > > > Hash32WithSeedAsDouble$IntHash_eval: { >>> > > > out.value = >>> > > > org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) >>> in.value, >>> > > > seed.value); >>> > > > } >>> > > > >>> > > > out20 = out; >>> > > > } >>> > > > //---- end of eval portion of hash32AsDouble function. ----// >>> > > > return out20 .value; >>> > > > } >>> > > > return 0; >>> > > > } >>> > > > } >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > On Tue, May 29, 2018 at 1:47 PM weijie tong < >>> [email protected]> >>> > > > wrote: >>> > > > >>> > > >> HI Paul: >>> > > >> >>> > > >> Thanks for your enthusiasm. I have managed this skill as you >>> ever >>> > > >> mentioned me at another mail thread. It's really helpful >>> ,thanks for >>> > > your >>> > > >> valuable work. >>> > > >> >>> > > >> Now I have solved this tough problem by adding a customized >>> JBlock >>> > > >> member field to the ClassGenerator. So once you want the >>> > getEvalBlock() >>> > > of >>> > > >> the ClassGenerator to return a inner customized JBlock , then >>> you set >>> > > this >>> > > >> member, if you want the method to return eval self JBlock , >>> you reset >>> > > this >>> > > >> member to null. >>> > > >> >>> > > >> Here is my changed setup method : >>> > > >> >>> > > >> >>> > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, >>> > > MappingSet incomingMapping, VectorAccessible batch, >>> LogicalExpression[] >>> > > keyExprs, TypedFieldId[] buildKeyFieldIds) >>> > > >> throws SchemaChangeException { >>> > > >> cg.setMappingSet(incomingMapping); >>> > > >> if (keyExprs == null || keyExprs.length == 0) { >>> > > >> cg.getEvalBlock()._return(JExpr.lit(0)); >>> > > >> } >>> > > >> String seedValue = "seedValue"; >>> > > >> String fieldId = "fieldId"; >>> > > >> LogicalExpression seed = >>> > ValueExpressions.getParameterExpression(seedValue, >>> > > Types.required(TypeProtos.MinorType.INT)); >>> > > >> >>> > > >> LogicalExpression fieldIdParamExpr = ValueExpressions. >>> > > getParameterExpression(fieldId, Types.required( >>> TypeProtos.MinorType.INT) >>> > > ); >>> > > >> HoldingContainer fieldIdParamHolder = >>> cg.addExpr(fieldIdParamExpr); >>> > > >> int i = 0; >>> > > >> for (LogicalExpression expr : keyExprs) { >>> > > >> TypedFieldId targetTypeFieldId = buildKeyFieldIds[i]; >>> > > >> ValueExpressions.IntExpression targetBuildFieldIdExp = new >>> > > >>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0], >>> > > ExpressionPosition.UNKNOWN); >>> > > >> >>> > > >> JFieldRef targetBuildSideFieldId = >>> > cg.addExpr(targetBuildFieldIdExp, >>> > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue(); >>> > > >> JBlock ifBlock = cg.getEvalBlock()._if( >>> > > >>> fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then(); >>> > > >> //specify a special JBlock which is a inner one of the >>> eval block >>> > > to the ClassGenerator to substitute the returned JBlock of >>> getEvalBlock() >>> > > >> cg.setCustomizedEvalInnerBlock(ifBlock); >>> > > >> LogicalExpression hashExpression = >>> > HashPrelUtil.getHashExpression(expr, >>> > > seed, incomingProbe != null); >>> > > >> LogicalExpression materializedExpr = >>> ExpressionTreeMaterializer. >>> > > materializeAndCheckErrors(hashExpression, batch, >>> > > context.getFunctionRegistry()); >>> > > >> HoldingContainer hash = cg.addExpr(materializedExpr, >>> > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND); >>> > > >> ifBlock._return(hash.getValue()); >>> > > >> //reset the customized block to null ,so the >>> getEvalBlock() return >>> > > the truly eval JBlock >>> > > >> cg.setCustomizedEvalInnerBlock(null); >>> > > >> i++; >>> > > >> } >>> > > >> cg.getEvalBlock()._return(JExpr.lit(0)); >>> > > >> } >>> > > >> >>> > > >> >>> > > >> The corresponding generated codes : >>> > > >> >>> > > >> public long getBuild64HashCodeInner(int incomingRowIdx, >>> int >>> > > seedValue, int fieldId) >>> > > >> throws SchemaChangeException >>> > > >> { >>> > > >> { >>> > > >> IntHolder fieldId12 = new IntHolder(); >>> > > >> fieldId12 .value = fieldId; >>> > > >> if (fieldId12 .value == constant14 .value) { >>> > > >> IntHolder out18 = new IntHolder(); >>> > > >> { >>> > > >> out18 .value = vv15 .getAccessor().get(( >>> > > incomingRowIdx)); >>> > > >> } >>> > > >> IntHolder seedValue19 = new IntHolder(); >>> > > >> seedValue19 .value = seedValue; >>> > > >> //---- start of eval portion of hash32AsDouble >>> > > function. ----// >>> > > >> IntHolder out20 = new IntHolder(); >>> > > >> { >>> > > >> final IntHolder out = new IntHolder(); >>> > > >> IntHolder in = out18; >>> > > >> IntHolder seed = seedValue19; >>> > > >> >>> > > >> Hash32WithSeedAsDouble$IntHash_eval: { >>> > > >> out.value = >>> > org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) >>> > > in.value, seed.value); >>> > > >> } >>> > > >> >>> > > >> out20 = out; >>> > > >> } >>> > > >> //---- end of eval portion of hash32AsDouble >>> function. >>> > > ----// >>> > > >> return out20 .value; >>> > > >> } >>> > > >> return 0; >>> > > >> } >>> > > >> } >>> > > >> >>> > > >> >>> > > >> >>> > > >> Some other explanation: >>> > > >> 1st : The if checking won't hurt the performance , as I >>> invoke this >>> > > >> method column by column , so it's branch predication friendly. >>> > > >> 2nd: I will use the murmur3_64 not the murmur3_32 ,since the >>> > efficient >>> > > >> bloom filter algorithm needs the 64 bit hash code to avoid the >>> > conflict. >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers >>> > <[email protected] >>> > > > >>> > > >> wrote: >>> > > >> >>> > > >>> Hi Weijie, >>> > > >>> >>> > > >>> Seeing the discussion about the details of JCodeModel >>> suggests you >>> > may >>> > > >>> be trying to debug your generated code at the level of the >>> code >>> > > generator. >>> > > >>> >>> > > >>> Some time ago we added the ability to step through the >>> generated >>> > code. >>> > > >>> Look for the following line in the generator code: >>> > > >>> >>> > > >>> >>> > > >>> // Uncomment out this line to debug the generated code. >>> > > >>> >>> > > >>> // cg.saveCodeForDebugging(true); >>> > > >>> >>> > > >>> >>> > > >>> Uncomment the code line and Drill will save each generated >>> file to a >>> > > >>> configured location (which, if I recall correctly, is >>> > > /tmp/drill/codegen, >>> > > >>> though it may have changed after Tim's test directory >>> changes.) >>> > > >>> >>> > > >>> Then, set a breakpoint in the template setup() method and >>> you can >>> > step >>> > > >>> directly into the generated doSetup() method. Same for the >>> eval() >>> > > method. >>> > > >>> >>> > > >>> This way, you can not only see the generated code, you can >>> step >>> > through >>> > > >>> it. I've found this to be a far easier way to understand the >>> > generated >>> > > code >>> > > >>> than the older techniques folks have used (look at byte >>> codes, use >>> > > print >>> > > >>> statements, brute force reasoning, etc.) >>> > > >>> >>> > > >>> Tim, Boaz and others have used this technique more recently >>> and can >>> > > >>> probably give you additional pointers. >>> > > >>> >>> > > >>> Thanks, >>> > > >>> - Paul >>> > > >>> >>> > > >>> >>> > > >>> >>> > > >>> On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong < >>> > > >>> [email protected]> wrote: >>> > > >>> >>> > > >>> @aman thanks for your reply. "For the ifBlock, do you need >>> an >>> > _else() >>> > > >>> block >>> > > >>> also ?" I give a default return logic at the method, so I >>> don't need >>> > > the >>> > > >>> _else() block. I have noticed the IfExpression's evaluation >>> method >>> > at >>> > > >>> EvaluationVisitor which also uses the JConditional . But >>> that also >>> > > >>> doesn't >>> > > >>> match my requirement. I think the key point here is the >>> > > >>> FunctionHolderExpression and ValueVectorReadExpression will >>> put their >>> > > >>> corresponding generated codes to the eval method's JBlock , >>> not our >>> > > >>> specific IfBlock which is a inner block of the eval method's >>> JBlock . >>> > > >>> >>> > > >>> So it seems I should make some changes to the ClassGenerator >>> to let >>> > the >>> > > >>> getEvalBlock return the IfBlock (maybe accurately the >>> JConditional's >>> > > then >>> > > >>> block) or implement some special FunctionHolderExpression >>> > > >>> 、ValueVectorReadExpression and corresponding visiting >>> methods at the >>> > > >>> EvaluationVisitor to generate the special code blocks. Hope >>> someone >>> > who >>> > > >>> are >>> > > >>> familiar with these part of codes to point out whether there >>> are more >>> > > >>> easy >>> > > >>> or different choices to achieve the target. >>> > > >>> >>> > > >>> To make discussion more accurate, I put the generated codes >>> of the >>> > > >>> previous >>> > > >>> setupGetBuild64Hash method here: >>> > > >>> >>> > > >>> public long getBuild64HashCodeInner(int incomingRowIdx, >>> int >>> > > >>> seedValue, int fieldId) >>> > > >>> throws SchemaChangeException >>> > > >>> { >>> > > >>> { >>> > > >>> IntHolder fieldId16 = new IntHolder(); >>> > > >>> fieldId16 .value = fieldId; >>> > > >>> if (fieldId16 .value == constant18 .value) { >>> > > >>> return out24 .value; >>> > > >>> } >>> > > >>> IntHolder out22 = new IntHolder(); >>> > > >>> { >>> > > >>> out22 .value = vv19 .getAccessor().get(( >>> > > incomingRowIdx)); >>> > > >>> } >>> > > >>> IntHolder seedValue23 = new IntHolder(); >>> > > >>> seedValue23 .value = seedValue; >>> > > >>> //---- start of eval portion of hash32AsDouble >>> function. >>> > > >>> ----// >>> > > >>> IntHolder out24 = new IntHolder(); >>> > > >>> { >>> > > >>> final IntHolder out = new IntHolder(); >>> > > >>> IntHolder in = out22; >>> > > >>> IntHolder seed = seedValue23; >>> > > >>> >>> > > >>> Hash32WithSeedAsDouble$IntHash_eval: { >>> > > >>> out.value = >>> > > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) >>> > > >>> in.value, seed.value); >>> > > >>> } >>> > > >>> >>> > > >>> out24 = out; >>> > > >>> } >>> > > >>> //---- end of eval portion of hash32AsDouble >>> function. >>> > > ----// >>> > > >>> if (fieldId16 .value == constant18 .value) { >>> > > >>> return out26 .value; >>> > > >>> } >>> > > >>> IntHolder seedValue25 = new IntHolder(); >>> > > >>> seedValue25 .value = seedValue; >>> > > >>> //---- start of eval portion of hash32AsDouble >>> function. >>> > > >>> ----// >>> > > >>> IntHolder out26 = new IntHolder(); >>> > > >>> { >>> > > >>> final IntHolder out = new IntHolder(); >>> > > >>> IntHolder in = out22; >>> > > >>> IntHolder seed = seedValue25; >>> > > >>> >>> > > >>> Hash32WithSeedAsDouble$IntHash_eval: { >>> > > >>> out.value = >>> > > >>> org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) >>> > > >>> in.value, seed.value); >>> > > >>> } >>> > > >>> >>> > > >>> out26 = out; >>> > > >>> } >>> > > >>> //---- end of eval portion of hash32AsDouble >>> function. >>> > > ----// >>> > > >>> return 0; >>> > > >>> } >>> > > >>> } >>> > > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha < >>> [email protected]> >>> > > >>> wrote: >>> > > >>> >>> > > >>> > sorry, the previous email is incomplete. >>> > > >>> > For the ifBlock, do you need an _else() block also ? >>> > > >>> > >>> > > >>> > I have sometimes found that 'JConditional' is a good way >>> to break >>> > > down >>> > > >>> the >>> > > >>> > logic further. Please see example usages of JConditional >>> here [1]. >>> > > >>> > >>> > > >>> > -Aman >>> > > >>> > >>> > > >>> > [1] >>> > > >>> > >>> > > >>> > >>> > > >>> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.programcreek.com_java-2Dapi-2Dexamples_-3Fapi-3Dcom&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=O2Th00tVjOSHTLlOn_lFp8JiUlh_FueCbHs8giRVS3k&e= >>> . >>> > > sun.codemodel.JBlock >>> > > >>> > >>> > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha < >>> [email protected]> >>> > > >>> wrote: >>> > > >>> > >>> > > >>> > > Hi Weijie, >>> > > >>> > > It would be a little cumbersome to debug such issues >>> over email >>> > > >>> since one >>> > > >>> > > has to look at the generated code output and iteratively >>> debug. >>> > > >>> > > Couple of thoughts I have that might help: >>> > > >>> > > >>> > > >>> > > For this particular if-then block, should you also >>> > > >>> > > JBlock ifBlock = >>> > > >>> > > >>> cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe >>> > > >>> > > tBuildSideFieldId))._then(); >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong < >>> > > >>> [email protected]> >>> > > >>> > > wrote: >>> > > >>> > > >>> > > >>> > >> HI All: >>> > > >>> > >> Through implementing the JPPD feature ( >>> > > >>> > >> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6385&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=FIkIkgR6E_qJADP1J55y11SgJZD8NyPaNv_AeTabiaY&e=) >>> , I was >>> > blocked >>> > > >>> by >>> > > >>> > the >>> > > >>> > >> problem: how to get the hash code of each build side of >>> the hash >>> > > >>> join >>> > > >>> > >> columns through the dynamic generated java code. Hope >>> someone >>> > can >>> > > >>> give >>> > > >>> > >> some >>> > > >>> > >> advice. >>> > > >>> > >> >>> > > >>> > >> I supposed to add methods as below to the >>> HashTableTemplate : >>> > > >>> > >> >>> > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int >>> > seedValue, >>> > > >>> int >>> > > >>> > >> fieldId) throws SchemaChangeException{ >>> > > >>> > >> return getBuild64HashCodeInner(incomingRowIdx, >>> seedValue, >>> > > >>> fieldId); >>> > > >>> > >> } >>> > > >>> > >> >>> > > >>> > >> protected abstract long >>> > > >>> > >> getBuild64HashCodeInner(@Named("incomingRowIdx") int >>> > > incomingRowIdx, >>> > > >>> > >> @Named("seedValue") int seedValue, @Named("fieldId") int >>> > fieldId) >>> > > >>> > >> throws SchemaChangeException; >>> > > >>> > >> >>> > > >>> > >> >>> > > >>> > >> The high level code to invoke the getBuild64HashCode >>> method >>> > is >>> > > >>> at the >>> > > >>> > >> HashJoinBatch's executeBuildPhase() : >>> > > >>> > >> >>> > > >>> > >> //create runtime filter >>> > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) { >>> > > >>> > >> //create runtime filter and send out async >>> > > >>> > >> int condFieldIndex = 0; >>> > > >>> > >> for (BloomFilter bloomFilter : bloomFilters) { >>> > > >>> > >> //VV >>> > > >>> > >> for (int ind = 0; ind < currentRecordCount; ind++) { >>> > > >>> > >> long hashCode = >>> partitions[0].getBuild64HashCode(ind, >>> > > >>> > >> condFieldIndex); >>> > > >>> > >> bloomFilter.insert(hashCode); >>> > > >>> > >> } >>> > > >>> > >> condFieldIndex++; >>> > > >>> > >> } >>> > > >>> > >> //TODO sered out async >>> > > >>> > >> } >>> > > >>> > >> >>> > > >>> > >> >>> > > >>> > >> As you know, the abstract method >>> getBuild64HashCodeInner needs >>> > to >>> > > >>> > >> calculate the hash codes of each build side column by >>> the >>> > fieldId >>> > > >>> input >>> > > >>> > >> parameter. In order to achieve this target, I plan to >>> have >>> > > different >>> > > >>> > >> solving parts corresponding to different column >>> ValueVector , >>> > > using >>> > > >>> the >>> > > >>> > if >>> > > >>> > >> statement to distinguish different solving parts >>> through the id >>> > of >>> > > >>> the >>> > > >>> > >> column. The corresponding method to generate the >>> dynamic codes >>> > > is >>> > > >>> as >>> > > >>> > >> below: >>> > > >>> > >> >>> > > >>> > >> private void >>> setupGetBuild64Hash(ClassGenerator<HashTable> cg, >>> > > >>> > >> MappingSet incomingMapping, VectorAccessible batch, >>> > > >>> > >> LogicalExpression[] keyExprs, TypedFieldId[] >>> buildKeyFieldIds) >>> > > >>> > >> throws SchemaChangeException { >>> > > >>> > >> cg.setMappingSet(incomingMapping); >>> > > >>> > >> if (keyExprs == null || keyExprs.length == 0) { >>> > > >>> > >> cg.getEvalBlock()._return(JExpr.lit(0)); >>> > > >>> > >> } >>> > > >>> > >> String seedValue = "seedValue"; >>> > > >>> > >> String fieldId = "fieldId"; >>> > > >>> > >> LogicalExpression seed = >>> > > >>> > >> ValueExpressions.getParameterExpression(seedValue, >>> > > >>> > >> Types.required(TypeProtos.MinorType.INT)); >>> > > >>> > >> >>> > > >>> > >> LogicalExpression fieldIdParamExpr = >>> > > >>> > >> ValueExpressions.getParameterExpression(fieldId, >>> > > >>> > >> Types.required(TypeProtos.MinorType.INT) ); >>> > > >>> > >> HoldingContainer fieldIdParamHolder = >>> > > cg.addExpr(fieldIdParamExpr); >>> > > >>> > >> int i = 0; >>> > > >>> > >> for (LogicalExpression expr : keyExprs) { >>> > > >>> > >> TypedFieldId targetTypeFieldId = buildKeyFieldIds[i]; >>> > > >>> > >> ValueExpressions.IntExpression targetBuildFieldIdExp >>> = new >>> > > >>> > >> >>> ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds( >>> > > )[0], >>> > > >>> > >> ExpressionPosition.UNKNOWN); >>> > > >>> > >> JFieldRef targetBuildSideFieldId = >>> > > >>> > >> cg.addExpr(targetBuildFieldIdExp, >>> > > >>> > >> ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue(); >>> > > >>> > >> JBlock ifBlock = >>> > > >>> > >> >>> cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targe >>> > > >>> > >> tBuildSideFieldId))._then(); >>> > > >>> > >> >>> > > >>> > >> LogicalExpression hashExpression = >>> > > >>> > >> HashPrelUtil.getHashExpression(expr, seed, >>> incomingProbe != >>> > > null); >>> > > >>> > >> LogicalExpression materializedExpr = >>> > > >>> > >> ExpressionTreeMaterializer.materializeAndCheckErrors( >>> > > hashExpression, >>> > > >>> > >> batch, context.getFunctionRegistry()); >>> > > >>> > >> HoldingContainer hash = cg.addExpr(materializedExpr, >>> > > >>> > >> ClassGenerator.BlkCreateMode.FALSE); >>> > > >>> > >> >>> > > >>> > >> >>> > > >>> > >> ifBlock._return(hash.getValue()); >>> > > >>> > >> i++; >>> > > >>> > >> } >>> > > >>> > >> cg.getEvalBlock()._return(JExpr.lit(0)); >>> > > >>> > >> >>> > > >>> > >> } >>> > > >>> > >> >>> > > >>> > >> But unfortunately, the generated codes are not what I >>> expected. >>> > > The >>> > > >>> > codes >>> > > >>> > >> to read ValueVector , calculate hash code of the read >>> value do >>> > not >>> > > >>> stay >>> > > >>> > in >>> > > >>> > >> the if block. So how can I let the related codes stay >>> in the if >>> > > >>> block ? >>> > > >>> > >> >>> > > >>> > > >>> > > >>> > > >>> > > >>> > >>> > > >> >>> > > >> >>> > > >>> > >>> >>> >>>
