[ https://issues.apache.org/jira/browse/ASTERIXDB-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15698708#comment-15698708 ]
Taewoo Kim commented on ASTERIXDB-1736:
---------------------------------------

Removed Grace Hash Join.

> Grace Hash Join and Hybrid Hash Join are not being used.
> --------------------------------------------------------
>
>                 Key: ASTERIXDB-1736
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>
> As the title says, Grace Hash Join and Hybrid Hash Join are not being used. I
> suggest that we remove these two join methods. Here are my findings for these
> two joins.
>
> 1) Grace Hash Join
> GraceHashJoinOperatorDescriptor is only called from two places:
> org.apache.hyracks.examples.tpch.client.join and
> TPCHCustomerOrderHashJoinTest.
> One is a Hyracks example (tpch.client) and the other is a unit test. This
> join is currently not used (it is never chosen during compilation).
>
> 2) Hybrid Hash Join
> During compilation, the optimizer decides whether it will use Hybrid Hash
> Join or Optimized Hybrid Hash Join.
> If the hash function family for each key variable is set, then we use the
> optimized hybrid hash join.
> If not, we use the hybrid hash join. However, the hybrid hash join path will
> in fact never be chosen. Let's check the code.
> {code:title=HybridHashJoinPOperator.java|borderStyle=solid}
>     IBinaryHashFunctionFamily[] hashFunFamilies =
>             JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch,
>                     env, context);
>
>     ...
>
>     boolean optimizedHashJoin = true;
>     for (IBinaryHashFunctionFamily family : hashFunFamilies) {
>         if (family == null) {
>             optimizedHashJoin = false;
>             break;
>         }
>     }
>     if (optimizedHashJoin) {
>         opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas,
>                 keysLeft, keysRight, hashFunFamilies,
>                 comparatorFactories, predEvaluatorFactory, recDescriptor,
>                 spec);
>     } else {
>         opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft,
>                 keysRight, hashFunFactories,
>                 comparatorFactories, predEvaluatorFactory, recDescriptor,
>                 spec);
>     }
> {code}
>
> As we can see, optimizedHashJoin is set to false only when a hash function
> family is null.
> Then, how do we assign the hash function family for each key variable?
> {code:title=JobGenHelper.java|borderStyle=solid}
>     public static IBinaryHashFunctionFamily[]
>             variablesToBinaryHashFunctionFamilies(
>             Collection<LogicalVariable> varLogical, IVariableTypeEnvironment
>             env, JobGenContext context)
>             throws AlgebricksException {
>         IBinaryHashFunctionFamily[] funFamilies = new
>                 IBinaryHashFunctionFamily[varLogical.size()];
>         int i = 0;
>         IBinaryHashFunctionFamilyProvider bhffProvider =
>                 context.getBinaryHashFunctionFamilyProvider();
>         for (LogicalVariable var : varLogical) {
>             Object type = env.getVarType(var);
>             funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
>         }
>         return funFamilies;
>     }
> {code}
> For each variable type, we try to get the hash function family. In the current
> codebase, AqlBinaryHashFunctionFamilyProvider is the only class that
> implements IBinaryHashFunctionFamilyProvider.
> And for any type, it returns AMurmurHash3BinaryHashFunctionFamily.
> So, there is no way that the hash function family is null.
> {code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
> public class AqlBinaryHashFunctionFamilyProvider implements
>         IBinaryHashFunctionFamilyProvider, Serializable {
>     private static final long serialVersionUID = 1L;
>     public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new
>             AqlBinaryHashFunctionFamilyProvider();
>
>     private AqlBinaryHashFunctionFamilyProvider() {
>     }
>
>     @Override
>     public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type)
>             throws AlgebricksException {
>         // AMurmurHash3BinaryHashFunctionFamily converts numeric type to
>         // double type before doing hash()
>         return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
>     }
> }
> {code}
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
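
The argument in the quoted description can be reproduced outside the codebase with a small self-contained sketch. Everything in it is a hypothetical stand-in rather than a Hyracks/Algebricks API: HashFunctionFamily, HashFunctionFamilyProvider, JoinChoiceSketch, chooseJoin and both providers are invented for illustration. ALWAYS_MURMUR imitates the behaviour described for AqlBinaryHashFunctionFamilyProvider (one singleton for every type), and NO_FAMILY_FOR_STRINGS is a made-up provider that exists only to show when the fallback branch would be taken.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for IBinaryHashFunctionFamily (not the real interface).
interface HashFunctionFamily {
}

// Hypothetical stand-in for IBinaryHashFunctionFamilyProvider.
interface HashFunctionFamilyProvider {
    // In general a provider could return null for a type it cannot hash.
    HashFunctionFamily getFamily(Object type);
}

final class JoinChoiceSketch {
    enum JoinKind { OPTIMIZED_HYBRID_HASH_JOIN, HYBRID_HASH_JOIN }

    // Single shared family instance, playing the role of the Murmur3 singleton.
    static final HashFunctionFamily MURMUR = new HashFunctionFamily() { };

    // Imitates the quoted provider: the same non-null family for every type.
    static final HashFunctionFamilyProvider ALWAYS_MURMUR = type -> MURMUR;

    // Invented provider with no family for String keys (assumption, for contrast only).
    static final HashFunctionFamilyProvider NO_FAMILY_FOR_STRINGS =
            type -> (type instanceof String) ? null : MURMUR;

    // Same decision rule as the quoted snippet: fall back to the plain hybrid
    // hash join as soon as one key type has no hash function family.
    static JoinKind chooseJoin(List<Object> keyTypes, HashFunctionFamilyProvider provider) {
        for (Object type : keyTypes) {
            if (provider.getFamily(type) == null) {
                return JoinKind.HYBRID_HASH_JOIN;
            }
        }
        return JoinKind.OPTIMIZED_HYBRID_HASH_JOIN;
    }

    public static void main(String[] args) {
        List<Object> keyTypes = Arrays.asList("string-key", 42, 3.14);
        System.out.println(chooseJoin(keyTypes, ALWAYS_MURMUR));
        System.out.println(chooseJoin(keyTypes, NO_FAMILY_FOR_STRINGS));
    }
}
```

With a provider that never returns null, no choice of key types can reach the HYBRID_HASH_JOIN branch, which is the situation the description reports for the current codebase. The branch only becomes reachable if some provider can return null, as the invented NO_FAMILY_FOR_STRINGS does.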