[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 23: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. Also, codegen functions will not return error status should the IR function fails the LLVM verification step. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Reviewed-on: http://gerrit.cloudera.org:8080/5483 Reviewed-by: Michael Ho Tested-by: Impala Public Jenkins --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 23: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/754/ -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 23: Code-Review+2 Rebase. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Marcel Kornacker, Impala Public Jenkins, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#23). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. Also, codegen functions will not return error status should the IR function fails the LLVM verification step. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 22: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 22: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/753/ -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 22: Code-Review+2 Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 22: Fix clang-tidy errors: 1. Missing override in function declarations 2. Remove unused variables in HdfsPartitionDescriptor 3. Handles missing status check in ScalarFnCall::GetCodegendComputeFn(). Re-ran exhaustive build and clang-tidy to verify things are still sane. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Marcel Kornacker, Impala Public Jenkins, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#22). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. Also, codegen functions will not return error status should the IR function fails the LLVM verification step. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 21: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/743/ -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 21: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/743/ -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 21: Rebase. Fix some bugs found during stress tests. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Marcel Kornacker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#21). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/e
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 19: (8 comments) http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 304: /// analytic_eval_fns_. When enough input rows have been consumed to produce the analytic > long lines Done http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 498: // %eval_ptr = getelementptr inbounds %"class.impala::ScalarExprEvaluator"*, > trailing Done http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hbase-scan-node.cc File be/src/exec/hbase-scan-node.cc: Line 250: if (EvalConjuncts(&conjunct_evals_[0], conjuncts_.size(), row)) { > .data() Done http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 1238: if (ExecNode::EvalConjuncts(&evals[0], evals.size(), row)) { > .data() Done http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1410: // %input0 = call { i8, double } @GetSlotRef(%"class.impala::ScalarExprEvaluator"* > trailing Done http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 218: for (int i = 0; i < evals.size(); ++i) { > single line Done http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/runtime/data-stream-test.cc File be/src/runtime/data-stream-test.cc: Line 307: is_asc_, nulls_first_)); > tab TupleRowComparator will keep a reference to is_asc_ so it needs to exist after calling the constructor. http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: Line 141: std::vector nulls_first_; > const& here as well? Not feasible given the way it's set up in the constructor. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 20: Code-Review+2 Carry +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Marcel Kornacker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#20). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/e
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 19: Code-Review+2 (9 comments) http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 229: /// Analytic functions and their evaluators. ownership? http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 498: // %eval_ptr = getelementptr inbounds %"class.impala::ScalarExprEvaluator"*, trailing http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 290: boost::scoped_ptr expr_mem_pool_; > We were passing the expr_mem_tracker directly into the ExprContext so there i guess it's going in the right direction :) http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hbase-scan-node.cc File be/src/exec/hbase-scan-node.cc: Line 250: if (EvalConjuncts(&conjunct_evals_[0], conjuncts_.size(), row)) { .data() http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 1238: if (ExecNode::EvalConjuncts(&evals[0], evals.size(), row)) { .data() http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1410: // %input0 = call { i8, double } @GetSlotRef(%"class.impala::ScalarExprEvaluator"* trailing http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 218: for (int i = 0; i < evals.size(); ++i) { single line http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/runtime/data-stream-test.cc File be/src/runtime/data-stream-test.cc: Line 307: is_asc_, nulls_first_)); tab can't you specify the initializer list inline, as in: .., {true}, {true})? if that doesn't work, shouldn't vector{1, true} work? http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: Line 141: std::vector nulls_first_; const& here as well? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 16: (2 comments) http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 304: /// evaluators_. When enough input rows have been consumed to produce the analytic long lines http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: Line 51: opened_ = false; > The problem is that the ordering_expr_evaluators_rhs_ is cloned from orderi Turns out that there is still a hidden dependency for codegen to happen first before opening ordering_expr_evaluators_*. This dependency will be removed once we move to the next step in which codegen is done before creating fragment instances. Keeping this function as Open() for now but I managed to rewrite it to not need the opened_ flag. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#19). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/18/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1707: // TODO: Add IR function dump. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 16: (61 comments) http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 79: DCHECK_EQ(intermediate_tuple_desc_->slots().size(), output_tuple_desc_->slots().size()); > move to c'tor Done Line 124: &agg_fn_evaluators_)); > nice, the separation is making the code a lot crisper. Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 92: new SlotRef(desc) : new SlotRef(desc, TYPE_BOOLEAN); > Tab? Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h File be/src/exec/aggregation-node.h: Line 53: class AggregationNode : public ExecNode { > it's sad that this still needs to be maintained. please apply my comments t Done Line 80: /// FunctionContexts objects are stored in ObjectPool of RuntimeState. > so what is agg_fn_pool_ for then? Stale comment. It's the backing MemPools of 'agg_fn_evaluators_'. Line 98: boost::scoped_ptr intermediate_row_desc_; > any reason this can't go in state->obj_pool()? Done Line 150: /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the codegen'ed code. > "for agg_fn_evaluators_.data()" is more idiomatic. Done Line 165: /// slot_idx is the idx into aggregate_exprs_ (does not include grouping exprs). > update comment Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 892: AggFn::Close(analytic_fns_); > having an expr::close() and having to call that here is a bit counterintuit Expr::Close() will be called in PlanNode::Close() in the future. This is done here as a transitionary step. Also, I don't see Expr or evaluators any different from other data structures in the exec node. You can extend the argument to any structures you have in an exec node and argue that you can add them to a container which automatically call Close() on them. I don't see the need to create a special case for Expr. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 210: ScalarExpr* partition_by_eq_expr_; > initialize inline Done Line 215: ScalarExpr* order_by_eq_expr_; > same Done Line 216: ScalarExprEvaluator* order_by_eq_expr_evaluator_; > same Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 100: RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, thrift_output_exprs)); > why not just call create() directly here? Not all data_sink have output_exprs. Some of them have partition_exprs only. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 123: boost::scoped_ptr expr_mem_pool_; > do we really need a separate mempool for that? What other MemPool can we use then ? Line 131: virtual Status Init(RuntimeState* state, const TDataSink& tsink, > move in/out param to back Done http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc File be/src/exec/exchange-node.cc: Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) { > why did we not need this before? This was not needed before as the exchange node owns the evaluators before and was added to vector of evaluators to free periodically. The evaluator is not stored in the tuple row comparator. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 460: if (!EvalConjunct(evaluators[i], row)) return false; > can't you just inline the call here? Not sure I understand the comment. EvalConjunct() is already inlined. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 263: /// object pool. > why? evaluators are execution related stuff so they are stored in the exec nodes' obj_pool (similar to other stuff which live in the obj_pool_). exprs are fragment level entity so it makes sense for it to live in fragment level's obj_pool. Line 290: boost::scoped_ptr expr_mem_pool_; > what mem pool did we use for that before? in general, we want to get away f We were passing the expr_mem_tracker directly into the ExprContext so there is one MemPool per ExprContext before. Now all exprs in an exec node share the expr_mem_pool of the exec node. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 102: Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool, > state contains pool Caller can pass in pool with shorter life span than the entire query (e.g. scanner thread). Line 103: const FilterContext& from); > move const args to front Done Line 114
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#18). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 17: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 92:new SlotRef(desc) : new SlotRef(desc, TYPE_BOOLEAN)); Tab? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: Line 96: for (const TExpr& texpr: texprs) { I was even thinking it might be less error-prone if, on failure, this closed all of the exprs it created and cleared the vector. Removes an edge case from the caller. Ok to ignore. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 17: please fix the pervasive non-const vector& or switch to vector* and explain why it can get updated -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 17: (62 comments) http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 79: DCHECK_EQ(intermediate_tuple_desc_->slots().size(), output_tuple_desc_->slots().size()); move to c'tor Line 124: DCHECK_EQ(agg_fns_.size(), agg_fn_evaluators_.size()); nice, the separation is making the code a lot crisper. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h File be/src/exec/aggregation-node.h: Line 53: class AggregationNode : public ExecNode { it's sad that this still needs to be maintained. please apply my comments to the partitioined-*.{cc,h} as well, where it makes sense Line 80: /// FunctionContexts objects are stored in ObjectPool of RuntimeState. so what is agg_fn_pool_ for then? Line 98: boost::scoped_ptr intermediate_row_desc_; any reason this can't go in state->obj_pool()? Line 150: /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the codegen'ed code. "for agg_fn_evaluators_.data()" is more idiomatic. are you going to do a rename of "evaluator[s]" suffixes to "evl[s]" or "eval[s]" at the end? Line 165: /// slot_idx is the idx into aggregate_exprs_ (does not include grouping exprs). update comment http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 892: AggFn::Close(analytic_fns_); having an expr::close() and having to call that here is a bit counterintuitive (and manual, since we need to do that for every expr). would be nice to move that to a utility function in runtimestate (track the cache entries in some ancillary structure and then free them all at once instead of piecemeal). that would reduce the likelihood of forgetting this somewhere. not necessary to do it as part of this patch, but a todo would be nice. http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 210: ScalarExpr* partition_by_eq_expr_; initialize inline Line 215: ScalarExpr* order_by_eq_expr_; same Line 216: ScalarExprEvaluator* order_by_eq_expr_evaluator_; same http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 100: RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, thrift_output_exprs)); why not just call create() directly here? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 87: static Status Create(ObjectPool* pool, RuntimeState* state, state already contains a pool Line 123: boost::scoped_ptr expr_mem_pool_; do we really need a separate mempool for that? Line 131: virtual Status Init(RuntimeState* state, const TDataSink& tsink, move in/out param to back http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc File be/src/exec/exchange-node.cc: Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) { why did we not need this before? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 460: if (!EvalConjunct(evaluators[i], row)) return false; can't you just inline the call here? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 149: static bool EvalConjunct(ScalarExprEvaluator* evaluator, TupleRow* row); evalpredicate a predicate is a conjunct if it's part of a conjunctive clause, ie, a predicate on its own can't be a conjunct. Line 151: /// Evaluate the conjuncts in 'evaluators' over 'row'. add convenience function for a vector? Line 263: /// object pool. why? Line 290: boost::scoped_ptr expr_mem_pool_; what mem pool did we use for that before? in general, we want to get away from having mempools all over the place http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 102: Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool, state contains pool Line 103: const FilterContext& from); move const args to front Line 114: /// Codegen Eval() by codegen'ing the expression evaluations and replacing the type update comment Line 117: static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* filter_expr, does filter_expr get updated in this function? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 129: void FreeBuildLocalAllocations(); FreeLocal<>Allocations()? http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-scan-node.cc File be/src/exec/hbase-scan-node.cc: Line 249: DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size()); this would benefit from a convenience function for a vecto
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#17). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 17: (4 comments) http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 93: build_exprs_.push_back(build_expr); > Wrap pool_->Add() around the above line. Makes it harder to reintroduce the Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS16, Line 163: > tab? Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 87: RETURN_IF_ERROR(ScalarExpr::Create(eq_join_conjunct.left, child(0)->row_desc(), > It would be nice if we had a way to make this cleanup more automatic to mak Done http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/anyval-util.h File be/src/exprs/anyval-util.h: Line 32: using impala_udf::FunctionContext; > Using the whole namespace seems fine to me too so long as it's scoped to im Keep it as is for now. Using the whole namespace in the header seems to bad practice in general. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 16: (6 comments) Looking good, just had a handful of comments. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 93: pool_->Add(build_expr); Wrap pool_->Add() around the above line. Makes it harder to reintroduce the bug by adding a line in-between. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 165: new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ false)); > Good point. I think heap allocated may be safer in case the expression keep I think we should be explicit in the ScalarExpr comment about whether Create() can hold onto a reference, or whether it has to copy. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS16, Line 163: tab? http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 87: Status status = ScalarExpr::Create(eq_join_conjunct.left, child(0)->row_desc(), It would be nice if we had a way to make this cleanup more automatic to make it harder to introduce bugs like this. Maybe ScalarExpr::Create() should close the expr and return NULL if it fails? Similarly maybe the vector version of ScalarExpr::Create() should close all the exprs and clear the vector on failure. http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/anyval-util.h File be/src/exprs/anyval-util.h: Line 32: using impala_udf::FunctionContext; Using the whole namespace seems fine to me too so long as it's scoped to impala (this is also fine). http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 157: /// or its sub-expressions. 'start_idx' and 'end_idx' correspond to the range > Yes, it'd work out but not sure the difference in overhead for not reusing I'm not too concerned about that. There's two possible levels of caching: * In the MemPool if we don't transfer and call Clear() * In TCMalloc or the new buffer pool, where there's thread-local caching or core-local caching respectively. TCMalloc's caching should be pretty effective for small allocations. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#16). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 14: (70 comments) http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 93: RETURN_IF_ERROR(build_expr->Init(*(intermediate_row_desc_.get()), state)); > build_expr is leaked if this returns. Good catch. Fixed here and everywhere. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: > This all works out a lot cleaner - nice! Done Line 124: RETURN_IF_ERROR(AggFn::Create(analytic_node.analytic_functions[i], > I think there's a potential resource leak if the expr Init() function fails Done Line 166: fn_pool_.reset(new MemPool(expr_mem_tracker()));; > extra semicolon Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 181: // TODO: codegen table sink > I'm not sure what this means. Is it something to be fixed in this patch? Ot I think this is IMPALA-4356. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 126: /// Not used in DataStreamSender and PHJBuilder. > Or NljBuilder. Maybe easier to just say that they're not used in all subcla Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: PS14, Line 451: inline > Is the inline needed here? The inline hint is for inlining into EvalConjuncts(). http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 261: /// Conjuncts and their evaluators in this node. > Maybe mention who owns the pointers? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: PS14, Line 86: expr_evaluator->root() > Is the plan to eventually pass around the Expr itself and avoid going via t Really depends on the context. I imagine most cases we will use Expr directly. In some cases when Expr is not directly available we may still go through the evaluator. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 120: FilterContext() { } > nit: can just remove this and rely on the default constructor Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 115: /// computing the size of a scratch row. > Maybe mention what 'pool' is for Done Line 415: Status Init(ObjectPool* pool, RuntimeState* state, int num_build_tuples); > What is pool used for? Comments added. Line 513: MemPool* mem_pool_; > Mention that it's not owned? I suppose that's implied. Comments added. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: PS14, Line 294: > tab? Done Line 733: vector& output_expr_evaluators) > Can't this be a const& still? The vector isn't being modified is it? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: Line 26: #include "exprs/scalar-expr.h" > Maybe not needed? Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: PS14, Line 122: DCHECK(conjuncts_map_[entry.first].empty()); Use find() instead. Line 147: // TODO: Move this to Prepare() > Is this TODO in this patch, or does it depend on another change? I am thinking of moving it when PlanNode is introduced. Line 165: new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ false)); > This is a little inconsistent - above on line 121 we create the RowDescript Good point. I think heap allocated may be safer in case the expression keeps a reference to it. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 146: const std::vector min_max_conjunct_evaluators() const { > Shouldn't this return a const&, rather than copying the vector? Yes, right. Looks like the existing code is broken. Line 393: /// the per-scanner ScannerContext.. > Add something like "These correspond to the exprs in 'filter_exprs_'" Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 362: MemPool filter_mem_pool(expr_mem_tracker()); > This is a little subtle - could do with a comment. Done Line 388: for (auto& ctx: filter_ctxs) { > Maybe use a goto and an exit block at the bottom of the function to enforce Done http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/kudu-scanner.h File be/src/exec/kudu-scanner.h: Line 95: boost::scoped_ptr expr_mem_pool_; > Put it in 'obj_pool_'? Appears to be more consi
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#15). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 14: (32 comments) Made it through, finally... Overall looks way saner and easier to understand than the previous state of things. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 694: AggFnEvaluator::ShallowClone(parent->pool_, agg_fn_pool.get(), This should go into 'partition_pool_'. Otherwise if we have an Agg in a subplan these will accumulate every Reset(). E.g. there would be a bug like https://gerrit.cloudera.org/#/c/1141/ http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: Line 84: AggFnEvaluator::~AggFnEvaluator() { Can we add a DCHECK() that it was closed? PS14, Line 87: inline probably not needed PS14, Line 91: inline probably not needed http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 250 Consider leaving these as inline functions in the header - AnalyticEvalNode is still interpreted so the function call overhead may actually affect perf there. Line 72: /// Convenience functions for creating evaluators for multiple aggregate functions. Mention that the caller is responsible for closing the created evaluators regardless of success or failure. That is subtle. Or we could just have Create() handle that internally. This comment may also apply to ScalarExprEvaluator. Line 174: /// True if this evaluator is created from a Clone() call. ShallowClone() PS14, Line 175: is_clone_ Maybe is_shallow_clone_ to be more precise - although there is only one type of cloning supported so arguably not necessary to disambiguate. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.cc File be/src/exprs/agg-fn.cc: Line 87: RETURN_IF_ERROR(LibCache::instance()->GetSoFunctionPtr(fn_.hdfs_location, Nit: I don't think the remainder of this function needs all the vertical whitespace - it's essentially double-spaced code now http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.h File be/src/exprs/agg-fn.h: PS14, Line 52: state value?. "state" could be taken to mean that it initialises the evaluator or some such thing. Line 81: /// TODO: The aggregation node has custom codegen paths for a few of the builtins. Consider removing the TODO - it's a bit vague and unclear what if anything we want to do. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 53: /// --- Expr overview Nice overview Line 81: static Status CreateTree(const TExpr& texpr, ObjectPool* pool, Expr* root); I think this should be protected since it's only called from ScalarExpr and AggFn. Line 94: bool is_constant() const { return is_constant_; } I think is_constant() should maybe only be defined in the ScalarExpr sub-hierarchy. It doesn't really mean anything for AggFn. Line 100: /// Simple debug string that provides no expr subclass-specific information. Comment seems stale since there's no implementation. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/hive-udf-call.h File be/src/exprs/hive-udf-call.h: Line 28: using namespace impala_udf; I guess some of these 'using namespace' instances were pre-existing - still seems like it contradicts our coding standards. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/literal.h File be/src/exprs/literal.h: Line 28: using namespace impala_udf; See other comment about using namespace in headers. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 154: // Owner of mem_pool_ needs to call FreeAll() on it. We don't really know what the owner wants to do. Maybe something like "Memory allocated by 'fn_ctx_' is still living in 'mem_pool_'. The owner of 'mem_pool_' is responsible for freeing it when appropriate." Line 436: // TODO: is there a better way to do this? Consider removing TODO - I don't think we're likely to fix it. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 29: using namespace impala_udf; I'm not sure if this works, but maybe if we just move this "using" into the impala namespace it won't pollute any other namespaces. That would still avoid having impala_udf:: everywhere. PS14, Line 43: types type Line 122: void* GetValue(const ScalarExpr& e, const TupleRow* row); Maybe GetSubExprValue()? I don't feel strongly but might make the intent clearer. Line 144: Status GetError(int start_idx = 0, int end_idx = -1) const WARN_UNUSED_RESULT; Arguments maybe need explanation. Line 147: void PrintValue(const TupleRow* row, std::string* str); Does this evaluate
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 14: (14 comments) Next batch - made it through everything except the expr/ subdirectory. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 164: pool_->Add(build_expr); Maybe stick with the usual idiom of wrapping pool_->Add() around the allocation? Line 1574: // TODO: consider moving the following codegen logic to AggFn. This would make sense (not in this patch though). http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: Line 109: // TODO: Move to Prepare(). Is this to do in this patch or are you planning to do it as part of a future JIRA? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: Line 243: /// Init() function inherited from DataSink. Overridden to be a no-op for now. This is a bit of a red flag to me - I think we should structure things so that we always call Init() on the parent class. Otherwise it's hard to reason about the possible states of the member variables of DataSink. Line 381: /// List of filters to build. Mention that there's a correspondence with 'filter_exprs_'? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: PS14, Line 95: nd and http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 49: /// TODO: Move this to the equivalent of PlanNode class for DataSink. Is there a JIRA for this? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 486: DCHECK(partition_expr->IsLiteral()); Can you add a comment in CatalogObjects.thrift that these are always literal exprs? That fact is explicit in the frontend but it wasn't obvious that the exprs sent from the frontend are always literals just by looking at the thrift structures. Line 491: // literals Partition exprs are not used in the codegen case. Don't codegen Missing . http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: Line 93: const TRuntimeFilterDesc& filter_desc_; Comment what owns the filter? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/sorter.h File be/src/runtime/sorter.h: Line 105: Status Init(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool); Comment what the arguments are for? The names could maybe be more descriptive too. E.g. to disambiguate from 'obj_pool_'. E.g. 'expr_evaluator_pool' and 'expr_mem_pool'. I think passing in 'state' is redundant too, since it's passed into the constructor. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/service/fe-support.cc File be/src/service/fe-support.cc: PS14, Line 194: mem_pool expr_mem_pool? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/udf/udf.h File be/src/udf/udf.h: Line 111: /// TODO: Move FRAGMENT_LOCAL states to query_state for multi-threading. Do we have a JIRA trackign this? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: Line 82: Status Open(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool); Similar to my comment in Sorter - it's unclear what these different arguments do and what the requirements are. E.g. what is allocated from 'mem_pool'. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 14: (26 comments) First batch of comments - made it through exec/kudu* http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 93: RETURN_IF_ERROR(build_expr->Init(*(intermediate_row_desc_.get()), state)); build_expr is leaked if this returns. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: This all works out a lot cleaner - nice! Line 124: RETURN_IF_ERROR(AggFn::Create(analytic_node.analytic_functions[i], I think there's a potential resource leak if the expr Init() function fails in certain ways. E.g. if it successfully got a lib cache entry, but failed later. We don't add it to 'analytic_fns_' so I don't think it can be closed. Line 166: fn_pool_.reset(new MemPool(expr_mem_tracker()));; extra semicolon http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 181: // TODO: codegen table sink I'm not sure what this means. Is it something to be fixed in this patch? Otherwise could be just remove the TODO and file a JIRA if it's something that needs to be tracked. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 126: /// Not used in DataStreamSender and PHJBuilder. Or NljBuilder. Maybe easier to just say that they're not used in all subclasses. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: PS14, Line 451: inline Is the inline needed here? Line 465: void ExecNode::FreeLocalAllocations() { This helper function doesn't seem to help that much - consider inlining it. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 261: /// Conjuncts and their evaluators in this node. Maybe mention who owns the pointers? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: PS14, Line 86: expr_evaluator->root() Is the plan to eventually pass around the Expr itself and avoid going via the evaluator, or will we continue using this pattern? I don't feel that strongly either way, just curious. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 120: FilterContext() { } nit: can just remove this and rely on the default constructor http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 115: /// computing the size of a scratch row. Maybe mention what 'pool' is for Line 415: Status Init(ObjectPool* pool, RuntimeState* state, int num_build_tuples); What is pool used for? Line 513: MemPool* mem_pool_; Mention that it's not owned? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: PS14, Line 294: tab? Line 733: vector& output_expr_evaluators) Can't this be a const& still? The vector isn't being modified is it? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: Line 26: #include "exprs/scalar-expr.h" Maybe not needed? http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 147: // TODO: Move this to Prepare() Is this TODO in this patch, or does it depend on another change? Line 165: new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ false)); This is a little inconsistent - above on line 121 we create the RowDescriptor on the stack but here it's heap-allocated. I think the stack allocation pattern is probably simpler. PS14, Line 186: conjunct_evaluators_map_[entry.first] [] has a side-effect of creating the element - maybe use find() instead for the DCHECK. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 146: const std::vector min_max_conjunct_evaluators() const { Shouldn't this return a const&, rather than copying the vector? Line 393: /// the per-scanner ScannerContext.. Add something like "These correspond to the exprs in 'filter_exprs_'" http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 362: MemPool filter_mem_pool(expr_mem_tracker()); This is a little subtle - could do with a comment. Line 388: for (auto& ctx: filter_ctxs) { Maybe use a goto and an exit block at the bottom of the function to enforce that the cleanup happens on both paths. http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/kudu-scanner.h File be/src/exec/kudu-scanner.h:
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#14). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/5483/13/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS13, Line 1104: const vector& filter_ctxs Should avoid using FilterContext for codegen as it's runtime state. http://gerrit.cloudera.org:8080/#/c/5483/13/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: PS13, Line 366: expr_mem_pool() not thread safe to use expr_mem_pool(). -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/13/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 126: ScalarExprEvaluator::Close(conjunct_evaluators_, state_); Missing expr_mem_pool_->FreeAll(); here. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 156: friend class AndPredicate; > The goal is to hide both Init() and Get*Val() from irrelevant classes to en Removed AggregationNode and PartitionedAggregationNode from friend class list in PS13. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#13). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 285: boost::scoped_ptr expr_mem_tracker_; > if this is per querystate, the mempool would need to be thread-safe, not su Yes, there should be one MemPool per fragment instance. TODO removed. Keep the name the same as it's clear enough already. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#12). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#11). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 10: (57 comments) http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 66: output_tuple_desc_(descs.GetTupleDescriptor(output_tuple_id_)), > initialize in .h where possible While I appreciate the suggestion, I prefer not to add more unrelated clean-up to this already gigantic patch. Line 88: desc->type() == grouping_exprs_[i]->type()); > i would intuitively think that the build row is the input row. renamed to intermediate_row_desc_. Line 97: > it feels dicey to pass a reference to an automatic variable, Init() might h Done Line 232: int count = 0; > isn't that the same as const X**? Not really :-). const X** is pointer to pointer to constant X. X* const * is pointer to constant pointer to X. The latter fits the bill of vector::data(). http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 122: /// Output expressions to convert row batches onto output values. > It's confusing that these aren't used in all subclasses of DataSink (e.g. D Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/hash-table.h File be/src/exec/hash-table.h: PS9, Line 485: ? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS9, Line 69: > Does this mean that multiple scanner threads share a single MemPool? That d Nice catch. I hit similar bugs in HDFS scanners before. Those scanners also have a per-scanner MemPool. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h File be/src/exec/sort-node.h: PS9, Line 68: :vector materialize_tuple_ doesn't seem to be defined. I think we always need these Done Line 69: > See my comment about the name in sorter.h. Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h File be/src/exec/subplan-node.h: Line 65: /// tree rooted at 'node' and does any initialization that is required as a result of > * and do any initialization that is required as a result of setting the sub Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS9, Line 78: output_tuple_exprs_; > output_tuple_exprs_? Usually when we have exprs materializing tuples it's n Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: Line 96: MemPool* mem_pool, AggFnEvaluator** result) { > move input params to front Done Line 161: Status AggFnEvaluator::Open( > why not const&? Done Line 514: > single line, here and subsequent functions, where possible Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 80: /// and caches all constant input arguments. > regarding the caching: is there already a todo to move that into the expr s TODO added. PS9, Line 94: o, it > Maybe this should be called ShallowClone() to make it more explicit. Withou Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc File be/src/exprs/agg-fn.cc: Line 68: } > scalarexpr does that in create. do the same for this class? Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 87: for (int i = 0; i < num_children; ++i) { > we don't use 'const ..' elsewhere for automatic variables. Done Line 88: *child_node_idx += 1; > formatting Done Line 92: return Status::OK(); > what is this referring to? Stale comment. Removed. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 82: > this is really CreateChildren, not CreateTree. Moved in/out params to end. As for the naming, this function actually builds out the entire expression tree rooted at 'root'. CreateChildren() doesn't seem to fit. May be I misunderstood your comment. Line 134: /// Called recursively to create children expr trees for sub-expressions. > misleading: the root of the created tree is not 'parent' (the root is insta I refactored this function and CreateTree() a bit so it's easier to follow now. Line 135: /// > "for subexpressions" Done Line 140: /// root: root of the new tree. Created and initialized by the caller. > let's call this root_idx, makes it easier to follow It's called child_node_idx in the new patch. Line 146: /// !status.ok() if tree is inconsistent or corrupt > while we're cleaning up, move in/out params to end Done http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 69: > initialize vars in .h where practical Done Line 82: (*evaluator)->fn_ctxs_ptr_ = (*evaluator)->fn_ctxs_.data(); > why do you need that (for a newly cre
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#10). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-tabl
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 66: singleton_intermediate_tuple_(nullptr), initialize in .h where possible Line 88: RowDescriptor build_row_desc(intermediate_tuple_desc_, false); i would intuitively think that the build row is the input row. Line 97: RETURN_IF_ERROR(build_expr->Init(state, build_row_desc)); it feels dicey to pass a reference to an automatic variable, Init() might hold on to it (and there is no reason it shouldn't - descriptors should last for the lifetime of the query). this row descriptor should be member var and created in the c'tor. Line 232: ScalarExprEvaluator* const* evaluators = &conjunct_evaluators_[0]; isn't that the same as const X**? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 156: friend class AggFnEvaluator; why are these all friends? it looks like more of the "protected" functions need to be public. (why is Init() protected?) -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 9: (10 comments) Still trying to come to grips with it all but did an initial pass over the query execution changes. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 122: /// Output expressions to convert row batches onto output values. It's confusing that these aren't used in all subclasses of DataSink (e.g. DataStreamSender, JoinBuildSink). Clarify that not all sinks use these? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS9, Line 69: expr_mem_pool Does this mean that multiple scanner threads share a single MemPool? That doesn't seem safe. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h File be/src/exec/sort-node.h: PS9, Line 68: materialize_tuple_ materialize_tuple_ doesn't seem to be defined. I think we always need these exprs anyway, right? Line 69: std::vector slot_materialize_exprs_; See my comment about the name in sorter.h. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h File be/src/exec/subplan-node.h: Line 65: /// tree rooted at 'node'. * and do any initialization that is required as a result of setting the subplan. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS9, Line 78: slot_materialize_exprs_ output_tuple_exprs_? Usually when we have exprs materializing tuples it's named after the tuple it's materializing. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: PS9, Line 94: Clone Maybe this should be called ShallowClone() to make it more explicit. Without reading the function comment, I would assume it did a deep clone. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS9, Line 182: slot_materialize_exprs_ Maybe something like 'sort_tuple_exprs_'? I think the important thing is that there is one expr per slot in the sort tuple. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: Line 49: ScalarExprEvaluator::Close(key_expr_evaluators_lhs_, state); opened_ = false? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: PS9, Line 141: key_expr_evaluators_lhs_ ordering_expr_evaluators_lhs_ to match the expr name? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 9: (45 comments) first half http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 285: boost::scoped_ptr expr_mem_tracker_; > There will be one per PlanNode in the future. Currently, it still counts to if this is per querystate, the mempool would need to be thread-safe, not sure that's worth it (and if this is for evaluators, why have it per plannode?). also, since we now clearly distinguish exprs and their evaluators, and this is specifically for evaluators, let's express that in the variable name. expr_eval_mem_pool_? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: Line 96: const AggFn& agg_fn, AggFnEvaluator** result) { move input params to front Line 161: Status AggFnEvaluator::Open(vector& evaluators, RuntimeState* state) { why not const&? Line 514: for (int i = 0; i < evaluators.size(); ++i) { single line, here and subsequent functions, where possible http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 80: /// and caches all constant input arguments. regarding the caching: is there already a todo to move that into the expr setup (as opposed to evaluator setup)? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc File be/src/exprs/agg-fn.cc: Line 68: input_expr->AssignFnCtxIdx(&fn_ctx_idx); scalarexpr does that in create. do the same for this class? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 87: const int num_children = nodes[*node_idx].num_children; we don't use 'const ..' elsewhere for automatic variables. Line 88: const bool isAggFn = root->IsAggFn(); formatting Line 92: // Reset the function context index for each input expression. what is this referring to? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 81: /// status on failure. > 'fn_ctx_idx_ptr' removed. It doesn't take Expr** as 'root' is expected to b should this be a private function and the two subclasses be friends (to call this)? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 82: static Status CreateTree(ObjectPool* pool, const TExpr& texpr, Expr* root); this is really CreateChildren, not CreateTree. move in/out params to end Line 134: /// Creates an expr tree with root 'parent' via depth-first traversal. misleading: the root of the created tree is not 'parent' (the root is installed as a new child into 'parent') Line 135: /// Called recursively to create expr trees of sub-expressions. "for subexpressions" Line 140: /// node_idx: index in 'nodes' of the root of the next child TExprNode tree let's call this root_idx, makes it easier to follow Line 146: static Status CreateTreeFromThrift(ObjectPool* pool, while we're cleaning up, move in/out params to end http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 1: // Licensed to the Apache Software Foundation (ASF) under one could you flag what's new in this file that i should look at? Line 69: inited_(false), initialize vars in .h where practical Line 82: (*evaluator)->fn_ctxs_.clear(); why do you need that (for a newly created object)? Line 86: for (FunctionContext* fn_ctx : (*evaluator)->fn_ctxs_) DCHECK(fn_ctx != nullptr); do that in CreateFnCtxs? Line 133: if (opened_) return Status::OK(); what's the need for guarding against multiple open calls? (how would that happen?) Line 138: is_clone_? FunctionContext::THREAD_LOCAL : FunctionContext::FRAGMENT_LOCAL; is this really instance_local? if so, leave todo in the right places (presumably at least udf.h) Line 155: delete fn_ctxs_[i]; instead put a unique_ptr into fn_ctxs? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 72: std::vector* evaluators) WARN_UNUSED_RESULT; > Yes, ideally, that should be done in Expr::Init(). This is being tracked by todo? Line 111: /// should only be called after Open() has been called on this expr. Returns an error if > There is subtlety in this. "expr" is potentially a sub-expression of root_ but why not just call GetValue(nullptr) on the subexpr? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 55: /// rooted at a node is defind by [fn_ctx_idx_start_, fn_ctx_idx_end_). defined Line 65: static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool, move in/out params to back Line 81:
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 62: static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool, > There are places in which the evaluator will have shorter life span than th +1 putting things in the RuntimeState object pool with the wrong lifetime has been a source of some nasty bugs. We should be really careful not to just blindly move things to ObjectPools. Especially if things are re-created inside subplans, per-partition, etc. I can think of at least 3-4 different cases where we leaked a lot of memory as a result of this kind of mistake. It's hard to test for since you generally need a very long-running query. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#9). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-table
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 8: (56 comments) http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/aggregation-node.h File be/src/exec/aggregation-node.h: Line 84: std::vector grouping_exprs_; > where are the evaluators for these? In old-hash-table.cc. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 181: /// ProcessChildBatch(). > rename variables ending in _ctx if they're referring to now-obsolete class Done http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 49: /// flag gets set. > leave todo to outline very briefly what's going to happen with PlanNode. Done Line 285: boost::scoped_ptr expr_mem_tracker_; > is there a need to have one per exec node? how about one per fragment insta There will be one per PlanNode in the future. Currently, it still counts towards the ExecNode's MemTracker. TODO added. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 54: /// AggFnEvaluator contains runtime state and implements wrapper functions which convert > "runtime state" Done Line 55: /// the input TupleRow into AnyVal format expected by UDAF functions defined in AggFn. > stick to singular or plural Done Line 70: const AggFn& agg_fn, AggFnEvaluator** evaluator) WARN_UNUSED_RESULT; > why not make agg_fn a const&? Done Line 80: /// and caches all constant input arguments. > what does agg_fn_ctx mean? The 'agg_fn_ctx' part of the comment is removed. Line 88: /// Avoid the overhead of re-initializing an evaluator (e.g. calling GetConstVal() > how big exactly is that overhead? Depends on how number and size of the input expressions. Line 94: void Clone( > do we really need this? It's used in PAGG::Partition logic for resource isolation. Line 117: /// called either to drive the UDA's Update() or Merge() function, depending on whether > function, depending Done Line 136: /// merge phases. > comment on the types of src and dst. also, those aren't great names, they s Done Line 159: > function comment missing Done Line 160: /// Free local allocations made in UDA functions and input arguments' evaluators. > why not a const&? Done Line 205: inline const ColumnType& intermediate_type() const; > long lines Done http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn.h File be/src/exprs/agg-fn.h: Line 21: > .. Done Line 28: class MemPool; > the comment about "static states"/compile-time information applies to all e Done Line 97: /// of the output value. Returns error status on failure. > can intermediate and output slodesc be null? No. Changed to const ref instead. Line 101: WARN_UNUSED_RESULT; > who owns agg_fn? It lives in state->obj_pool(). Comments added. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/case-expr.h File be/src/exprs/case-expr.h: Line 38: protected: > this is needed? Yes so we can call CaseExpr() constructor. Line 41: > it would be nice to keep the fnctx index out of every class. is it possible Done http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 19: #ifndef IMPALA_EXPRS_EXPR_H > move directly above class, not sure why the class comment was up here to st Done Line 34: using namespace impala_udf; > explain function context index concept here The latest patch avoids exposing function context index outside scalar-expr-* so the comments will be kept in scalar-expr-* instead. Line 34: using namespace impala_udf; > also, it would be good to have an explanation of the general class hierarch Done Line 81: /// status on failure. > why does this not take an Expr**? 'fn_ctx_idx_ptr' removed. It doesn't take Expr** as 'root' is expected to be created by the caller (constructors of ScalarExpr or AggFn). Line 113: /// some ScalarExpr such as ScalarFnCall. NULL if it's not used. > initialize all pointers to nullptr here, that way you can't forget it in th Done Line 134: /// Creates an expr tree with root 'parent' via depth-first traversal. > doesn't look like a useful function (if you see it being called you probabl Removed. Line 137: /// pool: Object pool in which Expr created from nodes are stored > "to create expr trees for subexpressions" Done http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 40: /// reference to the root of a ScalarExpr tree, runtime state (e.g. FunctionContexts) > runtime state Done Line 62: /// FunctionContexts needed during evaluation. Allocations from this evaluator will > runtimestate already has an objectpool There are places in which the evaluator will have shorter life span than the e
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#8). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-table
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 7: (25 comments) some more comments. i haven't looked at the exec nodes in detail yet. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/aggregation-node.h File be/src/exec/aggregation-node.h: Line 84: std::vector grouping_exprs_; where are the evaluators for these? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: Line 181: bool PrevRowCompare(ScalarExprEvaluator* pred_ctx); rename variables ending in _ctx if they're referring to now-obsolete class names. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 54: /// AggFnEvaluator contains runtime states and implements wrapper functions which "runtime state" Line 55: /// converts the input TupleRow into AnyVal format expected by UDAF functions defined stick to singular or plural apply to this class comment and all other comments. Line 70: const AggFn* agg_fn, AggFnEvaluator** evaluator) WARN_UNUSED_RESULT; why not make agg_fn a const&? Line 80: /// and caches constant input arguments in 'agg_fn_ctx'. what does agg_fn_ctx mean? is the caching behavior relevant for the caller? Line 88: /// Avoid the overhead of re-initializing an evaluator (e.g. calling GetConstVal() how big exactly is that overhead? Line 94: void Clone( do we really need this? Line 117: /// called either to drive the UDA's Update() or Merge() function depending on whether function, depending Line 136: void Finalize(Tuple* src, Tuple* dst); comment on the types of src and dst. also, those aren't great names, they sound like you're referring to Add() Line 159: void FreeLocalAllocations(); function comment missing Line 160: static void FreeLocalAllocations(std::vector& evaluators); why not a const&? Line 205: /// The interpreted path for the UDA's Update() function. It sets up the arguments to call long lines http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 34: /// Please see the headers of ScalarExpr and AggFn for details. > explain function context index concept here also, it would be good to have an explanation of the general class hierarchy somewhere. this description is also unclear on what differentiates the classes in this hierarchy. exprs and their subclasses contain compile-time information and the code to evaluate the expr (represented by the specific class). evaluators contain the general runtime state needed for the actual evaluation, but they don't need to be subclassed, because the expr-specific code sits in the expr subclasses. "embodies" doesn't really mean anything, that could also be applied to the analyzer (java) classes. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 40: /// reference to the root of a ScalarExpr tree, runtime states (e.g. FunctionContexts) runtime state apply elsewhere Line 62: static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool, runtimestate already has an objectpool Line 63: const ScalarExpr* expr, ScalarExprEvaluator** evaluator) WARN_UNUSED_RESULT; why not a const&? Line 72: /// also the location in which constant arguments to functions are computed. Does not the location? do you mean "this also computes constant function arguments"? why here and not in expr::init()? it sounds like static setup. Line 73: /// need to be called on clones. Idempotent (this allows exprs to be opened multiple this clone business is tricky, would be nice to remove it. Line 111: Status GetConstValue( can't this be subsumed by GetValue(nullptr)? Line 112: RuntimeState* state, const ScalarExpr* expr, AnyVal** const_val) WARN_UNUSED_RESULT; why pass in expr? how is that different from the expr of Create()? Line 179: /// and owned by this ScalarExprEvaluator. shouldn't they go into an objectpool? Line 208: int output_scale_; comment why this is here and not in expr Line 210: /// Call Create() instead. reconsider whether this comment is useful http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 113: std::vector* exprs); could you also add warn_unused_result to all other .h files that you're touching? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Ti
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 7: (31 comments) initial comments http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 49: /// flag gets set. leave todo to outline very briefly what's going to happen with PlanNode. Line 285: /// Created in Prepare(). is there a need to have one per exec node? how about one per fragment instance? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn.h File be/src/exprs/agg-fn.h: Line 21: /// An aggregate function generates an output over a set of tuple rows.. .. Line 28: /// AggFn contains static states about an aggregate function such as the aggregation the comment about "static states"/compile-time information applies to all exprs and should move in the class comment for Expr (adapted, of course). Line 97: /// 'output_slot_desc' is the slot descriptor of the output value. Returns error can intermediate and output slodesc be null? Line 101: const SlotDescriptor* output_slot_desc, const TExpr& texpr, AggFn** agg_fn); who owns agg_fn? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/case-expr.h File be/src/exprs/case-expr.h: Line 38: friend class ScalarExpr; this is needed? Line 41: CaseExpr(const TExprNode& node, int fn_context_index); it would be nice to keep the fnctx index out of every class. is it possible to have only generic logic in Expr deal with this? (could createtree set this in a final pass over the tree?) http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 19: /// --- Expr overview move directly above class, not sure why the class comment was up here to start with Line 34: /// Please see the headers of ScalarExpr and AggFn for details. explain function context index concept here Line 81: ObjectPool* pool, const TExpr& texpr, Expr* root, int* fn_ctx_idx_ptr); why does this not take an Expr**? unclear what the significance of the idx_ptr is at this point. Line 113: LibCacheEntry* cache_entry_; initialize all pointers to nullptr here, that way you can't forget it in the c'tor. Line 134: void AddChild(ScalarExpr* expr) { children_.push_back(expr); } doesn't look like a useful function (if you see it being called you probably want to look at the implementation to figure out if it does anything beyond the obvious). Line 137: /// Called recursively to create expr tree of sub-expression. "to create expr trees for subexpressions" http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 28: /// ScalarExpr implements compute functions which given a row, performs the computation stick to singular or plural in this and the following sentence. which, Line 40: /// in FunctionContext in ScalarExprEvaluator which is the interface to evaluate a ..evaluator, Line 44: /// initializes itself and frees its resources by calling OpenEvaluator() and move comments about evaluator to evaluator class comment. Line 50: /// GetCodegendComputeFn() to either generate custom IR compute functions using IRBuilder ..builder, which inlines ...functions, or... Line 107: static Status Create(ObjectPool* pool, RuntimeState* state, why do we need this and Expr::CreateTree? Line 155: static void Close(std::vector& exprs); move to Expr? Line 223: /// Initializes the static states of all nodes in the expr tree. what is the non-static state in the expr tree? Line 280: /// evaluator and initialized by calling ScalarExpr::OpenEvaluator(). would be good to have that in the Expr class comment, maybe with some more context. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: Line 379: const vector& thrift_output_exprs) { unused parameter http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 133: /// per-row partition values for shuffling exchange; also leave todo that the exprs go into a separate class. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/data-stream-test.cc File be/src/runtime/data-stream-test.cc: Line 198: scoped_ptr mem_pool_; more mem pools? Line 207: vector ordering_exprs_; ? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 288: /// TODO: Move these into the new query-wide state, indexed by partition id. does this todo still apply, given that the descriptor tbl itself moves into the querystate? Line 291: std::vector partition_key_value_evaluators_; why is there an evaluator in here, instead of the expr? this is shared by all instance executors. http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/sorter.h File be/src/runt
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#7). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change separates Expr and ExprContext. This is a preparatory step for factoring out static data (e.g. Exprs) of plan fragments to be shared by multiple plan fragment instances. This change includes the followings: 1. Include aggregate functions (AggFn) as Expr. This separates AggFn from its evaluator. AggFn is similar to existing Expr as both are represented as a tree of Expr nodes but it doesn't really make sense to call Get*Val() on AggFn. This change restructures the class hierarchy: much of the existing Expr class is now renamed to ScalarExpr. Expr is the parent class of both AggFn and ScalarExpr. Expr is defined to be a tree with root of either AggFn or ScalarExpr and all descendants being ScalarExpr. 2. ExprContext is renamed to ScalarExprEvaluator which is the interface for evaluating ScalarExpr; AggFnEvaluator is the interface for evaluating AggFn. Multiple evaluators can be instantiated per Expr. Expr contains static states of an expression while evaluator contains runtime states needed for execution (i.e. evaluating the expression). 3. Update all exec nodes to instantiate Expr and their evaluators separately. ExecNode::Init() will be responsible for creating all the Exprs in an ExecNode while their evaluators are created in ExecNode::Prepare(). Certain evaluators are also moved into the data structures which actually utilize them. For instance, HashTableCtx now owns the build and probe expression evaluators. Similarly, TupleRowComparator and Sorter also own the evaluators. ExecNode which utilizes these data structures are only responsible for creating the expressions used by these data structures. 4. All codegen functions take Exprs instead of evaluators. 5. The assignment of index into the FunctionContext vector is now done during the construction of ScalarExpr. Evaluators are only responsible for allocating and initializing the FunctionContexts. 6. Open(), Prepare() are now removed from Expr classes. The interface for creating any Expr is via either ScalarExpr::Create() or AggFn::Create() which will convert a thrift Expr into an initialized Expr object. Similarly, Create() interface is used for creating evaluators from an intialized Expr object. This separation allows the future change to introduce PlanNode data structures. The plan is to move all ExecNode::Init() logic to PlanNode and call them once per plan fragment. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/init.cc M be/src/exec/CMakeLists.txt M be/src/exec/aggregation-node-ir.cc M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/data-source-scan-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/hdfs-text-table
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: I will post a new patch once the draft has gotten further along. Please feel free to wait till then to review. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: Is this superseded by the other review? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 159: virtual BooleanVal GetBooleanVal(ExprContext* context, const TupleRow*); > i wrote that comment before i reacquainted myself with the approach to spec I don't see it conflicting with the abstractions: Expr stores static data which can be _shared_ among fragment instances while the latter two have to be created per fragment instance for execution. These interfaces don't modify the states of an Expr. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 159: virtual BooleanVal GetBooleanVal(ExprContext* context, const TupleRow*); > Not sure I understand this comment. These need to be overridden by subclass i wrote that comment before i reacquainted myself with the approach to specialization in expr/exprctx/fnctx (all specialization is in expr). this conflicts with the abstractions we had in mind (expr for static data, exprevaluator/fnctx for execution). i think we should have another, hopefully shorter, discussion about what we ultimately want these classes to look like. we don't want to make the changes are part of this patch, but it's good to know where we're headed. -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 159: virtual BooleanVal GetBooleanVal(ExprContext* context, const TupleRow*); > these don't make sense here anymore, execution-related functions should be Not sure I understand this comment. These need to be overridden by subclasses of Exprs, no ? Not sure if it makes sense to put both ScalarFnCall::GetBooleanVal() or Literal::GetBooleanVal() in ExprCtx. Can you please elaborate ? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: (38 comments) http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/case-expr.cc File be/src/exprs/case-expr.cc: Line 76: fn_ctx->Free(reinterpret_cast(case_state)); is this really case-specific? doesn't look like it. (is there a need for an expr-specific 'close'? or do we simply free thread-local state?) http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr-context.cc File be/src/exprs/expr-context.cc: Line 73: RETURN_IF_ERROR(root_->Init(state, row_desc)); init should happen independently of the exprctxs (and certainly not once per exprctx) http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr-context.h File be/src/exprs/expr-context.h: Line 41: /// An ExprContext contains thread private states for the evaluation of an Expr Tree. 'thread private' sounds mysterious/vague. isn't this 'the state necessary for the evaluation of an expr tree' (ie, all of it)? Line 52: /// Allocates all FunctionContexts for this ExprContext. Also initialize the expr tree what does 'expr tree' refer to? the tree of expr objects? (if so, why would that be initialized here?) Line 58: /// Initializes the FunctionContexts in the ExprContext on all nodes in the Expr tree. is there a need to separate prepare() and open()? Line 66: /// originals but have their own MemPool and thread-local state. Clone() should be used this seems convoluted, and they shouldn't all need a separate mempool. unless you think this cloning stuff is a good idea, leave todo in class comment. Line 77: /// call is a no-op. The new ExprContexts are created in state->obj_pool(). do we really need this? it's bizarre. Line 124: void EvaluateWithoutRow(TColumnValue* col_val); awkward name. getconstantvalue? Line 200: /// Pool backing fn_contexts_. Counts against the runtime state's UDF mem tracker. is there any need to have a separate pool for each exprctx? if not, leave todo to remove. also leave todos for other specific clean-up items. http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 112: Expr::~Expr() { > I think we've been trying to move away from destructors releasing resources related note: we should figure out what differentiates prepare from open and whether we need both (and if the counterpart should be close or releaseresources) Line 119: void Expr::CloseContext(RuntimeState* state, ExprContext* context, this shouldn't be here Line 131: return Status("Expression tree only partially reconstructed. Not all thrift " \ why does this make sense? http://gerrit.cloudera.org:8080/#/c/5483/3/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 41: /// The Expr superclass defines a virtual Get*Val() compute function for each possible a lot of this is outdated. http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 60: /// Only short-circuited operators (e.g. &&, ||) and other special functions like duplicated paragraph Line 79: /// Expr users should use the static Get*Val() wrapper functions to evaluate exprs, this still reflects the confusion between expr and expctx. please make a complete pass and clean this up. rewrite where needed, as opposed to just adding new text. this should read as if it were written from scratch, not pieced together over the years. Line 84: /// --- Relationship with ExprContext and FunctionContext this makes sense to do at the very top: explain what the class's abstraction is, and how it relates to others. Line 87: /// a query is represented as a tree of Expr whose states are static after initialization whose states are static: hard to follow, unless i already know what it means. exprs represent static information, such as types, etc. Line 88: /// by their Prepare() functions. init Line 90: /// ExprContext contains thread private states (e.g. evaluation result, FunctionContext). "thread private states": exprctx encapsulates the execution state, which is why it can't be shared between executing threads. Line 93: /// via the field 'root_'. do you gain anything here by referencing private members vars? Line 100: /// Expressions in exec nodes are expected to be initialized once per fragment and the "exprs in ..." (clearer to reference the class name directly) same for exec nodes, etc. Line 128: class BasicBlock; remove indentation, like below Line 159: virtual BooleanVal GetBooleanVal(ExprContext* context, const TupleRow*); these don't make sense here anymore, execution-related functions should be in exprctx Line 171: inline void AddChild(Expr* expr) { children_.push_back(expr); } why do these need to be constructed manually, as opposed to from a texpr/texprnode? Line 180: inline int fn_context_index() const { return fn_context_ind
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 6: (1 comment) One final issue. I'm assuming that we're going to rename OpenContext()/CloseContext() as part of the larger rename of ExprContext http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 112: Expr::~Expr() { I think we've been trying to move away from destructors releasing resources. Do you think it's reasonable to add an Expr::Close() function for this logic? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#6). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change cleans up some entangled logic between Expr and ExprContext's initialization. In general, we store static states of an expression in Expr and thread-private / evaluation related states in ExprContext. This change cleans up the followings: 1. FunctionContext should be managed by ExprContext. They reside in a vector inside ExprContext. When creating the expression tree, each sub-expression requiring a FunctionContext will be assigned an index into the vector. The index is stored in the Expr object. ExprContext::Prepare() will allocate and prepare the FunctionContext for all sub-expressions in the expression tree. 2. Move the field 'output_scale_' to ExprContext. This field is needed to format the output in the table writer correctly for the case of RoundUpTo(). However, this field can only be derived much later in the initialization so in the spirit of not modifying the Expr after it has been initialized, let's move this field out to ExprContext until IMPALA-4743 and IMPALA-4720 are fixed. 3. Rename Expr::Prepare(), Expr::Open() and ExprClose() to Expr::Init(), Expr::OpenContext() and Expr::CloseContext() to better describe their purposes. Also move some functions which operate solely on ExprContext to ExprContext class. 4. Expr::CreateExprTree() and friends return Expr instead of ExprContext. This clean up is a step towards sharing expression states among plan fragment instances. This change also fixes a bug in Expr::GetFnContextError(). The old code only returned the error in the top level FunctionContext, ignoring the error set by its descendants. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exec/aggregation-node.cc M be/src/exec/analytic-eval-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table-test.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/old-hash-table-test.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/sort-exec-exprs.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exec/union-node.cc M be/src/exec/unnest-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-context.cc M be/src/exprs/expr-context.h M be/src/exprs/expr-test.cc M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/is-not-empty-predicate.cc M be/src/exprs/is-not-empty-predicate.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/tuple-is-null-predicate.cc M be/src/exprs/tuple-is-null-predicate.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/row-batch.h M be/src/service/fe-support.cc M be/src/service/impalad-main.cc M common/thrift/ImpalaInternalService.thrift 50 files changed, 726 insertions(+), 618 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/5483/6 -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 4: (12 comments) http://gerrit.cloudera.org:8080/#/c/5483/4//COMMIT_MSG Commit Message: PS4, Line 31: soly > solely Done PS4, Line 39: ignorning > ignoring Done http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exec/hash-table-test.cc File be/src/exec/hash-table-test.cc: Line 74: ASSERT_OK(ExprContext::Prepare(build_expr_ctxs_, NULL, desc, &tracker_)) > Missing semi-colon. Should have compiled the tests too. Done http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.cc File be/src/exprs/expr-context.cc: PS4, Line 129: RegisterFnContexts > CreateFnContexts()? I don't think there's any real registration going on he Done http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.h File be/src/exprs/expr-context.h: Line 86: /// Use this interface except for creating ExprContext on a stack. > It looks like we only do this in two places in ExprTest. Why not change tho Done Line 102: /// Convenience function for closing multiple expr trees. > Are we closing the expr trees or the contexts? Same for the comments above Done http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 412: string Expr::DebugString(const vector& ctxs) { > This should probably be moved to ExprContext too, right? Done http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.h File be/src/exprs/expr.h: PS4, Line 100: its : /// states > What is "its states" expressions' static states Line 110: /// TODO: > Remove TODO's while we're here? These seem too non-specific to be useful. Done PS4, Line 354: on the expr tree. > It's really opening the ExprContext and FunctionContexts instead of the exp It's about initializing ExprContext and FunctionContexts for all nodes in the ExprTree. Comment rephrased. Line 358: /// Subclasses overriding this function should call Expr::Close(). > This really closes the ExprContext instead of the Expr. Renamed to Expr::CloseContext(). PS4, Line 387: FunctionContext > FunctionContexts Done -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#5). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change cleans up some entangled logic between Expr and ExprContext's initialization. In general, we stored static states of an expression in Expr and thread-private / evaluation related states in ExprContext. This change cleans up the followings: 1. FunctionContext should be managed by ExprContext. They reside in a vector inside ExprContext. When creating the expression tree, each sub-expression requiring a FunctionContext will be assigned an index into the vector. The index is stoed in the Expr object. ExprContext::Prepare() will allocate and prepare the FunctionContext for all sub-expressions in the expression tree. 2. Move the field 'output_scale_' to ExprContext. This field is needed to format the output in the table writer correctly for the case of RoundUpTo(). However, this field can only be derived much later in the initialization so in the spirit of not modifying the Expr after it has been initialized, let's move this field out to ExprContext until IMPALA-4743 and IMPALA-4720 are fixed. 3. Rename Expr::Prepare(), Expr::Open() and Expr::Close() to ExprContext class instead as they really operate solely on ExprContext but not Expr. 4. Expr::CreateExprTree() and friends return Expr instead of ExprContext. This clean up is a step towards sharing expression states among plan fragment instances. This change also fixes a bug in Expr::GetFnContextError(). The old code only returned the error in the top level FunctionContext, ignoring the error set by its descendants. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exec/aggregation-node.cc M be/src/exec/analytic-eval-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table-test.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/old-hash-table-test.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/sort-exec-exprs.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exec/union-node.cc M be/src/exec/unnest-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-context.cc M be/src/exprs/expr-context.h M be/src/exprs/expr-test.cc M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/is-not-empty-predicate.cc M be/src/exprs/is-not-empty-predicate.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/tuple-is-null-predicate.cc M be/src/exprs/tuple-is-null-predicate.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/row-batch.h M be/src/service/fe-support.cc M be/src/service/impalad-main.cc M common/thrift/ImpalaInternalService.thrift 50 files changed, 726 insertions(+), 617 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/5483/5 -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Michael Ho has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exec/hash-table-test.cc File be/src/exec/hash-table-test.cc: Line 74: ASSERT_OK(ExprContext::Prepare(build_expr_ctxs_, NULL, desc, &tracker_)) Missing semi-colon. Should have compiled the tests too. http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 376: Status Expr::Prepare(RuntimeState* state, const RowDescriptor& row_desc) { > It's kind of weird that the scope of what is initialised is different for E Good point. I like OpenContext() and CloseContext(). -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 4: (12 comments) I had some high level comments. Still trying to understand the flow of execution with the various Open()/Close()/Clone() etc calls. http://gerrit.cloudera.org:8080/#/c/5483/4//COMMIT_MSG Commit Message: PS4, Line 31: soly solely PS4, Line 39: ignorning ignoring http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.cc File be/src/exprs/expr-context.cc: PS4, Line 129: RegisterFnContexts CreateFnContexts()? I don't think there's any real registration going on here. http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.h File be/src/exprs/expr-context.h: Line 86: /// Use this interface except for creating ExprContext on a stack. It looks like we only do this in two places in ExprTest. Why not change those to use Create() and hide the constructor? Line 102: /// Convenience function for closing multiple expr trees. Are we closing the expr trees or the contexts? Same for the comments above too. http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 376: Status Expr::Prepare(RuntimeState* state, const RowDescriptor& row_desc) { It's kind of weird that the scope of what is initialised is different for Expr::Prepare() vs Expr::Open() and Expr::Close(). Expr::Prepare() just initialises the Expr and doesn't do anything with ExprContext, but Expr::Open() and Expr::Close() initialize the ExprContext. We need something like this because there's Expr-subclass-specific logic in Open() and Close(). Maybe Open() and Close() should be OpenContext() and CloseContext() instead so the distinction is clearer? Line 412: string Expr::DebugString(const vector& ctxs) { This should probably be moved to ExprContext too, right? http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.h File be/src/exprs/expr.h: PS4, Line 100: its : /// states What is "its states" Line 110: /// TODO: Remove TODO's while we're here? These seem too non-specific to be useful. PS4, Line 354: on the expr tree. It's really opening the ExprContext and FunctionContexts instead of the expr tree, right? Line 358: /// Subclasses overriding this function should call Expr::Close(). This really closes the ExprContext instead of the Expr. This makes sense since we may need Expr-subclass-specific logic, but the comment should be clearer. PS4, Line 387: FunctionContext FunctionContexts -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5483 to look at the new patch set (#4). Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. IMPALA-4192: Disentangle Expr and ExprContext This change cleans up some entangled logic between Expr and ExprContext's initialization. In general, we stored static states of an expression in Expr and thread-private / evaluation related states in ExprContext. This change cleans up the followings: 1. FunctionContext should be managed by ExprContext. They reside in a vector inside ExprContext. When creating the expression tree, each sub-expression requiring a FunctionContext will be assigned an index into the vector. The index is stoed in the Expr object. ExprContext::Prepare() will allocate and prepare the FunctionContext for all sub-expressions in the expression tree. 2. Move the field 'output_scale_' to ExprContext. This field is needed to format the output in the table writer correctly for the case of RoundUpTo(). However, this field can only be derived much later in the initialization so in the spirit of not modifying the Expr after it has been initialized, let's move this field out to ExprContext until IMPALA-4743 and IMPALA-4720 are fixed. 3. Rename Expr::Prepare(), Expr::Open() and Expr::Close() to ExprContext class instead as they really operate soly on ExprContext but not Expr. This clean up is a step towards sharing expression states among plan fragment instances. This change also fixes a bug in Expr::GetFnContextError(). The old code only returned the error in the top level FunctionContext, ignorning the error set by its descendants. Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exec/aggregation-node.cc M be/src/exec/analytic-eval-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table-test.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/old-hash-table-test.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/sort-exec-exprs.cc M be/src/exec/union-node.cc M be/src/exec/unnest-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-context.cc M be/src/exprs/expr-context.h M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/is-not-empty-predicate.cc M be/src/exprs/is-not-empty-predicate.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/tuple-is-null-predicate.cc M be/src/exprs/tuple-is-null-predicate.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/row-batch.h M be/src/service/fe-support.cc M common/thrift/ImpalaInternalService.thrift 46 files changed, 650 insertions(+), 528 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/5483/4 -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong