[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 23: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators. Also, codegen
   functions will not return error status should the IR function fails the
   LLVM verification step.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Reviewed-on: http://gerrit.cloudera.org:8080/5483
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 23:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/754/

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 23: Code-Review+2

Rebase.

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Michael Ho (Code Review)
Hello Marcel Kornacker, Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#23).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators. Also, codegen
   functions will not return error status should the IR function fails the
   LLVM verification step.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 22: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 22:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/753/

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 22: Code-Review+2

Carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 22:

Fix clang-tidy errors:
1. Missing override in function declarations
2. Remove unused variables in HdfsPartitionDescriptor
3. Handles missing status check in ScalarFnCall::GetCodegendComputeFn().

Re-ran exhaustive build and clang-tidy to verify things are still sane.

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-17 Thread Michael Ho (Code Review)
Hello Marcel Kornacker, Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#22).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators. Also, codegen
   functions will not return error status should the IR function fails the
   LLVM verification step.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 21: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/743/

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 21:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/743/

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 21:

Rebase. Fix some bugs found during stress tests. Carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-16 Thread Michael Ho (Code Review)
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#21).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/e

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 19:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 304:   /// analytic_eval_fns_. When enough input rows have been consumed 
to produce the analytic
> long lines
Done


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 498: //   %eval_ptr = getelementptr inbounds 
%"class.impala::ScalarExprEvaluator"*, 
> trailing
Done


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hbase-scan-node.cc
File be/src/exec/hbase-scan-node.cc:

Line 250: if (EvalConjuncts(&conjunct_evals_[0], conjuncts_.size(), row)) {
> .data()
Done


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1238: if (ExecNode::EvalConjuncts(&evals[0], evals.size(), row)) {
> .data()
Done


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1410: //   %input0 = call { i8, double } 
@GetSlotRef(%"class.impala::ScalarExprEvaluator"* 
> trailing
Done


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 218:   for (int i = 0; i < evals.size(); ++i) {
> single line
Done


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/runtime/data-stream-test.cc
File be/src/runtime/data-stream-test.cc:

Line 307:   is_asc_, nulls_first_));
> tab
TupleRowComparator will keep a reference to is_asc_ so it needs to exist after 
calling the constructor.


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

Line 141:   std::vector nulls_first_;
> const& here as well?
Not feasible given the way it's set up in the constructor.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 20: Code-Review+2

Carry +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-08 Thread Michael Ho (Code Review)
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#20).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/e

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 19: Code-Review+2

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 229:   /// Analytic functions and their evaluators.
ownership?


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 498: //   %eval_ptr = getelementptr inbounds 
%"class.impala::ScalarExprEvaluator"*, 
trailing


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 290:   boost::scoped_ptr expr_mem_pool_;
> We were passing the expr_mem_tracker directly into the ExprContext so there
i guess it's going in the right direction :)


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hbase-scan-node.cc
File be/src/exec/hbase-scan-node.cc:

Line 250: if (EvalConjuncts(&conjunct_evals_[0], conjuncts_.size(), row)) {
.data()


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1238: if (ExecNode::EvalConjuncts(&evals[0], evals.size(), row)) {
.data()


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1410: //   %input0 = call { i8, double } 
@GetSlotRef(%"class.impala::ScalarExprEvaluator"* 
trailing


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 218:   for (int i = 0; i < evals.size(); ++i) {
single line


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/runtime/data-stream-test.cc
File be/src/runtime/data-stream-test.cc:

Line 307:   is_asc_, nulls_first_));
tab

can't you specify the initializer list inline, as in: .., {true}, {true})?

if that doesn't work, shouldn't vector{1, true} work?


http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

Line 141:   std::vector nulls_first_;
const& here as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5483/19/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 304:   /// evaluators_. When enough input rows have been consumed to 
produce the analytic
long lines


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

Line 51:   opened_ = false;
> The problem is that the ordering_expr_evaluators_rhs_ is cloned from orderi
Turns out that there is still a hidden dependency for codegen to happen first 
before opening ordering_expr_evaluators_*. This dependency will be removed once 
we move to the next step in which codegen is done before creating fragment 
instances.

Keeping this function as Open() for now but I managed to rewrite it to not need 
the opened_ flag.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-06 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#19).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/18/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1707: //
TODO: Add IR function dump.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 16:

(61 comments)

http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 79:   DCHECK_EQ(intermediate_tuple_desc_->slots().size(), 
output_tuple_desc_->slots().size());
> move to c'tor
Done


Line 124:   &agg_fn_evaluators_));
> nice, the separation is making the code a lot crisper.
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 92: new SlotRef(desc) : new SlotRef(desc, TYPE_BOOLEAN);
> Tab?
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 53: class AggregationNode : public ExecNode {
> it's sad that this still needs to be maintained. please apply my comments t
Done


Line 80:   /// FunctionContexts objects are stored in ObjectPool of 
RuntimeState.
> so what is agg_fn_pool_ for then?
Stale comment. It's the backing MemPools of 'agg_fn_evaluators_'.


Line 98:   boost::scoped_ptr intermediate_row_desc_;
> any reason this can't go in state->obj_pool()?
Done


Line 150:   /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the 
codegen'ed code.
> "for agg_fn_evaluators_.data()" is more idiomatic.
Done


Line 165:   /// slot_idx is the idx into aggregate_exprs_ (does not include 
grouping exprs).
> update comment
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 892:   AggFn::Close(analytic_fns_);
> having an expr::close() and having to call that here is a bit counterintuit
Expr::Close() will be called in PlanNode::Close() in the future. This is done 
here as a transitionary step.

Also, I don't see Expr or evaluators any different from other data structures 
in the exec node. You can extend the argument to any structures you have in an 
exec node and argue that you can add them to a container which automatically 
call Close() on them. I don't see the need to create a special case for Expr.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 210:   ScalarExpr* partition_by_eq_expr_;
> initialize inline
Done


Line 215:   ScalarExpr* order_by_eq_expr_;
> same
Done


Line 216:   ScalarExprEvaluator* order_by_eq_expr_evaluator_;
> same
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 100:   RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, 
thrift_output_exprs));
> why not just call create() directly here?
Not all data_sink have output_exprs. Some of them have partition_exprs only.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 123:   boost::scoped_ptr expr_mem_pool_;
> do we really need a separate mempool for that?
What other MemPool can we use then ?


Line 131:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
> move in/out param to back
Done


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:

Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) {
> why did we not need this before?
This was not needed before as the exchange node owns the evaluators before and 
was added to vector of evaluators to free periodically. The evaluator is not 
stored in the tuple row comparator.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 460: if (!EvalConjunct(evaluators[i], row)) return false;
> can't you just inline the call here?
Not sure I understand the comment. EvalConjunct() is already inlined.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 263:   /// object pool.
> why?
evaluators are execution related stuff so they are stored in the exec nodes' 
obj_pool (similar to other stuff which live in the obj_pool_).

exprs are fragment level entity so it makes sense for it to live in fragment 
level's obj_pool.


Line 290:   boost::scoped_ptr expr_mem_pool_;
> what mem pool did we use for that before? in general, we want to get away f
We were passing the expr_mem_tracker directly into the ExprContext so there is 
one MemPool per ExprContext before. Now all exprs in an exec node share the 
expr_mem_pool of the exec node.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 102:   Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool,
> state contains pool
Caller can pass in pool with shorter life span than the entire query (e.g. 
scanner thread).


Line 103:   const FilterContext& from);
> move const args to front
Done


Line 114

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-05 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#18).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 17: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 92:new SlotRef(desc) : new SlotRef(desc, TYPE_BOOLEAN));
Tab?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

Line 96:   for (const TExpr& texpr: texprs) {
I was even thinking it might be less error-prone if, on failure, this closed 
all of the exprs it created and cleared the vector. Removes an edge case from 
the caller.

Ok to ignore.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-03 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 17:

please fix the pervasive non-const vector& or switch to vector* and explain why 
it can get updated

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-03 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 17:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 79:   DCHECK_EQ(intermediate_tuple_desc_->slots().size(), 
output_tuple_desc_->slots().size());
move to c'tor


Line 124:   DCHECK_EQ(agg_fns_.size(), agg_fn_evaluators_.size());
nice, the separation is making the code a lot crisper.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 53: class AggregationNode : public ExecNode {
it's sad that this still needs to be maintained. please apply my comments to 
the partitioined-*.{cc,h} as well, where it makes sense


Line 80:   /// FunctionContexts objects are stored in ObjectPool of 
RuntimeState.
so what is agg_fn_pool_ for then?


Line 98:   boost::scoped_ptr intermediate_row_desc_;
any reason this can't go in state->obj_pool()?


Line 150:   /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the 
codegen'ed code.
"for agg_fn_evaluators_.data()" is more idiomatic.

are you going to do a rename of "evaluator[s]" suffixes to "evl[s]" or 
"eval[s]" at the end?


Line 165:   /// slot_idx is the idx into aggregate_exprs_ (does not include 
grouping exprs).
update comment


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 892:   AggFn::Close(analytic_fns_);
having an expr::close() and having to call that here is a bit counterintuitive 
(and manual, since we need to do that for every expr). would be nice to move 
that to a utility function in runtimestate (track the cache entries in some 
ancillary structure and then free them all at once instead of piecemeal). that 
would reduce the likelihood of forgetting this somewhere.

not necessary to do it as part of this patch, but a todo would be nice.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 210:   ScalarExpr* partition_by_eq_expr_;
initialize inline


Line 215:   ScalarExpr* order_by_eq_expr_;
same


Line 216:   ScalarExprEvaluator* order_by_eq_expr_evaluator_;
same


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 100:   RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, 
thrift_output_exprs));
why not just call create() directly here?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 87:   static Status Create(ObjectPool* pool, RuntimeState* state,
state already contains a pool


Line 123:   boost::scoped_ptr expr_mem_pool_;
do we really need a separate mempool for that?


Line 131:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
move in/out param to back


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:

Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) {
why did we not need this before?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 460: if (!EvalConjunct(evaluators[i], row)) return false;
can't you just inline the call here?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 149:   static bool EvalConjunct(ScalarExprEvaluator* evaluator, TupleRow* 
row);
evalpredicate

a predicate is a conjunct if it's part of a conjunctive clause, ie, a predicate 
on its own can't be a conjunct.


Line 151:   /// Evaluate the conjuncts in 'evaluators' over 'row'.
add convenience function for a vector?


Line 263:   /// object pool.
why?


Line 290:   boost::scoped_ptr expr_mem_pool_;
what mem pool did we use for that before? in general, we want to get away from 
having mempools all over the place


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 102:   Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool,
state contains pool


Line 103:   const FilterContext& from);
move const args to front


Line 114:   /// Codegen Eval() by codegen'ing the expression evaluations and 
replacing the type
update comment


Line 117:   static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* 
filter_expr,
does filter_expr get updated in this function?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 129:   void FreeBuildLocalAllocations();
FreeLocal<>Allocations()?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-scan-node.cc
File be/src/exec/hbase-scan-node.cc:

Line 249: DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size());
this would benefit from a convenience function for a vecto

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-03 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#17).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 17:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 93: build_exprs_.push_back(build_expr);
> Wrap pool_->Add() around the above line. Makes it harder to reintroduce the
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS16, Line 163:  
> tab?
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 87: RETURN_IF_ERROR(ScalarExpr::Create(eq_join_conjunct.left, 
child(0)->row_desc(),
> It would be nice if we had a way to make this cleanup more automatic to mak
Done


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/anyval-util.h
File be/src/exprs/anyval-util.h:

Line 32: using impala_udf::FunctionContext;
> Using the whole namespace seems fine to me too so long as it's scoped to im
Keep it as is for now. Using the whole namespace in the header seems to bad 
practice in general.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-06-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 16:

(6 comments)

Looking good, just had a handful of comments.

http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 93: pool_->Add(build_expr);
Wrap pool_->Add() around the above line. Makes it harder to reintroduce the bug 
by adding a line in-between.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 165: new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ 
false));
> Good point. I think heap allocated may be safer in case the expression keep
I think we should be explicit in the ScalarExpr comment about whether Create() 
can hold onto a reference, or whether it has  to copy.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS16, Line 163: 
tab?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 87: Status status = ScalarExpr::Create(eq_join_conjunct.left, 
child(0)->row_desc(),
It would be nice if we had a way to make this cleanup more automatic to make it 
harder to introduce bugs like this. Maybe ScalarExpr::Create() should close the 
expr and return NULL if it fails?

Similarly maybe the vector version of ScalarExpr::Create() should close all the 
exprs and clear the vector on failure.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/anyval-util.h
File be/src/exprs/anyval-util.h:

Line 32: using impala_udf::FunctionContext;
Using the whole namespace seems fine to me too so long as it's scoped to impala 
(this is also fine).


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 157:   /// or its sub-expressions. 'start_idx' and 'end_idx' correspond to 
the range
> Yes, it'd work out but not sure the difference in overhead for not reusing 
I'm not too concerned about that. There's two possible levels of caching:

* In the MemPool if we don't transfer and call Clear()
* In TCMalloc or the new buffer pool, where there's thread-local caching or 
core-local caching respectively. TCMalloc's caching should be pretty effective 
for small allocations.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-26 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#16).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-23 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 14:

(70 comments)

http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 93: RETURN_IF_ERROR(build_expr->Init(*(intermediate_row_desc_.get()), 
state));
> build_expr is leaked if this returns.
Good catch. Fixed here and everywhere.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

> This all works out a lot cleaner - nice!
Done


Line 124: RETURN_IF_ERROR(AggFn::Create(analytic_node.analytic_functions[i],
> I think there's a potential resource leak if the expr Init() function fails
Done


Line 166:   fn_pool_.reset(new MemPool(expr_mem_tracker()));;
> extra semicolon
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 181:   // TODO: codegen table sink
> I'm not sure what this means. Is it something to be fixed in this patch? Ot
I think this is IMPALA-4356.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 126:   /// Not used in DataStreamSender and PHJBuilder.
> Or NljBuilder. Maybe easier to just say that they're not used in all subcla
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

PS14, Line 451: inline
> Is the inline needed here?
The inline hint is for inlining into EvalConjuncts().


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 261:   /// Conjuncts and their evaluators in this node.
> Maybe mention who owns the pointers?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

PS14, Line 86: expr_evaluator->root()
> Is the plan to eventually pass around the Expr itself and avoid going via t
Really depends on the context. I imagine most cases we will use Expr directly. 
In some cases when Expr is not directly available we may still go through the 
evaluator.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 120:   FilterContext() { }
> nit: can just remove this and rely on the default constructor
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 115:   /// computing the size of a scratch row.
> Maybe mention what 'pool' is for
Done


Line 415:   Status Init(ObjectPool* pool, RuntimeState* state, int 
num_build_tuples);
> What is pool used for?
Comments added.


Line 513:   MemPool* mem_pool_;
> Mention that it's not owned?
I suppose that's implied. Comments added.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS14, Line 294:   
> tab?
Done


Line 733: vector& output_expr_evaluators)
> Can't this be a const& still? The vector isn't being modified is it?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

Line 26: #include "exprs/scalar-expr.h"
> Maybe not needed?
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS14, Line 122:  DCHECK(conjuncts_map_[entry.first].empty());
Use find() instead.


Line 147: // TODO: Move this to Prepare()
> Is this TODO in this patch, or does it depend on another change?
I am thinking of moving it when PlanNode is introduced.


Line 165: new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ 
false));
> This is a little inconsistent - above on line 121 we create the RowDescript
Good point. I think heap allocated may be safer in case the expression keeps a 
reference to it.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 146:   const std::vector 
min_max_conjunct_evaluators() const {
> Shouldn't this return a const&, rather than copying the vector?
Yes, right. Looks like the existing code is broken.


Line 393:   /// the per-scanner ScannerContext..
> Add something like "These correspond to the exprs in 'filter_exprs_'"
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 362:   MemPool filter_mem_pool(expr_mem_tracker());
> This is a little subtle - could do with a comment.
Done


Line 388: for (auto& ctx: filter_ctxs) {
> Maybe use a goto and an exit block at the bottom of the function to enforce
Done


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

Line 95:   boost::scoped_ptr expr_mem_pool_;
> Put it in 'obj_pool_'?
Appears to be more consi

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-23 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#15).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-23 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 14:

(32 comments)

Made it through, finally... Overall looks way saner and easier to understand 
than the previous state of things.

http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 694:   AggFnEvaluator::ShallowClone(parent->pool_, agg_fn_pool.get(),
This should go into 'partition_pool_'. Otherwise if we have an Agg in a subplan 
these will accumulate every Reset(). E.g. there would be a bug like 
https://gerrit.cloudera.org/#/c/1141/


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

Line 84: AggFnEvaluator::~AggFnEvaluator() {
Can we add a DCHECK() that it was closed?


PS14, Line 87: inline
probably not needed


PS14, Line 91: inline
probably not needed


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 250
Consider leaving these as inline functions in the header - AnalyticEvalNode is 
still interpreted so the function call overhead may actually affect perf there.


Line 72:   /// Convenience functions for creating evaluators for multiple 
aggregate functions.
Mention that the caller is responsible for closing the created evaluators 
regardless of success or failure. That is subtle. Or we could just have 
Create() handle that internally.

This comment may also apply to ScalarExprEvaluator.


Line 174:   /// True if this evaluator is created from a Clone() call.
ShallowClone()


PS14, Line 175: is_clone_
Maybe is_shallow_clone_ to be more precise - although there is only one type of 
cloning supported so arguably not necessary to disambiguate.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

Line 87:   
RETURN_IF_ERROR(LibCache::instance()->GetSoFunctionPtr(fn_.hdfs_location,
Nit: I don't think the remainder of this function needs all the vertical 
whitespace - it's essentially double-spaced code now


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/agg-fn.h
File be/src/exprs/agg-fn.h:

PS14, Line 52: state
value?. "state" could be taken to mean that it initialises the evaluator or 
some such thing.


Line 81:   /// TODO: The aggregation node has custom codegen paths for a few of 
the builtins.
Consider removing the TODO - it's a bit vague and unclear what if anything we 
want to do.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 53: /// --- Expr overview
Nice overview


Line 81:   static Status CreateTree(const TExpr& texpr, ObjectPool* pool, Expr* 
root);
I think this should be protected since it's only called from ScalarExpr and 
AggFn.


Line 94:   bool is_constant() const { return is_constant_; }
I think is_constant() should maybe only be defined in the ScalarExpr 
sub-hierarchy. It doesn't really mean anything for AggFn.


Line 100:   /// Simple debug string that provides no expr subclass-specific 
information.
Comment seems stale since there's no implementation.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/hive-udf-call.h
File be/src/exprs/hive-udf-call.h:

Line 28: using namespace impala_udf;
I guess some of these 'using namespace' instances were pre-existing - still 
seems like it contradicts our coding standards.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/literal.h
File be/src/exprs/literal.h:

Line 28: using namespace impala_udf;
See other comment about using namespace in headers.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 154:   // Owner of mem_pool_ needs to call FreeAll() on it.
We don't really know what the owner wants to do. Maybe something like "Memory 
allocated by 'fn_ctx_' is still living in 'mem_pool_'. The owner of 'mem_pool_' 
is responsible for freeing it when appropriate."


Line 436:   // TODO: is there a better way to do this?
Consider removing TODO - I don't think we're likely to fix it.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 29: using namespace impala_udf;
I'm not sure if this works, but maybe if we just move this "using" into the 
impala namespace it won't pollute any other namespaces. That would still avoid 
having impala_udf:: everywhere.


PS14, Line 43: types
type


Line 122:   void* GetValue(const ScalarExpr& e, const TupleRow* row);
Maybe GetSubExprValue()? I don't feel strongly but might make the intent 
clearer.


Line 144:   Status GetError(int start_idx = 0, int end_idx = -1) const 
WARN_UNUSED_RESULT;
Arguments maybe need explanation.


Line 147:   void PrintValue(const TupleRow* row, std::string* str);
Does this evaluate 

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 14:

(14 comments)

Next batch - made it through everything except the expr/ subdirectory.

http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 164: pool_->Add(build_expr);
Maybe stick with the usual idiom of wrapping pool_->Add() around the allocation?


Line 1574:   // TODO: consider moving the following codegen logic to AggFn.
This would make sense (not in this patch though).


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 109: // TODO: Move to Prepare().
Is this to do in this patch or are you planning to do it as part of a future 
JIRA?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

Line 243:   /// Init() function inherited from DataSink. Overridden to be a 
no-op for now.
This is a bit of a red flag to me - I think we should structure things so that 
we always call Init() on the parent class. Otherwise it's hard to reason about 
the possible states of the member variables of DataSink.


Line 381:   /// List of filters to build.
Mention that there's a correspondence with 'filter_exprs_'?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

PS14, Line 95: nd
and


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/data-stream-sender.h
File be/src/runtime/data-stream-sender.h:

Line 49: /// TODO: Move this to the equivalent of PlanNode class for DataSink.
Is there a JIRA for this?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 486:   DCHECK(partition_expr->IsLiteral());
Can you add a comment in CatalogObjects.thrift that these are always literal 
exprs? That fact is explicit in the frontend but it wasn't obvious that the 
exprs sent from the frontend are always literals just by looking at the thrift 
structures.


Line 491: // literals Partition exprs are not used in the codegen case. 
Don't codegen
Missing .


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

Line 93:   const TRuntimeFilterDesc& filter_desc_;
Comment what owns the filter?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 105:   Status Init(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool);
Comment what the arguments are for? The names could maybe be more descriptive 
too. E.g. to disambiguate from 'obj_pool_'. E.g. 'expr_evaluator_pool' and 
'expr_mem_pool'.

I think passing in 'state' is redundant too, since it's passed into the 
constructor.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

PS14, Line 194: mem_pool
expr_mem_pool?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/udf/udf.h
File be/src/udf/udf.h:

Line 111: /// TODO: Move FRAGMENT_LOCAL states to query_state for 
multi-threading.
Do we have a JIRA trackign this?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

Line 82:   Status Open(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool);
Similar to my comment in Sorter - it's unclear what these different arguments 
do and what the requirements are. E.g. what is allocated from 'mem_pool'.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 14:

(26 comments)

First batch of comments - made it through exec/kudu*

http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 93: RETURN_IF_ERROR(build_expr->Init(*(intermediate_row_desc_.get()), 
state));
build_expr is leaked if this returns.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

This all works out a lot cleaner - nice!


Line 124: RETURN_IF_ERROR(AggFn::Create(analytic_node.analytic_functions[i],
I think there's a potential resource leak if the expr Init() function fails in 
certain ways. E.g. if it successfully got a lib cache entry, but failed later. 
We don't add it to 'analytic_fns_' so I don't think it can be closed.


Line 166:   fn_pool_.reset(new MemPool(expr_mem_tracker()));;
extra semicolon


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 181:   // TODO: codegen table sink
I'm not sure what this means. Is it something to be fixed in this patch? 
Otherwise could be just remove the TODO and file a JIRA if it's something that 
needs to be tracked.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 126:   /// Not used in DataStreamSender and PHJBuilder.
Or NljBuilder. Maybe easier to just say that they're not used in all subclasses.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

PS14, Line 451: inline
Is the inline needed here?


Line 465: void ExecNode::FreeLocalAllocations() {
This helper function doesn't seem to help that much - consider inlining it.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 261:   /// Conjuncts and their evaluators in this node.
Maybe mention who owns the pointers?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

PS14, Line 86: expr_evaluator->root()
Is the plan to eventually pass around the Expr itself and avoid going via the 
evaluator, or will we continue using this pattern? I don't feel that strongly 
either way, just curious.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 120:   FilterContext() { }
nit: can just remove this and rely on the default constructor


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 115:   /// computing the size of a scratch row.
Maybe mention what 'pool' is for


Line 415:   Status Init(ObjectPool* pool, RuntimeState* state, int 
num_build_tuples);
What is pool used for?


Line 513:   MemPool* mem_pool_;
Mention that it's not owned?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS14, Line 294:   
tab?


Line 733: vector& output_expr_evaluators)
Can't this be a const& still? The vector isn't being modified is it?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

Line 26: #include "exprs/scalar-expr.h"
Maybe not needed?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 147: // TODO: Move this to Prepare()
Is this TODO in this patch, or does it depend on another change?


Line 165: new RowDescriptor(min_max_tuple_desc_, /* is_nullable */ 
false));
This is a little inconsistent - above on line 121 we create the RowDescriptor 
on the stack but here it's heap-allocated. I think the stack allocation pattern 
is probably simpler.


PS14, Line 186: conjunct_evaluators_map_[entry.first]
[] has a side-effect of creating the element - maybe use find() instead for the 
DCHECK.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 146:   const std::vector 
min_max_conjunct_evaluators() const {
Shouldn't this return a const&, rather than copying the vector?


Line 393:   /// the per-scanner ScannerContext..
Add something like "These correspond to the exprs in 'filter_exprs_'"


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 362:   MemPool filter_mem_pool(expr_mem_tracker());
This is a little subtle - could do with a comment.


Line 388: for (auto& ctx: filter_ctxs) {
Maybe use a goto and an exit block at the bottom of the function to enforce 
that the cleanup happens on both paths.


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-17 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#14).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5483/13/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS13, Line 1104: const vector& filter_ctxs
Should avoid using FilterContext for codegen as it's runtime state.


http://gerrit.cloudera.org:8080/#/c/5483/13/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

PS13, Line 366: expr_mem_pool()
not thread safe to use expr_mem_pool().


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/13/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 126:   ScalarExprEvaluator::Close(conjunct_evaluators_, state_);
Missing expr_mem_pool_->FreeAll(); here.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 156:   friend class AndPredicate;
> The goal is to hide both Init() and Get*Val() from irrelevant classes to en
Removed AggregationNode and PartitionedAggregationNode from friend class list 
in PS13.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#13).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 285:   boost::scoped_ptr expr_mem_tracker_;
> if this is per querystate, the mempool would need to be thread-safe, not su
Yes, there should be one MemPool per fragment instance. TODO removed. Keep the 
name the same as it's clear enough already.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#12).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#11).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 10:

(57 comments)

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 66: output_tuple_desc_(descs.GetTupleDescriptor(output_tuple_id_)),
> initialize in .h where possible
While I appreciate the suggestion, I prefer not to add more unrelated clean-up 
to this already gigantic patch.


Line 88: desc->type() == grouping_exprs_[i]->type());
> i would intuitively think that the build row is the input row.
renamed to intermediate_row_desc_.


Line 97: 
> it feels dicey to pass a reference to an automatic variable, Init() might h
Done


Line 232:   int count = 0;
> isn't that the same as const X**?
Not really :-).

const X** is pointer to pointer to constant X.
X* const * is pointer to constant pointer to X.

The latter fits the bill of vector::data().


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 122:   /// Output expressions to convert row batches onto output values.
> It's confusing that these aren't used in all subclasses of DataSink (e.g. D
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

PS9, Line 485: 
?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

PS9, Line 69: 
> Does this mean that multiple scanner threads share a single MemPool? That d
Nice catch. I hit similar bugs in HDFS scanners before. Those scanners also 
have a per-scanner MemPool.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

PS9, Line 68: :vector materialize_tuple_ doesn't seem to be defined. I think we always need these
Done


Line 69: 
> See my comment about the name in sorter.h.
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h
File be/src/exec/subplan-node.h:

Line 65:   /// tree rooted at 'node' and does any initialization that is 
required as a result of
> * and do any initialization that is required as a result of setting the sub
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS9, Line 78: output_tuple_exprs_;
> output_tuple_exprs_? Usually when we have exprs materializing tuples it's n
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

Line 96: MemPool* mem_pool, AggFnEvaluator** result) {
> move input params to front
Done


Line 161: Status AggFnEvaluator::Open(
> why not const&?
Done


Line 514: 
> single line, here and subsequent functions, where possible
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// and caches all constant input arguments.
> regarding the caching: is there already a todo to move that into the expr s
TODO added.


PS9, Line 94: o, it
> Maybe this should be called ShallowClone() to make it more explicit. Withou
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

Line 68:   }
> scalarexpr does that in create. do the same for this class?
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 87:   for (int i = 0; i < num_children; ++i) {
> we don't use 'const ..' elsewhere for automatic variables.
Done


Line 88: *child_node_idx += 1;
> formatting
Done


Line 92:   return Status::OK();
> what is this referring to?
Stale comment. Removed.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 82: 
> this is really CreateChildren, not CreateTree.
Moved in/out params to end.

As for the naming, this function actually builds out the entire expression tree 
rooted at 'root'. CreateChildren() doesn't seem to fit. May be I misunderstood 
your comment.


Line 134:   /// Called recursively to create children expr trees for 
sub-expressions.
> misleading: the root of the created tree is not 'parent' (the root is insta
I refactored this function and CreateTree() a bit so it's easier to follow now.


Line 135:   ///
> "for subexpressions"
Done


Line 140:   ///   root: root of the new tree. Created and initialized by the 
caller.
> let's call this root_idx, makes it easier to follow
It's called child_node_idx in the new patch.


Line 146:   ///   !status.ok() if tree is inconsistent or corrupt
> while we're cleaning up, move in/out params to end
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 69: 
> initialize vars in .h where practical
Done


Line 82: (*evaluator)->fn_ctxs_ptr_ = (*evaluator)->fn_ctxs_.data();
> why do you need that (for a newly cre

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#10).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-tabl

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 66: singleton_intermediate_tuple_(nullptr),
initialize in .h where possible


Line 88:   RowDescriptor build_row_desc(intermediate_tuple_desc_, false);
i would intuitively think that the build row is the input row.


Line 97: RETURN_IF_ERROR(build_expr->Init(state, build_row_desc));
it feels dicey to pass a reference to an automatic variable, Init() might hold 
on to it (and there is no reason it shouldn't - descriptors should last for the 
lifetime of the query).

this row descriptor should be member var and created in the c'tor.


Line 232:   ScalarExprEvaluator* const* evaluators = &conjunct_evaluators_[0];
isn't that the same as const X**?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 156:   friend class AggFnEvaluator;
why are these all friends? it looks like more of the "protected" functions need 
to be public. (why is Init() protected?)


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 9:

(10 comments)

Still trying to come to grips with it all but did an initial pass over the 
query execution changes.

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 122:   /// Output expressions to convert row batches onto output values.
It's confusing that these aren't used in all subclasses of DataSink (e.g. 
DataStreamSender, JoinBuildSink). Clarify that not all sinks use these?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

PS9, Line 69: expr_mem_pool
Does this mean that multiple scanner threads share a single MemPool? That 
doesn't seem safe.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

PS9, Line 68: materialize_tuple_
materialize_tuple_ doesn't seem to be defined. I think we always need these 
exprs anyway, right?


Line 69:   std::vector slot_materialize_exprs_;
See my comment about the name in sorter.h.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h
File be/src/exec/subplan-node.h:

Line 65:   /// tree rooted at 'node'.
* and do any initialization that is required as a result of setting the subplan.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS9, Line 78: slot_materialize_exprs_
output_tuple_exprs_? Usually when we have exprs materializing tuples it's named 
after the tuple it's materializing.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS9, Line 94: Clone
Maybe this should be called ShallowClone() to make it more explicit. Without 
reading the function comment, I would assume it did a deep clone.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS9, Line 182: slot_materialize_exprs_
Maybe something like 'sort_tuple_exprs_'? I think the important thing is that 
there is one expr per slot in the sort tuple.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

Line 49:   ScalarExprEvaluator::Close(key_expr_evaluators_lhs_, state);
opened_ = false?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

PS9, Line 141: key_expr_evaluators_lhs_
ordering_expr_evaluators_lhs_ to match the expr name?


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-08 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 9:

(45 comments)

first half

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 285:   boost::scoped_ptr expr_mem_tracker_;
> There will be one per PlanNode in the future. Currently, it still counts to
if this is per querystate, the mempool would need to be thread-safe, not sure 
that's worth it (and if this is for evaluators, why have it per plannode?).

also, since we now clearly distinguish exprs and their evaluators, and this is 
specifically for evaluators, let's express that in the variable name. 
expr_eval_mem_pool_?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

Line 96: const AggFn& agg_fn, AggFnEvaluator** result) {
move input params to front


Line 161: Status AggFnEvaluator::Open(vector& evaluators, 
RuntimeState* state) {
why not const&?


Line 514:   for (int i = 0; i < evaluators.size(); ++i) {
single line, here and subsequent functions, where possible


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// and caches all constant input arguments.
regarding the caching: is there already a todo to move that into the expr setup 
(as opposed to evaluator setup)?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

Line 68: input_expr->AssignFnCtxIdx(&fn_ctx_idx);
scalarexpr does that in create. do the same for this class?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 87:   const int num_children = nodes[*node_idx].num_children;
we don't use 'const ..' elsewhere for automatic variables.


Line 88:   const bool isAggFn = root->IsAggFn();
formatting


Line 92: // Reset the function context index for each input expression.
what is this referring to?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 81:   /// status on failure.
> 'fn_ctx_idx_ptr' removed. It doesn't take Expr** as 'root' is expected to b
should this be a private function and the two subclasses be friends (to call 
this)?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 82:   static Status CreateTree(ObjectPool* pool, const TExpr& texpr, Expr* 
root);
this is really CreateChildren, not CreateTree.

move in/out params to end


Line 134:   /// Creates an expr tree with root 'parent' via depth-first 
traversal.
misleading: the root of the created tree is not 'parent' (the root is installed 
as a new child into 'parent')


Line 135:   /// Called recursively to create expr trees of sub-expressions.
"for subexpressions"


Line 140:   ///   node_idx: index in 'nodes' of the root of the next child 
TExprNode tree
let's call this root_idx, makes it easier to follow


Line 146:   static Status CreateTreeFromThrift(ObjectPool* pool,
while we're cleaning up, move in/out params to end


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
could you flag what's new in this file that i should look at?


Line 69: inited_(false),
initialize vars in .h where practical


Line 82:   (*evaluator)->fn_ctxs_.clear();
why do you need that (for a newly created object)?


Line 86: for (FunctionContext* fn_ctx : (*evaluator)->fn_ctxs_) 
DCHECK(fn_ctx != nullptr);
do that in CreateFnCtxs?


Line 133:   if (opened_) return Status::OK();
what's the need for guarding against multiple open calls? (how would that 
happen?)


Line 138:   is_clone_? FunctionContext::THREAD_LOCAL : 
FunctionContext::FRAGMENT_LOCAL;
is this really instance_local? if so, leave todo in the right places 
(presumably at least udf.h)


Line 155: delete fn_ctxs_[i];
instead put a unique_ptr into fn_ctxs?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 72:   std::vector* evaluators) 
WARN_UNUSED_RESULT;
> Yes, ideally, that should be done in Expr::Init(). This is being tracked by
todo?


Line 111:   /// should only be called after Open() has been called on this 
expr. Returns an error if
> There is subtlety in this. "expr" is potentially a sub-expression of root_ 
but why not just call GetValue(nullptr) on the subexpr?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 55: /// rooted at a node is defind by [fn_ctx_idx_start_, fn_ctx_idx_end_).
defined


Line 65:   static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool,
move in/out params to back


Line 81:  

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 62:   static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool,
> There are places in which the evaluator will have shorter life span than th
+1 putting things in the RuntimeState object pool with the wrong lifetime has 
been a source of some nasty bugs. We should be really careful not to just 
blindly move things to ObjectPools. Especially if things are re-created inside 
subplans, per-partition, etc.

I can think of at least 3-4 different cases where we leaked a lot of memory as 
a result of this kind of mistake. It's hard to test for since you generally 
need a very long-running query.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-03 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#9).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-table

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 8:

(56 comments)

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 84:   std::vector grouping_exprs_;
> where are the evaluators for these?
In old-hash-table.cc.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 181:   /// ProcessChildBatch().
> rename variables ending in _ctx if they're referring to now-obsolete class 
Done


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 49: /// flag gets set.
> leave todo to outline very briefly what's going to happen with PlanNode.
Done


Line 285:   boost::scoped_ptr expr_mem_tracker_;
> is there a need to have one per exec node? how about one per fragment insta
There will be one per PlanNode in the future. Currently, it still counts 
towards the ExecNode's MemTracker. TODO added.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 54: /// AggFnEvaluator contains runtime state and implements wrapper 
functions which convert
> "runtime state"
Done


Line 55: /// the input TupleRow into AnyVal format expected by UDAF functions 
defined in AggFn.
> stick to singular or plural
Done


Line 70:   const AggFn& agg_fn, AggFnEvaluator** evaluator) 
WARN_UNUSED_RESULT;
> why not make agg_fn a const&?
Done


Line 80:   /// and caches all constant input arguments.
> what does agg_fn_ctx mean?
The 'agg_fn_ctx' part of the comment is removed.


Line 88:   /// Avoid the overhead of re-initializing an evaluator (e.g. calling 
GetConstVal()
> how big exactly is that overhead?
Depends on how number and size of the input expressions.


Line 94:   void Clone(
> do we really need this?
It's used in PAGG::Partition logic for resource isolation.


Line 117:   /// called either to drive the UDA's Update() or Merge() function, 
depending on whether
> function, depending
Done


Line 136:   /// merge phases.
> comment on the types of src and dst. also, those aren't great names, they s
Done


Line 159: 
> function comment missing
Done


Line 160:   /// Free local allocations made in UDA functions and input 
arguments' evaluators.
> why not a const&?
Done


Line 205:   inline const ColumnType& intermediate_type() const;
> long lines
Done


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn.h
File be/src/exprs/agg-fn.h:

Line 21: 
> ..
Done


Line 28: class MemPool;
> the comment about "static states"/compile-time information applies to all e
Done


Line 97:   /// of the output value. Returns error status on failure.
> can intermediate and output slodesc be null?
No. Changed to const ref instead.


Line 101:   WARN_UNUSED_RESULT;
> who owns agg_fn?
It lives in state->obj_pool(). Comments added.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/case-expr.h
File be/src/exprs/case-expr.h:

Line 38:  protected:
> this is needed?
Yes so we can call CaseExpr() constructor.


Line 41: 
> it would be nice to keep the fnctx index out of every class. is it possible
Done


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 19: #ifndef IMPALA_EXPRS_EXPR_H
> move directly above class, not sure why the class comment was up here to st
Done


Line 34: using namespace impala_udf;
> explain function context index concept here
The latest patch avoids exposing function context index outside scalar-expr-* 
so the comments will be kept in scalar-expr-* instead.


Line 34: using namespace impala_udf;
> also, it would be good to have an explanation of the general class hierarch
Done


Line 81:   /// status on failure.
> why does this not take an Expr**?
'fn_ctx_idx_ptr' removed. It doesn't take Expr** as 'root' is expected to be 
created by the caller (constructors of ScalarExpr or AggFn).


Line 113:   /// some ScalarExpr such as ScalarFnCall. NULL if it's not used.
> initialize all pointers to nullptr here, that way you can't forget it in th
Done


Line 134:   /// Creates an expr tree with root 'parent' via depth-first 
traversal.
> doesn't look like a useful function (if you see it being called you probabl
Removed.


Line 137:   ///   pool: Object pool in which Expr created from nodes are stored
> "to create expr trees for subexpressions"
Done


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 40: /// reference to the root of a ScalarExpr tree, runtime state (e.g. 
FunctionContexts)
> runtime state
Done


Line 62:   /// FunctionContexts needed during evaluation. Allocations from this 
evaluator will
> runtimestate already has an objectpool
There are places in which the evaluator will have shorter life span than the 
e

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-02 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#8).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-table

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-04-19 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 7:

(25 comments)

some more comments. i haven't looked at the exec nodes in detail yet.

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 84:   std::vector grouping_exprs_;
where are the evaluators for these?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 181:   bool PrevRowCompare(ScalarExprEvaluator* pred_ctx);
rename variables ending in _ctx if they're referring to now-obsolete class 
names.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 54: /// AggFnEvaluator contains runtime states and implements wrapper 
functions which
"runtime state"


Line 55: /// converts the input TupleRow into AnyVal format expected by UDAF 
functions defined
stick to singular or plural

apply to this class comment and all other comments.


Line 70:   const AggFn* agg_fn, AggFnEvaluator** evaluator) 
WARN_UNUSED_RESULT;
why not make agg_fn a const&?


Line 80:   /// and caches constant input arguments in 'agg_fn_ctx'.
what does agg_fn_ctx mean?

is the caching behavior relevant for the caller?


Line 88:   /// Avoid the overhead of re-initializing an evaluator (e.g. calling 
GetConstVal()
how big exactly is that overhead?


Line 94:   void Clone(
do we really need this?


Line 117:   /// called either to drive the UDA's Update() or Merge() function 
depending on whether
function, depending


Line 136:   void Finalize(Tuple* src, Tuple* dst);
comment on the types of src and dst. also, those aren't great names, they sound 
like you're referring to Add()


Line 159:   void FreeLocalAllocations();
function comment missing


Line 160:   static void FreeLocalAllocations(std::vector& 
evaluators);
why not a const&?


Line 205:   /// The interpreted path for the UDA's Update() function. It sets 
up the arguments to call
long lines


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 34: /// Please see the headers of ScalarExpr and AggFn for details.
> explain function context index concept here
also, it would be good to have an explanation of the general class hierarchy 
somewhere.

this description is also unclear on what differentiates the classes in this 
hierarchy. exprs and their subclasses contain compile-time information and the 
code to evaluate the expr (represented by the specific class). evaluators 
contain the general runtime state needed for the actual evaluation, but they 
don't need to be subclassed, because the expr-specific code sits in the expr 
subclasses.

"embodies" doesn't really mean anything, that could also be applied to the 
analyzer (java) classes.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h
File be/src/exprs/scalar-expr-evaluator.h:

Line 40: /// reference to the root of a ScalarExpr tree, runtime states (e.g. 
FunctionContexts)
runtime state

apply elsewhere


Line 62:   static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool,
runtimestate already has an objectpool


Line 63:   const ScalarExpr* expr, ScalarExprEvaluator** evaluator) 
WARN_UNUSED_RESULT;
why not a const&?


Line 72:   /// also the location in which constant arguments to functions are 
computed. Does not
the location?

do you mean "this also computes constant function arguments"?

why here and not in expr::init()? it sounds like static setup.


Line 73:   /// need to be called on clones. Idempotent (this allows exprs to be 
opened multiple
this clone business is tricky, would be nice to remove it.


Line 111:   Status GetConstValue(
can't this be subsumed by GetValue(nullptr)?


Line 112:   RuntimeState* state, const ScalarExpr* expr, AnyVal** 
const_val) WARN_UNUSED_RESULT;
why pass in expr? how is that different from the expr of Create()?


Line 179:   /// and owned by this ScalarExprEvaluator.
shouldn't they go into an objectpool?


Line 208:   int output_scale_;
comment why this is here and not in expr


Line 210:   /// Call Create() instead.
reconsider whether this comment is useful


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 113:   std::vector* exprs);
could you also add warn_unused_result to all other .h files that you're 
touching?


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Ti

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-04-19 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 7:

(31 comments)

initial comments

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 49: /// flag gets set.
leave todo to outline very briefly what's going to happen with PlanNode.


Line 285:   /// Created in Prepare().
is there a need to have one per exec node? how about one per fragment instance?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/agg-fn.h
File be/src/exprs/agg-fn.h:

Line 21: /// An aggregate function generates an output over a set of tuple 
rows..
..


Line 28: /// AggFn contains static states about an aggregate function such as 
the aggregation
the comment about "static states"/compile-time information applies to all exprs 
and should move in the class comment for Expr (adapted, of course).


Line 97:   /// 'output_slot_desc' is the slot descriptor of the output value. 
Returns error
can intermediate and output slodesc be null?


Line 101:   const SlotDescriptor* output_slot_desc, const TExpr& texpr, 
AggFn** agg_fn);
who owns agg_fn?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/case-expr.h
File be/src/exprs/case-expr.h:

Line 38:   friend class ScalarExpr;
this is needed?


Line 41:   CaseExpr(const TExprNode& node, int fn_context_index);
it would be nice to keep the fnctx index out of every class. is it possible to 
have only generic logic in Expr deal with this? (could createtree set this in a 
final pass over the tree?)


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 19: /// --- Expr overview
move directly above class, not sure why the class comment was up here to start 
with


Line 34: /// Please see the headers of ScalarExpr and AggFn for details.
explain function context index concept here


Line 81:   ObjectPool* pool, const TExpr& texpr, Expr* root, int* 
fn_ctx_idx_ptr);
why does this not take an Expr**?

unclear what the significance of the idx_ptr is at this point.


Line 113:   LibCacheEntry* cache_entry_;
initialize all pointers to nullptr here, that way you can't forget it in the 
c'tor.


Line 134:   void AddChild(ScalarExpr* expr) { children_.push_back(expr); }
doesn't look like a useful function (if you see it being called you probably 
want to look at the implementation to figure out if it does anything beyond the 
obvious).


Line 137:   /// Called recursively to create expr tree of sub-expression.
"to create expr trees for subexpressions"


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 28: /// ScalarExpr implements compute functions which given a row, 
performs the computation
stick to singular or plural in this and the following sentence.

which,


Line 40: /// in FunctionContext in ScalarExprEvaluator which is the interface 
to evaluate a
..evaluator,


Line 44: /// initializes itself and frees its resources by calling 
OpenEvaluator() and
move comments about evaluator to evaluator class comment.


Line 50: /// GetCodegendComputeFn() to either generate custom IR compute 
functions using IRBuilder
..builder, which inlines ...functions, or...


Line 107:   static Status Create(ObjectPool* pool, RuntimeState* state,
why do we need this and Expr::CreateTree?


Line 155:   static void Close(std::vector& exprs);
move to Expr?


Line 223:   /// Initializes the static states of all nodes in the expr tree.
what is the non-static state in the expr tree?


Line 280:   /// evaluator and initialized by calling 
ScalarExpr::OpenEvaluator().
would be good to have that in the Expr class comment, maybe with some more 
context.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 379: const vector& thrift_output_exprs) {
unused parameter


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/data-stream-sender.h
File be/src/runtime/data-stream-sender.h:

Line 133:   /// per-row partition values for shuffling exchange;
also leave todo that the exprs go into a separate class.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/data-stream-test.cc
File be/src/runtime/data-stream-test.cc:

Line 198:   scoped_ptr mem_pool_;
more mem pools?


Line 207:   vector ordering_exprs_;
?


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 288:   /// TODO: Move these into the new query-wide state, indexed by 
partition id.
does this todo still apply, given that the descriptor tbl itself moves into the 
querystate?


Line 291:   std::vector partition_key_value_evaluators_;
why is there an evaluator in here, instead of the expr? this is shared by all 
instance executors.


http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/runtime/sorter.h
File be/src/runt

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-04-11 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#7).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-table

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-04-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

I will post a new patch once the draft has gotten further along. Please feel 
free to wait till then to review.

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-04-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

Is this superseded by the other review?

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 159:   virtual BooleanVal GetBooleanVal(ExprContext* context, const 
TupleRow*);
> i wrote that comment before i reacquainted myself with the approach to spec
I don't see it conflicting with the abstractions: Expr stores static data which 
can be _shared_ among fragment instances while the latter two have to be 
created per fragment instance for execution. These interfaces don't modify the 
states of an Expr.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 159:   virtual BooleanVal GetBooleanVal(ExprContext* context, const 
TupleRow*);
> Not sure I understand this comment. These need to be overridden by subclass
i wrote that comment before i reacquainted myself with the approach to 
specialization in expr/exprctx/fnctx (all specialization is in expr).

this conflicts with the abstractions we had in mind (expr for static data, 
exprevaluator/fnctx for execution). i think we should have another, hopefully 
shorter, discussion about what we ultimately want these classes to look like. 
we don't want to make the changes are part of this patch, but it's good to know 
where we're headed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 159:   virtual BooleanVal GetBooleanVal(ExprContext* context, const 
TupleRow*);
> these don't make sense here anymore, execution-related functions should be 
Not sure I understand this comment. These need to be overridden by subclasses 
of Exprs, no ? Not sure if it makes sense to put both 
ScalarFnCall::GetBooleanVal() or Literal::GetBooleanVal() in ExprCtx. Can you 
please elaborate ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/case-expr.cc
File be/src/exprs/case-expr.cc:

Line 76:   fn_ctx->Free(reinterpret_cast(case_state));
is this really case-specific? doesn't look like it. (is there a need for an 
expr-specific 'close'? or do we simply free thread-local state?)


http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr-context.cc
File be/src/exprs/expr-context.cc:

Line 73:   RETURN_IF_ERROR(root_->Init(state, row_desc));
init should happen independently of the exprctxs (and certainly not once per 
exprctx)


http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr-context.h
File be/src/exprs/expr-context.h:

Line 41: /// An ExprContext contains thread private states for the evaluation 
of an Expr Tree.
'thread private' sounds mysterious/vague. isn't this 'the state necessary for 
the evaluation of an expr tree' (ie, all of it)?


Line 52:   /// Allocates all FunctionContexts for this ExprContext. Also 
initialize the expr tree
what does 'expr tree' refer to? the tree of expr objects? (if so, why would 
that be initialized here?)


Line 58:   /// Initializes the FunctionContexts in the ExprContext on all nodes 
in the Expr tree.
is there a need to separate prepare() and open()?


Line 66:   /// originals but have their own MemPool and thread-local state. 
Clone() should be used
this seems convoluted, and they shouldn't all need a separate mempool. unless 
you think this cloning stuff is a good idea, leave todo in class comment.


Line 77:   /// call is a no-op. The new ExprContexts are created in 
state->obj_pool().
do we really need this? it's bizarre.


Line 124:   void EvaluateWithoutRow(TColumnValue* col_val);
awkward name. getconstantvalue?


Line 200:   /// Pool backing fn_contexts_. Counts against the runtime state's 
UDF mem tracker.
is there any need to have a separate pool for each exprctx? if not, leave todo 
to remove.

also leave todos for other specific clean-up items.


http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 112: Expr::~Expr() {
> I think we've been trying to move away from destructors releasing resources
related note: we should figure out what differentiates prepare from open and 
whether we need both (and if the counterpart should be close or 
releaseresources)


Line 119: void Expr::CloseContext(RuntimeState* state, ExprContext* context,
this shouldn't be here


Line 131:   return Status("Expression tree only partially reconstructed. 
Not all thrift " \
why does this make sense?


http://gerrit.cloudera.org:8080/#/c/5483/3/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 41: /// The Expr superclass defines a virtual Get*Val() compute function 
for each possible
a lot of this is outdated.


http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 60: /// Only short-circuited operators (e.g. &&, ||) and other special 
functions like
duplicated paragraph


Line 79: /// Expr users should use the static Get*Val() wrapper functions to 
evaluate exprs,
this still reflects the confusion between expr and expctx. please make a 
complete pass and clean this up. rewrite where needed, as opposed to just 
adding new text. this should read as if it were written from scratch, not 
pieced together over the years.


Line 84: /// --- Relationship with ExprContext and FunctionContext
this makes sense to do at the very top: explain what the class's abstraction 
is, and how it relates to others.


Line 87: /// a query is represented as a tree of Expr whose states are static 
after initialization
whose states are static: hard to follow, unless i already know what it means.

exprs represent static information, such as types, etc.


Line 88: /// by their Prepare() functions.
init


Line 90: /// ExprContext contains thread private states (e.g. evaluation 
result, FunctionContext).
"thread private states": exprctx encapsulates the execution state, which is why 
it can't be shared between executing threads.


Line 93: /// via the field 'root_'.
do you gain anything here by referencing private members vars?


Line 100: /// Expressions in exec nodes are expected to be initialized once per 
fragment and the
"exprs in ..." (clearer to reference the class name directly)

same for exec nodes, etc.


Line 128:   class BasicBlock;
remove indentation, like below


Line 159:   virtual BooleanVal GetBooleanVal(ExprContext* context, const 
TupleRow*);
these don't make sense here anymore, execution-related functions should be in 
exprctx


Line 171:   inline void AddChild(Expr* expr) { children_.push_back(expr); }
why do these need to be constructed manually, as opposed to from a 
texpr/texprnode?


Line 180:   inline int fn_context_index() const { return fn_context_ind

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

(1 comment)

One final issue. I'm assuming that we're going to rename 
OpenContext()/CloseContext() as part of the larger rename of ExprContext

http://gerrit.cloudera.org:8080/#/c/5483/6/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 112: Expr::~Expr() {
I think we've been trying to move away from destructors releasing resources. Do 
you think it's reasonable to add an Expr::Close() function for this logic?


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-20 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#6).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change cleans up some entangled logic between Expr and
ExprContext's initialization. In general, we store static
states of an expression in Expr and thread-private / evaluation
related states in ExprContext.

This change cleans up the followings:

1. FunctionContext should be managed by ExprContext. They reside
   in a vector inside ExprContext. When creating the expression tree,
   each sub-expression requiring a FunctionContext will be assigned
   an index into the vector. The index is stored in the Expr object.
   ExprContext::Prepare() will allocate and prepare the FunctionContext
   for all sub-expressions in the expression tree.

2. Move the field 'output_scale_' to ExprContext. This field is needed
   to format the output in the table writer correctly for the case of
   RoundUpTo(). However, this field can only be derived much later in
   the initialization so in the spirit of not modifying the Expr after
   it has been initialized, let's move this field out to ExprContext
   until IMPALA-4743 and IMPALA-4720 are fixed.

3. Rename Expr::Prepare(), Expr::Open() and ExprClose() to
   Expr::Init(), Expr::OpenContext() and Expr::CloseContext()
   to better describe their purposes. Also move some functions
   which operate solely on ExprContext to ExprContext class.

4. Expr::CreateExprTree() and friends return Expr instead of
   ExprContext.

This clean up is a step towards sharing expression states among plan
fragment instances.

This change also fixes a bug in Expr::GetFnContextError().
The old code only returned the error in the top level
FunctionContext, ignoring the error set by its descendants.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/old-hash-table-test.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/sort-exec-exprs.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-context.h
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/row-batch.h
M be/src/service/fe-support.cc
M be/src/service/impalad-main.cc
M common/thrift/ImpalaInternalService.thrift
50 files changed, 726 insertions(+), 618 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/5483/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/5483/4//COMMIT_MSG
Commit Message:

PS4, Line 31: soly
> solely
Done


PS4, Line 39: ignorning
> ignoring
Done


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exec/hash-table-test.cc
File be/src/exec/hash-table-test.cc:

Line 74: ASSERT_OK(ExprContext::Prepare(build_expr_ctxs_, NULL, desc, 
&tracker_))
> Missing semi-colon. Should have compiled the tests too.
Done


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.cc
File be/src/exprs/expr-context.cc:

PS4, Line 129: RegisterFnContexts
> CreateFnContexts()? I don't think there's any real registration going on he
Done


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.h
File be/src/exprs/expr-context.h:

Line 86:   /// Use this interface except for creating ExprContext on a stack.
> It looks like we only do this in two places in ExprTest. Why not change tho
Done


Line 102:   /// Convenience function for closing multiple expr trees.
> Are we closing the expr trees or the contexts? Same for the comments above 
Done


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 412: string Expr::DebugString(const vector& ctxs) {
> This should probably be moved to ExprContext too, right?
Done


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.h
File be/src/exprs/expr.h:

PS4, Line 100:  its
 : /// states
> What is "its states"
expressions' static states


Line 110: /// TODO:
> Remove TODO's while we're here? These seem too non-specific to be useful.
Done


PS4, Line 354: on the expr tree.
> It's really opening the ExprContext and FunctionContexts instead of the exp
It's about initializing ExprContext and FunctionContexts for all nodes in the 
ExprTree. Comment rephrased.


Line 358:   /// Subclasses overriding this function should call Expr::Close().
> This really closes the ExprContext instead of the Expr.
Renamed to Expr::CloseContext().


PS4, Line 387: FunctionContext
> FunctionContexts
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-20 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#5).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change cleans up some entangled logic between Expr and
ExprContext's initialization. In general, we stored static
states of an expression in Expr and thread-private / evaluation
related states in ExprContext.

This change cleans up the followings:

1. FunctionContext should be managed by ExprContext. They reside
   in a vector inside ExprContext. When creating the expression tree,
   each sub-expression requiring a FunctionContext will be assigned
   an index into the vector. The index is stoed in the Expr object.
   ExprContext::Prepare() will allocate and prepare the FunctionContext
   for all sub-expressions in the expression tree.

2. Move the field 'output_scale_' to ExprContext. This field is needed
   to format the output in the table writer correctly for the case of
   RoundUpTo(). However, this field can only be derived much later in
   the initialization so in the spirit of not modifying the Expr after
   it has been initialized, let's move this field out to ExprContext
   until IMPALA-4743 and IMPALA-4720 are fixed.

3. Rename Expr::Prepare(), Expr::Open() and Expr::Close() to
   ExprContext class instead as they really operate solely on
   ExprContext but not Expr.

4. Expr::CreateExprTree() and friends return Expr instead of
   ExprContext.

This clean up is a step towards sharing expression states among plan
fragment instances.

This change also fixes a bug in Expr::GetFnContextError().
The old code only returned the error in the top level
FunctionContext, ignoring the error set by its descendants.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/old-hash-table-test.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/sort-exec-exprs.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-context.h
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/row-batch.h
M be/src/service/fe-support.cc
M be/src/service/impalad-main.cc
M common/thrift/ImpalaInternalService.thrift
50 files changed, 726 insertions(+), 617 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/5483/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exec/hash-table-test.cc
File be/src/exec/hash-table-test.cc:

Line 74: ASSERT_OK(ExprContext::Prepare(build_expr_ctxs_, NULL, desc, 
&tracker_))
Missing semi-colon. Should have compiled the tests too.


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 376: Status Expr::Prepare(RuntimeState* state, const RowDescriptor& 
row_desc) {
> It's kind of weird that the scope of what is initialised is different for E
Good point. I like OpenContext() and CloseContext().


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-19 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 4:

(12 comments)

I had some high level comments. Still trying to understand the flow of 
execution with the various Open()/Close()/Clone() etc calls.

http://gerrit.cloudera.org:8080/#/c/5483/4//COMMIT_MSG
Commit Message:

PS4, Line 31: soly
solely


PS4, Line 39: ignorning
ignoring


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.cc
File be/src/exprs/expr-context.cc:

PS4, Line 129: RegisterFnContexts
CreateFnContexts()? I don't think there's any real registration going on here.


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr-context.h
File be/src/exprs/expr-context.h:

Line 86:   /// Use this interface except for creating ExprContext on a stack.
It looks like we only do this in two places in ExprTest. Why not change those 
to use Create() and hide the constructor?


Line 102:   /// Convenience function for closing multiple expr trees.
Are we closing the expr trees or the contexts? Same for the comments above too.


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 376: Status Expr::Prepare(RuntimeState* state, const RowDescriptor& 
row_desc) {
It's kind of weird that the scope of what is initialised is different for 
Expr::Prepare() vs Expr::Open() and Expr::Close().

Expr::Prepare() just initialises the Expr and doesn't do anything with 
ExprContext, but Expr::Open() and Expr::Close() initialize the ExprContext. We 
need something like this because there's Expr-subclass-specific logic in Open() 
and Close(). Maybe Open() and Close() should be OpenContext() and 
CloseContext() instead so the distinction is clearer?


Line 412: string Expr::DebugString(const vector& ctxs) {
This should probably be moved to ExprContext too, right?


http://gerrit.cloudera.org:8080/#/c/5483/4/be/src/exprs/expr.h
File be/src/exprs/expr.h:

PS4, Line 100:  its
 : /// states
What is "its states"


Line 110: /// TODO:
Remove TODO's while we're here? These seem too non-specific to be useful.


PS4, Line 354: on the expr tree.
It's really opening the ExprContext and FunctionContexts instead of the expr 
tree, right?


Line 358:   /// Subclasses overriding this function should call Expr::Close().
This really closes the ExprContext instead of the Expr.

This makes sense since we may need Expr-subclass-specific logic, but the 
comment should be clearer.


PS4, Line 387: FunctionContext
FunctionContexts


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-01-19 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#4).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change cleans up some entangled logic between Expr and
ExprContext's initialization. In general, we stored static
states of an expression in Expr and thread-private / evaluation
related states in ExprContext.

This change cleans up the followings:

1. FunctionContext should be managed by ExprContext. They reside
   in a vector inside ExprContext. When creating the expression tree,
   each sub-expression requiring a FunctionContext will be assigned
   an index into the vector. The index is stoed in the Expr object.
   ExprContext::Prepare() will allocate and prepare the FunctionContext
   for all sub-expressions in the expression tree.

2. Move the field 'output_scale_' to ExprContext. This field is needed
   to format the output in the table writer correctly for the case of
   RoundUpTo(). However, this field can only be derived much later in
   the initialization so in the spirit of not modifying the Expr after
   it has been initialized, let's move this field out to ExprContext
   until IMPALA-4743 and IMPALA-4720 are fixed.

3. Rename Expr::Prepare(), Expr::Open() and Expr::Close() to
   ExprContext class instead as they really operate soly on
   ExprContext but not Expr.

This clean up is a step towards sharing expression states among plan
fragment instances.

This change also fixes a bug in Expr::GetFnContextError().
The old code only returned the error in the top level
FunctionContext, ignorning the error set by its descendants.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/old-hash-table-test.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/sort-exec-exprs.cc
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-context.h
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/row-batch.h
M be/src/service/fe-support.cc
M common/thrift/ImpalaInternalService.thrift
46 files changed, 650 insertions(+), 528 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/5483/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong