[Impala-ASF-CR] IMPALA-5031: Remove undefined behavior "reference binding to null"

2017-05-27 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new patch set (#3).

Change subject: IMPALA-5031: Remove undefined behavior "reference binding to 
null"
..

IMPALA-5031: Remove undefined behavior "reference binding to null"

When p has type T* and p is nullptr, then T x = p[0] has undefined
behavior (obviously). Less obviously, T& x = p[0] also has undefined
behavior -- references cannot be null. That undefined behavior can be
caused by expressions like &v[0] when v is a vector of size 0. The
issue is that the expression is parsed as &(v[0]), aka
&(v.operator[](0)). The return type of the operator[] method is T&,
and when v is empty the return value is a null reference.

This patch was created by running the following command, then fixing
the resulting compile errors:

find be/src -type f -execdir \
  sed -i 's/&\([A-Za-z0-9_]\+\)\[0\]/\1.data()/g' \{\} \;

Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2
---
M be/src/exec/aggregation-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/delimited-text-parser-test.cc
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/old-hash-table-ir.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/select-node.cc
M be/src/exec/unnest-node.cc
M be/src/experiments/tuple-splitter-test.cc
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/string-functions-ir.cc
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-util-test.cc
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/parallel-executor-test.cc
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/udf/udf-test-harness.h
M be/src/util/bitmap.h
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M be/src/util/dict-encoding.h
M be/src/util/internal-queue-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/streaming-sampler.h
M be/src/util/tuple-row-compare.h
M be/src/util/uid-util.h
M be/src/util/webserver.cc
47 files changed, 111 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/7008/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5031: Remove undefined behavior "reference binding to null"

2017-05-27 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new patch set (#2).

Change subject: IMPALA-5031: Remove undefined behavior "reference binding to 
null"
..

IMPALA-5031: Remove undefined behavior "reference binding to null"

When p has type T* and p is nullptr, then T x = p[0] has undefined
behavior (obviously). Less obviously, T& x = p[0] also has undefined
behavior -- references cannot be null. That undefined behavior can be
caused by expressions like &v[0] when v is a vector of size 0. The
issue is that the expression is parsed as &(v[0]), aka
&(v.operator[](0)). The return type of the operator[] method is T&,
and when v is empty the return value is a null reference.

This patch was created by running the following command, then fixing
the resulting compile errors:

find be/src -type f -execdir \
  sed -i 's/&\([A-Za-z0-9_]\+\)\[0\]/\1.data()/g' \{\} \;

Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2
---
M be/src/exec/aggregation-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/delimited-text-parser-test.cc
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/old-hash-table-ir.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/select-node.cc
M be/src/exec/unnest-node.cc
M be/src/experiments/tuple-splitter-test.cc
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/string-functions-ir.cc
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-util-test.cc
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/parallel-executor-test.cc
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/udf/udf-test-harness.h
M be/src/util/bitmap.h
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M be/src/util/dict-encoding.h
M be/src/util/internal-queue-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/streaming-sampler.h
M be/src/util/tuple-row-compare.h
M be/src/util/uid-util.h
M be/src/util/webserver.cc
47 files changed, 111 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/7008/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4164: Avoid overly aggressive inlining in LLVM IR

2017-05-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4164: Avoid overly aggressive inlining in LLVM IR
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6941/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 645:   // Check that there are no calls to 
FunctionContextImpl::GetConstFnAttr(). These should all
> See my other comment about whether InlineHint is required.
Done


http://gerrit.cloudera.org:8080/#/c/6941/7/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 700:   // Set the cross-compiled functions' "target-cpu" and 
"target-features" to
> It isn't limited to cross-compiled functions is it?
Done


Line 704:   // If there is no "noinline" attribute, add an inline hint to the 
function.
> Instead of adding the hint everywhere it seems better to adjust the inlinin
In fact, it may be simpler to just use the default value (225) with -O2 
optimization level. The new patch removes all inline hint and relies on the 
inlining pass to do the right thing. I ran some standard perf benchmarks with 
PS8 in the 16-node cluster and there was no regression.


-- 
To view, visit http://gerrit.cloudera.org:8080/6941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4164: Avoid overly aggressive inlining in LLVM IR

2017-05-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#8).

Change subject: IMPALA-4164: Avoid overly aggressive inlining in LLVM IR
..

IMPALA-4164: Avoid overly aggressive inlining in LLVM IR

When generating IR functions during codegen, we used to
always tag the functions with the "AlwaysInline" attribute.
That potentially leads to excessive inlining, causing very
long optimization / compilation time with marginal performance
benefit at runtime. One of the reasons for doing it was that
the "target-cpu" and "target-features" attributes were
missing in the generated IR functions so the LLVM inliner
considers them incompatible with the cross-compiled functions.
As a result, the inliner will not inline the generated IR
functions into cross-compiled functions and vice versa unless
the "AlwaysInline" attributes exist.

This change fixes the problem above by setting the "target-cpu"
and "target-features" attributes of all IR functions to match
that of of the host's CPUs so both generated IR functions and
cross-compiled functions will have the same values for those
attributes. With these attributes set, we now rely on the
inliner of LLVM to determine whether a function is worth being
inlined. With this change, the codegen time of a query with very
long predicate went from 15s to 4s and the overall runtime went
from 19s to 8s.

Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
A testdata/workloads/targeted-perf/queries/primitive_long_predicate.test
3 files changed, 211 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6941/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Add a script to test performance on a developer machine

2017-05-27 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add a script to test performance on a developer machine
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6818/4/bin/single_node_perf_run.py
File bin/single_node_perf_run.py:

Line 71:   """Loads a database with a particular scale factor."""
> Given that you os.chdir(IMPALA_HOME) in main(), can it be removed elsewhere
Done


PS4, Line 110: 
> get_git_hash_for_name("HEAD") ?
Done


Line 241:   parser.add_option("--workloads", default="targeted-perf",
> Nit: I usually like to refactor the boilerplate option-parsing stuff to ano
Done


PS4, Line 254: help="load databases for the chosen 
workloads")
 :   parser.add_option("--start_minicluster", action="store_true",
 : help="start a new Hadoop minicluster")
 :   parser.add_option("--ninja", action="store_true",
 : help = "use ninja, rather than Make, as
> It's confusing to me to have --no options that default to True that are sto
I couldn't possible refuse to eliminate the "no"s! :-D


PS4, Line 262: 
 : When one hash is given, measures the performance on the 
specified workloads.
 : When two hashes are given, compares their performance. 
Output is in
 : $IMPALA_HOME/perf_results/latest.
 : 
 : WARNING: This script will run git checkout. You should not 
touch the tree
 : while the script is running. You should start the script 
from a clean git
 : tree.
 : 
> Nit: Consider a triple-quoted string and using textwrap.dedent().
Done


Line 274:   options, args = parser.parse_args()
> Seems confusing that 1 git hash is allowed when the docstrings and usage al
Done


PS4, Line 296: curren
> 2 spaces.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a script to test performance on a developer machine

2017-05-27 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new patch set (#5).

Change subject: Add a script to test performance on a developer machine
..

Add a script to test performance on a developer machine

This is a migration from an old and broken script from another
repository. Example use:

bin/single_node_perf_run.py --ninja --workloads targeted-perf \
  --load --scale 4 --iterations 20 --num_impalads 3 \
  --start_minicluster --query_names PERF_AGG-Q3 \
  $(git rev-parse HEAD~1) $(git rev-parse HEAD)

The script can load data, run benchmarks, and compare the statistics
of those runs for significant differences in performance. It glues
together buildall.sh, bin/load-data.py, bin/run-workload.py, and
tests/benchmark/report_benchmark_results.py.

Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
---
M bin/load-data.py
M bin/run-workload.py
A bin/single_node_perf_run.py
M testdata/bin/generate-schema-statements.py
M testdata/datasets/tpcds/preload
M testdata/datasets/tpch/preload
M tests/common/test_dimensions.py
M tests/performance/query_executor.py
M tests/performance/scheduler.py
M tests/performance/workload_runner.py
10 files changed, 385 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6818/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5315 Cast to timestamp fails for YYYY-M-D format

2017-05-27 Thread Vincent Tran (Code Review)
Vincent Tran has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7009

Change subject: IMPALA-5315 Cast to timestamp fails for -M-D format
..

IMPALA-5315 Cast to timestamp fails for -M-D format

This change allows casting of a string in 'lazy' date
format to timestamp. The supported lazy date formats are:
  -M-D
  -MM-D
  -M-DD

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
---
M be/src/exprs/expr-test.cc
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
3 files changed, 40 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran