[Impala-ASF-CR] IMPALA-5031: Remove undefined behavior "reference binding to null"
Jim Apple has uploaded a new patch set (#3). Change subject: IMPALA-5031: Remove undefined behavior "reference binding to null" .. IMPALA-5031: Remove undefined behavior "reference binding to null" When p has type T* and p is nullptr, then T x = p[0] has undefined behavior (obviously). Less obviously, T& x = p[0] also has undefined behavior -- references cannot be null. That undefined behavior can be caused by expressions like &v[0] when v is a vector of size 0. The issue is that the expression is parsed as &(v[0]), aka &(v.operator[](0)). The return type of the operator[] method is T&, and when v is empty the return value is a null reference. This patch was created by running the following command, then fixing the resulting compile errors: find be/src -type f -execdir \ sed -i 's/&\([A-Za-z0-9_]\+\)\[0\]/\1.data()/g' \{\} \; Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2 --- M be/src/exec/aggregation-node.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/delimited-text-parser-test.cc M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hbase-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scanner.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/old-hash-table-ir.cc M be/src/exec/old-hash-table.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/select-node.cc M be/src/exec/unnest-node.cc M be/src/experiments/tuple-splitter-test.cc M be/src/exprs/expr-context.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/string-functions-ir.cc M be/src/rpc/authentication.cc M be/src/rpc/thrift-util-test.cc M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/parallel-executor-test.cc M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/udf/udf-test-harness.h M be/src/util/bitmap.h M be/src/util/coding-util-test.cc M be/src/util/coding-util.cc M be/src/util/dict-encoding.h M be/src/util/internal-queue-test.cc M be/src/util/runtime-profile.cc M be/src/util/streaming-sampler.h M be/src/util/tuple-row-compare.h M be/src/util/uid-util.h M be/src/util/webserver.cc 47 files changed, 111 insertions(+), 109 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/7008/3 -- To view, visit http://gerrit.cloudera.org:8080/7008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5031: Remove undefined behavior "reference binding to null"
Jim Apple has uploaded a new patch set (#2). Change subject: IMPALA-5031: Remove undefined behavior "reference binding to null" .. IMPALA-5031: Remove undefined behavior "reference binding to null" When p has type T* and p is nullptr, then T x = p[0] has undefined behavior (obviously). Less obviously, T& x = p[0] also has undefined behavior -- references cannot be null. That undefined behavior can be caused by expressions like &v[0] when v is a vector of size 0. The issue is that the expression is parsed as &(v[0]), aka &(v.operator[](0)). The return type of the operator[] method is T&, and when v is empty the return value is a null reference. This patch was created by running the following command, then fixing the resulting compile errors: find be/src -type f -execdir \ sed -i 's/&\([A-Za-z0-9_]\+\)\[0\]/\1.data()/g' \{\} \; Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2 --- M be/src/exec/aggregation-node.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/delimited-text-parser-test.cc M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hbase-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scanner.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/old-hash-table-ir.cc M be/src/exec/old-hash-table.cc M be/src/exec/parquet-column-stats.inline.h M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/select-node.cc M be/src/exec/unnest-node.cc M be/src/experiments/tuple-splitter-test.cc M be/src/exprs/expr-context.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/string-functions-ir.cc M be/src/rpc/authentication.cc M be/src/rpc/thrift-util-test.cc M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/parallel-executor-test.cc M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/udf/udf-test-harness.h M be/src/util/bitmap.h M be/src/util/coding-util-test.cc M be/src/util/coding-util.cc M be/src/util/dict-encoding.h M be/src/util/internal-queue-test.cc M be/src/util/runtime-profile.cc M be/src/util/streaming-sampler.h M be/src/util/tuple-row-compare.h M be/src/util/uid-util.h M be/src/util/webserver.cc 47 files changed, 111 insertions(+), 109 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/7008/2 -- To view, visit http://gerrit.cloudera.org:8080/7008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifef5c3de686318bff4eac7dfad6e1639ddadb8f2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4164: Avoid overly aggressive inlining in LLVM IR
Michael Ho has posted comments on this change. Change subject: IMPALA-4164: Avoid overly aggressive inlining in LLVM IR .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/6941/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 645: // Check that there are no calls to FunctionContextImpl::GetConstFnAttr(). These should all > See my other comment about whether InlineHint is required. Done http://gerrit.cloudera.org:8080/#/c/6941/7/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 700: // Set the cross-compiled functions' "target-cpu" and "target-features" to > It isn't limited to cross-compiled functions is it? Done Line 704: // If there is no "noinline" attribute, add an inline hint to the function. > Instead of adding the hint everywhere it seems better to adjust the inlinin In fact, it may be simpler to just use the default value (225) with -O2 optimization level. The new patch removes all inline hint and relies on the inlining pass to do the right thing. I ran some standard perf benchmarks with PS8 in the 16-node cluster and there was no regression. -- To view, visit http://gerrit.cloudera.org:8080/6941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4164: Avoid overly aggressive inlining in LLVM IR
Michael Ho has uploaded a new patch set (#8). Change subject: IMPALA-4164: Avoid overly aggressive inlining in LLVM IR .. IMPALA-4164: Avoid overly aggressive inlining in LLVM IR When generating IR functions during codegen, we used to always tag the functions with the "AlwaysInline" attribute. That potentially leads to excessive inlining, causing very long optimization / compilation time with marginal performance benefit at runtime. One of the reasons for doing it was that the "target-cpu" and "target-features" attributes were missing in the generated IR functions so the LLVM inliner considers them incompatible with the cross-compiled functions. As a result, the inliner will not inline the generated IR functions into cross-compiled functions and vice versa unless the "AlwaysInline" attributes exist. This change fixes the problem above by setting the "target-cpu" and "target-features" attributes of all IR functions to match that of of the host's CPUs so both generated IR functions and cross-compiled functions will have the same values for those attributes. With these attributes set, we now rely on the inliner of LLVM to determine whether a function is worth being inlined. With this change, the codegen time of a query with very long predicate went from 15s to 4s and the overall runtime went from 19s to 8s. Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h A testdata/workloads/targeted-perf/queries/primitive_long_predicate.test 3 files changed, 211 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6941/8 -- To view, visit http://gerrit.cloudera.org:8080/6941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Add a script to test performance on a developer machine
Jim Apple has posted comments on this change. Change subject: Add a script to test performance on a developer machine .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/6818/4/bin/single_node_perf_run.py File bin/single_node_perf_run.py: Line 71: """Loads a database with a particular scale factor.""" > Given that you os.chdir(IMPALA_HOME) in main(), can it be removed elsewhere Done PS4, Line 110: > get_git_hash_for_name("HEAD") ? Done Line 241: parser.add_option("--workloads", default="targeted-perf", > Nit: I usually like to refactor the boilerplate option-parsing stuff to ano Done PS4, Line 254: help="load databases for the chosen workloads") : parser.add_option("--start_minicluster", action="store_true", : help="start a new Hadoop minicluster") : parser.add_option("--ninja", action="store_true", : help = "use ninja, rather than Make, as > It's confusing to me to have --no options that default to True that are sto I couldn't possible refuse to eliminate the "no"s! :-D PS4, Line 262: : When one hash is given, measures the performance on the specified workloads. : When two hashes are given, compares their performance. Output is in : $IMPALA_HOME/perf_results/latest. : : WARNING: This script will run git checkout. You should not touch the tree : while the script is running. You should start the script from a clean git : tree. : > Nit: Consider a triple-quoted string and using textwrap.dedent(). Done Line 274: options, args = parser.parse_args() > Seems confusing that 1 git hash is allowed when the docstrings and usage al Done PS4, Line 296: curren > 2 spaces. Done -- To view, visit http://gerrit.cloudera.org:8080/6818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] Add a script to test performance on a developer machine
Jim Apple has uploaded a new patch set (#5). Change subject: Add a script to test performance on a developer machine .. Add a script to test performance on a developer machine This is a migration from an old and broken script from another repository. Example use: bin/single_node_perf_run.py --ninja --workloads targeted-perf \ --load --scale 4 --iterations 20 --num_impalads 3 \ --start_minicluster --query_names PERF_AGG-Q3 \ $(git rev-parse HEAD~1) $(git rev-parse HEAD) The script can load data, run benchmarks, and compare the statistics of those runs for significant differences in performance. It glues together buildall.sh, bin/load-data.py, bin/run-workload.py, and tests/benchmark/report_benchmark_results.py. Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9 --- M bin/load-data.py M bin/run-workload.py A bin/single_node_perf_run.py M testdata/bin/generate-schema-statements.py M testdata/datasets/tpcds/preload M testdata/datasets/tpch/preload M tests/common/test_dimensions.py M tests/performance/query_executor.py M tests/performance/scheduler.py M tests/performance/workload_runner.py 10 files changed, 385 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6818/5 -- To view, visit http://gerrit.cloudera.org:8080/6818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5315 Cast to timestamp fails for YYYY-M-D format
Vincent Tran has uploaded a new change for review. http://gerrit.cloudera.org:8080/7009 Change subject: IMPALA-5315 Cast to timestamp fails for -M-D format .. IMPALA-5315 Cast to timestamp fails for -M-D format This change allows casting of a string in 'lazy' date format to timestamp. The supported lazy date formats are: -M-D -MM-D -M-DD Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f --- M be/src/exprs/expr-test.cc M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h 3 files changed, 40 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7009/1 -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran