[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/bad_parquet_data/README File testdata/bad_parquet_data/README: PS6, Line 16: out-of-range-timestamp.parq The parquet file is no longer in this folder, so I will remove this comment. -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG Commit Message: Line 21: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded > I'm assuming you are measuring response time. Since there is overall more w Right, but I'm measuring the absolute difference in response time, not the relative difference. So it doesn't totally matter what the rest of the query is doing, as long as it's stable (I agree that variance can drown out the signal). Thankfully all the runtimes I measured were remarkably stable. However, your next point is a very good one. Line 23: * No performance difference measured by introduction of extra > I'm assuming you compared response times. With multi-threaded scans the los Doh, of course you're right, and I wasn't properly measuring the overhead, because of parallelism. With a single thread, decoding 1B decimals is about 1 second slower, which is about 1ns per decode. I've updated the commit message (still think that's not a perf difference worth chasing). http://gerrit.cloudera.org:8080/#/c/5115/1/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: Line 203: /// assumed to be BYTE_ARRAY, otherwise it is assumed to be FIXED_LEN_BYTE_ARRAY. > Document return value? Done PS1, Line 328: fixed_len_size > This name seems inaccurate since it can be variable. Maybe 'encoded_size'? Hm, I think it's accurate - either it's set (> 0) and fixed, or it's not set (-1). 'encoded_size' wouldn't work well later in the method (because it's set to the size of the buffer, not the total encoded size). I could of course have another variable to manage that, but this seems ok to me. PS1, Line 338: fixed_len_size > What if fixed_len_size is negative? Done. http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: Line 150: # Decimal Parquet encodings written by Java Parquet library. > If you're going to create the table as part of the test, might as well copy Done. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has uploaded a new patch set (#2). Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner * Extend metadata checks to allow more than one possible physical type for a given logical type. * Change decimal decoding to handle non-fixed-length format in same path as fixed-length encoding. Testing: * Query test that decodes dictionary-encoded decimals using binary encoding. Perf: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded decimal(12,2) values using FIXED_BYTE_ARRAY physical type encoding. * The overhead of decoding with the extra branch was measured at 1s; i.e. the per-decode overhead is 1ns. Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc A testdata/data/byte_array_decimal_dict_encoded.parquet A testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test M tests/query_test/test_scanners.py 6 files changed, 119 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5115/2 -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] PREVIEW: IMPALA-1788: Fold constant expressions during plan generation.
Marcel Kornacker has posted comments on this change. Change subject: PREVIEW: IMPALA-1788: Fold constant expressions during plan generation. .. Patch Set 1: (20 comments) http://gerrit.cloudera.org:8080/#/c/5109/1/be/src/exprs/expr-context.cc File be/src/exprs/expr-context.cc: Line 155: void* value = GetValue(NULL); how does this handle/bail out of cases where exprctx is accidentally not a constant expr? http://gerrit.cloudera.org:8080/#/c/5109/1/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1541: // The rewrite rule for extracting conjuncts simplifies these to NULL. rather than special-casing it here, it would make more sense to special-case GetValue() so you don't need to compare a type. http://gerrit.cloudera.org:8080/#/c/5109/1/be/src/exprs/literal.h File be/src/exprs/literal.h: Line 26: #include "runtime/string-value.h" duplicated http://gerrit.cloudera.org:8080/#/c/5109/1/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: Line 41: TIMESTAMP_LITERAL please group all the literals together, so we can easily see what's still missing. http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 242: fnName = path.get(path.size() - 1); why is this necessary? http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java File fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java: Line 204: if (Double.isNaN(val.double_val) || Double.isInfinite(val.double_val)) return null; nice catch! long line Line 226: for (byte b: bytes) { single line Line 233: result = new StringLiteral(strVal, constExpr.getType(), false); did you add a new test that blows up with the previous default unescaping? also, this retains the type of the original expr, which seems like the right thing to do (and what we previously did was the wrong thing to do). does this warrant another test? http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java File fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java: Line 34: * produce, so it better to defer to a single source of truth (the BE implementation). it -> it's http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 164: if (analyzer.getQueryOptions().enable_expr_rewrites && rewriter != null) { pass in null for the rewriter if it's disabled, so you don't have to check that as well. move the null check into rewriteExprs() so you don't have to check that either. http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: Line 523: // constant into nested-loop joins. refer to cross products, to avoid confusion with future index nlj. http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 373 do we still need this function for anything? there's a call in the partition pruner, but that should also go. http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: Line 434: public void initNoRewrite(Analyzer analyzer) throws ImpalaException { why not just init(Analyzer)? that would also make it clear that you're not applying transformation rules http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 577: emptySetNode.init(ctx_.getRewriter(), analyzer); instead of passing this in, maybe each PlanNode should have access to the ctx_ http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java: Line 93: public void rewriteList(List exprs, Analyzer analyzer) throws AnalysisException { why rewrite as opposed to apply here? http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 204:* Only contains very basic tests for a few interesting cases. A more thorough ". More thorough" http://gerrit.cloudera.org:8080/#/c/5109/1/testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test File testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test: Line 92:predicates: a.int_col = 0, a.times
[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Alex Behm has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3812: Fix error message for unsupported types .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3812: Fix error message for unsupported types .. IMPALA-3812: Fix error message for unsupported types Before this patch an unclear error message was returned if DATE or DATETIME appeared in the select list after a star expansion. This was because DATE and DATETIME PrimitiveType was serialized as INVALID_TYPE. This is fixed by serializing correctly. Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 Reviewed-on: http://gerrit.cloudera.org:8080/4859 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M fe/src/main/java/org/apache/impala/catalog/PrimitiveType.java M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/UnsupportedTypes/data.csv M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/misc.test 6 files changed, 35 insertions(+), 12 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 445: /// Writes the next value into the next slot in the *tuple using pool if necessary. > ... into the destination slot in tuple ... Done Line 480: } else if (UNLIKELY(NeedsConversion() && > Is it possible that an otherwise invalid value becomes valid after conversi No, because the input to conversion must be a valid date. http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: Line 512: /// the next slot in *tuple. > ... into the appropriate destination slot in 'tuple'. Done http://gerrit.cloudera.org:8080/#/c/4968/5/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test File testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test: Line 5: DROP TABLE IF EXISTS invalid_parquet_timestamp; > setting up the table is already done in the .py file, right? Done -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4363: Add Parquet timestamp validation .. IMPALA-4363: Add Parquet timestamp validation Before this patch, we would simply read the INT96 Parquet timestamp representation and assume that it's valid. However, not all bit permutations represent a valid timestamp. One of the boost functions raised an exception (that we did't catch) when passed an invalid boost date object, which resulted in a crash. This patch fixes problem by validating that the timestamp falls into 1400.. date range as we are scanning Parquet. Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/runtime/timestamp-value.h M common/thrift/generate_error_codes.py M testdata/bad_parquet_data/README A testdata/data/out_of_range_timestamp.parquet A testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test A testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test M tests/query_test/test_scanners.py 9 files changed, 133 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/6 -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. IMPALA-4392: restore PeakMemoryUsage to DataSink profiles The join build sink patches refactored the DataSink interface and inadvertently removed this counter from the profile. The problem was that the sink MemTracker was not initialized with the sink's profile. The fix is for the sink to create the MemTracker itself. Testing: Ran core tests. Manually checked profile to make sure the counter appeared in HdfsTableSink, DataStreamSender, etc. Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Reviewed-on: http://gerrit.cloudera.org:8080/4969 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/plan-root-sink.cc M be/src/exec/row-batch-cache.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/plan-fragment-executor.h 19 files changed, 64 insertions(+), 74 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting Adds support in the shell to report the number of modified rows for all DML operations, as well as the number of rows with errors. Testing: Added shell tests. Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Reviewed-on: http://gerrit.cloudera.org:8080/5103 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M be/src/service/impala-beeswax-server.cc M common/thrift/ImpalaService.thrift M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_commandline.py 5 files changed, 102 insertions(+), 25 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Michael Ho has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250; > I'd just be picking an arbitrary threshold in that case too, since I don't I agree that it's more actionable if it's based on number of tuples. However, isn't the longer compilation + optimization time with too many tuples a direct result of having large functions ? If so, instruction count seems to be a more robust estimate. For example, the generated functions can be different if we change the codegen function. In that case this threshold may not hold anymore. This threshold really shouldn't be arbitrary. The ideal case is that the planner can estimate a threshold of instruction count in which the cost of codegen is less than interpretation. Also, it seems that the planner should have all the information needed to make the call. Is there some unexpected difficulty for doing it in the planner ? http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) { > I disable it for this particular exchange. This is a proactive thing only r This isn't exactly what I had in mind. If you look at ExchangeNode::Codegen(), we only codegen when is_mergeing_ is true. This is pretty much the same situation as the Agg node here. So, it would be nice to follow the same pattern for exchange node and convert if (is_merging_) to DCHECK(is_merging_); -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4466: Improve Kudu CRUD test coverage .. IMPALA-4466: Improve Kudu CRUD test coverage The results in the test files were verified by hand. This patch also introduces a new test section 'DML_RESULTS', which takes the name of a table as a comment and the contents of the table as its body and then verifies that the body matches the actual contents of the table. This makes it easy to check that a DML operation has the desired effect on the contents of a table, rather than always having to add another test case that runs a select on the table. For now, this section cannot be used in a test along with the RESULTS or ERRORS sections. TODO: Refactor the DML test case handling (IMPALA-4471) Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Reviewed-on: http://gerrit.cloudera.org:8080/4953 Reviewed-by: Matthew Jacobs Tested-by: Internal Jenkins --- M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test D testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test A testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test A testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test A testdata/workloads/functional-query/queries/QueryTest/kudu_update.test A testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test M tests/common/impala_test_suite.py M tests/common/test_result_verifier.py M tests/query_test/test_kudu.py M tests/util/test_file_parser.py 10 files changed, 1,624 insertions(+), 487 deletions(-) Approvals: Matthew Jacobs: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4466: Improve Kudu CRUD test coverage .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly
Michael Ho has posted comments on this change. Change subject: IMPALA-4432: Handle internal codegen disabling properly .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 161: if (state->CodegenDisabled()) { > Shouldn't this be COdegenDisabledByQueryOption() to match the message? I refactored this into a function similar to what you did in the other patch and updated the profile string based on the condition. http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS2, Line 113: Interanl > Internal Done PS2, Line 132: too_many_args > too_many_args_to_interpret? Done PS2, Line 156: AddUdfToCodegen > Udf seems like a bit of a misnomer, since it may be a builtin. Maybe "AddSc Done http://gerrit.cloudera.org:8080/#/c/5105/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS2, Line 318: disable_codegen > It would be good if we had a different name to distinguish this from the ot Renamed to disable_codegen_internal. http://gerrit.cloudera.org:8080/#/c/5105/2/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: PS2, Line 42: exec_single_node_option=[100] > It seems like we'd want to test with the single node option both enabled an Done -- To view, visit http://gerrit.cloudera.org:8080/5105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-4432: Handle internal codegen disabling properly .. IMPALA-4432: Handle internal codegen disabling properly There are some conditions in which codegen is disabled internally even if it's enabled in the query option. For instance, the single node optimization or the expression evaluation requests sent from the FE to the BE. These internal disabling of codegen are advisory as their purposes are to reduce the latency for tables with no or very few rows. The internal disabling of codegen doesn't interact well with UDFs which cannot be interpreted (e.g. IR UDF) as it conflates with the 'disable_codegen' query option set by the user. As a result, it's hard to differentiate between when codegen is disabled explicitly by users and when it is disabled internally. This change fixes the problem above by adding an explicit flag in TQueryCtx to indicate that codegen is disabled internally. This flag is only advisory. For cases in which codegen is needed to function, this internal flag is ignored and if codegen is disabled via query option, an error is thrown. For this new flag to work with ScalarFnCall, codegen needs to happen after ScalarFnCall::Prepare() because it's hard to tell if a fragment contains any UDF that cannot be interpreted until after ScalarFnCall::Prepare() is called. However, Prepare() needs the codegen object to codegen so it needs to be created before Prepare(). We can either always create the codegen module or defer codegen to a point after ScalarFnCall::Prepare(). The former has the downside of introducing unnecessary latency for say single-node optimization so the latter is implemented. It is needed as part of IMPALA-4192 any way. After this change, ScalarFnCall expressions which need to be codegen'd are inserted into a vector in RuntimeState in ScalarFnCall::Prepare(). Later in the codegen phase, these expressions' GetCodegendComputeFn() will be called after codegen for operators is done. If any of these expressions are already codegen'd indirectly by the operators, GetCodegendComputeFn() will be a no-op. This preserves the behavior that ScalarFnCall will always be codegen'd even if the fragment doesn't contain any codegen enabled operators. Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 --- M be/src/exec/aggregation-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/service/fe-support.cc M common/thrift/ImpalaInternalService.thrift M fe/src/main/java/org/apache/impala/planner/Planner.java M tests/query_test/test_udfs.py 24 files changed, 167 insertions(+), 109 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/5105/3 -- To view, visit http://gerrit.cloudera.org:8080/5105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Improve message output from run-step.sh
Alex Behm has posted comments on this change. Change subject: Improve message output from run-step.sh .. Patch Set 2: Code-Review+2 This was indeed annoying! Thanks for fixing it. -- To view, visit http://gerrit.cloudera.org:8080/5116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4493: fix string-compare-test when using clang .. IMPALA-4493: fix string-compare-test when using clang Only the 0 value or sign bit is specified in the return value for strncmp(), so fix up the test accordingly. Testing: - verified the new test still reproduces IMPALA-4436 - verify the new test passes under ASAN build Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Reviewed-on: http://gerrit.cloudera.org:8080/5110 Reviewed-by: Jim Apple Tested-by: Internal Jenkins --- M be/src/runtime/string-compare-test.cc 1 file changed, 9 insertions(+), 2 deletions(-) Approvals: Jim Apple: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Alex Behm has uploaded a new patch set (#3). Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. IMPALA-4490: Only generate runtime filters for hash join nodes. Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 2 files changed, 34 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5117/3 -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Alex Behm has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5117/2/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test: PS2, Line 1405: # IMPALA-3809 > Mistyped Jira. Oops, too many at once. Fixed. -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Michael Brown has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5117/2/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test: PS2, Line 1405: # IMPALA-3809 Mistyped Jira. -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Alex Behm has uploaded a new patch set (#2). Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. IMPALA-4490: Only generate runtime filters for hash join nodes. Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 2 files changed, 34 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5117/2 -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Alex Behm has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5117/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test: PS1, Line 1405: # IMPALA-3809 > Can you extend this easily to have a HJ node below the NLJ that does create Sure. Done. -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. IMPALA-4397,IMPALA-3259: reduce codegen time and memory A handful of fixes to codegen memory usage: * Delete the IR module when we're done with it (it can be fairly large) * Track the compiled code size (typically not that large, but it can add up if there are many fragments). * Estimate optimisation memory requirements and track it in the memory tracker. This is very crude but much better than not tracking it. A handful of fixes to improve codegen time/cost, particularly targeted at compute stats workloads: * Avoid over-inlining when there are many aggregate functions, conjuncts, etc by adding "NoInline" attributes. * Bail of out text scanner codegen for wide tables. * Don't codegen non-grouping merge aggregations. They will only process one row per Impala daemon, so codegen is not worth it. * Make the Hll algorithm more efficient by specialising the hash function based on decimal width. Limitations: * This doesn't tackle over-inlining of large expr trees, but a similar approach will be used there in a follow-on patch. Perf: Compute stats on functional_parquet.widetable_1000_cols goes from 1min+ of codegen to < 1s codegen on my machine. Local perf runs of tpc-h and targeted perf showed no regressions and some moderate improvements (1-2%). Also did an experiment to understand the perf consequences of disabling inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran: drop stats tpch_20_parquet.lineitem compute stats tpch_20_parquet.lineitem; There was no difference in time spent in the agg node: 30.7s with inlining, 30.5s without. Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/codegen/mcjit-mem-mgr.h M be/src/exec/aggregation-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.h M be/src/runtime/lib-cache.cc M be/src/runtime/runtime-state.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M tests/query_test/test_aggregation.py 29 files changed, 1,465 insertions(+), 165 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/7 -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 7: PS7 is a rebase -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5117/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test: PS1, Line 1405: # IMPALA-3809 Can you extend this easily to have a HJ node below the NLJ that does create a filter? This would confirm that filters are still created (otherwise if they're somehow disabled, this test still passes). No need if it's too noisy. -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. IMPALA-4397,IMPALA-3259: reduce codegen time and memory A handful of fixes to codegen memory usage: * Delete the IR module when we're done with it (it can be fairly large) * Track the compiled code size (typically not that large, but it can add up if there are many fragments). * Estimate optimisation memory requirements and track it in the memory tracker. This is very crude but much better than not tracking it. A handful of fixes to improve codegen time/cost, particularly targeted at compute stats workloads: * Avoid over-inlining when there are many aggregate functions, conjuncts, etc by adding "NoInline" attributes. * Bail of out text scanner codegen for wide tables. * Don't codegen non-grouping merge aggregations. They will only process one row per Impala daemon, so codegen is not worth it. * Make the Hll algorithm more efficient by specialising the hash function based on decimal width. Limitations: * This doesn't tackle over-inlining of large expr trees, but a similar approach will be used there in a follow-on patch. Perf: Compute stats on functional_parquet.widetable_1000_cols goes from 1min+ of codegen to < 1s codegen on my machine. Local perf runs of tpc-h and targeted perf showed no regressions and some moderate improvements (1-2%). Also did an experiment to understand the perf consequences of disabling inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran: drop stats tpch_20_parquet.lineitem compute stats tpch_20_parquet.lineitem; There was no difference in time spent in the agg node: 30.7s with inlining, 30.5s without. Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/codegen/mcjit-mem-mgr.h M be/src/exec/aggregation-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.h M be/src/runtime/lib-cache.cc M be/src/runtime/runtime-state.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M tests/query_test/test_aggregation.py 29 files changed, 1,465 insertions(+), 165 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/6 -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 952: RETURN_IF_ERROR(OptimizeModule()); > Is there a reason why we don't call DestroyIRModule() when OptimizeModule() No reason, it's inconsistent. I don't think it's really necessary to destroy the IR module on the error path, since it will be cleaned up shortly anyway, so I removed it on those paths to simplify things. PS4, Line 986: memory_manager_->bytes_allocated()), : memory_manager_->bytes_allocated() > The code may be easier to read if this factored into a single variable inst Done Line 1039: if (!mem_tracker_->TryConsume(estimated_memory)) { > I wonder if it's better to skip optimization if we cannot reserve the memor Compiling the code can also be pretty CPU/memory intensive so it probably isn't even safe to do that. I think the codegen memory will only be greater than the runtime memory requirement in some pretty extreme cases. E.g. in the queries I looked at the memory calculation was at most a few MB per fragment, and usually less. PS4, Line 1041: Substitute("Codegen failed to reserve '$0' bytes for optimization", : estimated_memory), : estimated_memory); > These line wraps look a bit weird to me but not sure if it's a side effect I agree they look weird but clang-format really wants to format it like this. http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS4, Line 708: optimiser > nit: optimizer We generally use a mix of American and commonwealth English spellings in comments. I guess here we're inconsistent internally so I just changed it. PS4, Line 709: 512 bytes > Mind documenting how this is derived so we can update it accordingly in cas Elaborated a bit. It's very approximate (and will become very inaccurate once functions get large). http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/mcjit-mem-mgr.h File be/src/codegen/mcjit-mem-mgr.h: PS4, Line 36: memory > This is for tracking memory consumed by the compiled machine code, right ? Reworded to clarify. http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/exec-node.h File be/src/exec/exec-node.h: PS4, Line 187: false > true ? Done http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS4, Line 762: if (ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) { : // Avoid bloating function by inlining too many exprs into it. : expr_fn->addFnAttr(llvm::Attribute::NoInline); : } > I wonder if this can be more generalized by putting it in LlvmCodegen::Fina Not a bad idea, but doesn't address the main scenario this patch helps with, which is when many small functions are inlined into a large function. http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250; > This seems a bit adhoc to me. May be better to track the size of IR functio I'd just be picking an arbitrary threshold in that case too, since I don't really have a good way to accurately estimate the cost. The advantage of doing it this way is that it's easier to understand why it was disabled and gives a user an actionable way to work around it. PS4, Line 190: const > constexpr There isn't a semantic difference between static const and static constexpr when it's an integer literal. http://en.cppreference.com/w/cpp/language/constant_expression http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS4, Line 1911: : > Is it necessary to remove it ? For comparison, please see FirstValUpdate(). I added it back in. It shouldn't be necessary to instantiate it, but this makes it more consistent. http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/expr.h File be/src/exprs/expr.h: PS4, Line 311: CODEGEN_INLINE_EXPRS_THRESHOLD > Wouldn't instruction count be a more precise estimation ? It's a pretty arbitrary threshold anyway (afaik optimisation time is only loosely correlated with instruction count) so I like that this is a little simpler and easier to understand. PS4, Line 311: const > constexpr ? I don't feel strongly but I don't think there's any advantage to constexpr when it's a simple literal. http://gerrit.cloudera.org:8080/#/c/4956/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 494: 24: required bool disable_codegen > please move up, into the common/non-union part of this struct (field 8?) a
[Impala-ASF-CR] Improve message output from run-step.sh
Henry Robinson has uploaded a new patch set (#2). Change subject: Improve message output from run-step.sh .. Improve message output from run-step.sh run-step prints a message to tell the reader what it's doing. However, that message wasn't flushed so that run-step could print OK or FAILED on the same line. The result was that long-running steps wouldn't print anything to the log until they were done, at least in Jenkins contexts. This patch changes it so that the message is flushed, and then the result is printed on a separate line (including the time it took to run the step). $ run-step "Hello world!" helloworld.out sleep 5 Hello world! (logging to /tmp/helloworld.out)... OK (Took: 0 min 5 sec) Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 --- M testdata/bin/run-step.sh 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5116/2 -- To view, visit http://gerrit.cloudera.org:8080/5116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Alex Behm has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG Commit Message: Line 21: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded > Can do - but where would that extra slowness really come from? I would have I'm assuming you are measuring response time. Since there is overall more work for the scanner to do in your dict-encoded experiment, any difference in perf will be less pronounced because it affects a relatively smaller portion of the work. With plain encoded there is no "overhead" of decoding the dictionary indexes and fetching the values from the dictionary. For a single decimal column, the work of decoding the dict indexes and fetching their values should be in the same ball park as just populating the slot directly with plain encoding, so there is roughly 50% "noise" it seems. Line 23: * No performance difference measured by introduction of extra > No, but I can do. What do you expect to change? I'm assuming you compared response times. With multi-threaded scans the loss in perf might not be apparent. With mt_dop=1 we're running the whole query in a single thread, so any slowdown along that critical path should prominently affect response time. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. IMPALA-4397,IMPALA-3259: reduce codegen time and memory A handful of fixes to codegen memory usage: * Delete the IR module when we're done with it (it can be fairly large) * Track the compiled code size (typically not that large, but it can add up if there are many fragments). * Estimate optimisation memory requirements and track it in the memory tracker. This is very crude but much better than not tracking it. A handful of fixes to improve codegen time/cost, particularly targeted at compute stats workloads: * Avoid over-inlining when there are many aggregate functions, conjuncts, etc by adding "NoInline" attributes. * Bail of out text scanner codegen for wide tables. * Don't codegen non-grouping merge aggregations. They will only process one row per Impala daemon, so codegen is not worth it. * Make the Hll algorithm more efficient by specialising the hash function based on decimal width. Limitations: * This doesn't tackle over-inlining of large expr trees, but a similar approach will be used there in a follow-on patch. Perf: Compute stats on functional_parquet.widetable_1000_cols goes from 1min+ of codegen to < 1s codegen on my machine. Local perf runs of tpc-h and targeted perf showed no regressions and some moderate improvements (1-2%). Also did an experiment to understand the perf consequences of disabling inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran: drop stats tpch_20_parquet.lineitem compute stats tpch_20_parquet.lineitem; There was no difference in time spent in the agg node: 30.7s with inlining, 30.5s without. Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/codegen/mcjit-mem-mgr.h M be/src/exec/aggregation-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.h M be/src/runtime/lib-cache.cc M be/src/runtime/runtime-state.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M tests/query_test/test_aggregation.py 29 files changed, 1,458 insertions(+), 165 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/5 -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/5115/1/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: Line 203: /// assumed to be BYTE_ARRAY, otherwise it is assumed to be FIXED_LEN_BYTE_ARRAY. Document return value? PS1, Line 328: fixed_len_size This name seems inaccurate since it can be variable. Maybe 'encoded_size'? PS1, Line 338: fixed_len_size What if fixed_len_size is negative? http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: Line 150: # Decimal Parquet encodings written by Java Parquet library. If you're going to create the table as part of the test, might as well copy the data as part of the test (and avoid the need to reload test data). E.g. like https://github.com/apache/incubator-impala/blob/master/tests/query_test/test_scanners.py#L248 http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test File testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test: PS1, Line 1: : QUERY : select col2 from decimal_encodings; : RESULTS : 123.00 : 123.00 : 123.00 : 123.00 : 123.00 : TYPES : DECIMAL > We're working on getting some richer test data - it's just a bit of a pain Sounds good, I think we need a bit more, especially boundary cases for different decimal sizes. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/5117 Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. IMPALA-4490: Only generate runtime filters for hash join nodes. Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 2 files changed, 24 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5117/1 -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] Improve message output from run-step.sh
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/5116 Change subject: Improve message output from run-step.sh .. Improve message output from run-step.sh run-step prints a message to tell the reader what it's doing. However, that message wasn't flushed so that run-step could print OK or FAILED on the same line. The result was that long-running steps wouldn't print anything to the log until they were done, at least in Jenkins contexts. This patch changes it so that the message is flushed, and then the result is printed on a separate line (including the time it took to run the step). $ run-step "Hello world!" helloworld.out sleep 5 Hello world! (logging to /tmp/helloworld.out)... OK (Took: 0 min 5 sec) Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 --- M testdata/bin/run-step.sh 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5116/1 -- To view, visit http://gerrit.cloudera.org:8080/5116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG Commit Message: Line 21: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded > Can also verify on plain encoding? Seems like that would be the extreme cas Can do - but where would that extra slowness really come from? I would have thought decoding 1 billion values would represent a large portion of the execution time of such a simple query, and if the proportion is really that small I'm inclined to worry even less about the perf differential. Line 23: * No performance difference measured by introduction of extra > Did you run this with num_scanner_threads=1 or even better mt_dop=1? No, but I can do. What do you expect to change? -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Alex Behm has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG Commit Message: Line 23: * No performance difference measured by introduction of extra Did you run this with num_scanner_threads=1 or even better mt_dop=1? -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Alex Behm has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG Commit Message: Line 21: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded Can also verify on plain encoding? Seems like that would be the extreme case. The perf difference may not be pronounced enough with dict-encoding due to the general slowness of a dictionary-encoded column with only distinct values. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Henry Robinson has posted comments on this change. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. Patch Set 2: > Thanks!Do you think I should add some code to catalog to validate the > value of 'initial_hms_cnxn_timeout_s'? What would be the acceptable > range? You could validate that it was > 0, but otherwise I don't think you should put a bound on it. Users may want to wait effectively forever, and I can see them setting 99 as a value to simulate that. > Adding this config parameter to the CM UI is another issue. Do you think it > should be added for the 5.10 release or it is not that urgent? I guess, the > default value is reasonable for most users and they can always set it as a > safety valve if they have to. Sounds like you're talking about a Cloudera release. Reminder that this is the Apache Impala project - vendor specific concerns should be discussed elsewhere. -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Tim Armstrong has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 157: if (LIKELY(dst->ptr && src.ptr)) memcpy(dst->ptr, src.ptr, src.len); Prefer explicit NULL checks. I also don't believe this change is necessary and could mask other bugs. If src.ptr is NULL, that means src.len is 0 (unless there's a bug), which means that dst->ptr should have been set to NULL by the StringVal constructor. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 1684: accumulators::count(completion_times) > accumulators::count(completion_times) can be zero, making mean and variance Can you do an explicit comparison against 0 here? Line 1701: if (accumulators::count(rates)) { And here http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/row-batch-serialize-test.cc File be/src/runtime/row-batch-serialize-test.cc: PS1, Line 164: len > int buf[x] where x is a 0-length variable at run-time is UB How about using a vector or string here instead to avoid the extra branch. http://gerrit.cloudera.org:8080/#/c/5082/1/bin/run-backend-tests.sh File bin/run-backend-tests.sh: Line 41: export PATH="${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/bin:${PATH}" > http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#stack-traces-and It's a bit unfortunately there isn't a cleaner way to do this. -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test File testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test: PS1, Line 1: : QUERY : select col2 from decimal_encodings; : RESULTS : 123.00 : 123.00 : 123.00 : 123.00 : 123.00 : TYPES : DECIMAL We're working on getting some richer test data - it's just a bit of a pain to write it out. This will include > 1 dict-encoded value, and some values that are plain encoded. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/5115 Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner * Extend metadata checks to allow more than one possible physical type for a given logical type. * Change decimal decoding to handle non-fixed-length format in same path as fixed-length encoding. Testing: * Query test that decodes dictionary-encoded decimals using binary encoding. Perf: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded decimal(12,2) values using FIXED_BYTE_ARRAY physical type encoding. * No performance difference measured by introduction of extra predictable branch to Decode() path. Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M testdata/bin/create-load-data.sh A testdata/data/byte_array_decimal_dict_encoded.parquet A testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test M tests/query_test/test_scanners.py 7 files changed, 118 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5115/1 -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0. .. IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0. Our NumericLiteral is backed by a BigDecimal which cannot represent the special float values NaN, infinity or negative zero. As a result, when evaluating constant expressions from the FE we hit an exception when trying to create a NumericLiteral from a NaN or infinity value. Before, negative zero would silently get converted to zero which is dangerous. The fix is to treat the expr evaluation as a failure and not replace the constant Expr with a LiteralExpr. Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230 Reviewed-on: http://gerrit.cloudera.org:8080/5050 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test 3 files changed, 32 insertions(+), 3 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0. .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Alex Behm has posted comments on this change. Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Matthew Jacobs has uploaded a new patch set (#3). Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting Adds support in the shell to report the number of modified rows for all DML operations, as well as the number of rows with errors. Testing: Added shell tests. Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 --- M be/src/service/impala-beeswax-server.cc M common/thrift/ImpalaService.thrift M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_commandline.py 5 files changed, 102 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/5103/3 -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/5103/2/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 260: // Number of row operations attempted but not modified due to errors reported by the > Suggest some minor rephrasing: Done http://gerrit.cloudera.org:8080/#/c/5103/2/shell/impala_shell.py File shell/impala_shell.py: Line 928: error_report = ", %d row error(s)" % (num_row_errors) > Don't we need to check somewhere whether num_row_errors was set in the resp Yes, this wasn't handled yet, thx. I added a test. -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types
Alex Behm has posted comments on this change. Change subject: IMPALA-3812: Fix error message for unsupported types .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Alex Behm has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation .. Patch Set 5: (4 comments) Almost ready for +1 http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 445: /// Writes the next value into the next slot in the *tuple using pool if necessary. ... into the destination slot in tuple ... (it's not the "next" slot, it is always the same slot for a particular column reader) Line 480: } else if (UNLIKELY(NeedsConversion() && Is it possible that an otherwise invalid value becomes valid after conversion? http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: Line 512: /// the next slot in *tuple. ... into the appropriate destination slot in 'tuple'. (it's not the "next" slot, it is always the same slot for a particular column reader) http://gerrit.cloudera.org:8080/#/c/4968/5/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test File testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test: Line 5: DROP TABLE IF EXISTS invalid_parquet_timestamp; setting up the table is already done in the .py file, right? -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3342: Add thread counters to monitor plan fragment execution
anujphadke has posted comments on this change. Change subject: IMPALA-3342: Add thread counters to monitor plan fragment execution .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/4633/8/be/src/runtime/plan-fragment-executor.cc File be/src/runtime/plan-fragment-executor.cc: PS8, Line 42: #include "runtime/runtime-state. > what do you need this for? This came from an older patch. Removed. http://gerrit.cloudera.org:8080/#/c/4633/8/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: PS8, Line 137: rage_wait_timer_ = > Does it work if you pass an empty string? I think "PlanFragmentThreads" is Done http://gerrit.cloudera.org:8080/#/c/4633/8/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS8, Line 323: total_thread_statistics > how about calling this total_thread_statistics_ or similar? Done -- To view, visit http://gerrit.cloudera.org:8080/4633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3342: Add thread counters to monitor plan fragment execution
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4633 to look at the new patch set (#9). Change subject: IMPALA-3342: Add thread counters to monitor plan fragment execution .. IMPALA-3342: Add thread counters to monitor plan fragment execution This change removes the use of total_cpu_timer which incorrectly monitors the CPU time. Adding THREAD_COUNTERS to measure the user and sys time in plan fragment execution. This also accounts for the time spent in the hdfs/kudu scanner and in a blocking join. Snippet of a query plan with the newly added PlanFragment THREAD_COUNTERS: Instance 2b40b101e2626e7a:a3d8f23 - PeakMemoryUsage: 32.02 KB (32784) - PerHostPeakMemUsage: 430.52 MB (451431312) - RowsProduced: 1 (1) - TotalNetworkReceiveTime: 10s379ms - TotalNetworkSendTime: 0.000ns - TotalStorageWaitTime: 0.000ns - TotalWallClockTime: 10s577ms - SysTime: 8.000ms - UserTime: 8.000ms - VoluntaryContextSwitches: 80 (80) Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 --- M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/plan-fragment-executor.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h 7 files changed, 17 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4633/9 -- To view, visit http://gerrit.cloudera.org:8080/4633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types
Hello Internal Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4859 to look at the new patch set (#3). Change subject: IMPALA-3812: Fix error message for unsupported types .. IMPALA-3812: Fix error message for unsupported types Before this patch an unclear error message was returned if DATE or DATETIME appeared in the select list after a star expansion. This was because DATE and DATETIME PrimitiveType was serialized as INVALID_TYPE. This is fixed by serializing correctly. Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 --- M fe/src/main/java/org/apache/impala/catalog/PrimitiveType.java M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/UnsupportedTypes/data.csv M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/misc.test 6 files changed, 35 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/4859/3 -- To view, visit http://gerrit.cloudera.org:8080/4859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Dan Hecht has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (22 comments) what's the plan for keeping this compiling? http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: PS1, Line 95: synbol typo http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/data-source-scan-node.cc File be/src/exec/data-source-scan-node.cc: PS1, Line 150: ) add "!= 0" as we try to avoid implicit comparisons against zero conversion to bool. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/hdfs-avro-scanner-ir.cc File be/src/exec/hdfs-avro-scanner-ir.cc: Line 276: *decimal >>= bytes_to_fill * 8; > If bytes_to_fill >= sizeof(decimal), this is UB since len comes from the data file, we could also have bytes_to_fill < 0, no? isn't that also undefined? Line 278: *decimal = 0; can this happen with well-formed avro? if not, this be a return false case. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1379: double percent = total_rows * 100 / std::max(1L, num_input_rows); > num_input_rows is sometimes 0. if that were the case, we would have seen crashes. In any case, when num_input_rows == 0, shouldn't we set precent to either 0 or 100? or i guess that happens since it implies total_rows == 0, but would probably be more clear what's going on to just write: percent = num_input_rows == 0 ? 0 : total_rows * 100 / num_input_rows; http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/read-write-util.cc File be/src/exec/read-write-util.cc: PS1, Line 97: * nit: we put the * next to type. Line 100: memcpy(&uinteger, &integer, sizeof(integer)); > type punning is UB. using memcpy doesn't seem necessary. can't we just write: uint32_t uinteger = integer; uinteger = (uinteger << 1) | (uinteger >> 31); neither of those stmts are undefined. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/bit-byte-functions-ir.cc File be/src/exprs/bit-byte-functions-ir.cc: PS1, Line 151: v * why is this necessary? and does it generate the same code (let's make sure there is no multiply)? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: Line 820: if (!haystack) return NULL; okay but this check is still unnecessary - no loop iterations will be taken. let's avoid the extraneous cmp/branch. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: PS1, Line 804: t nit: "t != nullptr" per our style. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/raw-value.cc File be/src/runtime/raw-value.cc: Line 161: if (dest->len) memcpy(dest->ptr, src->ptr, dest->len); Don't we return a non-null ptr (zero_length_region_) in this case to avoid the need for this check? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 975: if (string_val->ptr) memcpy(dest, string_val->ptr, string_val->len); != nullptr http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/string-value.inline.h File be/src/runtime/string-value.inline.h: Line 58: const int result = len ? strncmp(s1, s2, len) : 0; > if either s1 or s2 is nullptr, this is UB len != 0 though, we may define StringCompare() to have the same semantics, making this unnecessary. e.g. Compare() wrapper, though maybe other callsites can pass nulls? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/tuple.h File be/src/runtime/tuple.h: Line 75: if (tuple_desc.num_null_bytes()) { != 0 http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/udf/udf.cc File be/src/udf/udf.cc: Line 487: if (LIKELY(len && !result.is_null)) { len != 0 though this may be another place we expect to always have a valid pointer (as long as i!s_null). We may need to switch the order of the FLAGS_stress_free_pool_alloc code in free-pool and its check for 0, however. That said, I'm okay with adding the len check here since we don't make the non-null assumption in the StringVal::StringVal(FunctionContext* context, int len) constructor itself either. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 36: if (bit_width >= CHAR_BIT * sizeof(uint32_t)) return ~0; > lshift past the end is UB But where would we ever do that? i.e. shouldn't this be a precondition to this routine (DCHECK)? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bitmap.h File be/src/util/bitmap.h: Line 78: if (buffer_.size()) memset(buffer_.data(), 255 * b, buffer_.size() * sizeof(uint64_t)); != 0 http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/coding-util-test.cc File be/src/util/coding-util-test.cc: Lin
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Alex Behm has posted comments on this change. Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/5103/2/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 260: // Number of row operations attempted but not modified due to errors reported by the Suggest some minor rephrasing: Number of row operations attempted but not completed due to non-fatal errors reported by the storage engine that Impala treats as warnings. http://gerrit.cloudera.org:8080/#/c/5103/2/shell/impala_shell.py File shell/impala_shell.py: Line 928: error_report = ", %d row error(s)" % (num_row_errors) Don't we need to check somewhere whether num_row_errors was set in the response thrift struct? This still seems to add the error report for all DML statements, but maybe I'm missing something. -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4466: Improve Kudu CRUD test coverage .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Michael Ho has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 952: RETURN_IF_ERROR(OptimizeModule()); Is there a reason why we don't call DestroyIRModule() when OptimizeModule() fails ? PS4, Line 986: memory_manager_->bytes_allocated()), : memory_manager_->bytes_allocated() The code may be easier to read if this factored into a single variable instead. Line 1039: if (!mem_tracker_->TryConsume(estimated_memory)) { I wonder if it's better to skip optimization if we cannot reserve the memory to do so than punting back to interpretation ? PS4, Line 1041: Substitute("Codegen failed to reserve '$0' bytes for optimization", : estimated_memory), : estimated_memory); These line wraps look a bit weird to me but not sure if it's a side effect of clang-format ? http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS4, Line 708: optimiser nit: optimizer PS4, Line 709: 512 bytes Mind documenting how this is derived so we can update it accordingly in case we change the optimization level in the future ? http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/mcjit-mem-mgr.h File be/src/codegen/mcjit-mem-mgr.h: PS4, Line 36: memory This is for tracking memory consumed by the compiled machine code, right ? http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/exec-node.h File be/src/exec/exec-node.h: PS4, Line 187: false true ? http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS4, Line 762: if (ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) { : // Avoid bloating function by inlining too many exprs into it. : expr_fn->addFnAttr(llvm::Attribute::NoInline); : } I wonder if this can be more generalized by putting it in LlvmCodegen::FinalizeFunction() instead and use instruction count to determine whether the function is too large to be worth being inlined. Of course, this may be a problem if we intentionally inline a big function into the cross-compiled code and rely on constant folding to eliminate 90% of the instructions in the function. In which case, we may need to do a quick constant propagation pass first in FinalizeFunction(). Just a thought. http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250; This seems a bit adhoc to me. May be better to track the size of IR functions and don't use it if it gets too large. Empirically, the time to generate the IR is rather small compared to the optimization time so the wasted work may not be too costly and hopefully, the large IR function is not a common case. PS4, Line 190: const constexpr http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: PS4, Line 85: Oops. My bad. http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS4, Line 1911: : Is it necessary to remove it ? For comparison, please see FirstValUpdate(). http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/expr.h File be/src/exprs/expr.h: PS4, Line 311: const constexpr ? PS4, Line 311: CODEGEN_INLINE_EXPRS_THRESHOLD Wouldn't instruction count be a more precise estimation ? -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3882: Simplify some query exec state locking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3882: Simplify some query exec state locking .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/4935/6/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 40: using namespace strings; This shouldn't be needed, we have 'using strings::Substitute' in names.h now. http://gerrit.cloudera.org:8080/#/c/4935/4/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 82: /// 2. session_state_map_lock_ > Similar-ish, but not as acute. We don't have the problem where we have this Sounds like we should file a JIRA for this. http://gerrit.cloudera.org:8080/#/c/4935/6/be/src/service/query-exec-state.h File be/src/service/query-exec-state.h: Line 79: void WaitForMetadataAvailable() { metadata_available_.Get(); } I think the new name is a bit misleading, because the metadata may not be available if an error is encountered in the meantime. There's an undocumented requirement to check the query status after calling this function. -- To view, visit http://gerrit.cloudera.org:8080/4935 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I516357d2b5e9eb83e8209872cbe4c078c778a629 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4466: Improve Kudu CRUD test coverage .. Patch Set 12: (1 comment) Yes, the only changes since the last time it was +2ed are the three new test files (kudu_insert.test, kudu_delete.test, and kudu_update.test), the changes to add these and remove kudu_crud.test from test_kudu.py, and moving some tests from kudu_crud.test into kudu_create.test. http://gerrit.cloudera.org:8080/#/c/4953/11/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test: Line 39: insert into tdata values > wrap long line Done -- To view, visit http://gerrit.cloudera.org:8080/4953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage
Hello Matthew Jacobs, Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4953 to look at the new patch set (#12). Change subject: IMPALA-4466: Improve Kudu CRUD test coverage .. IMPALA-4466: Improve Kudu CRUD test coverage The results in the test files were verified by hand. This patch also introduces a new test section 'DML_RESULTS', which takes the name of a table as a comment and the contents of the table as its body and then verifies that the body matches the actual contents of the table. This makes it easy to check that a DML operation has the desired effect on the contents of a table, rather than always having to add another test case that runs a select on the table. For now, this section cannot be used in a test along with the RESULTS or ERRORS sections. TODO: Refactor the DML test case handling (IMPALA-4471) Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 --- M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test D testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test A testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test A testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test A testdata/workloads/functional-query/queries/QueryTest/kudu_update.test A testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test M tests/common/impala_test_suite.py M tests/common/test_result_verifier.py M tests/query_test/test_kudu.py M tests/util/test_file_parser.py 10 files changed, 1,624 insertions(+), 487 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4953/12 -- To view, visit http://gerrit.cloudera.org:8080/4953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4477: Upgrade Kudu version to latest master .. IMPALA-4477: Upgrade Kudu version to latest master Change the toolchain build and Kudu version to use the latest master, using Kudu commit e836ac. Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Reviewed-on: http://gerrit.cloudera.org:8080/5106 Reviewed-by: Matthew Jacobs Tested-by: Internal Jenkins --- M bin/impala-config.sh 1 file changed, 5 insertions(+), 4 deletions(-) Approvals: Matthew Jacobs: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4477: Upgrade Kudu version to latest master .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Jim Apple has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 1: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/490/ -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Dan Hecht has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc File be/src/runtime/string-compare-test.cc: Line 34: EXPECT_EQ(stringcompare_r, 0); > But the args to EXPECT_EQ don't include the args to RunTestCase: the actual Oops, sorry, didn't read that carefully enough. Done. -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5110 to look at the new patch set (#2). Change subject: IMPALA-4493: fix string-compare-test when using clang .. IMPALA-4493: fix string-compare-test when using clang Only the 0 value or sign bit is specified in the return value for strncmp(), so fix up the test accordingly. Testing: - verified the new test still reproduces IMPALA-4436 - verify the new test passes under ASAN build Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 --- M be/src/runtime/string-compare-test.cc 1 file changed, 9 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5110/2 -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Jim Apple has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc File be/src/runtime/string-compare-test.cc: Line 34: EXPECT_EQ(stringcompare_r, 0); > EXPECT_EQ already prints its args (stringcompare_r), and the exact value of But the args to EXPECT_EQ don't include the args to RunTestCase: the actual strings being compared. -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Dan Hecht has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc File be/src/runtime/string-compare-test.cc: Line 34: EXPECT_EQ(stringcompare_r, 0); > Failures here would be easier to diagnose with the addition of " << l << ' EXPECT_EQ already prints its args (stringcompare_r), and the exact value of strncmp_r isn't interesting (only whether it is 0, neg, pos) which is implied by the EXPECT case. -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Michael Ho has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true); > We could probably do it in the backend but the planner seemed like the appr Sure. This seems to set things up so the planner can selectively disable codegen in exec node the future. While you are at it, would you mind using the same infrastructure to disable codegen for non-merging exchange ? -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Jim Apple has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc File be/src/runtime/string-compare-test.cc: Line 34: EXPECT_EQ(stringcompare_r, 0); Failures here would be easier to diagnose with the addition of " << l << ' ' << r" -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/4956/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 494: 24: required bool disable_codegen please move up, into the common/non-union part of this struct (field 8?) and renumber the rest. http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true); > We could probably do it in the backend but the planner seemed like the appr i agree, the planner is the right place for making static decisions about execution. -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/4956/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1919: template void AggregateFunctions::HllUpdate( Reinstate this. http://gerrit.cloudera.org:8080/#/c/4956/1/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 310: /// evaluation into the calling function. Remove partially implemented stuff. http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true); > A high level question: is there a reason why this decision cannot be made i We could probably do it in the backend but the planner seemed like the appropriate place since the decision is based on the plan shape and expected cost. -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Michael Ho has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true); A high level question: is there a reason why this decision cannot be made in PartitionedAggregationNode::Codegen() ? I suppose there is way for an Agg node to know whether it's a merging Agg node, right ? -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4493: fix string-compare-test when using clang .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4432: Handle internal codegen disabling properly .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 161: if (state->CodegenDisabled()) { Shouldn't this be COdegenDisabledByQueryOption() to match the message? http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS2, Line 113: Interanl Internal PS2, Line 132: too_many_args too_many_args_to_interpret? PS2, Line 156: AddUdfToCodegen Udf seems like a bit of a misnomer, since it may be a builtin. Maybe "AddScalarFnToCodegen()"? http://gerrit.cloudera.org:8080/#/c/5105/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS2, Line 318: disable_codegen It would be good if we had a different name to distinguish this from the other disable_codegen flag. E.g. hint_disable_codegen? http://gerrit.cloudera.org:8080/#/c/5105/2/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: PS2, Line 42: exec_single_node_option=[100] It seems like we'd want to test with the single node option both enabled and disabled -- To view, visit http://gerrit.cloudera.org:8080/5105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang
Dan Hecht has uploaded a new change for review. http://gerrit.cloudera.org:8080/5110 Change subject: IMPALA-4493: fix string-compare-test when using clang .. IMPALA-4493: fix string-compare-test when using clang Only the 0 value or sign bit is specified in the return value for strncmp(), so fix up the test accordingly. Testing: - verified the new test still reproduces IMPALA-4436 - verify the new test passes under ASAN build Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 --- M be/src/runtime/string-compare-test.cc 1 file changed, 9 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5110/1 -- To view, visit http://gerrit.cloudera.org:8080/5110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht
[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4466: Improve Kudu CRUD test coverage .. Patch Set 11: Code-Review+2 (1 comment) Reviewed the new test cases that were added. I assume there weren't other changes to the UPSERT tests or the python infra code, right? http://gerrit.cloudera.org:8080/#/c/4953/11/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test: Line 39: insert into tdata values (3, cast('nan' as float), max_bigint(), '', true, min_tinyint(), max_smallint(), cast('-inf' as double)) wrap long line -- To view, visit http://gerrit.cloudera.org:8080/4953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] PREVIEW: IMPALA-1788: Fold constant expressions during plan generation.
Alex Behm has posted comments on this change. Change subject: PREVIEW: IMPALA-1788: Fold constant expressions during plan generation. .. Patch Set 1: There are still some failing tests because of expected warnings from expr evaluations. I still need to address those, probably by bailing on constant folding if there is any warning/error. -- To view, visit http://gerrit.cloudera.org:8080/5109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-HasComments: No
[Impala-ASF-CR] PREVIEW: IMPALA-1788: Fold constant expressions during plan generation.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/5109 Change subject: PREVIEW: IMPALA-1788: Fold constant expressions during plan generation. .. PREVIEW: IMPALA-1788: Fold constant expressions during plan generation. Adds a new ExprRewriteRule for replacing constant expressions with their literal equivalent via BE evaluation. Applies the rewrite during part of plan generation. After this patch the state of expression rewriting is: 1. A first phase of rewriting is performed on the analyzed parse tree. The modified parse tree is reset() and analyzed again. We need to apply some rewrites at this stage because the transformations may affect conjunct registration (e.g., may generate new conjuncts). The rewrites in this phase are applied to the unresolved expressions of the parse tree. 2. A second phase of rewriting is performed during plan generation (now only constant folding). Doing rewrites at this stage has the benefit of being able to transform the fully resolved exprs. It also implies that we can perform rewrites without changing the output type of exprs, as opposed to the the first phase where types may change due to the re-analysis. This ensures that the optimizations have no user-visible effect. This patch includes the following interesting changes: - Introduces a timestamp literal that can only be produced by constant folding (not expressible directly via SQL). - Modifies PlanNode.init() and all its implementations. The rewrites are performed in this function because predicate pushdown depends on constant folding for some scan nodes, e.g., pushing predicates to Kudu/HBase. - Fixes an existing issue with converting strings between the FE/BE. String produced in the BE that have characters with a value > 127 are not correctly deserialized into a Java String via thrift. We detect this case during constant folding and abandon folding of such exprs. - Cleans up ExprContext::GetValue() into ExprContext::GetConstantValue() which clarifies its only use of evaluating exprs from the FE. Testing: - Modifies expr-test.cc to run all tests through the constant folding path. - Adds basic planner and rewrite rule tests. Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9 --- M be/src/exprs/expr-context.cc M be/src/exprs/expr-context.h M be/src/exprs/expr-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/service/fe-support.cc M common/thrift/Exprs.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java A fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/planner/UnnestNode.java M fe/src/main/java/org/apache/impal
[Impala-ASF-CR] IMPALA-4216: Test became flaky: TestTpchMemLimitError.test low mem limit q20
Dan Hecht has posted comments on this change. Change subject: IMPALA-4216: Test became flaky: TestTpchMemLimitError.test_low_mem_limit_q20 .. Patch Set 1: Tim, maybe this was one of the dropped scanner status that you've fixed? -- To view, visit http://gerrit.cloudera.org:8080/4572 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I471a90c8e3bdecb4fed08fbae3436c50b8618471 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4216: Test became flaky: TestTpchMemLimitError.test low mem limit q20
Dan Hecht has posted comments on this change. Change subject: IMPALA-4216: Test became flaky: TestTpchMemLimitError.test_low_mem_limit_q20 .. Patch Set 1: > > But why do we no longer get the MEM_LIMIT_EXCEEDED error status > > returned by this query? > > I could not reproduce the issue. My guess is that we get the > MEM_LIMIT_EXCEEDED error status but it doesn't get added to the > error_log. Probably when we call LogError(status.msg), ErrorCount() > is greater than max_errors: > > bool RuntimeState::LogError(const ErrorMsg& message, int > vlog_level) { > lock_guard l(error_log_lock_); > // All errors go to the log, unreported_error_count_ is counted > independently of the > // size of the error_log to account for errors that were already > reported to the > // coordinator > VLOG(vlog_level) << "Error from query " << query_id() << ": " << > message.msg(); > if (ErrorCount(error_log_) < query_options().max_errors) { > AppendError(&error_log_, message); > return true; > } > return false; > } > > I could be mistaken, I'm not really familiar with how error > handling is implemented. The query status doesn't come from the error log, so I don't think the limit should have an impact here. It seems we are dropping a returned mem limit exceeded status somewhere, and we should fix that. Do we still have the log file for a failed run? If so, the log should show the callstack in which the MEM_LIMIT_EXCEEDED status object was created, which may give us a clue as to where it's getting dropped. If we don't have the log, we may need to wait for this to repro. -- To view, visit http://gerrit.cloudera.org:8080/4572 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I471a90c8e3bdecb4fed08fbae3436c50b8618471 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4216: Test became flaky: TestTpchMemLimitError.test low mem limit q20
Attila Jeges has posted comments on this change. Change subject: IMPALA-4216: Test became flaky: TestTpchMemLimitError.test_low_mem_limit_q20 .. Patch Set 1: > But why do we no longer get the MEM_LIMIT_EXCEEDED error status > returned by this query? I could not reproduce the issue. My guess is that we get the MEM_LIMIT_EXCEEDED error status but it doesn't get added to the error_log. Probably when we call LogError(status.msg), ErrorCount() is greater than max_errors: bool RuntimeState::LogError(const ErrorMsg& message, int vlog_level) { lock_guard l(error_log_lock_); // All errors go to the log, unreported_error_count_ is counted independently of the // size of the error_log to account for errors that were already reported to the // coordinator VLOG(vlog_level) << "Error from query " << query_id() << ": " << message.msg(); if (ErrorCount(error_log_) < query_options().max_errors) { AppendError(&error_log_, message); return true; } return false; } I could be mistaken, I'm not really familiar with how error handling is implemented. -- To view, visit http://gerrit.cloudera.org:8080/4572 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I471a90c8e3bdecb4fed08fbae3436c50b8618471 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3838: Codegen EvalRuntimeFilters(). .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4833/1/be/src/exec/hdfs-parquet-scanner-ir.cc File be/src/exec/hdfs-parquet-scanner-ir.cc: Line 73: if (!filter_ctxs_[i]->Eval(row)) { > Actually, RuntimeFilter::Eval() will return true if the filter hasn't arriv I had it reversed - as far as I can see, before filter arrival, it will increment 'considered' and not 'rejected' for every row processed even though the filter is not available yet. I couldn't convince myself that the code was right, so I added some logging and it looks like it's doing something that doesn't make sense: IMPALA-4495 http://gerrit.cloudera.org:8080/#/c/4833/2/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: PS2, Line 61: tuple row Is it a tuple row? I though it was a raw value pointer? -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4477: Upgrade Kudu version to latest master .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5106 to look at the new patch set (#2). Change subject: IMPALA-4477: Upgrade Kudu version to latest master .. IMPALA-4477: Upgrade Kudu version to latest master Change the toolchain build and Kudu version to use the latest master, using Kudu commit e836ac. Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 --- M bin/impala-config.sh 1 file changed, 5 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/5106/2 -- To view, visit http://gerrit.cloudera.org:8080/5106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4477: Upgrade Kudu version to latest master .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5106/1/bin/impala-config.sh File bin/impala-config.sh: PS1, Line 55: generated by the : # native-toolchain build when publishing its build artifacts > Can we expand this comment to indicate where this generated id can be found Done -- To view, visit http://gerrit.cloudera.org:8080/5106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4477: Upgrade Kudu version to latest master .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5106/1/bin/impala-config.sh File bin/impala-config.sh: PS1, Line 55: generated by the : # native-toolchain build when publishing its build artifacts Can we expand this comment to indicate where this generated id can be found? -- To view, visit http://gerrit.cloudera.org:8080/5106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/5106 Change subject: IMPALA-4477: Upgrade Kudu version to latest master .. IMPALA-4477: Upgrade Kudu version to latest master Change the toolchain build and Kudu version to use the latest master, using Kudu commit e836ac. Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 --- M bin/impala-config.sh 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/5106/1 -- To view, visit http://gerrit.cloudera.org:8080/5106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting Adds support in the shell to report the number of modified rows for all DML operations, as well as the number of rows with errors. Testing: Added shell tests. Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 --- M be/src/service/impala-beeswax-server.cc M common/thrift/ImpalaService.thrift M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_commandline.py 5 files changed, 92 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/5103/2 -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5103/1/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: Line 532: insert_result->__set_num_row_errors(num_row_errors); > Can we leave this unset for DML where reporting the num_row_errors doesn't Done http://gerrit.cloudera.org:8080/#/c/5103/1/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 260: // Number of rows that weren't modified due to errors. Only applies to Kudu tables. > Can you qualify errors more precisely? These are constraint violations corr Done http://gerrit.cloudera.org:8080/#/c/5103/1/shell/impala_shell.py File shell/impala_shell.py: Line 928: errors_stmt = ", %d row error(s)" % (num_row_errors) > Let's avoid "stmt" since that is a somewhat overloaded abbreviation (aka SQ Done -- To view, visit http://gerrit.cloudera.org:8080/5103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. Patch Set 3: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/487/ -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. Patch Set 3: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/486/ -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. Patch Set 3: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/485/ -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. Patch Set 3: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/483/ -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0. .. Patch Set 5: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/484/ -- To view, visit http://gerrit.cloudera.org:8080/5050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. Patch Set 3: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4969 to look at the new patch set (#3). Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. IMPALA-4392: restore PeakMemoryUsage to DataSink profiles The join build sink patches refactored the DataSink interface and inadvertently removed this counter from the profile. The problem was that the sink MemTracker was not initialized with the sink's profile. The fix is for the sink to create the MemTracker itself. Testing: Ran core tests. Manually checked profile to make sure the counter appeared in HdfsTableSink, DataStreamSender, etc. Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 --- M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/plan-root-sink.cc M be/src/exec/row-batch-cache.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/plan-fragment-executor.h 19 files changed, 64 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4969/3 -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0. .. Patch Set 5: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/482/ -- To view, visit http://gerrit.cloudera.org:8080/5050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Attila Jeges has posted comments on this change. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. Patch Set 2: > Thanks! > > Do you think I should add some code to catalog to validate the > value of 'initial_hms_cnxn_timeout_s'? What would be the acceptable > range? Also, what is the recommended way to validate a config parameter in catalog? A simple Preconditions.checkArgument() call would be sufficient? Adding this config parameter to the CM UI is another issue. Do you think it should be added for the 5.10 release or it is not that urgent? I guess, the default value is reasonable for most users and they can always set it as a safety valve if they have to. -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No