[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 05:16:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test Expected results from spillable-bufe-sizing.test and max-row-size.test included incorrect expressions: the un-anayzed GROUP BY expression with ordinals represented as (invalid) casts. SelectStmt is a bit of a mess. There are two copies of the grouping expressions. Here we want to use the analyzed version with the ordinals replaced. Testing: * Problem found when running PlannerTest. PlannerTest now passes, with correct results, after this change. * Turns out there is another path used when generating SQL for a view which does toSql() on an unanalyzed query. Added unit tests for this case. Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Reviewed-on: http://gerrit.cloudera.org:8080/11993 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 4 files changed, 88 insertions(+), 14 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Regenerate missing configuration files in run-all.sh
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12006 ) Change subject: Regenerate missing configuration files in run-all.sh .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1464/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f5955574d304b343e904851cfb9081648e350f8 Gerrit-Change-Number: 12006 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 29 Nov 2018 02:12:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1465/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 02:58:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/exec/plan-root-sink.cc@79 PS7, Line 79: while (results_ == nullptr && !state->is_cancelled()) sender_cv_.Wait(l); > Should we count the time spent here as inactive_time ? This seems to be the Yeah I like that idea. http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/runtime/coordinator.cc@234 PS7, Line 234: optional in the thrift > Should they become required then ? I'm not sure about the schema evolution rules - I think it's probably ok to switch an optional field to required if all of the producers were already setting it. I actually kind of think that optional is the right thing and it's a bug/oversight in impala-shell that it doesn't handle this properly. Maybe the ideal approach would be to fix impala-shell and at some point later make the corresponding change here, once we think it's unlikely that older versions of impala-shell will be used against the Impala version. I can file a JIRA and add a comment here if you agree. (I don't think we can be confident that older versions of impala-shell won't be used against a newer cluster, although I don't think we've ever had a clear compatibility policy stating that this should work) http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.h@307 PS7, Line 307: plan node > data sink Done http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc@962 PS7, Line 962: void RuntimeProfile::SetPlanNodeId(int node_id) { > Should we DCHECK that "data_sink_id" is not set here ? Done http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc@966 PS7, Line 966: void RuntimeProfile::SetDataSinkId(int sink_id) { > Should we DCHECK that "plan_node_id" is not set here ? Done http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/summary-util.cc File be/src/util/summary-util.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/summary-util.cc@79 PS7, Line 79: LOG(INFO) << label_ss.str() << " " << node.node_id; > Is this for debugging only ? Done http://gerrit.cloudera.org:8080/#/c/11967/7/common/thrift/RuntimeProfile.thrift File common/thrift/RuntimeProfile.thrift: http://gerrit.cloudera.org:8080/#/c/11967/7/common/thrift/RuntimeProfile.thrift@67 PS7, Line 67: // Set if this node corresponds to a plan node. : 1: optional i32 plan_node_id : : // Set if this node corresponds to a data sink. : 2: optional i32 data_sink_id > Are these two not supposed to be set at the same time ? if so, can you plea Done -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 02:26:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Hello Michael Ho, Lars Volker, Philip Zeyliger, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11967 to look at the new patch set (#8). Change subject: IMPALA-1048: show sinks in exec summary .. IMPALA-1048: show sinks in exec summary The exec summary now includes the total time taken and memory consumed by the data sink at the root of each fragment. Previously the exec summary could hide where time and memory went while executing a query. The high-level changes are: * Generalising logic in the exec summary and runtime profile to handle data sinks, not just plan nodes, including adding richer metadata to runtime profile nodes. * Threading through metadata about the data sinks, like names and estimates, so that it can appear in the exec summary. The major potential downside is that the new timings reported for data stream sender can overlap with the receiver's time and potentially cause confusion. [localhost:21000] default> select count(distinct l_comment) from tpch_parquet.lineitem; summary; Query: select count(distinct l_comment) from tpch_parquet.lineitem Query submitted at: 2018-11-20 16:47:03 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=f5464383a3bb6878:54b5252b +---+ | count(distinct l_comment) | +---+ | 4580667 | +---+ Fetched 1 row(s) in 4.13s +-++--+--+---++---+---+---+ | Operator| #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail| +-++--+--+---++---+---+---+ | F02:ROOT| 1 | 59.11ms | 59.11ms | || 0 B | 0 B | | | 06:AGGREGATE| 1 | 274.24us | 274.24us | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE | | 05:EXCHANGE | 1 | 75.16us | 75.16us | 3 | 1 | 32.00 KB | 16.00 KB | UNPARTITIONED | | F01:EXCHANGE SENDER | 3 | 119.53us | 146.28us | || 16.00 KB | 0 B | | | 02:AGGREGATE| 3 | 19.26ms | 19.89ms | 3 | 1 | 16.00 KB | 10.00 MB | | | 04:AGGREGATE| 3 | 1.06s| 1.07s| 4.58M | 4.65M | 96.02 MB | 62.63 MB | | | 03:EXCHANGE | 3 | 243.91ms | 246.44ms | 5.01M | 4.65M | 464.00 KB | 10.12 MB | HASH(l_comment) | | F00:EXCHANGE SENDER | 3 | 2.41s| 2.55s| || 337.53 KB | 0 B | | | 01:AGGREGATE| 3 | 1.05s| 1.14s| 5.01M | 4.65M | 97.20 MB | 121.17 MB | STREAMING | | 00:SCAN HDFS| 3 | 37.88ms | 41.28ms | 6.00M | 6.00M | 27.88 MB | 80.00 MB | tpch_parquet.lineitem | +-++--+--+---++---+---+---+ Testing: Added a basic observability test. Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 --- M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/summary-util.cc M common/thrift/DataSinks.thrift M common/thrift/ExecStats.thrift M common/thrift/RuntimeProfile.thrift M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M shell/impala_client.py M tests/beeswax/impala_beeswax.py M tests/query_test/test_observability.py 36 files changed, 328 insertions(+), 123 deletions(-) git
[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11227 ) Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@384 PS6, Line 384: OpType.GET_FILE_BLOCK_LOCATIONS.getSymbol())); Can we assert non-zero list status calls? (same below) http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@392 PS6, Line 392: OpType.GET_FILE_BLOCK_LOCATIONS.getSymbol())); > Can't quite explain yet what the test is doing. Did trace though the code a I tried this patch locally and this test passes regardless of the if/else bug and that's kinda weird. I think we should narrow down the problem before committing this patch. -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 29 Nov 2018 02:03:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Regenerate missing configuration files in run-all.sh
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12006 Change subject: Regenerate missing configuration files in run-all.sh .. Regenerate missing configuration files in run-all.sh When bin/clean.sh runs, it removes some configuration files in fe/src/test/resources that are required for the minicluster to function. This changes testdata/bin/run-all.sh to detect when these configurations are missing and regenerate them. It detects issues due to running clean.sh, which is the most common case. It is not intended to fix anything else. Change-Id: I9f5955574d304b343e904851cfb9081648e350f8 --- M testdata/bin/run-all.sh 1 file changed, 10 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/12006/1 -- To view, visit http://gerrit.cloudera.org:8080/12006 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9f5955574d304b343e904851cfb9081648e350f8 Gerrit-Change-Number: 12006 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-6924: Add child queries to profile in compute stats
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11977 ) Change subject: IMPALA-6924: Add child queries to profile in compute stats .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc File be/src/service/child-query.cc: http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc@104 PS2, Line 104: profile_->AddInfoString("Profile", get_profile_resp.profile); > Do we expose any way to get the actual Thrift profile with a client? I thou You're right. I think if we wanted to go that way it might be not that bad to plumb through - we could add an optional mode flag in TRuntimeProfileRequest and an optional TRuntimeProfileTree to TRuntimeProfileResponse. Or we could add a new method to ImpalaServer directly. Anyway, that's only relevant if we want to go down that path. I don't want the perfect to be the enemy of the useful here. -- To view, visit http://gerrit.cloudera.org:8080/11977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5006c3b366d381eed4687e550cdfc463be3d1350 Gerrit-Change-Number: 11977 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 01:38:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5200: Count child time for parent's total time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11791 ) Change subject: IMPALA-5200: Count child time for parent's total time .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1463/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id6c1191c39fd18b6be45325366a74cf54908c77e Gerrit-Change-Number: 11791 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 29 Nov 2018 01:31:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 01:27:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3503/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 01:27:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 01:26:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1462/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 01:23:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6924: Add child queries to profile in compute stats
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/11977 ) Change subject: IMPALA-6924: Add child queries to profile in compute stats .. Patch Set 2: (2 comments) Looks like a useful change http://gerrit.cloudera.org:8080/#/c/11977/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11977/2//COMMIT_MSG@9 PS2, Line 9: COMPUTE STATS triggers to child queries which do the actual stats Should this be "triggers some child queries"? Or "triggers two child queries"? http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.h File be/src/service/child-query.h: http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.h@129 PS2, Line 129: RuntimeProfile* profile_; Do you want a /// comment here? -- To view, visit http://gerrit.cloudera.org:8080/11977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5006c3b366d381eed4687e550cdfc463be3d1350 Gerrit-Change-Number: 11977 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 01:07:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11993 to look at the new patch set (#3). Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test Expected results from spillable-bufe-sizing.test and max-row-size.test included incorrect expressions: the un-anayzed GROUP BY expression with ordinals represented as (invalid) casts. SelectStmt is a bit of a mess. There are two copies of the grouping expressions. Here we want to use the analyzed version with the ordinals replaced. Testing: * Problem found when running PlannerTest. PlannerTest now passes, with correct results, after this change. * Turns out there is another path used when generating SQL for a view which does toSql() on an unanalyzed query. Added unit tests for this case. Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 4 files changed, 88 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/11993/3 -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 3: Tim, thanks for the review and for pointing out the test failure. Tracked down the bug. We have a whole pile of "to sql" code in the FE, called from the BE, without unit tests. This code works with an AST without analysis, which triggered the NPE we saw earlier. The new patch includes a starter test. I'll add others in a separate patch. Please take a second look when you get a chance. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 00:55:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5200: Count child time for parent's total time
Hello Philip Zeyliger, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11791 to look at the new patch set (#2). Change subject: IMPALA-5200: Count child time for parent's total time .. IMPALA-5200: Count child time for parent's total time One problem with the total time counter on runtime profiles is that a parent's time may not be updated if execution is stuck in a child node. The child can accumulate time while the parent is stuck at zero. This leads to incorrect or misleading calculations of total time or non-child time for the parent node during execution. This makes a modest change in calculation for total time for parent nodes. It takes advantage of the fact that the parent should count all of the time from all of its children as total time for itself. Specifically, if a parent has accumulated X in its total timer and its children have accumulated Y summed across all of their timers, then a parent's total time should be at least max(X, Y). There is no way to know the appropriate overlap between X and Y, so this uses a conservative calculation assuming complete overlap. This prevents a parent node from reporting itself as 100% non-child time when it is actually stuck executing child code. However, it does not help if a child node is stuck and is not reporting its own time. Testing: - Added test case to runtime-profile-test - Core tests pass Change-Id: Id6c1191c39fd18b6be45325366a74cf54908c77e --- M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h 3 files changed, 122 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/11791/2 -- To view, visit http://gerrit.cloudera.org:8080/11791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id6c1191c39fd18b6be45325366a74cf54908c77e Gerrit-Change-Number: 11791 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1461/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 29 Nov 2018 00:39:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6924: Add child queries to profile in compute stats
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11977 ) Change subject: IMPALA-6924: Add child queries to profile in compute stats .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc File be/src/service/child-query.cc: http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc@104 PS2, Line 104: profile_->AddInfoString("Profile", get_profile_resp.profile); > Did you think about splicing the profiles together with AddChild(). It feel Do we expose any way to get the actual Thrift profile with a client? I thought we only exposed it as a string, eg. that's what GetRuntimeProfile() returns -- To view, visit http://gerrit.cloudera.org:8080/11977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5006c3b366d381eed4687e550cdfc463be3d1350 Gerrit-Change-Number: 11977 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 00:27:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 2: Code-Review+1 LGTM. -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 29 Nov 2018 00:07:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc@47 PS1, Line 47: "Last report received time"; > I am nit-picking somewhat, but this is really 'Last received time' from the How about "Last report received time". "received time" by itself may not be very clear on what was received. http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@95 PS1, Line 95: if report_time < report_time_dict.get(node.name, datetime.min): > What happens here the first time through, when node.name does not yet exist The second argument (datetime.min) is the default value if the key doesn't exist. http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@111 PS1, Line 111: assert num_time_backward <= ceil(elapsed_time / MIN_NTP_POLL_PERIOD) > Should this be num_time_backward? time_backward is a boolean, and shouldn't Nice catch. Fixed. http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@112 PS1, Line 112: s > flake8: F841 local variable 'results' is assigned to but never used Done -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 29 Nov 2018 00:04:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Hello Balazs Jeszenszky, Zoram Thanga, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12000 to look at the new patch set (#2). Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. IMPALA-6741: Add timestamp of fragment instance's status updates Currently, the profile of a running query doesn't contain any timestamps for the last updates from the fragment instances. This makes it hard to differentiate between when a fragment instance failed to send status reports to the coordinator for various reasons (e.g. IMPALA-2990) or a truly stuck fragment instance. This change adds a timestamp to a fragment instance's profile to record the time when the coordinator last received a status update from it. Note that it's possible that there is delay between when the status was created on the executor and when it arrived at the coordinator. Given that the clocks are not necessarily synchronized across all executors, the receiving time of the update at the coordinator seems easier to make sense of. Sample output: Fragment F01: Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, non-child: 263.951us, % non-child: 1.75%) Last report received time: 2018-11-27 16:57:30.014 Hdfs split stats (:<# splits>/): 0:1/1.58 KB Fragment Instance Lifecycle Event Timeline: 15.622ms - Prepare Finished: 1.026ms (1.026ms) - Open Finished: 1.137ms (110.297us) - First Batch Produced: 15.010ms (13.873ms) - First Batch Sent: 15.080ms (70.715us) - ExecInternal Finished: 15.622ms (541.181us) Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/query_test/test_observability.py 5 files changed, 85 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/2 -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/exec/plan-root-sink.cc@79 PS7, Line 79: while (results_ == nullptr && !state->is_cancelled()) sender_cv_.Wait(l); Should we count the time spent here as inactive_time ? This seems to be the convention for other plan nodes and DataStreamSender. Didn't dive into other hdfs table writer sinks to see if inactive time also applies there. http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/runtime/coordinator.cc@234 PS7, Line 234: optional in the thrift Should they become required then ? http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.h@307 PS7, Line 307: plan node data sink http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc@962 PS7, Line 962: void RuntimeProfile::SetPlanNodeId(int node_id) { Should we DCHECK that "data_sink_id" is not set here ? http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc@966 PS7, Line 966: void RuntimeProfile::SetDataSinkId(int sink_id) { Should we DCHECK that "plan_node_id" is not set here ? http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/summary-util.cc File be/src/util/summary-util.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/summary-util.cc@79 PS7, Line 79: LOG(INFO) << label_ss.str() << " " << node.node_id; Is this for debugging only ? http://gerrit.cloudera.org:8080/#/c/11967/7/common/thrift/RuntimeProfile.thrift File common/thrift/RuntimeProfile.thrift: http://gerrit.cloudera.org:8080/#/c/11967/7/common/thrift/RuntimeProfile.thrift@67 PS7, Line 67: // Set if this node corresponds to a plan node. : 1: optional i32 plan_node_id : : // Set if this node corresponds to a data sink. : 2: optional i32 data_sink_id Are these two not supposed to be set at the same time ? if so, can you please add a comment ? -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 23:53:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11874 ) Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1460/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 23:46:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11874 ) Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1459/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 23:36:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11874 ) Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/runtime/io/hdfs-monitored-ops.cc File be/src/runtime/io/hdfs-monitored-ops.cc: http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/runtime/io/hdfs-monitored-ops.cc@46 PS8, Line 46: OpenHdfsFileOp(const hdfsFS& fs, const std::string* fname, int flags, uint64_t blocksize) > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/runtime/io/hdfs-monitored-ops.cc@81 PS8, Line 81: Status HdfsMonitor::OpenHdfsFileWTimeout(const hdfsFS& fs, const std::string* fname, int flags, > line too long (95 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 23:05:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Hello Michael Ho, Philip Zeyliger, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11874 to look at the new patch set (#9). Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. IMPALA-7738: Implement timeouts for HDFS open calls This is part 1 of a push to add timeouts for all HDFS operations. It adds timeouts for opening an HDFS file handle. It introduces a new SynchronousThreadPool, which executes an operation in a thread pool and waits up to a specified timeout for the operation to complete. This type of thread pool can accept any subclass of SynchronousWorkItem, and a single thread pool can process different types of work items. It is tested by a new test case in thread-pool-test. This also introduces a new HdfsMonitor which implements timeouts for HDFS operations, currently limited to hdfsOpenFile(). This is implemented using a SynchronousThreadPool. The timeout for hdfs operations is specified by hdfs_operation_timeout_sec, which defaults to 5 minutes. Testing: 1. Added a test to thread-pool-test for the new SynchronousThreadPool. 2. Core tests Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 --- M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/handle-cache.h M be/src/runtime/io/handle-cache.inline.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h A be/src/runtime/io/hdfs-monitored-ops.cc A be/src/runtime/io/hdfs-monitored-ops.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M common/thrift/generate_error_codes.py 12 files changed, 480 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/11874/9 -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11874 ) Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/util/thread-pool.h File be/src/util/thread-pool.h: http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/util/thread-pool.h@254 PS8, Line 254: virtual std::string GetDescription() { : DCHECK(false) << "GetDescription() must be implemented"; : return ""; : } Not 100% happy about requiring this function, but the reasoning behind it is that it allows the SynchronousThreadPool to construct the timeout error message with a customized error message. -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 23:03:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11874 ) Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/runtime/io/hdfs-monitored-ops.cc File be/src/runtime/io/hdfs-monitored-ops.cc: http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/runtime/io/hdfs-monitored-ops.cc@46 PS8, Line 46: OpenHdfsFileOp(const hdfsFS& fs, const std::string* fname, int flags, uint64_t blocksize) line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/11874/8/be/src/runtime/io/hdfs-monitored-ops.cc@81 PS8, Line 81: Status HdfsMonitor::OpenHdfsFileWTimeout(const hdfsFS& fs, const std::string* fname, int flags, line too long (95 > 90) -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 23:03:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Hello Michael Ho, Philip Zeyliger, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11874 to look at the new patch set (#8). Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. IMPALA-7738: Implement timeouts for HDFS open calls This is part 1 of a push to add timeouts for all HDFS operations. It adds timeouts for opening an HDFS file handle. It introduces a new SynchronousThreadPool, which executes an operation in a thread pool and waits up to a specified timeout for the operation to complete. This type of thread pool can accept any subclass of SynchronousWorkItem, and a single thread pool can process different types of work items. It is tested by a new test case in thread-pool-test. This also introduces a new HdfsMonitor which implements timeouts for HDFS operations, currently limited to hdfsOpenFile(). This is implemented using a SynchronousThreadPool. The timeout for hdfs operations is specified by hdfs_operation_timeout_sec, which defaults to 5 minutes. Testing: 1. Added a test to thread-pool-test for the new SynchronousThreadPool. 2. Core tests Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 --- M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/handle-cache.h M be/src/runtime/io/handle-cache.inline.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h A be/src/runtime/io/hdfs-monitored-ops.cc A be/src/runtime/io/hdfs-monitored-ops.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M common/thrift/generate_error_codes.py 12 files changed, 479 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/11874/8 -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6293: Unset JAVA TOOL OPTIONS variable from shell commands
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12005 ) Change subject: IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1458/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 Gerrit-Change-Number: 12005 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 22:34:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6293: Unset JAVA TOOL OPTIONS variable from shell commands
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/12005 ) Change subject: IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands .. Patch Set 3: > Patch Set 1: > > I'm vaguely uncomfortable with the non-genericness of the solution here. If > we wanted to expose generic "unset" options, we could model that as a > set, and act accordingly. I think defaulting to clearing out > JAVA_TOOL_OPTIONS is pretty sneaky. You also would need to think about shell > escaping and so on (or verifying that these are actually variables.) > > In practice, maybe just change: > > RunShellProcess(FLAGS_s3a_access_key_cmd, _access_key_, true) > > to > > RunShellProcess("unset JAVA_TOOL_OPTIONS; " + FLAGS_s3a_access_key_cmd, > _access_key_, true) > (or use string::substitute or whatever) > > in the three or so places you need it? I updated the code to make RunShellProcess more generic. I prefer putting RunShellProcess so that I can easily unit test it. Furthermore, this makes RunShellProcess somewhat similar (I know it's not quite the same) to Python's Popen with env parameter. -- To view, visit http://gerrit.cloudera.org:8080/12005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 Gerrit-Change-Number: 12005 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 22:03:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6293: Unset JAVA TOOL OPTIONS variable from shell commands
Fredy Wijaya has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/12005 ) Change subject: IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands .. IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands JAVA_TOOL_OPTIONS is a variable used by JVM to specify the JVM options. Impala has several flags that specify shell commands for Impala to run, such as - s3a_access_key_cmd - s3a_secret_key_cmd - ssl_private_key_password_cmd - webserver_private_key_password_cmd When debugging the JVM inside the Impala process, it is useful to specify JAVA_TOOL_OPTIONS to run the Java debugger on a particular port. However, JAVA_TOOL_OPTIONS remains in the environment, so it is passed to these shell commands. If any of these shell commands run Java, then that JVM will attempt to use the JAVA_TOOL_OPTIONS specified and thus try to bind to the same port. The Impala process JVM is already bound to that port, so this will fail. This patch fixes the issue by unsetting JAVA_TOOL_OPTIONS before running these shell commands. Testing: - Added a new test for os-util.cc Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 --- M be/src/rpc/thrift-server.cc M be/src/runtime/hdfs-fs-cache.cc M be/src/util/CMakeLists.txt A be/src/util/os-util-test.cc M be/src/util/os-util.cc M be/src/util/os-util.h M be/src/util/webserver.cc 7 files changed, 89 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/12005/3 -- To view, visit http://gerrit.cloudera.org:8080/12005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 Gerrit-Change-Number: 12005 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11992 ) Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1457/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 Gerrit-Change-Number: 11992 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 21:37:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 1: (3 comments) Thanks for doing this. http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc@47 PS1, Line 47: "Last report time"; I am nit-picking somewhat, but this is really 'Last received time' from the coordinator's PoV, right? I wonder if it might be better to rename it as such, so that we don't have to explain what the metric means. I don't feel too strongly about this, so I leave it to you. http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@95 PS1, Line 95: if report_time < report_time_dict.get(node.name, datetime.min): What happens here the first time through, when node.name does not yet exist in the dict? http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@111 PS1, Line 111: assert time_backward <= ceil(elapsed_time / MIN_NTP_POLL_PERIOD) Should this be num_time_backward? time_backward is a boolean, and shouldn't be used outside of the above while loop. -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 28 Nov 2018 21:25:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6293: Unset JAVA TOOL OPTIONS variable from shell commands
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12005 ) Change subject: IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1456/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 Gerrit-Change-Number: 12005 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 21:14:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes
Hello Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11992 to look at the new patch set (#8). Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes .. IMPALA-2343: Add lifecycle timeline to plan nodes Track the time when various significant events in the lifecycle of an ExecNode occur the timeline of fragment execution. The events tracked are: 'Open Started', 'Open Finished', 'First Batch Fetched', 'First Batch Returned', 'Last Batch Returned', 'Closed'. This uses the existing EventSequence infrastructure so that time will correspond to the fragment instance lifecycle timelines. It's implemented mostly using scoped objects that add the events when entering or exiting Open() and GetNext(). These times are not set inside subplans because it would be mostly redundant with the counters in the containing subplan node. Testing: Added a basic test to verify that the event sequence is present. Perf: Ran TPC-H 10 locally. There was no significant perf change. Ran TPC-H Nested locally. There was no significant perf change. Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 --- M be/src/exec/aggregation-node.cc M be/src/exec/analytic-eval-node.cc M be/src/exec/cardinality-check-node.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/empty-set-node.cc M be/src/exec/empty-set-node.h M be/src/exec/exchange-node.cc A be/src/exec/exec-node-util.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/select-node.cc M be/src/exec/sort-node.cc M be/src/exec/streaming-aggregation-node.cc M be/src/exec/subplan-node.cc M be/src/exec/topn-node.cc M be/src/exec/union-node.cc M be/src/exec/unnest-node.cc M tests/query_test/test_observability.py 24 files changed, 213 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/11992/8 -- To view, visit http://gerrit.cloudera.org:8080/11992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 Gerrit-Change-Number: 11992 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11992 ) Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes .. Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node-util.h File be/src/exec/exec-node-util.h: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node-util.h@57 PS7, Line 57: Fetched > nit: maybe reword to "First Batch Requested" to indicate that it is still i Done http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node.h@96 PS7, Line 96: virtual Status Open(RuntimeState* state) WARN_UNUSED_RESULT; Responding to your top-level comment here so we can have a proper thread. I'd thought about that but decided not to bite it off right now. I also thought about changing this so that there's a non-virtual GetNext()/Open() function in the base class that calls out to an implementation but I don't think that actually works since there are some special case ExecNodes that don't have the standard preamble. I feel like the macro approach might be easier to understand overall even if it leaves in some boilerplate. http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node.h@260 PS7, Line 260: were > nit: was Done http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/hbase-scan-node.cc File be/src/exec/hbase-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/hbase-scan-node.cc@153 PS7, Line 153: // For GetNext, most of the time is spent in HBaseTableScanner::ResultScanner_next, : // but there's still some considerable time inside here. > I don't think this comment helps much and would be good with dropping it al Done http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/kudu-scan-node.cc@92 PS7, Line 92: DCHECK(row_batch != NULL); Removed this DCHECK since it seems unnecessary (unlikely to be triggered and we'd find out that it was null soon enough anyway if it was). http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/kudu-scan-node.cc@96 PS7, Line 96: SCOPED_TIMER(materialize_tuple_timer()); > Can you add a brief comment why this timer gets initialized after the pream I was kind of ignoring this because I didn't want to deal with it :). I think this timer is actually in the wrong place entirely. The equivalent timer in HDFS is measured in the scanner thread, since that's the thread that actually does the work to materialise the batch. I can fix in this patch if you are ok with that. http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/singular-row-src-node.cc File be/src/exec/singular-row-src-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/singular-row-src-node.cc@33 PS7, Line 33: ScopedOpenEventAdder ea(this); > From the FE comment in SingularRowSrcNode is seems that this will always ru Good point. I'll just revert all the changes to this file. http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/unnest-node.cc File be/src/exec/unnest-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/unnest-node.cc@142 PS7, Line 142: // Avoid expensive query maintenance overhead for small collections. > Did you omit the instrumentation here because it is expensive? Yep, and this is always in a subplan too. I removed the instrumentation about too, it wasn't necessary. -- To view, visit http://gerrit.cloudera.org:8080/11992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 Gerrit-Change-Number: 11992 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 21:05:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6293: Unset JAVA TOOL OPTIONS variable from shell commands
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12005 ) Change subject: IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands .. Patch Set 1: I'm vaguely uncomfortable with the non-genericness of the solution here. If we wanted to expose generic "unset" options, we could model that as a set, and act accordingly. I think defaulting to clearing out JAVA_TOOL_OPTIONS is pretty sneaky. You also would need to think about shell escaping and so on (or verifying that these are actually variables.) In practice, maybe just change: RunShellProcess(FLAGS_s3a_access_key_cmd, _access_key_, true) to RunShellProcess("unset JAVA_TOOL_OPTIONS; " + FLAGS_s3a_access_key_cmd, _access_key_, true) (or use string::substitute or whatever) in the three or so places you need it? -- To view, visit http://gerrit.cloudera.org:8080/12005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 Gerrit-Change-Number: 12005 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 21:02:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1454/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 20:49:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6293: Unset JAVA TOOL OPTIONS variable from shell commands
Fredy Wijaya has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12005 Change subject: IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands .. IMPALA-6293: Unset JAVA_TOOL_OPTIONS variable from shell commands JAVA_TOOL_OPTIONS is a variable used by JVM to specify the JVM options. Impala has several flags that specify shell commands for Impala to run, such as - s3a_access_key_cmd - s3a_secret_key_cmd - ssl_private_key_password_cmd - webserver_private_key_password_cmd When debugging the JVM inside the Impala process, it is useful to specify JAVA_TOOL_OPTIONS to run the Java debugger on a particular port. However, JAVA_TOOL_OPTIONS remains in the environment, so it is passed to these shell commands. If any of these shell commands run Java, then that JVM will attempt to use the JAVA_TOOL_OPTIONS specified and thus try to bind to the same port. The Impala process JVM is already bound to that port, so this will fail. This patch fixes the issue by unsetting JAVA_TOOL_OPTIONS before running these shell commands. Testing: - Added a new test for os-util. Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 --- M be/src/util/CMakeLists.txt A be/src/util/os-util-test.cc M be/src/util/os-util.cc M be/src/util/os-util.h 4 files changed, 64 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/12005/1 -- To view, visit http://gerrit.cloudera.org:8080/12005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I96a84a27d3bf7b02f04e70562acfd67386b93183 Gerrit-Change-Number: 12005 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya
[Impala-ASF-CR] IMPALA-7889: Write new logical types in Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12004 ) Change subject: IMPALA-7889: Write new logical types in Parquet .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1455/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91 Gerrit-Change-Number: 12004 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 28 Nov 2018 20:58:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6656: BufferAllocator observability
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11947 ) Change subject: IMPALA-6656: BufferAllocator observability .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1453/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 Gerrit-Change-Number: 11947 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 28 Nov 2018 20:45:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7889: Write new logical types in Parquet
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12004 Change subject: IMPALA-7889: Write new logical types in Parquet .. IMPALA-7889: Write new logical types in Parquet Fill the LogicalType field in Parquet schemas for columns that have an associated logical type. ConvertedType still has to be filled to remain compatible with older readers. Testing: - added new tests to check both logical and converted types to test_insert_parquet.py Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91 --- M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/exec/parquet/parquet-metadata-utils.h M tests/query_test/test_insert_parquet.py M tests/util/get_parquet_metadata.py 5 files changed, 161 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/12004/2 -- To view, visit http://gerrit.cloudera.org:8080/12004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91 Gerrit-Change-Number: 12004 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-7889: Write new logical types in Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12004 ) Change subject: IMPALA-7889: Write new logical types in Parquet .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/12004/2/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: http://gerrit.cloudera.org:8080/#/c/12004/2/tests/query_test/test_insert_parquet.py@776 PS2, Line 776: flake8: W292 no newline at end of file http://gerrit.cloudera.org:8080/#/c/12004/2/tests/util/get_parquet_metadata.py File tests/util/get_parquet_metadata.py: http://gerrit.cloudera.org:8080/#/c/12004/2/tests/util/get_parquet_metadata.py@186 PS2, Line 186: flake8: W292 no newline at end of file -- To view, visit http://gerrit.cloudera.org:8080/12004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91 Gerrit-Change-Number: 12004 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 28 Nov 2018 20:15:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/11967/6/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/11967/6/shell/impala_client.py@211 PS6, Line 211: > flake8: E203 whitespace before ',' Done -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 20:10:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Hello Michael Ho, Lars Volker, Philip Zeyliger, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11967 to look at the new patch set (#7). Change subject: IMPALA-1048: show sinks in exec summary .. IMPALA-1048: show sinks in exec summary The exec summary now includes the total time taken and memory consumed by the data sink at the root of each fragment. Previously the exec summary could hide where time and memory went while executing a query. The high-level changes are: * Generalising logic in the exec summary and runtime profile to handle data sinks, not just plan nodes, including adding richer metadata to runtime profile nodes. * Threading through metadata about the data sinks, like names and estimates, so that it can appear in the exec summary. The major potential downside is that the new timings reported for data stream sender can overlap with the receiver's time and potentially cause confusion. [localhost:21000] default> select count(distinct l_comment) from tpch_parquet.lineitem; summary; Query: select count(distinct l_comment) from tpch_parquet.lineitem Query submitted at: 2018-11-20 16:47:03 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=f5464383a3bb6878:54b5252b +---+ | count(distinct l_comment) | +---+ | 4580667 | +---+ Fetched 1 row(s) in 4.13s +-++--+--+---++---+---+---+ | Operator| #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail| +-++--+--+---++---+---+---+ | F02:ROOT| 1 | 59.11ms | 59.11ms | || 0 B | 0 B | | | 06:AGGREGATE| 1 | 274.24us | 274.24us | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE | | 05:EXCHANGE | 1 | 75.16us | 75.16us | 3 | 1 | 32.00 KB | 16.00 KB | UNPARTITIONED | | F01:EXCHANGE SENDER | 3 | 119.53us | 146.28us | || 16.00 KB | 0 B | | | 02:AGGREGATE| 3 | 19.26ms | 19.89ms | 3 | 1 | 16.00 KB | 10.00 MB | | | 04:AGGREGATE| 3 | 1.06s| 1.07s| 4.58M | 4.65M | 96.02 MB | 62.63 MB | | | 03:EXCHANGE | 3 | 243.91ms | 246.44ms | 5.01M | 4.65M | 464.00 KB | 10.12 MB | HASH(l_comment) | | F00:EXCHANGE SENDER | 3 | 2.41s| 2.55s| || 337.53 KB | 0 B | | | 01:AGGREGATE| 3 | 1.05s| 1.14s| 5.01M | 4.65M | 97.20 MB | 121.17 MB | STREAMING | | 00:SCAN HDFS| 3 | 37.88ms | 41.28ms | 6.00M | 6.00M | 27.88 MB | 80.00 MB | tpch_parquet.lineitem | +-++--+--+---++---+---+---+ Testing: Added a basic observability test. Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 --- M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/summary-util.cc M common/thrift/DataSinks.thrift M common/thrift/ExecStats.thrift M common/thrift/RuntimeProfile.thrift M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M shell/impala_client.py M tests/beeswax/impala_beeswax.py M tests/query_test/test_observability.py 36 files changed, 321 insertions(+), 122 deletions(-) git
[Impala-ASF-CR] IMPALA-6656: BufferAllocator observability
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11947 ) Change subject: IMPALA-6656: BufferAllocator observability .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/11947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 Gerrit-Change-Number: 11947 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 28 Nov 2018 20:10:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6656: BufferAllocator observability
Hello Michael Ho, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11947 to look at the new patch set (#4). Change subject: IMPALA-6656: BufferAllocator observability .. IMPALA-6656: BufferAllocator observability Adds a set of metrics per allocator arena in the buffer pool that help understand how buffers are being allocated and how much time is spent in the system allocator (i.e. TCMalloc). These are low level metrics that require some interpretation but provide visibility into behaviour that was previously totally opaque. Also tracks the total time spent in the system allocator in the query profile, to provide clues if time spent in TCMalloc is a perf issue for a particular query (e.g. if it's hitting a lot of lock contention). Backend tests required tweaks to avoid double-registration of the new metrics. Also switch default sort in /metrics to be by name, so that it's easier to locate metrics. Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 --- M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool-counters.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/suballocator-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/util/metrics.h M common/thrift/metrics.json M www/metric_group.tmpl 14 files changed, 272 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/11947/4 -- To view, visit http://gerrit.cloudera.org:8080/11947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 Gerrit-Change-Number: 11947 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6656: BufferAllocator observability
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11947 ) Change subject: IMPALA-6656: BufferAllocator observability .. Patch Set 3: Code-Review+1 (2 comments) carry http://gerrit.cloudera.org:8080/#/c/11947/3/be/src/runtime/bufferpool/buffer-allocator.cc File be/src/runtime/bufferpool/buffer-allocator.cc: http://gerrit.cloudera.org:8080/#/c/11947/3/be/src/runtime/bufferpool/buffer-allocator.cc@38 PS3, Line 38: // various useful statistics that would be too expensive to update every allocation. > nit: Maybe it's worth to mention that we should keep it as a power-of-2, so Done http://gerrit.cloudera.org:8080/#/c/11947/3/be/src/runtime/bufferpool/buffer-allocator.cc@367 PS3, Line 367: int64_t sys_alloc_time = sys_alloc_sw.ElapsedTime(); > Shouldn't we move this line up by one line to only measure the allocation t Done -- To view, visit http://gerrit.cloudera.org:8080/11947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 Gerrit-Change-Number: 11947 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 28 Nov 2018 20:06:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7867 (Part 2): ArrayList cleanup in analyzer
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/11995 ) Change subject: IMPALA-7867 (Part 2): ArrayList cleanup in analyzer .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11995/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11995/1//COMMIT_MSG@9 PS1, Line 9: Replaces all use of ArrayList in : variable and method declarations with the interface List. Instead of just ArrayList, why don't to go further to clean up all other collections, like Map and Set? http://gerrit.cloudera.org:8080/#/c/11995/1/fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java File fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java: http://gerrit.cloudera.org:8080/#/c/11995/1/fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java@22 PS1, Line 22: import jline.internal.Preconditions; This is probably unintentional and should use Guava Preconditions instead. -- To view, visit http://gerrit.cloudera.org:8080/11995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a Gerrit-Change-Number: 11995 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 28 Nov 2018 17:46:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7865: Repeated type widening of arithmetic expressions
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/11952 ) Change subject: IMPALA-7865: Repeated type widening of arithmetic expressions .. Patch Set 2: (10 comments) http://gerrit.cloudera.org:8080/#/c/11952/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11952/2//COMMIT_MSG@10 PS2, Line 10: substututing typo: substituting http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@185 PS2, Line 185: though typo: through http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@704 PS2, Line 704: t nit: use upper T to make it consistent the rest of the tests http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@710 PS2, Line 710: // The following silliness is expected, if not wanted. : // The rewriter cannot (yet) optimize trailing constants. : // Types widen with each math operation. : expr = RewritesOk("tinyint_col + 1 + 1", : FoldConstantsRule.INSTANCE, null); : assertEquals(ScalarType.INT, expr.getType()); : expr = RewritesOk("tinyint_col + 1 + 1 + 1", : FoldConstantsRule.INSTANCE, null); : assertEquals(ScalarType.BIGINT, expr.getType()); Is it not possible to get all expressions and figure out the highest type width it can fit? For example: tinyint_col + 1 + bigint_col will be BIGINT. tinyint_col + 1 + smallint_col will be SMALLINT, etc. http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@242 PS2, Line 242: nit: remove extra new line http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@265 PS2, Line 265: nit: remove extra new line http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@299 PS2, Line 299: nit: remove extra new line http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@315 PS2, Line 315: nit: remove extra new line http://gerrit.cloudera.org:8080/#/c/11952/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@331 PS2, Line 331: nit: remove extra new line http://gerrit.cloudera.org:8080/#/c/11952/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test: http://gerrit.cloudera.org:8080/#/c/11952/2/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@59 PS2, Line 59: functional.alltypes.tinyint_col < 2 it's not quite clear to me why the order changed here. -- To view, visit http://gerrit.cloudera.org:8080/11952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a8f6d1c508a289083b5e026f33581bc44117ca2 Gerrit-Change-Number: 11952 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Wed, 28 Nov 2018 17:29:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7759: Add Levenshtein edit distance built-in function
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11793 ) Change subject: IMPALA-7759: Add Levenshtein edit distance built-in function .. Patch Set 3: Code-Review+1 (1 comment) Soft +2 for the qgen changes. http://gerrit.cloudera.org:8080/#/c/11793/3/tests/comparison/compat.py File tests/comparison/compat.py: http://gerrit.cloudera.org:8080/#/c/11793/3/tests/comparison/compat.py@21 PS3, Line 21: s Oops, I had a typo here. Should be "exist". -- To view, visit http://gerrit.cloudera.org:8080/11793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb Gerrit-Change-Number: 11793 Gerrit-PatchSet: 3 Gerrit-Owner: Greg Rahn Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 16:26:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6656: BufferAllocator observability
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11947 ) Change subject: IMPALA-6656: BufferAllocator observability .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/11947/3/be/src/runtime/bufferpool/buffer-allocator.cc File be/src/runtime/bufferpool/buffer-allocator.cc: http://gerrit.cloudera.org:8080/#/c/11947/3/be/src/runtime/bufferpool/buffer-allocator.cc@38 PS3, Line 38: // various useful statistics that would be too expensive to update every allocation. nit: Maybe it's worth to mention that we should keep it as a power-of-2, so the calculation of the modulo remain more efficient. http://gerrit.cloudera.org:8080/#/c/11947/3/be/src/runtime/bufferpool/buffer-allocator.cc@367 PS3, Line 367: int64_t sys_alloc_time = sys_alloc_sw.ElapsedTime(); Shouldn't we move this line up by one line to only measure the allocation time? -- To view, visit http://gerrit.cloudera.org:8080/11947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 Gerrit-Change-Number: 11947 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 28 Nov 2018 15:30:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11984 ) Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1452/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 28 Nov 2018 13:00:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11984 ) Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. Patch Set 4: Patch set 4 is a rebase + trivial conflict resolution. The rebase included https://gerrit.cloudera.org/#/c/11949/, which moved parquet*.h/.cc files to a new directory. -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 28 Nov 2018 12:28:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11984 to look at the new patch set (#4). Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. IMPALA-7853: Add support to read int64 NANO timestamps from Parquet PARQUET-1387 added int64 timestamps with nanosecond precision that stores timestamps as nanoseconds since the Unix epoch. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This may mean that this has the potential be the fastest way to store timestamps in Impala + Parquet. Another way NANO differs from MICRO and MILLI is that NANO can be only described with new logical types in Parquet, it has no converted type equivalent. This made implementing CREATE TABLE LIKE PARQUET less trivial than it was for MICRO/MILLI: the type conversion logic in ParquetHelper.java had to be rewritten to use LogicalTypeAnnotation instead of ConvertedType. The changes on Java side also made bumping CDH_BUILD_NUMBER necessary. Testing: - added a new testfile with int64 nano timestamps - ran core tests Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f --- M be/src/exec/parquet/parquet-common.cc M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M bin/impala-config.sh M common/thrift/parquet.thrift M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java M testdata/data/README A testdata/data/int64_timestamps_nano.parquet M testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test M tests/query_test/test_scanners.py 13 files changed, 174 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/11984/4 -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 5 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 10:28:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query This patch fixes the issue with Ctrl+C handling for cancelling a non-running query to behave similar to Linux shell. Before (pressing Ctrl+C does not do anything): [localhost:21000] default> select After (pressing Ctrl+C cancels the query and starts a new prompt): [localhost:21000] default> select^C [localhost:21000] default> Testing: - Added a new cancellation test - Ran all shell E2E tests Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Reviewed-on: http://gerrit.cloudera.org:8080/11990 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M shell/impala_shell.py M tests/custom_cluster/test_client_ssl.py M tests/shell/test_shell_interactive.py 3 files changed, 8 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 6 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger