[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 ) Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. Patch Set 2: (7 comments) Thanks for doing this patch. This will help reduce a lot of unnecessary network traffic. http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h File be/src/statestore/statestore-subscriber.h: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165 PS2, Line 165: typedef TUniqueId RegistrationId; You can move this typedef to statestore.h and use the same type in statestore.h/cc too. http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h File be/src/statestore/statestore.h: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383 PS2, Line 383: SubscriberId Do we even need to store the SubscriberId here? Can't we just store a unique registration ID? Where ever the 'SubscriberId' is required, we just get it from the Subscriber object anyway, and that object can be retrieved using the unique registration id. http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481 PS2, Line 481: subscriber exists Could you clarify what "exists" means here exactly? It could be confused with a node just existing as a part of the cluster. I think we want to say that it exists in the subscribers_ map. http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482 PS2, Line 482: registration_ids nit: registration_id http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484 PS2, Line 484: std::shared_ptr* subscriber Add a comment about what is returned in this out parameter. http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414 PS2, Line 414: onst SubscriberId& subscriber_id, : const TUniqueId& registration_id It looks like it just makes sense to pass 'const Subscriber&' here? Is there a case where we would not get a subscriber_id and a registration_id from the same Subscriber object while calling this function? http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634 PS2, Line 634: if (!RegisteredSubscriberExists(subscriber_to_update.first, subscriber_to_update.second, I'm a little worried that we're contending for a mutex two more times in this function. Do you anticipate any performance regression due to increased context switching? Consider using a spin lock if we won't have more than ~1000 entries in the map at one time. (unordered_map has a worst-case O(N) time complexity) -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 2 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 02 Nov 2017 03:19:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Hello Lars Volker, Laszlo Gaal, Zoltan Borok-Nagy, Attila Jeges, Tim Armstrong, Csaba Ringhofer, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8447 to look at the new patch set (#2). Change subject: IMPALA-2181: Add query option levels for display .. IMPALA-2181: Add query option levels for display Three display levels are introduced for each query option: REGULAR, ADVANCED and DEPRECATED. When the query options are diplayed in Impala shell using SET then only the REGULAR options are shown. A new command called SET ALL shows all the options grouped by their option levels. Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 --- M be/src/service/child-query.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_interactive.py 9 files changed, 242 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8447/2 -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8038 ) Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 17: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1425/ -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-Change-Number: 8038 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Thu, 02 Nov 2017 01:28:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8449 Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. IMPALA-3613: Avoid topic updates to unregistered subscriber instances Bug: Without this patch, when a subscriber repeatedly reconnects to the statestore, the latter queues the initial heartbeat message and a bunch of topic updates to every instance of the registered subscriber. These queued updates are eventually picked up by the heartbeating/topic update threads and the corresponding RPCs are made to the subscribers. The subscriber then rejects these updates since they were meant for an earlier registration. This is usually possible if the subscriber has some network problems leading to failing RPCs. Such a node is eventually marked by the statestore as bad, but depending on the configurations set, the issue can snowball into DDOS kind of attack when the entire thread pool of heartbeating/topic updates is filled with instances from the problematic host. This can result in the statestore missing timely heartbeats to other subscribers making them reconnect. This worsens the situation and the resulting topic updates for the reconnects will fully saturate the network on the statestore host, until the statestore daemon is restarted. Fix: This patch maps topic updates/heartbeats to a specific subscriber registered instance rather to a subscriber id (that stays same across reconnects). That way, when we encounter a topic update that was meant to a stale subscriber, we can simply reject it. Testing: Tested this locally by adding relevant logging. I made the subscribers to reconnect aggressively(a) and delaying heartbeats from the statestore side (b,c). (a) --statestore_subscriber_timeout_seconds=1 (b) --statestore_max_missed_heartbeats=1000 (c) --statestore_heartbeat_frequency_ms=6 Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 --- M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h 4 files changed, 65 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/2 -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 2 Gerrit-Owner: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8408 ) Change subject: IMPALA-6121: remove I/O mgr request context cache .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108 Gerrit-Change-Number: 8408 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 02 Nov 2017 00:45:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8419 ) Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536 .. IMPALA-5541: Reject BATCH_SIZE greater than 65536 Setting a very large value could cause strange behaviour like crashing, hanging, excessive memory usage, spinning etc. This patch rejects values out of the range [0,65536]. Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97 Reviewed-on: http://gerrit.cloudera.org:8080/8419 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc 2 files changed, 12 insertions(+), 3 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97 Gerrit-Change-Number: 8419 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8419 ) Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536 .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97 Gerrit-Change-Number: 8419 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 02 Nov 2017 00:45:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8405 ) Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 02 Nov 2017 00:40:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8405 ) Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). This is a revert of a revert, re-enabling parallel data load. It avoid the race condition by explicitly configuring the temporary directory in question in load-data.py. When the parallel data load change went in, we discovered a race with a signature of: java.io.FileNotFoundException: File /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist The number in this path is milliseconds since the epoch, and the race occurs when two queries submitted to HiveServer2, running with the local runner, hit the same millisecond time stamp. The upstream bug is https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which is now marked as a dupe). I've tested this by running data load 5 times on the same machines where it failed before. I also ran data load manually and inspected the system to make sure that the temporary directories are getting created as expected in /tmp/impala-data-load-*. Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Reviewed-on: http://gerrit.cloudera.org:8080/8405 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M bin/load-data.py M testdata/bin/create-load-data.sh M testdata/bin/run-hive-server.sh M testdata/bin/run-step.sh 4 files changed, 59 insertions(+), 6 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 ) Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 6: (8 comments) Here's my last set of comments for this round. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@59 PS6, Line 59: boost::bind I don't have a strong preference either way, but it'd be nice to be consistent between either using bind or [], rather than both... http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@322 PS6, Line 322: RespondToTimedOutSenders shouldn't be plural http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@52 PS6, Line 52: overflows the queue it's not clear what that means just from reading the comment. It'd be nice to briefly explain that this is talking about the soft limit of the number of bytes across all sender queues. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@105 PS6, Line 105: // number of pending row batch insertion. : int num_pending_enqueue_ = 0; it's not clear what a "pending insertion" means or why we have this. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@113 PS6, Line 113: RowBatch* how about using unique_ptr since this owns the row batch (until it's transferred to current_batch_)? http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@127 PS6, Line 127: queue> blocked_senders_; given that a single soft limit is imposed across all sender queues, does it make sense that the blocked_senders_ are maintained per sender? Why don't we maintain a single blocked_senders_ list per datastream recvr? Hmm, I guess we need to know if this sender has a blocked sender in GetBatch(). But given the single limit, it seems wrong that one sender's row batches can bypass another sender once we get into the blocked sender situation. i.e. the flow of batches across senders seems quite different depending on when the limit was reached. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@231 PS6, Line 231: proto_batch update http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@322 PS6, Line 322: payload->rpc_context->RespondSuccess(); doing this in Close() goes against the paradigm that Close() is only about releasing (local) resources. We've been going that way because there might be no where to bubble up a status from Close(). At least RespondSuccess() doesn't return a status, I suppose. But is there any place sooner we could do this? Does it make sense to do in during Cancel instead? -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-Change-Number: 8023 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 02 Nov 2017 00:10:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Correct log line in start-impala-cluster.py.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8432 ) Change subject: Correct log line in start-impala-cluster.py. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py@398 PS1, Line 398: executors = options.cluster_size - options.num_coordinators Could executors be negative due to the same bug? -- To view, visit http://gerrit.cloudera.org:8080/8432 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e Gerrit-Change-Number: 8432 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Wed, 01 Nov 2017 23:58:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 9: (14 comments) http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13 PS9, Line 13: In RuntimeFilterGenerator in the planner, each partitioned hash join ... each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those ... http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27 PS9, Line 27: For now, min-max filters are only applied at the KuduScanner, which Not specific to the code changes, and I don't expect a response here (probably too long :)). How do the existing query options around runtime filters affect the new min/max filters on Kudu? For example, what does DISABLE_ROW_RUNTIME_FILTERING mean for the Kudu min/max filters? How should users think about setting: RUNTIME_FILTER_WAIT_TIME_MS In particular, are min/max filters more effective against Kudu PK or partition columns? http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41 PS9, Line 41: Perf Testing: Contrived extreme queries are good data points, but how about running the TPCH/DS perf suites against Kudu? http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44 PS9, Line 44: - Ran a contrived query with a filter that does eliminate any rows does not eliminate http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202 PS9, Line 202: // Maximum number of bloom runtime filters allowed per query I think I understand why you did this, but it seems confusing from a user's perspective. Ok to leave, but do you have a story around the eventual meaning of existing query options when HDFS can do min/max and Kudu can do bloom? http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java@730 PS9, Line 730: output.append(Joiner.on(", ").join(filtersStr) + "\n"); just return the string? don't think we need the 'output' StringBuilder http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@111 PS9, Line 111: private final Operator joinOp_; Let's call this cmpOp_ or exprCmpOp_ or something else because "joinOp_" usually indicates the join type like left outer, right outer, etc. http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@170 PS9, Line 170: SlotRef slotRef = expr.unwrapSlotRef(false); Add a comment stating that the validity of this is checked elsewhere (and where exactly) http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@368 PS9, Line 368: if (node instanceof HdfsScanNode && type_ != TRuntimeFilterType.BLOOM) { I feel like these checks belong in the caller. Having an addTarget() function be a co-op in some cases seems difficult to reason about. Can be cleaned with a isValidTarget() helper function. http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@463 PS9, Line 463: // We only enforce a limit on the number of bloom filters as they are much more This seems really confusing for users. I'm ok with checking in this version, and let's discuss separately. http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@1248 PS9, Line 1248: | runtime filters: RF004 <- o_orderkey, RF005 <- o_clerk To me the re-numbering is a little strange. We can think about how to address this in a follow-on change. I'm thinking that ideally users should be able to quickly determine the number of runtime filters based on the max RF id, so we could assign the real RF id lazily instead of eagerly. http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test File testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test: http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/Pla
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7822/8/testdata/data/binary_decimal_dictionary.parquet File testdata/data/binary_decimal_dictionary.parquet: PS8: We should also mention how these files were generated in the README -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 23:30:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33 PS7, Line 33: int Encode(const InternalType& t, int encoded_byte_size, uint8_t* buffer, > This test suite basically takes a test value and its expected size and firs Oh ok, thanks for the explanation. I think we should also add a comment mentioning that this doesn't exactly implement the Parquet spec, since the parquet spec would require that it remove leading zero bytes. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 23:29:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#8). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 18 files changed, 629 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/8 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h@391 PS7, Line 391: fixed_len_size > Looked again. The variable name (and recycling the argument storage) is con Done http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33 PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int fixed_len_size, uint8_t* buffer){ > I took another look at the standard and it says that the minimum number of This test suite basically takes a test value and its expected size and first encodes it then decodes it, in order to make minimal changes to the test suite I reused the function signature and passed the "minimum number of bytes required to store the unscaled value" as the expected size which is passed to the encode methods names as "fixed_len_size". As per you suggestion in a previous comment, I believe this will be more clear if i change the name to encoded_byte_size. Also, I gave an explanation of what "fixed_len_size" for decimals stored as BYTE_ARRAY as a comment at line 46. I believe it'll be better to move that comment right above this method instead. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 23:21:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8215 to look at the new patch set (#6). Change subject: IMPALA-5142 EventSequence displays negative elapsed time. .. IMPALA-5142 EventSequence displays negative elapsed time. EventSequence::EventList which stores the Event in increasing time order may have at times Events that are not stored in increasing time order. This may happen when concurrently EventSequence::MarkEvent() is called for different Events in the same EventSequence. The incorrect time order of Event in EventList results in a negative value being displayed, which is the issue reported in this JIRA. The issue can be fixed either be re-ordering the lock in EventSequence::MarkEvent() or by sorting the EventList before printing. Reordering of lock in EventSequence::MarkEvent() will involve holding the lock across clock_gettime() so that approach is not taken. This patch fixes the issue by sorting the EventList in GetEvents() as this function is expected to return sorted list of events based on time. Testing: Ran all the front-end/backend and end-end tests. Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 --- M be/src/util/runtime-profile-counters.h 1 file changed, 6 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/8215/6 -- To view, visit http://gerrit.cloudera.org:8080/8215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 Gerrit-Change-Number: 8215 Gerrit-PatchSet: 6 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Pranay Singh
[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8215 ) Change subject: IMPALA-5142 EventSequence displays negative elapsed time. .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@327 PS5, Line 327:[](Event const &event1, Event const &event2) { : return event1.second < event2.second; > rather than creating a tmp eventlist_sorted, you should create a temp for t I'm dropping this logic to call the sort() all the time to make the result more predictable each time this function gets called and also since the event list will be small so sorting won't be that expensive. http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@330 PS5, Line 330: == false > our style uses ! rather than == false Since I'm not checking the event list to be sorted so I won't need this anymore. http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326 PS4, Line 326: bool eventlist_sorted = std::is_sorted(events_.begin(), events_.end(), : [](Event const &event1, Event const &event2) { : return event1.second < event2.second; : }); : if (eventlist_sorted == false) { : std::sort(events_.begin(), events_.end(), : [](Event const &event1, Event const &event2) { : return event1.second < event2.second; : }); : > the flipside is that this becomes a cliff in the sense that normally it wil Done as suggested kept the functionality simple, removed the logic of having an extra check. -- To view, visit http://gerrit.cloudera.org:8080/8215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 Gerrit-Change-Number: 8215 Gerrit-PatchSet: 5 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 01 Nov 2017 23:01:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8408 ) Change subject: IMPALA-6121: remove I/O mgr request context cache .. Patch Set 4: (4 comments) Thanks for looking at the patch and for the good questions. http://gerrit.cloudera.org:8080/#/c/8408/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8408/4//COMMIT_MSG@12 PS4, Line 12: from TCMalloc's thread caches should scale much better than a > Though reader_context_ is not frequently allocated in current code, theoret Yeah I think that's a reasonable option in a lot of cases, particularly if perf is a concern. Allocating on the heap can avoid some slightly tricky issues around memory alignment (if the class requires more than the default alignment, e.g. cache line alignment, then any class including it must also be cache line aligned). Including everything inline also can also force viral header includes, since any class that needs to know the layout of Y that includes X inline then also needs to know the layout of X. I did actually just realise that we don't have copy constructors disable for request context, so I added that. http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h File be/src/runtime/disk-io-mgr-reader-context.h: http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h@274 PS4, Line 274: __sync_synchronize(); > Just a question - Does b need to be atomic here? If it does not generate a Yeah I think it would make more sense to use an explicit atomic for is_on_queue_. The whole concurrency story here could probably be simplified a lot but it would need some thought. I don't want to make any subtle changes to this logic as part of a refactoring patch, but it would be nice to simplify this code at some point. http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h@319 PS4, Line 319: if (!is_on_queue_ && num_threads_in_op_.Load() == 0 && !done_) { > The return value of Add can be used to remove this Load I'd have to think about this carefully to understand if the Acquire barrier in Load() makes a difference. I think you're probably right but I don't want to potentially include subtle behavioural changes in this patch. http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-test.cc File be/src/runtime/disk-io-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-test.cc@251 PS4, Line 251: unique_ptr writer = io_mgr.RegisterContext(NULL); > nullptr? Did it for the whole file. -- To view, visit http://gerrit.cloudera.org:8080/8408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108 Gerrit-Change-Number: 8408 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 22:39:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache
Hello Michael Ho, Tianyi Wang, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8408 to look at the new patch set (#5). Change subject: IMPALA-6121: remove I/O mgr request context cache .. IMPALA-6121: remove I/O mgr request context cache This simplifies the lifecycle of the request contexts and eliminates some code. The comments claim that request context cache improves performance when allocating smallish the objects. But allocating from TCMalloc's thread caches should scale much better than a global object pool protected by a lock. I needed to move the definition to a non-internal header file so that it was visible to clients that manage it by unique_ptr. We also do not need to transfer the request contexts to the RuntimeState since I/O buffers do not leave scanners now. Testing: Ran exhaustive tests. Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc A be/src/runtime/disk-io-mgr-reader-context.h M be/src/runtime/disk-io-mgr-stress.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 16 files changed, 639 insertions(+), 804 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/8408/5 -- To view, visit http://gerrit.cloudera.org:8080/8408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108 Gerrit-Change-Number: 8408 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8215 ) Change subject: IMPALA-5142 EventSequence displays negative elapsed time. .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@327 PS5, Line 327:int64_t prev = 0L; : for (const Event& event: events_) { rather than creating a tmp eventlist_sorted, you should create a temp for this lamba (you can use "auto" for that), since it's used twice and must be consistent in order for this code to make sense. http://gerrit.cloudera.org:8080/#/c/8215/5/be/src/util/runtime-profile-counters.h@330 PS5, Line 330: = false; our style uses ! rather than == false http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326 PS4, Line 326: bool eventlist_sorted = true; : int64_t prev = 0L; : for (const Event& event: events_) { : if (event.second < prev) { : eventlist_sorted = false; : break; : } : prev = event.second; : } : > Since the list is sorted in most of the times calling sort() may be expensi the flipside is that this becomes a cliff in the sense that normally it will be fast, but every once in a while (when the race occurs), we'll pay the cost of sort(). If we always call sort(), then the performance is more predictable. I don't think the event list will ever be so long that this matters one way or another, so simplest seems best. That said, I don't feel strongly if you really want to keep the check. -- To view, visit http://gerrit.cloudera.org:8080/8215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 Gerrit-Change-Number: 8215 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 01 Nov 2017 22:31:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8215 to look at the new patch set (#5). Change subject: IMPALA-5142 EventSequence displays negative elapsed time. .. IMPALA-5142 EventSequence displays negative elapsed time. EventSequence::EventList which stores the Event in increasing time order may have at times Events that are not stored in increasing time order. This may happen when concurrently EventSequence::MarkEvent() is called for different Events in the same EventSequence. The incorrect time order of Event in EventList results in a negative value being displayed, which is the issue reported in this JIRA. The issue can be fixed either be re-ordering the lock in EventSequence::MarkEvent() or by sorting the EventList before printing. Reordering of lock in EventSequence::MarkEvent() will involve holding the lock across clock_gettime() so that approach is not taken. This patch fixes the issue by sorting the EventList in GetEvents() as this function is expected to return sorted list of events based on time. Testing: Ran all the front-end/backend and end-end tests. Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 --- M be/src/util/runtime-profile-counters.h 1 file changed, 14 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/8215/5 -- To view, visit http://gerrit.cloudera.org:8080/8215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 Gerrit-Change-Number: 8215 Gerrit-PatchSet: 5 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Pranay Singh
[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8215 ) Change subject: IMPALA-5142 EventSequence displays negative elapsed time. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326 PS4, Line 326: bool eventlist_sorted = true; : int64_t prev = 0L; : for (const Event& event: events_) { : if (event.second < prev) { : eventlist_sorted = false; : break; : } : prev = event.second; : } : > why bother with this? the list is pretty short and this isn't a critical pa Since the list is sorted in most of the times calling sort() may be expensive (if in future # of events becomes large), hence I'll use the std::is_sorted() instead. http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@336 PS4, Line 336: [](Event const &event1, > nit: how about breaking the line before the [] instead of in the middle of Done -- To view, visit http://gerrit.cloudera.org:8080/8215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 Gerrit-Change-Number: 8215 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 01 Nov 2017 22:24:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 ) Change subject: IMPALA-5976: Remove equivalence class computation in FE .. Patch Set 3: (62 comments) http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281 PS3, Line 281: return new FunctionCallExpr("if", ifParams); Let's please avoid code changes unrelated to the purpose of this patch as much as reasonable. Cleanup is nice in general, but this patch is already complex and huge so let's not add anything extra. Such changes can also make backporting more difficult due to conflicts, so cleanup should be done in a separate change. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91 PS3, Line 91: * Slot A has value transfer to slot B if a predicate on A can be applied to B in most We need a precise definition. The original definition was more precise. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93 PS3, Line 93: * It is a reflexive, transitive binary relation on the set of slots. Not sure this part adds value. What's the significance of these statements with respect to our use of value transfers for planning purposes? If we don't make use of these terms/properties anywhere else in the code, then we should remove these statements. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277 PS3, Line 277: // The SCC-condensed graph representation of value transfer relationship. The SCC-condensed graph representation of all slot value transfers. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278 PS3, Line 278: SccCondensedGraph valueTransferGraph; public/private? http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146 PS3, Line 1146:* Create and register an auxiliary predicate to express an mutual value transfer a mutual value transfer http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467 PS3, Line 1467:* by replacing the slots of a source predicate with slots of the destTid, if for each how about: if every source slot has a value transfer to a slot in destTid http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513 PS3, Line 1513: // its referenced tuples are NULL. For example: Let's simplify the example and make it as clear as possible: select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618 PS3, Line 1618:* For each slot equivalence class, adds/removes predicates from conjuncts such that it Need a definition of equivalence class in the Analyzer class comment. You do mention the term "equivalence class" but I don't think it has the same meaning of the "equivalence class" terminology used here. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636 PS3, Line 1636: // A map from equivalence class ids to slot equivalence classes containing on slots Garbled sentence, please clean up http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956 PS3, Line 1956:* Compute the value transfer graph based on the registered value transfer relation the registered value transfers http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973 PS3, Line 1973: String condensedTc = globalState_.valueTransferGraph.print(); missing space in string http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982 PS3, Line 1982:* Populate the value transfer relation from eq conjuncts to graph 'g'. Add value-transfer edges to 'g' based on the registered equi-join conjuncts. http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059 PS3, Line 2059:* Returns the equivalence class of slotID. of the given slot id http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062 PS3, Line 2062: public List getEquivClass(SlotId slot) { slot -> sid http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@20
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8447 Change subject: IMPALA-2181: Add query option levels for display .. IMPALA-2181: Add query option levels for display Three display levels are introduced for each query option: REGULAR, ADVANCED and DEPRECATED. When the query options are diplayed in Impala shell using SET then only the REGULAR options are shown. A new command called SET ALL shows all the options grouped by their option levels. Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 --- M be/src/service/child-query.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_interactive.py 9 files changed, 244 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8447/1 -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 ) Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 6: (14 comments) Note to self: remaining files: krpc-data-stream-{mgr,recvr}.cc http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc@89 PS6, Line 89: "Number of datastream service processing threads"); how are these defaults chosen? http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@50 PS6, Line 50: outbound I think we should say something about KRPC to at least give that hint. maybe: A KRPC outbound row batch... http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@60 PS6, Line 60: sizeof(int32_t) sizeof(tuple_offsets_[0]) seems clearer and more robust http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@354 PS6, Line 354: /// it is ignored. This function does not Reset(). we should preserve this comment when removing the thrift variant. So you could just put the new decl here now so we don't forget that. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@424 PS6, Line 424: /// nit: i don't think we generally have all these line breaks between parameter comments. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@426 PS6, Line 426: . delete space http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@444 PS6, Line 444: nput_ delete http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@447 PS6, Line 447: input_ delete http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.h@537 PS6, Line 537: std::string compression_scratch_; this seems like a hack and we could do something simpler, but let's leave it alone for now. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/row-batch.cc@241 PS6, Line 241: // as sidecars to the RpcController. this comment was probably meant to be deleted? http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/data_stream_service.proto File common/protobuf/data_stream_service.proto: http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/data_stream_service.proto@29 PS6, Line 29: fragment isn't this the id of the instance? The comment in KrpcDataStreamSender is clearer, let's copy that: /// Sender instance id, unique within a fragment. int sender_id_; http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/data_stream_service.proto@59 PS6, Line 59: // Id of this fragment in its role as a sender. same http://gerrit.cloudera.org:8080/#/c/8023/3/common/protobuf/row_batch.proto File common/protobuf/row_batch.proto: http://gerrit.cloudera.org:8080/#/c/8023/3/common/protobuf/row_batch.proto@32 PS3, Line 32: = 2; > That's the tuple data sent as sidecar. Clarified in the new comments. My point is that writing it like 'tuple_data' doesn't make sense since it's not a field in this struct. You should just write: Size of the tuple data (sent as a sidecar) in bytes ... http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/row_batch.proto File common/protobuf/row_batch.proto: http://gerrit.cloudera.org:8080/#/c/8023/6/common/protobuf/row_batch.proto@32 PS6, Line 32: epeated int32 row_tuples = 2; why is this needed? i don't see it used. The size of it is used, though it seems like even that can be inferred from the descriptors. -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-Change-Number: 8023 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 01 Nov 2017 21:48:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8408 ) Change subject: IMPALA-6121: remove I/O mgr request context cache .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/8408/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8408/4//COMMIT_MSG@12 PS4, Line 12: from TCMalloc's thread caches should scale much better than a Though reader_context_ is not frequently allocated in current code, theoretically for reducing allocation won't it be better to store DiskIoRequestContext directly (or boost::optional<> for a "nullable cell")? http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h File be/src/runtime/disk-io-mgr-reader-context.h: http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h@274 PS4, Line 274: __sync_synchronize(); Just a question - Does b need to be atomic here? If it does not generate a MOV then this __sync_synchronize might not be effective as well http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-reader-context.h@319 PS4, Line 319: if (!is_on_queue_ && num_threads_in_op_.Load() == 0 && !done_) { The return value of Add can be used to remove this Load http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-test.cc File be/src/runtime/disk-io-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/8408/4/be/src/runtime/disk-io-mgr-test.cc@251 PS4, Line 251: unique_ptr writer = io_mgr.RegisterContext(NULL); nullptr? -- To view, visit http://gerrit.cloudera.org:8080/8408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108 Gerrit-Change-Number: 8408 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 21:47:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8038 ) Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1425/ -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-Change-Number: 8038 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 21:32:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8038 ) Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 17: Code-Review+2 Carrying Michael's +2 -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-Change-Number: 8038 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 21:32:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717 PS2, Line 717: byte_buffer_last_byte_ > Oops, didn't read line 712 ;) I couldn't see an obvious code path that would take us there. Maybe some specially-constructed compressed data that expanded to zero bytes? Will think some more. -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 21:27:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8430 ) Change subject: IMPALA-6136: Part 1: Query duration should not be normally negative. .. Patch Set 3: > Please update the comment as suggested by Phil. Done -- To view, visit http://gerrit.cloudera.org:8080/8430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 Gerrit-Change-Number: 8430 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 21:24:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#9). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each partitioned hash join generates a bloom and min-max filter, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8430 ) Change subject: IMPALA-6136: Part 1: Query duration should not be normally negative. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc@303 PS2, Line 303: // record.end_time_us might still be zero if the query is not yet done > nit: Could you update the following comment in client-request-state.h and i Done -- To view, visit http://gerrit.cloudera.org:8080/8430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 Gerrit-Change-Number: 8430 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 21:23:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.
Hello Michael Ho, Philip Zeyliger, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8430 to look at the new patch set (#3). Change subject: IMPALA-6136: Part 1: Query duration should not be normally negative. .. IMPALA-6136: Part 1: Query duration should not be normally negative. The second commit for IMPALA-5599, 1640aa97, introduced a regression where calculating the duration of an in-progress query can result in negative values. This can happen for two reasons: 1. The value of ClientRequestState::end_time_us_ is not initialized in the constructor, and may have some random value until ClientRequestState::Done() is called. 2. If ClientRequestState::end_time_us_ happens to have a positive value less than ClientRequestState::start_time_us_, the query duration will be negative. Here, since the query is still in flight, we need to use the local current time as the end time. The fix has been manually verified by executing long-running queries and checking the query profiles to ensure the durations are not some random junks. A new test case will be added to check for sane time values in a follow-on commit. Since we are using Unix time (system clock) for time stamps, it is still possible for end_time_us_ to be less than start_time_us_ if the system clock gets reset while the query is executing. If there are clients that assume that a query duration is never negative, we really should switch to a monotonic clock to time queries. Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.h 4 files changed, 13 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8430/3 -- To view, visit http://gerrit.cloudera.org:8080/8430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 Gerrit-Change-Number: 8430 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717 PS2, Line 717: byte_buffer_last_byte_ > The byte_buffer_read_size_ check above prevents using a stale value from a Oops, didn't read line 712 ;) That does seem like a possibility, and seems worth addressing. But is there a way we can exercise that? -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 21:20:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717 PS2, Line 717: byte_buffer_last_byte_ > given that this doesn't happen when readsize is 0, is it possible to use a The byte_buffer_read_size_ check above prevents using a stale value from a previous byte buffer. There's a related scenario where I can't convince myself that it is or isn't possible: if the last FillByteBuffer() call filled zero bytes but the call before that had a split delimiter at the end of the buffer, we could potentially ignore the split delimiter, even though it was the last thing we read from the range. We could fix that by tracking whether *any* bytes were read - if so, it means that byte_buffer_last_byte_ is valid and is the last byte read. What do you think? Should I try to more explicitly prevent that scenario? I don't think this change makes it more or less likely to hit. -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 21:10:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8405 ) Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8405 ) Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1424/ -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8419 ) Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536 .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1423/ -- To view, visit http://gerrit.cloudera.org:8080/8419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97 Gerrit-Change-Number: 8419 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5541: Reject BATCH SIZE greater than 65536
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8419 ) Change subject: IMPALA-5541: Reject BATCH_SIZE greater than 65536 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd5a2490a73b6915224160d7604b4badc72c1d97 Gerrit-Change-Number: 8419 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 7: (2 comments) I think we'll need to do some more work on testing for the int32/64 patches - we don't have pre-existing test files from parquet-mr. I think we'll have to generate some more test files with parquet-mr for the other cases, and we could consider turning that code into a data generator to generate more test files. From what I could tell Hive doesn't have a knob to generate some of these alternative output encodings. I feel ok with the coverage since we have end-to-end tests then more exhaustive unit tests for the various ways of encoding it. http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h@391 PS7, Line 391: fixed_len_size Looked again. The variable name (and recycling the argument storage) is confusing. Maybe 'encoded_byte_size'? http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33 PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int fixed_len_size, uint8_t* buffer){ I took another look at the standard and it says that the minimum number of bytes required to store the unscaled value should be used: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal I think this means that we should not be including any preceding "0" bytes. I.e. we should not have a fixed_len_size argument and instead determine the size based on the number of leading zero bytes in the value. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 20:53:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8438/2/be/src/exec/hdfs-text-scanner.cc@717 PS2, Line 717: byte_buffer_last_byte_ given that this doesn't happen when readsize is 0, is it possible to use a stale value here? -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 20:49:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Hello Joe McDonnell, Tim Armstrong, Bikramjeet Vig, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8034 to look at the new patch set (#15). Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder Currently DictDecoder class and DictEncoder class uses std::vector to store the tables mapping codeword to value and vice-versa. It is hard to detect the memory usage by these tables when they becomes very large, since this memory is not accounted by Impala's memory mangement infrastructure. This patch uses the memory tracker of HdfsScanner to track the memory used by dictionary in DictDecoder class. Similary it uses memory tracker of HdfsTableSink to track the memory used by dictionary in DictEncoder class. Memory for the dictionary, stored as std::vector is still allocated from std:allocator but the amount allocated is accounted by introducing a counter which is incremented and decremented as the memory is consumed and released by vector. Testing --- Ran all the backend and end-end tests with no failures. Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/util/dict-encoding.h M be/src/util/dict-test.cc 5 files changed, 140 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8034/15 -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 15 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h@63 PS12, Line 63: /// dict_encoded_size() bytes. : virtual void Writ > Close() should call ClearIndices() as one of its steps, ClearIndices() stil My bad didn't realize it, now that's fixed. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 14 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 20:44:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 7: > Looks good to me and it looks like you addressed Dan's comments. > Dan, did you want to take another look? No, I don't plan to look through the details. How confident are we that we have sufficient test coverage with the added tests? If we're confident then, Tim, please do the +2 (or ask for a second review from another reviewer if you think it's warranted). -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 20:40:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has removed Taras Bobrovytsky from this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Removed reviewer Taras Bobrovytsky. -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8305 ) Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src. .. Patch Set 11: (3 comments) http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/client-request-state.h@342 PS11, Line 342: start_time_us_, end_time_us_; > These two may need to be initialized. https://gerrit.cloudera.org/#/c/8430/ http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-http-handler.cc@303 PS11, Line 303: record.end_time_us - record.start_time_us; > This may be broken if end_time_us is not set yet. The old code will use the Fixing via https://gerrit.cloudera.org/#/c/8430/ http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h@622 PS11, Line 622: /// Start and end time of the query, in Unix microseconds. : int64_t start_time_us, end_time_us; > Can these be const ? They are always initialized in the constructor, right The default constructor does not handle this well if we make them constants. We'll have to make all/most of the other fields constant as well if we are going to fix this. -- To view, visit http://gerrit.cloudera.org:8080/8305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c Gerrit-Change-Number: 8305 Gerrit-PatchSet: 11 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 20:21:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8215 ) Change subject: IMPALA-5142 EventSequence displays negative elapsed time. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@326 PS4, Line 326: bool eventlist_sorted = true; : int64_t prev = 0L; : for (const Event& event: events_) { : if (event.second < prev) { : eventlist_sorted = false; : break; : } : prev = event.second; : } : why bother with this? the list is pretty short and this isn't a critical path, so I think it should be fine to always call sort(). (and if you were to do that, you might as well use std::is_sorted()) http://gerrit.cloudera.org:8080/#/c/8215/4/be/src/util/runtime-profile-counters.h@336 PS4, Line 336: [](Event const &event1, nit: how about breaking the line before the [] instead of in the middle of the lambda param list? -- To view, visit http://gerrit.cloudera.org:8080/8215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896 Gerrit-Change-Number: 8215 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 01 Nov 2017 20:17:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala
Sailesh Mukil has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8439 Change subject: IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala .. IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala KRPC has some flags that turn on TLS. This patch sets those to enable TLS communication. Tests are added to rpc-mgr-test. Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a --- M be/src/catalog/catalogd-main.cc M be/src/rpc/authentication-test.cc M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/runtime/exec-env.cc M be/src/service/impala-server.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore.cc M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/util/openssl-util.cc M be/src/util/openssl-util.h 15 files changed, 366 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8439/1 -- To view, visit http://gerrit.cloudera.org:8080/8439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a Gerrit-Change-Number: 8439 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5053: [SECURITY] Make KRPC work with Kerberos
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8270 to look at the new patch set (#2). Change subject: IMPALA-5053: [SECURITY] Make KRPC work with Kerberos .. IMPALA-5053: [SECURITY] Make KRPC work with Kerberos KuduRPC has support for Kerberos. However, since Impala's client transport still uses the Thrift transport stack, we need to make sure that a single security configuration applies to both internal communication (KuduRPC) and external communication (Thrift's TSaslTransport). This patch changes InitAuth() to start Sasl regardless of security configuration, since KRPC uses plain SASL for negotiation on insecure clusters. It also moves some utility code out of authentication.cc into auth-util.cc for resuse by the RpcMgr while enabling kerberos. The MiniKDC related code is moved out of thrift-server-test.cc into a new file called mini-kdc-wrapper.h/cc. This file exposes a new class MiniKdcWrapper which can be easily used by the tests to configure the kerberos environment, create the keytab, start the KDC and also initialize the Impala security library. Tests are added to rpc-mgr-test for kerberos tests over KRPC. thrift-server-test also has a mechanical change to use MiniKdcWrapper. Also tested on a live cluster configured to use kerberos. Change-Id: I8cec5cca5fdb4b1d46bab19e86cb1a8a3ad718fd --- M be/src/rpc/CMakeLists.txt M be/src/rpc/auth-provider.h M be/src/rpc/authentication-test.cc M be/src/rpc/authentication.cc M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.cc M be/src/rpc/thrift-server-test.cc M be/src/testutil/CMakeLists.txt A be/src/testutil/mini-kdc-wrapper.cc A be/src/testutil/mini-kdc-wrapper.h M be/src/util/auth-util.cc M be/src/util/auth-util.h 12 files changed, 427 insertions(+), 172 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8270/2 -- To view, visit http://gerrit.cloudera.org:8080/8270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8cec5cca5fdb4b1d46bab19e86cb1a8a3ad718fd Gerrit-Change-Number: 8270 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376 PS1, Line 376: virtual bool DictionaryReferencesBuffer() const { return false; } Add comment. Make pure if possible. Alternatively, you could do something similar to PageContainsTupleData(). http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272 PS1, Line 272: virtual bool DictionaryReferencesBuffer() const { add "override" http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541 PS1, Line 541: some how about rewording this to "Column readers for types that require ..." http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542 PS1, Line 542: inline bool DictionaryReferencesBufferInline() const { nit: single line http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945 PS1, Line 945: tmp_buffer better name http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947 PS1, Line 947: int uncompressed_size = decompressor_.get() != nullptr Move this to L954 http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951 PS1, Line 951: if (decompressor_.get() == nullptr && !DictionaryReferencesBuffer()) { Can you add a brief comment to explain how the 4 different cases are handled? http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@987 PS1, Line 987: if (DictionaryReferencesBuffer()) { : memcpy(dict_values, data_, data_size); : } Single line :) -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 01 Nov 2017 19:16:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17 PS1, Line 17: whereas the web UI would previously block on the lock. > Looking over https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-se Yep, we take the lock if the query is not in a CREATED state. -- Status ImpalaServer::GetRuntimeProfileStr(const TUniqueId& query_id, const string& user, bool base64_encoded, stringstream* output) { DCHECK(output != nullptr); // Search for the query id in the active query map { shared_ptr request_state = GetClientRequestState(query_id); if (request_state.get() != nullptr) { // For queries in CREATED state, the profile information isn't populated yet. if (request_state->query_state() == beeswax::QueryState::CREATED) { return Status::Expected("Query plan is not ready."); } lock_guard l(*request_state->lock()); -- I agree this is a very important change from a usability/supportability perspective, to have a partial profile visible to the user. http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26 PS1, Line 26: know the query id : that's necessary to call GetResultSetMetadata() > Yes, IMPALA-2568 is one of the eventual goals. This is actually listed as t I was thinking more along the "profile copy" path Dan suggested, so that we can avoid removing the lock during planning. I thought the latter involved more work since it could affect cancellations. With the current patch, cancellations don't block anymore. when the query is planning, so I think we should make sure to get that part right. I'm fine with this approach, if the cancellation works without any issues. -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 01 Nov 2017 18:41:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: Also, besides the RPCs, let's consider how to handle the webserver cancelation path. Maybe we need to not show the cancel link until we can actually cancel? -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 01 Nov 2017 17:56:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (2 comments) I think this path is worth continuing down. The alternate solution would be to have a profile copy that can be returned before the actual profile is ready, but if releasing the lock works out, that seems simplest. We should check how this minimal profile looks in CM -- there might be some other fields that they require? http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17 PS1, Line 17: whereas the web UI would previously block on the lock. > Looking over https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-se Right, this JIRA is a follow up to 1972 to improve things further so that the (partial) query profile can be retrieved. http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h@342 PS1, Line 342: is_planning_ maybe make that is_planning_finished_ (or similar), and initialize it to false and just set to true after planning? Actually, since the goal is to guard against other RPCs to execute until the handle is returned, maybe just make it is_execute_stmt_finished? -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 01 Nov 2017 17:54:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resources
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8323 ) Change subject: IMPALA-1575: part 2: yield admission control resources .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8323 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac Gerrit-Change-Number: 8323 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 17:34:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resources
Hello anujphadke, Joe McDonnell, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8323 to look at the new patch set (#5). Change subject: IMPALA-1575: part 2: yield admission control resources .. IMPALA-1575: part 2: yield admission control resources This change releases admission control resources more eagerly, once the query has finished actively executing. Some resources (tracked and untracked) are still consumed by the client request as long as it remains open, e.g. memory for control structures and the result cache. However, these resources are relatively small and should not block admission of new queries. The same as in part 1, query execution is considered to be finished under any of the following conditions: 1. The query encounters an error and fails 2. The query is cancelled due to the idle query timeout 3. The query reaches eos (or the DML completes) 4. The client cancels the query without closing the query Admission control resources are released in two ways: 1. by calling AdmissionController::ReleaseQuery() on the coordinator promptly after query execution finishes, instead of waiting for UnregisterQuery(). This means that the query and its memory is no longer considered "admitted". 2. by changing the behaviour of MemTracker::GetPoolMemReserved() so that it is aware of when a query has finished executing and does not consider its entire memory limit to be "reserved". The preconditions for releasing an admitted query are subtle because the queries are being admitted to a distributed system, not just the coordinator. The comment for ReleaseAdmissionControlResources() documents the preconditions and rationale. Note that the preconditions are not weaker than the preconditions of calling UnregisterQuery() before this patch. Testing: TestAdmissionController is extended to end queries in four ways: cancellation by client, idle timeout, the last row being fetched, and the client closing the query. The test uses a mix of all four. After the query ends, all clients wait for the test to complete before closing the query or closing the connection. This ensures that the admission control decisions are based entirely on the query end behavior. This test works for both query admission control and mem_limit admission control and can detect both kinds of admission control resources ("admitted" and "reserved") not being released promptly. This is based on an earlier patch by Joe McDonnell. Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/query-state.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/client-request-state.cc M tests/custom_cluster/test_admission_controller.py 9 files changed, 192 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8323/5 -- To view, visit http://gerrit.cloudera.org:8080/8323 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac Gerrit-Change-Number: 8323 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 17:29:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Vincent Tran has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 5: I will certainly get back to this change. Needed to handle some personal issues and got off track. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 5 Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Wed, 01 Nov 2017 17:18:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: Could you have a look, Taras? I know you recently did a little work on memory management in the text scanners. Hopefully the bug/fix should be fairly clear-cut. -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 16:58:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8430 ) Change subject: IMPALA-6136: Part 1: Query duration should not be normally negative. .. Patch Set 2: Code-Review+2 Please update the comment as suggested by Phil. -- To view, visit http://gerrit.cloudera.org:8080/8430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 Gerrit-Change-Number: 8430 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 16:49:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8430 ) Change subject: IMPALA-6136: Part 1: Query duration should not be normally negative. .. Patch Set 2: Code-Review+1 Please see some more comments in https://gerrit.cloudera.org/c/8305/ -- To view, visit http://gerrit.cloudera.org:8080/8430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 Gerrit-Change-Number: 8430 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 16:36:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8305 ) Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src. .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8305/11/be/src/service/impala-server.h@622 PS11, Line 622: /// Start and end time of the query, in Unix microseconds. : int64_t start_time_us, end_time_us; Can these be const ? They are always initialized in the constructor, right ? -- To view, visit http://gerrit.cloudera.org:8080/8305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c Gerrit-Change-Number: 8305 Gerrit-PatchSet: 11 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 16:36:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG@9 PS2, Line 9: The bug was that the buffer pointed to by byte_buffer_ could be freed by > Do you mean byte_buffer_ptr_? Yeah I will fix in the next iteration. -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 16:16:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8430 ) Change subject: IMPALA-6136: Part 1: Query duration should not be normally negative. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc@303 PS2, Line 303: // record.end_time_us might still be zero if the query is not yet done nit: Could you update the following comment in client-request-state.h and impala-server.h (once for ClientRequestState, once for QueryStateRecord) to say something about how 0 is a special value. /// Start/end time of the query, in Unix microseconds. int64_t start_time_us_, end_time_us_; -- To view, visit http://gerrit.cloudera.org:8080/8430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 Gerrit-Change-Number: 8430 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 16:02:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17 PS1, Line 17: whereas the web UI would previously block on the lock. > I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed) Looking over https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-server.cc a bit more clearly, this is replacing "Query plan is not ready" with the actual profile. The profile is useful at this point: it has been populated with the query string, the time that planning started, and so on. -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 01 Nov 2017 15:36:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17 PS1, Line 17: whereas the web UI would previously block on the lock. > After IMPALA-1972, it doesn't block, it just returns an empty profile. I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed). You can go to the web UI and see the queries, but if you click on "Profile", you'll get stuck getting the lock in ImpalaServer::GetRuntimeProfileStr() below. Am I missing something? Status ImpalaServer::GetRuntimeProfileStr(const TUniqueId& query_id, const string& user, bool base64_encoded, stringstream* output) { DCHECK(output != nullptr); // Search for the query id in the active query map { shared_ptr request_state = GetClientRequestState(query_id); if (request_state.get() != nullptr) { lock_guard l(*request_state->lock()); http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26 PS1, Line 26: know the query id : that's necessary to call GetResultSetMetadata() > It looks to me like this is against IMPALA-2568. I guess we eventually want Yes, IMPALA-2568 is one of the eventual goals. This is actually listed as the first subtask of that JIRA. It's easier to argue that this one is compatible. As for this one needing to be undone, it's hard to say. I think you solve IMPALA-2568 by having more fine-grained and clear query states/lifecycles. I think this is a step in that direction. What would you suggest? -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 01 Nov 2017 15:32:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8438/2//COMMIT_MSG@9 PS2, Line 9: The bug was that the buffer pointed to by byte_buffer_ could be freed by Do you mean byte_buffer_ptr_? -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 08:52:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6080: clean up table descriptor handling
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/8330 ) Change subject: IMPALA-6080: clean up table descriptor handling .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/benchmarks/expr-benchmark.cc File be/src/benchmarks/expr-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/benchmarks/expr-benchmark.cc@57 PS2, Line 57: #include "service/frontend.h" I don't think this include is needed. -- To view, visit http://gerrit.cloudera.org:8080/8330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id427dab0c196b556bd8b2d64ec618403d5cbd4d6 Gerrit-Change-Number: 8330 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 01 Nov 2017 08:05:48 + Gerrit-HasComments: Yes