[Impala-ASF-CR] IMPALA-7102(Part 1): Disable reading of erasure coding by default
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10646 ) Change subject: IMPALA-7102(Part 1): Disable reading of erasure coding by default .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2751/ -- To view, visit http://gerrit.cloudera.org:8080/10646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3 Gerrit-Change-Number: 10646 Gerrit-PatchSet: 3 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Adrian Ng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Jun 2018 05:42:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10827 ) Change subject: IMPALA-7215: Implement a templatized CountingBarrier .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h File be/src/util/counting-barrier.h: http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@34 PS3, Line 34: /// Sends one notification, decrementing the number of pending notifications by one. Add something like: If this is the final notifier, sets the value returned to the waiter to 'promise_value' http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@36 PS3, Line 36: int32_t looks like no caller actually uses this return value and so you've annotated them with discard_result(). You could also just make this return void since no one wants the value. I'm fine either way. http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@42 PS3, Line 42: /// Sets the number of pending notifications to 0 and unblocks Wait(). similarly, add comment about 'promise_value' http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@54 PS3, Line 54: /// Blocks until all notifications are received. comment the return value http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@55 PS3, Line 55: T would be good to return "const T&" here and below (to match Promise.Get()) http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@58 PS3, Line 58: /// case '*timed_out' will be true. comment return value -- To view, visit http://gerrit.cloudera.org:8080/10827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e Gerrit-Change-Number: 10827 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 28 Jun 2018 04:32:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2752/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 7 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 28 Jun 2018 04:12:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 7 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 28 Jun 2018 04:12:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 6: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@336 PS1, Line 336: ) { > I'm curious why you feel this is less readable. As we spoke, my basic reasoning was that if we have a function signature like so: def Func(param..) then that would signify that 'param' is used to do 'Func'. In this case: bool Cancel(query_ctx) that would signify, use 'query_ctx' to 'Cancel'. However, 'query_ctx' is used for an orthogonal use case within the function (which is important, but is just not doing any real 'cancel' work). There are cases where this pattern is unavoidable, but I feel like we could do without it here. The second patch set looks much better though. Thanks for doing that. -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 6 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 28 Jun 2018 02:58:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10827 ) Change subject: IMPALA-7215: Implement a templatized CountingBarrier .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10827/1/be/src/util/counting-barrier.h File be/src/util/counting-barrier.h: http://gerrit.cloudera.org:8080/#/c/10827/1/be/src/util/counting-barrier.h@39 PS1, Line 39: ; > rather than have a default_promise_value_, why not just pass it as a parame Done http://gerrit.cloudera.org:8080/#/c/10827/1/be/src/util/counting-barrier.h@46 PS1, Line 46: eturn; // count_ can legitimately drop below 0 : if (count_.CompareAndSwap(value, 0)) { : promise_.Set(promise_value); > this would be simpler if we just require the promise value for the non-bool Good point. I implemented a TypedCountingBarrier and made CountingBarrier a user of TypedCountingBarrier. -- To view, visit http://gerrit.cloudera.org:8080/10827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e Gerrit-Change-Number: 10827 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 28 Jun 2018 02:49:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10827 to look at the new patch set (#3). Change subject: IMPALA-7215: Implement a templatized CountingBarrier .. IMPALA-7215: Implement a templatized CountingBarrier Currently, our CountingBarrier util only notifies a 'bool' value and uses an underlying Promise. We're seeing cases in code where we might want to be notified of a different kind of Promise (other than bool). We add a templatized class TypedCountingBarrier and convert CountingBarrier to use the TypedCountingBarrier internally. This was identified while working on IMPALA-7163. Testing: Ran 'core' tests. Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e --- M be/src/exec/hdfs-scan-node.cc M be/src/rpc/rpc-mgr-test.cc M be/src/runtime/coordinator.cc M be/src/util/counting-barrier.h M be/src/util/hdfs-bulk-ops.cc 5 files changed, 54 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/10827/3 -- To view, visit http://gerrit.cloudera.org:8080/10827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e Gerrit-Change-Number: 10827 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10827 to look at the new patch set (#2). Change subject: IMPALA-7215: Implement a templatized CountingBarrier .. IMPALA-7215: Implement a templatized CountingBarrier Currently, our CountingBarrier util only notifies a 'bool' value and uses an underlying Promise. We're seeing cases in code where we might want to be notified of a different kind of Promise (other than bool). We add a templatized class TypedCountingBarrier and convert CountingBarrier to use the TypedCountingBarrier internally. This was identified while working on IMPALA-7163. Testing: Ran 'core' tests. Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e --- M be/src/exec/hdfs-scan-node.cc M be/src/rpc/rpc-mgr-test.cc M be/src/runtime/coordinator.cc M be/src/util/counting-barrier.h M be/src/util/hdfs-bulk-ops.cc 5 files changed, 59 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/10827/2 -- To view, visit http://gerrit.cloudera.org:8080/10827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e Gerrit-Change-Number: 10827 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-7102(Part 1): Disable reading of erasure coding by default
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10646 ) Change subject: IMPALA-7102(Part 1): Disable reading of erasure coding by default .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3 Gerrit-Change-Number: 10646 Gerrit-PatchSet: 3 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Adrian Ng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Jun 2018 02:16:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7102(Part 1): Disable reading of erasure coding by default
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10646 ) Change subject: IMPALA-7102(Part 1): Disable reading of erasure coding by default .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2751/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/10646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3 Gerrit-Change-Number: 10646 Gerrit-PatchSet: 3 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Adrian Ng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Jun 2018 02:16:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7095: clean up scan node profiles
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10810 ) Change subject: IMPALA-7095: clean up scan node profiles .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/hdfs-scan-node-base.h@489 PS5, Line 489: per_read_thread > I guess it's rather per_thread_read_throughput , just like in its comment. The name of the counter is "PerReadThreadRawHdfsThroughput", which isn't a great name, but the names of the variables reflect the counter name. Not sure if it's worth changing the counter name - I didn't have a great alternative. http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h File be/src/exec/scan-node.h: http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@46 PS5, Line 46: layet > Nit: layer Done http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@95 PS5, Line 95: > Nit: too much spaces Done http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@234 PS5, Line 234: && > Nit: I think it's not necessary here and it would be nice if it was consist Done http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@256 PS5, Line 256: && > Since this function does not always take ownership of 'row_batch', I think Yeah we had an extended discussion about this pattern at one point: https://gerrit.cloudera.org/#/c/6527/5/be/src/exec/kudu-scan-node.cc@179 I'd find lvalue references confusing since then there's no indication that ownership might be transferred at the callsite. I switched this to just pass in a pointer to the unique_ptr, since that seems to avoid both potential confusions. I think the pattern is harder to avoid in BlockingQueue itself, since it's generic and meant to work for copyable value types too. http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.cc@143 PS5, Line 143: /// Starts counters that should start measuring in the Open() phase. : void StartOpenCounters(); > It seems it is some leftover code. Done -- To view, visit http://gerrit.cloudera.org:8080/10810 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787 Gerrit-Change-Number: 10810 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Jun 2018 00:41:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7095: clean up scan node profiles
Hello Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10810 to look at the new patch set (#6). Change subject: IMPALA-7095: clean up scan node profiles .. IMPALA-7095: clean up scan node profiles Add counters to scan node implementations where they make sense but were missing (e.g. row batch queue counters for multithread Kudu scans) and remove them where they don't make sense (e.g. scanner thread counters for non-multithreaded scans). Refactors the multithreaded Kudu and HDFS scans to share logic via composition (single inheritance doesn't work for this case), which enables the same set of counters to be maintained with shared code. The row batch queueing and thread tracking is now shared. I looked at combining the logic around 'status_', 'lock_' and 'done_' between the two but the details were different enough that it didn't seem worth abstracting. Adds a PeakScannerThreadConcurrency counter - this answers a common question. Fixes RowsRead for data source scans. Fix some of the comments to be more accurate/useful. Testing: Ran exhaustive tests. Ran various types of scans (HDFS, Kudu, HBase, Data source) and inspected the profile output manually. Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787 --- M be/src/exec/data-source-scan-node.cc M be/src/exec/data-source-scan-node.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-scan-node.h M be/src/exec/hbase-table-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scan-node-base.h M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/runtime/fragment-instance-state.cc M be/src/util/blocking-queue.h M be/src/util/thread.h 20 files changed, 466 insertions(+), 351 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10810/6 -- To view, visit http://gerrit.cloudera.org:8080/10810 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787 Gerrit-Change-Number: 10810 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7102(Part 1): Disable reading of erasure coding by default
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10646 ) Change subject: IMPALA-7102(Part 1): Disable reading of erasure coding by default .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd3b1754541262467a6e67068b0b447882a40fb3 Gerrit-Change-Number: 10646 Gerrit-PatchSet: 2 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Adrian Ng Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Jun 2018 00:26:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-110 (part 2): Refactor PartitionedAggregationNode
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10394 ) Change subject: IMPALA-110 (part 2): Refactor PartitionedAggregationNode .. Patch Set 6: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/10394/6/be/src/exec/aggregator.h File be/src/exec/aggregator.h: http://gerrit.cloudera.org:8080/#/c/10394/6/be/src/exec/aggregator.h@26 PS6, Line 26: #include "runtime/mem-tracker.h" Nit: Can we forward-declare MemTracker instead of pulling in the whole header? http://gerrit.cloudera.org:8080/#/c/10394/6/be/src/exec/grouping-aggregator-partition.cc File be/src/exec/grouping-aggregator-partition.cc: PS6: This way of organising the code is much saner, thank you. -- To view, visit http://gerrit.cloudera.org:8080/10394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e7bb583f54aa4add3738bde7f57cf3511ac567e Gerrit-Change-Number: 10394 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 28 Jun 2018 00:22:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 6: Code-Review+1 Fix whitespace glitch, carry. -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 6 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 27 Jun 2018 23:39:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Hello Lars Volker, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10815 to look at the new patch set (#6). Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated Otherwise, if the coordinator to backend CancelFInstances() RPC had failed, the query can hang (and/or finstances can continue running until the query is closed. Testing: - the modified test reproduces the hang without the impalad fix Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M tests/query_test/test_cancellation.py 4 files changed, 51 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/10815/6 -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 6 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 5: Code-Review+1 (2 comments) Carry Lars' +1 http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@366 PS1, Line 366: > Thanks for the explanation. I think the code could be more readable if we r Agree, done. http://gerrit.cloudera.org:8080/#/c/10815/4/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: http://gerrit.cloudera.org:8080/#/c/10815/4/tests/query_test/test_cancellation.py@136 PS4, Line 136: "|".join(filter > remove one "debug_action = " Oops, done. -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 5 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 27 Jun 2018 23:37:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Hello Lars Volker, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10815 to look at the new patch set (#5). Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated Otherwise, if the coordinator to backend CancelFInstances() RPC had failed, the query can hang (and/or finstances can continue running until the query is closed. Testing: - the modified test reproduces the hang without the impalad fix Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M tests/query_test/test_cancellation.py 4 files changed, 51 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/10815/5 -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 5 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10704 ) Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans .. Patch Set 9: More tests fails, we should fix those first. You can do a dry run of the precommit job with https://jenkins.impala.io/job/gerrit-verify-dryrun-external/ -- To view, visit http://gerrit.cloudera.org:8080/10704 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7 Gerrit-Change-Number: 10704 Gerrit-PatchSet: 9 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Jun 2018 23:23:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5981: [DOCS] Documented SET=""
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10816 ) Change subject: IMPALA-5981: [DOCS] Documented SET="" .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/10816/1/docs/topics/impala_set.xml File docs/topics/impala_set.xml: http://gerrit.cloudera.org:8080/#/c/10816/1/docs/topics/impala_set.xml@57 PS1, Line 57: > After checking the impala-shell command and start option doc, I removed imp The other difference between the two sets is that the SQL set has quotes around the strings SQL: set foo="bar"; set foo=""; Shell: set foo=bar; unset foo; http://gerrit.cloudera.org:8080/#/c/10816/2/docs/topics/impala_set.xml File docs/topics/impala_set.xml: http://gerrit.cloudera.org:8080/#/c/10816/2/docs/topics/impala_set.xml@142 PS2, Line 142: This is true for any interface that isn't impala-shell, e.g. Hue. Remove the qualification here? -- To view, visit http://gerrit.cloudera.org:8080/10816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7211405d5cc0a548c05ea5218798591873c14417 Gerrit-Change-Number: 10816 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Jun 2018 23:22:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 4: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@366 PS1, Line 366: rpc_status = DebugAction(query_ctx().client_request.query_options, > No, I wanted to simulate what happens when a single RPC fails. That way we Thanks for the explanation. I think the code could be more readable if we reduced the indent by using continue, both for !client_status.ok() above and for !rpc_status.ok() here. I don't feel that strongly about it though. http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@367 PS1, Line 367: COORD_CANCEL_QUERY_FINSTANCES_RPC > The convention I've been using for the global debug action labels is to pre I agree, with the COORD prefix it actually is clear enough. http://gerrit.cloudera.org:8080/#/c/10815/4/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: http://gerrit.cloudera.org:8080/#/c/10815/4/tests/query_test/test_cancellation.py@136 PS4, Line 136: debug_action = remove one "debug_action = " -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 4 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 27 Jun 2018 23:06:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Hello Lars Volker, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10815 to look at the new patch set (#4). Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated Otherwise, if the coordinator to backend CancelFInstances() RPC had failed, the query can hang (and/or finstances can continue running until the query is closed. Testing: - the modified test reproduces the hang without the impalad fix Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M tests/query_test/test_cancellation.py 4 files changed, 49 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/10815/4 -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 4 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@366 PS1, Line 366: rpc_status = DebugAction(query_ctx.client_request.query_options, > Wouldn't we need to break; the loop if this was not ok()? No, I wanted to simulate what happens when a single RPC fails. That way we can even exercise the retry logic with e.g. COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL@0.3 In the test for failing the RPC altogether, I give it a failure probability of 100%, so it loops 3 times and then gives up, just like what would happen if the RPC really failed. http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@367 PS1, Line 367: COORD_CANCEL_QUERY_FINSTANCES_RPC > Should we name this actions so that it's clear that it's in the sender side The convention I've been using for the global debug action labels is to prefix with the containing module name. So, I think the "COORD" prefix is sufficient for knowing this is the sender side. If we want to add one to the handler, it'd probably be called QUERYSTATE_CANCEL_QUERY_FINSTANCES_RPC. LMK if you feel it's still not clear. http://gerrit.cloudera.org:8080/#/c/10815/1/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: http://gerrit.cloudera.org:8080/#/c/10815/1/tests/query_test/test_cancellation.py@139 PS1, Line 139: if fail_rpc_action != None: > If wait_action != None and fail_rpc_action == None, then debug_action will Sure, if you feel that's more readable to a native python speaker. (I'd have to try it out to know what it's doing but I don't consider myself the most proficient python coder). Note that the extraneous "|" was harmless, but happy to change this, it does seem more python-y http://gerrit.cloudera.org:8080/#/c/10815/1/tests/query_test/test_cancellation.py@206 PS1, Line 206: if wait_action is None and fail_rpc_action is None \ > Then this could be "if not debug_action and ('count'..." Done -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 27 Jun 2018 22:24:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Hello Lars Volker, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10815 to look at the new patch set (#3). Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated Otherwise, if the coordinator to backend CancelFInstances() RPC had failed, the query can hang (and/or finstances can continue running until the query is closed. Testing: - the modified test reproduces the hang without the impalad fix Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M tests/query_test/test_cancellation.py 4 files changed, 50 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/10815/3 -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 3 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10815 ) Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@366 PS1, Line 366: rpc_status = DebugAction(query_ctx.client_request.query_options, Wouldn't we need to break; the loop if this was not ok()? http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@367 PS1, Line 367: COORD_CANCEL_QUERY_FINSTANCES_RPC Should we name this actions so that it's clear that it's in the sender side of the rpc? http://gerrit.cloudera.org:8080/#/c/10815/1/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: http://gerrit.cloudera.org:8080/#/c/10815/1/tests/query_test/test_cancellation.py@139 PS1, Line 139: if fail_rpc_action != None: If wait_action != None and fail_rpc_action == None, then debug_action will and in a "|". You could do this instead: debug_action = "|".join(filter(None, [wait_action, fail_rpc_action])) http://gerrit.cloudera.org:8080/#/c/10815/1/tests/query_test/test_cancellation.py@206 PS1, Line 206: if wait_action is None and fail_rpc_action is None \ Then this could be "if not debug_action and ('count'..." -- To view, visit http://gerrit.cloudera.org:8080/10815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991 Gerrit-Change-Number: 10815 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 27 Jun 2018 21:43:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 ) Change subject: IMPALA-3040: Remove cache directive before dropping a table .. Patch Set 2: Also, thinking a bit more about your theory, are you able to reproduce it by adding Thread.sleep() s in the required places? -- To view, visit http://gerrit.cloudera.org:8080/10792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170 Gerrit-Change-Number: 10792 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Wed, 27 Jun 2018 16:58:39 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 ) Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22 PS1, Line 22: fragments > My problem with changing RuntimeState::LogError() to log only the first err So it sounds like the motivation is to be able to define a "scope" for what is consider to be redundant errors? I agree doing that in LogError() would not be straightforward. My main concern with building more logic on top of LogError() is that the error log stuff seems a bit broken and so adding more complication on top is hard to reason about. Specifically, the use of max_errors query option seems inconsistent. So wondering if we need to step back and rethink how the warning collection works generally rather than adding another layer on top. Anyway, I'm okay with adding this scoped warning collector thingy if you feel it's needed but it needs more explanation in the class comment. http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h@234 PS1, Line 234: /// when a new error arrives or when errors need to be flushed. this comment needs more explanation. With the current comment, it's not easy for someone to determine whether they should use this class or just call LogError() directly. You should explain how this differs from just calling LogError(). http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h@241 PS1, Line 241: if there is no error isn't it: return true if there is already an error with this error code? -- To view, visit http://gerrit.cloudera.org:8080/10793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa Gerrit-Change-Number: 10793 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Jun 2018 16:46:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 ) Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location .. Patch Set 20: (1 comment) http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1435 PS18, Line 1435: / Only reload file metadata of partitions speci > I think patch set 20 reflects something like this. Could you please check a not really. the code in that patch loops over the partitions to delete, and for each such partition, checks if the partition's location is shared amongst more than one. The sharing test depends on that new map in hdfstable, which is maintained incrementally. I'm expecting something like this: 1) build delete partitions index, mapping location to partition 2) for each partition in table (location, delete_partition) = location_map.get(partition) if partition != delete_partition then we have sharing, error. Might want to collect all cases of shared locations violations. -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 20 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Jun 2018 16:07:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7095: clean up scan node profiles
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10810 ) Change subject: IMPALA-7095: clean up scan node profiles .. Patch Set 5: (6 comments) Did an initial pass over the code. Found some minor issues. http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/hdfs-scan-node-base.h@489 PS5, Line 489: per_read_thread I guess it's rather per_thread_read_throughput , just like in its comment. http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h File be/src/exec/scan-node.h: http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@46 PS5, Line 46: layet Nit: layer http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@95 PS5, Line 95: Nit: too much spaces http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@234 PS5, Line 234: && Nit: I think it's not necessary here and it would be nice if it was consistent with EnqueueBatch(). http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.h@256 PS5, Line 256: && Since this function does not always take ownership of 'row_batch', I think it is confusing to see std::move at the call sites. Maybe a non-const l-value reference or a pointer would be a little bit more straightforward. http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/10810/5/be/src/exec/scan-node.cc@143 PS5, Line 143: /// Starts counters that should start measuring in the Open() phase. : void StartOpenCounters(); It seems it is some leftover code. -- To view, visit http://gerrit.cloudera.org:8080/10810 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787 Gerrit-Change-Number: 10810 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Jun 2018 16:02:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 ) Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors .. Patch Set 1: (2 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12 PS1, Line 12: the logs : will contain a new line for every occurrence > But isn't that the point of the change (as you state below). I guess it wo The log lines to eliminate are the ones that report the exact same information. These would be merged to max two lines: 1. one at the first occurrence of the error 2. one when the error collector is flushed - this would contain the number of errors since 1. This logic is probably more complicated then it should be, but it ensures that minimal information is lost. It would be simpler to avoid the flushing logic by logging only the first errors, but as RuntimeState reports the number of occurrences per error code, some way is needed to increment the counters. http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22 PS1, Line 22: fragments > As the class is written today, it's not really tied to RuntimeState. You co My problem with changing RuntimeState::LogError() to log only the first error for a code is that it would completely hide if an error occurs in more than one files/columns. Having one LogCollector per ColumnReader means that the error in one column cannot hide the one in another. An alternate approach would be to keep a LogCollector per HdfsScanner, and keep the last message, and log only if the message was changed. This would simplify the interface and work well with column readers. Another possibility would be to create a class to collect specially data errors. Its LogError() function could expect table/file/column parameters, and use them as the key of a map and count errors separately. -- To view, visit http://gerrit.cloudera.org:8080/10793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa Gerrit-Change-Number: 10793 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Jun 2018 14:14:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7190: Remove unsupported format writer support
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10823 ) Change subject: IMPALA-7190: Remove unsupported format writer support .. Patch Set 1: Code-Review+1 (2 comments) Some minor comments for tests, otherwise lgtm. http://gerrit.cloudera.org:8080/#/c/10823/1/tests/query_test/test_compressed_formats.py File tests/query_test/test_compressed_formats.py: http://gerrit.cloudera.org:8080/#/c/10823/1/tests/query_test/test_compressed_formats.py@150 PS1, Line 150: def test_unsupported_writers(self, vector): I would prefer to use unique_database, and remove the drops from unsupported-writers.test http://gerrit.cloudera.org:8080/#/c/10823/1/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/10823/1/tests/shell/test_shell_interactive.py@a404 PS1, Line 404: This line could be left here, but check that "... not in result.stduot". This would make it easier to restore the original test when there will be new deprecated query options. -- To view, visit http://gerrit.cloudera.org:8080/10823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I821dc7495a901f1658daa500daf3791b386c7185 Gerrit-Change-Number: 10823 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Jun 2018 13:13:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 ) Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location .. Patch Set 20: (1 comment) http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1435 PS18, Line 1435: / Only reload file metadata of partitions speci > yes, I think drop partitions complexity should be O(http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 20 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Jun 2018 07:27:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3040: Remove cache directive before dropping a table
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10792 ) Change subject: IMPALA-3040: Remove cache directive before dropping a table .. Patch Set 2: Your theory seems plausible to me > I think table is dropped concurrently with Do you know what in test_caching_ddl() is calling this drop (drop db cascade/drop table etc.) ? It does not seem to be using unique_db_fixture and runs serially. So I'm wondering what is triggering a race between load() and drop(). > Now the questions is whether listPartitionNames() returns an empty list if > the table doesn't exist. The first thing to notices is that > listPartitionNames() might throw NoSuchObjectException, so intuitively that > should happen if the table doesn't exist, which is not true. The relevant > code is at > https://github.com/apache/hive/blob/966b83e3b9123bb455572d47878601d60b86999e/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4717. > The NoSuchObjectException is only thrown by fireReadTablePreEvent(), which > is some kind of hook mechanism and might be a no-op in most cases. The > backend implementation is at > https://github.com/apache/hive/blob/966b83e3b9123bb455572d47878601d60b86999e/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3247. > It merely executes a select query which doesn't check if the table exists at > all. So yes, it will return an empty list. You are right about this. I wrote a quick HMSClient class to confirm this. Following prints 0. $javac -cp "fe/target/dependency/*" TestListPartitionNames.java $java -cp "fe/target/dependency/*":$HADOOP_CONF_DIR:. TestListPartitionNames import org.apache.hadoop.hive.metastore.HiveMetaStoreClient; import org.apache.hadoop.hive.conf.HiveConf; import java.util.List; public class TestListPartitionNames { public static void main(String[] args) throws Exception { HiveMetaStoreClient client = new HiveMetaStoreClient( new HiveConf(), null); List parts = client.listPartitionNames("non_existent_db_blah_blah", "foo", (short) -1); System.out.println(parts.size()); } } -- To view, visit http://gerrit.cloudera.org:8080/10792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7701a499405e961456adea63f3592b43bd69170 Gerrit-Change-Number: 10792 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Wed, 27 Jun 2018 06:03:21 + Gerrit-HasComments: No