[Impala-ASF-CR] WIP: IMPALA-4224: execute separate join builds fragments
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14859 ) Change subject: WIP: IMPALA-4224: execute separate join builds fragments .. Patch Set 31: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5538/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 Gerrit-Change-Number: 14859 Gerrit-PatchSet: 31 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 29 Jan 2020 02:47:18 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-4224: execute separate join builds fragments
Hello Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14859 to look at the new patch set (#31). Change subject: WIP: IMPALA-4224: execute separate join builds fragments .. WIP: IMPALA-4224: execute separate join builds fragments This enables parallel plans with the join build in a separate fragment and fixes all of the ensuing fallout. After this change, mt_dop plans with joins have separate build fragments. There is still a 1:1 relationship between join nodes and builders, so the builders are only accessed by the join node's thread after it is handed off. This lets us defer the work required to make PhjBuilder and NljBuilder safe to be shared between nodes. Planner changes: * Combined the parallel and distributed planning code paths. * Misc fixes to generate reasonable thrift structures in the query exec requests, i.e. containing the right nodes. * Fixes to resource calculations for the separate build plans. ** Calculate separate join/build resource consumption. ** Simplified the resource estimation by calculating resource consumption for each fragment separately, and assuming that all fragments hit their peak resource consumption at the same time. IMPALA-9255 is the follow-on to make the resource estimation more accurate. Scheduler changes: * Various fixes to handle multiple TPlanExecInfos correctly, which are generated by the planner for the different cohorts. * Add logic to colocate build fragments with parent fragments. Runtime filter changes: * Build sinks now produce runtime filters, which required planner and coordinator fixes to handle. accordingly. DataSink changes: * Close the input plan tree before calling FlushFinal() to release resources. This depends on Send() not holding onto references to input batches, which was true except for NljBuilder. This invariant is documented. Join builder changes: * Add a common base class for PhjBuilder and NljBuilder with functions to handle synchronisation with the join node. * Close plan tree earlier in FragmentInstanceState::Exec() so that peak resource requirements are lower. * The NLJ always copies input batches, so that it can close its input tree. JoinNode changes: * Join node blocks waiting for build-side to be ready, then eventually signals that it's done, allowing the builder to be cleaned up. * NLJ and PHJ nodes handle both the integrated builder and the external builder. There is a 1:1 relationship between the node and the builder, so we don't deal with thread safety yet. * Buffer reservations are transferred between the builder and join node when running with the separate builder. This is not really necessary right now, since it is all single-threaded, but will be important for the shared broadcast. - The builder transfers memory for probe buffers to the join node at the end of each build phase. - At end of each probe phase, reservation needs to be handed back to builder (or released). ExecSummary changes: * The summary logic was modified to handle connecting fragments via join builds. The logic is an extension of what was used for exchanges. Testing: * Enable --unlock_mt_dop for end-to-end tests * Migrate some tests to run as part of end-to-end tests instead of custom cluster. * Add mt_dop dimension to various end-to-end tests to provide coverage of join queries, spill-to-disk and cancellation. * Ran a single node TPC-H and TPC-DS stress test with mt_dop=0 and mt_dop=4. Perf: * Ran TPC-H scale factor 30 locally with mt_dop=0. No significant change. TODO: * Check that the builder is transferring the memory for max-row-size to the probe side. * Can we avoid regressing the mem consumption of NAAJ? Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 --- M be/src/exec/CMakeLists.txt M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.h A be/src/exec/join-builder.cc A be/src/exec/join-builder.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/nested-loop-join-node.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/initial-reservations.cc M be/src/runtime/row-batch.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/spillable-row-batch-queue.h M
[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14538 ) Change subject: IMPALA-4400: aggregate runtime filters locally .. Patch Set 25: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/14538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc Gerrit-Change-Number: 14538 Gerrit-PatchSet: 25 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 29 Jan 2020 00:58:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14538 ) Change subject: IMPALA-4400: aggregate runtime filters locally .. IMPALA-4400: aggregate runtime filters locally Move RuntimeFilterBank to QueryState(). Implement fine-grained locking for each filter to mitigate any increased lock contention from the change. Make RuntimeFilterBank handle multiple producers of the same filter, e.g. multiple instances of a partitioned join. It computes the expected number of filters upfront then sends the filter to the coordinator once all the local instances have been merged together. The merging can be done in parallel locally to improve latency of filter propagation. Add Or() methods to MinMaxFilter and BloomFilter, since we now need to merge those, not just the thrift versions. Update coordinator filter routing to expect only one instance of a filter from each producer backend and to only send one instance to each consumer backend (instead of sending one per fragment). Update memory reservations and estimates to be lower to account for sharing of filters between fragment instances. mt_dop plans are modified to show these shared and non-shared resources separately. Enable waiting for runtime filters for kudu scanner with mt_dop. Made min/max filters const-correct. Testing * Added unit tests for Or() methods. * Added some additional e2e test coverage for mt_dop queries * Updated planner tests with new estimates and reservation. * Ran a single node 3-impalad stress test with TPC-H kudu and TPC-DS parquet. * Ran exhaustive tests. * Ran core tests with ASAN. Perf * Did a single-node perf run on TPC-H with default settings. No perf change. * Single-node perf run with mt_dop=8 showed significant speedups: +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 10.14 | -7.29% | 5.05 | -11.68%| +--+---+-++++ +--+--+---++-++---++---++-+-+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval| +--+--+---++-++---++---++-+-+ | TPCH(30) | TPCH-Q7 | parquet / none / none | 38.87 | 38.44 | +1.13% | 7.17% | * 10.92% * | 20| +0.72% | 0.72| 0.39| | TPCH(30) | TPCH-Q1 | parquet / none / none | 4.28 | 4.26| +0.50% | 1.92% | 1.09%| 20| +0.03% | 0.31| 1.01| | TPCH(30) | TPCH-Q22 | parquet / none / none | 2.32 | 2.32| +0.05% | 2.01% | 1.89%| 20| -0.03% | -0.36 | 0.08| | TPCH(30) | TPCH-Q15 | parquet / none / none | 3.73 | 3.75| -0.42% | 0.84% | 1.05%| 20| -0.25% | -0.77 | -1.40 | | TPCH(30) | TPCH-Q13 | parquet / none / none | 9.80 | 9.83| -0.38% | 0.51% | 0.80%| 20| -0.32% | -1.30 | -1.81 | | TPCH(30) | TPCH-Q2 | parquet / none / none | 1.98 | 2.00| -1.32% | 1.74% | 2.81%| 20| -0.64% | -1.71 | -1.79 | | TPCH(30) | TPCH-Q6 | parquet / none / none | 1.22 | 1.25| -2.14% | 2.66% | 4.15%| 20| -0.96% | -2.00 | -1.95 | | TPCH(30) | TPCH-Q19 | parquet / none / none | 5.13 | 5.22| -1.65% | 1.20% | 1.40%| 20| -1.76% | -3.34 | -4.02 | | TPCH(30) | TPCH-Q16 | parquet / none / none | 2.46 | 2.56| -4.13% | 2.49% | 1.99%| 20| -4.31% | -4.04 | -5.94 | | TPCH(30) | TPCH-Q9 | parquet / none / none | 81.63 | 85.07 | -4.05% | 4.94% | 3.06%| 20| -5.46% | -3.28 | -3.21 | | TPCH(30) | TPCH-Q10 | parquet / none / none | 5.07 | 5.50| I -7.92% | 0.96% | 1.33%| 20| I -8.51% | -5.27 | -22.14 | | TPCH(30) | TPCH-Q21 | parquet / none / none | 24.00 | 26.24 | I -8.57% | 0.46% | 0.38%| 20| I -9.34% | -5.27 | -67.47 | | TPCH(30) | TPCH-Q18 | parquet / none / none | 8.66 | 9.50| I -8.86% | 0.62% | 0.44%| 20| I -9.75% | -5.27 | -55.17 | | TPCH(30) | TPCH-Q3 | parquet / none / none | 6.01 | 6.70| I -10.19% | 1.01% | 0.90%| 20| I -11.25% | -5.27 |
[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/15023 ) Change subject: IMPALA-9075: Add support for reading zstd text files .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/15023/4/be/src/util/decompress.cc File be/src/util/decompress.cc: http://gerrit.cloudera.org:8080/#/c/15023/4/be/src/util/decompress.cc@665 PS4, Line 665: *stream_end = false; It seems to me that this line is redundant. *stream_end = true is only set if (input_buffer.pos == input_buffer.size) in which case we will break out of the loop. -- To view, visit http://gerrit.cloudera.org:8080/15023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4 Gerrit-Change-Number: 15023 Gerrit-PatchSet: 4 Gerrit-Owner: Xiaomeng Zhang Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Xiaomeng Zhang Gerrit-Comment-Date: Wed, 29 Jan 2020 00:30:41 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-9279: Bump Kudu version to 3ba5ec5d0
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15119 ) Change subject: IMPALA-9279: Bump Kudu version to 3ba5ec5d0 .. Patch Set 1: Code-Review+2 Looks good to me -- To view, visit http://gerrit.cloudera.org:8080/15119 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0 Gerrit-Change-Number: 15119 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 28 Jan 2020 23:59:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15116 ) Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. Patch Set 1: > > Does ignore_noninstrumented_modules work on Linux? Looking at the > > LLVM source, https://reviews.llvm.org/D61708 doesn't look like > it's > > been merged. > > > > FWIW, it didn't exist (and certainly didn't work on Linux) when > > Kudu's TSAN support was added, which is why we went the other > > direction and recompile all of our dependencies with > > -fsanitize=thread if doing a TSAN build. > > Maybe https://reviews.llvm.org/D28263 fixed it? > > It definitely does something on my dev machine, which is running > Ubuntu 16.04. mm ignore that review link. I'll try to see if I can find the patch that fixed it, but confirmed it works on my dev machine. -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 22:32:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15116 ) Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. Patch Set 1: > Does ignore_noninstrumented_modules work on Linux? Looking at the > LLVM source, https://reviews.llvm.org/D61708 doesn't look like it's > been merged. > > FWIW, it didn't exist (and certainly didn't work on Linux) when > Kudu's TSAN support was added, which is why we went the other > direction and recompile all of our dependencies with > -fsanitize=thread if doing a TSAN build. Maybe https://reviews.llvm.org/D28263 fixed it? It definitely does something on my dev machine, which is running Ubuntu 16.04. -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 22:30:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15116 ) Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. Patch Set 1: Does ignore_noninstrumented_modules work on Linux? Looking at the LLVM source, https://reviews.llvm.org/D61708 doesn't look like it's been merged. FWIW, it didn't exist (and certainly didn't work on Linux) when Kudu's TSAN support was added, which is why we went the other direction and recompile all of our dependencies with -fsanitize=thread if doing a TSAN build. -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 22:01:59 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-9279: Bump Kudu version to 3ba5ec5d0
Attila Jeges has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15119 Change subject: IMPALA-9279: Bump Kudu version to 3ba5ec5d0 .. IMPALA-9279: Bump Kudu version to 3ba5ec5d0 This pulls in Kudu VARCHAR support which is needed for the Impala side of the Kudu/Impala VARCHAR integration. Testing: - Ran the C6 toolchain build job with the kudu version bump for native toolchain to make sure that it builds on all supported platforms. - Built Impala locally with kudu-3ba5ec5d0 and ran test_kudu.py E2E and AnalyzeKuduDDLTest FE tests. Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/19/15119/1 -- To view, visit http://gerrit.cloudera.org:8080/15119 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0 Gerrit-Change-Number: 15119 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15116 ) Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. Patch Set 1: (5 comments) >When I run Impala-TSAN locally, it uses the TSAN code under: > '/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc'. > I would guess it should be using the LLVM from the toolchain? Although that > class doesn't seem to exist anywhere in the toolchain/ dir. I think that might be where the source was mounted during the toolchain build? >I'm a bit confused about running TSAN builds with/without the > -build_shared_libs option and how TSAN interacts with static vs dynamic > linking (trying to mimic Kudu's TSAN usage > https://github.com/apache/kudu/blob/master/CMakeLists.txt#L385) I don't think I know any more >According to the TSAN docs, everything (including third party libs) should > be build with -fsanitize=thread > (https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code). > Not sure if we are doing that in Impala? We're not, the toolchain is just a regular release build. http://gerrit.cloudera.org:8080/#/c/15116/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15116/1//COMMIT_MSG@10 PS1, Line 10: existing -tsan build flag. -tsan_full is equivalent to the current -tsan Is -tsan_full useful? Wonder if we should just get rid of it. http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/reservation-tracker.h File be/src/runtime/bufferpool/reservation-tracker.h: http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/reservation-tracker.h@316 PS1, Line 316: AtomicInt64 reservation_; I think these members need some explanation that they can be read without holding the lock (otherwise it's confusing whether code should be holding the lock or not). http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/system-allocator.cc File be/src/runtime/bufferpool/system-allocator.cc: http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/system-allocator.cc@124 PS1, Line 124: #if defined(ADDRESS_SANITIZER) || defined(THREAD_SANITIZER) Did you run into this bug? I think it's meant to be fixed: https://bugs.llvm.org/show_bug.cgi?id=32968 http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator-backend-state.cc@179 PS1, Line 179: unique_lock lock(lock_); Could this cause a deadlock if we exit from one of the points below where 'lock_' is held?. I think maybe not since the destructor for lock_guard below runs first. But is there a way we can make the correctness more obvious? http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator.h@292 PS1, Line 292: AtomicBool has_called_wait_; // if true, Wait() was called I think the writer should still hold wait_lock_? -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 21:37:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14538 ) Change subject: IMPALA-4400: aggregate runtime filters locally .. Patch Set 25: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5474/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc Gerrit-Change-Number: 14538 Gerrit-PatchSet: 25 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 20:04:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14538 ) Change subject: IMPALA-4400: aggregate runtime filters locally .. Patch Set 25: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc Gerrit-Change-Number: 14538 Gerrit-PatchSet: 25 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 20:04:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15023 ) Change subject: IMPALA-9075: Add support for reading zstd text files .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5537/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4 Gerrit-Change-Number: 15023 Gerrit-PatchSet: 4 Gerrit-Owner: Xiaomeng Zhang Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Xiaomeng Zhang Gerrit-Comment-Date: Tue, 28 Jan 2020 19:58:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14538 ) Change subject: IMPALA-4400: aggregate runtime filters locally .. Patch Set 24: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc Gerrit-Change-Number: 14538 Gerrit-PatchSet: 24 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 19:16:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files
Xiaomeng Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15023 ) Change subject: IMPALA-9075: Add support for reading zstd text files .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/15023/3/tests/custom_cluster/test_hive_text_codec_interop.py File tests/custom_cluster/test_hive_text_codec_interop.py: http://gerrit.cloudera.org:8080/#/c/15023/3/tests/custom_cluster/test_hive_text_codec_interop.py@26 PS3, Line 26: > flake8: F401 'tests.util.filesystem_utils.get_fs_path' imported but unused Done http://gerrit.cloudera.org:8080/#/c/15023/3/tests/custom_cluster/test_hive_text_codec_interop.py@63 PS3, Line 63: > flake8: E501 line too long (106 > 90 characters) Done http://gerrit.cloudera.org:8080/#/c/15023/3/tests/query_test/test_compressed_formats.py File tests/query_test/test_compressed_formats.py: http://gerrit.cloudera.org:8080/#/c/15023/3/tests/query_test/test_compressed_formats.py@264 PS3, Line 264: > flake8: E302 expected 2 blank lines, found 1 Done -- To view, visit http://gerrit.cloudera.org:8080/15023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4 Gerrit-Change-Number: 15023 Gerrit-PatchSet: 4 Gerrit-Owner: Xiaomeng Zhang Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Xiaomeng Zhang Gerrit-Comment-Date: Tue, 28 Jan 2020 19:13:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files
Xiaomeng Zhang has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/15023 ) Change subject: IMPALA-9075: Add support for reading zstd text files .. IMPALA-9075: Add support for reading zstd text files In this patch, we add support for reading zstd encoded text files. This includes: 1. support reading zstd file written by Hive which uses streaming. 2. support reading zstd file compressed by standard zstd library which uses block. To support decompressing both formats, a function ProcessBlockStreaming is added in zstd decompressor. Testing done: Added two backend tests: 1. streaming decompress test. 2. large data test for both block and streaming decompress. Added two end to end tests: 1. hive and impala integration. For four compression codecs, write in hive and read from impala. 2. zstd library and impala integration. Copy a zstd lib compressed file to HDFS, and read from impala. Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4 --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/util/compress.h M be/src/util/decompress-test.cc M be/src/util/decompress.cc M be/src/util/decompress.h M bin/rat_exclude_files.txt A testdata/data/text_large_zstd.zst A tests/custom_cluster/test_hive_text_codec_interop.py M tests/query_test/test_compressed_formats.py 10 files changed, 248 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/15023/4 -- To view, visit http://gerrit.cloudera.org:8080/15023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4 Gerrit-Change-Number: 15023 Gerrit-PatchSet: 4 Gerrit-Owner: Xiaomeng Zhang Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Xiaomeng Zhang
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15116 ) Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5536/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 19:07:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15116 ) Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. Patch Set 1: A few things I could use input on: * When I run Impala-TSAN locally, it uses the TSAN code under: '/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc'. I would guess it should be using the LLVM from the toolchain? Although that class doesn't seem to exist anywhere in the toolchain/ dir. * I'm a bit confused about running TSAN builds with/without the -build_shared_libs option and how TSAN interacts with static vs dynamic linking (trying to mimic Kudu's TSAN usage https://github.com/apache/kudu/blob/master/CMakeLists.txt#L385) * According to the TSAN docs, everything (including third party libs) should be build with -fsanitize=thread (https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code). Not sure if we are doing that in Impala? -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 18:37:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8800: Added support of Kudu DATE type to Impala
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14705 ) Change subject: IMPALA-8800: Added support of Kudu DATE type to Impala .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@363 PS8, Line 363: Unpartitioned Kudu tables are inefficient for large data sizes. > Kudu range partitions are analyzed here: https://github.com/apache/impala/b I agree that it would preferable to just fix this in this patch, unless it turns out to be highly non-trivial (which seems unlikely to me) -- To view, visit http://gerrit.cloudera.org:8080/14705 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91656749a58ac769b54c2a63bdd4f85c89520b32 Gerrit-Change-Number: 14705 Gerrit-PatchSet: 12 Gerrit-Owner: Volodymyr Verovkin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Volodymyr Verovkin Gerrit-Comment-Date: Tue, 28 Jan 2020 18:31:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15116 ) Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/common/init.cc File be/src/common/init.cc: http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/common/init.cc@423 PS1, Line 423: // https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code), line too long (95 > 90) -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Jan 2020 18:23:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs
Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15116 Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs .. IMPALA-5904: Add tsan_full option and fix several TSAN bugs This patch adds an additional build flag -tsan_full in addition to the existing -tsan build flag. -tsan_full is equivalent to the current -tsan behavior, and -tsan is changed to set the ignore_noninstrumented_modules flag to true. ignore_noninstrumented_modules causes TSAN to ignore any modules that are not TSAN-instrumented. This is necessary to get TSAN to play nicely with Java, since Java is not TSAN-instrumented (see https://wiki.openjdk.java.net/display/tsan/Main and JDK-8208520). While this might decrease the number of issues surfaced by TSAN, it drastically decreases the amount of noise produced by TSAN because the JVM is not running TSAN-instrumented code. Without this flag set to true, almost every single backend test fails with the error: WARNING: ThreadSanitizer: data race (pid=12939) Write of size 1 at 0x7fcbe379c4c6 by thread T31: #0 strncpy /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:650 (unifiedbetests+0x1b2a4ad) #1 (libjvm.so+0x90e706) This patch fixes various TSAN bugs (e.g. data races) reported while running backend tests and E2E against a TSAN build (it does not make Impala completely TSAN-clean). This patch makes the following changes: * Fixes several bugs involving issues with updating shared variables between threads * Fixes a few race conditions in test classes * Where possible, existing locks are used to fix any data races; in cases where the locking logic is non-trivial, atomics are used * There are a few places where variables are marked as 'volatile' presumably for synchronization purposes; TSAN flags these 'volatile' variables as unsafe, and according to https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rconc-volatile using 'volatile' for synchronization is dangerous; in these cases, the 'volatile' variables are changed to 'atomic' variables * This patch adds a suppression file (bin/tsan-suppresions.txt) similar to the UBSAN suppresion file (bin/ubsan-suppresions.txt) Testing: * Ran exhaustive tests * Manually re-ran backend tests against a TSAN build and made sure the reported errors are gone Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/common/init.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/bufferpool/system-allocator.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-stress.h M be/src/runtime/io/scan-range.cc M be/src/service/session-expiry-test.cc M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc M be/src/util/runtime-profile-test.cc M be/src/util/stopwatch.h M bin/run-backend-tests.sh A bin/tsan-suppressions.txt M buildall.sh M tests/common/environ.py M tests/webserver/test_web_pages.py 30 files changed, 199 insertions(+), 94 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/15116/1 -- To view, visit http://gerrit.cloudera.org:8080/15116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Gerrit-Change-Number: 15116 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683 PS3, Line 683: dst_batch->tuple_data_pool() As we have discussed offline, this is no longer correct. Using the current batch's mempool means that the memory can be reclaimed by the consumer of this row batch (the caller of this node's GetNext()) after the current GetNext() call returned. So it cannot be used in future batches. Before your optimization, only those string were allocated here, which were used in the current row batch, so there was no problem. Now if batch_one starts the read rows from the Orc vector, and batch_two finishes it, the blob will belong to batch_one, so it can be already freed/reused when batch_two starts to use it. The solution is to give the scanner its own mempool, use that to initialize the blobs, and attach the blobs to the last row batch that reads rows from the given vector. For Parquet this is mempool is https://github.com/apache/impala/blob/4fa6b5260d9e28dee63a87de3bea1434706a9d05/be/src/exec/parquet/parquet-column-chunk-reader.h#L140 Parquet dictionaries use a different pool: https://github.com/apache/impala/blob/4fa6b5260d9e28dee63a87de3bea1434706a9d05/be/src/exec/parquet/hdfs-parquet-scanner.h#L430 Doing this in Orc will be much simpler, as the library hides compression/decoding of data, but the way buffers are attached to row batches should be the same. -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 28 Jan 2020 16:24:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8800: Added support of Kudu DATE type to Impala
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14705 ) Change subject: IMPALA-8800: Added support of Kudu DATE type to Impala .. Patch Set 12: (2 comments) http://gerrit.cloudera.org:8080/#/c/14705/8/fe/src/main/java/org/apache/impala/util/KuduUtil.java File fe/src/main/java/org/apache/impala/util/KuduUtil.java: http://gerrit.cloudera.org:8080/#/c/14705/8/fe/src/main/java/org/apache/impala/util/KuduUtil.java@387 PS8, Line 387: isSupportedKeyType Date should be added here. See my comment in kudu_create.test http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@363 PS8, Line 363: Unpartitioned Kudu tables are inefficient for large data sizes. > I see. Thank you for clarifying on this and filing the issue. Kudu range partitions are analyzed here: https://github.com/apache/impala/blob/4fa6b5260d9e28dee63a87de3bea1434706a9d05/fe/src/main/java/org/apache/impala/analysis/RangePartition.java#L156 My first tip to solve the issue is to add dates as valid keys in KuduUtil.java - isSupportedKeyType() It is possible that further changes will be needed (timestamp columns already have some special hacks). I would prefer to add support for date range partitions in this change, as it seems a fairly common use case to me. -- To view, visit http://gerrit.cloudera.org:8080/14705 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91656749a58ac769b54c2a63bdd4f85c89520b32 Gerrit-Change-Number: 14705 Gerrit-PatchSet: 12 Gerrit-Owner: Volodymyr Verovkin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Volodymyr Verovkin Gerrit-Comment-Date: Tue, 28 Jan 2020 15:17:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9329: Don't add nested columns in TableMaskingView
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9329: Don't add nested columns in TableMaskingView .. Patch Set 4: Code-Review-1 (1 comment) > Patch Set 4: Code-Review+1 > > The patch looks good to me. Thanks for adding the tests. Just for my > understanding, do the nested columns never appear if any of the primitive > columns are masked? Sorry, I found this patch can't fix failures in queries like "select id, nested_struct.a from functional_parquet.complextypestbl". I prefer to fix IMPALA-9330 directly and I'm working on it. http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java File fe/src/main/java/org/apache/impala/authorization/TableMask.java: http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java@79 PS4, Line 79: SlotRef(Lists.newArrayList(colName)) > I think I understand the rest of the patch, but I don't know the effects of It's a bug to use SlotRef(String) since the colName is used as the alias and the rawPath_ is set to null. We should use SlotRef(List) as we do in sql-parser.cup. -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 28 Jan 2020 14:55:46 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/15072 ) Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15072 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af Gerrit-Change-Number: 15072 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Tue, 28 Jan 2020 14:51:30 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
Attila Jeges has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15072 ) Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts .. IMPALA-9265: Support for toolchain Kudu to provide Java artifacts The build script was modified to generate Kudu JARs and add them to the Kudu tarball. redhat6 and redhat7 docker images were modified to update Java 8 to a newer version that is suitable for building the Java artifacts. ubuntu1404 docker image was modified to include CA certificate file for Java. Testing: Ran the C6 toolchain build job to verify that native-toolchain builds on all supported platforms. Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af Reviewed-on: http://gerrit.cloudera.org:8080/15072 Reviewed-by: Joe McDonnell Tested-by: Attila Jeges --- M docker/all/postinstall.sh A docker/redhat/Centos7-Vault.repo M docker/redhat6.df M docker/redhat7.df M docker/ubuntu1404.df M source/kudu/build.sh 6 files changed, 45 insertions(+), 6 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Attila Jeges: Verified -- To view, visit http://gerrit.cloudera.org:8080/15072 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af Gerrit-Change-Number: 15072 Gerrit-PatchSet: 4 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal
[Impala-ASF-CR] IMPALA-9329: Don't add nested columns in TableMaskingView
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9329: Don't add nested columns in TableMaskingView .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java File fe/src/main/java/org/apache/impala/authorization/TableMask.java: http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java@79 PS4, Line 79: SlotRef(Lists.newArrayList(colName)) I think I understand the rest of the patch, but I don't know the effects of changing this constructor from string to array list. Is is this needed for the new tests to pass, or this an unrelated change? -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 28 Jan 2020 14:42:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4080 [part 1]: Move codegen code from aggregation exec nodes to their plan nodes
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15053 ) Change subject: IMPALA-4080 [part 1]: Move codegen code from aggregation exec nodes to their plan nodes .. Patch Set 6: Code-Review+2 (10 comments) Sorry for responding so late. I found a few nits (mainly in cut'n'pasted code) when a I went through the code. The general direction seems good to me. http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/aggregator.h File be/src/exec/aggregator.h: http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/aggregator.h@132 PS6, Line 132: const AggregatorConfig& config, const std::string& name); nit: indent +2 http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h@126 PS6, Line 126: ~GroupingAggregatorConfig() {} please add 'override' http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h@165 PS6, Line 165: nit: double space http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h@176 PS6, Line 176: // function. nit: +1 / http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.h File be/src/exec/hash-table.h: http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.h@51 PS6, Line 51: class ScalarExprsResultsRowLayout is a struct, which lead to the complaints by clang tidy. http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.cc@951 PS6, Line 951: nit: extra space http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h File be/src/exec/non-grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h@46 PS6, Line 46: ~NonGroupingAggregatorConfig() {} please add 'override' http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h@57 PS6, Line 57: Assumeing typo: assuming http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h@101 PS6, Line 101: refernce typo: reference http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exprs/scalar-expr.h@203 PS6, Line 203: static ScalarExprsResultsRowLayout ComputeResultsLayout( : const vector& exprs); optional: this could be a constructor in ScalarExprsResultsRowLayout -- To view, visit http://gerrit.cloudera.org:8080/15053 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58f52a262ac7d0af259d5bcda72ada93a851d3b2 Gerrit-Change-Number: 15053 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Jan 2020 14:23:07 + Gerrit-HasComments: Yes