[Impala-ASF-CR] WIP: IMPALA-4224: execute separate join builds fragments

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14859 )

Change subject: WIP: IMPALA-4224: execute separate join builds fragments
..


Patch Set 31:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5538/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
Gerrit-Change-Number: 14859
Gerrit-PatchSet: 31
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 29 Jan 2020 02:47:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-4224: execute separate join builds fragments

2020-01-28 Thread Tim Armstrong (Code Review)
Hello Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14859

to look at the new patch set (#31).

Change subject: WIP: IMPALA-4224: execute separate join builds fragments
..

WIP: IMPALA-4224: execute separate join builds fragments

This enables parallel plans with the join build in a
separate fragment and fixes all of the ensuing fallout.
After this change, mt_dop plans with joins have separate
build fragments. There is still a 1:1 relationship between
join nodes and builders, so the builders are only accessed
by the join node's thread after it is handed off. This lets
us defer the work required to make PhjBuilder and NljBuilder
safe to be shared between nodes.

Planner changes:
* Combined the parallel and distributed planning code paths.
* Misc fixes to generate reasonable thrift structures in the
  query exec requests, i.e. containing the right nodes.
* Fixes to resource calculations for the separate build plans.
** Calculate separate join/build resource consumption.
** Simplified the resource estimation by calculating resource
   consumption for each fragment separately, and assuming that
   all fragments hit their peak resource consumption at the
   same time. IMPALA-9255 is the follow-on to make the resource
   estimation more accurate.

Scheduler changes:
* Various fixes to handle multiple TPlanExecInfos correctly,
  which are generated by the planner for the different cohorts.
* Add logic to colocate build fragments with parent fragments.

Runtime filter changes:
* Build sinks now produce runtime filters, which required
  planner and coordinator fixes to handle.
   accordingly.

DataSink changes:
* Close the input plan tree before calling FlushFinal() to release
  resources. This depends on Send() not holding onto references
  to input batches, which was true except for NljBuilder. This
  invariant is documented.

Join builder changes:
* Add a common base class for PhjBuilder and NljBuilder with
  functions to handle synchronisation with the join node.
* Close plan tree earlier in FragmentInstanceState::Exec()
  so that peak resource requirements are lower.
* The NLJ always copies input batches, so that it can close
  its input tree.

JoinNode changes:
* Join node blocks waiting for build-side to be ready,
  then eventually signals that it's done, allowing the builder
  to be cleaned up.
* NLJ and PHJ nodes handle both the integrated builder and
  the external builder. There is a 1:1 relationship between
  the node and the builder, so we don't deal with thread safety
  yet.
* Buffer reservations are transferred between the builder and join
  node when running with the separate builder. This is not really
  necessary right now, since it is all single-threaded, but will
  be important for the shared broadcast.
  - The builder transfers memory for probe buffers to the join node
at the end of each build phase.
  - At end of each probe phase, reservation needs to be handed back
to builder (or released).

ExecSummary changes:
* The summary logic was modified to handle connecting fragments
  via join builds. The logic is an extension of what was used
  for exchanges.

Testing:
* Enable --unlock_mt_dop for end-to-end tests
* Migrate some tests to run as part of end-to-end tests instead of
  custom cluster.
* Add mt_dop dimension to various end-to-end tests to provide
  coverage of join queries, spill-to-disk and cancellation.
* Ran a single node TPC-H and TPC-DS stress test with mt_dop=0
  and mt_dop=4.

Perf:
* Ran TPC-H scale factor 30 locally with mt_dop=0. No significant
  change.

TODO:
* Check that the builder is transferring the memory for max-row-size
  to the probe side.
* Can we avoid regressing the mem consumption of NAAJ?

Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.h
A be/src/exec/join-builder.cc
A be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/nested-loop-join-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/initial-reservations.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/spillable-row-batch-queue.h
M 

[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14538 )

Change subject: IMPALA-4400: aggregate runtime filters locally
..


Patch Set 25: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Gerrit-Change-Number: 14538
Gerrit-PatchSet: 25
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 29 Jan 2020 00:58:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14538 )

Change subject: IMPALA-4400: aggregate runtime filters locally
..

IMPALA-4400: aggregate runtime filters locally

Move RuntimeFilterBank to QueryState(). Implement fine-grained
locking for each filter to mitigate any increased lock
contention from the change.

Make RuntimeFilterBank handle multiple producers of the
same filter, e.g. multiple instances of a partitioned
join. It computes the expected number of filters upfront
then sends the filter to the coordinator once all the
local instances have been merged together. The merging
can be done in parallel locally to improve latency of
filter propagation.

Add Or() methods to MinMaxFilter and BloomFilter, since
we now need to merge those, not just the thrift versions.

Update coordinator filter routing to expect only one
instance of a filter from each producer backend and
to only send one instance to each consumer backend
(instead of sending one per fragment).

Update memory reservations and estimates to be lower
to account for sharing of filters between fragment
instances. mt_dop plans are modified to show these
shared and non-shared resources separately.

Enable waiting for runtime filters for kudu scanner
with mt_dop.

Made min/max filters const-correct.

Testing
* Added unit tests for Or() methods.
* Added some additional e2e test coverage for mt_dop queries
* Updated planner tests with new estimates and reservation.
* Ran a single node 3-impalad stress test with TPC-H kudu and
  TPC-DS parquet.
* Ran exhaustive tests.
* Ran core tests with ASAN.

Perf
* Did a single-node perf run on TPC-H with default settings. No perf change.
* Single-node perf run with mt_dop=8 showed significant speedups:

+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(30) | parquet / none / none | 10.14   | -7.29% | 5.05   | 
-11.68%|
+--+---+-++++

+--+--+---++-++---++---++-+-+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval|
+--+--+---++-++---++---++-+-+
| TPCH(30) | TPCH-Q7  | parquet / none / none | 38.87  | 38.44   |   +1.13% 
  |   7.17%   | * 10.92% * | 20|   +0.72%   | 0.72| 0.39|
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.28   | 4.26|   +0.50% 
  |   1.92%   |   1.09%| 20|   +0.03%   | 0.31| 1.01|
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.32   | 2.32|   +0.05% 
  |   2.01%   |   1.89%| 20|   -0.03%   | -0.36   | 0.08|
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.73   | 3.75|   -0.42% 
  |   0.84%   |   1.05%| 20|   -0.25%   | -0.77   | -1.40   |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.80   | 9.83|   -0.38% 
  |   0.51%   |   0.80%| 20|   -0.32%   | -1.30   | -1.81   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.98   | 2.00|   -1.32% 
  |   1.74%   |   2.81%| 20|   -0.64%   | -1.71   | -1.79   |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.22   | 1.25|   -2.14% 
  |   2.66%   |   4.15%| 20|   -0.96%   | -2.00   | -1.95   |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 5.13   | 5.22|   -1.65% 
  |   1.20%   |   1.40%| 20|   -1.76%   | -3.34   | -4.02   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.46   | 2.56|   -4.13% 
  |   2.49%   |   1.99%| 20|   -4.31%   | -4.04   | -5.94   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 81.63  | 85.07   |   -4.05% 
  |   4.94%   |   3.06%| 20|   -5.46%   | -3.28   | -3.21   |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.07   | 5.50| I -7.92% 
  |   0.96%   |   1.33%| 20| I -8.51%   | -5.27   | -22.14  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.00  | 26.24   | I -8.57% 
  |   0.46%   |   0.38%| 20| I -9.34%   | -5.27   | -67.47  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.66   | 9.50| I -8.86% 
  |   0.62%   |   0.44%| 20| I -9.75%   | -5.27   | -55.17  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 6.01   | 6.70| I 
-10.19%  |   1.01%   |   0.90%| 20| I -11.25%  | -5.27   | 

[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files

2020-01-28 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15023 )

Change subject: IMPALA-9075: Add support for reading zstd text files
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15023/4/be/src/util/decompress.cc
File be/src/util/decompress.cc:

http://gerrit.cloudera.org:8080/#/c/15023/4/be/src/util/decompress.cc@665
PS4, Line 665: *stream_end = false;
It seems to me that this line is redundant.
*stream_end = true
is only set if (input_buffer.pos == input_buffer.size)
in which case we will break out of the loop.



--
To view, visit http://gerrit.cloudera.org:8080/15023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
Gerrit-Change-Number: 15023
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Xiaomeng Zhang 
Gerrit-Comment-Date: Wed, 29 Jan 2020 00:30:41 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-9279: Bump Kudu version to 3ba5ec5d0

2020-01-28 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15119 )

Change subject: IMPALA-9279: Bump Kudu version to 3ba5ec5d0
..


Patch Set 1: Code-Review+2

Looks good to me


--
To view, visit http://gerrit.cloudera.org:8080/15119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
Gerrit-Change-Number: 15119
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 28 Jan 2020 23:59:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

> > Does ignore_noninstrumented_modules work on Linux? Looking at the
 > > LLVM source, https://reviews.llvm.org/D61708 doesn't look like
 > it's
 > > been merged.
 > >
 > > FWIW, it didn't exist (and certainly didn't work on Linux) when
 > > Kudu's TSAN support was added, which is why we went the other
 > > direction and recompile all of our dependencies with
 > > -fsanitize=thread if doing a TSAN build.
 >
 > Maybe https://reviews.llvm.org/D28263 fixed it?
 >
 > It definitely does something on my dev machine, which is running
 > Ubuntu 16.04.

mm ignore that review link. I'll try to see if I can find the patch that fixed 
it, but confirmed it works on my dev machine.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 22:32:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

> Does ignore_noninstrumented_modules work on Linux? Looking at the
 > LLVM source, https://reviews.llvm.org/D61708 doesn't look like it's
 > been merged.
 >
 > FWIW, it didn't exist (and certainly didn't work on Linux) when
 > Kudu's TSAN support was added, which is why we went the other
 > direction and recompile all of our dependencies with
 > -fsanitize=thread if doing a TSAN build.

Maybe https://reviews.llvm.org/D28263 fixed it?

It definitely does something on my dev machine, which is running Ubuntu 16.04.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 22:30:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

Does ignore_noninstrumented_modules work on Linux? Looking at the LLVM source, 
https://reviews.llvm.org/D61708 doesn't look like it's been merged.

FWIW, it didn't exist (and certainly didn't work on Linux) when Kudu's TSAN 
support was added, which is why we went the other direction and recompile all 
of our dependencies with -fsanitize=thread if doing a TSAN build.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 22:01:59 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9279: Bump Kudu version to 3ba5ec5d0

2020-01-28 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15119


Change subject: IMPALA-9279: Bump Kudu version to 3ba5ec5d0
..

IMPALA-9279: Bump Kudu version to 3ba5ec5d0

This pulls in Kudu VARCHAR support which is needed for the Impala
side of the Kudu/Impala VARCHAR integration.

Testing:
- Ran the C6 toolchain build job with the kudu version bump for
  native toolchain to make sure that it builds on all supported
  platforms.
- Built Impala locally with kudu-3ba5ec5d0 and ran test_kudu.py
  E2E and AnalyzeKuduDDLTest FE tests.

Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/19/15119/1
--
To view, visit http://gerrit.cloudera.org:8080/15119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
Gerrit-Change-Number: 15119
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

(5 comments)

>When I run Impala-TSAN locally, it uses the TSAN code under: 
> '/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc'.
>  I would guess it should be using the LLVM from the toolchain? Although that 
> class doesn't seem to exist anywhere in the toolchain/ dir.

I think that might be where the source was mounted during the toolchain build?

>I'm a bit confused about running TSAN builds with/without the 
> -build_shared_libs option and how TSAN interacts with static vs dynamic 
> linking (trying to mimic Kudu's TSAN usage 
> https://github.com/apache/kudu/blob/master/CMakeLists.txt#L385)
I don't think I know any more

>According to the TSAN docs, everything (including third party libs) should 
> be build with -fsanitize=thread 
> (https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code).
>  Not sure if we are doing that in Impala?
We're not, the toolchain is just a regular release build.

http://gerrit.cloudera.org:8080/#/c/15116/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15116/1//COMMIT_MSG@10
PS1, Line 10: existing -tsan build flag. -tsan_full is equivalent to the 
current -tsan
Is -tsan_full useful? Wonder if we should just get rid of it.


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/reservation-tracker.h@316
PS1, Line 316:   AtomicInt64 reservation_;
I think these members need some explanation that they can be read without 
holding the lock (otherwise it's confusing whether code should be holding the 
lock or not).


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/system-allocator.cc
File be/src/runtime/bufferpool/system-allocator.cc:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/system-allocator.cc@124
PS1, Line 124: #if defined(ADDRESS_SANITIZER) || defined(THREAD_SANITIZER)
Did you run into this bug? I think it's meant to be fixed: 
https://bugs.llvm.org/show_bug.cgi?id=32968


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator-backend-state.cc@179
PS1, Line 179:   unique_lock lock(lock_);
Could this cause a deadlock if we exit from one of the points below where 
'lock_' is held?. I think maybe not since the destructor for lock_guard below 
runs first. But is there a way we can make the correctness more obvious?


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator.h@292
PS1, Line 292:   AtomicBool has_called_wait_;  // if true, Wait() was called
I think the writer should still hold wait_lock_?



--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 21:37:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14538 )

Change subject: IMPALA-4400: aggregate runtime filters locally
..


Patch Set 25:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5474/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Gerrit-Change-Number: 14538
Gerrit-PatchSet: 25
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 20:04:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14538 )

Change subject: IMPALA-4400: aggregate runtime filters locally
..


Patch Set 25: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Gerrit-Change-Number: 14538
Gerrit-PatchSet: 25
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 20:04:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15023 )

Change subject: IMPALA-9075: Add support for reading zstd text files
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5537/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
Gerrit-Change-Number: 15023
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Xiaomeng Zhang 
Gerrit-Comment-Date: Tue, 28 Jan 2020 19:58:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally

2020-01-28 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14538 )

Change subject: IMPALA-4400: aggregate runtime filters locally
..


Patch Set 24: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Gerrit-Change-Number: 14538
Gerrit-PatchSet: 24
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 19:16:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files

2020-01-28 Thread Xiaomeng Zhang (Code Review)
Xiaomeng Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15023 )

Change subject: IMPALA-9075: Add support for reading zstd text files
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15023/3/tests/custom_cluster/test_hive_text_codec_interop.py
File tests/custom_cluster/test_hive_text_codec_interop.py:

http://gerrit.cloudera.org:8080/#/c/15023/3/tests/custom_cluster/test_hive_text_codec_interop.py@26
PS3, Line 26:
> flake8: F401 'tests.util.filesystem_utils.get_fs_path' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/15023/3/tests/custom_cluster/test_hive_text_codec_interop.py@63
PS3, Line 63:
> flake8: E501 line too long (106 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/15023/3/tests/query_test/test_compressed_formats.py
File tests/query_test/test_compressed_formats.py:

http://gerrit.cloudera.org:8080/#/c/15023/3/tests/query_test/test_compressed_formats.py@264
PS3, Line 264:
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/15023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
Gerrit-Change-Number: 15023
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Xiaomeng Zhang 
Gerrit-Comment-Date: Tue, 28 Jan 2020 19:13:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9075: Add support for reading zstd text files

2020-01-28 Thread Xiaomeng Zhang (Code Review)
Xiaomeng Zhang has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/15023 )

Change subject: IMPALA-9075: Add support for reading zstd text files
..

IMPALA-9075: Add support for reading zstd text files

In this patch, we add support for reading zstd encoded text files.
This includes:
1. support reading zstd file written by Hive which uses streaming.
2. support reading zstd file compressed by standard zstd library which
uses block.
To support decompressing both formats, a function ProcessBlockStreaming
is added in zstd decompressor.

Testing done:
Added two backend tests:
1. streaming decompress test.
2. large data test for both block and streaming decompress.
Added two end to end tests:
1. hive and impala integration. For four compression codecs, write in
hive and read from impala.
2. zstd library and impala integration. Copy a zstd lib compressed file
to HDFS, and read from impala.

Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.cc
M be/src/util/decompress.h
M bin/rat_exclude_files.txt
A testdata/data/text_large_zstd.zst
A tests/custom_cluster/test_hive_text_codec_interop.py
M tests/query_test/test_compressed_formats.py
10 files changed, 248 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/15023/4
--
To view, visit http://gerrit.cloudera.org:8080/15023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
Gerrit-Change-Number: 15023
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Xiaomeng Zhang 


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5536/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 19:07:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

A few things I could use input on:

* When I run Impala-TSAN locally, it uses the TSAN code under: 
'/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc'.
 I would guess it should be using the LLVM from the toolchain? Although that 
class doesn't seem to exist anywhere in the toolchain/ dir.

* I'm a bit confused about running TSAN builds with/without the 
-build_shared_libs option and how TSAN interacts with static vs dynamic linking 
(trying to mimic Kudu's TSAN usage 
https://github.com/apache/kudu/blob/master/CMakeLists.txt#L385)

* According to the TSAN docs, everything (including third party libs) should be 
build with -fsanitize=thread 
(https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code).
 Not sure if we are doing that in Impala?


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 18:37:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8800: Added support of Kudu DATE type to Impala

2020-01-28 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14705 )

Change subject: IMPALA-8800: Added support of Kudu DATE type to Impala
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@363
PS8, Line 363: Unpartitioned Kudu tables are inefficient for large data sizes.
> Kudu range partitions are analyzed here: https://github.com/apache/impala/b
I agree that it would preferable to just fix this in this patch, unless it 
turns out to be highly non-trivial (which seems unlikely to me)



--
To view, visit http://gerrit.cloudera.org:8080/14705
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91656749a58ac769b54c2a63bdd4f85c89520b32
Gerrit-Change-Number: 14705
Gerrit-PatchSet: 12
Gerrit-Owner: Volodymyr Verovkin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Volodymyr Verovkin 
Gerrit-Comment-Date: Tue, 28 Jan 2020 18:31:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/common/init.cc@423
PS1, Line 423:   // 
https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code),
line too long (95 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Jan 2020 18:23:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15116


Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..

IMPALA-5904: Add tsan_full option and fix several TSAN bugs

This patch adds an additional build flag -tsan_full in addition to the
existing -tsan build flag. -tsan_full is equivalent to the current -tsan
behavior, and -tsan is changed to set the ignore_noninstrumented_modules
flag to true. ignore_noninstrumented_modules causes TSAN to ignore any
modules that are not TSAN-instrumented. This is necessary to get TSAN to
play nicely with Java, since Java is not TSAN-instrumented (see
https://wiki.openjdk.java.net/display/tsan/Main and JDK-8208520). While
this might decrease the number of issues surfaced by TSAN, it drastically
decreases the amount of noise produced by TSAN because the JVM is not
running TSAN-instrumented code. Without this flag set to true, almost
every single backend test fails with the error:

WARNING: ThreadSanitizer: data race (pid=12939)
  Write of size 1 at 0x7fcbe379c4c6 by thread T31:
#0 strncpy 
/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:650
 (unifiedbetests+0x1b2a4ad)
#1   (libjvm.so+0x90e706)

This patch fixes various TSAN bugs (e.g. data races) reported while
running backend tests and E2E against a TSAN build (it does not make
Impala completely TSAN-clean). This patch makes the following changes:
* Fixes several bugs involving issues with updating shared variables
  between threads
* Fixes a few race conditions in test classes
* Where possible, existing locks are used to fix any data races; in cases
  where the locking logic is non-trivial, atomics are used
* There are a few places where variables are marked as 'volatile'
  presumably for synchronization purposes; TSAN flags these 'volatile'
  variables as unsafe, and according to
  
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rconc-volatile
  using 'volatile' for synchronization is dangerous; in these cases, the
  'volatile' variables are changed to 'atomic' variables
* This patch adds a suppression file (bin/tsan-suppresions.txt) similar to
  the UBSAN suppresion file (bin/ubsan-suppresions.txt)

Testing:
* Ran exhaustive tests
* Manually re-ran backend tests against a TSAN build and made sure the
reported errors are gone

Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/init.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/rpc/thrift-thread.cc
M be/src/rpc/thrift-thread.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/bufferpool/system-allocator.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-stress.h
M be/src/runtime/io/scan-range.cc
M be/src/service/session-expiry-test.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/util/runtime-profile-test.cc
M be/src/util/stopwatch.h
M bin/run-backend-tests.sh
A bin/tsan-suppressions.txt
M buildall.sh
M tests/common/environ.py
M tests/webserver/test_web_pages.py
30 files changed, 199 insertions(+), 94 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/15116/1
--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683
PS3, Line 683:  dst_batch->tuple_data_pool()
As we have discussed offline, this is no longer correct. Using the current 
batch's mempool means that the memory can be reclaimed by the consumer of this 
row batch (the caller of this node's GetNext()) after the current GetNext() 
call returned. So it cannot be used in future batches.

Before your optimization, only those string were allocated here, which were 
used in the current row batch, so there was no problem. Now if batch_one starts 
the read rows from the Orc vector, and batch_two finishes it, the blob will 
belong to batch_one, so it can be already freed/reused when batch_two starts to 
use it.

The solution is to give the scanner its own mempool, use that to initialize the 
blobs, and attach the blobs to the last row batch that reads rows from the 
given vector.

For Parquet this is mempool is 
https://github.com/apache/impala/blob/4fa6b5260d9e28dee63a87de3bea1434706a9d05/be/src/exec/parquet/parquet-column-chunk-reader.h#L140

Parquet dictionaries use a different pool: 
https://github.com/apache/impala/blob/4fa6b5260d9e28dee63a87de3bea1434706a9d05/be/src/exec/parquet/hdfs-parquet-scanner.h#L430

Doing this in Orc will be much simpler, as the library hides 
compression/decoding of data, but the way buffers are attached to row batches 
should be the same.



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 28 Jan 2020 16:24:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8800: Added support of Kudu DATE type to Impala

2020-01-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14705 )

Change subject: IMPALA-8800: Added support of Kudu DATE type to Impala
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14705/8/fe/src/main/java/org/apache/impala/util/KuduUtil.java
File fe/src/main/java/org/apache/impala/util/KuduUtil.java:

http://gerrit.cloudera.org:8080/#/c/14705/8/fe/src/main/java/org/apache/impala/util/KuduUtil.java@387
PS8, Line 387: isSupportedKeyType
Date should be added here. See my comment in kudu_create.test


http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

http://gerrit.cloudera.org:8080/#/c/14705/8/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@363
PS8, Line 363: Unpartitioned Kudu tables are inefficient for large data sizes.
> I see.  Thank you for clarifying on this and filing the issue.
Kudu range partitions are analyzed here: 
https://github.com/apache/impala/blob/4fa6b5260d9e28dee63a87de3bea1434706a9d05/fe/src/main/java/org/apache/impala/analysis/RangePartition.java#L156

My first tip to solve the issue is to add dates as valid keys in KuduUtil.java 
- isSupportedKeyType()

It is possible that further changes will be needed (timestamp columns already 
have some special hacks).

I would prefer to add support for date range partitions in this change, as it 
seems a fairly common use case to me.



--
To view, visit http://gerrit.cloudera.org:8080/14705
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91656749a58ac769b54c2a63bdd4f85c89520b32
Gerrit-Change-Number: 14705
Gerrit-PatchSet: 12
Gerrit-Owner: Volodymyr Verovkin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Volodymyr Verovkin 
Gerrit-Comment-Date: Tue, 28 Jan 2020 15:17:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9329: Don't add nested columns in TableMaskingView

2020-01-28 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15108 )

Change subject: IMPALA-9329: Don't add nested columns in TableMaskingView
..


Patch Set 4: Code-Review-1

(1 comment)

> Patch Set 4: Code-Review+1
>
> The patch looks good to me. Thanks for adding the tests. Just for my 
> understanding, do the nested columns never appear if any of the primitive 
> columns are masked?

Sorry, I found this patch can't fix failures in queries like "select id, 
nested_struct.a from functional_parquet.complextypestbl". I prefer to fix 
IMPALA-9330 directly and I'm working on it.

http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java
File fe/src/main/java/org/apache/impala/authorization/TableMask.java:

http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java@79
PS4, Line 79: SlotRef(Lists.newArrayList(colName))
> I think I understand the rest of the patch, but I don't know the effects of
It's a bug to use SlotRef(String) since the colName is used as the alias and 
the rawPath_  is set to null. We should use SlotRef(List) as we do in 
sql-parser.cup.



--
To view, visit http://gerrit.cloudera.org:8080/15108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791
Gerrit-Change-Number: 15108
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 28 Jan 2020 14:55:46 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-28 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java 
artifacts
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Tue, 28 Jan 2020 14:51:30 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-28 Thread Attila Jeges (Code Review)
Attila Jeges has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java 
artifacts
..

IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

The build script was modified to generate Kudu JARs and add them
to the Kudu tarball.

redhat6 and redhat7 docker images were modified to update Java 8
to a newer version that is suitable for building the Java
artifacts.

ubuntu1404 docker image was modified to include CA certificate
file for Java.

Testing:
Ran the C6 toolchain build job to verify that native-toolchain
builds on all supported platforms.

Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Reviewed-on: http://gerrit.cloudera.org:8080/15072
Reviewed-by: Joe McDonnell 
Tested-by: Attila Jeges 
---
M docker/all/postinstall.sh
A docker/redhat/Centos7-Vault.repo
M docker/redhat6.df
M docker/redhat7.df
M docker/ubuntu1404.df
M source/kudu/build.sh
6 files changed, 45 insertions(+), 6 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Attila Jeges: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-9329: Don't add nested columns in TableMaskingView

2020-01-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15108 )

Change subject: IMPALA-9329: Don't add nested columns in TableMaskingView
..


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java
File fe/src/main/java/org/apache/impala/authorization/TableMask.java:

http://gerrit.cloudera.org:8080/#/c/15108/4/fe/src/main/java/org/apache/impala/authorization/TableMask.java@79
PS4, Line 79: SlotRef(Lists.newArrayList(colName))
I think I understand the rest of the patch, but I don't know the effects of 
changing this constructor from string to array list. Is is this needed for the 
new tests to pass, or this an unrelated change?



--
To view, visit http://gerrit.cloudera.org:8080/15108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791
Gerrit-Change-Number: 15108
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 28 Jan 2020 14:42:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080 [part 1]: Move codegen code from aggregation exec nodes to their plan nodes

2020-01-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15053 )

Change subject: IMPALA-4080 [part 1]: Move codegen code from aggregation exec 
nodes to their plan nodes
..


Patch Set 6: Code-Review+2

(10 comments)

Sorry for responding so late. I found a few nits (mainly in cut'n'pasted code) 
when a I went through the code. The general direction seems good to me.

http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/aggregator.h
File be/src/exec/aggregator.h:

http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/aggregator.h@132
PS6, Line 132: const AggregatorConfig& config, const std::string& name);
nit: indent +2


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h@126
PS6, Line 126:   ~GroupingAggregatorConfig() {}
please add 'override'


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h@165
PS6, Line 165:
nit: double space


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/grouping-aggregator.h@176
PS6, Line 176:   // function.
nit: +1 /


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.h@51
PS6, Line 51: class
ScalarExprsResultsRowLayout is a struct, which lead to the complaints by clang 
tidy.


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/hash-table.cc@951
PS6, Line 951:  
nit: extra space


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h
File be/src/exec/non-grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h@46
PS6, Line 46:   ~NonGroupingAggregatorConfig() {}
please add 'override'


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h@57
PS6, Line 57: Assumeing
typo: assuming


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exec/non-grouping-aggregator.h@101
PS6, Line 101: refernce
typo: reference


http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/15053/6/be/src/exprs/scalar-expr.h@203
PS6, Line 203:   static ScalarExprsResultsRowLayout ComputeResultsLayout(
 :   const vector& exprs);
optional: this could be a constructor in ScalarExprsResultsRowLayout



--
To view, visit http://gerrit.cloudera.org:8080/15053
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I58f52a262ac7d0af259d5bcda72ada93a851d3b2
Gerrit-Change-Number: 15053
Gerrit-PatchSet: 6
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 14:23:07 +
Gerrit-HasComments: Yes