[Impala-ASF-CR] IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8

2017-06-22 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8
..


Patch Set 6:

> Maybe the associated JIRA is a better place to have this
 > discussion, but I am trying to understand the purpose of this
 > patch. Why do we need FE tests to run against Java 8? Java 8 will
 > be officially supported in C6 and until then we can't use any of
 > the Java 8 features. Once we do switch to Java 8, can't we simply
 > change the test results? There is only one case where order really
 > matters (authorization).

C6 is a Cloudera matter, not one for Apache Impala.

-- 
To view, visit http://gerrit.cloudera.org:8080/7073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iad8e1dccec3a51293a109c420bd2b88b9d1e0625
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8

2017-06-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8
..


Patch Set 6:

Maybe the associated JIRA is a better place to have this discussion, but I am 
trying to understand the purpose of this patch. Why do we need FE tests to run 
against Java 8? Java 8 will be officially supported in C6 and until then we 
can't use any of the Java 8 features. Once we do switch to Java 8, can't we 
simply change the test results? There is only one case where order really 
matters (authorization).

-- 
To view, visit http://gerrit.cloudera.org:8080/7073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iad8e1dccec3a51293a109c420bd2b88b9d1e0625
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 2:

Build failed due to what looks like problems with the Ubuntu package mirrors 
hosted on AWS. Starting it again.

-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/782/

-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-22 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7232/4//COMMIT_MSG
Commit Message:

Line 34: 
Comment that we are changing the tests to explicitly set num_disks=1 for now 
because the test code isn't yet aware of how many disks are actually on the 
test system and the test code also isn't updated to actually use multiple disks.
We should also have the JIRA to reference.


http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr-stress-test.cc
File be/src/runtime/disk-io-mgr-stress-test.cc:

Line 31: const int NUM_DISKS = 1;
comment why we are setting this to 1 for now, i.e. that the test code isn't yet 
aware of how many disks are actually on the test system and the test code also 
isn't updated to actually use multiple disks.

We should also have the JIRA to reference.


http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr-test.cc
File be/src/runtime/disk-io-mgr-test.cc:

Line 212:   const int num_disks = 1;
leave a TODO with the JIRA where we'll change this, same for the other cases 
below.


http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

PS4, Line 87: // Rotational disks should have 1 thread per disk to minimize 
seeks.  Non-rotational
: // don't have this penalty and benefit from multiple concurrent 
IO requests.
: static const int THREADS_PER_ROTATIONAL_DISK = 1;
: static const int THREADS_PER_FLASH_DISK = 8;
I don't think there's any reason to keep these separately and then check if 
FLAGS_x are non-zero. Let's just make these the default value of those flags, 
then we can simplify the logic in Init()


PS4, Line 263: 
num_io_threads_per_rotational_disk_(FLAGS_num_io_threads_per_rotational_disk),
 : 
num_io_threads_per_solid_state_disk_(FLAGS_num_io_threads_per_solid_state_disk),
I think we can configure num_io_threads_per_rotational_disk_ and 
num_io_threads_per_solid_state_disk_ based on the flags once, rather than 
having to have the logic in Init() multiple times.

E.g.

if (FLAGS_num_io_threads_per_rotational_disk > 0) {
  num_io_threads_per_rotational_disk_ = 
FLAGS_num_io_threads_per_rotational_disk;
} else if (FLAGS_num_io_threads_per_disk > 0) {
  num_io_threads_per_rotational_disk_ = FLAGS_num_io_threads_per_disk;
} else {
  num_io_threads_per_rotational_disk_ = DEFAULT_NUM_PER_ROT_DISK;
}

same for ssd.


Line 278: " disks";
To be consistent we should also warn if the value is negative


PS4, Line 367: } else if (num_io_threads_per_rotational_disk_ != 0 && 
DiskInfo::is_rotational(i)) {
 :   num_threads_per_disk = num_io_threads_per_rotational_disk_;
 : } else if (num_io_threads_per_solid_state_disk_ != 0 && 
!DiskInfo::is_rotational(i)) {
 :   num_threads_per_disk = 
num_io_threads_per_solid_state_disk_;
 : } else if (FLAGS_num_threads_per_disk != 0) {
 :   num_threads_per_disk = FLAGS_num_threads_per_disk;
 : } else if (DiskInfo::is_rotational(i)) {
 :   num_threads_per_disk = THREADS_PER_ROTATIONAL_DISK;
 : } else {
 :   num_threads_per_disk = THREADS_PER_FLASH_DISK;
 : }
After my suggestion in the constructor this becomes greatly simplified.


-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 2: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/781/

-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-22 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#4).

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..

IMPALA-5240: Allow config of number of disk I/O threads per disk type

Currently Impala defaults to 8 threads per flash disk and 1 thread per
rotational disk. This can be overridden with --num_threads_per_disk,
but that sets the thread count independent of disk type.

This would allow control of the number of disk I/O threads spawned
independently for solid state disks using
"--num_io_threads_per_solid_state_disk" and for rotational disks using
"--num_io_threads_per_rotational_disk" as startup parameters.

Testing:
Added backend tests that verify if "num_threads_per_disk",
"num_io_threads_per_solid_state_disk" and
"num_io_threads_per_rotational_disk" are used to spawn threads
appropriately

TODO (Request comments on this additional change):
Additionally made some changes to fix a bug, where impala would crash
if num_disks are set more than the number of logical disks on system and
num_threads_per_disk is not set. This bug also lets the user create more
disk queues than the num of logical disks which would eventually never be
used. As a part of this effort, existing test cases in disk-io-mgr-test
were identified that exploited this bug and hence have been fixed in
this commit. Moreover, after this fix, if num_disks is set to more than
the number of logical disks then it will default to max available disks
and log a warning.

Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
---
M be/src/runtime/disk-io-mgr-stress-test.cc
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/util/thread.cc
M be/src/util/thread.h
M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
8 files changed, 393 insertions(+), 341 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] DRAFT - IMPALA-5498: Support for partial sorts

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: DRAFT - IMPALA-5498: Support for partial sorts
..


Patch Set 1:

(2 comments)

The overall approach makes sense to me.

Happy to talk about the computeResourceProfile() stuff in the planner if that 
would be helpful.

http://gerrit.cloudera.org:8080/#/c/7267/1/be/src/exec/partial-sort-node.cc
File be/src/exec/partial-sort-node.cc:

Line 104:   RETURN_IF_ERROR(sorter_->Open());
I think you'll want to wait until the next call into GetNext() to re-open the 
sorter. At this point row_batch may still be holding onto memory that Open() 
will want to use.


Line 115:   do {
I think it would make sense to open the sorter here (if it's not already open).


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] DRAFT - IMPALA-5498: Support for partial sorts

2017-06-22 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: DRAFT - IMPALA-5498: Support for partial sorts
..


Patch Set 1:

(4 comments)

I think the interface changes make sense. I'm not digging into the code details 
much yet.

http://gerrit.cloudera.org:8080/#/c/7267/1/be/src/exec/partial-sort-node.h
File be/src/exec/partial-sort-node.h:

PS1, Line 30: a single tuple
can you clarify, this is a bit unclear


PS1, Line 34: If a merge phase was performed in the sort, sorted rows are deep 
copied into
: /// the output batch.
I thought there wouldn't need to be a merge


http://gerrit.cloudera.org:8080/#/c/7267/1/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

PS1, Line 353:   2: required bool use_top_n;
this will move to TSortType, right?


PS1, Line 355: 3: optional i64 offset
I don't think we'll need to implement the behavior to support this for partial 
sorts, and if we don't we should be clear about what this applies to: top N 
only I believe


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 3:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG
Commit Message:

PS3, Line 15: a plan node
> Unless we are sure this is a common user scenario, I lean towards a default
If users don't have any queries small enough for this to kick in then there's 
no chance of them regressing. All the small queries I've looked at show a nice 
benefit from this so the two like scenarios in my mind are:
* all queries are too large and it has no effect
* there are small queries and it noticably improves the response time of the 
majority of them (maybe regressing a small fraction to some degree)

I suspect that these kind of queries are pretty common if people are doing 
exploratory work with hand-written queries or queries generated by tools (one 
of the motivations for this was that a BI tool was generating queries with 
joins with complex expressions over moderate amounts of data). I've seen 
queries that would benefit from this on other clusters as well.

I'd be reluctant to merge this if it's off by default since realistically most 
users won't know to turn it on. I think shipping optimisations off by default 
either means our defaults are bad or we don't have enough confidence in the 
optimisation to justify the work. I'm open to adjusting the threshold to a 
lower number.


http://gerrit.cloudera.org:8080/#/c/7153/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 497: 
query_options->__set_disable_codegen_rows_threshold(atoi(value.c_str()));
> Consider using StringParser::StringToInt() like in other places here, so we
Done


http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 255:   // If the number of rows per node is below the threshold codegen 
will be automatically
> If the number of rows processed per node ...
Done


http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 279:   // If the number of rows per node is below the threshold and 
disable_codegen is unset,
> If the number of rows processed per node
Done


http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 151: 
> Needs PlannerTests. You can do something like PlannerTest.testComputeStatsM
Happy to do this but will defer the work until we have a clear decision about 
whether this is worth doing.


Line 152: checkForDisableCodegen(rootFragment.getPlanRoot());
> Add comment why we have to do this on the fragmented plan
Done


Line 477: boolean isSmallQuery = visitor.valid() && 
visitor.getMaxRowsProcessed() < threshold;
> remove visitor.valid() and inline result into the if
Done


Line 492:   private void checkForDisableCodegen(PlanNode planRoot) {
> planRoot -> distributedPlan
Done


Line 493: // Determine the maximum number of rows processed by a instance 
of a plan node.
> This comments reads like the comment on MaxRowsProcessedVisitor.maxRowsProc
Reworked this comment to explain the motivation for using the number of rows 
per node as a threshold.


http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

Line 32:   // True if we should abort because if we don't have valid estimates
> remove second "if"
Done


Line 47: if (caller instanceof HashJoinNode || caller instanceof 
NestedLoopJoinNode) {
> if (caller instanceof JoinNode) foundJoinNode_ = true;
Done


Line 60:   || (missingStats && !scan.hasLimit() && 
scan.getConjuncts().isEmpty())) {
> Why would the existence of scan conjuncts make our estimate valid despite m
In that case the scan returns every row it scans and hits the limit after 
processing a bounded amount of data.


Line 66:   Math.max(maxRowsProcessedPerNode_, numRows / numNodes);
> Nit: Do the division as double and take the ceil, otherwise it seems like w
Done


Line 77:   Math.max(maxRowsProcessedPerNode_, numRows / numNodes);
> Nit: Do the division as double and take the ceil, otherwise it seems like w
Done


Line 81:   public boolean valid() {
> single line
Done


Line 96: return foundJoinNode_;
> Preconditions.checkState(valid_); seems appropriate here as well
Done


http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test
File testdata/workloads/functional-query/queries/QueryTest/codegen.test:

Line 2:  QUERY
> Any reason not to make these PlannerTests?
To confirm that it's actually plumbed all the way through to the backend.



[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..

IMPALA-5483: Automatically disable codegen for small queries

This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.

This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.

Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen and Ec/Ei are constant
factor giving the cost of codegen'd and interpreted execution and
d, then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).

I ran some experiments with TPC-H Q1, varying the input table size, to
determine what the cut-over point where codegen was beneficial was.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s.  To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.

Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't handle the case where for a scan node with conjuncts, a limit
and missing stats correctly (it considered the estimate still valid.)

Testing:
Updated e2e tests that set disable_codegen to set
disable_codegen_rows_threshold to 0, so that those tests run both
with and without codegen still.

Added an e2e test to make sure that the optimisation is applied in
the backend.

Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.

Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
A testdata/workloads/functional-query/queries/QueryTest/codegen.test
A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
A tests/query_test/test_codegen.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_udfs.py
17 files changed, 209 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7153/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Results: ~4X reduction in catalog update topic size

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/compress.h
M be/src/util/decompress.h
A tests/custom_cluster/test_compact_catalog_updates.py
7 files changed, 168 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7268

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Results:
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/compress.h
M be/src/util/decompress.h
A tests/custom_cluster/test_compact_catalog_updates.py
7 files changed, 168 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-22 Thread Alex Behm (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6527

to look at the new patch set (#7).

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..

IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

Implements HdfsScanner::GetNext() for the Avro, RC File, and
Sequence File scanners. Changes ProcessSplit() to repeatedly call
GetNext() to share the core scanning code between the legacy
ProcessSplit() interface (ProcessSplit()) and the new GetNext()
interface.

Summary of changes:
- Slightly change code flow for initial scan range that
  only parses the file header. The new code sets
  'only_parsing_header_' in Open() and then honors
  that flag in GetNextInternal(). Before, all the logic
  was inside ProcessSpit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr
  RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/scan-node.h
M be/src/util/blocking-queue.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
26 files changed, 694 insertions(+), 816 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 7: Code-Review+1

Keep Tim's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/6/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 183:   DCHECK(row_batch != nullptr);
> Brief comment to explain this? It's subtle.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-22 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#4).

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..

IMPALA-5240: Allow config of number of disk I/O threads per disk type

Currently Impala defaults to 8 threads per flash disk and 1 thread per
rotational disk. This can be overridden with --num_threads_per_disk,
but that sets the thread count independent of disk type.

This would allow control of the number of disk I/O threads spawned
independently for solid state disks using
"--num_io_threads_per_solid_state_disk" and for rotational disks using
"--num_io_threads_per_rotational_disk" as startup parameters.

Testing:
Added backend tests that verify if "num_threads_per_disk",
"num_io_threads_per_solid_state_disk" and
"num_io_threads_per_rotational_disk" are used to spawn threads
appropriately

Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
---
M be/src/runtime/disk-io-mgr-stress-test.cc
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/util/thread.cc
M be/src/util/thread.h
M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
8 files changed, 393 insertions(+), 341 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes

2017-06-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5503: [DOCS] Document how to specify 
coordinator/executor nodes
..


Patch Set 2: Code-Review+2

(3 comments)

Much better, thanks.

http://gerrit.cloudera.org:8080/#/c/7237/2/docs/topics/impala_new_features.xml
File docs/topics/impala_new_features.xml:

PS2, Line 77:  
nit: extra space


http://gerrit.cloudera.org:8080/#/c/7237/2/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

Line 338: statestored daemon.
for metadata updates.


PS2, Line 377:  
nit: extra space


-- 
To view, visit http://gerrit.cloudera.org:8080/7237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


IMPALA-4866: Hash join node does not apply limits correctly

Hash join node currently does not apply the limits correctly.
This issue gets masked most of the times since the planner sticks
an exchange node on top of most of the joins. This issue gets
exposed when NUM_NODES=1.

Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Reviewed-on: http://gerrit.cloudera.org:8080/6778
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/partitioned-hash-join-node.cc
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
A 
testdata/workloads/functional-query/queries/QueryTest/single-node-joins-with-limits-exhaustive.test
M tests/query_test/test_join_queries.py
4 files changed, 69 insertions(+), 9 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] DRAFT - IMPALA-5498: Support for partial sorts

2017-06-22 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: DRAFT - IMPALA-5498: Support for partial sorts
..


Patch Set 1:

(2 comments)

I haven't looked much at the Planner changes needed, eg. in 
SortNode.computeResourceProfile, but this patch gives a good idea of the 
interface changes needed.

http://gerrit.cloudera.org:8080/#/c/7267/1/be/src/exec/partial-sort-node.cc
File be/src/exec/partial-sort-node.cc:

Line 41: Status PartialSortNode::Init(const TPlanNode& tnode, RuntimeState* 
state) {
There's a lot of boiler plate here and in Prepare, Open, and Codegen that could 
be shared with SortNode.


http://gerrit.cloudera.org:8080/#/c/7267/1/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

Line 65:   private final boolean useTopN_;
This will be replaced with TSortType.TOPN


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] DRAFT - IMPALA-5498: Support for partial sorts

2017-06-22 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7267

Change subject: DRAFT - IMPALA-5498: Support for partial sorts
..

DRAFT - IMPALA-5498: Support for partial sorts

Impala currently supports total sorts (the entire set of data
is sorted) and top-n sorts (only the highest/lowest n elements
are sorted). This patch adds the ability to do partial sorts,
where the data is divided up into some number of subsets, each
of which is sorted individually.

It accomplishes this by adding a new exec node, PartialSortNode.
When PartialSortNode::GetNext() is called, it retrieves input
up to its memory limit, uses the existing Sorter class to sort
it, and outputs it. This is faster than a total sort with SortNode
as it avoids the need to spill if the input is larger than the
memory limit.

In the planner, the SortNode plan node is used, with an enum value
indicating if it is a total or partial sort.

As a first use case, partial sort is used where a total sort was
used previously for inserts into Kudu.

This patch is a work in progress, and needs to be polished and tested.

Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
A be/src/exec/partial-sort-node.cc
A be/src/exec/partial-sort-node.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
9 files changed, 328 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/7267/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes

2017-06-22 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-5503: [DOCS] Document how to specify 
coordinator/executor nodes
..

IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes

Cf. IMPALA-3807 and IMPALA-5147.

Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_components.xml
M docs/topics/impala_new_features.xml
M docs/topics/impala_scalability.xml
4 files changed, 129 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/7237/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 8:

PS 10 is a rebase and commit message update

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#10).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 986 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Increase explain level for TPC-H and TPC-DS to provide visibility into
the memory estimates on those data sets.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 986 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314: // default buffer size, e.g. with small dimension tables.
> That's correct. Ultimately the planner can only guess because we do the sch
Filed IMPALA-5565. I'm still a little hazy on the exact scope of what needs to 
be fixed so it might be helpful to add any ideas you had in there.


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186: if (getChild(1).getCardinality() == -1 || 
getChild(1).getAvgRowSize() == -1
> Agree
Done. Made it check for <= 0 since there are a couple of bugs in the numNodes 
calculation that I fixed in https://gerrit.cloudera.org/#/c/7223/ where it 
could return 0.


Line 190: } else {
> Add a Preconditions check that asserts MT_DOP=0 with a comment that the est
We still need to generate plans in planner tests, so I made the DCHECK 
conditional on whether it's a test environment.


http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 629:   protected final static long getDefaultSpillableBufferBytes() {
> Why this indirection?
To make the intent clear at the callsites. The HDFS read size and the spillable 
buffer size will be decoupled in IMPALA-3200. It will be easier to fix that up 
if we make the distinction in the planner already.


http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test:

Line 2: select *
> use straight_join for these tests
Done


Line 761: select string_col, count(*)
> Add a similar test with no stats for the join case (or integrate into this 
I have a test above " Join with no stats for right input - should use default 
buffers." Or did you have an additional test in mind?


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7223/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

Line 529: ResourceProfile buildProfile =
> This logic still isn't right.
Fixed/clarified in PS6


-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7223/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

Line 529: // build side.
This logic still isn't right.


http://gerrit.cloudera.org:8080/#/c/7223/5/fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
File fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java:

Line 163:   throws InternalException, NotImplementedException {
Remove extra throws


http://gerrit.cloudera.org:8080/#/c/7223/6/fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
File fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java:

PS6, Line 102:  throws InternalException, NotImplementedException
can remove these


PS6, Line 163: InternalException, NotImplementedException
Can remove these.


-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/781/

-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 2: Code-Review+2

On second thought, I am confident to +2 this.

-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Alex Behm (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7264

to look at the new patch set (#2).

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..

IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

For queries where plan generation is terminated early due to LIMIT 0
or similar, some tuples may not have a mem layout because no PlanNode
has been generated to materialize them. The fix is to make
recomputeMemLayout() a no-op if the tuple does not have an existing
mem layout.

Testing:
- added regression test

Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
---
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
3 files changed, 12 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7264/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-06-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7264/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 245: collSlotRef.getDesc().getParent().recomputeMemLayout();
> Not sure I follow the suggestion. I think the Preconditions check in recomp
I thought to replace the Precondition with

if (!hasMemLayout_) return;

That way we may be more resilient to similar errors in the future, but also may 
not catch them. It's more of a question why not to do this then a suggestion.


-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7264/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 245: collSlotRef.getDesc().getParent().recomputeMemLayout();
> Have you considered returning early instead of the Precondition in recomput
Not sure I follow the suggestion. I think the Preconditions check in 
recomputeMemLayout() id definitely correct. We should not unnecessarily compute 
a mem layout.

Where do you suggest I return early? Returning from here is not correct, 
because you need to finish the loop.


-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..


Patch Set 1:

(1 comment)

Thank you for fixing this so quickly!

http://gerrit.cloudera.org:8080/#/c/7264/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 245: collSlotRef.getDesc().getParent().recomputeMemLayout();
Have you considered returning early instead of the Precondition in 
recomputeMemLayout()?


-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4622: Add ALTER COLUMN statement.

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4622: Add ALTER COLUMN statement.
..


IMPALA-4622: Add ALTER COLUMN statement.

Kudu recently added the ability to alter a column's default value
and storage attributes (KUDU-861). This patch adds the ability to
modify these from Impala using ALTER.

It also supports altering a column's comment for non-Kudu tables.

It does not support setting a column to be a primary key or
changing a column's nullability, because those are not supported on
the Kudu side yet.

Syntax:
ALTER TABLE  ALTER [COLUMN] 
  SET   [  [ ...]]
where  is one of:
  - DEFAULT, BLOCK_SIZE, ENCODING, COMPRESSION (Kudu tables)
  - COMMENT (non-Kudu tables)
ALTER TABLE  ALTER [COLUMN]  DROP DEFAULT

This is similar to the existing CHANGE statement:
ALTER TABLE  CHANGE   
  [COMMENT ]
but the new syntax is more natural for setting column properties
when the column name and type are not being changed. Both ALTER
COLUMN and CHANGE COLUMN operations use AlterTableAlterColStmt and
are sent to the catalog as ALTER_COLUMN operations.

Testing:
- Added FE tests to ParserTest and AnalyzeDDLTest
- Added EE tests to test_kudu.py

Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Reviewed-on: http://gerrit.cloudera.org:8080/6955
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterTableAlterColStmt.java
D fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
M tests/query_test/test_kudu.py
12 files changed, 456 insertions(+), 147 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4622: Add ALTER COLUMN statement.

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4622: Add ALTER COLUMN statement.
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

2017-06-22 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7264

Change subject: IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.
..

IMPALA-5562: Only recomputeMemLayout() if tuple has a layout.

For queries where plan generation is terminated early due to LIMIT 0
or similar, some tuples may not have a mem layout because no PlanNode
has been generated to materialize them. For such tuples we should not
call recomputeMemLayout().

Testing:
- added regression test

Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
---
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
2 files changed, 12 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7264/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I08548c6bfa7dbf4655e55636605bebb89d2a2239
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-22 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7254/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 275:   while (!output_iterator_.AtEnd()) {
> It doesn't matter too much so long as we can convince ourselves it's correc
If Prepare() fails, output_iterator_.AtEnd() should be true. I can add a DCHECK 
in Prepare().


-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/780/

-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-06-22 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 5:

Rebased. 
Also, corrected the result of related query in nested-types-subplan.test.

-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-06-22 Thread anujphadke (Code Review)
Hello Dan Hecht, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6778

to look at the new patch set (#5).

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..

IMPALA-4866: Hash join node does not apply limits correctly

Hash join node currently does not apply the limits correctly.
This issue gets masked most of the times since the planner sticks
an exchange node on top of most of the joins. This issue gets
exposed when NUM_NODES=1.

Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
---
M be/src/exec/partitioned-hash-join-node.cc
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
A 
testdata/workloads/functional-query/queries/QueryTest/single-node-joins-with-limits-exhaustive.test
M tests/query_test/test_join_queries.py
4 files changed, 69 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6778/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2017-06-22 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7203/3/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

Line 343:   /// Use pointer to avoid inclusion of timestampvalue.h and avoid 
clang issues.
> Given the names are so different that it's not obvious (yet the names are o
Done


http://gerrit.cloudera.org:8080/#/c/7203/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 371:   // String containing a timestamp (in UTC) set at the query 
submission time.
> document same point in time as 'now_string'
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2017-06-22 Thread Bikramjeet Vig (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7203

to look at the new patch set (#4).

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..

IMPALA-3504: UDF for current timestamp in UTC

This change adds a UDF "utc_timestamp" which returns the current
date and time in UTC. Example query:

select utc_timestamp();

+---+
| utc_timestamp()   |
+---+
| 2017-06-15 17:36:39.290773000 |
+---+

Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M common/function-registry/impala_functions.py
M common/thrift/ImpalaInternalService.thrift
9 files changed, 67 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7203/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4622: Add ALTER COLUMN statement.

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4622: Add ALTER COLUMN statement.
..


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/779/

-- 
To view, visit http://gerrit.cloudera.org:8080/6955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4622: Add ALTER COLUMN statement.

2017-06-22 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4622: Add ALTER COLUMN statement.
..


Patch Set 10: Code-Review+2

Rebased, carrying forward

-- 
To view, visit http://gerrit.cloudera.org:8080/6955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2782: Allow impala-shell to connect directly to impalad when configured with load balancer and kerberos.

2017-06-22 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-2782: Allow impala-shell to connect directly to impalad 
when configured with load balancer and kerberos.
..


Patch Set 2:

Added default option as None. Still missing a test though, I will make sure to 
work on the test after I get feedback.

-- 
To view, visit http://gerrit.cloudera.org:8080/7241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2782: Allow impala-shell to connect directly to impalad when configured with load balancer and kerberos.

2017-06-22 Thread Vincent Tran (Code Review)
Vincent Tran has uploaded a new patch set (#2).

Change subject: IMPALA-2782: Allow impala-shell to connect directly to impalad 
when configured with load balancer and kerberos.
..

IMPALA-2782: Allow impala-shell to connect directly to impalad when
configured with load balancer and kerberos.

This change adds an impala-shell option -b . This allows
user to optionally specify the load-balancer's host:port so that
impala-shell will accept a direct connection to impala daemons in
a kerberized cluster.

Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9
---
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/option_parser.py
4 files changed, 12 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/7241/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5553: Fix expr-test in release builds
..


IMPALA-5553: Fix expr-test in release builds

expr-test fails in release build as it uses
Literal::CreateLiteral() to create literal
expressions. Literal::CreateLiteral() wraps
ParseString() with DCHECK() so it becomes
a no-op in release builds.

This change fixes the issue by moving
Literal::CreateLiteral() from literal.cc to
expr-test.cc as it's only used by the test
code. Also replaces DCHECK() wrapped around
ParseString() with EXPECT_TRUE().

Verified the fix by building and running
a release build of expr-test.

Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Reviewed-on: http://gerrit.cloudera.org:8080/7255
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
3 files changed, 114 insertions(+), 102 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds

2017-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5553: Fix expr-test in release builds
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No