[Impala-ASF-CR] IMPALA-5768: Better developer documentation

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5768: Better developer documentation
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16d34cb4fa0c60c5ad6d9c8764cc0ec21c5cb368
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 13 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 13 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 13 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 14 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool

2017-08-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7639

Change subject: MPALA-5776: Write partial tuple to the correct mempool
..

MPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool_, which is where the constant length partial tuple data is
written.

Testing:
- Ran some tests locally on ASAN build.
- No new tests were added, because it is difficult to construct test
  cases due to the issue being non-deterministic.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
1 file changed, 11 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4737: Prevent SIGUSR1 from killing daemons when minidumps are disabled

2017-08-09 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7631

Change subject: IMPALA-4737: Prevent SIGUSR1 from killing daemons when 
minidumps are disabled
..

IMPALA-4737: Prevent SIGUSR1 from killing daemons when minidumps are disabled

If a user disabled minidumps before this change, we did not register the
signal handler for SIGUSR1 at all. Sending SIGUSR1 to a daemon would
subsequently kill it.

This change registers a dummy handler to ignore the signal if minidumps
are disabled.

Change-Id: I13d866a2eec832500131954a7f693c33585ea51e
---
M be/src/util/minidump.cc
M tests/custom_cluster/test_breakpad.py
2 files changed, 33 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7631/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7631
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I13d866a2eec832500131954a7f693c33585ea51e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] IMPALA-4833: Compute precise per-host reservation size

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4833: Compute precise per-host reservation size
..


Patch Set 1:

(3 comments)

Overall I think this looks good - obviously it needs some tests but that's my 
main concern.

I played around a bit and it seemed to work. E.g. this contrived query resulted 
in different reservations per node:

select straight_join * from tpch_parquet.lineitem join (select 
o_shippriority, count(distinct o_orderkey) from tpch_parquet.orders group by 1) 
v on o_shippriority = l_orderkey;

Per Host Min Reservation: tarmstrong-box:22000(36.12 MB) 
tarmstrong-box:22001(36.12 MB) tarmstrong-box:22002(1.06 MB)

http://gerrit.cloudera.org:8080/#/c/7630/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 484:   6: i64 min_reservation_bytes
Are these optional or required? Not sure what the default is in thrift, we seem 
to be explicit elsewhere.


http://gerrit.cloudera.org:8080/#/c/7630/1/common/thrift/Planner.thrift
File common/thrift/Planner.thrift:

Line 66:   7: optional i64 min_reservation_bytes
It's kind-of confusing that these are optional, since the backend handles this 
case in exactly the same way as if they were zero. It might be clearer to make 
these required and set them to zero in PlanFragment.java if the profile is 
invalid. 

That way the invalid resource profile handling is more contained in 
PlanFragment. The profiles should only be invalid for join build sinks in 
parallel plans, which is explained over there.


http://gerrit.cloudera.org:8080/#/c/7630/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 365:   // Different fragments do not synchronize their Open() and 
Close(), so the backend
I think this comment is important for understand why we sum up the fragments. 
Can we reference this in coordinator-backend-state.cc briefly?


-- 
To view, visit http://gerrit.cloudera.org:8080/7630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4863/IMPALA-5311: Correctly account the file type and compression codec

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4863/IMPALA-5311: Correctly account the file type and 
compression codec
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7245/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 885: ss << file_format << "/" << "Unknown"<< "(Skipped):" << 
it->second << " ";
nit: missing space before <<


PS5, Line 888: <<
nit: align <<'s vertically (it's in the style guide somewhere).


http://gerrit.cloudera.org:8080/#/c/7245/5/testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test
File 
testdata/workloads/functional-query/queries/QueryTest/hdfs_scanner_profile.test:

Line 9:  QUERY
I missed this in an earlier pass but we don't have a regression test for 
IMPALA-4863. Can we add tests for the runtime filter partition case, maybe for 
both text and parquet. E.g. this query works for me:

set runtime_filter_wait_time_ms=10;
select count(*) 
from tpcds.store_sales
join tpcds.date_dim on ss_sold_date_sk = d_date_sk
where d_qoy=1


-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5764: Allow overriding packaged components

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5764: Allow overriding packaged components
..


Patch Set 1:

Ping?

-- 
To view, visit http://gerrit.cloudera.org:8080/7581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I95c2662e6f62adc924cc5de7a371202126046545
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5768: Better developer documentation

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5768: Better developer documentation
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1027/

-- 
To view, visit http://gerrit.cloudera.org:8080/7350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16d34cb4fa0c60c5ad6d9c8764cc0ec21c5cb368
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5768: Better developer documentation

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5768: Better developer documentation
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16d34cb4fa0c60c5ad6d9c8764cc0ec21c5cb368
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-3208: max row size option

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: PREVIEW: IMPALA-3208: max_row_size option
..


Patch Set 6:

This is ready for review except for the missing end-to-end tests.

-- 
To view, visit http://gerrit.cloudera.org:8080/7629
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-3208: max row size option

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: PREVIEW: IMPALA-3208: max_row_size option
..

PREVIEW: IMPALA-3208: max_row_size option

This is a preview because it is missing tests. I have manually tested
it and it is behaving it as expected so far.

Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handler larger rows without increasing the size of all their
buffers.

The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much but should be large enough
for almost all reasonable workloads.

This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.

The sorter and analytic simply use max-sized buffers for all pages
in the stream.

Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.

Added "set" test to check validation of query option.
* TODO: Add end-to-end tests exercising all operators with large rows
  with and without spilling.

Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
---
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
A fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
A testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
31 files changed, 690 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7629/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7629
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] PREVIEW: IMPALA-3208: max row size option

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: PREVIEW: IMPALA-3208: max_row_size option
..

PREVIEW: IMPALA-3208: max_row_size option

This is a preview because it is missing tests. I have manually tested
it and it is behaving it as expected so far.

Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handler larger rows without increasing the size of all their
buffers.

This is implemented using the variable page size support added to
BufferedTupleStream in an earlier commit. The synopsis is that each
stream requires reservation for one default-sized page per read and
write iterator, and temporarily requires reservation for a max-sized
page when reading or writing larger pages. The max-sized write
reservation is released immediately after the row is appended and
the max-size read reservation is released after advancing to the
next row.

This means that in the aggs and joins we require one max-size read
buffer for the read stream and one max-size write buffer that can be
used to append a large value to any stream.

The sorter and analytic are simpler: there we simply use the max-sized
buffers for all pages in the stream.

Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.

Added "set" test to check validation of query option.
* TODO: Add end-to-end tests exercising all operators with large rows
  with and without spilling.

Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
---
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
A fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
A testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
31 files changed, 690 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7629/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7629
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4833: Compute precise per-host reservation size

2017-08-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4833: Compute precise per-host reservation size
..


Patch Set 1:

TODO:
* more specific test case(s), validate new profile field for per-host min 
reservations

-- 
To view, visit http://gerrit.cloudera.org:8080/7630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4833: Compute precise per-host reservation size

2017-08-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7630

Change subject: IMPALA-4833: Compute precise per-host reservation size
..

IMPALA-4833: Compute precise per-host reservation size

Before this change, the per-host reservation size was computed
by the Planner. However, scheduling happens after planning,
so the Planner must assume that all fragments run on all
hosts, and the reservation size is likely much larger than
it needs to be.

This moves the computation of the per-host reservation size
to the BE where it can be computed more precisely. This also
includes a number of plan/profile changes.

Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.cc
M be/src/service/client-request-state.cc
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
18 files changed, 211 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7630/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5744: Add 'use krpc' flag and create DataStream interface

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5744: Add 'use_krpc' flag and create DataStream interface
..


IMPALA-5744: Add 'use_krpc' flag and create DataStream interface

This patch introduces a 'use_krpc' flag and creates an abstract
interface for the DataStreamRecvr/Mgr.

The 'use_krpc' flag defaults to 'false'. Cluster startup will abort
with an error if the flag is switched to 'true'.

The DataStreamSender implements the same virtual interface as the
DataSink, so a pure virtual class for the DataStreamSender would
essentially be an empty class. Therefore, it is not implemented.

The new interfaces are pure virtual base classes and are named
DataStream*Base. Using pure virtual base classes may result in code
duplication but the current approach is to optimize for the eventual
removal of the thrift implementations.

Stubs for the Krpc implementations are also introduced and are named
KrpcDataStream*. They currently only abort with a fatal error if they
are used. Their actual implementations will be filled in a later
patch.

When the time comes to standardize on the KRPC implementations of
DataStream*Base, we will get rid of the DataStream*Base classes and
the Thrift versions of the classes and rename KrpcDataStream* to
DataStream*. We will also rename all the references that the clients
have to DataStream*Base to DataStream*.

Also did some spurious includes cleanup.

Change-Id: I5d52245154e910529a68f53049520238eca16241
Reviewed-on: http://gerrit.cloudera.org:8080/7542
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins
---
M be/src/exec/data-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/data-stream-mgr-base.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
A be/src/runtime/data-stream-recvr-base.h
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-mgr.cc
A be/src/runtime/krpc-data-stream-mgr.h
A be/src/runtime/krpc-data-stream-recvr.cc
A be/src/runtime/krpc-data-stream-recvr.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/impala-server.cc
20 files changed, 436 insertions(+), 39 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Sailesh Mukil: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5d52245154e910529a68f53049520238eca16241
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l
..


Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l

Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Reviewed-on: http://gerrit.cloudera.org:8080/7626
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Ho: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id66508040bcc7745b7c68b62ace71ae1d394c1b4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests
..


IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests

If we ask OpenSSL to use a cipher suite that's not compatible with
TLSv1.0, it will fail on machines where TLSv1.1+ is not
supported (i.e. those with OpenSSL v1.0.0).

Fix tests to only use TLSv1.0-compatible cipher suites, picked from
https://wiki.openssl.org/index.php/Manual:Ciphers(1)#TLS_v1.0_cipher_suites.

Confirmed that tests start servers with TLSv1.0 support. Before this
patch, servers would be silently upgraded to TLSv1.2 only (i.e. the
minimum version that supported the requested cipher suite).

Change-Id: Id66508040bcc7745b7c68b62ace71ae1d394c1b4
Reviewed-on: http://gerrit.cloudera.org:8080/7624
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M be/src/rpc/thrift-server-test.cc
1 file changed, 15 insertions(+), 7 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id66508040bcc7745b7c68b62ace71ae1d394c1b4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5417: make I/O buffer queue fixed-size

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5417: make I/O buffer queue fixed-size
..


Patch Set 7: Code-Review+1

Carry +1

-- 
To view, visit http://gerrit.cloudera.org:8080/7408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If7cc3f7199f5320db00b7face97a96cdadb6f83f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l

2017-08-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5744: Add 'use krpc' flag and create DataStream interface

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5744: Add 'use_krpc' flag and create DataStream interface
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1026/

-- 
To view, visit http://gerrit.cloudera.org:8080/7542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d52245154e910529a68f53049520238eca16241
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5744: Add 'use krpc' flag and create DataStream interface

2017-08-09 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5744: Add 'use_krpc' flag and create DataStream interface
..


Patch Set 13: Code-Review+2

More clang-tidy fixes.

Rebase, carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/7542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d52245154e910529a68f53049520238eca16241
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5744: Add 'use krpc' flag and create DataStream interface

2017-08-09 Thread Sailesh Mukil (Code Review)
Hello Impala Public Jenkins, Henry Robinson, Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7542

to look at the new patch set (#13).

Change subject: IMPALA-5744: Add 'use_krpc' flag and create DataStream interface
..

IMPALA-5744: Add 'use_krpc' flag and create DataStream interface

This patch introduces a 'use_krpc' flag and creates an abstract
interface for the DataStreamRecvr/Mgr.

The 'use_krpc' flag defaults to 'false'. Cluster startup will abort
with an error if the flag is switched to 'true'.

The DataStreamSender implements the same virtual interface as the
DataSink, so a pure virtual class for the DataStreamSender would
essentially be an empty class. Therefore, it is not implemented.

The new interfaces are pure virtual base classes and are named
DataStream*Base. Using pure virtual base classes may result in code
duplication but the current approach is to optimize for the eventual
removal of the thrift implementations.

Stubs for the Krpc implementations are also introduced and are named
KrpcDataStream*. They currently only abort with a fatal error if they
are used. Their actual implementations will be filled in a later
patch.

When the time comes to standardize on the KRPC implementations of
DataStream*Base, we will get rid of the DataStream*Base classes and
the Thrift versions of the classes and rename KrpcDataStream* to
DataStream*. We will also rename all the references that the clients
have to DataStream*Base to DataStream*.

Also did some spurious includes cleanup.

Change-Id: I5d52245154e910529a68f53049520238eca16241
---
M be/src/exec/data-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/data-stream-mgr-base.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
A be/src/runtime/data-stream-recvr-base.h
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-mgr.cc
A be/src/runtime/krpc-data-stream-mgr.h
A be/src/runtime/krpc-data-stream-recvr.cc
A be/src/runtime/krpc-data-stream-recvr.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/impala-server.cc
20 files changed, 436 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/7542/13
-- 
To view, visit http://gerrit.cloudera.org:8080/7542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d52245154e910529a68f53049520238eca16241
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1025/

-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l

2017-08-09 Thread Matthew Jacobs (Code Review)
Hello Impala Public Jenkins, Thomas Tauber-Marshall,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7626

to look at the new patch set (#2).

Change subject: Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l
..

Bump Kudu version to 943b1ae, and OpenSSL to 1.0.2l

Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
---
M bin/impala-config.sh
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7626/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] Bump Kudu version to 943b1ae

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 943b1ae
..


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1024/

-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 943b1ae

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 943b1ae
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1024/

-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests

2017-08-09 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests
..


Patch Set 1:

I also tried this on a machine with OpenSSL 1.0.0.

-- 
To view, visit http://gerrit.cloudera.org:8080/7624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id66508040bcc7745b7c68b62ace71ae1d394c1b4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests

2017-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1023/

-- 
To view, visit http://gerrit.cloudera.org:8080/7624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id66508040bcc7745b7c68b62ace71ae1d394c1b4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5708: Test failure with invalid exec summary

2017-08-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5708: Test failure with invalid exec summary
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7627/1/tests/beeswax/impala_beeswax.py
File tests/beeswax/impala_beeswax.py:

PS1, Line 218: impala_shell.py
Ugh, just saw this. If you take a look at shell/impala_shell.py -> 
shell/impala_client.py , you'll see this was already handled.

Before we diverge any further, can you ping Michael Brown/David K and see if 
they have any thoughts for factoring this common python code into a library 
that we can re-use between tests and the shell?


-- 
To view, visit http://gerrit.cloudera.org:8080/7627
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id52ac62da2b01f9e163e97cbe4590f8db6b663d2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests

2017-08-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5781: Only use TLSv1.0-compatible ciphers for tests
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id66508040bcc7745b7c68b62ace71ae1d394c1b4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412 Scan returns wrong partition-column values when scanning multiple partitions pointing to the same filesystem location.

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412 Scan returns wrong partition-column values when 
scanning multiple partitions pointing to the same filesystem location.
..


Patch Set 1:

(2 comments)

Looks like MJ and I collided. I think we mostly agree here.

http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

PS1, Line 80: avro
> does this need to be avro? seems like text would work, and this test class 
The bug affects Seq/RC/Avro only.


PS1, Line 110: note, that shortcoming on avro writer would result the inserted 
value as NULL,
 : # but the point is the second column for partition i
> This isn't a clear sentence; why does the avro writer insert NULL?
The Avro writer is broken. It always represents null values as the second 
element of the union, ignoring any schema. But the default avro schema in the 
frontend represents null values as the first element of the union.

I guess we could flip the 0 and 1 in HdfsAvroTableWriter::AppendField() and at 
least make the writer usable for this test.


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412 Scan returns wrong partition-column values when scanning multiple partitions pointing to the same filesystem location.

2017-08-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412 Scan returns wrong partition-column values when 
scanning multiple partitions pointing to the same filesystem location.
..


Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

Line 90:   context->partition_descriptor()->id(),stream_->filename()));
nit:space after ,


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS1, Line 266: string
Do we need to call the string constructor explicitly? Seems a bit weird.


PS1, Line 634: name) {
Nit: long line > 90 chars


Line 648: void* HdfsScanNodeBase::GetFileMetadata(
Nit: I think this line fits in 90 characters.


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 197:   /// Allocate a new scan range object, stored in the runtime state's 
object pool. For
This comment needs updating since partition_id is now required.


Line 360:   struct pair_hash {
We have some utilities like this already in util/container-util.h, let's move 
it there.


Line 363: return std::hash{}(p.second);
Let's hash both values in the pair and combine them with boost::hash_combine. 
The current approach probably works ok for this use case but it seems better to 
implement a general-purpose pair hash that can be reused, given it's not much 
more work.


http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

Line 43: """Regression test for IMPALA-597. Verifies Impala is able to 
properly read
Can we run this test for Parquet too?

TestInsertQueries has text and parquet in its test matrix and uses the "STORED 
AS" clause too, so we can copy its approach.


Line 71: data = self.execute_scalar("select sum(i), sum(j) from %s" % 
FQ_TBL_NAME)
Can we run this query with num_nodes=1 to verify that the same bug doesn't 
exist with text file?


Line 80:   def test_multiple_partitions_same_location_avro(self, vector, 
unique_database):
Test looks good, I think we should just make sure we address the full test 
coverage gap for the other file formats too.

Let's run this test for the other affected formats with an unsupported writers 
- SequenceFile.

If we change the test matrix for this class as I suggest above, we probably 
want to pull out this test into a separate class with a different test matrix.


PS1, Line 81: imapala
Impala


Line 110: # (note, that shortcoming on avro writer would result the 
inserted value as NULL,
How about we only query the second column to avoid this complication?


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412 Scan returns wrong partition-column values when scanning multiple partitions pointing to the same filesystem location.

2017-08-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5412 Scan returns wrong partition-column values when 
scanning multiple partitions pointing to the same filesystem location.
..


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/7625/1//COMMIT_MSG
Commit Message:

PS1, Line 7: IMPALA-5412 Scan returns wrong partition-column values when 
scanning multiple partitions
   : pointing to the same filesystem location.
   : 
   : The maps storing file descriptors and file metadata were using 
filename as a key.
   : Multiple partitions pointing to the same filesystem location 
resulted that these
   : map entries were occasionally overwritted by the other partition 
poing to the same.
   : 
   : As a solution the map key was enhanced to contain a pair of 
partition ID and file name.
wrap at 60cols in git commit msgs


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

PS1, Line 163: ,s
space


PS1, Line 309: 
4 spaces


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 613: context_->partition_descriptor()->id(), filename());
4 spaces


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS1, Line 266: string
remove string constructor here


PS1, Line 266: ,
space


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS1, Line 237:   /// Returns nullptr if the search doesn't find the descriptor.
not true; it has a dcheck


PS1, Line 360: pair_hash
misleading name because the hash is computed on the value.

why is it necessary to provide a custom hash?


PS1, Line 369: pair
Can you typedef this and using that here and in the .cc file ?


PS1, Line 381: pair
same


http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

PS1, Line 80: avro
does this need to be avro? seems like text would work, and this test class is 
using the text/none dimension (see l37)


PS1, Line 81: ad
as?


PS1, Line 109: output
'output' isn't clear (it doesn't match up with the queries). say this is what 
would get returned by 
 select i, j from %s


PS1, Line 110: note, that shortcoming on avro writer would result the inserted 
value as NULL,
 : # but the point is the second column for partition i
This isn't a clear sentence; why does the avro writer insert NULL?


PS1, Line 122: output:
 : # [NULL, 1] 3 times
 : # [NULL, 2] 3 times
again, this is confusing


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 943b1ae

2017-08-09 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Bump Kudu version to 943b1ae
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 943b1ae

2017-08-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7626

Change subject: Bump Kudu version to 943b1ae
..

Bump Kudu version to 943b1ae

Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7626/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I43a510c07b877cb18c3170a034cf9322baef16c0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5778: clarify --read size option.

2017-08-09 Thread Balazs Jeszenszky (Code Review)
Balazs Jeszenszky has posted comments on this change.

Change subject: IMPALA-5778: clarify --read_size option.
..


Patch Set 1: Code-Review+1

Looks good.

-- 
To view, visit http://gerrit.cloudera.org:8080/7623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c20a9d55f89170b11f569c90b7f2949ddbe4211
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412 Scan returns wrong partition-column values when scanning multiple partitions pointing to the same filesystem location.

2017-08-09 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7625

Change subject: IMPALA-5412 Scan returns wrong partition-column values when 
scanning multiple partitions pointing to the same filesystem location.
..

IMPALA-5412 Scan returns wrong partition-column values when scanning multiple 
partitions
pointing to the same filesystem location.

The maps storing file descriptors and file metadata were using filename as a 
key.
Multiple partitions pointing to the same filesystem location resulted that these
map entries were occasionally overwritted by the other partition poing to the 
same.

As a solution the map key was enhanced to contain a pair of partition ID and 
file name.

Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M tests/metadata/test_partition_metadata.py
7 files changed, 109 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab