[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#13).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/collection-value-builder.h
M be/src/runtime/scoped-buffer.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
16 files changed, 653 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/13
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 13
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#12).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/collection-value-builder.h
M be/src/runtime/scoped-buffer.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
16 files changed, 653 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/12
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1091
PS11, Line 1091:   private Map> 
PrintDictionaryConjuncts(
> * return void
made it consistent.



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 11
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Jan 2018 06:37:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1767: [DOCS] Document new Boolean operators

2018-01-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8942 )

Change subject: IMPALA-1767: [DOCS] Document new Boolean operators
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8942
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb
Gerrit-Change-Number: 8942
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Jan 2018 06:19:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6388: Fix the Union node number of hosts estimation

2018-01-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9017 )

Change subject: IMPALA-6388: Fix the Union node number of hosts estimation
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9017/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test:

http://gerrit.cloudera.org:8080/#/c/9017/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test@3561
PS1, Line 3561: # IMPALA-6388: Verify that the order of the union operands does 
not impact the
move these into union.test

you should be able to use the recently added QUERYOPTIONS section to use a 
higher explain level for only those tests

see runtime-filter-query-options.test for an example of using the new 
QUERYOPTIONS section


http://gerrit.cloudera.org:8080/#/c/9017/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test@3563
PS1, Line 3563: select coalesce(int_col, int_col) from functional.alltypes
why coalesce()?



--
To view, visit http://gerrit.cloudera.org:8080/9017
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0
Gerrit-Change-Number: 9017
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Fri, 12 Jan 2018 06:15:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 11: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1091
PS11, Line 1091:   private Map> 
PrintDictionaryConjuncts(
* return void
* the typical pattern we use for explain helper functions is:

String getFooBarExplainString(String prefix);

see examples like getTableStatsExplainString(), getNodeExplainString()

would be good to change this helper to e consistent



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 11
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Jan 2018 06:09:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#11).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/collection-value-builder.h
M be/src/runtime/scoped-buffer.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
16 files changed, 654 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/11
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 11
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 10:

tests found a bad dep, next patch fixes it.


-- 
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 10
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Jan 2018 05:35:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6388: Fix the Union node number of hosts estimation

2018-01-11 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9017


Change subject: IMPALA-6388: Fix the Union node number of hosts estimation
..

IMPALA-6388: Fix the Union node number of hosts estimation

Before this patch, we would estimate the number of hosts for the union
node by looking only at the first union operand. This is obviously
incorrect and lead us to underestimate the value.

We fix the problem by setting the estimate to be the maximum of its
children.

Testing:
- Added a planner test that reproduces the issue

Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0
---
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
2 files changed, 95 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/9017/1
--
To view, visit http://gerrit.cloudera.org:8080/9017
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I51e1ecca8dbc84b2b5a72708667b2799d00279f0
Gerrit-Change-Number: 9017
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

2018-01-11 Thread Michael Ho (Code Review)
Hello Sailesh Mukil, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8991

to look at the new patch set (#5).

Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for 
KRPC
..

IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

KRPC in general tends to put more pressure on the thread
caches due to allocations of more small objects (i.e. <1MB).
While some of them are being addressed in KUDU-1865, it's shown
that the following TCMalloc workarounds will provide reasonable
performance with KRPC:

- TCMALLOC_TRANSFER_NUM_OBJ:
   - maximum number of object per classe type to transfer between
 thread and central caches.
   - the default value of 512 in 2.5.2 seems to cause the spin lock
 in the central cache to be held for too long with KRPC. 2.6.0
 and latter reverts this value to 32 by default.

- TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
  - total amount of memory allocated to all thread caches in bytes
  - the default value is 32MB. We need to bump it to 1GB which is the
internal cap in TCMalloc.

This change upgrades GPerfTools/TCMalloc to 2.6.3 to pick up the
change of the default value of TCMALLOC_TRANSFER_NUM_OBJ.

In addition, when KRPC is enabled and 
FLAGS_TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
has the default value of 0, we will automatically bump the thread cache sizes
to 1GB. Without these workarounds, stress test with KRPC will grind to a halt
due to contention for the spinlock in TCMalloc's central cache. With these
workarounds, the stress test completes within the same ballpark as thrift.

Also did a perf run with Thrift. The regression in TPCH-Q2 is mostly due to 
sensitivity
in runtime filter timing and the avg can be dragged up due to a bad run when 
filters
arrive late. No regression as measured in targeted-perf.

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
++---+-++++
| TPCH(_300) | parquet / none / none | 18.93   | -0.84% | 10.08  | 
+1.45% |
++---+-++++

++--+---++-++++-+---+
| Workload   | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
++--+---++-++++-+---+
| TPCH(_300) | TPCH-Q2  | parquet / none / none | 6.28   | 3.25| R 
+93.41%  | * 49.77% * | * 12.47% * | 1   | 3 |
| TPCH(_300) | TPCH-Q4  | parquet / none / none | 5.00   | 4.77|   
+4.83%   |   0.41%|   0.03%| 1   | 3 |
| TPCH(_300) | TPCH-Q13 | parquet / none / none | 21.29  | 20.69   |   
+2.90%   |   0.55%|   0.37%| 1   | 3 |
| TPCH(_300) | TPCH-Q11 | parquet / none / none | 1.73   | 1.71|   
+0.94%   |   1.69%|   2.85%| 1   | 3 |
| TPCH(_300) | TPCH-Q14 | parquet / none / none | 6.03   | 5.99|   
+0.76%   |   0.00%|   0.95%| 1   | 3 |
| TPCH(_300) | TPCH-Q16 | parquet / none / none | 6.97   | 6.93|   
+0.58%   |   0.74%|   0.73%| 1   | 3 |
| TPCH(_300) | TPCH-Q3  | parquet / none / none | 29.15  | 29.03   |   
+0.40%   |   1.63%|   1.39%| 1   | 3 |
| TPCH(_300) | TPCH-Q1  | parquet / none / none | 14.01  | 13.96   |   
+0.34%   |   1.28%|   0.51%| 1   | 3 |
| TPCH(_300) | TPCH-Q6  | parquet / none / none | 1.27   | 1.27|   
-0.03%   |   3.69%|   0.07%| 1   | 3 |
| TPCH(_300) | TPCH-Q9  | parquet / none / none | 30.99  | 31.13   |   
-0.45%   |   0.54%|   0.19%| 1   | 3 |
| TPCH(_300) | TPCH-Q5  | parquet / none / none | 48.03  | 48.33   |   
-0.63%   |   4.72%|   0.11%| 1   | 3 |
| TPCH(_300) | TPCH-Q7  | parquet / none / none | 46.85  | 47.41   |   
-1.18%   |   1.59%|   0.46%| 1   | 3 |
| TPCH(_300) | TPCH-Q8  | parquet / none / none | 7.92   | 8.03|   
-1.39%   |   3.67%|   5.63%| 1   | 3 |
| TPCH(_300) | TPCH-Q19 | parquet / none / none | 30.98  | 31.51   |   
-1.67%   |   1.33%|   0.82%| 1   | 3 |
| TPCH(_300) | TPCH-Q18 | parquet / none / none | 33.55  | 34.13   |   
-1.71%   |   1.15%|   1.46%| 1   | 3 |
| TPCH(_300) | TPCH-Q10 | parquet / none / none | 9.46   | 9.64|   
-1.82%   |   0.63%|   0.

[Impala-ASF-CR] IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char length() from length()

2018-01-11 Thread John Russell (Code Review)
John Russell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9014


Change subject: IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char_length() from 
length()
..

IMPALA-2172, IMPALA-6391: [DOCS] Distinguish char_length() from length()

Modify both char_length() and length() usage notes to say when they
return the same or different results.

Include the same example, showing both STRING and CHAR types,
under both functions.

Change-Id: I18cabfce66351bb890bfbfc26b93466204a82625
---
M docs/shared/impala_common.xml
M docs/topics/impala_string_functions.xml
2 files changed, 61 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/9014/1
--
To view, visit http://gerrit.cloudera.org:8080/9014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I18cabfce66351bb890bfbfc26b93466204a82625
Gerrit-Change-Number: 9014
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

2018-01-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8991 )

Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for 
KRPC
..


Patch Set 4:

PS4 moves the initialization to ExecEnv::InitBufferPool() instead. Fixed up a 
couple of BE tests which didn't initialize this TCMalloc setting as part of 
setup.


--
To view, visit http://gerrit.cloudera.org:8080/8991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e
Gerrit-Change-Number: 8991
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 12 Jan 2018 02:32:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

2018-01-11 Thread Michael Ho (Code Review)
Hello Sailesh Mukil, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8991

to look at the new patch set (#4).

Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for 
KRPC
..

IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

KRPC in general tends to put more pressure on the thread
caches due to allocations of more small objects (i.e. <1MB).
While some of them are being addressed in KUDU-1865, it's shown
that the following TCMalloc workarounds will provide reasonable
performance with KRPC:

- TCMALLOC_TRANSFER_NUM_OBJ:
   - maximum number of object per classe type to transfer between
 thread and central caches.
   - the default value of 512 in 2.5.2 seems to cause the spin lock
 in the central cache to be held for too long with KRPC. 2.6.0
 and latter reverts this value to 32 by default.

- TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
  - total amount of memory allocated to all thread caches in bytes
  - the default value is 32MB. We need to bump it to 1GB which is the
internal cap in TCMalloc.

This change upgrades GPerfTools/TCMalloc to 2.6.3 to pick up the
change of the default value of TCMALLOC_TRANSFER_NUM_OBJ.

In addition, when KRPC is enabled and 
FLAGS_TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
has the default value of 0, we will automatically bump the thread cache sizes
to 1GB. Without these workarounds, stress test with KRPC will grind to a halt
due to contention for the spinlock in TCMalloc's central cache. With these
workarounds, the stress test completes within the same ballpark as thrift.

Also did a perf run with Thrift. The regression in TPCH-Q2 is mostly due to 
sensitivity
in runtime filter timing and the avg can be dragged up due to a bad run when 
filters
arrive late. No regression as measured in targeted-perf.

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
++---+-++++
| TPCH(_300) | parquet / none / none | 18.93   | -0.84% | 10.08  | 
+1.45% |
++---+-++++

++--+---++-++++-+---+
| Workload   | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
++--+---++-++++-+---+
| TPCH(_300) | TPCH-Q2  | parquet / none / none | 6.28   | 3.25| R 
+93.41%  | * 49.77% * | * 12.47% * | 1   | 3 |
| TPCH(_300) | TPCH-Q4  | parquet / none / none | 5.00   | 4.77|   
+4.83%   |   0.41%|   0.03%| 1   | 3 |
| TPCH(_300) | TPCH-Q13 | parquet / none / none | 21.29  | 20.69   |   
+2.90%   |   0.55%|   0.37%| 1   | 3 |
| TPCH(_300) | TPCH-Q11 | parquet / none / none | 1.73   | 1.71|   
+0.94%   |   1.69%|   2.85%| 1   | 3 |
| TPCH(_300) | TPCH-Q14 | parquet / none / none | 6.03   | 5.99|   
+0.76%   |   0.00%|   0.95%| 1   | 3 |
| TPCH(_300) | TPCH-Q16 | parquet / none / none | 6.97   | 6.93|   
+0.58%   |   0.74%|   0.73%| 1   | 3 |
| TPCH(_300) | TPCH-Q3  | parquet / none / none | 29.15  | 29.03   |   
+0.40%   |   1.63%|   1.39%| 1   | 3 |
| TPCH(_300) | TPCH-Q1  | parquet / none / none | 14.01  | 13.96   |   
+0.34%   |   1.28%|   0.51%| 1   | 3 |
| TPCH(_300) | TPCH-Q6  | parquet / none / none | 1.27   | 1.27|   
-0.03%   |   3.69%|   0.07%| 1   | 3 |
| TPCH(_300) | TPCH-Q9  | parquet / none / none | 30.99  | 31.13   |   
-0.45%   |   0.54%|   0.19%| 1   | 3 |
| TPCH(_300) | TPCH-Q5  | parquet / none / none | 48.03  | 48.33   |   
-0.63%   |   4.72%|   0.11%| 1   | 3 |
| TPCH(_300) | TPCH-Q7  | parquet / none / none | 46.85  | 47.41   |   
-1.18%   |   1.59%|   0.46%| 1   | 3 |
| TPCH(_300) | TPCH-Q8  | parquet / none / none | 7.92   | 8.03|   
-1.39%   |   3.67%|   5.63%| 1   | 3 |
| TPCH(_300) | TPCH-Q19 | parquet / none / none | 30.98  | 31.51   |   
-1.67%   |   1.33%|   0.82%| 1   | 3 |
| TPCH(_300) | TPCH-Q18 | parquet / none / none | 33.55  | 34.13   |   
-1.71%   |   1.15%|   1.46%| 1   | 3 |
| TPCH(_300) | TPCH-Q10 | parquet / none / none | 9.46   | 9.64|   
-1.82%   |   0.63%|   0.

[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#10).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/collection-value-builder.h
M be/src/runtime/scoped-buffer.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
16 files changed, 656 insertions(+), 213 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/10
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 10
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778
PS8, Line 778: }
> Makes sense. The function comment incorrectly states they are added to 'non
fixed the comment-- yes, that was inconsistent.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809
PS8, Line 809:   if (!state_->query_options().parquet_dictionary_filtering || 
dict_filter_map_.empty()) {
> My point was about whether child readers are included or not. In this code
clarified in the comment for the members.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845
PS8, Line 845:   // where some data pages are dictionary-encoded and others are 
plain
> Agree. Would be good to add a Preonditions check to the FE SlotDescriptor c
added to FE's SlotDescriptor.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222
PS8, Line 222:   typedef std::map>
> Makes sense.
added to TupleDescriptor; its the one that currently has more of an overview.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548
PS8, Line 548:   private boolean isCollectionNotEmpty(TupleDescriptor desc) {
> I'm still in favor of removing. Reasons:
makes sense. inlined it.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1102
PS8, Line 1102: // in the same order as the normal conjuncts. Sort the 
indices so that the
> Sorry this was my being vague. I meant creating a helper function for addin
Done


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116
PS8, Line 1116: for (Integer idx : totalIdxList) {
> The perTupleConjuncts already have a new list, so no member is modified wou
yup, completely missed that.


http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@639
PS9, Line 639: List tupleIds = Lists.newArrayList();
> remove, Preconditions check in L646 must trivially be true after L645
Done


http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@330
PS8, Line 330:parquet statistics predicates: c_custkey > 0, o.o_orderkey > 
0, l.l_partkey > 0
> that pattern makes more sense to me as well, so the oddity is really an exi
added a jira to track it.



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Jan 2018 02:29:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..

IMPALA-6290: limit ScannerContext to 1 buffer at a time

This is a prerequisite for constraining the number of buffers per scan
range. Before this patch, calling ReadBytes(), SkipBytes(), etc could
cause an arbitrary number of I/O buffers to accumulate in
'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for
a range and then called ReadBytes(30MB), we would hit resource
exhaustion as soon as 3 buffers were accumulated in
'completed_io_buffers_'.

The fix is to avoid accumulating any buffers in 'completed_io_buffers_'.
Instead of adding them to 'completed_io_buffers_', completed buffers
are just returned to the I/O manager. It turned out that this did not
weaken the ScannerContext's guarantees about memory lifetime, because
ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each
time it was called regardless. I checked that this behaviour wasn't
a bug by inspecting the scanner code. I could not find any cases
where scanners depended on returned memory remaining valid beyond
the next Read*()/Get*()/Skip*() call on the stream.

This change makes that lifetime explicit in the comments. A
side-effect of this fix is that scanners do not need to call
ReleaseCompletedResources() in CommitRows() and means that the
ScannerContext only ever needs to hold one I/O buffer at a time.

This change also reimplements SkipBytes() to avoid it accumulating
memory in the boundary buffer for large skip sizes.

Also clarifies some of the invariants in ScannerContext. E.g. some
places assumed io_buffer_ != NULL, but that is no longer needed.

Testing:
Ran core tests with ASAN and exhaustive tests with DEBUG.

Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Reviewed-on: http://gerrit.cloudera.org:8080/8814
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/io/disk-io-mgr.cc
10 files changed, 315 insertions(+), 209 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..


Patch Set 14: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 14
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 12 Jan 2018 02:06:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2397: Use atomics for metrics

2018-01-11 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9012


Change subject: IMPALA-2397: Use atomics for metrics
..

IMPALA-2397: Use atomics for metrics

This change removes the spinlock in IntGauge and IntCounter
and uses AtomicInt64 instead. As shown in IMPALA-2397, multiple
threads can be contending for the spinlocks of some global metrics
under concurrent queries.

This change also breaks up SimpleMetric into two subclasses:
- SimpleProperty:
  - a value store for any primitive type (int,float,string etc).
  - atomic read and write via value() and set_value() respectively.

- AtomicMetric:
  - the basis of IntGauge and IntCounter. Support atomic increment
of the metric value via Increment() interface.
  - atomic read and write via value() and set_value() respectively.
  - only support int64_t type.

With this change, DoubleGauge is removed from the code base as the
existing use case doesn't make use of the Increment() interface so
replacing it with DoubleProperty is sufficient.

Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
---
M be/src/exec/external-data-source-executor.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/thrift-server.cc
M be/src/runtime/client-cache.cc
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/scheduler.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/util/impalad-metrics.cc
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M be/src/util/metrics-test.cc
M be/src/util/metrics.h
M be/src/util/thread.cc
M common/thrift/metrics.json
22 files changed, 267 insertions(+), 246 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/9012/1
--
To view, visit http://gerrit.cloudera.org:8080/9012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I48dfa5443cd771916b53541a0ffeaf1bcc7e7606
Gerrit-Change-Number: 9012
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-01-11 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#4).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range datastructures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/client-request-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
9 files changed, 343 insertions(+), 178 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/4
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-01-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc@582
PS3, Line 582: ComputeScanRangeAssignment
> This function now does both assignment and generation of scan ranges. Also,
I moved it to QueryScheduler... that seems to be the place where data about 
scheduling is stored (including the request with scan ranges from the FE). Now, 
the idea is to first generate these ranges (done in client request), then 
schedule. If there is a better place to store such per request info, let me 
know and I can easily move it there.
The only thing the scheduler knows about all of this right now is to stitch 
together computed and given ranges-- that's a bit gorpy (L258) and I plan to 
fix it if this flow/organization looks ok.


http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift@195
PS3, Line 195: struct THdfsFileSplitSpec {
 :   1: required CatalogObjects.THdfsFileDesc file_desc
 :   // -1 means not-splittable
 :   2: required i64 max_block_size
 :   3: required i64 partition_id
 : }
> THdfsFileDesc and THdfsFileSplit have quite a bit of intersection wrt the i
I did this for two reasons:
1) I expect this Spec to be the only thing that comes out of the FE for hdfs 
files (unexpanded) and THdfsFileSplit will only be present on the BE 
(expanded). Currently, I'm doing this just for s3/adls/local, but I'm thinking 
to apply it to all (not this change) hdfs files.
2) I preferred to use the type-checker to tell me which representation I have 
(expanded vs. unexpanded). Otherwise, I'd have to peer into THdfsFileSplit to 
determine this state, which seems more error-prone.


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@131
PS3, Line 131: new int[] {}
> I believe you don't have to do that. You can probably pass null and then av
Done


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@338
PS3, Line 338: its
> typo: it's
Done


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@666
PS3, Line 666: private void generateScanRangeSpecs
> No intention to be picky but can you move this and tansformBlocksToScanRang
Done


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@694
PS3, Line 694:* TScanRangeLocationLists are added scanRanges_. If disk ids 
are expected in blocks
> to
Done



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Jan 2018 01:19:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell

2018-01-11 Thread John Russell (Code Review)
Hello Lars Volker, Michael Brown, David Knupp, Csaba Ringhofer,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8771

to look at the new patch set (#4).

Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell
..

IMPALA-5736: [DOCS] Document --query_option for impala-shell

Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_query_options.xml
M docs/topics/impala_set.xml
M docs/topics/impala_shell_options.xml
4 files changed, 49 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8771/4
--
To view, visit http://gerrit.cloudera.org:8080/8771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Gerrit-Change-Number: 8771
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell

2018-01-11 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8771 )

Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml
File docs/topics/impala_shell_options.xml:

http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml@627
PS3, Line 627: 
--config_file=path_to_config_file, to 
easily select between many
> Somewhere in this section you should mention the new [impala.query_options]
OK. It is mentioned alongside --query_option / -Q on lines 288-293 but I'll add 
a specific example.

Hmm, if I put live_progress=true into the [impala.query_options] section, I get:

Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
10a0212fef8829ffea0e2ec48eb19e894012c981)
LIVE_PROGRESS is not supported for the impalad being connected to, ignoring.

I never saw that before. I wonder if it's a limitation of 
[impala.query_options] that it doesn't work for the 2 special 'shell options'. 
I'll play it safe and just use a single query option, mem_limit, in my example.



--
To view, visit http://gerrit.cloudera.org:8080/8771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Gerrit-Change-Number: 8771
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:45:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-11 Thread Joe McDonnell (Code Review)
Hello Philip Zeyliger, Zach Amsden,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9009

to look at the new patch set (#2).

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..

IMPALA-6386: Invalidate metadata at table level for dataload

Dataload currently executes bin/load-data.py for TPC-H,
TPC-DS, and functional-query concurrently. One of the final
steps for bin/load-data.py is to run a global "invalidate
metadata". Global "invalidate metadata" commands are known
to cause problem on concurrent systems. See IMPALA-5087.
For dataload, if TPC-H executes "invalidate metadata" while
TPC-DS is still creating tables and adding partitions,
the TPC-DS executor might erroneously believe that a table
does not exist.

This changes dataload to invalidate metadata at an
individual table level rather than globally. This
prevents the concurrency issue.

This also changes the names of some of the intermediate
SQL files generated by generate-schema-statements.py
and consumed by load-data.py to make them less confusing.

Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
---
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
2 files changed, 16 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/9009/2
--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-11 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318
PS1, Line 318: create_filename = 'load-%s-impala-generated' % 
load_file_substr
> This isn't your fault, but it seems particularly fragile that we have to ge
I agree. The SQL generation should also generate a dependency graph, which 
should drive execution. Arbitrary filenames are a bad interface.

I'm fixing the filenames so that create statements are in a 
create-%s-impala-generated and load is in a load-%s-impala-generated. This 
makes it a bit less weird.



--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:43:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778
PS8, Line 778: }
> Makes sense. The function comment incorrectly states they are added to 'non
Please don't wait for my change! I intend to wait for this to get merged and 
then rebase.



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 8
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:31:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 9:

(9 comments)

Change looks pretty good to me

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778
PS8, Line 778: }
> no, I intended to omit the collection readers. this partitions all nested s
Makes sense. The function comment incorrectly states they are added to 
'non_filtering'.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809
PS8, Line 809:   if (!state_->query_options().parquet_dictionary_filtering || 
dict_filter_map_.empty()) {
> yup... I assumed that if its a mix of collection + scalars is ok since that
My point was about whether child readers are included or not. In this code path 
we only add the top-level readers. In the other code path we recursively 
collect all nested readers as well.

I think we should either be consistent or add a comment explaining that in some 
cases only the top-level readers are added.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845
PS8, Line 845:   // where some data pages are dictionary-encoded and others are 
plain
> that's correct, removed. I think I carried that conservative check over fro
Agree. Would be good to add a Preonditions check to the FE SlotDescriptor 
c'tors.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222
PS8, Line 222:   typedef std::map>
> I'll comment here on the  vs slotid decision here (covers
Makes sense.

Would be good to clarify this in TupleDescriptor and/or SlotDescriptor as you 
suggested.

SlotIds and TupleIds are unique within a query. A SlotId belongs to exactly one 
TupleId.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548
PS8, Line 548:   private boolean isCollectionNotEmpty(TupleDescriptor desc) {
> its used in a couple of places and I wanted to keep the same checks in both
I'm still in favor of removing. Reasons:

The first Preconditions check only seems useful in the context of this 
function, and even so you'll get the expected "false" return value if we probe 
for null.

The second Preconditions check is not necessary because the type of a tuple is 
always StructType (see type_ member in TupleDescriptor).


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1102
PS8, Line 1102: // in the same order as the normal conjuncts. Sort the 
indices so that the
> Done
Sorry this was my being vague. I meant creating a helper function for adding 
the dictionary filtering part of the explain string. It's a good chunk of code 
that deals with that part.

I think the grouping an be done in that function and doesn't need a separate 
helper.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116
PS8, Line 1116: for (Integer idx : totalIdxList) {
> didn't want to mutate a class member for the purpose of printing diags.
The perTupleConjuncts already have a new list, so no member is modified would 
be modified.


http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@639
PS9, Line 639: List tupleIds = Lists.newArrayList();
remove, Preconditions check in L646 must trivially be true after L645


http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@330
PS8, Line 330:parquet statistics predicates: c_custkey > 0, o.o_orderkey > 
0, l.l_partkey > 0
> hmm... actually I followed the pattern on L324-5 which separates the predic
that pattern makes more sense to me as well, so the oddity is really an 
existing issue with the "parquet statistics predicates"



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd

[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-11 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318
PS1, Line 318: create_filename = 'load-%s-impala-generated' % 
load_file_substr
> The '%s-impala-generated' currently matches files like 'load-functional-que
This isn't your fault, but it seems particularly fragile that we have to 
generate identical filenames in multiple pieces of python code.



--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:24:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-11 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318
PS1, Line 318: create_filename = 'load-%s-impala-generated' % 
load_file_substr
> create_filename = 'load'
The '%s-impala-generated' currently matches files like 
'load-functional-query-exhaustive-impala-generated-avro-def-block.sql'. It is 
matching by asking if create_filename is in the filename. So, this match works, 
but '%s-impala-generated' (filled in => 
'functional-query-exhaustive-impala-generated') is not particularly specific. 
Right now, we don't have any other filenames of this form, but my new 
'invalidate-%s-impala-generated.sql' files match this condition. So, to 
distinguish between the invalidate file and the preexisting files, I need to 
make the condition more exact.

The file names are bad. I'll look into renaming some things.



--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:16:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

2018-01-11 Thread Zoram Thanga (Code Review)
Hello Michael Ho, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#8).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set 
of characters.
..

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 175 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/8
--
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 8
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

2018-01-11 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set 
of characters.
..


Patch Set 7:

(4 comments)

Thanks for the comments. Please see patch set #8

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@441
PS7, Line 441: IS_CONSTANT
> We can possibly get rid of this after IMPALA-6380 is fixed.
Agree. Will revisit this aspect of the code.


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@443
PS7, Line 443: chars_to_trim.is_null
> As discussed offline, we cannot make any assumption about the length of Str
As discussed, this seems like a bad DCHECK anyway - because '' and NULL are 
legit values of chars_to_trim, even when the arguments come from column 
elements. Removing the DCHECK, and instead check for zero-length or NULL 
chars_to_trim, as these do not modify the target string.

Have added code to check for NULL chars_to_trim in TrimPrepare().


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@87
PS7, Line 87:   static void TrimPrepare(FunctionContext*, 
FunctionContext::FunctionStateScope);
> A quick comment on what this Prepare() function do would be useful.
Done


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@149
PS7, Line 149: template 
> Please add a comment on what these template parameters stand for.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 7
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:14:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala

2018-01-11 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8439

to look at the new patch set (#3).

Change subject: IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala
..

IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala

KRPC has some flags that turn on TLS. This patch sets those to enable
TLS communication.

Tests are added to rpc-mgr-test.

Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a
---
M be/src/catalog/catalogd-main.cc
M be/src/rpc/authentication-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/runtime/exec-env.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/util/openssl-util.cc
M be/src/util/openssl-util.h
15 files changed, 326 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8439/3
--
To view, visit http://gerrit.cloudera.org:8080/8439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a
Gerrit-Change-Number: 8439
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala

2018-01-11 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8439

to look at the new patch set (#2).

Change subject: IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala
..

IMPALA-5054: [SECURITY] Enable KRPC w/ TLS in Impala

KRPC has some flags that turn on TLS. This patch sets those to enable
TLS communication.

Tests are added to rpc-mgr-test.

Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a
---
M be/src/catalog/catalogd-main.cc
M be/src/rpc/authentication-test.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/runtime/exec-env.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/util/openssl-util.cc
M be/src/util/openssl-util.h
15 files changed, 327 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8439/2
--
To view, visit http://gerrit.cloudera.org:8080/8439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a14a44fdea9ab668f3714eb69fdb188bce38f5a
Gerrit-Change-Number: 8439
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-11 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9009 )

Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/9009/1/bin/load-data.py@318
PS1, Line 318: create_filename = 'load-%s-impala-generated' % 
load_file_substr
create_filename = 'load'

 followed by

load_filename = '...

Seems strange.  Also, I don't see a corresponding change to this anywhere else, 
seems like there should be one?



--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:03:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell

2018-01-11 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8771 )

Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell
..


Patch Set 3:

(1 comment)

What is here now is fine, but I think one piece of functionality is missing. 
See the inline comment.

http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml
File docs/topics/impala_shell_options.xml:

http://gerrit.cloudera.org:8080/#/c/8771/3/docs/topics/impala_shell_options.xml@627
PS3, Line 627:
Somewhere in this section you should mention the new [impala.query_options] 
setting in impalarc. See for example 
https://gerrit.cloudera.org/#/c/8038/19/shell/option_parser.py L22.



--
To view, visit http://gerrit.cloudera.org:8080/8771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Gerrit-Change-Number: 8771
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:00:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: [DOCS] Document --query option for impala-shell

2018-01-11 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8771 )

Change subject: IMPALA-5736: [DOCS] Document --query_option for impala-shell
..


Patch Set 3:

Tagging in Mike B for possible +2, since I see he reviewed the code gerrit.


--
To view, visit http://gerrit.cloudera.org:8080/8771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5fa4fc27d6566e87fdabe57edc176133d586a84b
Gerrit-Change-Number: 8771
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Thu, 11 Jan 2018 23:40:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(6 comments)

Did a pass over this, focusing on the backend implementation.

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc
File be/src/exec/scalar-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@31
PS1, Line 31: NULL
Prefer nullptr in new code.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@32
PS1, Line 32:   child_row_idx_(0),
We've mostly been switching to initialising member variables to constants 
inline in the class definition.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56
PS1, Line 56:   if (child_row_batch_->num_rows() == 0) {
> Is it possible/ok for this condition to be false? See my comment in scalar-
I think this needs to be a while loop that resets the batch after each 
iteration. It's valid for GetNext() to return a row batch with 0 rows even 
before eos. This is sometimes an implementation artefact and sometimes to flush 
memory resources attached to the batch.

So if GetNext() returns 0 rows and would have returned 1 rows on the next call, 
I think this would incorrectly fail. Or if GetNext() returned 1 row && !eos, 
then 0 rows && eos.

I think the two valid termination conditions for this node are if it hits eos 
or if it sees > 1 row.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78
PS1, Line 78:   child_row_batch_->TransferResourceOwnership(row_batch);
> Just an idea, not sure if a good one: it may be better to deep copy the row
I think it's the start of a good idea. I originally suggested modelling this 
after SelectNode(), which is a streaming node that passes through its input 
rows.

However, if we deep copy the output row, we could implement this as a blocking 
node that consumes its input in Open(), copies the single surviving row, then 
closes the input tree.

Blocking nodes can reduce resource requirements for a plan because the planner 
knows that operators above and below the blocking node in the same fragment 
don't execute concurrently. There's a virtual function in the frontend ExecNode 
implementation that determines if it's blocking or not.


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@735
PS1, Line 735:* @throws ImpalaException
We don't generally have @throws declarations in Java comments


http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839
PS1, Line 839:  QUERY
Maybe add a test that sets num_nodes=1 to make sure that the single-node 
versino of the plans are executable.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 23:32:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1709/


--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Jan 2018 23:29:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6386: Invalidate metadata at table level for dataload

2018-01-11 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9009


Change subject: IMPALA-6386: Invalidate metadata at table level for dataload
..

IMPALA-6386: Invalidate metadata at table level for dataload

Dataload currently executes bin/load-data.py for TPC-H,
TPC-DS, and functional-query concurrently. One of the final
steps for bin/load-data.py is to run a global "invalidate
metadata". Global "invalidate metadata" commands are known
to cause problem on concurrent systems. See IMPALA-5087.
For dataload, if TPC-H executes "invalidate metadata" while
TPC-DS is still creating tables and adding partitions,
the TPC-DS executor might erroneously believe that a table
does not exist.

This changes dataload to invalidate metadata at an
individual table level rather than globally. This
prevents the concurrency issue.

Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
---
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
2 files changed, 13 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/9009/1
--
To view, visit http://gerrit.cloudera.org:8080/9009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Gerrit-Change-Number: 9009
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8971 )

Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool
..


Patch Set 2:

(21 comments)

I think this is looking good overall.

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator-backend-state.cc@410
PS2, Line 410:   return res.__isset.status ? Status(res.status) : Status::OK();
Why not always set TPublishFilterParams.status. This seems to be adding another 
way to represent Status::OK()


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1179
PS2, Line 1179:   // Cancel query if the result of PublishFilter rpc 
returns a non-ok status.
Nit: slightly redundant wording with result/returns. E.g. "if the PublishFilter 
RPC returns a non-ok status"


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1182
PS2, Line 1182: "This Error was encountered in Fragment $0 at 
$1:$2", fragment_idx,
nit: don't need caps on Error/Fragment.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/coordinator.cc@1184
PS2, Line 1184: Cancel(&status);
If we're going to cancel the query there's no point sending out more filters so 
we could just break out of the loop. Alternatively, we could just log the error 
to LOG(ERROR) and continue, since successful delivery of a filter isn't 
required for correctness.

I can see pros and cons for failing the query but don't jave a strong opinion 
about which behaviour is better. It may be simpler to just log the error to 
reduce the amount of logic on the error path.

Or we could go even further and just log the error on the backend where it 
happens.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@111
PS2, Line 111: input
output?


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@118
PS2, Line 118: *&
Let's just use a double pointer for the output to be consistent with the rest 
of the code.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.h@165
PS2, Line 165: total_filter_mem_required_
I think total_bloom_filter_mem_required_ would be more consistent with the rest 
of the variables.


http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/8971/1/be/src/runtime/runtime-filter-bank.cc@189
PS1, Line 189:   if (buffer_pool_client_.GetUnusedReservation() < 
required_space) {
> yes its shouldn't fail, I just put it there as an additional check to make
I don't see this change in PS2


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a78
PS2, Line 78:
Hmm so we were unnecessarily re-allocating filters if they were already 
registered, right?


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/runtime/runtime-filter-bank.cc@a80
PS2, Line 80:
Was the lock removed accidentally?


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@71
PS2, Line 71:   void Close();
Maybe mention that Close() is safe to call multiple times or if Init() wasn't 
called or Init() failed.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.h@111
PS2, Line 111:   /// buffer pool allocation failed and the filter should be 
discarded.
Is the change in this function's behaviour needed now?


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@30
PS2, Line 30:   : always_false_(true),
Maybe initialize the constant ones at their definition while we're here. I find 
that a bit more readable.


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@63
PS2, Line 63: directory_
directory_ != nullptr, to be consistent with the rest of the code


http://gerrit.cloudera.org:8080/#/c/8971/2/be/src/util/bloom-filter.cc@72
PS2, Line 72:   if (directory_) {
directory_ != nullptr, to be consistent with the rest of the code


http://gerrit.cloudera.org:8080/#/c/8971/2/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/8971/2/common/thrift/ImpalaInternalService.thrift@833
PS2, Line 833:   1: opt

[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9000 )

Change subject: IMPALA-6384: RequestPoolService should honor custom group 
mapping config
..

IMPALA-6384: RequestPoolService should honor custom group mapping config

Due to the way in which we instantiate fair scheduler allocation
loader, we donot read the config overrides from the HDFS config
files.

This is an unexpected behavior from users' POV since we typically
support overrides like custom user -> group mapping via HDFS
config (for ex: LDAPGroupsMapping) that eventually affects the
query -> pool assignment.

Fix: This patch loads the hadoop default configuration so that the
underlying QueuePlacementPolicy is based on user specified overrides.

Testing (manual): Changed the core-site.xml to use LDAPGroupsMapping
instead of the default ShellBasedUnixGroupsMapping and confirmed that
the correct group mapping plugin is loaded, by adding additional logging.

Also, modified TestRequestPoolService to assert that the core-site xml
overrides are loaded.

Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Reviewed-on: http://gerrit.cloudera.org:8080/9000
Reviewed-by: Bharath Vissapragada 
Tested-by: Impala Public Jenkins
---
M 
common/yarn-extras/src/main/java/org/apache/impala/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
M fe/src/main/java/org/apache/impala/util/RequestPoolService.java
M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
4 files changed, 32 insertions(+), 2 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Gerrit-Change-Number: 9000
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9000 )

Change subject: IMPALA-6384: RequestPoolService should honor custom group 
mapping config
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Gerrit-Change-Number: 9000
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:52:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.

2018-01-11 Thread Zoram Thanga (Code Review)
Hello Michael Ho, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8930

to look at the new patch set (#4).

Change subject: IMPALA-6307: CTAS statement fails with duplicate column 
exception.
..

IMPALA-6307: CTAS statement fails with duplicate column exception.

A CTAS statement with a 'partition by' clause causes the statement
to fail with a duplicate column name exception. This is happening
because on expression rewrite, the partition defs state is not reset.

IMPALA-5796 added TableDef::reset(). This patch expands the method by
adding calls to reset ColumnDefs and PartitionColumnDefs.

Testing:
  * Regression test added to AnalyzeDDLTest.
  * Exhaustive Jenkins build and test.

Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
5 files changed, 27 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8930/4
--
To view, visit http://gerrit.cloudera.org:8080/8930
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
Gerrit-Change-Number: 8930
Gerrit-PatchSet: 4
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.

2018-01-11 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8930 )

Change subject: IMPALA-6307: CTAS statement fails with duplicate column 
exception.
..


Patch Set 3:

(1 comment)

Thanks for the comment. Have added another test case, this one for Kudu.

http://gerrit.cloudera.org:8080/#/c/8930/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java:

http://gerrit.cloudera.org:8080/#/c/8930/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java@58
PS3, Line 58: kuduPartitionParams_.clear();
> This is new. Do we have any existing tests with a CTAS on Kudu where the se
Added a new CTAS test for Kudu, which is the same repro test case tailored for 
Kudu.



--
To view, visit http://gerrit.cloudera.org:8080/8930
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
Gerrit-Change-Number: 8930
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:44:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1710/


--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 14
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:24:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..


Patch Set 14: Code-Review+2

Sorry for the churn here - it looks like I changed the expected output for 
those tests based on a broken local data load


--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 14
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:24:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Lars Volker, Zoram Thanga, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8814

to look at the new patch set (#14).

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..

IMPALA-6290: limit ScannerContext to 1 buffer at a time

This is a prerequisite for constraining the number of buffers per scan
range. Before this patch, calling ReadBytes(), SkipBytes(), etc could
cause an arbitrary number of I/O buffers to accumulate in
'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for
a range and then called ReadBytes(30MB), we would hit resource
exhaustion as soon as 3 buffers were accumulated in
'completed_io_buffers_'.

The fix is to avoid accumulating any buffers in 'completed_io_buffers_'.
Instead of adding them to 'completed_io_buffers_', completed buffers
are just returned to the I/O manager. It turned out that this did not
weaken the ScannerContext's guarantees about memory lifetime, because
ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each
time it was called regardless. I checked that this behaviour wasn't
a bug by inspecting the scanner code. I could not find any cases
where scanners depended on returned memory remaining valid beyond
the next Read*()/Get*()/Skip*() call on the stream.

This change makes that lifetime explicit in the comments. A
side-effect of this fix is that scanners do not need to call
ReleaseCompletedResources() in CommitRows() and means that the
ScannerContext only ever needs to hold one I/O buffer at a time.

This change also reimplements SkipBytes() to avoid it accumulating
memory in the boundary buffer for large skip sizes.

Also clarifies some of the invariants in ScannerContext. E.g. some
places assumed io_buffer_ != NULL, but that is no longer needed.

Testing:
Ran core tests with ASAN and exhaustive tests with DEBUG.

Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/io/disk-io-mgr.cc
10 files changed, 315 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/14
--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 14
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

2018-01-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has removed a vote on this change.

Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for 
KRPC
..


Removed Code-Review+2 by Sailesh Mukil 
--
To view, visit http://gerrit.cloudera.org:8080/8991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e
Gerrit-Change-Number: 8991
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

2018-01-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8991 )

Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for 
KRPC
..


Patch Set 3:

> (1 comment)

To avoid extra complication then, wouldn't it be easier to move the 
initialization somewhere else?


--
To view, visit http://gerrit.cloudera.org:8080/8991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e
Gerrit-Change-Number: 8991
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:21:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1707/


--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:16:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

2018-01-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8991 )

Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for 
KRPC
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8991/3/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/8991/3/be/src/common/init.cc@262
PS3, Line 262:   MallocExtension::instance()->SetNumericProperty(
 :   "tcmalloc.aggressive_memory_decommit", 1);
This actually causes a linking issue in libfesupport.so which isn't linked 
against the tcmalloc library. FE tests will fail due to unresolved symbols.

Two possible ways of fixing it:
- Move this initialization somewhere
- Fix CMakeList.txt of libCommon.a to add a dependency on tcmallocstatic but 
this will make IMPALA_LINK_LIBS_DYNAMIC_TARGETS moot as it explicitly avoids 
linking against tcmalloc library due to the use of JVM but I cannot quite 
understand the reasoning of the comments as we statically link against tcmalloc 
library:


# The above link list does not include tcmalloc. This is because the Impala JVM 
support
# libraries (libfesupport, libloggingsupport) cannot use tcmalloc in all cases. 
When they
# are started up by the FE (for tests) the jvm has already made allocations 
before
# tcmalloc can be loaded. In all other binaries, we can use tcmalloc except the 
address
# sanitizer build. Address sanitizer is incompatible with tcmalloc (they both 
intercept
# malloc/free)



--
To view, visit http://gerrit.cloudera.org:8080/8991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e
Gerrit-Change-Number: 8991
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:12:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5519: Allocate Runtime filter memory from Buffer pool

2018-01-11 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8971


Change subject: IMPALA-5519: Allocate Runtime filter memory from Buffer pool
..

IMPALA-5519: Allocate Runtime filter memory from Buffer pool

This patch adds changes to the planner to account for memory used by
bloom filters at the fragment instance level. Also adds changes to
allocate memory for the bloom filters from the buffer pool.

Testing:
- Modified Planner Tests and end to end tests to account for memory
  reservation for the runtime filters.
- Modified backend tests and benchmarks to use the bufferpool for
  bloom filter allocation.
- Add an end to end test.
- Ran rest of the core tests.

Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
---
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/impala-internal-service.cc
M be/src/util/backend-gflag-util.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/BackendGflags.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/admission-reject-min-reservation.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
43 files changed, 868 insertions(+), 559 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/8971/2
--
To view, visit http://gerrit.cloudera.org:8080/8971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iea2759665fb2e8bef9433014a8d42a7ebf99ce1f
Gerrit-Change-Number: 8971
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8986 )

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 11 Jan 2018 21:49:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8986 )

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
..

IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Reviewed-on: http://gerrit.cloudera.org:8080/8986
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins
---
M docs/shared/impala_common.xml
M docs/topics/impala_disable_row_runtime_filtering.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_max_num_runtime_filters.xml
M docs/topics/impala_runtime_bloom_filter_size.xml
M docs/topics/impala_runtime_filter_max_size.xml
M docs/topics/impala_runtime_filter_min_size.xml
M docs/topics/impala_runtime_filtering.xml
8 files changed, 71 insertions(+), 6 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8986 )

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/185/


--
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 11 Jan 2018 21:39:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3193: Show table's comment on show tables

2018-01-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8851 )

Change subject: IMPALA-3193: Show table's comment on show tables
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8851/5/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/8851/5/common/thrift/Frontend.thrift@70
PS5, Line 70: Arguments to getTableNamesAndComments
I think we can clean up the API a bit. Instead of having getTableNames and 
getTableNamesAndComments, how about we create a single getTablesInfo call that 
takes a TGetTablesInfoParams (renamed TGetTablesParams) argument and returns a 
TGetTablesInfoResult (renamed TGetTablesResult). The latter returns a 
list, where TTableInfo has a required name and optional comment 
field. That way, we avoid the checks around the length of list of names and 
list of comments and it's easier to expand it if in the future we decide to add 
additional information such as table creation time, owner, etc.



--
To view, visit http://gerrit.cloudera.org:8080/8851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44f814af05db6f3c027718ade9f474f8b8153bcd
Gerrit-Change-Number: 8851
Gerrit-PatchSet: 5
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Thu, 11 Jan 2018 21:38:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3942: Fix wronly escaped string literal in front-end

2018-01-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8818 )

Change subject: IMPALA-3942: Fix wronly escaped string literal in front-end
..


Patch Set 6:

(6 comments)

Code looks good to me, thanks for the contribution!

I have just one more round of comments, all about keeping your comments clear 
and concise. After that, we should be able to pull a +2-er in

http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG@7
PS6, Line 7: wronly
wrongly


http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG@15
PS6, Line 15:  and that's why we just choose one
: to always normalize to
remove this, covered by the paragraph below.


http://gerrit.cloudera.org:8080/#/c/8818/6//COMMIT_MSG@19
PS6, Line 19: .
... to single quotes


http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@41
PS6, Line 41:  It is not always possible to determine if a string "should" be
:   // single or double quoted(e.g. concat('a', "b")) and that's 
why we just choose one
:   // to always normalize to.
This second sentence is not necessary, this is explained better in the larger 
comment below anyways.


http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@99
PS6, Line 99: .
:   // Here is a background why we should deploy single quotes as a 
default
Its nice to keep comments short, so this can just be reduced to 'because:'


http://gerrit.cloudera.org:8080/#/c/8818/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@103
PS6, Line 103: It is easy to know if a given string is wrapped by single/double 
quotes, but
Similarly, in the interest of being concise, this can be removed.



--
To view, visit http://gerrit.cloudera.org:8080/8818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc4b5f5d8ffaa8feb96a466959427a04b3b06fec
Gerrit-Change-Number: 8818
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 21:14:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

2018-01-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8986 )

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 11 Jan 2018 20:59:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command

2018-01-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8928 )

Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" 
command
..


Patch Set 8:

(3 comments)

Almost there

http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java:

http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@72
PS8, Line 72:   throw new AnalysisException("ALTER TABLE SET ROW FORMAT 
is only supported " +
The new analyzer tests don't cover this scenario, please add a test. You can 
use functional.alltypesmixedformat.


http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@73
PS8, Line 73:   "on TEXT or SEQUENCE file formats: " + 
partition.getPartitionName());
Let's add a sentence "Conflicting partition: " + name and include the format of 
the partition

otherwise it's not obvious what the partition name means


http://gerrit.cloudera.org:8080/#/c/8928/8/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@82
PS8, Line 82: "on TEXT or SEQUENCE file formats: " + 
tbl.getFullName());
let's include the format of the table



--
To view, visit http://gerrit.cloudera.org:8080/8928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
Gerrit-Change-Number: 8928
Gerrit-PatchSet: 8
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Thu, 11 Jan 2018 20:53:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1767: [DOCS] Document new Boolean operators

2018-01-11 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8942 )

Change subject: IMPALA-1767: [DOCS] Document new Boolean operators
..


Patch Set 5:

Adding Alex, who reviewed the code gerrit. Looking for a +2.


--
To view, visit http://gerrit.cloudera.org:8080/8942
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb
Gerrit-Change-Number: 8942
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 11 Jan 2018 20:25:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

2018-01-11 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8986 )

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml
File docs/topics/impala_runtime_filtering.xml:

http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@181
PS2, Line 181: ture representing a minimum and ma
> This is the only part I see that doesn't make sense for min-max filters, as
Done


http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@203
PS2, Line 203: gher, the default for runtime filtering is the 
GLOBAL setting.
 :   
> I find this sentence confusing, as Kudu isn't identifying the matching rows
Done



--
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 11 Jan 2018 20:16:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

2018-01-11 Thread John Russell (Code Review)
Hello Thomas Tauber-Marshall, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8986

to look at the new patch set (#3).

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
..

IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
---
M docs/shared/impala_common.xml
M docs/topics/impala_disable_row_runtime_filtering.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_max_num_runtime_filters.xml
M docs/topics/impala_runtime_bloom_filter_size.xml
M docs/topics/impala_runtime_filter_max_size.xml
M docs/topics/impala_runtime_filter_min_size.xml
M docs/topics/impala_runtime_filtering.xml
8 files changed, 71 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/8986/3
--
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] Fix typo in test observability.

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9003 )

Change subject: Fix typo in test_observability.
..

Fix typo in test_observability.

Should fix "NameError: global name 'dgb_str' is not defined".

Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8
Reviewed-on: http://gerrit.cloudera.org:8080/9003
Reviewed-by: Alex Behm 
Reviewed-by: Zoram Thanga 
Tested-by: Impala Public Jenkins
---
M tests/query_test/test_observability.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Zoram Thanga: Looks good to me, but someone else must approve
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8
Gerrit-Change-Number: 9003
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] Fix typo in test observability.

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9003 )

Change subject: Fix typo in test_observability.
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8
Gerrit-Change-Number: 9003
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 20:10:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#9).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of  to
conjuncts that are dictionary filterable. The backend is
adjusted to use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/collection-value-builder.h
M be/src/runtime/scoped-buffer.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
14 files changed, 650 insertions(+), 208 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/9
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2018-01-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 8:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.h@420
PS8, Line 420:   /// dictionary filtering. The definition/repetition levels are 
are populated by the
> are are
Done


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778
PS8, Line 778: }
> FWIW, I discovered today I needed to make some similar adjustments to the c
will take a look at your change more closely.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778
PS8, Line 778: }
> Did you intend to add the collection_reader to non_filtering?
no, I intended to omit the collection readers. this partitions all nested 
scalar readers.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809
PS8, Line 809:   if (!state_->query_options().parquet_dictionary_filtering || 
dict_filter_map_.empty()) {
> Looks like non_dict_filterable_readers_ is inconsistently populated. With d
yup... I assumed that if its a mix of collection + scalars is ok since that's 
what works today (perhaps modified by tim's change). And I don't see a problem 
with flattening down to just the scalar readers.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845
PS8, Line 845:   if (tuple_desc == nullptr) tuple_desc = 
scan_node_->tuple_desc();
> Why this check? I believe all SlotDescriptors require a parent. We even DHC
that's correct, removed. I think I carried that conservative check over from 
the frontend where the parent is not required to be non-null in the ctor. 
should it be required?


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222
PS8, Line 222:   typedef std::map, 
std::vector>
> Why is the key a pair and not just SlotId or a SlotDescriptor*?
I'll comment here on the  vs slotid decision here (covers 
several related comments in the thrift file and hdfsscannode). I suspected that 
slotid is unique across tuples so with your other comments and various other 
implications baked into various apis, I've gone ahead and made this change. 
However, with a two-level naming scheme (tuple/slot), I was a bit unclear on 
this and did not find it stated explicitly. I would've expected this in 
frontend's TupleDescriptor-- should I add a clarifying blurb there?


http://gerrit.cloudera.org:8080/#/c/8775/8/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8775/8/common/thrift/PlanNodes.thrift@245
PS8, Line 245:   9: optional list dictionary_filter_conjuncts
> Not sure I understand why this needed to change. SlotIds are unique, so wha
Done. See comment in hdfs-scanner for why I went down this route.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@115
PS8, Line 115:  * each value present in the row group's dictionary
> ... prune a row group by evaluating the conjuncts against the respective co
Done


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@200
PS8, Line 200:   private Map, List> 
dictionaryFilterConjuncts_ =
> Unfortunate that we have to use a Pair as key since conceptually the pair h
Done. See comment in hdfs-scanner for why I went down this route.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@543
PS8, Line 543:* These filters are added by analysis (see: 
SelectStmt#registerIsNotEmptyPredicates).
> Move this comment to notEmptyCollections_
Done


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548
PS8, Line 548:   private boolean isCollectionNotEmpty(TupleDescriptor desc) {
> Why do we need this function? Can't we just check notEmptyCollections_.cont
its used in a couple of places and I wanted to keep the same checks in both 
(and not repeat the checks).


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@638
PS8, Line 638:   private void addDictionaryFilter(Analyzer analyzer, Expr 
conjunct, int conjunct_idx) {
> conjunctIdx
Done


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apac

[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3

2018-01-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8890 )

Change subject: IMPALA-3526: update FE tests to pass on S3
..


Patch Set 5:

alex, pls have a look and let me know if you have additional comments.


--
To view, visit http://gerrit.cloudera.org:8080/8890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141
Gerrit-Change-Number: 8890
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 11 Jan 2018 20:01:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3

2018-01-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8890 )

Change subject: IMPALA-3526: update FE tests to pass on S3
..


Patch Set 5: Code-Review+2

I am fine with the change to unblock the tests. Feel free to check with Alex in 
case he has any comments/concerns.


--
To view, visit http://gerrit.cloudera.org:8080/8890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141
Gerrit-Change-Number: 8890
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:54:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1709/


--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:52:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:51:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:51:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6363: avoid cscope build races

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9007 )

Change subject: IMPALA-6363: avoid cscope build races
..


Patch Set 1:

Yeah, I figure we could go ahead with this and see if the problem stops 
reproducing.


--
To view, visit http://gerrit.cloudera.org:8080/9007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195
Gerrit-Change-Number: 9007
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:46:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6363: avoid cscope build races

2018-01-11 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9007 )

Change subject: IMPALA-6363: avoid cscope build races
..


Patch Set 1: Code-Review+1

It seems like this couldn't hurt even if IMPALA-6363 does not happen much. I 
had the luxury of checking a CentOS 5 host, and the -ignore_readdir_race flag 
does exist there.


--
To view, visit http://gerrit.cloudera.org:8080/9007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195
Gerrit-Change-Number: 9007
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:41:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6363: avoid cscope build races

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9007


Change subject: IMPALA-6363: avoid cscope build races
..

IMPALA-6363: avoid cscope build races

Use the -ignore_readdir_race flag for find so that find doesn't fail if
a directory disappears under it. From what I could tell the flag has
been in GNU find for a long time and is also available in other OS
flavours like BSD and OS X.

Make the step depend on gen-deps so that it can index thrift, protobuf,
etc, output.

Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195
---
M CMakeLists.txt
M bin/gen-cscope.sh
2 files changed, 6 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/9007/1
--
To view, visit http://gerrit.cloudera.org:8080/9007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I22bdb7c64036cb88a8a10907af35c5e3a55a9195
Gerrit-Change-Number: 9007
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

2018-01-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9006 )

Change subject: IMPALA-6268: 
KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt
File CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt@310
PS2, Line 310: # Tests that run any security related tests need to link this in 
to override the
> This is pretty opaque. Could you provide a comment here with what's going o
It's pretty well explained in be/src/kudu/security/krb5_realm_override.cc, so I 
added a comment to refer that file.



--
To view, visit http://gerrit.cloudera.org:8080/9006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
Gerrit-Change-Number: 9006
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:29:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

2018-01-11 Thread Sailesh Mukil (Code Review)
Hello Philip Zeyliger,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9006

to look at the new patch set (#3).

Change subject: IMPALA-6268: 
KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
..

IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

On systems that have Kerberos 1.11 or earlier, service principals with
IP addresses are not supported due to a bug:

http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603

Since our BE tests use such principals, they fail on older platforms with the
above mentioned kerberos versions.

Kudu fixed this by adding a workaround which overrides krb5_realm_override.

https://github.com/cloudera/kudu/commit/ba2ae3de4a7c43ff2f5873e822410e066ea99667

However, when we moved Kudu's security library into Impala, we did not
add the appropriate build flags that allow it to be used. This patch fixes
that.

Testing: Verified that the failing test runs successfully on CentOs 6.4
with Kerberos 1.10.3

Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
---
M CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/rpc/rpc-mgr-test.cc
3 files changed, 9 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/9006/3
--
To view, visit http://gerrit.cloudera.org:8080/9006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
Gerrit-Change-Number: 9006
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

2018-01-11 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9006 )

Change subject: IMPALA-6268: 
KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt
File CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9006/2/CMakeLists.txt@310
PS2, Line 310: set(KRB5_REALM_OVERRIDE 
-Wl,--undefined=krb5_realm_override_loaded krb5_realm_override)
This is pretty opaque. Could you provide a comment here with what's going on?



--
To view, visit http://gerrit.cloudera.org:8080/9006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
Gerrit-Change-Number: 9006
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6383: free memory after skipping parquet row groups

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9002 )

Change subject: IMPALA-6383: free memory after skipping parquet row groups
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9002/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9002/1//COMMIT_MSG@13
PS1, Line 13: Testing:
My feeling is that this testing is enough for now. With the reservation changes 
forthcoming, the multiple-row-group tests failed hard when it tried to allocate 
new buffers for the next row group when the buffers for the previous row group 
were released.

We could add a mem_limit test tuned to reproduce it but those have a tendency 
to be flaky and it doesn't seem worth the effort when more robust test coverage 
will be forthcoming.


http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.h@647
PS1, Line 647:   /// were returned. and when returning the last batch.
> nit: remove "."?
Done


http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.cc@623
PS1, Line 623: // We skipped the previous row group. Release resources 
before starting another.
> This gets called for cases where no resources have been reserved (e.g. L655
I could also reorganise the code so that ReleaseSkippedRowGroupResources() gets 
called before "continue" so that it's more explicit. I was originally thinking 
that would be error-prone and easy to forget, but with the NumStreams() DCHECK 
maybe that won't be the case.


http://gerrit.cloudera.org:8080/#/c/9002/1/be/src/exec/hdfs-parquet-scanner.cc@738
PS1, Line 738: col_reader->Close(row_batch);
> nit: while you're here you could reduce this to a single line.
Done



--
To view, visit http://gerrit.cloudera.org:8080/9002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b
Gerrit-Change-Number: 9002
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9000 )

Change subject: IMPALA-6384: RequestPoolService should honor custom group 
mapping config
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1708/


--
To view, visit http://gerrit.cloudera.org:8080/9000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Gerrit-Change-Number: 9000
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6383: free memory after skipping parquet row groups

2018-01-11 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9002

to look at the new patch set (#2).

Change subject: IMPALA-6383: free memory after skipping parquet row groups
..

IMPALA-6383: free memory after skipping parquet row groups

Before this patch, resources were only flushed after breaking out of
NextRowGroup(). This is a problem because resources can be allocated
for skipped row groups (e.g. for reading dictionaries).

Testing:
Tested in conjunction with a prototype buffer pool patch that was
DCHECKing before the change.

Added DCHECKs to the current version to ensure the streams are cleared
up as expected.

Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/scanner-context.h
3 files changed, 26 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/9002/2
--
To view, visit http://gerrit.cloudera.org:8080/9002
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b
Gerrit-Change-Number: 9002
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9000 )

Change subject: IMPALA-6384: RequestPoolService should honor custom group 
mapping config
..


Patch Set 3: Code-Review+2

(1 comment)

Thanks Zoram and Tim. Carrying +2.

http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@116
PS2, Line 116: overridde
> nit: overriden
Done



--
To view, visit http://gerrit.cloudera.org:8080/9000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Gerrit-Change-Number: 9000
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:16:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 Thread Bharath Vissapragada (Code Review)
Hello Dimitris Tsirogiannis, Zoram Thanga, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9000

to look at the new patch set (#3).

Change subject: IMPALA-6384: RequestPoolService should honor custom group 
mapping config
..

IMPALA-6384: RequestPoolService should honor custom group mapping config

Due to the way in which we instantiate fair scheduler allocation
loader, we donot read the config overrides from the HDFS config
files.

This is an unexpected behavior from users' POV since we typically
support overrides like custom user -> group mapping via HDFS
config (for ex: LDAPGroupsMapping) that eventually affects the
query -> pool assignment.

Fix: This patch loads the hadoop default configuration so that the
underlying QueuePlacementPolicy is based on user specified overrides.

Testing (manual): Changed the core-site.xml to use LDAPGroupsMapping
instead of the default ShellBasedUnixGroupsMapping and confirmed that
the correct group mapping plugin is loaded, by adding additional logging.

Also, modified TestRequestPoolService to assert that the core-site xml
overrides are loaded.

Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
---
M 
common/yarn-extras/src/main/java/org/apache/impala/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
M fe/src/main/java/org/apache/impala/util/RequestPoolService.java
M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
4 files changed, 32 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/9000/3
--
To view, visit http://gerrit.cloudera.org:8080/9000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Gerrit-Change-Number: 9000
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables

2018-01-11 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8986 )

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8986/1/docs/topics/impala_runtime_filtering.xml
File docs/topics/impala_runtime_filtering.xml:

http://gerrit.cloudera.org:8080/#/c/8986/1/docs/topics/impala_runtime_filtering.xml@173
PS1, Line 173:
 : For HD
> Done. Because this paragraph is followed by info that's only relevant for B
Actually, the partitioned/broadcast and local/global discussion applies to 
min-max filters as well.

I should also add that the long term plan is to have all filter types supported 
by all scan types, so no need to separate out min-max as being a really 
specifically Kudu thing (though of course it only applies to Kudu at the 
moment).


http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml
File docs/topics/impala_runtime_filtering.xml:

http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@181
PS2, Line 181: a complete list of relevant values
This is the only part I see that doesn't make sense for min-max filters, as 
they're not a 'list of values', but then a bloom filter isn't a 'list of 
values' either.

Maybe rephrase it something like "A broadcast filter reflects the complete set 
of relevant values and can be immediately evaluated..." and "A partitioned 
filter reflects only the values processed by one host..." or perhaps "contains" 
instead of reflects


http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@203
PS2, Line 203: These filters are used by Kudu to scan a range of values
 : for join columns when identifying matching rows within a 
join query.
I find this sentence confusing, as Kudu isn't identifying the matching rows 
(Kudu doesn't even know we're doing a join, its just scanning values for us)

Maybe say something like "These filters are passed to Kudu to reduce the number 
of rows returnrf to Impala when scanning the probe side of the join"



--
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:12:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for KRPC

2018-01-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8991 )

Change subject: IMPALA-5528: Upgrade GPerfTools to 2.6.3 and tune TCMalloc for 
KRPC
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be574435af51fb7a875b16888cca260b341190e
Gerrit-Change-Number: 8991
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:12:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9000 )

Change subject: IMPALA-6384: RequestPoolService should honor custom group 
mapping config
..


Patch Set 2: Code-Review+2

(1 comment)

The test seems good since it's at least testing the root cause of the problem

http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/9000/2/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl@116
PS2, Line 116: overriden
nit: overriden



--
To view, visit http://gerrit.cloudera.org:8080/9000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Gerrit-Change-Number: 9000
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:06:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

2018-01-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/9006 )

Change subject: IMPALA-6268: 
KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
..

IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

On systems that have Kerberos 1.11 or earlier, service principals with
IP addresses are not supported due to a bug:

http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603

Since our BE tests use such principals, they fail on older platforms with the
above mentioned kerberos versions.

Kudu fixed this by adding a workaround which overrides krb5_realm_override.

https://github.com/cloudera/kudu/commit/ba2ae3de4a7c43ff2f5873e822410e066ea99667

However, when we moved Kudu's security library into Impala, we did not
add the appropriate build flags that allow it to be used. This patch fixes
that.

Testing: Verified that the failing test runs successfully on CentOs 6.4
with Kerberos 1.10.3

Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
---
M CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/rpc/rpc-mgr-test.cc
3 files changed, 6 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/9006/2
--
To view, visit http://gerrit.cloudera.org:8080/9006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
Gerrit-Change-Number: 9006
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6384: RequestPoolService should honor custom group mapping config

2018-01-11 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9000 )

Change subject: IMPALA-6384: RequestPoolService should honor custom group 
mapping config
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb93870c0cc37e2432a643a274931f1d3d13fb96
Gerrit-Change-Number: 9000
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 18:59:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

2018-01-11 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8950

to look at the new patch set (#3).

Change subject: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr
..

IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

In KrpcDataStreamMgr::CreateRecvr() we take the lock_ and
then call recvr->TakeOverEarlySender() for all contexts.
recvr->TakeOverEarlySender() then calls
recvr_->mgr_->EnqueueDeserializeTask((), which can block if the
deserialize pool queue is full. The next thread to become available
in that queue will also have to acquire lock_, thus leading to a
deadlock.

We fix this by moving the EarlySendersList out of the
EarlySendersMap and dropping the lock before taking any actions on
the RPC contexts in the EarlySendersList. All functions called after
dropping 'lock_' do not require the lock to protect them as they are
thread safe.

Additionally modified the BE test data-stream-test to work with KRPC
as well.

Testing: Added a new test to data-stream-test to verify that the
deadlock does not happen. Also, I verified that this test hangs
without the fix.

Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
---
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
2 files changed, 262 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8950/3
--
To view, visit http://gerrit.cloudera.org:8080/8950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
Gerrit-Change-Number: 8950
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

2018-01-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9006


Change subject: IMPALA-6268: 
KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
..

IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

On systems that have Kerberos 1.11 or earlier, service principals with
IP addresses are not supported due to a bug:

http://krbdev.mit.edu/rt/Ticket/Display.html?id=7603

Since our BE tests use such principals, they fail on older platforms with the
above mentioned kerberos versions.

Kudu fixed this by adding a workaround which overrides krb5_realm_override.

https://github.com/cloudera/kudu/commit/ba2ae3de4a7c43ff2f5873e822410e066ea99667

However, when we moved Kudu's security library into Impala, we did not
add the appropriate build flags that allow it to be used. This patch fixes
that.

Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
---
M CMakeLists.txt
M be/src/rpc/CMakeLists.txt
M be/src/rpc/rpc-mgr-test.cc
3 files changed, 6 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/9006/1
--
To view, visit http://gerrit.cloudera.org:8080/9006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I60e291e8aa1b59b645b856d33c658471f314c221
Gerrit-Change-Number: 9006
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

2018-01-11 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8950

to look at the new patch set (#2).

Change subject: IMPALA-6346: Potential deadlock in KrpcDataStreamMgr
..

IMPALA-6346: Potential deadlock in KrpcDataStreamMgr

In KrpcDataStreamMgr::CreateRecvr() we take the lock_ and
then call recvr->TakeOverEarlySender() for all contexts.
recvr->TakeOverEarlySender() then calls
recvr_->mgr_->EnqueueDeserializeTask((), which can block if the
deserialize pool queue is full. The next thread to become available
in that queue will also have to acquire lock_, thus leading to a
deadlock.

We fix this by moving the EarlySendersList out of the
EarlySendersMap and dropping the lock before taking any actions on
the RPC contexts in the EarlySendersList. All functions called after
dropping 'lock_' do not require the lock to protect them as they are
thread safe.

Additionally modified the BE test data-stream-test to work with KRPC
as well.

Testing: Added a new test to data-stream-test to verify that the
deadlock does not happen. Also, I verified that this test hangs
without the fix.

Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
---
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
2 files changed, 263 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8950/2
--
To view, visit http://gerrit.cloudera.org:8080/8950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib7d1a8f12a4821092ca61ccc8a6f20c0404d56c7
Gerrit-Change-Number: 8950
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..


Patch Set 12:

Release build failed with a link error because of missing includes of a 
.inline.h file.


--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 18:24:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1707/


--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 18:24:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 18:24:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time

2018-01-11 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Lars Volker, Zoram Thanga, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8814

to look at the new patch set (#13).

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time
..

IMPALA-6290: limit ScannerContext to 1 buffer at a time

This is a prerequisite for constraining the number of buffers per scan
range. Before this patch, calling ReadBytes(), SkipBytes(), etc could
cause an arbitrary number of I/O buffers to accumulate in
'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for
a range and then called ReadBytes(30MB), we would hit resource
exhaustion as soon as 3 buffers were accumulated in
'completed_io_buffers_'.

The fix is to avoid accumulating any buffers in 'completed_io_buffers_'.
Instead of adding them to 'completed_io_buffers_', completed buffers
are just returned to the I/O manager. It turned out that this did not
weaken the ScannerContext's guarantees about memory lifetime, because
ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each
time it was called regardless. I checked that this behaviour wasn't
a bug by inspecting the scanner code. I could not find any cases
where scanners depended on returned memory remaining valid beyond
the next Read*()/Get*()/Skip*() call on the stream.

This change makes that lifetime explicit in the comments. A
side-effect of this fix is that scanners do not need to call
ReleaseCompletedResources() in CommitRows() and means that the
ScannerContext only ever needs to hold one I/O buffer at a time.

This change also reimplements SkipBytes() to avoid it accumulating
memory in the boundary buffer for large skip sizes.

Also clarifies some of the invariants in ScannerContext. E.g. some
places assumed io_buffer_ != NULL, but that is no longer needed.

Testing:
Ran core tests with ASAN and exhaustive tests with DEBUG.

Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/io/disk-io-mgr.cc
M 
testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test
11 files changed, 321 insertions(+), 211 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/13
--
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command

2018-01-11 Thread Adam Holley (Code Review)
Adam Holley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8928 )

Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" 
command
..


Patch Set 8:

(5 comments)

Changes applied.

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java:

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@72
PS4, Line 72:   throw new AnalysisException("ALTER TABLE SET ROW FORMAT 
is only supported " +
> I feel pretty strongly that table-level delimiter changes should not be pre
Ok.  I think I understand now.  Done.


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@77
PS4, Line 77:   HdfsFileFormat format = 
HdfsFileFormat.fromHdfsInputFormatClass(
> We should include info about the conflicting partition so the user can debu
Included partition name.  Is there some better data to return?


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@460
PS4, Line 460:   reloadFileMetadata = alterTableSetRowFormat(tbl,
> Thanks for investigating. I agree it's better to clean this up in a separat
I've opened JIRA https://issues.apache.org/jira/browse/IMPALA-6385.  Can you 
comment on that one as to which pattern should be used going forward?


http://gerrit.cloudera.org:8080/#/c/8928/7/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

http://gerrit.cloudera.org:8080/#/c/8928/7/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1404
PS7, Line 1404: # Test select after alter table to ensure no partition changes.
> typo: taqble
Done


http://gerrit.cloudera.org:8080/#/c/8928/7/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1410
PS7, Line 1410:  TYPES
> We should also validate that new partitions are created based on the new ta
Done



--
To view, visit http://gerrit.cloudera.org:8080/8928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
Gerrit-Change-Number: 8928
Gerrit-PatchSet: 8
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Thu, 11 Jan 2018 18:22:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command

2018-01-11 Thread Adam Holley (Code Review)
Adam Holley has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/8928 )

Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" 
command
..

IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command

Examples of new command:
ALTER TABLE t1 SET ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002';
ALTER TABLE t1 SET ROW FORMAT DELIMITED LINES TERMINATED BY '\001';

Testing:
Added parser tests and unit tests for alter statements including
partition options.

Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java
M fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
8 files changed, 331 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/8928/8
--
To view, visit http://gerrit.cloudera.org:8080/8928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
Gerrit-Change-Number: 8928
Gerrit-PatchSet: 8
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h
File be/src/exec/scalar-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@47
PS1, Line 47: child_row_idx_
This member is not used anywhere. Probably it should be removed.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 17:40:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@932
PS1, Line 932: 
The case when the subquery returns 0 row could be tested too.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 16:45:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@746
PS1, Line 746: selectNode
scalarCheckNode?



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 16:44:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h
File be/src/exec/scalar-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@44
PS1, Line 44:   boost::scoped_ptr child_row_batch_;
If I understand correctly, having child_row_batch_ as member is usually needed 
because the row_batch in GetNext() can reach its capacity, so the remaining 
fetched rows will be copied in the next GetNext() call(s). Is this possible 
here? That would mean that the GetNext() was called with an already full row 
batch.
If this is not possible, then this member could be replaced with a local 
variable in GetNext().


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc
File be/src/exec/scalar-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56
PS1, Line 56:   if (child_row_batch_->num_rows() == 0) {
Is it possible/ok for this condition to be false? See my comment in 
scalar-check-node.h


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@63
PS1, Line 63: one
"zero or one" or "maximum one" would be more exact


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78
PS1, Line 78:   child_row_batch_->TransferResourceOwnership(row_batch);
Just an idea, not sure if a good one: it may be better to deep copy the row, 
which would make it possible to release the resources referenced by 
child_row_batch_ earlier.


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@178
PS1, Line 178:   // Runtime check needed to check if subquery returns 
scalar
This is not just a check, but can be a transformation, as array types are 
replaced. This should be mentioned in the comment.
+nit: missing .


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@179
PS1, Line 179:   // Runtime check needed to check if subquery returns 
scalar
This is not just a check, but can be a transformation, as array types are 
replaced. This should be mentioned in the comment.
+nit: missing .


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@90
PS1, Line 90:   // If true, a ScalarCheckNode is needed to check this 
statement's result at runtime
nit: missing .


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java
File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java@32
PS1, Line 32: // LIMIT 2 is enough to check if the child returns a scalar 
value
: // Also, an optimized plan is generated with LIMIT 2
nit: 2 missing .



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 16:42:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Fix typo in test observability.

2018-01-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9003 )

Change subject: Fix typo in test_observability.
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1706/


--
To view, visit http://gerrit.cloudera.org:8080/9003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida3f355c6c6be5ed52e4d445f8f80665cdc8e2b8
Gerrit-Change-Number: 9003
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 11 Jan 2018 16:27:05 +
Gerrit-HasComments: No


  1   2   >