[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 10: Code-Review+2

PS10 is a rebase. Carrying Marcel's +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/295/

-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 9:

PS9 addresses a clang-tidy error.

-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Impala Public Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6032

to look at the new patch set (#9).

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..

IMPALA-2328: Read support for min/max Parquet statistics

This change adds support for skipping row groups based on Parquet row
group statistics. With this change we only support reading statistics
from Parquet files for numerical types (bool, integer, floating point)
and for simple predicates of the formsor
  , where  is LT, LE, GE, GT, and EQ.

Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
A be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exprs/expr.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
32 files changed, 979 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..


Patch references to Cloudera and CDH in Impala tutorial

There was one tutorial that actually ran under the
'cloudera' user and so repeated that name over and
over in directory and HDFS paths. I switched that to
'username'.

I suppressed some  and  tags with Cloudera
Manager-specific details. Will physically remove those
from the source in a subsequent iteration.

I left several instances of audience="Cloudera" because
those will be changed to audience="hidden" as part of
a separate change request.

I marked with rev="upstream" some  tags
containing impala-shell banners with a Cloudera copyright
statement. Will decide on a convention to handle those
(elide those lines, or use a conref to consistently
substitute the generic equivalent) and do that in a
followup patch set.

Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Reviewed-on: http://gerrit.cloudera.org:8080/5663
Reviewed-by: Laurel Hale 
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_tutorial.xml
2 files changed, 27 insertions(+), 27 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved
  Laurel Hale: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 8: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/294/

-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/46/

-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering

2017-02-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
..


Patch Set 9:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 581:   /* Nested types are not supported yet */
nit: use // style comment
Is there a JIRA to add support for nested types?


Line 661: if (enc_stat.encoding != parquet::Encoding::PLAIN_DICTIONARY) 
{
Can the count ever be zero? The parquet.thrift file seems to allow it.


Line 735: if (dict_filter_it == scanner_dict_filter_map_.end()) 
DCHECK(false);
You could use DCHECK(a != b)


http://gerrit.cloudera.org:8080/#/c/5904/5/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 439:   /// These are pointers into elements of column_readers_
> You can use a ScoperBuffer here, which allows the memory to be tracked.
Did this slip through the cracks?


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

PS9, Line 435: into
s/into/to ?


Line 446:   Tuple* dict_filter_tuple_;
I'd initialize this when you need it, since it is just the result of a cast and 
doesn't take up space, but this way adds to the list of fields (and size of the 
header).


PS9, Line 618: (
Why the (?


Line 623:   /// Returns true if the column chunk is 100% dictionary encoded
Can you make this more concrete? For example "if all pages of the column 
chunk...".


PS9, Line 630: row_group_eliminated
Isn't the caller responsible for eliminating the row group? "_eliminated" 
sounds like as a side effect the elimination will be done.


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 254:   map dict_encoding_stats_;
why not unordered_map? Also we I think we prefer int over int32_t


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS9, Line 499: NeedsCoversion
NeedsConversion


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS9, Line 427: hext_header_size
*next_header_size


http://gerrit.cloudera.org:8080/#/c/5904/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 229:* TODO: It should be possible to use this in isConstantImpl.
You could pass the function name into isDeterministic and then re-use this in 
isConstantImpl to resolve the todo


-- 
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Three misc webpage changes

2017-02-22 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: Three misc webpage changes
..


Patch Set 1:

Thanks for the reviews!

> seems fine but I don't really know jquery

If it makes it easier, this is basically C&P code - see e.g. 
https://github.com/apache/incubator-impala/blob/master/www/sessions.tmpl for 
another example of a sortable table already in master.

> Is there a plan to migrate other HTML tables to DataTables? (E.g., Queries, 
> Query Locations, ...)

No particular plans - it's so easy to do I think anyone can do it when they see 
a need.

-- 
To view, visit http://gerrit.cloudera.org:8080/6045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] Add .pep8rc for Impala's Python style

2017-02-22 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add .pep8rc for Impala's Python style
..


Patch Set 1:

Any news with this?

-- 
To view, visit http://gerrit.cloudera.org:8080/5829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic

2017-02-22 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be 
generic
..


Patch Set 7: Code-Review+1

I'm still getting some errors when it builds. Looks like there are some broken 
links to impala_faq.xml, but your changes are being picked up in the build, so 
rendered docs look okay.

-- 
To view, visit http://gerrit.cloudera.org:8080/5962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Ambreen Kazi 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-02-22 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-02-22 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..


Patch Set 3: Code-Review+1

Looks good.

-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic

2017-02-22 Thread Ambreen Kazi (Code Review)
Ambreen Kazi has posted comments on this change.

Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be 
generic
..


Patch Set 5:

(8 comments)

Addressed all of John's comments.

http://gerrit.cloudera.org:8080/#/c/5962/5/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS5, Line 566: ,
> Remove comma.
Done


http://gerrit.cloudera.org:8080/#/c/5962/5/docs/topics/impala_delegation.xml
File docs/topics/impala_delegation.xml:

PS5, Line 79: 
> Is there a generic alternative place to link? If not, my impulse is to leav
I couldn't find a suitable upstream alternative. Left this in place.


PS5, Line 100: 
> Could genericize this to "consult the documentation for the ODBC driver you
Done


http://gerrit.cloudera.org:8080/#/c/5962/5/docs/topics/impala_kerberos.xml
File docs/topics/impala_kerberos.xml:

PS5, Line 47:  With the  in place, this can be shortened to
Done


PS5, Line 62: 
> Take out any spaces or tabs that are left behind in new blank lines.
Done


PS5, Line 144:  This can also be shortened to an  tag, referencing the ke
Done


PS5, Line 178:  Not critical in this case, but for future reference. If any of the list ite
Will do in the future. This one I've left as is.


http://gerrit.cloudera.org:8080/#/c/5962/5/docs/topics/impala_ldap.xml
File docs/topics/impala_ldap.xml:

PS5, Line 164: on the command line
> Take out that phrase, because often the daemons are started by a script and
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Ambreen Kazi 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/294/

-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic

2017-02-22 Thread Ambreen Kazi (Code Review)
Hello Laurel Hale,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5962

to look at the new patch set (#7).

Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be 
generic
..

IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic

This is part 2 of the work being done to genericize the Impala security
topics. All references to Cloudera have been either marked 'hidden'
or replaced with links to the relevant open-source docs.

Note:
-Links to the standalone Cloudera ODBC driver doc have not been
removed.
-External links to the MIT Kerberos docs and Hadoop security
docs were added to impala_keydefs.

Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_delegation.xml
M docs/topics/impala_kerberos.xml
M docs/topics/impala_ldap.xml
5 files changed, 60 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/5962/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Ambreen Kazi 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] [DOCS] Release note updates for Impala 2.8

2017-02-22 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Release note updates for Impala 2.8
..


Patch Set 9: Code-Review+2

As the RM for 2.8, I'm +2ing this. We can fix nits in follow-ups, but this will 
block 2.9 release notes work otherwise

-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic

2017-02-22 Thread Ambreen Kazi (Code Review)
Hello Laurel Hale,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5962

to look at the new patch set (#6).

Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be 
generic
..

IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic

This is part 2 of the work being done to genericize the Impala security
topics. All references to Cloudera have been either marked 'hidden'
or replaced with links to the relevant open-source docs.

Note:
-Links to the standalone Cloudera ODBC driver doc have not been
removed.
-External links to the MIT Kerberos docs and Hadoop security
docs were added to impala_keydefs.

Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_delegation.xml
M docs/topics/impala_kerberos.xml
M docs/topics/impala_ldap.xml
5 files changed, 61 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/5962/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Ambreen Kazi 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-3403: [DOCS] Pare back irrelevant installation info

2017-02-22 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3403: [DOCS] Pare back irrelevant installation info
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6005/1/docs/topics/impala_install.xml
File docs/topics/impala_install.xml:

Line 66:   
> The README.md file:
Please make the README.md description note that this is where you can find a 
pointer to the build instructions, not just "special notices". I believe the 
wiki page is already bit-rotted. The IPMC also has asked for us to describe a 
pointer to the build instructions in the README.


-- 
To view, visit http://gerrit.cloudera.org:8080/6005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6025/8/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 126: // TODO: this file should be cross compiled and then all of the 
builtin
is this done?


Line 961: struct ReservoirSampleState {
i'd say this really turned into a class


Line 967:   // resize occurs, this needs to be updated from the outside.
what does 'from the outside' mean?


Line 977:   ReservoirSampleState(int init_capacity) :
use standard formatting


Line 1016: // The array of ReservoirSamples starts right after 
ReservoirSampleState, so we use
that's often done by putting an array of size 1 at the end of the header struct:

ReservoirSample samples[1];

and then you can do things like state.samples[5] = x;

it's convenient and makes it explicit that you have a trailing var-len array.


Line 1025:   int64_t GetNext64(int64_t max) {
while you're at it, this deserves a comment


Line 1033:   // Given a buffer that contains a ReservoirSampleState, resize the 
buffer so that it's
its


Line 1040:   if (new_capacity * 2 >= MAX_CAPACITY) new_capacity = MAX_CAPACITY;
if state->capacity is 10 and max_capacity is 40, this line sets new_capacity to 
40.


Line 1062:   // If the array gets filled due to updates or merges, we 
reallocate a larger buffer to
you should put this (= a brief description of what you're doing) somewhere at 
the beginning of the reservoir sample-related code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 4:

(1 comment)

Thanks for the review. I addressed the remaining comments in PS8. There's the 
question who calls ColumnStatsBase::ReadFromThrift, let me know if you want me 
to elaborate on it in the comment.

http://gerrit.cloudera.org:8080/#/c/6032/7/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 55:   }
> i thought the stats predicates are visible in extended?
Yes, done.


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6032

to look at the new patch set (#8).

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..

IMPALA-2328: Read support for min/max Parquet statistics

This change adds support for skipping row groups based on Parquet row
group statistics. With this change we only support reading statistics
from Parquet files for numerical types (bool, integer, floating point)
and for simple predicates of the formsor
  , where  is LT, LE, GE, GT, and EQ.

Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
A be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exprs/expr.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
32 files changed, 986 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering

2017-02-22 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
..


Patch Set 8:

(31 comments)

Addressed the review comments. I also added a query option 
"parquet_dictionary_filtering" which defaults to true.

http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 515: RETURN_IF_ERROR(InitColumns(row_group_idx_, column_readers_));
> good point. InitColumns() needs to be split up so that the construction of 
I changed this so that we call InitColumns for only the dictionary filtering 
columns before doing dictionary filtering. This means that the reads for the 
other columns will only occur if the row group passes. 

To do better than this (i.e. to only read the dictionary), we will need to do 
what Marcel suggests and rework the code more substantially.


Line 532: RETURN_IF_ERROR(InitDictionaries(column_readers_));
> the control flow is hard to follow because it the top-level functions don't
Done


Line 607: // values are not validated, so skip these datatypes for now.
> also leave todo, this can be done during dictionary construction.
Done


Line 615:   Status HdfsParquetScanner::EvalDictionaryFilters(int row_group_idx,
> indentation
Done


Line 628:   for (ParquetColumnReader* col_reader : dict_filter_column_readers_) 
{
> there is a lot of code here that doesn't evaluate a dictionary. restructure
Done


Line 637: vector& slot_conjunct_ctxs = 
slot_conjunct_it->second;
> move to where variables are used.
Done


Line 701: // here or in ReadDataPage but never both.
> is that still true? i thought you expect the first page to be dictionary-en
Good point, corrected the comment.

The dictionary page offset is used to set the start of the stream. The 
dictionary page offset is required to be less than the data page offset, so the 
stream needs to start at dictionary page offset.


Line 710: // TODO: handle the 4 in a more elegant manner (make it a 
named constant)
> resolve todo (don't leave todos that take very little time to resolve).
Done


Line 723: column_passes_filter = true;
> 'passes filter' is ambiguous (does this mean the predicate is true or false
Done


http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 613:   void InitDictionaryFilters();
> not intuitively named, i thought this had something to do with dictionary c
Changed the name


http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 244:   unordered_set column_encodings_;
> missing comments
Done


Line 525: dict_encoding_stats_[dict_header.encoding]++;
> ++...
Done


Line 611:   
column_encodings_.insert(header.data_page_header.definition_level_encoding);
> is this useful/required? the def/rep level encodings don't vary.
Changed this to do the def/rep level encodings once at the start.


Line 1042: for (Encoding::type encoding : columns_[i]->column_encodings_) {
> you use columns_[i] a lot, create a variable
Done


http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS8, Line 462:  NeedsConversionInline
> can this be another template parameter?
The callers of ReadSlot are on the fast path as well, so those locations would 
need to know what value to pass into ReadSlot and would need the inlined 
version.


Line 764:   // end of the stream.
> superfluous comment, the function comment in the header file already makes 
Done


Line 804:   new_buffer_size, &buffer, &new_buffer_size, &status, /* peek */ 
true);
> indentation
Done


Line 811: if (buffer_size == new_buffer_size) {
> what does this mean?
Added a comment


Line 991: return Status("Column chunk should not contain two dictionary 
pages.");
> make this an actionable error message (ie, include the file name).
Done


http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 431:   /// an indication of whether a dictionary is present.
> so this is true if no dictionary exists?
Removed this


http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/util/dict-test.cc
File be/src/util/dict-test.cc:

Line 55:   for (T i: values) {
> do a similar loop for GetValue()
Done


http://gerrit.cloudera.org:8080/#/c/5904/8/common/thrift/parquet.thrift
File common/thrift/parquet.thrift:

Line 1: /**
> is this the unmodified parquet.thrift? (and it comes without a format versi
Yes, this is the file I found on github. I fixed some whitespace, but there is 
no version information.


http://gerrit.cloudera.org:8080/#/c/5904/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 190: 
> 

[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering

2017-02-22 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#9).

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
..

IMPALA-4624: Implement Parquet dictionary filtering

Here is a basic summary of the changes:
Frontend looks for conjuncts that operate on a single slot and pass a
map from slot id to the conjunct index through thrift to the backend.
The conjunct indices are the incides into the normal PlanNode conjuncts list.
The conjuncts need to satisfy certain conditions:
1. They are bound on a single slot
2. They are deterministic (no random functions)
3. They evaluate to FALSE on a NULL input. This is because the dictionary
does not include NULLs, so any condition that evaluates to TRUE on NULL
cannot be evaluated by looking only at the dictionary.

The backend converts the indices into ExprContexts. These are cloned in
the scanner threads.

The dictionary read codepath has been removed from ReadDataPage into its
own function, InitDictionary. This has also been turned into its own step
in row group initialization. ReadDataPage will not see any dictionary
pages unless the parquet file is invalid.

For dictionary filtering, we initialize dictionaries only as needed to evaluate
the conjuncts. The Parquet scanner evaluates the dictionary filter conjuncts on 
the
dictionary to see if any dictionary entry passes. If no entry passes, the row
group is eliminated. If the row group passes the dictionary filtering, then we
initialize all remaining dictionaries.

Since column chunks can have a mixture of encodings, dictionary filtering
uses three tests to determine whether this is purely dictionary encoded:
1. If the encoding_stats is in the parquet file, then use it to determine if
there are only dictionary encoded pages (i.e. there are no data pages with
an encoding other than PLAIN_DICTIONARY).
-OR-
2. If the encoding stats are not present, then look at the encodings. The column
is purely dictionary encoded if:
a) PLAIN_DICTIONARY is present
AND
b) Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed
-OR-
3. If this file was written by an older version of Impala, then we know that
dictionary failover happens when the dictionary reaches 40,000 values.
Dictionary filtering can proceed as long as the dictionary is smaller than
that.

parquet-mr writes the encoding list correctly in the current version in our
environment (1.5.0). This means that check #2 works on some existing files
(potentially most existing parquet-mr files).
parquet-mr writes the encoding stats starting in 1.9.0. This is the version
where check #1 will start working.

Impala's parquet writer now implements both, so either check above will work.

Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
M tests/query_test/test_scanners.py
24 files changed, 1,143 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/5904/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: /// writing parquet files. It provides an interface to populate a 
parquet::Statistics
> i think you can leave the getters out for now, since we won't be calling th
Done


http://gerrit.cloudera.org:8080/#/c/6032/7/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 68:   static bool ReadFromThrift(const parquet::Statistics& thrift_stats,
> who calls this?
This is called by the parquet scanners. It's implemented by dispatching on 
'col_type' and calling ReadFromThrift() in the subclass for the correct C++ 
type T. Should this be explained in the comment?


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 7: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: /// writing parquet files. It provides an interface to populate a 
parquet::Statistics
> The other two values are i64 null_count and i64 distinct_count, for which I
i think you can leave the getters out for now, since we won't be calling them, 
but for the sake of completeness we should add the other two enums.


http://gerrit.cloudera.org:8080/#/c/6032/7/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 68:   static bool ReadFromThrift(const parquet::Statistics& thrift_stats,
who calls this?


http://gerrit.cloudera.org:8080/#/c/6032/7/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 55: options.setExplain_level(TExplainLevel.VERBOSE);
i thought the stats predicates are visible in extended?


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3403: [DOCS] Pare back irrelevant installation info

2017-02-22 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3403: [DOCS] Pare back irrelevant installation info
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6005/1/docs/topics/impala_install.xml
File docs/topics/impala_install.xml:

Line 48:   Follow these steps to set up Impala on a cluster by building 
from source:
> This is odd wording. I think it might be better to say that this guide has 
Done


Line 66:   
> I think the README.md is the best place for build instructions, since that 
The README.md file:

https://github.com/apache/incubator-impala/blob/master/README.md

currently is pretty slim pickings. It just says Impala requires Linux (no 
distro names or numbers) and for building, it sends the user elsewhere:

"See bin/bootstrap_build.sh."

Why don't I leave the link to the wiki for the time being, and add a reference 
to the README as well. We can reconsider taking out the wiki link if we later 
beef up the README file.


-- 
To view, visit http://gerrit.cloudera.org:8080/6005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3403: [DOCS] Pare back irrelevant installation info

2017-02-22 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-3403: [DOCS] Pare back irrelevant installation info
..

IMPALA-3403: [DOCS] Pare back irrelevant installation info

Take out the path A / path B pages with CM and non-CM installation
techniques for packages and parcels.

Replace with steps for building from source, adapted from
http://impala.apache.org/downloads.html

Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c
---
M docs/impala.ditamap
M docs/topics/impala_install.xml
2 files changed, 40 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/6005/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..


IMPALA-4959: Avoid picking up the system's boost cmake module

In some systems with an old boost installed system-wide the impala build
would fail with something like:

CMake Error at /usr/lib64/boost/BoostConfig.cmake:64 (get_target_property):
  get_target_property() called with non-existent target
  "boost_thread-shared".
Call Stack (most recent call first):
  toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 
(find_package)
  CMakeLists.txt:116 (find_package)

CMake Error at /usr/lib64/boost/BoostConfig.cmake:72 (get_target_property):
  get_target_property() called with non-existent target
  "boost_thread-shared-debug".
Call Stack (most recent call first):
  toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 
(find_package)
  CMakeLists.txt:116 (find_package)

This because, if it exists, cmake's FindBoost.cmake will look for and
use that module, even though boost's cmake build hasn't been maintained
in years and the impala build is actually configured to not use the
systems boost.

This patch sets the cmake flag Boost_NO_BOOST_CMAKE to ON, making sure
the old cmake module is not picked up.

Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Reviewed-on: http://gerrit.cloudera.org:8080/5994
Reviewed-by: Alex Behm 
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M CMakeLists.txt
1 file changed, 4 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved
  Alex Behm: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3411 [DOCS] Rework Impala governance topics to be 
generic.
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Ambreen Kazi 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3411 [DOCS] Rework Impala governance topics to be 
generic.
..


IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.

This set of edits removes references and links to Cloudera Navigator
and Cloudera Manager from the auditing and lineage topics. Those
were either marked as 'hidden' or replaced with a generic suggestion
to use cluster management software with a focus on governance.

Some paragraphs with overflowing lines were also fixed.

Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782
Reviewed-on: http://gerrit.cloudera.org:8080/5957
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M docs/topics/impala_auditing.xml
M docs/topics/impala_lineage.xml
2 files changed, 57 insertions(+), 35 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Ambreen Kazi 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
..


IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'

Rewrote sections to eliminate 'Cloudera Manager' from topics.
Look for subsequent phases to remove remaining instances of CM.

Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
Reviewed-on: http://gerrit.cloudera.org:8080/6049
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M docs/shared/impala_common.xml
M docs/topics/impala_config_options.xml
M docs/topics/impala_upgrading.xml
3 files changed, 6 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3411 [DOCS] Rework Impala governance topics to be 
generic.
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/45/

-- 
To view, visit http://gerrit.cloudera.org:8080/5957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Ambreen Kazi 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/43/

-- 
To view, visit http://gerrit.cloudera.org:8080/6049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/44/

-- 
To view, visit http://gerrit.cloudera.org:8080/6003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-02-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 7:

(7 comments)

I'm pretty happy with this change

http://gerrit.cloudera.org:8080/#/c/5816/8/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 67:   // Which children can be passed through, without being materialized.
duplicate 'without'


http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 225:   // Indicates whether passthrough should be disabled in union nodes.
enabled


http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

PS8, Line 255: d
enabled


http://gerrit.cloudera.org:8080/#/c/5816/8/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 267:   // nullability can be considered to be part of the physical row 
layout.
remove (this was moved inside computePassThrough


http://gerrit.cloudera.org:8080/#/c/5816/8/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1059: select bigint_col from functional.alltypestiny where bigint_col > 0
use unqualified table names everywhere (and fix the one above while you are 
here), so we get coverage over all file formats


Line 1124:  QUERY
Move this test into nested-types-subplan.test, otherwise this test will fail on 
the legacy joins/agg build since nested types are not supported there 
(nested-types-subplan.test will be skipped)


Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 22: Code-Review+2

Michael, any more comments?

-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

2017-02-22 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
..

IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

Implements HdfsLzoTextTextScanner::GetNext() and changes
ProcessSplit() to repeatedly call GetNext() to share the core
scanning code between the legacy ProcessSplit() interface
(ProcessSpit()) and the new GetNext() interface.

These changes were tricky:
- The scanner used to rely on the ability to attach a batch
  to the row-batch queue for freeing resources
- This patch attempts to preserve the resource-freeing behavior
  by clearing resources as soon as they are complete
- In particular, the scanner attempts to skip corrupt/invalid
  data blocks, and we should avoid accumulating memory
  unnecessarily

The other changes are mostly straightforward:
- Add a RowBatch parameter to various functions
- Add a MemPool parameter to various functions for attaching
  memory of completed resources that may still be references
  by returned batches
- Change Close() to free all resources when a nullptr
  RowBatch is passed

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
15 files changed, 292 insertions(+), 215 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/6000/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

2017-02-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
..


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 64:   batch_start_ptr_(NULL),
> use nullptr throughout
Done


Line 273:   MemPool* pool = row_batch->tuple_data_pool();
> please consider getting rid of this variable and referring to the rowbatch 
Done


Line 440:   eos_ = true;
> advance scan_state_?
Done. I think it makes sense to not allow GetNext() once eos() is true because 
there does not
seem to be a real use case for allowing that, and disallowing it helps simplify 
the logic in GetNext().
Imo, we should have that contract for ExecNode::GetNext() as well, but that's 
another story.


Line 445: << "FE should have generated SNAPPY_BLOCKED instead.";
> why not check this when creating the stream?
Good idea. Done.


Line 446: 
RETURN_IF_ERROR(UpdateDecompressor(stream_->file_desc()->file_compression));
> why is this needed, given that we just started (ie, why not move it into in
Moved into InitNewRange()


Line 459: eos_ = true;
> scan_state_ transition?
Done


Line 469: Status HdfsTextScanner::CommitRows(RowBatch* row_batch, int num_rows) 
{
> how about coalescing that with hdfsscannode::commitrows? at the very least 
The main problem is that the HdfsScanner::CommitRows() may add the batch to the 
row-batch queue and allocate a new batch. But for the new scanners we want to 
manually add batches to the queue after getting another batch with GetNext().

I merged this CommitRows() with HdfsScanner::CommitRows() and added a flag to 
control the queue-adding behavior. It's slightly awkward, let me know if you 
have a better idea.


Line 485: ExprContext::FreeLocalAllocations(entry.second);
> why call this here?
Added a comment. Did you have place in mind? We can probably move it somewhere 
else, but not sure what side-effects it may have.


http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 119:   /// regardless if whether it passed the conjuncts.
> regardless of whether
Done


Line 124:   /// Only valid to call in scan state FIRST_TUPLE_FOUND or 
PAST_SCAN_RANGE.
> thanks for introducing that state variable, it's much clearer than a bunch 
Thanks!


Line 135:   Status CommitRows(RowBatch* row_batch, int num_rows);
> in/out params go last
Done


Line 140:   /// otherwise it will just read num_bytes. If we are reading 
compressed text, num_bytes
> if we are reading compressed text, should this even be called (or why is th
Added comment.

This functions internally handles uncompressed text, streaming compressed text 
(e.g., gzip), and non-streaming compressed text. See 
FillByteBufferCompressedFile() and FillByteBufferCompressedStream().

In addition, this function is overridden by the LZO scanner to reuse much of 
the scanning logic in here.


-- 
To view, visit http://gerrit.cloudera.org:8080/6000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function

2017-02-22 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#3).

Change subject: IMPALA-4848: Add WIDHT_BUCKET() function
..

IMPALA-4848: Add WIDHT_BUCKET() function

Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
---
M be/src/exprs/expr-test.cc
M be/src/exprs/math-functions-ir.cc
M be/src/exprs/math-functions.h
M common/function-registry/impala_functions.py
4 files changed, 74 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 20s660ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  187.474us  187.474us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   15.238us   15.238us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s958ms3s749ms3   13.08 MB  
 10.00 MB
00:UNION   3  211.510ms  224.667ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s637ms1s734ms   28.80M  28.80M  528.68 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS31s697ms1s708ms   28.80M  28.80M  528.48 MB  
  1.8

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 20s660ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  187.474us  187.474us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   15.238us   15.238us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s958ms3s749ms3   13.08 MB  
 10.00 MB
00:UNION   3  211.510ms  224.667ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s637ms1s734ms   28.80M  28.80M  528.68 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS31s697ms1s708ms   28.80M  28.80M  528.48 MB  
  1.8

[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
..


IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

Almost all of the original Impala FAQ material was
Cloudera-themed or commercially oriented. Lots of
answers about the QuickStart VM, Cloudera discussion
forums, CDH-based recommendations, etc. IMO it is
not worth trying to adapt each FAQ entry to be generic.
Better to start over from the ground up.

Phase 1 of making an Apache-friendly FAQ is to strip
the original page "down to the studs" so new FAQ
entries can be added with more of a developer theme,
based on questions people have in the community.

Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Reviewed-on: http://gerrit.cloudera.org:8080/6003
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M docs/topics/impala_faq.xml
1 file changed, 6 insertions(+), 1,852 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-22 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1496:   100ll * 100ll * 100ll, 38, 38}}}
Missing comma - got hit by a bad merge.  expr-test was the only thing rebuilt 
so there wasn't much in the way of errors to spot.


http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
> maybe add a comment for future readers of this code:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-22 Thread Zach Amsden (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5951

to look at the new patch set (#22).

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..

IMPALA-2020, 4915, 4936: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

This change also fixes two open bugs regarding overflow checking.
The root cause of both behaviors turned out to be the same - a
missing std:: caused the wrong abs() function to be used.  Due
to details of IEEE floating point representation, this actually
masked another bug, as NaN is often represented as all 1-bits,
which fails the overflow test.  Since the implicit conversion to
int lost precision, we ended up storing large numbers that don't
actually represent valid decimal numbers in the range when the
value happened to be +/- Infinity.  This caused the rendering
back to ASCII characters to go awry, but is otherwise harmless.

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

Output comparisons, before | after
+--+
| cast(0.5 as int) |
+--+
| 0|  1|
+--+

+-+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+-+
| 0.5  | 0.6  |
+-+

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293784575265|
+--+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

DECIMAL_V2 set to 1
+--+
| sum(cast(l_extendedprice as bigint)) |
+--+
| 2293814088985|
+--+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.63s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) in 0.72s

New version, decimal_v2=0
+-+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-+
| 2293813121802.09|
+-+
Fetched 1 row(s) in 0.64s
+--+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--+
| 2293813156773.3596978911 |
+--+
Fetched 1 row(s) i

[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/42/

-- 
To view, visit http://gerrit.cloudera.org:8080/6003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 20s660ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  187.474us  187.474us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   15.238us   15.238us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s958ms3s749ms3   13.08 MB  
 10.00 MB
00:UNION   3  211.510ms  224.667ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s637ms1s734ms   28.80M  28.80M  528.68 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS31s697ms1s708ms   28.80M  28.80M  528.48 MB  
  1.8

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 7:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 145:   DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc()));
> This should be IsPrefixOf() because we sanity checking the row descriptors 
Done


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 115:   // passthrough case, the child can be closed right away because the 
row batch received
> the child can be closed right away -> the child was already closed?
Done


Line 116:   // from the child is copied (more details below).
> accuracy: the row batch wasn't copied
Done


Line 121:   if (child_idx_ < children_.size() && 
isChildPassThrough(child_idx_)) {
> Suggest comment:
Done


Line 122: // If the rows from the current child can be passed through, the 
physical row layout
> This comment doesn't seem to add anything, I suggest removing it.
Replaced this with your suggestion.


Line 131: // It may be possible that the row batch is not empty, so we save 
the previous value.
> More details would be helpful. If the batch has rows at this point, I'd ima
Added a dcheck instead. Time made this suggestion in patch 4.


Line 148:   // 'needs_deep_copy' lets us safely close the child in the next 
GetNext() call.
> style: 'needs_deep_copy' is not a visible variable here, suggest just sayin
Done


Line 154: 
> DCHECK that child_idx_ is not passthrough here
Done


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 67:   // Which children can be passed through, without being materialized.
> ... without evaluating and materializing their exprs.
Done


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 412:   /// Return true if the physical layout of this descriptor matches 
the physical layout
> matches that of other_desc
Done


Line 413:   /// of other_desc, but not necessarily ids.
> bot not necessarily the id.
Done


Line 565:   /// of other_desc, but not necessarily ids.
> but not necessarily the ids
Done


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h
File be/src/service/query-options.h:

Line 38:   TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\
> I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a lit
We have both positive and negative like DISABLE_CODEGEN and 
ENABLE_EXPR_REWRITES. I agree that ENABLE is simpler and easier to think about. 
(We should rename all DISABLE options.)


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 227:   public boolean isEquivalent(SlotDescriptor other) {
> Unfortunately, the term 'equivalent' already has a different meaning in the
Done


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:

Line 616:   public List getUnionNodeResultExprs() {
> getUnionResultExprs()
Done


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 47:  * a child has an identical tuple layout as the output of the union 
node, the
> ... as the output of the union node, and the child only has naked SlotRefs 
Done


Line 57:   protected final List resultExprs_;
> unionResultExprs_ to make distinguish it from the resultExprLists_ and such
Done


Line 73:   // If false, no batches from child nodes would be passed through in 
the backend.
> Comment should describe what this flag is. Also you mean "true" right?
Done


Line 76:   // Indicates for which child nodes batches can be passed through in 
the backend.
> remove "in the backend" (it's clear that execution happens there)
Done


Line 81:   protected UnionNode(PlanNodeId id, TupleId tupleId) {
> Is this c'tor still needed? If not, remove.
Yes, it's used if we are creating a constant node. (with no children)


Line 89:   List resultExprs, boolean isInSubplan) {
> indentation, unionResultExprs
Done


Line 182:* Compute whether we can pass through rows without materializing 
for the given child.
> Can combine/simplify like this:
Done. I don't think it's necessary to describe the passthrough conditions here. 
The implementation makes it clear.


Line 189:   Analyzer analyzer, List childTupleIds, List 
childExprList) {
> childExprList -> childResultExprs
Done


Line 190: if (analyzer.getQueryOptions().isDisable_union_passthrough()) 
return false;
> seems clearer to move this into init() and not 

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 20s660ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  187.474us  187.474us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   15.238us   15.238us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s958ms3s749ms3   13.08 MB  
 10.00 MB
00:UNION   3  211.510ms  224.667ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s637ms1s734ms   28.80M  28.80M  528.68 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS31s697ms1s708ms   28.80M  28.80M  528.48 MB  
  1.8

[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2
..


IMPALA-4962: Fix SHOW COLUMN STATS for HS2

Impala incorrectly returned NULLs in the "Max Size" column of the SHOW
COLUMN STATS result when executed through the HS2 interface. The issue
was that the column was specified to be type INT in the result schema,
but the actual type of the contents that we inserted into it was
"long". The reason why this is not an issue in Impala shell is because
we stringify the contents without inspecting the metadata for beeswax
results.

The issue was fixed by changing the type from INT to BIGINT.

Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Reviewed-on: http://gerrit.cloudera.org:8080/6109
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test
M tests/hs2/test_fetch.py
12 files changed, 49 insertions(+), 48 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..


Patch Set 6:

(14 comments)

Thanks for the review. Please see PS7.

http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 382:   /// Min/max statistics contexts, owned by instances of this class.
> they are not owned by the class instances themselves, they're allocated in 
Done


Line 386:   /// 'min_max_conjunct_ctxs_', which are used when evaluating 
parquet::Statistics.
> somewhat incomprehensible comment.
Done


Line 476:   Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, 
bool* skip_row_group);
> you're not evaluating stats, you're evaluating predicates on the stats
Done


http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: enum class StatsField { MIN, MAX };
> move into ColumnStatsBase so it's clear what stats this is referring to.
The other two values are i64 null_count and i64 distinct_count, for which I 
think we should have proper getters, i.e. we don't need to consider their type 
and method of decoding when reading them. Should we prefer an enum-based getter 
for those over ColumnStatsBase::ReadNullCountFromThrift() and 
ReadDistinctCountFromThrift()?


Line 114:   static bool ReadFromThrift(const parquet::Statistics& thrift_stats, 
const StatsField& stats_field,
> move function bodies into parquet-column-stats-inl.h
Done


http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 297: collectionConjuncts_.clear();
> true, but those should be identical, given that we rewrite constants as lit
Constant rewriting is controlled by the enable_expr_rewrites query option, 
changing the code to filter on literals and turning the query option off 
disables statistics for predicates like "a < 3 - 3". There is a test for this 
in parquet_stats.test:L206 that would fail. Let me know if you prefer to switch 
to literals.


http://gerrit.cloudera.org:8080/#/c/6032/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 85:  * conjuncts, that are passed to the backend and will be evaluated 
against the
> "conjuncts that"
Done


Line 154:   private List minMaxPlanConjuncts_ = Lists.newArrayList();
> "plan conjuncts" doesn't really mean anything. original conjuncts?
Done


Line 302:* Builds a predicate to evaluate parquet::Statistics by copying 
'inputSlot' into
> "to evaluate against"
Done


Line 347:   // Only constant exprs can be evaluated against the 
parquet::Statistics. This
> drop "the"
Done


Line 349:   if (!constExpr.isConstant()) continue;
> as pointed out before, you shouldn't see non-literal constants here
See my reply in PS4, constant folding depends on the enable_expr_rewrites query 
option.


http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 1047:runtime filters: RF000 -> c_custkey
> extended is fine.
Done


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 1: # Test HDFS scan node.
> run this file in extended mode so you see the stats predicates.
Done


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 206: # Test that without expr rewrite and thus without constant folding, 
constant exprs still
> what's the point of those tests?
See my comment regarding literal vs. constant exprs in PS4. These tests make 
sure that we don't break support for constant exprs.


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics

2017-02-22 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#7).

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
..

IMPALA-2328: Read support for min/max Parquet statistics

This change adds support for skipping row groups based on Parquet row
group statistics. With this change we only support reading statistics
from Parquet files for numerical types (bool, integer, floating point)
and for simple predicates of the formsor
  , where  is LT, LE, GE, GT, and EQ.

Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
A be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exprs/expr.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
32 files changed, 1,121 insertions(+), 283 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-4057 and IMPALA-4050 Support starting webserver specified by hostname or "127.0.0.1"

2017-02-22 Thread Jim Apple (Code Review)
Jim Apple has abandoned this change.

Change subject: IMPALA-4057 and IMPALA-4050 Support starting webserver 
specified by hostname or "127.0.0.1"
..


Abandoned

No contact from author in over 2 months

-- 
To view, visit http://gerrit.cloudera.org:8080/4553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I54280a25c3709e2f316a17a6baf33dbbbae720c0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: hewenting 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: hewenting 


[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function

2017-02-22 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4848: Add WIDHT_BUCKET() function
..


Patch Set 2:

(7 comments)

Please review this change after  I push a new patch which addresses some 
overflow bugs.

http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

Line 427: const IntVal& num_buckets) {
> do all null checks in one if
Done


Line 429: return IntVal::null();
> use min_range.val directly because the == operator will check for null agai
Done


Line 430:   }
> Lower...
Done


Line 434:   }
> should -> must
Done


http://gerrit.cloudera.org:8080/#/c/6023/2/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

Line 455:   double bucket_width = (max_range.val - min_range.val) / 
num_buckets.val;
This can overflow. Working on handling this.
select width_bucket(1.5e+200, -1.7e+308, 1.2e+308, 900);


Line 457:   int64_t result = static_cast((expr.val - min_range.val) / 
bucket_width) + 1;
This can overflow. Working on handling this.


http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions.h
File be/src/exprs/math-functions.h:

Line 128:   /// Returns true if no parse_res == PARSE_SUCCESS || parse_res == 
PARSE_OVERFLOW.
> Move in the public section like the other functions.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-22 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..


Patch Set 3:

yup thanks for pinging about that, I just submitted it for gvm

-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 8: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/293/

-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module

2017-02-22 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module
..


Patch Set 3:

mj, alex: what's missing here? gvm run? not sure that'll do much

-- 
To view, visit http://gerrit.cloudera.org:8080/5994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 8: Code-Review+1

(2 comments)

Thanks! Let's see if Alex has any more comments.

http://gerrit.cloudera.org:8080/#/c/6025/8//COMMIT_MSG
Commit Message:

PS8, Line 22: No new tests were added,
can update this now


http://gerrit.cloudera.org:8080/#/c/6025/8/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

PS8, Line 1163: mediam
spelling


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M testdata/workloads/functional-query/queries/

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#8).

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..

IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Before this change, ReservoirSample functions (such as APPX_MEDIAN())
allocated memory for 20,000 elements up front per grouping key. This
caused inefficient memory usage for aggregations with many grouping
keys.

This patch fixes this by initially allocating memory for 16 elements.
Once the buffer becomes full, we reallocate a new buffer with double
capacity and copy the original buffer into the new one. We continue
doubling the buffer size until the buffer has room for 20,000 elements
as before.

Testing:
Ran BE and some relevant EE tests locally.
No new tests were added, the existing tests should provide enough
coverage.

Performance benchmark:
I ran a performance benchmark locally on release build. The following
query results in 3,000 grouping keys and about 30,000 values per key:
SELECT MAX(a) from (
  SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t

Before: 11.57s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
-
06:AGGREGATE1   96.086us   96.086us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   26.629us   26.629us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   68.187us   87.887us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE32s851ms5s362ms   3.00K  -1   1.95 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  119.540ms  220.191ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE35s876ms6s254ms   9.00K  -1   2.93 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  127.834ms  146.842ms  98.30M  -1  19.80 MB   
32.00 MB  tpcds_10_parquet.benchmark

After:  13.58s
Operator   #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. 
Peak Mem  Detail
---
06:AGGREGATE1  101.101us  101.101us   1   1  28.00 KB   
 -1.00 B  FINALIZE
05:EXCHANGE 1   32.296us   32.296us   3   1 0   
 -1.00 B  UNPARTITIONED
02:AGGREGATE3   83.284us  120.137us   3   1  44.00 KB   
10.00 MB
04:AGGREGATE33s190ms6s555ms   3.00K  -1   1.96 GB  
128.00 MB  FINALIZE
03:EXCHANGE 3  247.897ms  497.280ms   9.00K  -1 0   
   0  HASH(c1)
01:AGGREGATE37s370ms8s460ms   9.00K  -1   4.71 GB  
128.00 MB  STREAMING
00:SCAN HDFS3  111.721ms  122.306ms  98.30M  -1  19.94 MB   
32.00 MB  tpcds_10_parquet.benchmark

Memory benchmark:
The following query was used to verify that this patch reduces memory
usage:
SELECT APPX_MEDIAN(ss_sold_date_sk)
FROM tpcds.store_sales
GROUP BY ss_customer_sk;

Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB.

Summary before:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 15.856ms5.856ms   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE33s721ms3s789ms   14.82K  15.21K   4.94 GB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  139.276ms  157.753ms   15.60K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE32s851ms3s026ms   15.60K  15.21K   5.29 GB  
 10.00 MB  STREAMING
00:SCAN HDFS3   24.245ms   35.727ms  183.59K 183.59K   4.60 MB  
384.00 MB  tpcds.store_sales

Summary after:
Operator   #Hosts   Avg Time   Max Time#Rows  Est. #Rows  Peak Mem  
Est. Peak Mem  Detail

04:EXCHANGE 1  288.125us  288.125us   14.82K  15.21K 0  
  -1.00 B  UNPARTITIONED
03:AGGREGATE39.358ms   10.982ms   14.82K  15.21K  19.82 MB  
 10.00 MB  FINALIZE
02:EXCHANGE 3  129.832us  154.953us   15.62K  15.21K 0  
0  HASH(ss_customer_sk)
01:AGGREGATE3   11.086ms   13.102ms   15.62K  15.21K   9.49 MB  
 10.00 MB  STREAMING
00:SCAN HDFS3   40.154ms   50.220ms  183.59K 183.59K   2.94 MB  
384.00 MB  tpcds.store_sales

Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
M testdata/workloads/functional-query/queries/

[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 958: ReservoirSampleState
> Because this is not a vector or a list, it's state for reservoir sampling w
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Three misc webpage changes

2017-02-22 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: Three misc webpage changes
..


Patch Set 1:

The changes here seem reasonable. Is there a plan to migrate other HTML tables 
to DataTables? (E.g., Queries, Query Locations, ...)

-- 
To view, visit http://gerrit.cloudera.org:8080/6045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2

2017-02-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/292/

-- 
To view, visit http://gerrit.cloudera.org:8080/6109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
> std::abs() is not defined for int128_t
maybe add a comment for future readers of this code:
// std::abs() is not defined for int128_t.


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-22 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
> why not std:abs() here?
std::abs() is not defined for int128_t


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Three misc webpage changes

2017-02-22 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Three misc webpage changes
..


Patch Set 1: Code-Review+1

seems fine but I don't really know jquery

-- 
To view, visit http://gerrit.cloudera.org:8080/6045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2

2017-02-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6109/1/tests/hs2/test_fetch.py
File tests/hs2/test_fetch.py:

Line 179: assert re.match(r"id, INT, -?\d+, -?\d+, -?\d+, 4.0", result) is 
not None
haha nice


-- 
To view, visit http://gerrit.cloudera.org:8080/6109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2

2017-02-22 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6109

Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2
..

IMPALA-4962: Fix SHOW COLUMN STATS for HS2

Impala incorrectly returned NULLs in the "Max Size" column of the SHOW
COLUMN STATS result when executed through the HS2 interface. The issue
was that the column was specified to be type INT in the result schema,
but the actual type of the contents that we inserted into it was
"long". The reason why this is not an issue in Impala shell is because
we stringify the contents without inspecting the metadata for beeswax
results.

The issue was fixed by changing the type from INT to BIGINT.

Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
---
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test
M tests/hs2/test_fetch.py
12 files changed, 49 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/6109/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] Three misc webpage changes

2017-02-22 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: Three misc webpage changes
..


Patch Set 1:

Anyone want to take a quick look at this?

-- 
To view, visit http://gerrit.cloudera.org:8080/6045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5951/19/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1621:   { "cast(555 as decimal(38,0))",
> could you add a comment saying how many fives that is. I counted 38, so why
oops, I counted wrong -- i think there are 39 5's here?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts

2017-02-22 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts
..


Patch Set 21: Code-Review+2

(2 comments)

Michael, can you take a final pass as well?

http://gerrit.cloudera.org:8080/#/c/5951/19/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1621:  { false, -1000, 5, 0 }}},
could you add a comment saying how many fives that is. I counted 38, so why 
does this overflow?


http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS21, Line 110: abs
why not std:abs() here?


-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage

2017-02-22 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 958: ReservoirSampleState
> Why do you think a vector-like interface is not good? It seems simple and i
Because this is not a vector or a list, it's state for reservoir sampling which 
contains number of samples, the source size, a RNG, *and* a list of samples. 
Putting methods on this that make it look like a vector is confusing. This 
isn't even a struct anymore IMO.


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 7 of "Cloudera Manager" removal

2017-02-22 Thread Laurel Hale (Code Review)
Laurel Hale has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6108

Change subject: IMPALA-3401 [DOCS] Phase 7 of "Cloudera Manager" removal
..

IMPALA-3401 [DOCS] Phase 7 of "Cloudera Manager" removal

Removed the topic "impala_howto_rm.xml" from the build
because it was focused on resource management by using
Cloudera Manager. Commented it out of impala.ditamap,
impala_keydefs.ditamap, and from a reference to it
in shared/impala_common.xml. This removes it
completely from the rendered upstream docs. A later
project will remove the source XML.

Change-Id: I26fc661144ba906828d5f6b5b2ea2eca02693369
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
3 files changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6108/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I26fc661144ba906828d5f6b5b2ea2eca02693369
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laurel Hale 


[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer

2017-02-22 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6107

Change subject: IMPALA-3079: Fix sequence file writer
..

IMPALA-3079: Fix sequence file writer

Before the fix, sequence file writer produced corrupt files in some
cases. Steps to reproduce:
  SET ALLOW_UNSUPPORTED_FORMATS=1;

  create table store_sales_seq_snap like tpcds_parquet.store_sales
  stored as SEQUENCEFILE;

  insert into store_sales_seq_snap partition(ss_sold_date_sk)
  select * from tpcds_parquet.store_sales
  where ss_sold_date_sk between 2450816 and 2451200;

The insert statement produces a corrupt file that cannot be read back.

The fix corrects the implementation of zero-compressed encoding in
ReadWriteUtil class. It also fixes the calculation of block sizes in
SnappyBlockCompressor::ProcessBlock().

Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
---
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/compress.cc
M be/src/util/decompress-test.cc
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M tests/query_test/test_compressed_formats.py
6 files changed, 119 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function

2017-02-22 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#2).

Change subject: IMPALA-4848: Add WIDHT_BUCKET() function
..

IMPALA-4848: Add WIDHT_BUCKET() function

Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
---
M be/src/exprs/expr-test.cc
M be/src/exprs/math-functions-ir.cc
M be/src/exprs/math-functions.h
M common/function-registry/impala_functions.py
4 files changed, 61 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: anujphadke