[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-12 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 70: 
> We can make this more concise:
Done


Line 74: self.run_test_case('QueryTest/rows-read', vector)
> Might as well make this a separate test method, e.g. test_rows_read(). Or a
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool

2017-04-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#10).

Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool
..

IMPALA-5073: Part 1: add option to use mmap() for buffer pool

Support allocating with mmap instead of TCMalloc to give more control
over memory usage. Also tell Linux to back larger buffers with huge
pages when possible to reduce TLB pressure. The main complication is
that memory returned by mmap() is not necessarily aligned to a huge
page boundary, so we need to "fix up" the mapping ourselves.

Adds additional memory metrics, since we previously relied on the
assumption that all memory was allocated through TCMalloc.
memory.total-used tracks the total across the buffer pool and
TCMalloc. When the buffer pool is not present, they just report
the TCMalloc values.

This can be enabled with the --mmap_buffers flag. The transparent
huge pages support can be disabled with the --madvise_huge_pages
startup flag.

At some point this should become the default, but it requires
more work to validate perf and resource used (virtual address
space, etc).

Testing:
Added some unit tests to test edge cases and the different supported
flags. Many pre-existing tests also exercise the modified code.

Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
---
M be/src/catalog/catalogd-main.cc
M be/src/runtime/bufferpool/buffer-allocator-test.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/system-allocator.cc
M be/src/runtime/bufferpool/system-allocator.h
M be/src/runtime/exec-env.cc
M be/src/statestore/statestored-main.cc
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M be/src/util/metrics-test.cc
M be/src/util/metrics.h
M common/thrift/generate_error_codes.py
M common/thrift/metrics.json
15 files changed, 396 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/6474/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types

2017-04-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet 
Statistics for remaining types
..


Patch Set 4:

(19 comments)

Thank you for your comments. Please see my replies and PS4.

http://gerrit.cloudera.org:8080/#/c/6563/2//COMMIT_MSG
Commit Message:

Line 25: test using that file.
> We'll probably need to check in some files written with the old stats, righ
I created a file using Hive and added tests to read from it.


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 541: bool stats_read = false;
> bad variable name: this is a plain-encoded value. thrift_stats sounds like 
Done


Line 546: if (fn_name == "lt" || fn_name == "le") {
> why did you break this up instead of having ReadFromThrift do the extra wor
Reworked it so it resembles the old flow. The additional check whether to read 
deprecated stats is now inside ReadFromThrift.


Line 556: }
> explicit comparison
Done


Line 559:   }
> Are you going to fix that in the CR?
We don't handle unsigned columns in parquet files at all, so I'm currently 
leaning towards not handling unsigned stats, too. It looks like PR46 will not 
contain the ColumnOrder field after all. We should revisit this when we add 
support for unsigned logical types to the Parquet scanners.


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: /// This class, together with its derivatives, is used to compute 
column statistics when
> track is really not a meaningful term here, it generally just means 'follow
Done


Line 36: /// We currently support writing the 'min_value' and 'max_value' 
fields in
> hopefully also min and max, no? tracking means reading/decoding.
Done


Line 44: /// - Numeric values (BOOLEAN, INT, FLOAT, DOUBLE, DECIMAL) are 
ordered by their numeric
> Add a comment to describe ordering for strings, to be consistent with speci
Done


Line 46: ///
> why is that still not the case?
Done


Line 66:   virtual ~ColumnStatsBase() {}
> you changed the type and meaning of a parameter, but you didn't change the 
Done


Line 72:   static bool ReadFromThrift(const parquet::ColumnChunk& col_chunk,
> unclear what this does, because copies are usually returned or passed into 
Done


Line 146:   virtual int64_t BytesNeeded() const override;
> why can't this be collapsed into a 2-level hierarchy?
Done


Line 150:   /// Encodes a single value using parquet's PLAIN encoding and 
stores it into the
> I'm confused why we need a three-level hierarchy with additional template s
Done


Line 151:   /// binary string 'out'.
> protected follows public section
Done


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

Line 34: inline void ColumnStats::Update(const T& v) {
> hard to compare, please move the functions back to where they were. feel fr
Done


Line 103: inline void ColumnStats::EncodeValueToString(
> Fixed?
Yes, this seems to be the way to go.


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

Line 243: string version_string = tokens[2];
> col_idx is unused.
We want to interpret the statistics based on the type of the column from our 
metadata, which cannot be inferred from the parquet column type alone. It seems 
safer to me to rely on our internal types here. Checking that the types found 
in the parquet file are compatible is done in parquet-metadata-utils.


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-metadata-utils.h
File be/src/exec/parquet-metadata-utils.h:

Line 60:   /// (..). Unspecified parts default to 0, and 
extra parts are
> this sounds like it's doing some reading.
Done


http://gerrit.cloudera.org:8080/#/c/6563/2/common/thrift/parquet.thrift
File common/thrift/parquet.thrift:

Line 344:   BROTLI = 4;
> do we return an error when we see that codec?
Yes, we would catch that in ParquetMetadataUtils::ValidateColumn() where we 
validate the CompressionCodec against a list of supported codecs.


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types

2017-04-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#4).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet 
Statistics for remaining types
..

IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for 
remaining types

This change adds functionality to write parquet::Statistics for Decimal,
String, and Timestamp values.

It also switches from using the deprecated fields 'min' and 'max' to
populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, that were added in parquet-format PR change.

The HdfsParquetScanner will preferably read the new fields if they are
populated. For tables with only the old fields populated, it will read
them only if they are of simple numeric type, i.e. boolean, integer, or
floating point.

This change removes the comparison of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It exercises the fallback logic for supported in a
test using that file.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_parquet_stats.py
14 files changed, 686 insertions(+), 187 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool

2017-04-12 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6474/9/be/src/runtime/bufferpool/system-allocator.cc
File be/src/runtime/bufferpool/system-allocator.cc:

PS9, Line 27: Impala will 
maybe delete that (to future proof in case this code is used in other 
applications).


Line 96: // code if we are compiling against an older kernel.
nit: the indentation got weird here


http://gerrit.cloudera.org:8080/#/c/6474/9/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

Line 161:   value_ = pool.peak_committed;
why format this one differently?


PS9, Line 174: allocated
system-allocated, or was there a reason you didn't rename here too?


-- 
To view, visit http://gerrit.cloudera.org:8080/6474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

2017-04-12 Thread Jim Apple (Code Review)
Jim Apple has submitted this change and it was merged.

Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF 
site
..


IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Add Impala docs from branch master, commit hash
68f32e52bc42bef578330a4fe0edc5b292891eea.
This is the last commit made by JRussell in the
cleanup project. Removed both HTML files from
the shared folder (ImpalaVariables.html and
impala_common.html)

Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
Reviewed-on: http://gerrit.cloudera.org:8080/6604
Reviewed-by: Michael Brown 
Reviewed-by: Jim Apple 
Tested-by: Jim Apple 
---
A docs/build/html/commonltr.css
A docs/build/html/commonrtl.css
A docs/build/html/images/impala_arch.jpeg
A docs/build/html/index.html
A docs/build/html/topics/impala_abort_on_default_limit_exceeded.html
A docs/build/html/topics/impala_abort_on_error.html
A docs/build/html/topics/impala_admin.html
A docs/build/html/topics/impala_admission.html
A docs/build/html/topics/impala_aggregate_functions.html
A docs/build/html/topics/impala_aliases.html
A docs/build/html/topics/impala_allow_unsupported_formats.html
A docs/build/html/topics/impala_alter_table.html
A docs/build/html/topics/impala_alter_view.html
A docs/build/html/topics/impala_analytic_functions.html
A docs/build/html/topics/impala_appx_count_distinct.html
A docs/build/html/topics/impala_appx_median.html
A docs/build/html/topics/impala_array.html
A docs/build/html/topics/impala_auditing.html
A docs/build/html/topics/impala_authentication.html
A docs/build/html/topics/impala_authorization.html
A docs/build/html/topics/impala_avg.html
A docs/build/html/topics/impala_avro.html
A docs/build/html/topics/impala_batch_size.html
A docs/build/html/topics/impala_bigint.html
A docs/build/html/topics/impala_bit_functions.html
A docs/build/html/topics/impala_boolean.html
A docs/build/html/topics/impala_breakpad.html
A docs/build/html/topics/impala_char.html
A docs/build/html/topics/impala_cluster_sizing.html
A docs/build/html/topics/impala_comments.html
A docs/build/html/topics/impala_complex_types.html
A docs/build/html/topics/impala_components.html
A docs/build/html/topics/impala_compression_codec.html
A docs/build/html/topics/impala_compute_stats.html
A docs/build/html/topics/impala_concepts.html
A docs/build/html/topics/impala_conditional_functions.html
A docs/build/html/topics/impala_config.html
A docs/build/html/topics/impala_config_options.html
A docs/build/html/topics/impala_config_performance.html
A docs/build/html/topics/impala_connecting.html
A docs/build/html/topics/impala_conversion_functions.html
A docs/build/html/topics/impala_count.html
A docs/build/html/topics/impala_create_database.html
A docs/build/html/topics/impala_create_function.html
A docs/build/html/topics/impala_create_role.html
A docs/build/html/topics/impala_create_table.html
A docs/build/html/topics/impala_create_view.html
A docs/build/html/topics/impala_databases.html
A docs/build/html/topics/impala_datatypes.html
A docs/build/html/topics/impala_datetime_functions.html
A docs/build/html/topics/impala_ddl.html
A docs/build/html/topics/impala_debug_action.html
A docs/build/html/topics/impala_decimal.html
A docs/build/html/topics/impala_default_order_by_limit.html
A docs/build/html/topics/impala_delegation.html
A docs/build/html/topics/impala_delete.html
A docs/build/html/topics/impala_describe.html
A docs/build/html/topics/impala_development.html
A docs/build/html/topics/impala_disable_codegen.html
A docs/build/html/topics/impala_disable_row_runtime_filtering.html
A docs/build/html/topics/impala_disable_streaming_preaggregations.html
A docs/build/html/topics/impala_disable_unsafe_spills.html
A docs/build/html/topics/impala_disk_space.html
A docs/build/html/topics/impala_distinct.html
A docs/build/html/topics/impala_dml.html
A docs/build/html/topics/impala_double.html
A docs/build/html/topics/impala_drop_database.html
A docs/build/html/topics/impala_drop_function.html
A docs/build/html/topics/impala_drop_role.html
A docs/build/html/topics/impala_drop_stats.html
A docs/build/html/topics/impala_drop_table.html
A docs/build/html/topics/impala_drop_view.html
A docs/build/html/topics/impala_exec_single_node_rows_threshold.html
A docs/build/html/topics/impala_explain.html
A docs/build/html/topics/impala_explain_level.html
A docs/build/html/topics/impala_explain_plan.html
A docs/build/html/topics/impala_faq.html
A docs/build/html/topics/impala_file_formats.html
A docs/build/html/topics/impala_fixed_issues.html
A docs/build/html/topics/impala_float.html
A docs/build/html/topics/impala_functions.html
A docs/build/html/topics/impala_functions_overview.html
A docs/build/html/topics/impala_grant.html
A docs/build/html/topics/impala_group_by.html
A docs/build/html/topics/impala_group_concat.html
A 

[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

2017-04-12 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF 
site
..


Patch Set 3: Code-Review+2 Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

2017-04-12 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF 
site
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool

2017-04-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool
..

IMPALA-5073: Part 1: add option to use mmap() for buffer pool

Support allocating with mmap instead of TCMalloc to give more control
over memory usage. Also tell Linux to back larger buffers with huge
pages when possible to reduce TLB pressure. The main complication is
that memory returned by mmap() is not necessarily aligned to a huge
page boundary, so we need to "fix up" the mapping ourselves.

Adds additional memory metrics, since we previously relied on the
assumption that all memory was allocated through TCMalloc.
memory.total-used tracks the total across the buffer pool and
TCMalloc. When the buffer pool is not present, they just report
the TCMalloc values.

This can be enabled with the --mmap_buffers flag. The transparent
huge pages support can be disabled with the --madvise_huge_pages
startup flag.

At some point this should become the default, but it requires
more work to validate perf and resource used (virtual address
space, etc).

Testing:
Added some unit tests to test edge cases and the different supported
flags. Many pre-existing tests also exercise the modified code.

Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
---
M be/src/catalog/catalogd-main.cc
M be/src/runtime/bufferpool/buffer-allocator-test.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/system-allocator.cc
M be/src/runtime/bufferpool/system-allocator.h
M be/src/runtime/exec-env.cc
M be/src/statestore/statestored-main.cc
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M be/src/util/metrics-test.cc
M be/src/util/metrics.h
M common/thrift/generate_error_codes.py
M common/thrift/metrics.json
15 files changed, 398 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/6474/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool

2017-04-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6474/8/be/src/runtime/bufferpool/system-allocator.cc
File be/src/runtime/bufferpool/system-allocator.cc:

Line 54: RETURN_IF_ERROR(AllocateViaMMap(len, _mem));
I added an option to enable/disable this path and disabled it by default - I 
don't have enough confidence right now that bypassing TCMalloc will have no 
undesirable side-effects. Getting this in will allow us to do experiments and 
defer the decision to turn it on.


http://gerrit.cloudera.org:8080/#/c/6474/8/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

PS8, Line 161: TOTAL_ALLOCATED
> how about SYSTEM_ALLOCATED?
Done


PS8, Line 165: TOTAL_RESERVED
> maybe just RESERVED, but I don't feel too strongly.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.

2017-04-12 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF 
resources.
..


Patch Set 2:

> (1 comment)

ok--I'll repush

-- 
To view, visit http://gerrit.cloudera.org:8080/6604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.

2017-04-12 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF 
resources.
..


Patch Set 2:

(1 comment)

Laurel, HTML and PDF look good, but see comment.

http://gerrit.cloudera.org:8080/#/c/6604/2//COMMIT_MSG
Commit Message:

PS2, Line 7: IMPALA-4181 [DOCS] Publish rendered Impala
   : documentation to ASF resources.
The commit header should only be 1 line. This is so commands like "git log 
--oneline" can identify commits easily and without clutter. That means the 
header shouldn't take up multiple lines or be too long.

You're really close to a good commit message.

What about something like this for the header?

  IMPALA-4181: [DOCS] Publish rendered Impala documentation to ASF site

That way the header is 1 line, and short (notice it stays to the left of the 
orangey vertical dotted line when you view the commit message in Gerrit).

The rest of the commit message is fine.


-- 
To view, visit http://gerrit.cloudera.org:8080/6604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types

2017-04-12 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#3).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet 
Statistics for remaining types
..

IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for 
remaining types

This change adds functionality to write parquet::Statistics for Decimal,
String, and Timestamp values.

It also switches from using the deprecated fields 'min' and 'max' to
populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, that were added in parquet-format PR change.

The HdfsParquetScanner will preferably read the new fields if they are
populated. For tables with only the old fields populated, it will read
them only if they are of simple numeric type, i.e. boolean, integer, or
floating point.

This change removes the comparison of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It exercises the fallback logic for supported in a
test using that file.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_parquet_stats.py
14 files changed, 676 insertions(+), 186 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.

2017-04-12 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF 
resources.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6604/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4181 [DOCS] Publish rendered Impala
> Please use the following format for git commit messages
The first paragraph is always only one line long


-- 
To view, visit http://gerrit.cloudera.org:8080/6604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Laurel Hale 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.

2017-04-12 Thread Laurel Hale (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6604

to look at the new patch set (#2).

Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF 
resources.
..

IMPALA-4181 [DOCS] Publish rendered Impala
documentation to ASF resources.

Add Impala docs from branch master, commit hash
68f32e52bc42bef578330a4fe0edc5b292891eea.
This is the last commit made by JRussell in the
cleanup project. Removed both HTML files from
the shared folder (ImpalaVariables.html and
impala_common.html)

Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
---
A docs/build/html/commonltr.css
A docs/build/html/commonrtl.css
A docs/build/html/images/impala_arch.jpeg
A docs/build/html/index.html
A docs/build/html/topics/impala_abort_on_default_limit_exceeded.html
A docs/build/html/topics/impala_abort_on_error.html
A docs/build/html/topics/impala_admin.html
A docs/build/html/topics/impala_admission.html
A docs/build/html/topics/impala_aggregate_functions.html
A docs/build/html/topics/impala_aliases.html
A docs/build/html/topics/impala_allow_unsupported_formats.html
A docs/build/html/topics/impala_alter_table.html
A docs/build/html/topics/impala_alter_view.html
A docs/build/html/topics/impala_analytic_functions.html
A docs/build/html/topics/impala_appx_count_distinct.html
A docs/build/html/topics/impala_appx_median.html
A docs/build/html/topics/impala_array.html
A docs/build/html/topics/impala_auditing.html
A docs/build/html/topics/impala_authentication.html
A docs/build/html/topics/impala_authorization.html
A docs/build/html/topics/impala_avg.html
A docs/build/html/topics/impala_avro.html
A docs/build/html/topics/impala_batch_size.html
A docs/build/html/topics/impala_bigint.html
A docs/build/html/topics/impala_bit_functions.html
A docs/build/html/topics/impala_boolean.html
A docs/build/html/topics/impala_breakpad.html
A docs/build/html/topics/impala_char.html
A docs/build/html/topics/impala_cluster_sizing.html
A docs/build/html/topics/impala_comments.html
A docs/build/html/topics/impala_complex_types.html
A docs/build/html/topics/impala_components.html
A docs/build/html/topics/impala_compression_codec.html
A docs/build/html/topics/impala_compute_stats.html
A docs/build/html/topics/impala_concepts.html
A docs/build/html/topics/impala_conditional_functions.html
A docs/build/html/topics/impala_config.html
A docs/build/html/topics/impala_config_options.html
A docs/build/html/topics/impala_config_performance.html
A docs/build/html/topics/impala_connecting.html
A docs/build/html/topics/impala_conversion_functions.html
A docs/build/html/topics/impala_count.html
A docs/build/html/topics/impala_create_database.html
A docs/build/html/topics/impala_create_function.html
A docs/build/html/topics/impala_create_role.html
A docs/build/html/topics/impala_create_table.html
A docs/build/html/topics/impala_create_view.html
A docs/build/html/topics/impala_databases.html
A docs/build/html/topics/impala_datatypes.html
A docs/build/html/topics/impala_datetime_functions.html
A docs/build/html/topics/impala_ddl.html
A docs/build/html/topics/impala_debug_action.html
A docs/build/html/topics/impala_decimal.html
A docs/build/html/topics/impala_default_order_by_limit.html
A docs/build/html/topics/impala_delegation.html
A docs/build/html/topics/impala_delete.html
A docs/build/html/topics/impala_describe.html
A docs/build/html/topics/impala_development.html
A docs/build/html/topics/impala_disable_codegen.html
A docs/build/html/topics/impala_disable_row_runtime_filtering.html
A docs/build/html/topics/impala_disable_streaming_preaggregations.html
A docs/build/html/topics/impala_disable_unsafe_spills.html
A docs/build/html/topics/impala_disk_space.html
A docs/build/html/topics/impala_distinct.html
A docs/build/html/topics/impala_dml.html
A docs/build/html/topics/impala_double.html
A docs/build/html/topics/impala_drop_database.html
A docs/build/html/topics/impala_drop_function.html
A docs/build/html/topics/impala_drop_role.html
A docs/build/html/topics/impala_drop_stats.html
A docs/build/html/topics/impala_drop_table.html
A docs/build/html/topics/impala_drop_view.html
A docs/build/html/topics/impala_exec_single_node_rows_threshold.html
A docs/build/html/topics/impala_explain.html
A docs/build/html/topics/impala_explain_level.html
A docs/build/html/topics/impala_explain_plan.html
A docs/build/html/topics/impala_faq.html
A docs/build/html/topics/impala_file_formats.html
A docs/build/html/topics/impala_fixed_issues.html
A docs/build/html/topics/impala_float.html
A docs/build/html/topics/impala_functions.html
A docs/build/html/topics/impala_functions_overview.html
A docs/build/html/topics/impala_grant.html
A docs/build/html/topics/impala_group_by.html
A docs/build/html/topics/impala_group_concat.html
A docs/build/html/topics/impala_hadoop.html
A docs/build/html/topics/impala_having.html
A 

[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 4:

(2 comments)

Only minor style comments remaining.

http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 70: if new_vector.get_value('table_format').file_format == 'kudu':
We can make this more concise:
  in ('kudu, 'hbase')


Line 74: self.run_test_case('QueryTest/rows-read', new_vector)
Might as well make this a separate test method, e.g. test_rows_read(). Or 
actually that seems very specific. Maybe rename to something like 
test_hdfs_scanner_profile/hdfs-scanner-profile.


-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-12 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 665 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-12 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 113:   "Invalid time zone in the '%s' table property: %s",
> double 'the'
Done


http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 665: 
> Let's move all the CREATE/ALTER tests into a separate TestParquetMrInt96Wri
Done


Line 1882: "'  '", "URI path cannot be empty.");
> easier to read single quotes
Done


Line 1904:   type == PrimitiveType.VARCHAR) {
> easier to read single quotes
Done


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

Line 105: parquet_path = get_fs_path(
> What does "fn" stand for? I'm thinking "file name", but this is not just a 
Renamed it to 'parquet_path'


Line 123:   ON i.id = h.id AND i.day = h.day  -- serves as a unique key
> easier to read with the alias 'i' next to the table
Done


Line 125:   (h.timestamp_col IS NULL) != (i.timestamp_col IS NULL)
> simplify the first two conditions with:
Done. Had to put round brackets around col IS NULL expressions to avoid 
analysis errors.


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/query_test/test_parquet_timestamp_compatibility.py
File tests/query_test/test_parquet_timestamp_compatibility.py:

Line 78:   def test_invalid_parquet_mr_write_zone(self, vector, 
unique_database):
> test_invalid_parquet_mr_write_zone
Done


Line 118:   # tz_name conversion on the timestamp values.
> extra space after "triggers"
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-04-12 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New tables created by Impala will set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true.

Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not
set the table property unless the global flag is set to true.

Tables created using CREATE TABLE LIKE  will copy the
property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 665 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

2017-04-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
..


Patch Set 3:

(8 comments)

Fix looks much better

http://gerrit.cloudera.org:8080/#/c/6575/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5180: Don't balk at non-deterministic exprs
Msg is cryptic. Can you summarize what is changed in this patch, as opposed to 
what bug/behavior is fixed?


Line 9: HDFS partition pruning can run afoul of our logic of considering
Not sure what "run afoul" means, please be specific.


http://gerrit.cloudera.org:8080/#/c/6575/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1029:   public boolean isBoundByTupleIds(List tids) {
What about this? Our isBound() family of functions should be consistent.


Line 1037:* Returns true if expr is fully bound by slotIds and contains at 
least
Not your change, but can you modify the comment to explain what "bound" means? 
The concept has caused a lot of confusion in the past especially to newcomers.
Also, please split up the comment into more sentences.


Line 1045: HashSet idSet = Sets.newHashSet();
idSet -> exprSlotIds


Line 1046: int members = 0;
Why is this needed? Do you mean 
Preconditions.checkState(!exprSlotIds.isEmpty())?


Line 1047: this.getIdsHelper(null, idSet);
remove 'this'


http://gerrit.cloudera.org:8080/#/c/6575/3/testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
File testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test:

Line 1026: select count(*) from
keep this query as simple as possible, i.e., same as above


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes