[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
anujphadke has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: Line 70: > We can make this more concise: Done Line 74: self.run_test_case('QueryTest/rows-read', vector) > Might as well make this a separate test method, e.g. test_rows_read(). Or a Done -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool
Tim Armstrong has uploaded a new patch set (#10). Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool .. IMPALA-5073: Part 1: add option to use mmap() for buffer pool Support allocating with mmap instead of TCMalloc to give more control over memory usage. Also tell Linux to back larger buffers with huge pages when possible to reduce TLB pressure. The main complication is that memory returned by mmap() is not necessarily aligned to a huge page boundary, so we need to "fix up" the mapping ourselves. Adds additional memory metrics, since we previously relied on the assumption that all memory was allocated through TCMalloc. memory.total-used tracks the total across the buffer pool and TCMalloc. When the buffer pool is not present, they just report the TCMalloc values. This can be enabled with the --mmap_buffers flag. The transparent huge pages support can be disabled with the --madvise_huge_pages startup flag. At some point this should become the default, but it requires more work to validate perf and resource used (virtual address space, etc). Testing: Added some unit tests to test edge cases and the different supported flags. Many pre-existing tests also exercise the modified code. Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6 --- M be/src/catalog/catalogd-main.cc M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/system-allocator.cc M be/src/runtime/bufferpool/system-allocator.h M be/src/runtime/exec-env.cc M be/src/statestore/statestored-main.cc M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M be/src/util/metrics-test.cc M be/src/util/metrics.h M common/thrift/generate_error_codes.py M common/thrift/metrics.json 15 files changed, 396 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/6474/10 -- To view, visit http://gerrit.cloudera.org:8080/6474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Lars Volker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types .. Patch Set 4: (19 comments) Thank you for your comments. Please see my replies and PS4. http://gerrit.cloudera.org:8080/#/c/6563/2//COMMIT_MSG Commit Message: Line 25: test using that file. > We'll probably need to check in some files written with the old stats, righ I created a file using Hive and added tests to read from it. http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 541: bool stats_read = false; > bad variable name: this is a plain-encoded value. thrift_stats sounds like Done Line 546: if (fn_name == "lt" || fn_name == "le") { > why did you break this up instead of having ReadFromThrift do the extra wor Reworked it so it resembles the old flow. The additional check whether to read deprecated stats is now inside ReadFromThrift. Line 556: } > explicit comparison Done Line 559: } > Are you going to fix that in the CR? We don't handle unsigned columns in parquet files at all, so I'm currently leaning towards not handling unsigned stats, too. It looks like PR46 will not contain the ColumnOrder field after all. We should revisit this when we add support for unsigned logical types to the Parquet scanners. http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 31: /// This class, together with its derivatives, is used to compute column statistics when > track is really not a meaningful term here, it generally just means 'follow Done Line 36: /// We currently support writing the 'min_value' and 'max_value' fields in > hopefully also min and max, no? tracking means reading/decoding. Done Line 44: /// - Numeric values (BOOLEAN, INT, FLOAT, DOUBLE, DECIMAL) are ordered by their numeric > Add a comment to describe ordering for strings, to be consistent with speci Done Line 46: /// > why is that still not the case? Done Line 66: virtual ~ColumnStatsBase() {} > you changed the type and meaning of a parameter, but you didn't change the Done Line 72: static bool ReadFromThrift(const parquet::ColumnChunk& col_chunk, > unclear what this does, because copies are usually returned or passed into Done Line 146: virtual int64_t BytesNeeded() const override; > why can't this be collapsed into a 2-level hierarchy? Done Line 150: /// Encodes a single value using parquet's PLAIN encoding and stores it into the > I'm confused why we need a three-level hierarchy with additional template s Done Line 151: /// binary string 'out'. > protected follows public section Done http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-column-stats.inline.h File be/src/exec/parquet-column-stats.inline.h: Line 34: inline void ColumnStats::Update(const T& v) { > hard to compare, please move the functions back to where they were. feel fr Done Line 103: inline void ColumnStats::EncodeValueToString( > Fixed? Yes, this seems to be the way to go. http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: Line 243: string version_string = tokens[2]; > col_idx is unused. We want to interpret the statistics based on the type of the column from our metadata, which cannot be inferred from the parquet column type alone. It seems safer to me to rely on our internal types here. Checking that the types found in the parquet file are compatible is done in parquet-metadata-utils. http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-metadata-utils.h File be/src/exec/parquet-metadata-utils.h: Line 60: /// (..). Unspecified parts default to 0, and extra parts are > this sounds like it's doing some reading. Done http://gerrit.cloudera.org:8080/#/c/6563/2/common/thrift/parquet.thrift File common/thrift/parquet.thrift: Line 344: BROTLI = 4; > do we return an error when we see that codec? Yes, we would catch that in ParquetMetadataUtils::ValidateColumn() where we validate the CompressionCodec against a list of supported codecs. -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Lars Volker has uploaded a new patch set (#4). Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types .. IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types This change adds functionality to write parquet::Statistics for Decimal, String, and Timestamp values. It also switches from using the deprecated fields 'min' and 'max' to populate the new fields 'min_value' and 'max_value' in parquet::Statistics, that were added in parquet-format PR change. The HdfsParquetScanner will preferably read the new fields if they are populated. For tables with only the old fields populated, it will read them only if they are of simple numeric type, i.e. boolean, integer, or floating point. This change removes the comparison of the Parquet Statistics we write to Hive from the tests, since Hive does not write the new fields. Instead it adds a parquet file written by Hive that uses the deprecated fields for its statistics. It exercises the fallback logic for supported in a test using that file. Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M common/thrift/parquet.thrift M testdata/data/README A testdata/data/deprecated_statistics.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py M tests/query_test/test_parquet_stats.py 14 files changed, 686 insertions(+), 187 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/4 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool
Dan Hecht has posted comments on this change. Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool .. Patch Set 9: (4 comments) http://gerrit.cloudera.org:8080/#/c/6474/9/be/src/runtime/bufferpool/system-allocator.cc File be/src/runtime/bufferpool/system-allocator.cc: PS9, Line 27: Impala will maybe delete that (to future proof in case this code is used in other applications). Line 96: // code if we are compiling against an older kernel. nit: the indentation got weird here http://gerrit.cloudera.org:8080/#/c/6474/9/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: Line 161: value_ = pool.peak_committed; why format this one differently? PS9, Line 174: allocated system-allocated, or was there a reason you didn't rename here too? -- To view, visit http://gerrit.cloudera.org:8080/6474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site
Jim Apple has submitted this change and it was merged. Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site .. IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site Add Impala docs from branch master, commit hash 68f32e52bc42bef578330a4fe0edc5b292891eea. This is the last commit made by JRussell in the cleanup project. Removed both HTML files from the shared folder (ImpalaVariables.html and impala_common.html) Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead Reviewed-on: http://gerrit.cloudera.org:8080/6604 Reviewed-by: Michael BrownReviewed-by: Jim Apple Tested-by: Jim Apple --- A docs/build/html/commonltr.css A docs/build/html/commonrtl.css A docs/build/html/images/impala_arch.jpeg A docs/build/html/index.html A docs/build/html/topics/impala_abort_on_default_limit_exceeded.html A docs/build/html/topics/impala_abort_on_error.html A docs/build/html/topics/impala_admin.html A docs/build/html/topics/impala_admission.html A docs/build/html/topics/impala_aggregate_functions.html A docs/build/html/topics/impala_aliases.html A docs/build/html/topics/impala_allow_unsupported_formats.html A docs/build/html/topics/impala_alter_table.html A docs/build/html/topics/impala_alter_view.html A docs/build/html/topics/impala_analytic_functions.html A docs/build/html/topics/impala_appx_count_distinct.html A docs/build/html/topics/impala_appx_median.html A docs/build/html/topics/impala_array.html A docs/build/html/topics/impala_auditing.html A docs/build/html/topics/impala_authentication.html A docs/build/html/topics/impala_authorization.html A docs/build/html/topics/impala_avg.html A docs/build/html/topics/impala_avro.html A docs/build/html/topics/impala_batch_size.html A docs/build/html/topics/impala_bigint.html A docs/build/html/topics/impala_bit_functions.html A docs/build/html/topics/impala_boolean.html A docs/build/html/topics/impala_breakpad.html A docs/build/html/topics/impala_char.html A docs/build/html/topics/impala_cluster_sizing.html A docs/build/html/topics/impala_comments.html A docs/build/html/topics/impala_complex_types.html A docs/build/html/topics/impala_components.html A docs/build/html/topics/impala_compression_codec.html A docs/build/html/topics/impala_compute_stats.html A docs/build/html/topics/impala_concepts.html A docs/build/html/topics/impala_conditional_functions.html A docs/build/html/topics/impala_config.html A docs/build/html/topics/impala_config_options.html A docs/build/html/topics/impala_config_performance.html A docs/build/html/topics/impala_connecting.html A docs/build/html/topics/impala_conversion_functions.html A docs/build/html/topics/impala_count.html A docs/build/html/topics/impala_create_database.html A docs/build/html/topics/impala_create_function.html A docs/build/html/topics/impala_create_role.html A docs/build/html/topics/impala_create_table.html A docs/build/html/topics/impala_create_view.html A docs/build/html/topics/impala_databases.html A docs/build/html/topics/impala_datatypes.html A docs/build/html/topics/impala_datetime_functions.html A docs/build/html/topics/impala_ddl.html A docs/build/html/topics/impala_debug_action.html A docs/build/html/topics/impala_decimal.html A docs/build/html/topics/impala_default_order_by_limit.html A docs/build/html/topics/impala_delegation.html A docs/build/html/topics/impala_delete.html A docs/build/html/topics/impala_describe.html A docs/build/html/topics/impala_development.html A docs/build/html/topics/impala_disable_codegen.html A docs/build/html/topics/impala_disable_row_runtime_filtering.html A docs/build/html/topics/impala_disable_streaming_preaggregations.html A docs/build/html/topics/impala_disable_unsafe_spills.html A docs/build/html/topics/impala_disk_space.html A docs/build/html/topics/impala_distinct.html A docs/build/html/topics/impala_dml.html A docs/build/html/topics/impala_double.html A docs/build/html/topics/impala_drop_database.html A docs/build/html/topics/impala_drop_function.html A docs/build/html/topics/impala_drop_role.html A docs/build/html/topics/impala_drop_stats.html A docs/build/html/topics/impala_drop_table.html A docs/build/html/topics/impala_drop_view.html A docs/build/html/topics/impala_exec_single_node_rows_threshold.html A docs/build/html/topics/impala_explain.html A docs/build/html/topics/impala_explain_level.html A docs/build/html/topics/impala_explain_plan.html A docs/build/html/topics/impala_faq.html A docs/build/html/topics/impala_file_formats.html A docs/build/html/topics/impala_fixed_issues.html A docs/build/html/topics/impala_float.html A docs/build/html/topics/impala_functions.html A docs/build/html/topics/impala_functions_overview.html A docs/build/html/topics/impala_grant.html A docs/build/html/topics/impala_group_by.html A docs/build/html/topics/impala_group_concat.html A
[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site
Jim Apple has posted comments on this change. Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site .. Patch Set 3: Code-Review+2 Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Laurel HaleGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site
Michael Brown has posted comments on this change. Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Laurel HaleGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool
Tim Armstrong has uploaded a new patch set (#9). Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool .. IMPALA-5073: Part 1: add option to use mmap() for buffer pool Support allocating with mmap instead of TCMalloc to give more control over memory usage. Also tell Linux to back larger buffers with huge pages when possible to reduce TLB pressure. The main complication is that memory returned by mmap() is not necessarily aligned to a huge page boundary, so we need to "fix up" the mapping ourselves. Adds additional memory metrics, since we previously relied on the assumption that all memory was allocated through TCMalloc. memory.total-used tracks the total across the buffer pool and TCMalloc. When the buffer pool is not present, they just report the TCMalloc values. This can be enabled with the --mmap_buffers flag. The transparent huge pages support can be disabled with the --madvise_huge_pages startup flag. At some point this should become the default, but it requires more work to validate perf and resource used (virtual address space, etc). Testing: Added some unit tests to test edge cases and the different supported flags. Many pre-existing tests also exercise the modified code. Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6 --- M be/src/catalog/catalogd-main.cc M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/system-allocator.cc M be/src/runtime/bufferpool/system-allocator.h M be/src/runtime/exec-env.cc M be/src/statestore/statestored-main.cc M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M be/src/util/metrics-test.cc M be/src/util/metrics.h M common/thrift/generate_error_codes.py M common/thrift/metrics.json 15 files changed, 398 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/6474/9 -- To view, visit http://gerrit.cloudera.org:8080/6474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5073: Part 1: add option to use mmap() for buffer pool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5073: Part 1: add option to use mmap() for buffer pool .. Patch Set 8: (3 comments) http://gerrit.cloudera.org:8080/#/c/6474/8/be/src/runtime/bufferpool/system-allocator.cc File be/src/runtime/bufferpool/system-allocator.cc: Line 54: RETURN_IF_ERROR(AllocateViaMMap(len, _mem)); I added an option to enable/disable this path and disabled it by default - I don't have enough confidence right now that bypassing TCMalloc will have no undesirable side-effects. Getting this in will allow us to do experiments and defer the decision to turn it on. http://gerrit.cloudera.org:8080/#/c/6474/8/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: PS8, Line 161: TOTAL_ALLOCATED > how about SYSTEM_ALLOCATED? Done PS8, Line 165: TOTAL_RESERVED > maybe just RESERVED, but I don't feel too strongly. Done -- To view, visit http://gerrit.cloudera.org:8080/6474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.
Laurel Hale has posted comments on this change. Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources. .. Patch Set 2: > (1 comment) ok--I'll repush -- To view, visit http://gerrit.cloudera.org:8080/6604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Laurel HaleGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.
Michael Brown has posted comments on this change. Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources. .. Patch Set 2: (1 comment) Laurel, HTML and PDF look good, but see comment. http://gerrit.cloudera.org:8080/#/c/6604/2//COMMIT_MSG Commit Message: PS2, Line 7: IMPALA-4181 [DOCS] Publish rendered Impala : documentation to ASF resources. The commit header should only be 1 line. This is so commands like "git log --oneline" can identify commits easily and without clutter. That means the header shouldn't take up multiple lines or be too long. You're really close to a good commit message. What about something like this for the header? IMPALA-4181: [DOCS] Publish rendered Impala documentation to ASF site That way the header is 1 line, and short (notice it stays to the left of the orangey vertical dotted line when you view the commit message in Gerrit). The rest of the commit message is fine. -- To view, visit http://gerrit.cloudera.org:8080/6604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Laurel HaleGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Lars Volker has uploaded a new patch set (#3). Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types .. IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types This change adds functionality to write parquet::Statistics for Decimal, String, and Timestamp values. It also switches from using the deprecated fields 'min' and 'max' to populate the new fields 'min_value' and 'max_value' in parquet::Statistics, that were added in parquet-format PR change. The HdfsParquetScanner will preferably read the new fields if they are populated. For tables with only the old fields populated, it will read them only if they are of simple numeric type, i.e. boolean, integer, or floating point. This change removes the comparison of the Parquet Statistics we write to Hive from the tests, since Hive does not write the new fields. Instead it adds a parquet file written by Hive that uses the deprecated fields for its statistics. It exercises the fallback logic for supported in a test using that file. Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M common/thrift/parquet.thrift M testdata/data/README A testdata/data/deprecated_statistics.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py M tests/query_test/test_parquet_stats.py 14 files changed, 676 insertions(+), 186 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/3 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.
Jim Apple has posted comments on this change. Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6604/1//COMMIT_MSG Commit Message: Line 7: IMPALA-4181 [DOCS] Publish rendered Impala > Please use the following format for git commit messages The first paragraph is always only one line long -- To view, visit http://gerrit.cloudera.org:8080/6604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Laurel HaleGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources.
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6604 to look at the new patch set (#2). Change subject: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources. .. IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF resources. Add Impala docs from branch master, commit hash 68f32e52bc42bef578330a4fe0edc5b292891eea. This is the last commit made by JRussell in the cleanup project. Removed both HTML files from the shared folder (ImpalaVariables.html and impala_common.html) Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead --- A docs/build/html/commonltr.css A docs/build/html/commonrtl.css A docs/build/html/images/impala_arch.jpeg A docs/build/html/index.html A docs/build/html/topics/impala_abort_on_default_limit_exceeded.html A docs/build/html/topics/impala_abort_on_error.html A docs/build/html/topics/impala_admin.html A docs/build/html/topics/impala_admission.html A docs/build/html/topics/impala_aggregate_functions.html A docs/build/html/topics/impala_aliases.html A docs/build/html/topics/impala_allow_unsupported_formats.html A docs/build/html/topics/impala_alter_table.html A docs/build/html/topics/impala_alter_view.html A docs/build/html/topics/impala_analytic_functions.html A docs/build/html/topics/impala_appx_count_distinct.html A docs/build/html/topics/impala_appx_median.html A docs/build/html/topics/impala_array.html A docs/build/html/topics/impala_auditing.html A docs/build/html/topics/impala_authentication.html A docs/build/html/topics/impala_authorization.html A docs/build/html/topics/impala_avg.html A docs/build/html/topics/impala_avro.html A docs/build/html/topics/impala_batch_size.html A docs/build/html/topics/impala_bigint.html A docs/build/html/topics/impala_bit_functions.html A docs/build/html/topics/impala_boolean.html A docs/build/html/topics/impala_breakpad.html A docs/build/html/topics/impala_char.html A docs/build/html/topics/impala_cluster_sizing.html A docs/build/html/topics/impala_comments.html A docs/build/html/topics/impala_complex_types.html A docs/build/html/topics/impala_components.html A docs/build/html/topics/impala_compression_codec.html A docs/build/html/topics/impala_compute_stats.html A docs/build/html/topics/impala_concepts.html A docs/build/html/topics/impala_conditional_functions.html A docs/build/html/topics/impala_config.html A docs/build/html/topics/impala_config_options.html A docs/build/html/topics/impala_config_performance.html A docs/build/html/topics/impala_connecting.html A docs/build/html/topics/impala_conversion_functions.html A docs/build/html/topics/impala_count.html A docs/build/html/topics/impala_create_database.html A docs/build/html/topics/impala_create_function.html A docs/build/html/topics/impala_create_role.html A docs/build/html/topics/impala_create_table.html A docs/build/html/topics/impala_create_view.html A docs/build/html/topics/impala_databases.html A docs/build/html/topics/impala_datatypes.html A docs/build/html/topics/impala_datetime_functions.html A docs/build/html/topics/impala_ddl.html A docs/build/html/topics/impala_debug_action.html A docs/build/html/topics/impala_decimal.html A docs/build/html/topics/impala_default_order_by_limit.html A docs/build/html/topics/impala_delegation.html A docs/build/html/topics/impala_delete.html A docs/build/html/topics/impala_describe.html A docs/build/html/topics/impala_development.html A docs/build/html/topics/impala_disable_codegen.html A docs/build/html/topics/impala_disable_row_runtime_filtering.html A docs/build/html/topics/impala_disable_streaming_preaggregations.html A docs/build/html/topics/impala_disable_unsafe_spills.html A docs/build/html/topics/impala_disk_space.html A docs/build/html/topics/impala_distinct.html A docs/build/html/topics/impala_dml.html A docs/build/html/topics/impala_double.html A docs/build/html/topics/impala_drop_database.html A docs/build/html/topics/impala_drop_function.html A docs/build/html/topics/impala_drop_role.html A docs/build/html/topics/impala_drop_stats.html A docs/build/html/topics/impala_drop_table.html A docs/build/html/topics/impala_drop_view.html A docs/build/html/topics/impala_exec_single_node_rows_threshold.html A docs/build/html/topics/impala_explain.html A docs/build/html/topics/impala_explain_level.html A docs/build/html/topics/impala_explain_plan.html A docs/build/html/topics/impala_faq.html A docs/build/html/topics/impala_file_formats.html A docs/build/html/topics/impala_fixed_issues.html A docs/build/html/topics/impala_float.html A docs/build/html/topics/impala_functions.html A docs/build/html/topics/impala_functions_overview.html A docs/build/html/topics/impala_grant.html A docs/build/html/topics/impala_group_by.html A docs/build/html/topics/impala_group_concat.html A docs/build/html/topics/impala_hadoop.html A docs/build/html/topics/impala_having.html A
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. Patch Set 4: (2 comments) Only minor style comments remaining. http://gerrit.cloudera.org:8080/#/c/6522/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: Line 70: if new_vector.get_value('table_format').file_format == 'kudu': We can make this more concise: in ('kudu, 'hbase') Line 74: self.run_test_case('QueryTest/rows-read', new_vector) Might as well make this a separate test method, e.g. test_rows_read(). Or actually that seems very specific. Maybe rename to something like test_hdfs_scanner_profile/hdfs-scanner-profile. -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#7). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 26 files changed, 665 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has posted comments on this change. Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: Line 113: "Invalid time zone in the '%s' table property: %s", > double 'the' Done http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 665: > Let's move all the CREATE/ALTER tests into a separate TestParquetMrInt96Wri Done Line 1882: "' '", "URI path cannot be empty."); > easier to read single quotes Done Line 1904: type == PrimitiveType.VARCHAR) { > easier to read single quotes Done http://gerrit.cloudera.org:8080/#/c/5939/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py: Line 105: parquet_path = get_fs_path( > What does "fn" stand for? I'm thinking "file name", but this is not just a Renamed it to 'parquet_path' Line 123: ON i.id = h.id AND i.day = h.day -- serves as a unique key > easier to read with the alias 'i' next to the table Done Line 125: (h.timestamp_col IS NULL) != (i.timestamp_col IS NULL) > simplify the first two conditions with: Done. Had to put round brackets around col IS NULL expressions to avoid analysis errors. http://gerrit.cloudera.org:8080/#/c/5939/6/tests/query_test/test_parquet_timestamp_compatibility.py File tests/query_test/test_parquet_timestamp_compatibility.py: Line 78: def test_invalid_parquet_mr_write_zone(self, vector, unique_database): > test_invalid_parquet_mr_write_zone Done Line 118: # tz_name conversion on the timestamp values. > extra space after "triggers" Done -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Attila Jeges has uploaded a new patch set (#7). Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet .. IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet Before this change: Hive adjusts timestamps by subtracting the local time zone's offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive. After this change: Impala reads Parquet MR timestamp data and adjusts values using a time zone from a table property (parquet.mr.int96.write.zone), if set, and will not adjust it if the property is absent. No adjustment will be applied to data written by Impala. New tables created by Impala will set the table property to UTC if the global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the table property unless the global flag is set to true. Tables created using CREATE TABLE LIKE will copy the property of the table that is copied. This change also affects the way Impala deals with --convert_legacy_hive_parquet_utc_timestamps global flag (introduced in IMPALA-1658). The flag will be taken into account only if parquet.mr.int96.write.zone table property is not set and ignored otherwise. Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py A tests/query_test/test_parquet_timestamp_compatibility.py 26 files changed, 665 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/7 -- To view, visit http://gerrit.cloudera.org:8080/5939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs
Alex Behm has posted comments on this change. Change subject: IMPALA-5180: Don't balk at non-deterministic exprs .. Patch Set 3: (8 comments) Fix looks much better http://gerrit.cloudera.org:8080/#/c/6575/3//COMMIT_MSG Commit Message: Line 7: IMPALA-5180: Don't balk at non-deterministic exprs Msg is cryptic. Can you summarize what is changed in this patch, as opposed to what bug/behavior is fixed? Line 9: HDFS partition pruning can run afoul of our logic of considering Not sure what "run afoul" means, please be specific. http://gerrit.cloudera.org:8080/#/c/6575/3/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: Line 1029: public boolean isBoundByTupleIds(List tids) { What about this? Our isBound() family of functions should be consistent. Line 1037:* Returns true if expr is fully bound by slotIds and contains at least Not your change, but can you modify the comment to explain what "bound" means? The concept has caused a lot of confusion in the past especially to newcomers. Also, please split up the comment into more sentences. Line 1045: HashSet idSet = Sets.newHashSet(); idSet -> exprSlotIds Line 1046: int members = 0; Why is this needed? Do you mean Preconditions.checkState(!exprSlotIds.isEmpty())? Line 1047: this.getIdsHelper(null, idSet); remove 'this' http://gerrit.cloudera.org:8080/#/c/6575/3/testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test File testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test: Line 1026: select count(*) from keep this query as simple as possible, i.e., same as above -- To view, visit http://gerrit.cloudera.org:8080/6575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes