[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 10: Code-Review+2 PS10 is a rebase. Carrying Marcel's +2. -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 10: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/295/ -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 9: PS9 addresses a clang-tidy error. -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Hello Marcel Kornacker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6032 to look at the new patch set (#9). Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the formsor , where is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h A be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exprs/expr.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py 32 files changed, 979 insertions(+), 162 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/9 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
Impala Public Jenkins has posted comments on this change. Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
Impala Public Jenkins has submitted this change and it was merged. Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch references to Cloudera and CDH in Impala tutorial There was one tutorial that actually ran under the 'cloudera' user and so repeated that name over and over in directory and HDFS paths. I switched that to 'username'. I suppressed some and tags with Cloudera Manager-specific details. Will physically remove those from the source in a subsequent iteration. I left several instances of audience="Cloudera" because those will be changed to audience="hidden" as part of a separate change request. I marked with rev="upstream" some tags containing impala-shell banners with a Cloudera copyright statement. Will decide on a convention to handle those (elide those lines, or use a conref to consistently substitute the generic equivalent) and do that in a followup patch set. Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Reviewed-on: http://gerrit.cloudera.org:8080/5663 Reviewed-by: Laurel Hale Reviewed-by: Jim Apple Tested-by: Impala Public Jenkins --- M docs/impala_keydefs.ditamap M docs/topics/impala_tutorial.xml 2 files changed, 27 insertions(+), 27 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved Laurel Hale: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 8: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/294/ -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
Impala Public Jenkins has posted comments on this change. Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/46/ -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Lars Volker has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. Patch Set 9: (13 comments) http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 581: /* Nested types are not supported yet */ nit: use // style comment Is there a JIRA to add support for nested types? Line 661: if (enc_stat.encoding != parquet::Encoding::PLAIN_DICTIONARY) { Can the count ever be zero? The parquet.thrift file seems to allow it. Line 735: if (dict_filter_it == scanner_dict_filter_map_.end()) DCHECK(false); You could use DCHECK(a != b) http://gerrit.cloudera.org:8080/#/c/5904/5/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 439: /// These are pointers into elements of column_readers_ > You can use a ScoperBuffer here, which allows the memory to be tracked. Did this slip through the cracks? http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: PS9, Line 435: into s/into/to ? Line 446: Tuple* dict_filter_tuple_; I'd initialize this when you need it, since it is just the result of a cast and doesn't take up space, but this way adds to the list of fields (and size of the header). PS9, Line 618: ( Why the (? Line 623: /// Returns true if the column chunk is 100% dictionary encoded Can you make this more concrete? For example "if all pages of the column chunk...". PS9, Line 630: row_group_eliminated Isn't the caller responsible for eliminating the row group? "_eliminated" sounds like as a side effect the elimination will be done. http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 254: map dict_encoding_stats_; why not unordered_map? Also we I think we prefer int over int32_t http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS9, Line 499: NeedsCoversion NeedsConversion http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: PS9, Line 427: hext_header_size *next_header_size http://gerrit.cloudera.org:8080/#/c/5904/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 229:* TODO: It should be possible to use this in isConstantImpl. You could pass the function name into isDeterministic and then re-use this in isConstantImpl to resolve the todo -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] Three misc webpage changes
Henry Robinson has posted comments on this change. Change subject: Three misc webpage changes .. Patch Set 1: Thanks for the reviews! > seems fine but I don't really know jquery If it makes it easier, this is basically C&P code - see e.g. https://github.com/apache/incubator-impala/blob/master/www/sessions.tmpl for another example of a sortable table already in master. > Is there a plan to migrate other HTML tables to DataTables? (E.g., Queries, > Query Locations, ...) No particular plans - it's so easy to do I think anyone can do it when they see a need. -- To view, visit http://gerrit.cloudera.org:8080/6045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] Add .pep8rc for Impala's Python style
Jim Apple has posted comments on this change. Change subject: Add .pep8rc for Impala's Python style .. Patch Set 1: Any news with this? -- To view, visit http://gerrit.cloudera.org:8080/5829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic
Laurel Hale has posted comments on this change. Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic .. Patch Set 7: Code-Review+1 I'm still getting some errors when it builds. Looks like there are some broken links to impala_faq.xml, but your changes are being picked up in the build, so rendered docs look okay. -- To view, visit http://gerrit.cloudera.org:8080/5962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Ambreen Kazi Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
Jim Apple has posted comments on this change. Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
Laurel Hale has posted comments on this change. Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch Set 3: Code-Review+1 Looks good. -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic
Ambreen Kazi has posted comments on this change. Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic .. Patch Set 5: (8 comments) Addressed all of John's comments. http://gerrit.cloudera.org:8080/#/c/5962/5/docs/shared/impala_common.xml File docs/shared/impala_common.xml: PS5, Line 566: , > Remove comma. Done http://gerrit.cloudera.org:8080/#/c/5962/5/docs/topics/impala_delegation.xml File docs/topics/impala_delegation.xml: PS5, Line 79: > Is there a generic alternative place to link? If not, my impulse is to leav I couldn't find a suitable upstream alternative. Left this in place. PS5, Line 100: > Could genericize this to "consult the documentation for the ODBC driver you Done http://gerrit.cloudera.org:8080/#/c/5962/5/docs/topics/impala_kerberos.xml File docs/topics/impala_kerberos.xml: PS5, Line 47: With the in place, this can be shortened to Done PS5, Line 62: > Take out any spaces or tabs that are left behind in new blank lines. Done PS5, Line 144: This can also be shortened to an tag, referencing the ke Done PS5, Line 178: Not critical in this case, but for future reference. If any of the list ite Will do in the future. This one I've left as is. http://gerrit.cloudera.org:8080/#/c/5962/5/docs/topics/impala_ldap.xml File docs/topics/impala_ldap.xml: PS5, Line 164: on the command line > Take out that phrase, because often the daemons are started by a script and Done -- To view, visit http://gerrit.cloudera.org:8080/5962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Ambreen Kazi Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 8: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/294/ -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic
Hello Laurel Hale, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5962 to look at the new patch set (#7). Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic .. IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic This is part 2 of the work being done to genericize the Impala security topics. All references to Cloudera have been either marked 'hidden' or replaced with links to the relevant open-source docs. Note: -Links to the standalone Cloudera ODBC driver doc have not been removed. -External links to the MIT Kerberos docs and Hadoop security docs were added to impala_keydefs. Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c --- M docs/impala_keydefs.ditamap M docs/shared/impala_common.xml M docs/topics/impala_delegation.xml M docs/topics/impala_kerberos.xml M docs/topics/impala_ldap.xml 5 files changed, 60 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/5962/7 -- To view, visit http://gerrit.cloudera.org:8080/5962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Ambreen Kazi Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] [DOCS] Release note updates for Impala 2.8
Jim Apple has posted comments on this change. Change subject: [DOCS] Release note updates for Impala 2.8 .. Patch Set 9: Code-Review+2 As the RM for 2.8, I'm +2ing this. We can fix nits in follow-ups, but this will block 2.9 release notes work otherwise -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic
Hello Laurel Hale, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5962 to look at the new patch set (#6). Change subject: IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic .. IMPALA-3410 [DOCS] Rework Impala authentication topics to be generic This is part 2 of the work being done to genericize the Impala security topics. All references to Cloudera have been either marked 'hidden' or replaced with links to the relevant open-source docs. Note: -Links to the standalone Cloudera ODBC driver doc have not been removed. -External links to the MIT Kerberos docs and Hadoop security docs were added to impala_keydefs. Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c --- M docs/impala_keydefs.ditamap M docs/shared/impala_common.xml M docs/topics/impala_delegation.xml M docs/topics/impala_kerberos.xml M docs/topics/impala_ldap.xml 5 files changed, 61 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/5962/6 -- To view, visit http://gerrit.cloudera.org:8080/5962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I639a55eb43555cf074c26d23b5c72f778073231c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Ambreen Kazi Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-3403: [DOCS] Pare back irrelevant installation info
Jim Apple has posted comments on this change. Change subject: IMPALA-3403: [DOCS] Pare back irrelevant installation info .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6005/1/docs/topics/impala_install.xml File docs/topics/impala_install.xml: Line 66: > The README.md file: Please make the README.md description note that this is where you can find a pointer to the build instructions, not just "special notices". I believe the wiki page is already bit-rotted. The IPMC also has asked for us to describe a pointer to the build instructions in the README. -- To view, visit http://gerrit.cloudera.org:8080/6005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/6025/8/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 126: // TODO: this file should be cross compiled and then all of the builtin is this done? Line 961: struct ReservoirSampleState { i'd say this really turned into a class Line 967: // resize occurs, this needs to be updated from the outside. what does 'from the outside' mean? Line 977: ReservoirSampleState(int init_capacity) : use standard formatting Line 1016: // The array of ReservoirSamples starts right after ReservoirSampleState, so we use that's often done by putting an array of size 1 at the end of the header struct: ReservoirSample samples[1]; and then you can do things like state.samples[5] = x; it's convenient and makes it explicit that you have a trailing var-len array. Line 1025: int64_t GetNext64(int64_t max) { while you're at it, this deserves a comment Line 1033: // Given a buffer that contains a ReservoirSampleState, resize the buffer so that it's its Line 1040: if (new_capacity * 2 >= MAX_CAPACITY) new_capacity = MAX_CAPACITY; if state->capacity is 10 and max_capacity is 40, this line sets new_capacity to 40. Line 1062: // If the array gets filled due to updates or merges, we reallocate a larger buffer to you should put this (= a brief description of what you're doing) somewhere at the beginning of the reservoir sample-related code. -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 4: (1 comment) Thanks for the review. I addressed the remaining comments in PS8. There's the question who calls ColumnStatsBase::ReadFromThrift, let me know if you want me to elaborate on it in the comment. http://gerrit.cloudera.org:8080/#/c/6032/7/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 55: } > i thought the stats predicates are visible in extended? Yes, done. -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6032 to look at the new patch set (#8). Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the formsor , where is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h A be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exprs/expr.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py 32 files changed, 986 insertions(+), 162 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/8 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. Patch Set 8: (31 comments) Addressed the review comments. I also added a query option "parquet_dictionary_filtering" which defaults to true. http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 515: RETURN_IF_ERROR(InitColumns(row_group_idx_, column_readers_)); > good point. InitColumns() needs to be split up so that the construction of I changed this so that we call InitColumns for only the dictionary filtering columns before doing dictionary filtering. This means that the reads for the other columns will only occur if the row group passes. To do better than this (i.e. to only read the dictionary), we will need to do what Marcel suggests and rework the code more substantially. Line 532: RETURN_IF_ERROR(InitDictionaries(column_readers_)); > the control flow is hard to follow because it the top-level functions don't Done Line 607: // values are not validated, so skip these datatypes for now. > also leave todo, this can be done during dictionary construction. Done Line 615: Status HdfsParquetScanner::EvalDictionaryFilters(int row_group_idx, > indentation Done Line 628: for (ParquetColumnReader* col_reader : dict_filter_column_readers_) { > there is a lot of code here that doesn't evaluate a dictionary. restructure Done Line 637: vector& slot_conjunct_ctxs = slot_conjunct_it->second; > move to where variables are used. Done Line 701: // here or in ReadDataPage but never both. > is that still true? i thought you expect the first page to be dictionary-en Good point, corrected the comment. The dictionary page offset is used to set the start of the stream. The dictionary page offset is required to be less than the data page offset, so the stream needs to start at dictionary page offset. Line 710: // TODO: handle the 4 in a more elegant manner (make it a named constant) > resolve todo (don't leave todos that take very little time to resolve). Done Line 723: column_passes_filter = true; > 'passes filter' is ambiguous (does this mean the predicate is true or false Done http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 613: void InitDictionaryFilters(); > not intuitively named, i thought this had something to do with dictionary c Changed the name http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 244: unordered_set column_encodings_; > missing comments Done Line 525: dict_encoding_stats_[dict_header.encoding]++; > ++... Done Line 611: column_encodings_.insert(header.data_page_header.definition_level_encoding); > is this useful/required? the def/rep level encodings don't vary. Changed this to do the def/rep level encodings once at the start. Line 1042: for (Encoding::type encoding : columns_[i]->column_encodings_) { > you use columns_[i] a lot, create a variable Done http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS8, Line 462: NeedsConversionInline > can this be another template parameter? The callers of ReadSlot are on the fast path as well, so those locations would need to know what value to pass into ReadSlot and would need the inlined version. Line 764: // end of the stream. > superfluous comment, the function comment in the header file already makes Done Line 804: new_buffer_size, &buffer, &new_buffer_size, &status, /* peek */ true); > indentation Done Line 811: if (buffer_size == new_buffer_size) { > what does this mean? Added a comment Line 991: return Status("Column chunk should not contain two dictionary pages."); > make this an actionable error message (ie, include the file name). Done http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: Line 431: /// an indication of whether a dictionary is present. > so this is true if no dictionary exists? Removed this http://gerrit.cloudera.org:8080/#/c/5904/8/be/src/util/dict-test.cc File be/src/util/dict-test.cc: Line 55: for (T i: values) { > do a similar loop for GetValue() Done http://gerrit.cloudera.org:8080/#/c/5904/8/common/thrift/parquet.thrift File common/thrift/parquet.thrift: Line 1: /** > is this the unmodified parquet.thrift? (and it comes without a format versi Yes, this is the file I found on github. I fixed some whitespace, but there is no version information. http://gerrit.cloudera.org:8080/#/c/5904/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 190: >
[Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Joe McDonnell has uploaded a new patch set (#9). Change subject: IMPALA-4624: Implement Parquet dictionary filtering .. IMPALA-4624: Implement Parquet dictionary filtering Here is a basic summary of the changes: Frontend looks for conjuncts that operate on a single slot and pass a map from slot id to the conjunct index through thrift to the backend. The conjunct indices are the incides into the normal PlanNode conjuncts list. The conjuncts need to satisfy certain conditions: 1. They are bound on a single slot 2. They are deterministic (no random functions) 3. They evaluate to FALSE on a NULL input. This is because the dictionary does not include NULLs, so any condition that evaluates to TRUE on NULL cannot be evaluated by looking only at the dictionary. The backend converts the indices into ExprContexts. These are cloned in the scanner threads. The dictionary read codepath has been removed from ReadDataPage into its own function, InitDictionary. This has also been turned into its own step in row group initialization. ReadDataPage will not see any dictionary pages unless the parquet file is invalid. For dictionary filtering, we initialize dictionaries only as needed to evaluate the conjuncts. The Parquet scanner evaluates the dictionary filter conjuncts on the dictionary to see if any dictionary entry passes. If no entry passes, the row group is eliminated. If the row group passes the dictionary filtering, then we initialize all remaining dictionaries. Since column chunks can have a mixture of encodings, dictionary filtering uses three tests to determine whether this is purely dictionary encoded: 1. If the encoding_stats is in the parquet file, then use it to determine if there are only dictionary encoded pages (i.e. there are no data pages with an encoding other than PLAIN_DICTIONARY). -OR- 2. If the encoding stats are not present, then look at the encodings. The column is purely dictionary encoded if: a) PLAIN_DICTIONARY is present AND b) Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed -OR- 3. If this file was written by an older version of Impala, then we know that dictionary failover happens when the dictionary reaches 40,000 values. Dictionary filtering can proceed as long as the dictionary is smaller than that. parquet-mr writes the encoding list correctly in the current version in our environment (1.5.0). This means that check #2 works on some existing files (potentially most existing parquet-mr files). parquet-mr writes the encoding stats starting in 1.9.0. This is the version where check #1 will start working. Impala's parquet writer now implements both, so either check above will work. Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/dict-encoding.h M be/src/util/dict-test.cc M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M common/thrift/parquet.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test A testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test M tests/query_test/test_scanners.py 24 files changed, 1,143 insertions(+), 172 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/5904/9 -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 31: /// writing parquet files. It provides an interface to populate a parquet::Statistics > i think you can leave the getters out for now, since we won't be calling th Done http://gerrit.cloudera.org:8080/#/c/6032/7/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 68: static bool ReadFromThrift(const parquet::Statistics& thrift_stats, > who calls this? This is called by the parquet scanners. It's implemented by dispatching on 'col_type' and calling ReadFromThrift() in the subclass for the correct C++ type T. Should this be explained in the comment? -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 7: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 31: /// writing parquet files. It provides an interface to populate a parquet::Statistics > The other two values are i64 null_count and i64 distinct_count, for which I i think you can leave the getters out for now, since we won't be calling them, but for the sake of completeness we should add the other two enums. http://gerrit.cloudera.org:8080/#/c/6032/7/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 68: static bool ReadFromThrift(const parquet::Statistics& thrift_stats, who calls this? http://gerrit.cloudera.org:8080/#/c/6032/7/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 55: options.setExplain_level(TExplainLevel.VERBOSE); i thought the stats predicates are visible in extended? -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3403: [DOCS] Pare back irrelevant installation info
John Russell has posted comments on this change. Change subject: IMPALA-3403: [DOCS] Pare back irrelevant installation info .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6005/1/docs/topics/impala_install.xml File docs/topics/impala_install.xml: Line 48: Follow these steps to set up Impala on a cluster by building from source: > This is odd wording. I think it might be better to say that this guide has Done Line 66: > I think the README.md is the best place for build instructions, since that The README.md file: https://github.com/apache/incubator-impala/blob/master/README.md currently is pretty slim pickings. It just says Impala requires Linux (no distro names or numbers) and for building, it sends the user elsewhere: "See bin/bootstrap_build.sh." Why don't I leave the link to the wiki for the time being, and add a reference to the README as well. We can reconsider taking out the wiki link if we later beef up the README file. -- To view, visit http://gerrit.cloudera.org:8080/6005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3403: [DOCS] Pare back irrelevant installation info
John Russell has uploaded a new patch set (#2). Change subject: IMPALA-3403: [DOCS] Pare back irrelevant installation info .. IMPALA-3403: [DOCS] Pare back irrelevant installation info Take out the path A / path B pages with CM and non-CM installation techniques for packages and parcels. Replace with steps for building from source, adapted from http://impala.apache.org/downloads.html Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c --- M docs/impala.ditamap M docs/topics/impala_install.xml 2 files changed, 40 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/6005/2 -- To view, visit http://gerrit.cloudera.org:8080/6005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia81b1afbf04b93c322b9829d9e05e7af4243c37c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. IMPALA-4959: Avoid picking up the system's boost cmake module In some systems with an old boost installed system-wide the impala build would fail with something like: CMake Error at /usr/lib64/boost/BoostConfig.cmake:64 (get_target_property): get_target_property() called with non-existent target "boost_thread-shared". Call Stack (most recent call first): toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 (find_package) CMakeLists.txt:116 (find_package) CMake Error at /usr/lib64/boost/BoostConfig.cmake:72 (get_target_property): get_target_property() called with non-existent target "boost_thread-shared-debug". Call Stack (most recent call first): toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 (find_package) CMakeLists.txt:116 (find_package) This because, if it exists, cmake's FindBoost.cmake will look for and use that module, even though boost's cmake build hasn't been maintained in years and the impala build is actually configured to not use the systems boost. This patch sets the cmake flag Boost_NO_BOOST_CMAKE to ON, making sure the old cmake module is not picked up. Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Reviewed-on: http://gerrit.cloudera.org:8080/5994 Reviewed-by: Alex Behm Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M CMakeLists.txt 1 file changed, 4 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved Alex Behm: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3411 [DOCS] Rework Impala governance topics to be generic. .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Ambreen Kazi Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3411 [DOCS] Rework Impala governance topics to be generic. .. IMPALA-3411 [DOCS] Rework Impala governance topics to be generic. This set of edits removes references and links to Cloudera Navigator and Cloudera Manager from the auditing and lineage topics. Those were either marked as 'hidden' or replaced with a generic suggestion to use cluster management software with a focus on governance. Some paragraphs with overflowing lines were also fixed. Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782 Reviewed-on: http://gerrit.cloudera.org:8080/5957 Reviewed-by: Jim Apple Tested-by: Impala Public Jenkins --- M docs/topics/impala_auditing.xml M docs/topics/impala_lineage.xml 2 files changed, 57 insertions(+), 35 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Ambreen Kazi Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' .. IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' Rewrote sections to eliminate 'Cloudera Manager' from topics. Look for subsequent phases to remove remaining instances of CM. Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 Reviewed-on: http://gerrit.cloudera.org:8080/6049 Reviewed-by: Jim Apple Tested-by: Impala Public Jenkins --- M docs/shared/impala_common.xml M docs/topics/impala_config_options.xml M docs/topics/impala_upgrading.xml 3 files changed, 6 insertions(+), 17 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3411 [DOCS] Rework Impala governance topics to be generic. .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/45/ -- To view, visit http://gerrit.cloudera.org:8080/5957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Ambreen Kazi Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager'
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3401 [DOCS] Phase 1 of removing 'Cloudera Manager' .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/43/ -- To view, visit http://gerrit.cloudera.org:8080/6049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02ff6c3fc74e2e59b5d130226bd38c23c9c094b7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/44/ -- To view, visit http://gerrit.cloudera.org:8080/6003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 7: (7 comments) I'm pretty happy with this change http://gerrit.cloudera.org:8080/#/c/5816/8/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 67: // Which children can be passed through, without being materialized. duplicate 'without' http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 225: // Indicates whether passthrough should be disabled in union nodes. enabled http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: PS8, Line 255: d enabled http://gerrit.cloudera.org:8080/#/c/5816/8/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 267: // nullability can be considered to be part of the physical row layout. remove (this was moved inside computePassThrough http://gerrit.cloudera.org:8080/#/c/5816/8/testdata/workloads/functional-query/queries/QueryTest/union.test File testdata/workloads/functional-query/queries/QueryTest/union.test: Line 1059: select bigint_col from functional.alltypestiny where bigint_col > 0 use unqualified table names everywhere (and fix the one above while you are here), so we get coverage over all file formats Line 1124: QUERY Move this test into nested-types-subplan.test, otherwise this test will fail on the legacy joins/agg build since nested types are not supported there (nested-types-subplan.test will be skipped) Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price) lowercase count -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 22: Code-Review+2 Michael, any more comments? -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
Alex Behm has uploaded a new patch set (#3). Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans. .. IMPALA-3905: Implements HdfsScanner::GetNext() for text scans. Implements HdfsLzoTextTextScanner::GetNext() and changes ProcessSplit() to repeatedly call GetNext() to share the core scanning code between the legacy ProcessSplit() interface (ProcessSpit()) and the new GetNext() interface. These changes were tricky: - The scanner used to rely on the ability to attach a batch to the row-batch queue for freeing resources - This patch attempts to preserve the resource-freeing behavior by clearing resources as soon as they are complete - In particular, the scanner attempts to skip corrupt/invalid data blocks, and we should avoid accumulating memory unnecessarily The other changes are mostly straightforward: - Add a RowBatch parameter to various functions - Add a MemPool parameter to various functions for attaching memory of completed resources that may still be references by returned batches - Change Close() to free all resources when a nullptr RowBatch is passed Testing: - Exhaustive tests passed on debug - Core tests passed on asan - TODO: Perf testing on cluster Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 15 files changed, 292 insertions(+), 215 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/6000/3 -- To view, visit http://gerrit.cloudera.org:8080/6000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
Alex Behm has posted comments on this change. Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans. .. Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 64: batch_start_ptr_(NULL), > use nullptr throughout Done Line 273: MemPool* pool = row_batch->tuple_data_pool(); > please consider getting rid of this variable and referring to the rowbatch Done Line 440: eos_ = true; > advance scan_state_? Done. I think it makes sense to not allow GetNext() once eos() is true because there does not seem to be a real use case for allowing that, and disallowing it helps simplify the logic in GetNext(). Imo, we should have that contract for ExecNode::GetNext() as well, but that's another story. Line 445: << "FE should have generated SNAPPY_BLOCKED instead."; > why not check this when creating the stream? Good idea. Done. Line 446: RETURN_IF_ERROR(UpdateDecompressor(stream_->file_desc()->file_compression)); > why is this needed, given that we just started (ie, why not move it into in Moved into InitNewRange() Line 459: eos_ = true; > scan_state_ transition? Done Line 469: Status HdfsTextScanner::CommitRows(RowBatch* row_batch, int num_rows) { > how about coalescing that with hdfsscannode::commitrows? at the very least The main problem is that the HdfsScanner::CommitRows() may add the batch to the row-batch queue and allocate a new batch. But for the new scanners we want to manually add batches to the queue after getting another batch with GetNext(). I merged this CommitRows() with HdfsScanner::CommitRows() and added a flag to control the queue-adding behavior. It's slightly awkward, let me know if you have a better idea. Line 485: ExprContext::FreeLocalAllocations(entry.second); > why call this here? Added a comment. Did you have place in mind? We can probably move it somewhere else, but not sure what side-effects it may have. http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 119: /// regardless if whether it passed the conjuncts. > regardless of whether Done Line 124: /// Only valid to call in scan state FIRST_TUPLE_FOUND or PAST_SCAN_RANGE. > thanks for introducing that state variable, it's much clearer than a bunch Thanks! Line 135: Status CommitRows(RowBatch* row_batch, int num_rows); > in/out params go last Done Line 140: /// otherwise it will just read num_bytes. If we are reading compressed text, num_bytes > if we are reading compressed text, should this even be called (or why is th Added comment. This functions internally handles uncompressed text, streaming compressed text (e.g., gzip), and non-streaming compressed text. See FillByteBufferCompressedFile() and FillByteBufferCompressedStream(). In addition, this function is overridden by the LZO scanner to reuse much of the scanning logic in here. -- To view, visit http://gerrit.cloudera.org:8080/6000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function
anujphadke has uploaded a new patch set (#3). Change subject: IMPALA-4848: Add WIDHT_BUCKET() function .. IMPALA-4848: Add WIDHT_BUCKET() function Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f --- M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M common/function-registry/impala_functions.py 4 files changed, 74 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/3 -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#8). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. A new query option DISABLE_UNION_PASSTHROUGH was added in this patch as a precaution and for testing purposes. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 20s660ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 187.474us 187.474us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 15.238us 15.238us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s958ms3s749ms3 13.08 MB 10.00 MB 00:UNION 3 211.510ms 224.667ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s637ms1s734ms 28.80M 28.80M 528.68 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS31s697ms1s708ms 28.80M 28.80M 528.48 MB 1.8
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#8). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. A new query option DISABLE_UNION_PASSTHROUGH was added in this patch as a precaution and for testing purposes. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 20s660ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 187.474us 187.474us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 15.238us 15.238us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s958ms3s749ms3 13.08 MB 10.00 MB 00:UNION 3 211.510ms 224.667ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s637ms1s734ms 28.80M 28.80M 528.68 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS31s697ms1s708ms 28.80M 28.80M 528.48 MB 1.8
[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ .. IMPALA-3406: [DOCS] Empty the original Cloudera FAQ Almost all of the original Impala FAQ material was Cloudera-themed or commercially oriented. Lots of answers about the QuickStart VM, Cloudera discussion forums, CDH-based recommendations, etc. IMO it is not worth trying to adapt each FAQ entry to be generic. Better to start over from the ground up. Phase 1 of making an Apache-friendly FAQ is to strip the original page "down to the studs" so new FAQ entries can be added with more of a developer theme, based on questions people have in the community. Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Reviewed-on: http://gerrit.cloudera.org:8080/6003 Reviewed-by: Jim Apple Tested-by: Impala Public Jenkins --- M docs/topics/impala_faq.xml 1 file changed, 6 insertions(+), 1,852 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 21: (2 comments) http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1496: 100ll * 100ll * 100ll, 38, 38}}} Missing comma - got hit by a bad merge. expr-test was the only thing rebuilt so there wasn't much in the way of errors to spot. http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS21, Line 110: abs > maybe add a comment for future readers of this code: Done -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5951 to look at the new patch set (#22). Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. IMPALA-2020, 4915, 4936: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). This change also fixes two open bugs regarding overflow checking. The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. Output comparisons, before | after +--+ | cast(0.5 as int) | +--+ | 0| 1| +--+ +-+ | cast(cast(0.5999 as float) as decimal(5,1)) | +-+ | 0.5 | 0.6 | +-+ Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293784575265| +--+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. DECIMAL_V2 set to 1 +--+ | sum(cast(l_extendedprice as bigint)) | +--+ | 2293814088985| +--+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.63s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) in 0.72s New version, decimal_v2=0 +-+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-+ | 2293813121802.09| +-+ Fetched 1 row(s) in 0.64s +--+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--+ | 2293813156773.3596978911 | +--+ Fetched 1 row(s) i
[Impala-ASF-CR] IMPALA-3406: [DOCS] Empty the original Cloudera FAQ
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3406: [DOCS] Empty the original Cloudera FAQ .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/42/ -- To view, visit http://gerrit.cloudera.org:8080/6003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib81242f0981c04fff99e2c27e06a8d9f4da34c9f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#8). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. A new query option DISABLE_UNION_PASSTHROUGH was added in this patch as a precaution and for testing purposes. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 20s660ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 187.474us 187.474us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 15.238us 15.238us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s958ms3s749ms3 13.08 MB 10.00 MB 00:UNION 3 211.510ms 224.667ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s637ms1s734ms 28.80M 28.80M 528.68 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS31s697ms1s708ms 28.80M 28.80M 528.48 MB 1.8
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 7: (40 comments) http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 145: DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc())); > This should be IsPrefixOf() because we sanity checking the row descriptors Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 115: // passthrough case, the child can be closed right away because the row batch received > the child can be closed right away -> the child was already closed? Done Line 116: // from the child is copied (more details below). > accuracy: the row batch wasn't copied Done Line 121: if (child_idx_ < children_.size() && isChildPassThrough(child_idx_)) { > Suggest comment: Done Line 122: // If the rows from the current child can be passed through, the physical row layout > This comment doesn't seem to add anything, I suggest removing it. Replaced this with your suggestion. Line 131: // It may be possible that the row batch is not empty, so we save the previous value. > More details would be helpful. If the batch has rows at this point, I'd ima Added a dcheck instead. Time made this suggestion in patch 4. Line 148: // 'needs_deep_copy' lets us safely close the child in the next GetNext() call. > style: 'needs_deep_copy' is not a visible variable here, suggest just sayin Done Line 154: > DCHECK that child_idx_ is not passthrough here Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 67: // Which children can be passed through, without being materialized. > ... without evaluating and materializing their exprs. Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 412: /// Return true if the physical layout of this descriptor matches the physical layout > matches that of other_desc Done Line 413: /// of other_desc, but not necessarily ids. > bot not necessarily the id. Done Line 565: /// of other_desc, but not necessarily ids. > but not necessarily the ids Done http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h File be/src/service/query-options.h: Line 38: TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\ > I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a lit We have both positive and negative like DISABLE_CODEGEN and ENABLE_EXPR_REWRITES. I agree that ENABLE is simpler and easier to think about. (We should rename all DISABLE options.) http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 227: public boolean isEquivalent(SlotDescriptor other) { > Unfortunately, the term 'equivalent' already has a different meaning in the Done http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java: Line 616: public List getUnionNodeResultExprs() { > getUnionResultExprs() Done http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 47: * a child has an identical tuple layout as the output of the union node, the > ... as the output of the union node, and the child only has naked SlotRefs Done Line 57: protected final List resultExprs_; > unionResultExprs_ to make distinguish it from the resultExprLists_ and such Done Line 73: // If false, no batches from child nodes would be passed through in the backend. > Comment should describe what this flag is. Also you mean "true" right? Done Line 76: // Indicates for which child nodes batches can be passed through in the backend. > remove "in the backend" (it's clear that execution happens there) Done Line 81: protected UnionNode(PlanNodeId id, TupleId tupleId) { > Is this c'tor still needed? If not, remove. Yes, it's used if we are creating a constant node. (with no children) Line 89: List resultExprs, boolean isInSubplan) { > indentation, unionResultExprs Done Line 182:* Compute whether we can pass through rows without materializing for the given child. > Can combine/simplify like this: Done. I don't think it's necessary to describe the passthrough conditions here. The implementation makes it clear. Line 189: Analyzer analyzer, List childTupleIds, List childExprList) { > childExprList -> childResultExprs Done Line 190: if (analyzer.getQueryOptions().isDisable_union_passthrough()) return false; > seems clearer to move this into init() and not
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#8). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. A new query option DISABLE_UNION_PASSTHROUGH was added in this patch as a precaution and for testing purposes. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 20s660ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 187.474us 187.474us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 15.238us 15.238us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s958ms3s749ms3 13.08 MB 10.00 MB 00:UNION 3 211.510ms 224.667ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s637ms1s734ms 28.80M 28.80M 528.68 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS31s697ms1s708ms 28.80M 28.80M 528.48 MB 1.8
[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2 .. IMPALA-4962: Fix SHOW COLUMN STATS for HS2 Impala incorrectly returned NULLs in the "Max Size" column of the SHOW COLUMN STATS result when executed through the HS2 interface. The issue was that the column was specified to be type INT in the result schema, but the actual type of the contents that we inserted into it was "long". The reason why this is not an issue in Impala shell is because we stringify the contents without inspecting the metadata for beeswax results. The issue was fixed by changing the type from INT to BIGINT. Change-Id: I419657744635dfdc2e1562fe60a597617fff446e Reviewed-on: http://gerrit.cloudera.org:8080/6109 Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-show-stats.test M testdata/workloads/functional-query/queries/QueryTest/show-stats.test M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test M tests/hs2/test_fetch.py 12 files changed, 49 insertions(+), 48 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has posted comments on this change. Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. Patch Set 6: (14 comments) Thanks for the review. Please see PS7. http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 382: /// Min/max statistics contexts, owned by instances of this class. > they are not owned by the class instances themselves, they're allocated in Done Line 386: /// 'min_max_conjunct_ctxs_', which are used when evaluating parquet::Statistics. > somewhat incomprehensible comment. Done Line 476: Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, bool* skip_row_group); > you're not evaluating stats, you're evaluating predicates on the stats Done http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 31: enum class StatsField { MIN, MAX }; > move into ColumnStatsBase so it's clear what stats this is referring to. The other two values are i64 null_count and i64 distinct_count, for which I think we should have proper getters, i.e. we don't need to consider their type and method of decoding when reading them. Should we prefer an enum-based getter for those over ColumnStatsBase::ReadNullCountFromThrift() and ReadDistinctCountFromThrift()? Line 114: static bool ReadFromThrift(const parquet::Statistics& thrift_stats, const StatsField& stats_field, > move function bodies into parquet-column-stats-inl.h Done http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 297: collectionConjuncts_.clear(); > true, but those should be identical, given that we rewrite constants as lit Constant rewriting is controlled by the enable_expr_rewrites query option, changing the code to filter on literals and turning the query option off disables statistics for predicates like "a < 3 - 3". There is a test for this in parquet_stats.test:L206 that would fail. Let me know if you prefer to switch to literals. http://gerrit.cloudera.org:8080/#/c/6032/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 85: * conjuncts, that are passed to the backend and will be evaluated against the > "conjuncts that" Done Line 154: private List minMaxPlanConjuncts_ = Lists.newArrayList(); > "plan conjuncts" doesn't really mean anything. original conjuncts? Done Line 302:* Builds a predicate to evaluate parquet::Statistics by copying 'inputSlot' into > "to evaluate against" Done Line 347: // Only constant exprs can be evaluated against the parquet::Statistics. This > drop "the" Done Line 349: if (!constExpr.isConstant()) continue; > as pointed out before, you shouldn't see non-literal constants here See my reply in PS4, constant folding depends on the enable_expr_rewrites query option. http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 1047:runtime filters: RF000 -> c_custkey > extended is fine. Done http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: Line 1: # Test HDFS scan node. > run this file in extended mode so you see the stats predicates. Done http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test: Line 206: # Test that without expr rewrite and thus without constant folding, constant exprs still > what's the point of those tests? See my comment regarding literal vs. constant exprs in PS4. These tests make sure that we don't break support for constant exprs. -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Lars Volker has uploaded a new patch set (#7). Change subject: IMPALA-2328: Read support for min/max Parquet statistics .. IMPALA-2328: Read support for min/max Parquet statistics This change adds support for skipping row groups based on Parquet row group statistics. With this change we only support reading statistics from Parquet files for numerical types (bool, integer, floating point) and for simple predicates of the formsor , where is LT, LE, GE, GT, and EQ. Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h A be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exprs/expr.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java A fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test A testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py 32 files changed, 1,121 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6032/7 -- To view, visit http://gerrit.cloudera.org:8080/6032 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-4057 and IMPALA-4050 Support starting webserver specified by hostname or "127.0.0.1"
Jim Apple has abandoned this change. Change subject: IMPALA-4057 and IMPALA-4050 Support starting webserver specified by hostname or "127.0.0.1" .. Abandoned No contact from author in over 2 months -- To view, visit http://gerrit.cloudera.org:8080/4553 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I54280a25c3709e2f316a17a6baf33dbbbae720c0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: hewenting Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: hewenting
[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function
anujphadke has posted comments on this change. Change subject: IMPALA-4848: Add WIDHT_BUCKET() function .. Patch Set 2: (7 comments) Please review this change after I push a new patch which addresses some overflow bugs. http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: Line 427: const IntVal& num_buckets) { > do all null checks in one if Done Line 429: return IntVal::null(); > use min_range.val directly because the == operator will check for null agai Done Line 430: } > Lower... Done Line 434: } > should -> must Done http://gerrit.cloudera.org:8080/#/c/6023/2/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: Line 455: double bucket_width = (max_range.val - min_range.val) / num_buckets.val; This can overflow. Working on handling this. select width_bucket(1.5e+200, -1.7e+308, 1.2e+308, 900); Line 457: int64_t result = static_cast((expr.val - min_range.val) / bucket_width) + 1; This can overflow. Working on handling this. http://gerrit.cloudera.org:8080/#/c/6023/1/be/src/exprs/math-functions.h File be/src/exprs/math-functions.h: Line 128: /// Returns true if no parse_res == PARSE_SUCCESS || parse_res == PARSE_OVERFLOW. > Move in the public section like the other functions. Done -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. Patch Set 3: yup thanks for pinging about that, I just submitted it for gvm -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Alex Behm has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/293/ -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4959: Avoid picking up the system's boost cmake module
David Ribeiro Alves has posted comments on this change. Change subject: IMPALA-4959: Avoid picking up the system's boost cmake module .. Patch Set 3: mj, alex: what's missing here? gvm run? not sure that'll do much -- To view, visit http://gerrit.cloudera.org:8080/5994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I759e1a4f8f69727cc1224bf460326140fd2131a2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 8: Code-Review+1 (2 comments) Thanks! Let's see if Alex has any more comments. http://gerrit.cloudera.org:8080/#/c/6025/8//COMMIT_MSG Commit Message: PS8, Line 22: No new tests were added, can update this now http://gerrit.cloudera.org:8080/#/c/6025/8/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: PS8, Line 1163: mediam spelling -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#8). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/aggregation.test M testdata/workloads/functional-query/queries/
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#8). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Performance benchmark: I ran a performance benchmark locally on release build. The following query results in 3,000 grouping keys and about 30,000 values per key: SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t Before: 11.57s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 96.086us 96.086us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 26.629us 26.629us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 68.187us 87.887us 3 1 44.00 KB 10.00 MB 04:AGGREGATE32s851ms5s362ms 3.00K -1 1.95 GB 128.00 MB FINALIZE 03:EXCHANGE 3 119.540ms 220.191ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE35s876ms6s254ms 9.00K -1 2.93 GB 128.00 MB STREAMING 00:SCAN HDFS3 127.834ms 146.842ms 98.30M -1 19.80 MB 32.00 MB tpcds_10_parquet.benchmark After: 13.58s Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --- 06:AGGREGATE1 101.101us 101.101us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 32.296us 32.296us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 83.284us 120.137us 3 1 44.00 KB 10.00 MB 04:AGGREGATE33s190ms6s555ms 3.00K -1 1.96 GB 128.00 MB FINALIZE 03:EXCHANGE 3 247.897ms 497.280ms 9.00K -1 0 0 HASH(c1) 01:AGGREGATE37s370ms8s460ms 9.00K -1 4.71 GB 128.00 MB STREAMING 00:SCAN HDFS3 111.721ms 122.306ms 98.30M -1 19.94 MB 32.00 MB tpcds_10_parquet.benchmark Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 19.82 MB. Summary before: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 15.856ms5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE33s721ms3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE32s851ms3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail 04:EXCHANGE 1 288.125us 288.125us 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE39.358ms 10.982ms 14.82K 15.21K 19.82 MB 10.00 MB FINALIZE 02:EXCHANGE 3 129.832us 154.953us 15.62K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 11.086ms 13.102ms 15.62K 15.21K 9.49 MB 10.00 MB STREAMING 00:SCAN HDFS3 40.154ms 50.220ms 183.59K 183.59K 2.94 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/aggregation.test M testdata/workloads/functional-query/queries/
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS6, Line 958: ReservoirSampleState > Because this is not a vector or a list, it's state for reservoir sampling w Done -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Three misc webpage changes
David Knupp has posted comments on this change. Change subject: Three misc webpage changes .. Patch Set 1: The changes here seem reasonable. Is there a plan to migrate other HTML tables to DataTables? (E.g., Queries, Query Locations, ...) -- To view, visit http://gerrit.cloudera.org:8080/6045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2 .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/292/ -- To view, visit http://gerrit.cloudera.org:8080/6109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS21, Line 110: abs > std::abs() is not defined for int128_t maybe add a comment for future readers of this code: // std::abs() is not defined for int128_t. -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Zach Amsden has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS21, Line 110: abs > why not std:abs() here? std::abs() is not defined for int128_t -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] Three misc webpage changes
Matthew Jacobs has posted comments on this change. Change subject: Three misc webpage changes .. Patch Set 1: Code-Review+1 seems fine but I don't really know jquery -- To view, visit http://gerrit.cloudera.org:8080/6045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2
Alex Behm has posted comments on this change. Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2 .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6109/1/tests/hs2/test_fetch.py File tests/hs2/test_fetch.py: Line 179: assert re.match(r"id, INT, -?\d+, -?\d+, -?\d+, 4.0", result) is not None haha nice -- To view, visit http://gerrit.cloudera.org:8080/6109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4962: Fix SHOW COLUMN STATS for HS2
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/6109 Change subject: IMPALA-4962: Fix SHOW COLUMN STATS for HS2 .. IMPALA-4962: Fix SHOW COLUMN STATS for HS2 Impala incorrectly returned NULLs in the "Max Size" column of the SHOW COLUMN STATS result when executed through the HS2 interface. The issue was that the column was specified to be type INT in the result schema, but the actual type of the contents that we inserted into it was "long". The reason why this is not an issue in Impala shell is because we stringify the contents without inspecting the metadata for beeswax results. The issue was fixed by changing the type from INT to BIGINT. Change-Id: I419657744635dfdc2e1562fe60a597617fff446e --- M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test M testdata/workloads/functional-query/queries/QueryTest/hbase-show-stats.test M testdata/workloads/functional-query/queries/QueryTest/show-stats.test M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test M tests/hs2/test_fetch.py 12 files changed, 49 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/6109/1 -- To view, visit http://gerrit.cloudera.org:8080/6109 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I419657744635dfdc2e1562fe60a597617fff446e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] Three misc webpage changes
Henry Robinson has posted comments on this change. Change subject: Three misc webpage changes .. Patch Set 1: Anyone want to take a quick look at this? -- To view, visit http://gerrit.cloudera.org:8080/6045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ica0578dabb7e27e6fd45ee4f31a1418ac3adc891 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/5951/19/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1621: { "cast(555 as decimal(38,0))", > could you add a comment saying how many fives that is. I counted 38, so why oops, I counted wrong -- i think there are 39 5's here? -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2020, 4915, 4936: Add rounding for decimal casts
Dan Hecht has posted comments on this change. Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts .. Patch Set 21: Code-Review+2 (2 comments) Michael, can you take a final pass as well? http://gerrit.cloudera.org:8080/#/c/5951/19/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1621: { false, -1000, 5, 0 }}}, could you add a comment saying how many fives that is. I counted 38, so why does this overflow? http://gerrit.cloudera.org:8080/#/c/5951/21/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: PS21, Line 110: abs why not std:abs() here? -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS6, Line 958: ReservoirSampleState > Why do you think a vector-like interface is not good? It seems simple and i Because this is not a vector or a list, it's state for reservoir sampling which contains number of samples, the source size, a RNG, *and* a list of samples. Putting methods on this that make it look like a vector is confusing. This isn't even a struct anymore IMO. -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3401 [DOCS] Phase 7 of "Cloudera Manager" removal
Laurel Hale has uploaded a new change for review. http://gerrit.cloudera.org:8080/6108 Change subject: IMPALA-3401 [DOCS] Phase 7 of "Cloudera Manager" removal .. IMPALA-3401 [DOCS] Phase 7 of "Cloudera Manager" removal Removed the topic "impala_howto_rm.xml" from the build because it was focused on resource management by using Cloudera Manager. Commented it out of impala.ditamap, impala_keydefs.ditamap, and from a reference to it in shared/impala_common.xml. This removes it completely from the rendered upstream docs. A later project will remove the source XML. Change-Id: I26fc661144ba906828d5f6b5b2ea2eca02693369 --- M docs/impala.ditamap M docs/impala_keydefs.ditamap M docs/shared/impala_common.xml 3 files changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6108/1 -- To view, visit http://gerrit.cloudera.org:8080/6108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I26fc661144ba906828d5f6b5b2ea2eca02693369 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laurel Hale
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Attila Jeges has uploaded a new change for review. http://gerrit.cloudera.org:8080/6107 Change subject: IMPALA-3079: Fix sequence file writer .. IMPALA-3079: Fix sequence file writer Before the fix, sequence file writer produced corrupt files in some cases. Steps to reproduce: SET ALLOW_UNSUPPORTED_FORMATS=1; create table store_sales_seq_snap like tpcds_parquet.store_sales stored as SEQUENCEFILE; insert into store_sales_seq_snap partition(ss_sold_date_sk) select * from tpcds_parquet.store_sales where ss_sold_date_sk between 2450816 and 2451200; The insert statement produces a corrupt file that cannot be read back. The fix corrects the implementation of zero-compressed encoding in ReadWriteUtil class. It also fixes the calculation of block sizes in SnappyBlockCompressor::ProcessBlock(). Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 --- M be/src/exec/read-write-util-test.cc M be/src/exec/read-write-util.h M be/src/util/compress.cc M be/src/util/decompress-test.cc M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M tests/query_test/test_compressed_formats.py 6 files changed, 119 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6107/1 -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] IMPALA-4848: Add WIDHT BUCKET() function
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-4848: Add WIDHT_BUCKET() function .. IMPALA-4848: Add WIDHT_BUCKET() function Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f --- M be/src/exprs/expr-test.cc M be/src/exprs/math-functions-ir.cc M be/src/exprs/math-functions.h M common/function-registry/impala_functions.py 4 files changed, 61 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/6023/2 -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: anujphadke