[Impala-ASF-CR] IMPALA-4033,IMPALA-4105: Improvements of partition DDL.
Amos Bird has uploaded a new change for review. http://gerrit.cloudera.org:8080/5137 Change subject: IMPALA-4033,IMPALA-4105: Improvements of partition DDL. .. IMPALA-4033,IMPALA-4105: Improvements of partition DDL. This commit makes add partition op treat string partition-key values as case sensitive in consistency with other related partition DDL operations. REFRESH statements now support general partition exprs. Change-Id: I0f4e46ec0a63b46e485141290268d019c3dd15c7 --- M bin/start-catalogd.sh M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java 11 files changed, 80 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/5137/1 -- To view, visit http://gerrit.cloudera.org:8080/5137 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0f4e46ec0a63b46e485141290268d019c3dd15c7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Amos Bird
[Impala-ASF-CR] IMPALA-2890: Support ALTER TABLE statements for Kudu tables
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/5136 Change subject: IMPALA-2890: Support ALTER TABLE statements for Kudu tables .. IMPALA-2890: Support ALTER TABLE statements for Kudu tables With this commit, we add support for additional ALTER TABLE statements against Kudu tables. The new supported ALTER TABLE operations for Kudu are: - ADD/DROP range partitions. Syntax: ALTER TABLE ADD [IF NOT EXISTS] RANGE ALTER TABLE DROP [IF EXISTS] RANGE - ADD/DROP/RENAME column. Syntax: ALTER TABLE ADD COLUMNS (col_spec, [col_spec, ...]) ALTER TABLE DROP COLUMN ALTER TABLE CHANGE COLUMN - Rename Kudu table using the 'kudu.table_name' table property. Example: ALTER TABLE SET TBLPROPERTY ('kudu.tbl_name'=''), will change the underlying Kudu table name to . - Renaming the HMS/Catalog table entry of a Kudu table is supported using the existing ALTER TABLE RENAME TO syntax. Not supported: - ALTER TABLE REPLACE COLUMNS Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableAddReplaceColsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test M tests/query_test/test_kudu.py 17 files changed, 815 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/5136/1 -- To view, visit http://gerrit.cloudera.org:8080/5136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 3: Added test files that have both plain and binary encodings. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has posted comments on this change. Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. Patch Set 3: s/binary/dictionary in last comment. -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Henry Robinson has uploaded a new patch set (#3). Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner * Extend metadata checks to allow more than one possible physical type for a given logical type. * Change decimal decoding to handle non-fixed-length format in same path as fixed-length encoding. Testing: * Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Perf: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded decimal(12,2) values using FIXED_BYTE_ARRAY physical type encoding. * The overhead of decoding with the extra branch was measured at 1s; i.e. the per-decode overhead is 1ns. Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test M tests/query_test/test_scanners.py 7 files changed, 135 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5115/3 -- To view, visit http://gerrit.cloudera.org:8080/5115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3726: Add support for Kudu-specific column options .. Patch Set 8: Code-Review+2 Minor test fix and rebase. Carry Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/5026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options
Hello Internal Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5026 to look at the new patch set (#8). Change subject: IMPALA-3726: Add support for Kudu-specific column options .. IMPALA-3726: Add support for Kudu-specific column options This commit adds support for Kudu-specific column options in CREATE TABLE statements. The syntax is: CREATE TABLE tbl_name ([col_name type [PRIMARY KEY] [option [...]]] [, ]) where option is: | NULL | NOT NULL | ENCODING encoding_val | COMPRESSION compression_algorithm | DEFAULT expr | BLOCK_SIZE num The output of the SHOW CREATE TABLE statement was altered to include all the specified column options for Kudu tables. Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 --- M be/src/exec/kudu-table-sink.cc M common/thrift/CatalogObjects.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableAddReplaceColsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AvroSchemaParser.java M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test M tests/query_test/test_kudu.py M tests/shell/test_shell_commandline.py 31 files changed, 973 insertions(+), 254 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5026/8 -- To view, visit http://gerrit.cloudera.org:8080/5026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] IMPALA-4510: Selectively filter args for metric verification tests
David Knupp has uploaded a new change for review. http://gerrit.cloudera.org:8080/5135 Change subject: IMPALA-4510: Selectively filter args for metric verification tests .. IMPALA-4510: Selectively filter args for metric verification tests run-tests.py is a wrapper around impala-pytest. It abstracts away the need to invoke separate individual runs for serial tests, parallel tests, and metric verification tests. Because it's possible for a user to specify certain test suites, or even specific tests, on the command line when calling run-tests.py, it had been necessary to override the command line args when it came time to run the metric verification tests -- otherwise those other tests/suites would be rerun. Before this patch, we had simply been stripping away all command line args. However, that blanket approach causes problems when running tests against a remote cluster, because we need to retain those command line args that pertain to the remote cluster. This patch selectively prunes unwanted command line args for the last metric verification test stage, keeping the ones that we need, and also adds extensive documentation for explaining why we have to go through this fairly odd and elaborate step. Change-Id: I069172f44c1307d55f85779cdb01fecc0ba1799e --- M tests/run-tests.py 1 file changed, 55 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/5135/1 -- To view, visit http://gerrit.cloudera.org:8080/5135 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I069172f44c1307d55f85779cdb01fecc0ba1799e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp
[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping
Jim Apple has posted comments on this change. Change subject: Don't overwrite user's .ssh/config file when bootstrapping .. Patch Set 3: > Build failed: > http://ec2-35-161-220-160.us-west-2.compute.amazonaws.com:8080/job/gerrit-verify-dryrun/30/ This was just me testing the verification system -- To view, visit http://gerrit.cloudera.org:8080/4967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping
Impala Public Jenkins has posted comments on this change. Change subject: Don't overwrite user's .ssh/config file when bootstrapping .. Patch Set 3: Build failed: http://ec2-35-161-220-160.us-west-2.compute.amazonaws.com:8080/job/gerrit-verify-dryrun/30/ -- To view, visit http://gerrit.cloudera.org:8080/4967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping
Impala Public Jenkins has posted comments on this change. Change subject: Don't overwrite user's .ssh/config file when bootstrapping .. Patch Set 3: Build started: http://ec2-35-161-220-160.us-west-2.compute.amazonaws.com:8080/job/gerrit-verify-dryrun/30/ -- To view, visit http://gerrit.cloudera.org:8080/4967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py .. Patch Set 2: jbapple testing gerrit message command -- To view, visit http://gerrit.cloudera.org:8080/5124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present This change introduces a new catalogd startup option (init_first_metastore_client_timeout_seconds) that specifies the time in seconds catalogd should spend on retrying to establish a connection to HMS the first time on startup before giving up and exiting fatally. Setting this startup option to a value that is greater than the HMS startup time will allow CM to start Impala at the same time or even before HMS. The default value of init_first_metastore_client_timeout_seconds is 120 seconds. Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Reviewed-on: http://gerrit.cloudera.org:8080/5095 Reviewed-by: Henry Robinson Tested-by: Internal Jenkins --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java A tests/experiments/test_catalog_hms_failures.py 10 files changed, 285 insertions(+), 49 deletions(-) Approvals: Henry Robinson: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping
Impala Public Jenkins has posted comments on this change. Change subject: Don't overwrite user's .ssh/config file when bootstrapping .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping
Impala Public Jenkins has submitted this change and it was merged. Change subject: Don't overwrite user's .ssh/config file when bootstrapping .. Don't overwrite user's .ssh/config file when bootstrapping >From bash's manual page on redirecting with '>' Redirection of output causes the file whose name results from the expansion of word to be opened for writing on file descriptor n, or the standard output (file descriptor 1) if n is not specified. If the file does not exist it is created; if it does exist it is truncated to zero size. Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Reviewed-on: http://gerrit.cloudera.org:8080/4967 Reviewed-by: Jim Apple Tested-by: Impala Public Jenkins --- M bin/bootstrap_development.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates
Michael Ho has uploaded a new patch set (#5). Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates .. IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates This change codegens HdfsParquetScanner::EvalRuntimeFilters() by unrolling its loop, codegen'ing the expression evaluation of the runtime filter and replacing some type information with constants in the hashing function of runtime filter to avoid branching at runtime. This change also fixes IMPALA-4495 by not counting a row as 'considered' in the filter stats before the filter arrives. This avoids unnecessarily marking a runtime filter as ineffective before it's even used. With this change, TPCDS-Q88 improves by 13-14%. primitive_broadcast_join_1 improves by 24%. Change-Id: I27114869840e268d17e91d6e587ef811628e3837 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.h A be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/types.h M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M tests/query_test/test_tpch_queries.py 27 files changed, 507 insertions(+), 166 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4833/5 -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates
Michael Ho has posted comments on this change. Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4833/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 716: ctx.stats->IncrCounters(stats_name, 1, 1, !passed_filter); > It seems inconsistent that we're not counting partitions processed before t Reverted to the previous behavior. -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly
Michael Ho has posted comments on this change. Change subject: IMPALA-4432: Handle internal codegen disabling properly .. Patch Set 4: Code-Review+1 Carry Tim's +1 forward. -- To view, visit http://gerrit.cloudera.org:8080/5105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly
Michael Ho has posted comments on this change. Change subject: IMPALA-4432: Handle internal codegen disabling properly .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/5105/3/be/src/runtime/plan-fragment-executor.cc File be/src/runtime/plan-fragment-executor.cc: Line 273: if (runtime_state_->CodegenDisabled() && !runtime_state_->HasScalarFnToCodegen()) return; > Long line Fixed. -- To view, visit http://gerrit.cloudera.org:8080/5105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5105 to look at the new patch set (#4). Change subject: IMPALA-4432: Handle internal codegen disabling properly .. IMPALA-4432: Handle internal codegen disabling properly There are some conditions in which codegen is disabled internally even if it's enabled in the query option. For instance, the single node optimization or the expression evaluation requests sent from the FE to the BE. These internal disabling of codegen are advisory as their purposes are to reduce the latency for tables with no or very few rows. The internal disabling of codegen doesn't interact well with UDFs which cannot be interpreted (e.g. IR UDF) as it conflates the 'disable_codegen' query option set by the user. As a result, it's hard to differentiate between when codegen is disabled explicitly by users and when it is disabled internally. This change fixes the problem above by adding an explicit flag in TQueryCtx to indicate that codegen is disabled internally. This flag is only advisory. For cases in which codegen is needed to function, this internal flag is ignored and if codegen is disabled via query option, an error is thrown. For this new flag to work with ScalarFnCall, codegen needs to happen after ScalarFnCall::Prepare() because it's hard to tell if a fragment contains any UDF that cannot be interpreted until after ScalarFnCall::Prepare() is called. However, Prepare() needs the codegen object to codegen so it needs to be created before Prepare(). We can either always create the codegen module or defer codegen to a point after ScalarFnCall::Prepare(). The former has the downside of introducing unnecessary latency for say single-node optimization so the latter is implemented. It is needed as part of IMPALA-4192 any way. After this change, ScalarFnCall expressions which need to be codegen'd are inserted into a vector in RuntimeState in ScalarFnCall::Prepare(). Later in the codegen phase, these expressions' GetCodegendComputeFn() will be called after codegen for operators is done. If any of these expressions are already codegen'd indirectly by the operators, GetCodegendComputeFn() will be a no-op. This preserves the behavior that ScalarFnCall will always be codegen'd even if the fragment doesn't contain any codegen enabled operators. Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 --- M be/src/exec/aggregation-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/service/fe-support.cc M common/thrift/ImpalaInternalService.thrift M fe/src/main/java/org/apache/impala/planner/Planner.java M tests/query_test/test_udfs.py 24 files changed, 169 insertions(+), 109 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/5105/4 -- To view, visit http://gerrit.cloudera.org:8080/5105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4833/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 716: ctx.stats->IncrCounters(stats_name, 1, 1, !passed_filter); It seems inconsistent that we're not counting partitions processed before the filter arrived but we are counting rows. Would be good to clarify in filter-context.h what the counters mean more exactly. I feel like maybe we don't need 'total_possible' - 'considered' seems more informative. -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. IMPALA-3200: Implement suballocator for splitting buffers This is useful for situations like hash tables, where we want to make multiple non-spillable allocations of variable size from buffer pool memory and not incur the overhead of interacting with the global buffer pool. The allocator subdivides buffers to service allocations and uses a buddy allocation algorithm to merge freed allocations into larger chunks. This helps avoid external fragmentation and is quite effective at reusing memory given the typical doubling allocation patterns of hash tables in partitioned aggs and joins. Testing: The allocator has fairly robust internal consistency checks via assertions and unique_ptrs. Includes a unit test that exercises various allocation patterns. I tested porting hash tables over to allocate memory using an earlier version of the suballocator, which worked well (was able to run a wide range of queries successfully with good performance). Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 --- M be/src/bufferpool/CMakeLists.txt A be/src/bufferpool/suballocator-test.cc A be/src/bufferpool/suballocator.cc A be/src/bufferpool/suballocator.h M be/src/common/names.h 5 files changed, 830 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/4715/5 -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates
Michael Ho has posted comments on this change. Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/4833/3//COMMIT_MSG Commit Message: PS3, Line 7: IMPALA-3838, > Need to update commit message to mention IMPALA-4495. Done http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: PS3, Line 137: LlvmCodeGen::FnPrototype prototype(codegen, "FilterContextEval", > long line. Done http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS3, Line 286: total_poss > total_possible Done PS3, Line 700: noinline > noinline Done -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers .. Patch Set 4: (34 comments) http://gerrit.cloudera.org:8080/#/c/4715/4//COMMIT_MSG Commit Message: PS4, Line 13: buddy allocation > Buddy allocators tend to have high internal fragmentation.Did you consider The main use case is to allocate power-of-two buffers, so currently this isn't a problem. Maybe there are some other use cases down the line but for now it seems better to go with a simple algorithm. http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator-test.cc File be/src/bufferpool/suballocator-test.cc: Line 28: > why the gap? Done Line 44: virtual void SetUp() { > override, here and below Done PS4, Line 63: != NULL > elide I think this is a matter of taste, but we've generally standardised on explicit null checks. PS4, Line 68: Random > 'Randomness' or 'PRNG'; the seed itself is not random Wikipedia knows what I mean :) https://en.wikipedia.org/wiki/Random_seed Line 75: void VerifyMemoryValid(const vector>& allocs); > static Done Line 78: void FreeAllocations( > static Done PS4, Line 80: unique_ptr > auto Done PS4, Line 88: default_random_engine > It might be good to pick one for repeatability reasons Done http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator.cc File be/src/bufferpool/suballocator.cc: Line 40: free_lists_[i] = NULL; > This is the default. If you elide this loop, this should happen anyway. Done Line 59: const int target_list_idx = Bits::Log2CeilingNonZero64(bytes); > How do you know bytes is non-zero? This is now handled with the minimum allocation size logic. Line 60: for (int i = target_list_idx; i < ALLOCATION_SIZES; ++i) { > This can be O(1) with ctz and a bitmap for the freelist emptiness It's O(1) now just with a higher constant factor assuming an upper bound on allocation sizes. I'd say we should defer this unless we actually see it become a bottleneck. Line 88: int64_t buffer_len = max(min_buffer_len_, BitUtil::RoundUpToPowerOfTwo(bytes)); > const Done Line 127: if (!status.ok()) { > if (!Suballocation::Create(&nodes[i]).ok()) {... Done Line 144: int64_t child_len = free_node->len_ / 2; > const Done PS4, Line 233: != NULL > elide We wouldn't normally elide it. http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator.h File be/src/bufferpool/suballocator.h: Line 31: enum class ExpandReservations { > ExpansionStrategy, maybe? Done Line 37: }; > This can live inside Suballocator, right? It can but we end up with Suballocator::ExpansionStrategy everywhere, which is pretty verbose. PS4, Line 39: buddy > There are a few different variants of buddy systems. It might be good to sp Elaborated slightly in this comment. PS4, Line 43: relatively large allocations > Yet free_lists_ goes all the way down to one byte? Why not enforce this at I think my main concern there would be that the memory overhead of the allocations isn't tracked, so could overwhelm the actual tracked memory. I added logic to enforce a minimum allocation size and round up allocations to that amount (DCHECKing or similar seems a little awkward). Line 50: ExpandReservations expand_reservations); > Can you annotate these params in a comment Done Line 54: /// Allocate bytes from BufferPool. The allocation is NULL if unsuccessful because > nullptr, here and elsewhere Done Line 71: /// Allocate a buffer of size 'bytes' from the buffer pool and initialize 'result' > bytes must be less than MAX_ALLOCATION_BYTES Done PS4, Line 79: Can fail if malloc() fails > if new returns nullptr Reworded to somethign more generic about memory allocation failing. PS4, Line 109: by > 'but' Done PS4, Line 127: must > or else leaked memory or some kind of segfault or nasal demons? Done PS4, Line 132: Destructor > Elide Done Line 138: private: > also disallow_copy_and_assign? Done Line 152: /// The buffer backing the suballocation, if the suballocation is backed by an entire > capitalization, here and elsewhere: Suballocation and Allocation Not sure why we'd capitalise allocation since it's not a class. Line 153: /// buffer. Otherwise uninitialized. 'buffer_' is open iff buddy is NULL. > buddy_ Done Line 156: /// If this is a left child, the parent of this and its buddy. The parent's allocation > It would be helpful to explain the tree structure first. Added some explanation to the class comment. Line 159: std::unique_ptr parent_; > Can the parent be a buddy, thus being pointed to by its buddy and one of it Yes. I think the explanation in the class comment should cover it. Line 171: Suballocation* prev_free_; > so next_free_ is not really unique? I'm troubled by this design pattern. Di unique_ptr to me means unique ownership, which doesn't preclude a raw pointer
[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates
Michael Ho has uploaded a new patch set (#4). Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates .. IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates This change codegens HdfsParquetScanner::EvalRuntimeFilters() by unrolling its loop, codegen'ing the expression evaluation of the runtime filter and replacing some type information with constants in the hashing function of runtime filter to avoid branching at runtime. This change also fixes IMPALA-4495 by not counting a row as 'considered' in the filter stats before the filter arrives. This avoids unnecessarily marking a runtime filter as ineffective before it's even used. With this change, TPCDS-Q88 improves by 13-14%. primitive_broadcast_join_1 improves by 24%. Change-Id: I27114869840e268d17e91d6e587ef811628e3837 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.h A be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/types.h M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M tests/query_test/test_tpch_queries.py 27 files changed, 509 insertions(+), 167 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4833/4 -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().
Michael Ho has posted comments on this change. Change subject: IMPALA-3838: Codegen EvalRuntimeFilters(). .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: PS3, Line 137: LlvmCodeGen::FnPrototype prototype(codegen, "FilterContextEval", codegen->boolean_type()); long line. http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS3, Line 286: considered total_possible PS3, Line 700: alwaysinline noinline -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().
Michael Ho has posted comments on this change. Change subject: IMPALA-3838: Codegen EvalRuntimeFilters(). .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4833/3//COMMIT_MSG Commit Message: PS3, Line 7: IMPALA-3838: Need to update commit message to mention IMPALA-4495. -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().
Michael Ho has posted comments on this change. Change subject: IMPALA-3838: Codegen EvalRuntimeFilters(). .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/4833/1/be/src/exec/hdfs-parquet-scanner-ir.cc File be/src/exec/hdfs-parquet-scanner-ir.cc: Line 73: if (stats->enabled && ctx->filter->HasBloomFilter()) { > I had it reversed - as far as I can see, before filter arrival, it will inc Thanks for filing the bug. It's fixed in the new patch. http://gerrit.cloudera.org:8080/#/c/4833/2/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: PS2, Line 61: ow against > Is it a tuple row? I though it was a raw value pointer? It means to say the value is computed by evaluating a tuple row against the expression. Comment rephrased. -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-3838: Codegen EvalRuntimeFilters(). .. IMPALA-3838: Codegen EvalRuntimeFilters(). This change codegens HdfsParquetScanner::EvalRuntimeFilters() by unrolling its loop, codegen'ing the expression evaluation of the runtime filter and replacing some type information with constants. With this change, TPCDS-Q88 improves by 13-14%. primitive_broadcast_join_1 improves by 24%. Change-Id: I27114869840e268d17e91d6e587ef811628e3837 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M be/src/runtime/runtime-filter-bank.h A be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/types.h M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M tests/query_test/test_tpch_queries.py 27 files changed, 509 insertions(+), 168 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4833/3 -- To view, visit http://gerrit.cloudera.org:8080/4833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3726: Add support for Kudu-specific column options .. Patch Set 7: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/499/ -- To view, visit http://gerrit.cloudera.org:8080/5026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4410: Safer tear-down of RuntimeState
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4410: Safer tear-down of RuntimeState .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.cc File be/src/runtime/test-env.cc: Line 109: for (shared_ptr& runtime_state : runtime_states_) { > Any thoughts? The second alternative makes most sense to me since it preserves the original intent. -- To view, visit http://gerrit.cloudera.org:8080/4893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie416e4d57240142bf685385299b749c3a6792c45 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. IMPALA-4490: Only generate runtime filters for hash join nodes. Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Reviewed-on: http://gerrit.cloudera.org:8080/5117 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 2 files changed, 34 insertions(+), 5 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4410: Safer tear-down of RuntimeState
Henry Robinson has posted comments on this change. Change subject: IMPALA-4410: Safer tear-down of RuntimeState .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.cc File be/src/runtime/test-env.cc: Line 109: for (shared_ptr& runtime_state : runtime_states_) { > What do you propose - summing over the distinct query IDs? Or checking in C Any thoughts? -- To view, visit http://gerrit.cloudera.org:8080/4893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie416e4d57240142bf685385299b749c3a6792c45 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Tim Armstrong has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > For context: do you also think in most cases it is a bug if we allow unsign Mostly yes, although we don't use unsigned ints in many places in the codebase. -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Jim Apple has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > I'm assuming we'll disable the UBSAN checking temporarily, but later come b For context: do you also think in most cases it is a bug if we allow unsigned integers to overflow? -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Tim Armstrong has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > -fwrapv and the ubsan check are independent, so we can enable one and disab I'm assuming we'll disable the UBSAN checking temporarily, but later come back and fix it. I think in most cases it's a bug if we allow signed integers to overflow. It would be good to identify the subset of places where it is expected and make it explicit that overflow is expected. -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Jim Apple has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > Not sure if that's sufficient, may require some runtime configuration too. -fwrapv and the ubsan check are independent, so we can enable one and disable the other. What kind of bugs do you think -fwrapv will mask? Will those bugs still be masked even if there is no ubsan checking? -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 15: (5 comments) http://gerrit.cloudera.org:8080/#/c/4863/15/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: PS15, Line 329: key Could do std:move() to avoid copy of string http://gerrit.cloudera.org:8080/#/c/4863/15/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: PS15, Line 168: form from Line 214: /// Appends all rows in batch to the temporary Hdfs files their respective partitions. You're missing a verb in here I think Line 269: PartitionPair* current_clustered_partition_; "Only set if 'input_is_clustered_' is true" Line 272: /// batches. "Only set if 'input_is_clustered_' is true" -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4956 to look at the new patch set (#8). Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. IMPALA-4397,IMPALA-3259: reduce codegen time and memory A handful of fixes to codegen memory usage: * Delete the IR module when we're done with it (it can be fairly large) * Track the compiled code size (typically not that large, but it can add up if there are many fragments). * Estimate optimisation memory requirements and track it in the memory tracker. This is very crude but much better than not tracking it. A handful of fixes to improve codegen time/cost, particularly targeted at compute stats workloads: * Avoid over-inlining when there are many aggregate functions, conjuncts, etc by adding "NoInline" attributes. * Bail of out text scanner codegen for wide tables. * Don't codegen non-grouping merge aggregations. They will only process one row per Impala daemon, so codegen is not worth it. * Make the Hll algorithm more efficient by specialising the hash function based on decimal width. Limitations: * This doesn't tackle over-inlining of large expr trees, but a similar approach will be used there in a follow-on patch. Perf: Compute stats on functional_parquet.widetable_1000_cols goes from 1min+ of codegen to < 1s codegen on my machine. Local perf runs of tpc-h and targeted perf showed no regressions and some moderate improvements (1-2%). Also did an experiment to understand the perf consequences of disabling inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran: drop stats tpch_20_parquet.lineitem compute stats tpch_20_parquet.lineitem; There was no difference in time spent in the agg node: 30.7s with inlining, 30.5s without. Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/codegen/mcjit-mem-mgr.h M be/src/exec/aggregation-node.cc M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/anyval-util.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.h M be/src/runtime/lib-cache.cc M be/src/runtime/runtime-state.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test M tests/query_test/test_aggregation.py 29 files changed, 1,467 insertions(+), 165 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/8 -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS7, Line 602: codegen > It still doesn't track the memory consumed by the LLVM module (e.g. Instruc The optimisation memory estimate does include the IR, but only after it's constructed, since we don't really know how large it will be until it's generated. http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS7, Line 809: ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD > nit: May be worth refactoring it into a singe variable with meaningful name Done http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS7, Line 1706: if (aggregate_evaluators_.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) { > As discussed in person, it may be worth double checking whether min / max / I experimented with a code change that disabled inlining always, and looked at the agg node time. I saw: 25% regression for a single sum(decimal), 3% regression for a single ndv(decimal) and a 2% speedup for compute stats on lineitem. For sum() of only numeric columns in lineitem (8 columns) I saw a 23% slowdown. It could be worth trying to special-case those aggregate functions, but it seems like affected queries would be pretty rare. It's also a little tricky since the input expressions could be arbitrarily complex. http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true); > Thanks. Similar comments about the need to spell out the difference between There wasn't a particularly natural place to document this general diea. I chose to clarify in the thrift structure that disabling codegen means that the backend should do it if it supports it. That seemed the best place since it's the interface between the FE and BE. -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.
Jim Apple has submitted this change and it was merged. Change subject: IMPALA-3398: Add docs to main Impala branch. .. IMPALA-3398: Add docs to main Impala branch. These are refugees from doc_prototype. They can be rendered with the DITA Open Toolkit version 2.3.3 by: /tmp/dita-ot-2.3.3/bin/dita \ -i impala.ditamap \ -f html5 \ -o $(mktemp -d) \ -filter impala_html.ditaval Change-Id: I8861e99adc446f659a04463ca78c79200669484f Reviewed-on: http://gerrit.cloudera.org:8080/5014 Reviewed-by: John Russell Tested-by: John Russell --- A docs/Cloudera-Impala-Release-Notes.ditamap A docs/generatingImpalaDoc.md A docs/images/howto_access_control.png A docs/images/howto_per_node_peak_memory_usage.png A docs/images/howto_show_histogram.png A docs/images/howto_static_server_pools_config.png A docs/images/impala_arch.jpeg A docs/images/support_send_diagnostic_data.png A docs/impala.ditamap A docs/impala_html.ditaval A docs/impala_pdf.ditaval A docs/impala_sqlref.ditamap A docs/shared/ImpalaVariables.xml A docs/shared/impala_common.xml A docs/topics/impala.xml A docs/topics/impala_abort_on_default_limit_exceeded.xml A docs/topics/impala_abort_on_error.xml A docs/topics/impala_admin.xml A docs/topics/impala_admission.xml A docs/topics/impala_aggregate_functions.xml A docs/topics/impala_aliases.xml A docs/topics/impala_allow_unsupported_formats.xml A docs/topics/impala_alter_function.xml A docs/topics/impala_alter_table.xml A docs/topics/impala_alter_view.xml A docs/topics/impala_analytic_functions.xml A docs/topics/impala_appx_count_distinct.xml A docs/topics/impala_appx_median.xml A docs/topics/impala_array.xml A docs/topics/impala_auditing.xml A docs/topics/impala_authentication.xml A docs/topics/impala_authorization.xml A docs/topics/impala_avg.xml A docs/topics/impala_avro.xml A docs/topics/impala_batch_size.xml A docs/topics/impala_bigint.xml A docs/topics/impala_bit_functions.xml A docs/topics/impala_boolean.xml A docs/topics/impala_breakpad.xml A docs/topics/impala_cdh.xml A docs/topics/impala_char.xml A docs/topics/impala_cluster_sizing.xml A docs/topics/impala_cm_installation.xml A docs/topics/impala_comments.xml A docs/topics/impala_complex_types.xml A docs/topics/impala_components.xml A docs/topics/impala_compression_codec.xml A docs/topics/impala_compute_stats.xml A docs/topics/impala_concepts.xml A docs/topics/impala_conditional_functions.xml A docs/topics/impala_config.xml A docs/topics/impala_config_options.xml A docs/topics/impala_config_performance.xml A docs/topics/impala_connecting.xml A docs/topics/impala_conversion_functions.xml A docs/topics/impala_count.xml A docs/topics/impala_create_data_source.xml A docs/topics/impala_create_database.xml A docs/topics/impala_create_function.xml A docs/topics/impala_create_role.xml A docs/topics/impala_create_table.xml A docs/topics/impala_create_view.xml A docs/topics/impala_data_sources.xml A docs/topics/impala_databases.xml A docs/topics/impala_datatypes.xml A docs/topics/impala_date.xml A docs/topics/impala_datetime_functions.xml A docs/topics/impala_ddl.xml A docs/topics/impala_debug_action.xml A docs/topics/impala_decimal.xml A docs/topics/impala_default_order_by_limit.xml A docs/topics/impala_delegation.xml A docs/topics/impala_delete.xml A docs/topics/impala_describe.xml A docs/topics/impala_development.xml A docs/topics/impala_disable_cached_reads.xml A docs/topics/impala_disable_codegen.xml A docs/topics/impala_disable_outermost_topn.xml A docs/topics/impala_disable_row_runtime_filtering.xml A docs/topics/impala_disable_streaming_preaggregations.xml A docs/topics/impala_disable_unsafe_spills.xml A docs/topics/impala_disk_space.xml A docs/topics/impala_distinct.xml A docs/topics/impala_dml.xml A docs/topics/impala_double.xml A docs/topics/impala_drop_data_source.xml A docs/topics/impala_drop_database.xml A docs/topics/impala_drop_function.xml A docs/topics/impala_drop_role.xml A docs/topics/impala_drop_stats.xml A docs/topics/impala_drop_table.xml A docs/topics/impala_drop_view.xml A docs/topics/impala_errata.xml A docs/topics/impala_exec_single_node_rows_threshold.xml A docs/topics/impala_explain.xml A docs/topics/impala_explain_level.xml A docs/topics/impala_explain_plan.xml A docs/topics/impala_faq.xml A docs/topics/impala_faq_base.xml A docs/topics/impala_features.xml A docs/topics/impala_file_formats.xml A docs/topics/impala_fixed_issues.xml A docs/topics/impala_float.xml A docs/topics/impala_functions.xml A docs/topics/impala_functions_overview.xml A docs/topics/impala_glossary.xml A docs/topics/impala_grant.xml A docs/topics/impala_group_by.xml A docs/topics/impala_group_concat.xml A docs/topics/impala_hadoop.xml A docs/topics/impala_having.xml A docs/topics/impala_hbase.xml A docs/topics/impala_hbase_cache_blocks.xml A docs/topics/impala_hbase_caching.xml A docs/topics/impala_hints.xml A docs/topics/impala_howto_rm.xml A docs/topics/impala_identifiers.xml A docs/top
[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.
John Russell has posted comments on this change. Change subject: IMPALA-3398: Add docs to main Impala branch. .. Patch Set 2: Code-Review+2 Verified+1 Got clean build of both SQL Ref and entire Impala doc bundle. -- To view, visit http://gerrit.cloudera.org:8080/5014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Anonymous Coward #250 Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.
Jim Apple has posted comments on this change. Change subject: IMPALA-3398: Add docs to main Impala branch. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5014/1/docs/impala_sqlref.ditamap File docs/impala_sqlref.ditamap: PS1, Line 6: > conref="shared/ImpalaVariables.xml#impala_vars/prodinfo_for_html" Done -- To view, visit http://gerrit.cloudera.org:8080/5014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Anonymous Coward #250 Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.
Jim Apple has uploaded a new patch set (#2). Change subject: IMPALA-3398: Add docs to main Impala branch. .. IMPALA-3398: Add docs to main Impala branch. These are refugees from doc_prototype. They can be rendered with the DITA Open Toolkit version 2.3.3 by: /tmp/dita-ot-2.3.3/bin/dita \ -i impala.ditamap \ -f html5 \ -o $(mktemp -d) \ -filter impala_html.ditaval Change-Id: I8861e99adc446f659a04463ca78c79200669484f --- A docs/Cloudera-Impala-Release-Notes.ditamap A docs/generatingImpalaDoc.md A docs/images/howto_access_control.png A docs/images/howto_per_node_peak_memory_usage.png A docs/images/howto_show_histogram.png A docs/images/howto_static_server_pools_config.png A docs/images/impala_arch.jpeg A docs/images/support_send_diagnostic_data.png A docs/impala.ditamap A docs/impala_html.ditaval A docs/impala_pdf.ditaval A docs/impala_sqlref.ditamap A docs/shared/ImpalaVariables.xml A docs/shared/impala_common.xml A docs/topics/impala.xml A docs/topics/impala_abort_on_default_limit_exceeded.xml A docs/topics/impala_abort_on_error.xml A docs/topics/impala_admin.xml A docs/topics/impala_admission.xml A docs/topics/impala_aggregate_functions.xml A docs/topics/impala_aliases.xml A docs/topics/impala_allow_unsupported_formats.xml A docs/topics/impala_alter_function.xml A docs/topics/impala_alter_table.xml A docs/topics/impala_alter_view.xml A docs/topics/impala_analytic_functions.xml A docs/topics/impala_appx_count_distinct.xml A docs/topics/impala_appx_median.xml A docs/topics/impala_array.xml A docs/topics/impala_auditing.xml A docs/topics/impala_authentication.xml A docs/topics/impala_authorization.xml A docs/topics/impala_avg.xml A docs/topics/impala_avro.xml A docs/topics/impala_batch_size.xml A docs/topics/impala_bigint.xml A docs/topics/impala_bit_functions.xml A docs/topics/impala_boolean.xml A docs/topics/impala_breakpad.xml A docs/topics/impala_cdh.xml A docs/topics/impala_char.xml A docs/topics/impala_cluster_sizing.xml A docs/topics/impala_cm_installation.xml A docs/topics/impala_comments.xml A docs/topics/impala_complex_types.xml A docs/topics/impala_components.xml A docs/topics/impala_compression_codec.xml A docs/topics/impala_compute_stats.xml A docs/topics/impala_concepts.xml A docs/topics/impala_conditional_functions.xml A docs/topics/impala_config.xml A docs/topics/impala_config_options.xml A docs/topics/impala_config_performance.xml A docs/topics/impala_connecting.xml A docs/topics/impala_conversion_functions.xml A docs/topics/impala_count.xml A docs/topics/impala_create_data_source.xml A docs/topics/impala_create_database.xml A docs/topics/impala_create_function.xml A docs/topics/impala_create_role.xml A docs/topics/impala_create_table.xml A docs/topics/impala_create_view.xml A docs/topics/impala_data_sources.xml A docs/topics/impala_databases.xml A docs/topics/impala_datatypes.xml A docs/topics/impala_date.xml A docs/topics/impala_datetime_functions.xml A docs/topics/impala_ddl.xml A docs/topics/impala_debug_action.xml A docs/topics/impala_decimal.xml A docs/topics/impala_default_order_by_limit.xml A docs/topics/impala_delegation.xml A docs/topics/impala_delete.xml A docs/topics/impala_describe.xml A docs/topics/impala_development.xml A docs/topics/impala_disable_cached_reads.xml A docs/topics/impala_disable_codegen.xml A docs/topics/impala_disable_outermost_topn.xml A docs/topics/impala_disable_row_runtime_filtering.xml A docs/topics/impala_disable_streaming_preaggregations.xml A docs/topics/impala_disable_unsafe_spills.xml A docs/topics/impala_disk_space.xml A docs/topics/impala_distinct.xml A docs/topics/impala_dml.xml A docs/topics/impala_double.xml A docs/topics/impala_drop_data_source.xml A docs/topics/impala_drop_database.xml A docs/topics/impala_drop_function.xml A docs/topics/impala_drop_role.xml A docs/topics/impala_drop_stats.xml A docs/topics/impala_drop_table.xml A docs/topics/impala_drop_view.xml A docs/topics/impala_errata.xml A docs/topics/impala_exec_single_node_rows_threshold.xml A docs/topics/impala_explain.xml A docs/topics/impala_explain_level.xml A docs/topics/impala_explain_plan.xml A docs/topics/impala_faq.xml A docs/topics/impala_faq_base.xml A docs/topics/impala_features.xml A docs/topics/impala_file_formats.xml A docs/topics/impala_fixed_issues.xml A docs/topics/impala_float.xml A docs/topics/impala_functions.xml A docs/topics/impala_functions_overview.xml A docs/topics/impala_glossary.xml A docs/topics/impala_grant.xml A docs/topics/impala_group_by.xml A docs/topics/impala_group_concat.xml A docs/topics/impala_hadoop.xml A docs/topics/impala_having.xml A docs/topics/impala_hbase.xml A docs/topics/impala_hbase_cache_blocks.xml A docs/topics/impala_hbase_caching.xml A docs/topics/impala_hints.xml A docs/topics/impala_howto_rm.xml A docs/topics/impala_identifiers.xml A docs/topics/impala_impala_shell.xml A docs/topics/impala_incompatible_changes.xml A docs/topics/impala_insert.xml A docs/
[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.
John Russell has posted comments on this change. Change subject: IMPALA-3398: Add docs to main Impala branch. .. Patch Set 1: (1 comment) Appended the updated attribute key/value pair. http://gerrit.cloudera.org:8080/#/c/5014/1/docs/impala_sqlref.ditamap File docs/impala_sqlref.ditamap: PS1, Line 6: > What is the most recent URI? conref="shared/ImpalaVariables.xml#impala_vars/prodinfo_for_html" -- To view, visit http://gerrit.cloudera.org:8080/5014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Anonymous Coward #250 Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.
Jim Apple has posted comments on this change. Change subject: IMPALA-3398: Add docs to main Impala branch. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5014/1/docs/impala_sqlref.ditamap File docs/impala_sqlref.ditamap: PS1, Line 6: > Stale URI. Causes build error. What is the most recent URI? -- To view, visit http://gerrit.cloudera.org:8080/5014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Anonymous Coward #250 Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping
Jim Apple has posted comments on this change. Change subject: Don't overwrite user's .ssh/config file when bootstrapping .. Patch Set 2: Code-Review+2 rebase, carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. IMPALA-3202: refactor scratch file management into TmpFileMgr This is a pure refactoring patch that moves all of the logic for allocating scratch file ranges into TmpFileMgr in anticipation of this logic being used by the new BufferPool. There should be no behavioural changes. Also remove a bunch of TODOs that we're not going to fix. Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Reviewed-on: http://gerrit.cloudera.org:8080/4898 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 6 files changed, 441 insertions(+), 432 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Henry Robinson has posted comments on this change. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py .. IMPALA-4476: Use unique_database to stop races in test_udfs.py These tests have been failing nondeterministically in larger machines with 16 cores. This change should stop races in haddop fs -put and drop/create function. Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Reviewed-on: http://gerrit.cloudera.org:8080/5124 Reviewed-by: Jim Apple Tested-by: Impala Public Jenkins --- M tests/query_test/test_udfs.py 1 file changed, 30 insertions(+), 23 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Tim Armstrong has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > There are 22 places alone in the backend tests where this happens, and they Not sure if that's sufficient, may require some runtime configuration too. My feeling is that -fwrapv is the wrong fix anyway and will mask genuine bugs. Instead we should fix all instances where we expect it to overflow (if we want 2s-complement overflow in a particular place, we can explicitly cast to unsigned then back). Maybe the best course forward it to disable this UBSAN check for now and deal with it in a follow-up patch. -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG Commit Message: PS6, Line 14: date year* -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Laszlo Gaal has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG Commit Message: PS6, Line 12: did't typo -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Michael Ho has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 7: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS7, Line 602: codegen It still doesn't track the memory consumed by the LLVM module (e.g. Instruction, BasicBlock objects), right ? http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS7, Line 809: ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD nit: May be worth refactoring it into a singe variable with meaningful name for easier understanding. http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS7, Line 1706: if (aggregate_evaluators_.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) { As discussed in person, it may be worth double checking whether min / max / sum on wide table doesn't regress as UpdateSlot() is rather small for these aggregate evaluators. It probably doesn't but it'd be good to avoid surprises. For instance, select min(col1),max(col1),sum(col1),,,min(col26),max(col26),sum(col26) from wide_table makes a difference without inlining for UpdateSlot() into UpdateTuple(). -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Jim Apple has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > What are some examples of where we overflow signed ints? /home/jbapple/Impala/be/src/gutil/strings/numbers.cc:954:17: runtime error: signed integer overflow: 42 * 1 cannot be represented in type 'int' #0 0x2b30a5f3ecfd in FastUInt32ToBufferLeft(unsigned int, char*) /home/jbapple/Impala/be/src/gutil/strings/numbers.cc:954:17 #1 0x2b30a5f3fb7e in FastUInt64ToBufferLeft(unsigned long long, char*) /home/jbapple/Impala/be/src/gutil/strings/numbers.cc:1037:24 #2 0x2b30a5f3e3ac in FastInt64ToBufferLeft(long long, char*) /home/jbapple/Impala/be/src/gutil/strings/numbers.cc:1080:10 #3 0x9cf0f8 in strings::internal::SubstituteArg::SubstituteArg(long) /home/jbapple/Impala/be/src/gutil/strings/substitute.h:106:35 #4 0x2b309d7988a3 in impala::HdfsAvroScanner::SetStatusValueOverflow(impala::TErrorCode::type, long, long) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner.cc:677:53 #5 0x2b309d7be0bc in impala::HdfsAvroScanner::ReadAvroString(impala::PrimitiveType, unsigned char**, unsigned char*, bool, void*, impala::MemPool*) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:241:7 #6 0x7e2e47 in void impala::HdfsAvroScannerTest::TestReadAvroType(bool (impala::HdfsAvroScanner::*)(impala::PrimitiveType, unsigned char**, unsigned char*, bool, void*, impala::MemPool*), impala::PrimitiveType, unsigned char*, long, impala::StringValue, int, impala::TErrorCode::type) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:85:20 #7 0x7c9a66 in impala::HdfsAvroScannerTest::TestReadAvroString(unsigned char*, long, impala::StringValue, int, impala::TErrorCode::type) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:179:5 #8 0x7bab3d in impala::HdfsAvroScannerTest_StringLengthOverflowTest_Test::TestBody() /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:348:5 /home/jbapple/Impala/be/src/exprs/expr-test.cc:819:36: runtime error: signed integer overflow: -9223372036854775807 + -9223372036854775807 cannot be represented in type 'long' #0 0xb901cd in void impala::ExprTest::TestFixedResultTypeOps(long, long, impala::ColumnType const&) /home/jbapple/Impala/be/src/exprs/expr-test.cc:819:36 #1 0x80be58 in impala::ExprTest_ArithmeticExprs_Test::TestBody() /home/jbapple/Impala/be/src/exprs/expr-test.cc:1180:3 /home/jbapple/Impala/be/src/exprs/expr-test.cc:823:36: runtime error: signed integer overflow: -9223372036854775807 * -9223372036854775807 cannot be represented in type 'long' #0 0xb90783 in void impala::ExprTest::TestFixedResultTypeOps(long, long, impala::ColumnType const&) /home/jbapple/Impala/be/src/exprs/expr-test.cc:823:36 #1 0x80be58 in impala::ExprTest_ArithmeticExprs_Test::TestBody() /home/jbapple/Impala/be/src/exprs/expr-test.cc:1180:3 /home/jbapple/Impala/be/src/exprs/expr-test.cc:821:36: runtime error: signed integer overflow: -9223372036854775807 - 9223372036854775807 cannot be represented in type 'long' #0 0xb9049d in void impala::ExprTest::TestFixedResultTypeOps(long, long, impala::ColumnType const&) /home/jbapple/Impala/be/src/exprs/expr-test.cc:821:36 #1 0x80c1a5 in impala::ExprTest_ArithmeticExprs_Test::TestBody() /home/jbapple/Impala/be/src/exprs/expr-test.cc:1184:3 [ RUN ] ExprTest.MathConversionFunctions /home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:370:24: runtime error: signed integer overflow: 2 * 4738381338321616896 cannot be repr esented in type 'long' /home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:371:13: runtime error: signed integer overflow: 4738381338321616896 * 36 cannot be represented in type 'long' #0 0x2b1855eb2f88 in impala::MathFunctions::DecimalInBaseToDecimal(long, signed char, long*) /home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:371:13 #1 0x2b1855eb2bb3 in impala::MathFunctions::ConvInt(impala_udf::FunctionContext*, impala_udf::BigIntVal const&, impala_udf::TinyIntVal const&, impala_udf::TinyIntVal const&) /home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:293:10 #2 0x2b18560f4152 in impala_udf::StringVal impala::ScalarFnCall::InterpretEval(impala::ExprContext*, impala::TupleRow const*) /home/jbapple/Impala/be/src/exprs/scalar-fn-call.cc:579:580 #3 0x2b185602e0d5 in impala::ScalarFnCall::GetStringVal(impala::ExprContext*, impala::TupleRow const*) /home/jbapple/Impala/be/src/exprs/scalar-fn-call.cc:685:44 #4 0x2b1855e03f1c in impala::ExprContext::GetValue(impala::Expr*, impala::TupleRow const*) /home/jbapple/Impala/be/src/exprs/expr-context.cc:277:33 #5 0x2b1855e025cc in impala::ExprContext:
[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3726: Add support for Kudu-specific column options .. Patch Set 7: Code-Review+2 (2 comments) Rebase and fixing tests, carry Alex's +2 http://gerrit.cloudera.org:8080/#/c/5026/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 129: public ColumnDef(String colName, TypeDef typeDef) { > Seems like we'd often need a C'tor that also takes a comment string. You ca Not as simple as it sounds, we need to make sure we don't insert nulls and then populate the map. But because the call to the delegating constructor needs to be the first statement that needs to happen outside the constructor. http://gerrit.cloudera.org:8080/#/c/5026/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 250: def test_primary_key_and_distribution(self, cursor): > AUTO and DEFAULT still seem uninformative. Let's revisit later like you sug I agree for encoding, not so much for compression since it can have no_compression. Anyways, as you said, let's revisit this later. -- To view, visit http://gerrit.cloudera.org:8080/5026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5026 to look at the new patch set (#7). Change subject: IMPALA-3726: Add support for Kudu-specific column options .. IMPALA-3726: Add support for Kudu-specific column options This commit adds support for Kudu-specific column options in CREATE TABLE statements. The syntax is: CREATE TABLE tbl_name ([col_name type [PRIMARY KEY] [option [...]]] [, ]) where option is: | NULL | NOT NULL | ENCODING encoding_val | COMPRESSION compression_algorithm | DEFAULT expr | BLOCK_SIZE num The output of the SHOW CREATE TABLE statement was altered to include all the specified column options for Kudu tables. Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 --- M be/src/exec/kudu-table-sink.cc M common/thrift/CatalogObjects.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableAddReplaceColsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AvroSchemaParser.java M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test M tests/query_test/test_kudu.py 30 files changed, 968 insertions(+), 249 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5026/7 -- To view, visit http://gerrit.cloudera.org:8080/5026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Alex Behm has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 4: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: (1 comment) It might be best to special-case the local backend. At a high level this is the most intuitive thing to do. It seems like we already special-case the local backend to some extent in UpdateMembership(). Otherwise waiting until the membership update is received may make sense. http://gerrit.cloudera.org:8080/#/c/5127/1/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 300: BackendConfigPtr backend_config = GetBackendConfig(); I think it would make sense to detect at this point if the backend config doesn't have the coordinator in it. Then we could wait for the config to be updated (maybe with a timeout) before failing the query. Even if we don't crash, it seems very strange for a coordinator to not schedule a query on itself in the !exec_on_coord case. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Michael Ho has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250; > This is more about avoiding codegen blow-ups instead of making a cost-based Thanks for the explanation. I think the thing the confusion to me here is that both the FE and BE appear to be making decisions on whether to codegen for a particular node for performance reason. It may help to be state what falls under the high level decision in the planner vs what falls under the backend implementation details. Is the planner decision solely based on the plan shape ? http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true); > I think it's different from the agg node. For the agg node we know based on Thanks. Similar comments about the need to spell out the difference between high level decision in the planner vs compilation cost related decision from backend implementation. -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Dan Hecht has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > There are 22 places alone in the backend tests where this happens, and they What are some examples of where we overflow signed ints? -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Henry Robinson has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: Can't the coordinator always make sure it's in the list of available backends, like Tim suggested? Would that be a big change? -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Jim Apple has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > My preference is not to add this option and to fix any cases where values a There are 22 places alone in the backend tests where this happens, and they all look intentional to me. I'd suggest that this is revealed intent. I'm fine with adding -fwrapv to the codegen flags. That's CLANG_IR_CXX_FLAGS? -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.
Alex Behm has posted comments on this change. Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes. .. Patch Set 3: Any more comments? -- To view, visit http://gerrit.cloudera.org:8080/5117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: Thanks for your feedback. I agree that this is not optimal, and if there was a way to send the equivalent of EAGAIN to the client and make it try again either with this or a different coordinator that could be a workaround. I'd also be happy to add a small timeout of a few seconds before returning an error. I'm rather reluctant to change the way we register the local backend with itself, since that way we will have two mechanisms of updating the membership information and need to handle conflicts as well. That being said, between a) Relying on the client to try again, b) waiting for 5 seconds (or whatever value), and c) reworking the cluster membership information management code in the scheduler, which one do you prefer? -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions
Dan Hecht has uploaded a new change for review. http://gerrit.cloudera.org:8080/5129 Change subject: IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions .. IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions Using TimestampValue (or equivalent string representation) for timestamps that require a point in time doesn't work because the same time can represent multiple point in times. For example, the timestamp: '2016-11-13 01:01 AM' occurred twice last weekend. Instead, we should use unix time directly rather than trying to derive unix time from a (timezone-less) timestamp. Note that there are other questionable uses of TimestampValue for internal Impala service stuff, but I want to fix them separately as they are not as important and fixing does add some risk. While I'm here, remove a template TimestampValue constructor that was unused and is confusing. We don't have any end-to-end tests that exercise column lineage, so add a simple custom cluster test that enables lineage and verifes the start and end unix times are within appropriate bounds. The other column lineage graph fields are at least tested via planner tests. Automated regression testing for the specifc daylight savings issue is difficult as we'd have to cross the daylight savings boundary at just the right time during query execution in order to reproduce reliably. But open to ideas. Testing: - loop the new test overnight without any failures. - exhaustive run. Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9 --- M be/src/runtime/timestamp-value.h M be/src/service/impala-server.cc M common/thrift/ImpalaInternalService.thrift M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java A tests/custom_cluster/test_lineage.py 6 files changed, 87 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/5129/1 -- To view, visit http://gerrit.cloudera.org:8080/5129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan Hecht
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Tim Armstrong has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: Line 40: # -fwrapv: force signed integer arithmetic operations to wrap > This was already assumed, and a number of our tests depend on it holding. My preference is not to add this option and to fix any cases where values are actually wrapping at runtime. This could have performance implications, since a lot of "obvious" loop optimisations can depend on array indices, etc not wrapping. Also we're not enabling this for codegen, which potentially add a gap to our test coverage. -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: This is better than crashing, but it also seems like we shouldn't fail queries for this reason. Probably we should wait until we've gotten the statestore update message before trying to schedule queries (either that or make sure the local backend is always present). -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/5127 Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check to ensure that the local backend has been registered with the scheduler before issuing scan ranges when exec_at_coord is set. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc M common/thrift/generate_error_codes.py 3 files changed, 32 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/1 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250; > I agree that it's more actionable if it's based on number of tuples. This is more about avoiding codegen blow-ups instead of making a cost-based decision (that's tricky because the codegen cost is non-linear and unpredictable and we'd need to build a cost model for the scanners factoring in # rows,# columns, predicates, etc). I don't think there's much point in modelling the cost unless a) there is a substantial benefit to justify the complexity and b) the cost model is going to be fairly accurate. Otherwise we end up in an uncomfortable middle ground with a half-baked code model where the code is more complex but we don't get measurable benefit or have confidence in the cost model's accuracy. I'd rather not do this from the planner because it's tied up in the implementation details of codegen for the scanners. The only reason we have a hard threshold is because the backend codegen logic builds everything in a single function. If that changed we could switch to the approach we use elsewhere of disabling inlining. It makes more sense to me for the planner to make a high-level decision about enabling codegen for the node, then the backend implements that decision (or bails out if it can't do the codegen without blowing up). http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true); > This isn't exactly what I had in mind. If you look at ExchangeNode::Codegen I think it's different from the agg node. For the agg node we know based on the plan shape that the number of input rows is small, so even though we could codegen the agg, it's not worth it. But as far as I can see the only reason we don't codegen non-merging exchanges in general is that there's no backend support for it. There's no reason I see that the planner would always disable codegen for non-merging exchanges if backend support existed. -- To view, visit http://gerrit.cloudera.org:8080/4956 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py
Jim Apple has posted comments on this change. Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py .. Patch Set 2: Code-Review+2 (1 comment) Carry +2 http://gerrit.cloudera.org:8080/#/c/5124/1/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: Line 94: self.client.execute(create_fn_stmt) > can get rid of try/catch since unique_database will take care of that Done -- To view, visit http://gerrit.cloudera.org:8080/5124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5124 to look at the new patch set (#2). Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py .. IMPALA-4476: Use unique_database to stop races in test_udfs.py These tests have been failing nondeterministically in larger machines with 16 cores. This change should stop races in haddop fs -put and drop/create function. Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 --- M tests/query_test/test_udfs.py 1 file changed, 30 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/5124/2 -- To view, visit http://gerrit.cloudera.org:8080/5124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Jim Apple has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Patch Set 1: (26 comments) > (22 comments) > > what's the plan for keeping this compiling? I have a Jenkins job in progress to build in a variety of different ways (-so, -asan, -ninja, -releast). I will modify to add this. I intend eventually to add the job to a continually-monitored dashboard and perhaps to have it send email alerts on failure. http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: PS1, Line 95: synbol > typo Done http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/data-source-scan-node.cc File be/src/exec/data-source-scan-node.cc: PS1, Line 150: ) > add "!= 0" as we try to avoid implicit comparisons against zero conversion switched to std::fill http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/hdfs-avro-scanner-ir.cc File be/src/exec/hdfs-avro-scanner-ir.cc: Line 276: *decimal >>= bytes_to_fill * 8; > since len comes from the data file, we could also have bytes_to_fill < 0, n Yes, that is also undefined, but it is precluded by ReadFieldLen being .ok, above. Line 278: *decimal = 0; > can this happen with well-formed avro? if not, this be a return false case. Here is a stack trace: /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:276:20: runtime error: shift exponent 32 is too large for 32-bit type 'int32_t' (aka 'int') #0 0x2af0805477df in impala::HdfsAvroScanner::ReadAvroDecimal(int, unsigned char**, unsigned char*, bool, void*, impala::MemPool*) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:276:20 #1 0x7ec68c in void impala::HdfsAvroScannerTest::TestReadAvroType, bool (impala::HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, impala::MemPool*), unsigned long>(bool (impala::HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, impala::MemPool*), unsigned long, unsigned char*, long, impala::DecimalValue, int, impala::TErrorCode::type) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:85:20 #2 0x7c8e9a in void impala::HdfsAvroScannerTest::TestReadAvroDecimal(unsigned char*, long, impala::DecimalValue, int, impala::TErrorCode::type) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:187:5 #3 0x7bbb3f in impala::HdfsAvroScannerTest_DecimalTest_Test::TestBody() /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:393:3 http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1379: double percent = total_rows * 100 / std::max(1L, num_input_rows); > if that were the case, we would have seen crashes. 1. "if that were the case, we would have seen crashes." - I think that because total_rows is a double, the denominator gets converted to a double, making this not a crash. UBSAN is a dynamic analysis tool; when it triggers, the behavior actually occurred. 2. Done. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/read-write-util.cc File be/src/exec/read-write-util.cc: PS1, Line 97: * > nit: we put the * next to type. Done. Line 100: memcpy(&uinteger, &integer, sizeof(integer)); > using memcpy doesn't seem necessary. can't we just write: It has different semantics: for 8-bit ints and integer = -2, my expression has value 3, yours has value 253, and yours with the | replaced by ^ has value 251. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 157: if (LIKELY(dst->ptr && src.ptr)) memcpy(dst->ptr, src.ptr, src.len); > Prefer explicit NULL checks. memcpy is UB if either ptr is null, even if len is 0. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/bit-byte-functions-ir.cc File be/src/exprs/bit-byte-functions-ir.cc: PS1, Line 151: v * > why is this necessary? and does it generate the same code (let's make sure UBSAN caught a case where v << shift was not representable as a T. That's UB, but since this patch sets the -fwrapv, the multiply behavior is defined. It generates the same code, but it if we don't do it this way and set fwrapv, the compiler can break the code and do bizarre things. Historically, this is a real risk with GCC. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: Line 820: if (!haystack) return NULL; > okay but this check is still unnecessary - no loop iterations will be taken Done http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: PS1, Line 804: t > nit: "t != nullptr" per our style. Done http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 1684: accumulators::
[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".
Jim Apple has uploaded a new patch set (#2). Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". .. Add a build flag for the undefined behavior sanitizer, aka "ubsan". Ubsan checks for undefined behavior according to the C++ standard. Some of this behavior has been known to be exploited by optimizing compilers to produce bizarre results, like taking both branches of a conditional. Although a number of cases of UB were not possible to fix, as they were caused by libraries like Thrift or libstdc++, this patch fixes the UB that was feasible to fix. Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/exec/aggregation-node.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/delimited-text-parser-test.cc M be/src/exec/hash-join-node-ir.cc M be/src/exec/hash-join-node.cc M be/src/exec/hbase-scan-node.cc M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scanner.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/read-write-util.cc M be/src/exec/select-node.cc M be/src/exec/unnest-node.cc M be/src/exec/zigzag-test.cc M be/src/experiments/tuple-splitter-test.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions-test.cc M be/src/exprs/bit-byte-functions-ir.cc M be/src/exprs/expr-context.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/string-functions-ir.cc M be/src/rpc/authentication.cc M be/src/rpc/thrift-util-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/coordinator.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/parallel-executor-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/sorter.cc M be/src/runtime/string-value.inline.h M be/src/runtime/tuple.h M be/src/thirdparty/mustache/mustache.cc M be/src/udf/udf-test-harness.h M be/src/udf/udf.cc M be/src/util/bit-packing-test.cc M be/src/util/bitmap.h M be/src/util/coding-util-test.cc M be/src/util/coding-util.cc M be/src/util/decompress-test.cc M be/src/util/dict-encoding.h M be/src/util/internal-queue-test.cc M be/src/util/runtime-profile.cc M be/src/util/streaming-sampler.h M be/src/util/tuple-row-compare.h M be/src/util/uid-util.h M be/src/util/webserver.cc M bin/make_impala.sh M bin/run-backend-tests.sh M bin/start-impalad.sh M buildall.sh 62 files changed, 218 insertions(+), 163 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/5082/2 -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4432: Handle internal codegen disabling properly .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/5105/3/be/src/runtime/plan-fragment-executor.cc File be/src/runtime/plan-fragment-executor.cc: Line 273: if (runtime_state_->CodegenDisabled() && !runtime_state_->HasScalarFnToCodegen()) return; Long line -- To view, visit http://gerrit.cloudera.org:8080/5105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. Patch Set 6: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR](hadoop-next) Merge remote-tracking branch 'origin/master' into hadoop-next
Tim Armstrong has posted comments on this change. Change subject: Merge remote-tracking branch 'origin/master' into hadoop-next .. Patch Set 1: Code-Review+2 Verified+1 Merge from master. Confirmed that it compiled locally on my machine. -- To view, visit http://gerrit.cloudera.org:8080/5126 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: hadoop-next Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR](hadoop-next) Merge remote-tracking branch 'origin/master' into hadoop-next
Tim Armstrong has submitted this change and it was merged. Change subject: Merge remote-tracking branch 'origin/master' into hadoop-next .. Merge remote-tracking branch 'origin/master' into hadoop-next Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5 --- D common/thrift/cli_service.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 2 files changed, 3 insertions(+), 1,166 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/5126 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: hadoop-next Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR](hadoop-next) Merge remote-tracking branch 'origin/master' into hadoop-next
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/5126 Change subject: Merge remote-tracking branch 'origin/master' into hadoop-next .. Merge remote-tracking branch 'origin/master' into hadoop-next Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5 --- D common/thrift/cli_service.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 2 files changed, 3 insertions(+), 1,166 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5126/1 -- To view, visit http://gerrit.cloudera.org:8080/5126 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: hadoop-next Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py
Alex Behm has posted comments on this change. Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5124/1/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: Line 94: try: can get rid of try/catch since unique_database will take care of that -- To view, visit http://gerrit.cloudera.org:8080/5124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/5125 Change subject: IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE. .. IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE. TODO: - Show column encoding and compression once available - Corresponding changes to DESCRIBE EXTENDED/FORMATTED will follow in a separate patch. Testing: A private core/hdfs run passed. Change-Id: I83c91b540bc6d27cb4f21535fe12f3f8658c233e --- M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test M testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test M tests/query_test/test_kudu.py 9 files changed, 180 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/5125/1 -- To view, visit http://gerrit.cloudera.org:8080/5125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I83c91b540bc6d27cb4f21535fe12f3f8658c233e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Lars Volker has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 273: RETURN_IF_ERROR(output_partition->writer->AppendRowBatch( > yes. Done -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Hello Alex Behm, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4863 to look at the new patch set (#15). Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. IMPALA-2523: Make HdfsTableSink aware of clustered input IMPALA-2521 introduced clustering for insert statements. This change makes the HdfsTableSink aware of clustered inputs, so that partitions are opened, written, and closed one by one. This change also adds/modifies tests in several ways: - clustered insert tests switch from selecting all rows from alltypessmall to alltypes. Together with varying settings for batch_size, this results in a larger number of row batches being written. - clustered insert tests select from alltypes instead of functional.alltypes to make sure we also select from various input formats. - clustered insert tests have been added to select from alltypestiny to create inserts with 1 and 2 rows per partition respectively. - exhaustive insert tests now use different values for batch_size: 1, 16, 0 (meaning default, 1024). This is limited to uncompressed parquet files, to maintain a reasonable runtime. On my machine execution of test.insert took 1778 seconds, compared to 1002 seconds with the just default row batch size. - There is additional testing in test_insert_behaviour.py to make sure that insertion over several row batches only creates one file per partition. - It renames the test_insert method to make it unique in the file and allow for effective filtering with -k. Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 --- M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-writer.cc M be/src/exec/hbase-table-writer.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/hdfs-text-table-writer.h M bin/impala-config.sh M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-query/queries/QueryTest/insert.test M tests/query_test/test_insert.py M tests/query_test/test_insert_behaviour.py 26 files changed, 424 insertions(+), 138 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/15 -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Attila Jeges has posted comments on this change. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. Patch Set 3: > > Thanks!Do you think I should add some code to catalog to validate > the > > value of 'initial_hms_cnxn_timeout_s'? What would be the > acceptable > > range? > > You could validate that it was > 0, but otherwise I don't think you > should put a bound on it. Users may want to wait effectively > forever, and I can see them setting 99 as a value to > simulate that. > > > Adding this config parameter to the CM UI is another issue. Do > you think it should be added for the 5.10 release or it is not > that urgent? I guess, the default value is reasonable for most > users and they can always set it as a safety valve if they have to. > > Sounds like you're talking about a Cloudera release. Reminder that > this is the Apache Impala project - vendor specific concerns should > be discussed elsewhere. Thanks for the review, I've added the > 0 check. Could you add a +2 once more? -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Attila Jeges has uploaded a new patch set (#3). Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present This change introduces a new catalogd startup option (init_first_metastore_client_timeout_seconds) that specifies the time in seconds catalogd should spend on retrying to establish a connection to HMS the first time on startup before giving up and exiting fatally. Setting this startup option to a value that is greater than the HMS startup time will allow CM to start Impala at the same time or even before HMS. The default value of init_first_metastore_client_timeout_seconds is 120 seconds. Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java A tests/experiments/test_catalog_hms_failures.py 10 files changed, 285 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/5095/3 -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Hello Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5095 to look at the new patch set (#3). Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present This change introduces a new catalogd startup option (init_first_metastore_client_timeout_seconds) that specifies the time in seconds catalogd should spend on retrying to establish a connection to HMS the first time on startup before giving up and exiting fatally. Setting this startup option to a value that is greater than the HMS startup time will allow CM to start Impala at the same time or even before HMS. The default value of init_first_metastore_client_timeout_seconds is 120 seconds. Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java A tests/experiments/test_catalog_hms_failures.py 10 files changed, 285 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/5095/3 -- To view, visit http://gerrit.cloudera.org:8080/5095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 273: RETURN_IF_ERROR(output_partition->writer->AppendRowBatch( > This is part of the HdfsTableWriter virtual interface. Should I change it h yes. -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Lars Volker has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 13: (10 comments) Thanks for the review. Please see PS14. http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 66: tsink.table_sink.hdfs_table_sink.skip_header_line_count : > if you break this up, then This has been formatted by clang-format since I touched these lines. While I agree that the formatting could be nicer, I'd like to keep the automatic format here for simplicity. Line 212: GetHashTblKey(NULL, dynamic_partition_key_value_ctxs, &key); > we're standardizing on nullptr now I changed all occurrences in lines I touched. Once the change has a +2 I will rebase the change and then replace all other occurrences in the file to keep it consistent (and keep the risk for merge conflicts low until then). Line 266: // The rows of this batch may span across multiple files. We repeatedly pass the row > remove "across" Done Line 273: RETURN_IF_ERROR(output_partition->writer->AppendRowBatch( > rename this function, it's not appending the entire row batch (AppendRows?) This is part of the HdfsTableWriter virtual interface. Should I change it here and in all table writer classes? Line 283: Status HdfsTableSink::WriteClusteredRowBatch(RuntimeState* state, RowBatch* batch) { > to speed this up, you should: Done. AppendRowBatch() already appends the whole batch if the row index vector is empty, so there was no change necessary. http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: Line 170: const HdfsPartitionDescriptor& partition_descriptor, const TupleRow* current_row, > mention that current_row provides the partition key values. Done Line 181: /// GetHashTblKey(). Maps to an OutputPartition, which are owned by the object pool and > "pool," Done Line 189: void GetHashTblKey(const TupleRow* current_row, const std::vector& ctxs, > current_row -> row (it doesn't matter to this function that the row is 'cur Done Line 213: /// Appends all rows in batch to the temporary Hdfs files corresponding to partitions. > what "corresponding to partitions" refers to is unclear. Rephrased it. Line 214: /// The input must be ordered by the partition key expressions. > the rows are expected to be ordered...? Done. Though I think "must be" makes it more clear that it's a prerequisite to be guaranteed by the caller. -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Hello Alex Behm, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4863 to look at the new patch set (#14). Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. IMPALA-2523: Make HdfsTableSink aware of clustered input IMPALA-2521 introduced clustering for insert statements. This change makes the HdfsTableSink aware of clustered inputs, so that partitions are opened, written, and closed one by one. This change also adds/modifies tests in several ways: - clustered insert tests switch from selecting all rows from alltypessmall to alltypes. Together with varying settings for batch_size, this results in a larger number of row batches being written. - clustered insert tests select from alltypes instead of functional.alltypes to make sure we also select from various input formats. - clustered insert tests have been added to select from alltypestiny to create inserts with 1 and 2 rows per partition respectively. - exhaustive insert tests now use different values for batch_size: 1, 16, 0 (meaning default, 1024). This is limited to uncompressed parquet files, to maintain a reasonable runtime. On my machine execution of test.insert took 1778 seconds, compared to 1002 seconds with the just default row batch size. - There is additional testing in test_insert_behaviour.py to make sure that insertion over several row batches only creates one file per partition. - It renames the test_insert method to make it unique in the file and allow for effective filtering with -k. Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M bin/impala-config.sh M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-query/queries/QueryTest/insert.test M tests/query_test/test_insert.py M tests/query_test/test_insert_behaviour.py 14 files changed, 391 insertions(+), 98 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/14 -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py
Jim Apple has uploaded a new change for review. http://gerrit.cloudera.org:8080/5124 Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py .. IMPALA-4476: Use unique_database to stop races in test_udfs.py These tests have been failing nondeterministically in larger machines with 16 cores. This change should stop races in haddop fs -put and drop/create function. Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 --- M tests/query_test/test_udfs.py 1 file changed, 25 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/5124/1 -- To view, visit http://gerrit.cloudera.org:8080/5124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple
[Impala-ASF-CR] Improve message output from run-step.sh
Internal Jenkins has posted comments on this change. Change subject: Improve message output from run-step.sh .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] Improve message output from run-step.sh
Internal Jenkins has submitted this change and it was merged. Change subject: Improve message output from run-step.sh .. Improve message output from run-step.sh run-step prints a message to tell the reader what it's doing. However, that message wasn't flushed so that run-step could print OK or FAILED on the same line. The result was that long-running steps wouldn't print anything to the log until they were done, at least in Jenkins contexts. This patch changes it so that the message is flushed, and then the result is printed on a separate line (including the time it took to run the step). $ run-step "Hello world!" helloworld.out sleep 5 Hello world! (logging to /tmp/helloworld.out)... OK (Took: 0 min 5 sec) Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 Reviewed-on: http://gerrit.cloudera.org:8080/5116 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M testdata/bin/run-step.sh 1 file changed, 7 insertions(+), 4 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins