[Impala-ASF-CR] IMPALA-5316: Adds last day() function
Vincent Tran has uploaded a new patch set (#3). Change subject: IMPALA-5316: Adds last_day() function .. IMPALA-5316: Adds last_day() function This change adds last_day() function. The function takes exactly one DATE/TIMESTAMP argument and returns a TIMESTAMP that is the last date of the input date's calendar month. Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.h M common/function-registry/impala_functions.py 4 files changed, 70 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/6991/3 -- To view, visit http://gerrit.cloudera.org:8080/6991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5316: Adds last day() function
Vincent Tran has posted comments on this change. Change subject: IMPALA-5316: Adds last_day() function .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 279: void TestLastDayFunction() { > can you check the results of all of these tests are the same as a reference MariaDB returns only the DATE portion of the timestamp. That would be neat to do here as well. But I don't see a way to do it unless we convert the output to StringVal. Line 281: TestTimestampValue("last_day('2003-01-02')", > can you add times to some of these? Done Line 282: TimestampValue::Parse("2003-01-31 00:00:00", 19)); > for all of the below, use the Parse() overload that takes std::string, i.e. Done Line 316: TimestampValue::Parse("2100-02-28 00:00:00", 19)); > I pointed out a few special cases to consider in the .cc file Done http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: PS2, Line 562: end_of_month() > 1) I wonder if you can get a TimestampValue without a Date or with a "speci Done PS2, Line 562: ptime pdate(timestamp.date().end_of_month()); : TimestampValue tsv(pdate); > I think you can avoid converting to ptime and just use the TimestampValue c Done http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions.h File be/src/exprs/timestamp-functions.h: Line 198: static TimestampVal LastDay(FunctionContext* context, const TimestampVal& ts); > please add a comment. look to other fns in the file for examples. Done -- To view, visit http://gerrit.cloudera.org:8080/6991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 8: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/650/ -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 8: Code-Review+2 The problem with the last run was that testMtDopValidation() resets the RuntimeEnv.isTestEnv_ flag temporary causing the table loading to hit the preconditions check. I adjusted the check to not rely on that. Simple change, carrying +2. Ran a pvt build to make sure the tests pass. -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6970 to look at the new patch set (#8). Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd We need not account these hdfs table metrics on the Catalog server as they are eventually calculated again on the Impalads while unpacking the corresponding thrift table. This fix can potentially affect the frontend tests that directly load the Catalog's version of HdfsTable without the loadFromThrift() call paths that do the accounting. That is fixed by adding a separate call that computes these values and is called from ImpaladTestCatalog.getTable(). Testing: Enough tests already cover these code paths like show stats/ table or partition refresh tests etc. No new tests are added. Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java 2 files changed, 27 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6970/8 -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 19: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/648/ -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/649/ -- To view, visit http://gerrit.cloudera.org:8080/6998 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6998 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5376: Loads all TPC-DS tables
Hello Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6877 to look at the new patch set (#8). Change subject: IMPALA-5376: Loads all TPC-DS tables .. IMPALA-5376: Loads all TPC-DS tables This change loads the missing tables in TPC-DS. In addition, it also fixes up the loading of the partitioned table store_sales so all partitions will be loaded. The existing TPC-DS queries are also updated to use the parameters for qualification runs as noted in the TPC-DS specification. Some hard-coded partition filters were also removed. They were there due to the lack of dynamic partitioning in the past. Some missing TPC-DS queries are also added to this change, including query28 which discovered the infamous IMPALA-5251. Having all tables in TPC-DS available paves the way for us to include all supported TPCDS queries in our functional testing. Due to the change in the data, planner tests and the E2E tests have different results than before. The results of E2E tests were compared against the run done with Netezza and Vertica. The divergence were all due to the truncation behavior of decimal types in DECIMAL_V1. Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 --- M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M testdata/bin/compute-table-stats.sh M testdata/bin/generate-schema-statements.py M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds.test M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-q1.test M testdata/workloads/tpcds/queries/tpcds-q19.test A testdata/workloads/tpcds/queries/tpcds-q2.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test M testdata/workloads/tpcds/queries/tpcds-q27.test A testdata/workloads/tpcds/queries/tpcds-q27a.test A testdata/workloads/tpcds/queries/tpcds-q28.test M testdata/workloads/tpcds/queries/tpcds-q3.test M testdata/workloads/tpcds/queries/tpcds-q34.test A testdata/workloads/tpcds/queries/tpcds-q4.test M testdata/workloads/tpcds/queries/tpcds-q42.test M testdata/workloads/tpcds/queries/tpcds-q43.test M testdata/workloads/tpcds/queries/tpcds-q46.test M testdata/workloads/tpcds/queries/tpcds-q47.test M testdata/workloads/tpcds/queries/tpcds-q52.test M testdata/workloads/tpcds/queries/tpcds-q53.test M testdata/workloads/tpcds/queries/tpcds-q55.test M testdata/workloads/tpcds/queries/tpcds-q59.test M testdata/workloads/tpcds/queries/tpcds-q6.test M testdata/workloads/tpcds/queries/tpcds-q61.test M testdata/workloads/tpcds/queries/tpcds-q63.test M testdata/workloads/tpcds/queries/tpcds-q65.test M testdata/workloads/tpcds/queries/tpcds-q68.test M testdata/workloads/tpcds/queries/tpcds-q7.test M testdata/workloads/tpcds/queries/tpcds-q73.test M testdata/workloads/tpcds/queries/tpcds-q79.test M testdata/workloads/tpcds/queries/tpcds-q88.test M testdata/workloads/tpcds/queries/tpcds-q89.test M testdata/workloads/tpcds/queries/tpcds-q96.test M testdata/workloads/tpcds/queries/tpcds-q98.test M tests/query_test/test_mem_usage_scaling.py M tests/query_test/test_tpcds_queries.py 41 files changed, 10,581 insertions(+), 3,960 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/6877/8 -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood
[Impala-ASF-CR] IMPALA-4164: Use inline hints instead of always-inline
Michael Ho has posted comments on this change. Change subject: IMPALA-4164: Use inline hints instead of always-inline .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6941/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 163: target_cpu_attr_ = fn->getFnAttribute("target-cpu").getValueAsString(); > Will see if I can get a Jenkins machine without avx2. My suspicion is that The Jenkins VM apparently doesn't have avx2. I updated PhjBuilder to inline some cross-compiled bloom-filter functions which use AVX2 instructions and it failed during compilation as expected. Hard-coding "-avx2" into "cpu_attrs_" passed to the execution engine didn't cause the same failure on a machine with avx2 instructions support. -- To view, visit http://gerrit.cloudera.org:8080/6941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Loads all TPC-DS tables
Michael Ho has posted comments on this change. Change subject: Loads all TPC-DS tables .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/6877/7//COMMIT_MSG Commit Message: PS7, Line 7: Loads all TPC-DS tables > Jira? There is only a downstream JIRA tracking this. I will file an upstream JIRA for it. http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: PS7, Line 365:impala_home = os.getenv("IMPALA_HOME"), :hint = insert_hint) > Please remove spaces around = as above. Done http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/datasets/tpcds/tpcds_schema_template.sql File testdata/datasets/tpcds/tpcds_schema_template.sql: PS7, Line 27: cc_rec_start_date string : cc_rec_end_date string > Here and likely elsewhere, there are column type differences between the sp The schema mostly matches what we use for nightly perf testing. I agree that we should probably change some of these to timestamp but that it may require changing some existing table such as date_dim so I didn't want to do it in this change. Some other notable difference from the spec is that we use bigint for certain columns such as ss_ticket_number as it may overflow a 32-bit integer when the scale gets large enough. -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6840 to look at the new patch set (#10). Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. .. IMPALA-2373: Extrapolate row counts for HDFS tables. The main idea of this patch is to use table stats to extrapolate the row counts for new/modified partitions. Existing behavior: - Partitions that lack the row count stat are ignored when estimating the cardinality of HDFS scans. Such partitions effectively have an estimated row count of zero. - We always use the row count stats for partitions that have one. The row count may be innaccurate if data in such partitions has changed significantly. Summary of changes: - Enhance COMPUTE STATS to also store the total number of file bytes in the table. - Use the table-level row count and file bytes stats to estimate the number of rows in a scan. - A new impalad startup flag is added to enable/disable the extrapolation behavior. The feature is disabled by default. Note that even with the feature disabled, COMPUTE STATS stores the file bytes so you can enable the feature without having to run COMPUTE STATS again. Testing: - Added new FE unit test - Added new EE test Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 --- M be/src/common/global-flags.cc M be/src/exec/catalog-op-executor.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test A testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test A tests/custom_cluster/test_stats_extrapolation.py M tests/metadata/test_explain.py 30 files changed, 817 insertions(+), 206 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6840/10 -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Alex Behm has posted comments on this change. Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/6840/9/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: Line 141: | stats-rows=7300 extrapolated-rows=disabled > why disabled everywhere? There are several remaining concerns around stats extrapolation (e.g. IMPALA-5360) and compute stats with sampling. I don't think it is reasonably possible to address the remaining concerns in this patch, so I propose to commit this foundational change with extrapolation disabled by default and continue working on discussing/addressing the remaining concerns which are mostly add-ons to this change. We can still decide to enable extrapolation by default later on. Line 268: | tuple-ids=3,2 row-size=61B cardinality=7300 > ? Test infra issue. The 61B is definitely correct. See IMPALA-5341. http://gerrit.cloudera.org:8080/#/c/6840/9/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test File testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test: Line 99: 'WARNING: The following tables are missing relevant table and/or column statistics.' > i think we need to do something about this warning. Nice catch. Agree. Done. -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Sailesh Mukil has abandoned this change. Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. Abandoned The updated patch somehow got a new change ID. The new patch is here: https://gerrit.cloudera.org/#/c/6998/ -- To view, visit http://gerrit.cloudera.org:8080/6995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/6998 Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies Builds on CentOS 6.4 fail due to dependencies not met for the new 'cryptography' python package. The ADLS commit states that the new packages are only required for ADLS and that ADLS on a dev environment is only supported from CentOS 6.7. This patch moves the compiled requirements for ADLS from compiled-requirements.txt to adls-requirements.txt and passing a compiler to the Pip environment while installing the ADLS requirements. Testing: Tested it on a machine that with TARGET_FILESYSTEM='adls' and also tested it on a CentOS 6.4 machine with the default configuration. Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78 --- M infra/python/bootstrap_virtualenv.py M infra/python/deps/adls-requirements.txt M infra/python/deps/compiled-requirements.txt 3 files changed, 13 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/6998/1 -- To view, visit http://gerrit.cloudera.org:8080/6998 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. Patch Set 1: (3 comments) Tim mentioned a good point. Instead of splitting the requirements into a seperate adls-compiled-requirements.txt, we can just move these requirements to adls-requirements.txt and set the compiler environment 'CC' while installing those. So, I've made that change instead as it keeps the patch much simpler. http://gerrit.cloudera.org:8080/#/c/6995/1/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: PS1, Line 228: ret = install_compiled_deps_if_possible(COMPILED_REQS_PATH) : if ret == False: return False : if os.environ.get('TARGET_FILESYSTEM') == "adls": : ret = install_compiled_deps_if_possible(ADLS_COMPILED_REQS_PATH) : if ret == False: return False > this might be more elegantly expressed as: Not necessary anymore after the refactor. Line 235: def install_adls_deps(): > why not add the compiled deps logic here? Not necessary anymore after the refactor. PS1, Line 236: 6 > elsewhere the min reqs for the compiled_deps is said to be 6.7. Can you mak Done -- To view, visit http://gerrit.cloudera.org:8080/6995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4164: Use inline hints instead of always-inline
Michael Ho has uploaded a new patch set (#7). Change subject: IMPALA-4164: Use inline hints instead of always-inline .. IMPALA-4164: Use inline hints instead of always-inline When generating IR functions during codegen, we used to always tag the functions with the "AlwaysInline" attribute. That potentially leads to excessive inlining, leading to very long optimization / compilation time with marginal performance benefit at runtime. One of the reasons for doing it was that the "target-cpu" and "target-features" attributes are missing in the generated IR functions so the LLVM inliner considers them incompatible with the cross-compiled functions. As a result, the inliner will not inline the generated IR functions into cross-compiled functions unless we put the "AlwaysInline" attributes with the generated IR functions. This change fixes the problem above by setting the "target-cpu" and "target-features" attributes of all IR functions to match that of of the host's CPUs so both generated IR functions and cross-compiled functions will have the same values for those attributes. With these attributes set, we now switch to using "InlineHint" in place of "AlwaysInline" and let LLVM make the decision of whether it's sensible to inline a function. With this change, the codegen time of a query with a long predicate went from 15s to 4s and the overall runtime went from 19s to 8s. Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h A testdata/workloads/targeted-perf/queries/primitive_long_predicate.test 3 files changed, 211 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6941/7 -- To view, visit http://gerrit.cloudera.org:8080/6941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Loads all TPC-DS tables
Michael Brown has posted comments on this change. Change subject: Loads all TPC-DS tables .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/6877/7//COMMIT_MSG Commit Message: PS7, Line 7: Loads all TPC-DS tables Jira? PS7, Line 21: before. The results of E2E tests were compared against the run done with : Netezza and Vertica. Thanks for saying so! http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: PS7, Line 365:impala_home = os.getenv("IMPALA_HOME"), :hint = insert_hint) Please remove spaces around = as above. http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/datasets/tpcds/tpcds_schema_template.sql File testdata/datasets/tpcds/tpcds_schema_template.sql: PS7, Line 27: cc_rec_start_date string : cc_rec_end_date string Here and likely elsewhere, there are column type differences between the spec and here, and here is one of those places.. Is there a general motivation for having different types here differ with those in the spec? For example, why not timestamp? -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function
Greg Rahn has posted comments on this change. Change subject: IMPALA-5326: [DOCS] Document REPLACE() function .. Patch Set 2: Code-Review+1 LGTM -- To view, visit http://gerrit.cloudera.org:8080/6979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4164: Use inline hints instead of always-inline
Michael Ho has uploaded a new patch set (#6). Change subject: IMPALA-4164: Use inline hints instead of always-inline .. IMPALA-4164: Use inline hints instead of always-inline When generating IR functions during codegen, we used to always tag the functions with the "AlwaysInline" attribute. That potentially leads to excessive inlining, leading to very long optimization / compilation time with marginal performance benefit at runtime. One of the reasons for doing it was that the "target-cpu" and "target-features" attributes are missing in the generated IR functions so the LLVM inliner considers them incompatible with the cross-compiled functions. As a result, the inliner will not inline the generated IR functions into cross-compiled functions unless we put the "AlwaysInline" attributes with the generated IR functions. This change fixes the problem above by setting the "target-cpu" and "target-features" attributes of all IR functions to match that of of the host's CPUs so both generated IR functions and cross-compiled functions will have the same values for those attributes. With these attributes set, we now switch to using "InlineHint" in place of "AlwaysInline" and let LLVM make the decision of whether it's sensible to inline a function. With this change, the codegen time of a query with a long predicate went from 15s to 4s and the overall runtime went from 19s to 8s. Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h A testdata/workloads/targeted-perf/queries/primitive_long_predicate.test 3 files changed, 211 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6941/6 -- To view, visit http://gerrit.cloudera.org:8080/6941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 19: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/648/ -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Henry Robinson has posted comments on this change. Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. Patch Set 1: (3 comments) Can you mention the testing you do (when it's complete) in the commit msg? http://gerrit.cloudera.org:8080/#/c/6995/1/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: PS1, Line 228: ret = install_compiled_deps_if_possible(COMPILED_REQS_PATH) : if ret == False: return False : if os.environ.get('TARGET_FILESYSTEM') == "adls": : ret = install_compiled_deps_if_possible(ADLS_COMPILED_REQS_PATH) : if ret == False: return False this might be more elegantly expressed as: REQS = [COMPILED_REQS_PATH] if os.environ.get("TARGET_FILESYSTEM") == "adls": REQS += ADLS_COMPILED_REQS_PATH for req in REQS: if not install_compiled_deps_if_possible(req): return False return True Line 235: def install_adls_deps(): why not add the compiled deps logic here? PS1, Line 236: 6 elsewhere the min reqs for the compiled_deps is said to be 6.7. Can you make these consistent? -- To view, visit http://gerrit.cloudera.org:8080/6995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Dan Hecht has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 19: Code-Review+2 Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#19). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default for local files 2. Share the file handle between scan ranges from the same file Local scan ranges no longer maintain their own Hdfs file handles. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the the scan range is accessing data cached by Hdfs or the scan range is remote, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing. Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e. rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 861 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/19 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. Patch Set 1: > Uploaded patch set 1. Testing: Ran it on a machine with TARGET_FILESYSTEM='adls' and confirmed it worked. Currently running it on a CentOS 6.4 machine. Will post here once it passes the stage where it breaks on the 'master' branch. -- To view, visit http://gerrit.cloudera.org:8080/6995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/6995 Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies .. IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies Builds on CentOS 6.4 fail due to dependencies not met for the new 'cryptography' python package. The commit states that the new packages are only required for ADLS and that ADLS on a dev environment is only supported from CentOS 6.7. This patch only installs the ADLS requirements if the dev cluster is configured to use ADLS due to the above limitation. This is done by taking the ADLS compiled requirements and adding them to a new file called 'adls-compiled-requirements.txt' and invkoing that only when the TARGET_FILESYSTEM is 'adls'. Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04 --- M infra/python/bootstrap_virtualenv.py A infra/python/deps/adls-compiled-requirements.txt M infra/python/deps/compiled-requirements.txt 3 files changed, 48 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6995/1 -- To view, visit http://gerrit.cloudera.org:8080/6995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Henry Robinson has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 2: Code-Review+1 (1 comment) I think there would be some benefit in having ReleaseResources() called in the same place for all queries, but agree that this solves the bug. http://gerrit.cloudera.org:8080/#/c/6897/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 918: DCHECK(!query_status_.ok()); Add a TODO that this is weird, since CancelInternal() is called on the successful query path as well (per UnregisterQuery()). -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function
Alex Behm has posted comments on this change. Change subject: IMPALA-4848: Add WIDTH_BUCKET() function .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6023/4/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: Line 476: int256_t y = DecimalUtil::MultiplyByScale(ConvertToInt256(range_size.value()), > Agree. I think we'll need to be more stingy with the types. For example, the input args are converted to Decimal16 here, but I think if we used the smallest types, then we can avoid going to int256_t for Decimal4 and Decimal8. -- To view, visit http://gerrit.cloudera.org:8080/6023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Dan Hecht has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 2: Code-Review+1 (2 comments) LGTM but please see if Henry has any more comments. http://gerrit.cloudera.org:8080/#/c/6897/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS2, Line 1094: if (filter_mem_tracker_.get() != nullptr) { : filter_mem_tracker_->UnregisterFromParent(); : } does that belong here? if not, maybe add TODO to make it clear in the meantime. http://gerrit.cloudera.org:8080/#/c/6897/2/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: PS2, Line 450: schedule_->Release() What's that? Maybe it's referring to AdmissionController::ReleaseQuery()? -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 7: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/647/ -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#18). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default for local files 2. Share the file handle between scan ranges from the same file Local scan ranges no longer maintain their own Hdfs file handles. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the the scan range is accessing data cached by Hdfs or the scan range is remote, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing. Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e. rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 861 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/18 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 288: if (use_file_handle_cache && expected_local_) return Status::OK(); > you need to document that behavior in the scanrange declaration Updated ScanRange::Open documentation and the comment for exclusive_hdfs_fh_. http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 453: /// file handle from the file handle cache to hold for its exclusive use. > also document the behavior for non-local scan ranges Done -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 2: > Will wait for final patch before signing off. > > What was the reason behind moving ReleaseResources() out of Done()? > IIUC, the race is fixed by not releasing the QS reference until the > coordinator is destroyed. Leaving ReleaseResources() in Done() > seems natural, so I wondered what the thinking was. The thinking was that the coordinator already knows when it can release resources, and it seemed like a nice simplification of the public interface. -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 453: /// file handle from the file handle cache to hold for its exclusive use. also document the behavior for non-local scan ranges -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 17: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 288: if (use_file_handle_cache && expected_local_) return Status::OK(); you need to document that behavior in the scanrange declaration -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Henry Robinson has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 2: Will wait for final patch before signing off. What was the reason behind moving ReleaseResources() out of Done()? IIUC, the race is fixed by not releasing the QS reference until the coordinator is destroyed. Leaving ReleaseResources() in Done() seems natural, so I wondered what the thinking was. -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 16: This upload disables the caching for remote files. This also rebases to the latest (which involved minor merges on the tests due to the ADLS change). -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#17). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default for local files 2. Share the file handle between scan ranges from the same file Local scan ranges no longer maintain their own Hdfs file handles. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the the scan range is accessing data cached by Hdfs or the scan range is remote, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing. Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e. rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 857 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/17 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. IMPALA-5354: INSERT hints for Kudu tables A previous change, IMPALA-3742, added an exchange node and sort node to plans for inserts into Kudu tables to partition and sort the input to match the target table. This patch enables INSERT hints for Kudu tables - 'noshuffle' which removes the exchange node from the plan and 'noclustered' which removes the sort node. Insert hints have no effect for inserts that are small enough to result in a single node execution. Testing: - Updated FE planner and analysis tests. - Ran Kudu EE tests. Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Reviewed-on: http://gerrit.cloudera.org:8080/6980 Reviewed-by: Matthew Jacobs Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 8 files changed, 105 insertions(+), 34 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, but someone else must approve Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. Patch Set 4: Posting for discussion. The approach used is quite simple. The main risk of this approach is if the cardinality estimate is too low, we may reserve less memory for the join/agg than needed. If there is memory pressure overall in the query, this could potentially force a join/agg to spill and do small I/Os, which could be slow. A minimum spillable buffer size query option will provide a safety valve if the planner sets the buffer size too low. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Bumped Kudu version to 795c435
Impala Public Jenkins has posted comments on this change. Change subject: Bumped Kudu version to 795c435 .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bumped Kudu version to 795c435
Impala Public Jenkins has submitted this change and it was merged. Change subject: Bumped Kudu version to 795c435 .. Bumped Kudu version to 795c435 This is needed for IMPALA-5167. Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 Reviewed-on: http://gerrit.cloudera.org:8080/6975 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates .. IMPALA-5160: adjust spill buffer size based on planner estimates Scale down the buffer size in hash joins and hash aggregations if estimates indicate that the build side of the join is small. This greatly reduces minimum memory requirements for joins in some common cases, e.g. small dimension tables. Currently this is not plumbed through to the backend and only takes effect in planner tests. Testing: Added targeted planner tests for small/mid/large/unknown memory requirements for aggregations and joins. Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java A fe/src/main/java/org/apache/impala/util/BitUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test 8 files changed, 919 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/4 -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5158: report buffer pool free memory in MemTracker .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6993/2/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: PS2, Line 259: an a -- To view, visit http://gerrit.cloudera.org:8080/6993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5158: report buffer pool free memory in MemTracker .. IMPALA-5158: report buffer pool free memory in MemTracker Clean pages and free buffers appear as untracked memory in the MemTracker hierarchy. This was misleading since the memory is tracked and present in the BufferPool. This change adds two MemTrackers below the process level that accounts for this memory. Updating global counters would be very inefficient and negate most of the effort put into making the buffer allocator scalable. Instead the values of the metrics are computed on demand by summing values across all of the arena in BufferAlloctor. The numbers reported are approximate because we do not lock any of the BufferAllocator state and therefore don't get a consistent view of the entire BufferAllocator at any moment in time. However they are accurate enough to understand the general state of the system. Also switches over ASAN to use a metric, similar to the regular TCMalloc build so that the behaviour under ASAN diverges less. Testing: Add some checks to unit tests to sanity-check that the numbers computed are valid. Manually tested by rebasing my buffer pool dev branch onto this change and running some spilling queries. The /memz page reported: Process: Limit=8.35 GB Total=1005.49 MB Peak=1.01 GB Buffer Pool: Free Buffers: Total=391.50 MB Buffer Pool: Clean Pages: Total=112.00 MB Free Disk IO Buffers: Total=247.00 KB Peak=30.23 MB RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB RequestPool=default-pool: Total=374.30 MB Peak=416.55 MB Query(b9421063d13af70b:ddb99739): Reservation=0 ReservationLimit=6.68 GB OtherMemory=801.09 KB Total=801.09 KB Peak=1.05 MB << snip >> Untracked Memory: Total=127.45 MB Manually tested the ASAN change by building under ASAN, running some queries, and inspecting the memz/ page. It reported a value of 100-200MB untracked memory, similar to the non-ASAN build. Change-Id: I007eb258377b33fff9f3246580d80fa551837078 --- M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/scheduling/admission-controller.cc M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/metrics.json 14 files changed, 357 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/6993/3 -- To view, visit http://gerrit.cloudera.org:8080/6993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5158: report buffer pool free memory in MemTracker .. IMPALA-5158: report buffer pool free memory in MemTracker Clean pages and free buffers appear as untracked memory in the MemTracker hierarchy. This was misleading since the memory is tracked and present in the BufferPool. This change adds two MemTrackers below the process level that accounts for this memory. Updating global counters would be very inefficient and negate most of the effort put into making the buffer allocator scalable. Instead the values of the metrics are computed on demand by summing values across all of the arena in BufferAlloctor. The numbers reported are approximate because we do not lock any of the BufferAllocator state and therefore don't get a consistent view of the entire BufferAllocator at any moment in time. However they are accurate enough to understand the general state of the system. Also switches over ASAN to use a metric, similar to the regular TCMalloc build so that the behaviour under ASAN diverges less. Testing: Add some checks to unit tests to sanity-check that the numbers computed are valid. Manually tested by rebasing my buffer pool dev branch onto this change and running some spilling queries. The /memz page reported: Process: Limit=8.35 GB Total=1005.49 MB Peak=1.01 GB Buffer pool: free buffers: Total=391.50 MB Buffer pool: clean pages: Total=112.00 MB Free Disk IO Buffers: Total=247.00 KB Peak=30.23 MB RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB RequestPool=default-pool: Total=374.30 MB Peak=416.55 MB Query(b9421063d13af70b:ddb99739): Reservation=0 ReservationLimit=6.68 GB OtherMemory=801.09 KB Total=801.09 KB Peak=1.05 MB << snip >> Untracked Memory: Total=127.45 MB Manually tested the ASAN change by building under ASAN, running some queries, and inspecting the memz/ page. It reported a value of 100-200MB untracked memory, similar to the non-ASAN build. Change-Id: I007eb258377b33fff9f3246580d80fa551837078 --- M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/scheduling/admission-controller.cc M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/metrics.json 14 files changed, 357 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/6993/2 -- To view, visit http://gerrit.cloudera.org:8080/6993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6993 Change subject: IMPALA-5158: report buffer pool free memory in MemTracker .. IMPALA-5158: report buffer pool free memory in MemTracker Clean pages and free buffers appear as untracked memory in the MemTracker hierarchy. This was misleading since the memory is tracked and present in the BufferPool. This change adds two MemTrackers below the process level that accounts for this memory. Updating global counters would be very inefficient and negate most of the effort put into making the buffer allocator scalable. Instead the values of the metrics are computed on demand by summing values across all of the arena in BufferAlloctor. The numbers reported are approximate because we do not lock any of the BufferAllocator state and therefore don't get a consistent view of the entire BufferAllocator at any moment in time. However they are accurate enough to understand the general state of the system. Also switches over ASAN to use a metric, similar to the regular TCMalloc build so that the behaviour under ASAN diverges less. Testing: Add some checks to unit tests to sanity-check that the numbers computed are valid. Manually tested by rebasing my buffer pool dev branch onto this change and running some spilling queries. The /memz page reported: Process: Limit=8.35 GB Total=1005.49 MB Peak=1.01 GB Buffer pool: free buffers: Total=391.50 MB Buffer pool: clean pages: Total=112.00 MB Free Disk IO Buffers: Total=247.00 KB Peak=30.23 MB RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB RequestPool=default-pool: Total=374.30 MB Peak=416.55 MB Query(b9421063d13af70b:ddb99739): Reservation=0 ReservationLimit=6.68 GB OtherMemory=801.09 KB Total=801.09 KB Peak=1.05 MB << snip >> Untracked Memory: Total=127.45 MB Manually tested the ASAN change by building under ASAN, running some queries, and inspecting the memz/ page. It reported a value of 100-200MB untracked memory, similar to the non-ASAN build. Change-Id: I007eb258377b33fff9f3246580d80fa551837078 --- M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/scheduling/admission-controller.cc M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/metrics.json 14 files changed, 357 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/6993/1 -- To view, visit http://gerrit.cloudera.org:8080/6993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5369: Remove old pom parent in testdata module .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5369: Remove old pom parent in testdata module .. IMPALA-5369: Remove old pom parent in testdata module Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321 Reviewed-on: http://gerrit.cloudera.org:8080/6992 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M testdata/pom.xml 1 file changed, 0 insertions(+), 5 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/647/ -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 7: Code-Review+2 (2 comments) Carrying +2. http://gerrit.cloudera.org:8080/#/c/6970/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 241: public void computeHdfsStatsForTesting() { > computeFileStatsForTesting() Done Line 246: numHdfsFiles_ += partition.getNumFileDescriptors(); > cast not needed Done -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6970 to look at the new patch set (#7). Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd We need not account these hdfs table metrics on the Catalog server as they are eventually calculated again on the Impalads while unpacking the corresponding thrift table. This fix can potentially affect the frontend tests that directly load the Catalog's version of HdfsTable without the loadFromThrift() call paths that do the accounting. That is fixed by adding a separate call that computes these values and is called from ImpaladTestCatalog.getTable(). Testing: Enough tests already cover these code paths like show stats/ table or partition refresh tests etc. No new tests are added. Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java 2 files changed, 28 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6970/7 -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Alex Behm has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 6: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6970/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 241: public void updateHdfsStatsForTesting() { computeFileStatsForTesting() Line 246: numHdfsFiles_ += (long) partition.getNumFileDescriptors(); cast not needed -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6970 to look at the new patch set (#6). Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd We need not account these hdfs table metrics on the Catalog server as they are eventually calculated again on the Impalads while unpacking the corresponding thrift table. This fix can potentially affect the frontend tests that directly load the Catalog's version of HdfsTable without the loadFromThrift() call paths that do the accounting. That is fixed by adding a separate call that computes these values and is called from ImpaladTestCatalog.getTable(). Testing: Enough tests already cover these code paths like show stats/ table or partition refresh tests etc. No new tests are added. Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java 2 files changed, 28 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6970/6 -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5016: Simplify coalesce() in SimplifyConditionalsRule and do static partition pruning with coalesce() function.
Alex Behm has posted comments on this change. Change subject: IMPALA-5016: Simplify coalesce() in SimplifyConditionalsRule and do static partition pruning with coalesce() function. .. Patch Set 1: (8 comments) High-level comments. I'll take a closer look when we converge on the correct high-level approach first. http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: Line 88: private final ExprRewriter partitionMetadataRewriter; I don't think any changes should be made to HdfsPartitionPruner for this patch. See my other comment on Expr.optimizeConjuncts() http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/rewrite/CoalescePartitionMetadataRule.java File fe/src/main/java/org/apache/impala/rewrite/CoalescePartitionMetadataRule.java: Line 35: * Rewrites coalesce function using partition metadata, it can be applied before partition pruning. Let's please integrate this into the SimplifyConditionalsRule. I don't see this rule as being any different, and there is code/work duplication among this rule and your changes in SimplifyConditionalsRule. This rule is just another case of simplification. This rule should be generally applicable to any expr, not just those used for partition pruning. However, this rewrite rule will cause some queries to break when applied to exprs not used for partition pruning, e.g.: select coalesce(1, count(*) from t might be incorrectly rewritten to: select 1 from t The existing SimplifyConditionalsRule already handles these cases correctly (and this new rewrite will also just work when integrated into SimplifyConditionalsRule). Line 44: public CoalescePartitionMetadataRule(HdfsTable table, List partitionSlots) { These do not need to be passed. You can get this information from a SlotRef expr by looking at its SlotDescriptor. The SlotDescriptor's parent is a TupleDescriptor which has a reference to the Table. That should help with integrating this code into SimplifyConditionalsRule. Line 67: private boolean canRewriteWithExpr(Expr child) { This check does not look quite right because it will return 'true' for something like 'year + NULL'. The fact that the expr is bound by partition slots and that those partition slots cannot return NULL, does not mean that the overall expr must be non-NULL. The overall expr can still be NULL. You can imagine other Exprs that may return NULL for a non-NULL input. I think it would be better to keep this simple for now, and only rewrite if the child is a (possibly cast) SlotRef. http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/rewrite/PartitionMetadataRule.java File fe/src/main/java/org/apache/impala/rewrite/PartitionMetadataRule.java: Line 29: * Base class for all Expr rewrite rules that need using partition metadata. I don't think this is needed. We apply all rewrite rules before doing partition pruning for scans already. See Expr.optimizeConjuncts() which gets called in SingleNodePlanner.createScanNode(). http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java: Line 80: private Expr simplifyIfFunctionExpr(FunctionCallExpr expr) { simplifyIfFunction() Line 101: private Expr simplifyCoalesceFunctionExpr(FunctionCallExpr expr) { simplifyCoalesceFunction() http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 339: RewritesOk("coalesce(null, a, b)", rule, "coalesce(a, b)"); Needs tests for the partition SlotRef case. -- To view, visit http://gerrit.cloudera.org:8080/6990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b57267ce68ef882d73120b5054603b72867c7f5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: yu feng Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Sailesh Mukil has submitted this change and it was merged. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. IMPALA-5333: Add support for Impala to work with ADLS This patch leverages the AdlFileSystem in Hadoop to allow Impala to talk to the Azure Data Lake Store. This patch has functional changes as well as adds test infrastructure for testing Impala over ADLS. We do not support ACLs on ADLS since the Hadoop ADLS connector does not integrate ADLS ACLs with Hadoop users/groups. For testing, we use the azure-data-lake-store-python client from Microsoft. This client seems to have some consistency issues. For example, a drop table through Impala will delete the files in ADLS, however, listing that directory through the python client immediately after the drop, will still show the files. This behavior is unexpected since ADLS claims to be strongly consistent. Some tests have been skipped due to this limitation with the tag SkipIfADLS.slow_client. Tracked by IMPALA-5335. The azure-data-lake-store-python client also only works on CentOS 6.6 and over, so the python dependencies for Azure will not be downloaded when the TARGET_FILESYSTEM is not "adls". While running ADLS tests, the expectation will be that it runs on a machine that is at least running CentOS 6.6. Note: This is only a test limitation, not a functional one. Clusters with older OSes like CentOS 6.4 will still work with ADLS. Added another dependency to bootstrap_build.sh for the ADLS Python client. Testing: Ran core tests with and without TARGET_FILESYSTEM as 'adls' to make sure that all tests pass and that nothing breaks. Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Reviewed-on: http://gerrit.cloudera.org:8080/6910 Tested-by: Impala Public Jenkins Reviewed-by: Sailesh Mukil --- M bin/bootstrap_build.sh M bin/impala-config.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M infra/python/bootstrap_virtualenv.py A infra/python/deps/adls-requirements.txt M infra/python/deps/compiled-requirements.txt M infra/python/deps/pip_download.py M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_hdfs_fd_caching.py M tests/custom_cluster/test_insert_behaviour.py M tests/custom_cluster/test_parquet_max_page_header.py M tests/custom_cluster/test_permanent_udfs.py M tests/data_errors/test_data_errors.py M tests/failure/test_failpoints.py M tests/metadata/test_compute_stats.py M tests/metadata/test_ddl.py M tests/metadata/test_hdfs_encryption.py M tests/metadata/test_hdfs_permissions.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/metadata/test_refresh_partition.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_compressed_formats.py M tests/query_test/test_hdfs_caching.py M tests/query_test/test_hdfs_fd_caching.py M tests/query_test/test_insert_behaviour.py M tests/query_test/test_insert_parquet.py M tests/query_test/test_join_queries.py M tests/query_test/test_nested_types.py M tests/query_test/test_observability.py M tests/query_test/test_partitioning.py M tests/query_test/test_scanners.py M tests/stress/test_ddl_stress.py A tests/util/adls_util.py M tests/util/filesystem_utils.py 43 files changed, 359 insertions(+), 49 deletions(-) Approvals: Impala Public Jenkins: Verified Sailesh Mukil: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] Loads all TPC-DS tables
Michael Ho has posted comments on this change. Change subject: Loads all TPC-DS tables .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6877/6/testdata/workloads/tpcds/queries/tpcds-q34.test File testdata/workloads/tpcds/queries/tpcds-q34.test: Line 34:'Williamson County','Williamson County','Williamson County','Williamson County') > What's the deal with the repeated value? They are the suggested substitution parameters for the official qualification run defined in the TPC-DS specification. -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-HasComments: Yes
[Impala-ASF-CR] Loads all TPC-DS tables
Dan Hecht has posted comments on this change. Change subject: Loads all TPC-DS tables .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6877/6/testdata/workloads/tpcds/queries/tpcds-q34.test File testdata/workloads/tpcds/queries/tpcds-q34.test: Line 34:'Williamson County','Williamson County','Williamson County','Williamson County') What's the deal with the repeated value? -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-HasComments: Yes
[Impala-ASF-CR] Loads all TPC-DS tables
Hello Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6877 to look at the new patch set (#7). Change subject: Loads all TPC-DS tables .. Loads all TPC-DS tables This change loads the missing tables in TPC-DS. In addition, it also fixes up the loading of the partitioned table store_sales so all partitions will be loaded. The existing TPC-DS queries are also updated to use the parameters for qualification runs as noted in the TPC-DS specification. Some hard-coded partition filters were also removed. They were there due to the lack of dynamic partitioning in the past. Some missing TPC-DS queries are also added to this change, including query28 which discovered the infamous IMPALA-5251. Having all tables in TPC-DS available paves the way for us to include all supported TPCDS queries in our functional testing. Due to the change in the data, planner tests and the E2E tests have different results than before. The results of E2E tests were compared against the run done with Netezza and Vertica. The divergence were all due to the truncation behavior of decimal types in DECIMAL_V1. Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 --- M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M testdata/bin/compute-table-stats.sh M testdata/bin/generate-schema-statements.py M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds.test M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-q1.test M testdata/workloads/tpcds/queries/tpcds-q19.test A testdata/workloads/tpcds/queries/tpcds-q2.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test M testdata/workloads/tpcds/queries/tpcds-q27.test A testdata/workloads/tpcds/queries/tpcds-q27a.test A testdata/workloads/tpcds/queries/tpcds-q28.test M testdata/workloads/tpcds/queries/tpcds-q3.test M testdata/workloads/tpcds/queries/tpcds-q34.test A testdata/workloads/tpcds/queries/tpcds-q4.test M testdata/workloads/tpcds/queries/tpcds-q42.test M testdata/workloads/tpcds/queries/tpcds-q43.test M testdata/workloads/tpcds/queries/tpcds-q46.test M testdata/workloads/tpcds/queries/tpcds-q47.test M testdata/workloads/tpcds/queries/tpcds-q52.test M testdata/workloads/tpcds/queries/tpcds-q53.test M testdata/workloads/tpcds/queries/tpcds-q55.test M testdata/workloads/tpcds/queries/tpcds-q59.test M testdata/workloads/tpcds/queries/tpcds-q6.test M testdata/workloads/tpcds/queries/tpcds-q61.test M testdata/workloads/tpcds/queries/tpcds-q63.test M testdata/workloads/tpcds/queries/tpcds-q65.test M testdata/workloads/tpcds/queries/tpcds-q68.test M testdata/workloads/tpcds/queries/tpcds-q7.test M testdata/workloads/tpcds/queries/tpcds-q73.test M testdata/workloads/tpcds/queries/tpcds-q79.test M testdata/workloads/tpcds/queries/tpcds-q88.test M testdata/workloads/tpcds/queries/tpcds-q89.test M testdata/workloads/tpcds/queries/tpcds-q96.test M testdata/workloads/tpcds/queries/tpcds-q98.test M tests/query_test/test_mem_usage_scaling.py M tests/query_test/test_tpcds_queries.py 41 files changed, 10,477 insertions(+), 3,827 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/6877/7 -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. IMPALA-5232: Parquet reader error message prints memory address instead of value Changed the argument of the message to print the size instead of the memory location. Testing: Automating the test would be non-trivial as it would require generating a corrupt parquet file to trigger the error condition. Also, the test would only have to check a change in the error message and not the control flow. Manually tested the error message by modifying the value of num_bytes and performing a select on a table in parquet format. Tested both condition with the following values: - num_bytes = -1 - num_bytes = *data_size + 1 Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Reviewed-on: http://gerrit.cloudera.org:8080/6982 Reviewed-by: Sailesh Mukil Tested-by: Impala Public Jenkins --- M be/src/exec/parquet-column-readers.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Sailesh Mukil: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Add several items to "known issues" page
Michael Brown has posted comments on this change. Change subject: [DOCS] Add several items to "known issues" page .. Patch Set 4: > I added IMPALA-3558 as suggested by Sailesh. But I couldn't find > Sailesh's comment in the gerrit UI to mark it as 'done'. You couldn't mark at as Done because it wasn't an inline comment but instead top-level. Best you can do is reply like you've done. -- To view, visit http://gerrit.cloudera.org:8080/5809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Add several items to "known issues" page
John Russell has posted comments on this change. Change subject: [DOCS] Add several items to "known issues" page .. Patch Set 4: I added IMPALA-3558 as suggested by Sailesh. But I couldn't find Sailesh's comment in the gerrit UI to mark it as 'done'. -- To view, visit http://gerrit.cloudera.org:8080/5809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Add several items to "known issues" page
Hello Michael Brown, Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5809 to look at the new patch set (#4). Change subject: [DOCS] Add several items to "known issues" page .. [DOCS] Add several items to "known issues" page IMPALA-4432 is actually an incompatible change, so it gets listed on the separate "Incompatible Changes" page. Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf --- M docs/topics/impala_incompatible_changes.xml M docs/topics/impala_known_issues.xml 2 files changed, 106 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/5809/4 -- To view, visit http://gerrit.cloudera.org:8080/5809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 15: (5 comments) http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 283: if (exclusive_hdfs_fh_ != nullptr) return Status::OK(); > when will this be true? The IO thread is calling ReadRange on each ScanRange. ReadRange calls Open whether or not that ScanRange was already open. This check short-circuits in the case that the ScanRange was already open. Line 351: VLOG_FILE << "Cache HDFS file handle file=" << file(); > this is not a meaningful log message, let's remove it Removed. Line 420: HdfsFileHandle* hdfs_file_from_cache = nullptr; > the name is somewhat misleading, because exclusive_hdfs_fh_ is still from t Changed to "borrowed_hdfs_fh" Line 484: // - or if the file handle is not from the cache > but exclusive_hdfs_fh_ is also from the cache, why wouldn't we retry in tha Changed to do retry for exclusive file handles. http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1271: bool cache_hit; > rename to dummy or something like that Done -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE
Lars Volker has posted comments on this change. Change subject: IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE .. Patch Set 1: (5 comments) Thank you for documenting this. Please see my inline comments. Let me know if you'd like to discuss them in person. http://gerrit.cloudera.org:8080/#/c/6981/1/docs/topics/impala_create_table.xml File docs/topics/impala_create_table.xml: Line 388: CREATE TABLE AS SELECT operation. Creating data files that are I think it's important to understand that the source table property does not affect the target table. CREATE TABLE TARGET AS SELECT * FROM SOURCE; with SOURCE having SORT BY() columns will not copy them over to TARGET. Line 389: sorted is most useful for Parquet tables, where the metadata includes the minimum and Here it should be clear that the information is stored in the file metadata inside each Parquet file, and not in our own metadata store. Line 390: maximum values for each column in each data file. Grouping data values together Technically, statistics are stored per RowGroup. Impala only writes 1 rowgroup per file, but that's a self imposed limitation. Line 400: evident with Parquet tables. We could mention here, that other file formats don't have statistics inside the file metadata. Line 412: tools that creat HDFS files, Impala does not guarantee or rely on the data being typo -- To view, visit http://gerrit.cloudera.org:8080/6981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icd571cd8840368edb327d16d27192458838ef234 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alan Choi Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#16). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing. Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e. rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 849 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/16 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function
John Russell has uploaded a new patch set (#2). Change subject: IMPALA-5326: [DOCS] Document REPLACE() function .. IMPALA-5326: [DOCS] Document REPLACE() function Included syntax, mention of performance benefit, and examples. Also added an item in the "new features" list with a link to the detailed writeup. Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da --- M docs/topics/impala_new_features.xml M docs/topics/impala_string_functions.xml 2 files changed, 67 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/6979/2 -- To view, visit http://gerrit.cloudera.org:8080/6979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function
John Russell has posted comments on this change. Change subject: IMPALA-5326: [DOCS] Document REPLACE() function .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_new_features.xml File docs/topics/impala_new_features.xml: PS1, Line 70: A new function, replace(), which is faster than : regexp_replace() for simple string substitutions. : See Proposal: Done, with minor wordsmithing. (Editorial conventions to avoid passive voice & future tense.) http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml File docs/topics/impala_string_functions.xml: PS1, Line 890: - > Maybe best to alias the column/expression or use a different example. This Done PS1, Line 890: - > How about replace('hello world', '0', 'z') and just illustrate the original Done PS1, Line 890: - > The lowercase folding is just something for the column heading/alias. Done PS1, Line 890: - > This looks like a bug. It might be worth calling out the oddness here that I'll call out case-sensitivity in text rather than an example, as Zach suggested. -- To view, visit http://gerrit.cloudera.org:8080/6979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 5: Code-Review-1 I see how this can break the tableSample patch tests. We do the following in HdfsTable#getFileSample() public Map> getFilesSample(List inputParts, long percentBytes, long randomSeed) { Preconditions.checkState(percentBytes >= 0 && percentBytes <= 100); // Conservative max size for Java arrays. The actual maximum varies // from JVM version and sometimes between configurations. final long JVM_MAX_ARRAY_SIZE = Integer.MAX_VALUE - 10; if (numHdfsFiles_ > JVM_MAX_ARRAY_SIZE) { throw new IllegalStateException(String.format( "Too many files to generate a table sample. " + "Table '%s' has %s files, but a maximum of %s files are supported.", getTableName().toString(), numHdfsFiles_, JVM_MAX_ARRAY_SIZE)); } int totalNumFiles = (int) numHdfsFiles_; <- The last line breaks only in tests because ImpaladTestCatalog#getTable() directly returns Catalog's version of HdfsTable during tests and doesn't rebuild it from Thrift. So basically the above line throws an exception. Alex, should we change the approach and fix the accounting instead? -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Michael Ho has posted comments on this change. Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries .. Patch Set 9: (16 comments) http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: PS9, Line 567: skippable_map skippable_map != nullptr PS9, Line 586: (*skippable_map)[i] == true (*skippable_map)[i] PS9, Line 586: 0 huh ? PS9, Line 590: bool skip_val http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/exec-node.h File be/src/exec/exec-node.h: PS9, Line 153: Again No need for "Again". PS9, Line 159: /// This function is capable of doing codegen for either of the above functions; simply : /// pass a null value instead of a bool array pointer. If 'skippable_map' is not NULL, this will generate the equivalent of EvalConjunctsPreEval(). Otherwise, this generates EvalConjuncts(). Also describe the content of skippable_map. Alternately, we can keep this function private and expose two interfaces CodegenEvalConjuncts() and CodegenEvalConjunctsPreEval(). The former always passes nullptr for 'skippable_map' argument. PS9, Line 163: skippable_map Will 'skippable_conjuncts' be a more appropriate name ? http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-parquet-scanner-ir.cc File be/src/exec/hdfs-parquet-scanner-ir.cc: PS9, Line 88: LOG(INFO) << "tuple idx: " << tuple_index; debugging output ? http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS9, Line 1050: std::map>& DictionaryFilterPositionMap http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: PS9, Line 167: typedef std::map> DictFilterConjunctsPositionMap; Comment what this map stands for ? Slot Id -> index of the dictionary filter conjuncts in conjunct_ctxs_ ? PS9, Line 366: Mapping to original position in conjuncts Mapping from SlotId to ... http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS9, Line 206: IS_FILTERED Comment what IS_FILTERED stands for. PS9, Line 240: (filters == nullptr) ^ IS_FILTERED Is there a reason to not use DCHECK((filters == nullptr) != IS_FILTERED) ? http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: PS9, Line 390: return false; Is this hard coded for testing ? If not, please document why it's always false. http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/util/bitmap.h File be/src/util/bitmap.h: PS9, Line 86: (mask & buffer_[word_index]) (mask & buffer[word_index]) == 0 PS9, Line 99: buffer_.size() num_words_ -- To view, visit http://gerrit.cloudera.org:8080/6726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5364: Correct title of query locations table
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5364: Correct title of query locations table .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6988/1//COMMIT_MSG Commit Message: Line 12: We should have per-backend reporting on the number of fragment instances It would be nice to have a JIRA for this and mention it here. http://gerrit.cloudera.org:8080/#/c/6988/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS1, Line 858: host nit: Mention this change in the commit message too. -- To view, visit http://gerrit.cloudera.org:8080/6988 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2fddd41500ada10eb1c12d92afcb96a82009857b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5016 rewrite coalesce() in SimplifyConditionalsRule.java, and try to do static partition pruning with coalesce() function.
Alex Behm has posted comments on this change. Change subject: IMPALA-5016 rewrite coalesce() in SimplifyConditionalsRule.java, and try to do static partition pruning with coalesce() function. .. Patch Set 1: No worries, the first patch is always the hardest :). Thanks for your fast response and continued work. -- To view, visit http://gerrit.cloudera.org:8080/6974 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I00aac2497ba51432037a6f9312805a63933ace87 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: yu feng Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-HasComments: No
[native-toolchain-CR] Ported native-toolchain to work on ppc64le
Matthew Jacobs has posted comments on this change. Change subject: Ported native-toolchain to work on ppc64le .. Patch Set 5: > (1 comment) Thanks Jim, I do realize that so like I said, I wouldn't hold this up if it doesn't match up. That said, if it does match then we know it's not affected. -- To view, visit http://gerrit.cloudera.org:8080/6468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a Gerrit-PatchSet: 5 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna Serrao Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Valencia Edna Serrao Gerrit-HasComments: No
[native-toolchain-CR] Ported native-toolchain to work on ppc64le
Jim Apple has posted comments on this change. Change subject: Ported native-toolchain to work on ppc64le .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6468/5/source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch File source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch: Line 1: Only in breakpad-88e-p3/: my_breakpad > I was hoping we could compare the sha1sum of the binary before/after the ch SOme projects don't have reproducible builds: https://reproducible-builds.org/ https://wiki.debian.org/ReproducibleBuilds For projects like that, the sha1sum can change even without changing the source code, but by just building at a different time. -- To view, visit http://gerrit.cloudera.org:8080/6468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a Gerrit-PatchSet: 5 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna Serrao Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Valencia Edna Serrao Gerrit-HasComments: Yes
[native-toolchain-CR] Ported native-toolchain to work on ppc64le
Matthew Jacobs has posted comments on this change. Change subject: Ported native-toolchain to work on ppc64le .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6468/5/source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch File source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch: Line 1: Only in breakpad-88e-p3/: my_breakpad > I checked if any interference, of ppc64le changes in x86 code, existed by b I was hoping we could compare the sha1sum of the binary before/after the change? I don't want to hold this up if that doesn't line up, but it would give us confidence that we're not affecting the x86 bits :) -- To view, visit http://gerrit.cloudera.org:8080/6468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a Gerrit-PatchSet: 5 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna Serrao Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Valencia Edna Serrao Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/646/ -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bumped Kudu version to 795c435
Impala Public Jenkins has posted comments on this change. Change subject: Bumped Kudu version to 795c435 .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/645/ -- To view, visit http://gerrit.cloudera.org:8080/6975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bumped Kudu version to 795c435
Matthew Jacobs has posted comments on this change. Change subject: Bumped Kudu version to 795c435 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bumped Kudu version to 795c435
Thomas Tauber-Marshall has posted comments on this change. Change subject: Bumped Kudu version to 795c435 .. Patch Set 2: http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/394/ -- To view, visit http://gerrit.cloudera.org:8080/6975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bumped Kudu version to 795c435
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6975 to look at the new patch set (#2). Change subject: Bumped Kudu version to 795c435 .. Bumped Kudu version to 795c435 This is needed for IMPALA-5167. Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6975/2 -- To view, visit http://gerrit.cloudera.org:8080/6975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5316: Adds last day() function
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5316: Adds last_day() function .. Patch Set 2: (7 comments) Thanks for the contribution, Vincent. http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 279: void TestLastDayFunction() { can you check the results of all of these tests are the same as a reference database? Line 281: TestTimestampValue("last_day('2003-01-02')", can you add times to some of these? Line 282: TimestampValue::Parse("2003-01-31 00:00:00", 19)); for all of the below, use the Parse() overload that takes std::string, i.e. remove the str len. Line 316: TimestampValue::Parse("2100-02-28 00:00:00", 19)); I pointed out a few special cases to consider in the .cc file In addition: '1400-01-01 00:00:00' (first supported date) '-12-31 23:59:59' (last supported date) http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: PS2, Line 562: end_of_month() 1) I wonder if you can get a TimestampValue without a Date or with a "special" date, eg. maybe if you create a TIMESTAMP with just a time. Please test last_day(cast("00:00:00" as timestamp)); 2) I think we have to be sure to check the case when timestamp.date() is at the end of the supported date range (year 12-31- 23:59:59), I suspect it should return a valid date but we should check. I can't think of any other corner cases. PS2, Line 562: ptime pdate(timestamp.date().end_of_month()); : TimestampValue tsv(pdate); I think you can avoid converting to ptime and just use the TimestampValue constructor that takes a gregorian date and a time_duration, e.g. tsv(timestamp.date().end_of_month(), time_duration(0,0,0,0)) http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions.h File be/src/exprs/timestamp-functions.h: Line 198: static TimestampVal LastDay(FunctionContext* context, const TimestampVal& ts); please add a comment. look to other fns in the file for examples. -- To view, visit http://gerrit.cloudera.org:8080/6991 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6897/1/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: PS1, Line 79: eos > this doesn't include the DML case, where query completion is signalled by t i'll clarify. -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Marcel Kornacker has uploaded a new patch set (#2). Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. IMPALA-4890/5143: Coordinator race involving TearDown() TearDown() releases resources and destroys control structures (the QueryState reference), and it can be called while a concurrent thread executes Exec() or might call GetNext() in the future. The solution is not to destroy the control structures. This also releases resources automatically at the end of query execution. Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/service/client-request-state.cc 3 files changed, 34 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/6897/2 -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6897/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 895 > I believe that PlanRootSink() may set eos in GetNext() at the same time as good point, QueryResultSet is self-contained. i'll leave these two calls in place and add a todo, because getting rid of the blocking here isn't necessary for the bug fix. -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5369: Remove old pom parent in testdata module .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/644/ -- To view, visit http://gerrit.cloudera.org:8080/6992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5369: Remove old pom parent in testdata module .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] Loads all TPC-DS tables
Hello Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6877 to look at the new patch set (#6). Change subject: Loads all TPC-DS tables .. Loads all TPC-DS tables This change loads the missing tables in TPC-DS. In addition, it also fixes up the loading of the partitioned table store_sales so all partitions will be loaded. The existing TPC-DS queries are also updated to use the parameters for qualification runs as noted in the TPC-DS specification. Some hard-coded partition filters were also removed. They were there due to the lack of dynamic partitioning in the past. Some missing TPC-DS queries are also added to this change, including query28 which discovered the infamous IMPALA-5251. Having all tables in TPC-DS available paves the way for us to include all supported TPCDS queries in our functional testing. Due to the change in the data, planner tests and the E2E tests have different results than before. The results of E2E tests were compared against the run done with Netezza and Vertica. The divergence were all due to the truncation behavior of decimal types in DECIMAL_V1. Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 --- M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M testdata/bin/compute-table-stats.sh M testdata/bin/generate-schema-statements.py M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds.test M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-q1.test M testdata/workloads/tpcds/queries/tpcds-q19.test A testdata/workloads/tpcds/queries/tpcds-q2.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test M testdata/workloads/tpcds/queries/tpcds-q27.test A testdata/workloads/tpcds/queries/tpcds-q27a.test A testdata/workloads/tpcds/queries/tpcds-q28.test M testdata/workloads/tpcds/queries/tpcds-q3.test M testdata/workloads/tpcds/queries/tpcds-q34.test A testdata/workloads/tpcds/queries/tpcds-q4.test M testdata/workloads/tpcds/queries/tpcds-q42.test M testdata/workloads/tpcds/queries/tpcds-q43.test M testdata/workloads/tpcds/queries/tpcds-q46.test M testdata/workloads/tpcds/queries/tpcds-q47.test M testdata/workloads/tpcds/queries/tpcds-q52.test M testdata/workloads/tpcds/queries/tpcds-q53.test M testdata/workloads/tpcds/queries/tpcds-q55.test M testdata/workloads/tpcds/queries/tpcds-q59.test M testdata/workloads/tpcds/queries/tpcds-q6.test M testdata/workloads/tpcds/queries/tpcds-q61.test M testdata/workloads/tpcds/queries/tpcds-q63.test M testdata/workloads/tpcds/queries/tpcds-q65.test M testdata/workloads/tpcds/queries/tpcds-q68.test M testdata/workloads/tpcds/queries/tpcds-q7.test M testdata/workloads/tpcds/queries/tpcds-q73.test M testdata/workloads/tpcds/queries/tpcds-q79.test M testdata/workloads/tpcds/queries/tpcds-q88.test M testdata/workloads/tpcds/queries/tpcds-q89.test M testdata/workloads/tpcds/queries/tpcds-q96.test M testdata/workloads/tpcds/queries/tpcds-q98.test M tests/query_test/test_mem_usage_scaling.py M tests/query_test/test_tpcds_queries.py 41 files changed, 10,498 insertions(+), 3,848 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/6877/6 -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood
[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module
Grant Henke has uploaded a new change for review. http://gerrit.cloudera.org:8080/6992 Change subject: IMPALA-5369: Remove old pom parent in testdata module .. IMPALA-5369: Remove old pom parent in testdata module Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321 --- M testdata/pom.xml 1 file changed, 0 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/6992/1 -- To view, visit http://gerrit.cloudera.org:8080/6992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Grant Henke
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Vincent Tran has posted comments on this change. Change subject: IMPALA-5030: Adds support for NVL2() function .. Patch Set 5: I can make that happen. Can I just push the patch again as normal? -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
Jim Apple has posted comments on this change. Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. Patch Set 1: David and I poked at this a little and tried to figure out what was going on - why, for instance, would installing the DateTime module make the datetime module work? We didn't figure it out. To compound the strangeness, while this line was required to get this working on my machine, now if I remove this line and `python ./infra/python/bootstrap_virtualenv.py -r`, starting the minicluster works. Alice laughed: "There's no use trying," she said; "one can't believe impossible things." "I daresay you haven't had much practice," said the Queen. "When I was younger, I always did it for half an hour a day. Why, sometimes I've believed as many as six impossible things before breakfast." -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/6980/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS1, Line 502: deringExprs.add( : KuduUtil.createPartitionEx > 1 line Done PS1, Line 504: orderingExprs.addAll(insertStmt.getPrimaryKeyExprs()); : } : } else if (insertStmt.hasClusteredHint() || !insertStmt.getSortExprs().isEmpty()) { : // NOTE: If the table has a 'sort.colum > how about refactoring this to KuduUtil to avoid duplicate code w/ Distribut Done http://gerrit.cloudera.org:8080/#/c/6980/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test: Line 213: > add test with both hints Done -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-5354: INSERT hints for Kudu tables .. IMPALA-5354: INSERT hints for Kudu tables A previous change, IMPALA-3742, added an exchange node and sort node to plans for inserts into Kudu tables to partition and sort the input to match the target table. This patch enables INSERT hints for Kudu tables - 'noshuffle' which removes the exchange node from the plan and 'noclustered' which removes the sort node. Insert hints have no effect for inserts that are small enough to result in a single node execution. Testing: - Updated FE planner and analysis tests. - Ran Kudu EE tests. Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 8 files changed, 105 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/6980/2 -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5030: Adds support for NVL2() function .. Patch Set 5: I'm late to the party but we need a test that confirms the expr gets written correctly - e.g. in expr-test. -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent Tran Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 1: (1 comment) > (2 comments) > > Where does ReleaseResources() get called for DML queries? It used > to be in ClientRequestState::Done(), I think, but I don't see where > it gets called now (it must be, because tests are passing). This is now done implicitly as part of Cancel(). UnregisterQuery() calls CRS::Cancel (which in turn calls Coordinator::Cancel) before it calls Done(). http://gerrit.cloudera.org:8080/#/c/6897/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 113: if (query_state_ != nullptr) { : ExecEnv::GetInstance()->query_exec_mgr()->ReleaseQueryState(query_state_); : } > I thought that in general we wanted to move away from work done in d'tors? we want to move away from release resources in d'tors, and from tearing down control structures prematurely (as was done in teardown()). the coordinator now has access to the query state until the coordinator itself goes away. -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes