[Impala-ASF-CR] Fix diagnostics path to not include the parent dir structure
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10347 ) Change subject: Fix diagnostics path to not include the parent dir structure .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10347 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I540f6c228a0315780d45cf11961f124478b5dd0c Gerrit-Change-Number: 10347 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 09:29:54 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix diagnostics path to not include the parent dir structure
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10347 ) Change subject: Fix diagnostics path to not include the parent dir structure .. Fix diagnostics path to not include the parent dir structure Without the fix, the diagnostics tar file included the entire directory structure of the diagnostics root dir. Before: === $ tar tf /tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh.tar.gz tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh/ tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh/stacks/ tmp/impala-diagnostics-2018-05-08-11-59-39-spv8Eh/stacks/jstack-0.txt After: = $ tar tf /tmp/impala-diagnostics-2018-05-08-12-01-51-Y0nlQI.tar.gz impala-diagnostics-2018-05-08-12-01-51-Y0nlQI/ impala-diagnostics-2018-05-08-12-01-51-Y0nlQI/stacks/ impala-diagnostics-2018-05-08-12-01-51-Y0nlQI/stacks/jstack-0.txt . Tested with python 2.6 Change-Id: I540f6c228a0315780d45cf11961f124478b5dd0c Reviewed-on: http://gerrit.cloudera.org:8080/10347 Reviewed-by: Bharath Vissapragada Tested-by: Impala Public Jenkins --- M bin/diagnostics/collect_diagnostics.py 1 file changed, 4 insertions(+), 1 deletion(-) Approvals: Bharath Vissapragada: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10347 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I540f6c228a0315780d45cf11961f124478b5dd0c Gerrit-Change-Number: 10347 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#10). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M tests/metadata/test_last_ddl_time_update.py 7 files changed, 203 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/10 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Hello Gabor Kaszab, Zoltan Borok-Nagy, Csaba Ringhofer, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9986 to look at the new patch set (#5). Change subject: IMPALA-3307: Add support for IANA time-zone db .. IMPALA-3307: Add support for IANA time-zone db Impala currently uses two different libraries for timestamp manipulations: boost and glibc. Issues with boost: - Time-zone database is currently hard coded in timezone_db.cc. Impala admins cannot update it without upgrading Impala. - Time-zone database is flat, therefore can’t track year-to-year changes. - Time-zone database is not updated on a regular basis. Issues with glibc: - Uses /usr/share/zoneinfo/ database which could be out of sync on some of the nodes in the Impala cluster. - Uses the host system’s local time-zone. Different nodes in the Impala cluster might use a different local time-zone. - Conversion functions take a global lock, which causes severe performance degradation. In addition to the issues above, the fact that /usr/share/zoneinfo/ and the hard-coded boost time-zone database are both in use is a source of inconsistency in itself. This patch makes the following changes: - Instead of boost and glibc, impalad uses Google's CCTZ to implement time-zone conversions. - Introduces a new startup flag (--hdfs_zone_info_dir) to impalad to specify an HDFS/S3/ADLS location that contains the shared compiled IANA time-zone database. If the startup flag is set, impalad will use the specified time-zone database. Otherwise, impalad will use the default /usr/share/zoneinfo time-zone database. - impalad reads the entire time-zone database into an in-memory map on startup for fast lookups. - The name of the coordinator node’s local time-zone is saved to the query context when preparing query execution. This time-zone is used whenever the current time-zone is referred afterwards in an execution node. - Introduces a new startup flag (--hdfs_zone_abbrev_conf) to impalad to specify an HDFS/S3/ADLS path to a shared config file that contains definitions for non-standard time-zone abbreviations. Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/common/global-types.h M be/src/exec/data-source-scan-node.cc M be/src/exec/data-source-scan-node.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/decimal-operators.h M be/src/exprs/expr-test.cc M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc A be/src/exprs/timezone_db-test.cc M be/src/exprs/timezone_db.cc M be/src/exprs/timezone_db.h M be/src/runtime/raw-value-test.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M be/src/service/impala-server.cc M be/src/service/impalad-main.cc M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/hdfs-util-test.cc M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M be/src/util/time-test.cc M be/src/util/time.cc M be/src/util/time.h M bin/bootstrap_toolchain.py M bin/impala-config.sh A cmake_modules/FindCctz.cmake M common/thrift/ImpalaInternalService.thrift M common/thrift/metrics.json M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/bin/create-load-data.sh M testdata/data/timezoneverification.csv A testdata/tzdb/abbrev.conf A testdata/tzdb/zoneinfo/AmerICA/ArgeNTINA/MendOZA A testdata/tzdb/zoneinfo/AmerICA/CancUN A testdata/tzdb/zoneinfo/UTC M testdata/workloads/functional-query/queries/QueryTest/exprs.test A tests/custom_cluster/test_custom_tzdb.py 53 files changed, 2,556 insertions(+), 1,096 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/9986/5 -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 5 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@526 PS4, Line 526: const string& tz_name = (start_lookup.abbr != nullptr) ? start_lookup.abbr : : context->impl()->state()->local_time_zone()->name(); > What is the goal of this logic? To print timezone abbreviations instead of Both, I guess. I just replicated the original behavior: localtime_r() sets tzone.tm_zone to the time-zone abbreviation. Added a comment. I don't think that the TimestampValue class would be a good place for this function. The timezone abbreviation corresponds to 'start_unix_millis', which is not a TimestampValue. I'll keep this logic here for now. http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@93 PS4, Line 93: inline bool CheckIfDateOutOfRange(const cctz::civil_day& date) { : static const cctz::civil_day max_date(TimestampFunctions::MAX_YEAR, 12, 31); : static const cctz::civil_day min_date(TimestampFunctions::MIN_YEAR, 1, 1); : return date < min_date || date > max_date; : } > This could be simpler and possibly faster by expecting a cctz::civil_second Done http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@128 PS4, Line 128: > cctz explains pretty well the handling of dst boundaries, maybe we could ad Done http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@129 PS4, Line 129: // In case of ambiguity invalidate TimestampValue. : const cctz::time_zone::civil_lookup from_cl = local_tz->lookup(from_cs); : if (UNLIKELY(from_cl.kind != cctz::time_zone::civil_lookup::UNIQUE)) { : SetToInvalidDateTime(); : } else { : cctz::time_point from_tp = from_cl.pre; : : // Convert 'from_tp' time_point to civil_second assuming 'UTC' time-zone. : cctz::civil_second to_cs = cctz::convert(from_tp, TimezoneDatabase::GetUtcTimezone()); : : // boost::gregorian::date() throws boost::gregorian::bad_year if year is not in the : // 1400.. range. Need to check validity before creating the date object. : if (UNLIKELY(CheckIfDateOutOfRange(cctz::civil_day(to_cs { > I may be possible to get TimestampValue from cctz::time_zone::civil_lookup Done http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/runtime/timestamp-value.cc@146 PS4, Line 146: time_ = boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), to_cs.second(), > nit: long line Done -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 4 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 15:28:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Update version to 3.1.0-SNAPSHOT
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/10360 ) Change subject: Update version to 3.1.0-SNAPSHOT .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10360 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2e2df73d27fa332d17fec4e9aa469ea91b14167 Gerrit-Change-Number: 10360 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Thu, 10 May 2018 15:30:01 + Gerrit-HasComments: No
[Impala-ASF-CR] Update version to 3.1.0-SNAPSHOT
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10360 ) Change subject: Update version to 3.1.0-SNAPSHOT .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2445/ -- To view, visit http://gerrit.cloudera.org:8080/10360 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2e2df73d27fa332d17fec4e9aa469ea91b14167 Gerrit-Change-Number: 10360 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Thu, 10 May 2018 15:31:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/10354 ) Change subject: IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb Gerrit-Change-Number: 10354 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 16:02:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10354 ) Change subject: IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2447/ -- To view, visit http://gerrit.cloudera.org:8080/10354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb Gerrit-Change-Number: 10354 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 16:03:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 10: Code-Review+1 Sorry, some of the authorization test changes were missing from patch 9. Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 16:10:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6998: test bloom wait time fails due to late arrival of filters on Isilon
Sailesh Mukil has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10366 Change subject: IMPALA-6998: test_bloom_wait_time fails due to late arrival of filters on Isilon .. IMPALA-6998: test_bloom_wait_time fails due to late arrival of filters on Isilon This test has been failing on Isilon runs, most likely due to timing issues which makes it a test issue rather than a product bug. This patch disables the test for Isilon. We should revisit what tests we run on non-HDFS filesystems later on, but until then, this should unblock the build. Change-Id: I2df6983a65a50b7efdd482124b70f518ee4c3229 --- M tests/query_test/test_runtime_filters.py 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/10366/1 -- To view, visit http://gerrit.cloudera.org:8080/10366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2df6983a65a50b7efdd482124b70f518ee4c3229 Gerrit-Change-Number: 10366 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR](2.x) IMPALA-6972: Disable parallel dataload on MINICLUSTER PROFILE=2
Hello Philip Zeyliger, Impala Public Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/10367 to review the following change. Change subject: IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2 .. IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2 There is a Hive bug in Hive 1.1.0 that can result in a NullPointerException when doing parallel Hive operations (see IMPALA-6532). Since dataload goes parallel on Hive loads starting with IMPALA-6372, dataload can hit this error on Hive 1.1.0 (i.e. IMPALA_MINICLUSTER_PROFILE=2). This is impacting builds on the 2.x branch. This disables parallel dataload for IMPALA_MINICLUSTER_PROFILE=2. IMPALA_MINICLUSTER_PROFILE=3 uses a newer version of Hive that has a fix for this, so this continues to use parallel dataload for that case. Parallelism can be reenabled when Hive 1.1.0 gets the fix from Hive 2.1.1. Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6 Reviewed-on: http://gerrit.cloudera.org:8080/10306 Reviewed-by: Philip Zeyliger Tested-by: Impala Public Jenkins --- M bin/impala-config.sh M bin/load-data.py M testdata/bin/create-load-data.sh 3 files changed, 12 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/10367/1 -- To view, visit http://gerrit.cloudera.org:8080/10367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: newchange Gerrit-Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6 Gerrit-Change-Number: 10367 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 10: (8 comments) http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235 PS10, Line 1235:* TODO: the following paragraph seems at least partially obsolate Why not clean it up then? Point (1) is obsolete but point (2) is still valid. http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@244 PS10, Line 244: Long.toString(System.currentTimeMillis() / 1000)); Doesn't the HMS set this in alter_table()? http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@785 PS10, Line 785: msTbl.putToParameters("impala.lastComputeStatsTime", Create a constant for the table property in Table similar to what we have in HdfsTable, e.g. TBL_PROP_SKIP_HEADER_LINE_COUNT http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2868 PS10, Line 2868: // HMS updates transient_lastDdlTime if it is removed. I understand what this comment says, but I don't understand it's relevance in this context. Does that mean we should add a dummy time if !setLastDdlTime? http://gerrit.cloudera.org:8080/#/c/10116/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2870 PS10, Line 2870: Long.toString(System.currentTimeMillis() / 1000)); We're doing "System.currentTimeMillis() / 1000" in quite a few places, I suggest adding a helper function in Table or MetastoreUtils or similar. http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@74 PS10, Line 74: def run_test(self, query, expect_changed_ddl_time, expect_changed_stats_time): Passing a ";" delimited string is really weird. Why not pass a list of queries? As far as I can tell there's a single use of feature in L229, can just split up those two queries? http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@93 PS10, Line 93: # Hive uses a seconds granularity on the last ddl time. Not just Hive - we do too. Just say "All times are stored in seconds" or something http://gerrit.cloudera.org:8080/#/c/10116/10/tests/metadata/test_last_ddl_time_update.py@155 PS10, Line 155: h.expect_no_time_change("drop incremental stats %(TBL)s partition (j=1, s='2012')") use "drop stats" instead to wipe everything (including incremental stats) -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 16:58:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10158 ) Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@522 PS15, Line 522: query_events_->MarkEvent(exec_state_to_event.at(new_state)); : // TODO: is this needed? This will also happen on the "backend" side of cancel rpc. : // And in the case of EOS, sink already knows this. > makes sense. I filed IMPALA-7011 for that. http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc@132 PS16, Line 132: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc@618 PS16, Line 618: > concurrently Done -- To view, visit http://gerrit.cloudera.org:8080/10158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Gerrit-Change-Number: 10158 Gerrit-PatchSet: 14 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 17:08:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10158 to look at the new patch set (#17). Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state .. IMPALA-5384, part 2: Simplify Coordinator locking and clarify state The is the final change to clarify and break up the Coordinator's lock. The state machine for the coordinator is made explicit, distinguishing between executing state and multiple terminal states. Logic to transition into a terminal state is centralized in one location and executes exactly once for each coordinator object. Derived from a patch for IMPALA_5384 by Marcel Kornacker. Testing: - exhaustive functional tests - stress test on minicluster with memory overcommitment. Verified from the logs that this exercises all these paths: - successful queries - client requested cancellation - error from exec FInstances RPC - error reported asynchronously via report status RPC - eos before backend execution completed Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e --- M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/service/client-request-state.cc M be/src/service/impala-server.h M be/src/util/counting-barrier.h 6 files changed, 389 insertions(+), 391 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/17 -- To view, visit http://gerrit.cloudera.org:8080/10158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Gerrit-Change-Number: 10158 Gerrit-PatchSet: 17 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10158 ) Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state .. Patch Set 18: Code-Review+1 Thanks for the reviews! rebase and carry. -- To view, visit http://gerrit.cloudera.org:8080/10158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Gerrit-Change-Number: 10158 Gerrit-PatchSet: 18 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 17:10:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5216: Make admission control queuing async
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10060 ) Change subject: IMPALA-5216: Make admission control queuing async .. Patch Set 7: (5 comments) Will have some more comments but wanted to get the discussion going on these. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.h@203 PS7, Line 203: cancelled given that there's no Cancel() API in this class, how is that done? (I'm guessing the client can set the promise, but in any case, let's explain it). http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@671 PS7, Line 671: if (coord() != NULL) { : RETURN_IF_ERROR(coord()->Wait()); : RETURN_IF_ERROR(UpdateCatalog()); : } now that Execute() doesn't set coord(), isn't this racy? i.e. this wait thread could get here before the admission control submission thread created and set the Coordinator. That wasn't possible before because the order was Execute(); // sets coord_ if necessary. WaitAsync() http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@897 PS7, Line 897: if (uses_admission_control()) admit_outcome_.TrySet(AdmissionOutcome::CANCELLED); Does this somehow cause the entry in the AC queue to be removed proactively? I don't think it's clear how this works from just reading the AC public header. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@1147 PS7, Line 1147: lock_guard l(lock_); : if (!UpdateQueryStatus(admit_status).ok()) return; : if (is_cancelled_) { : VLOG_QUERY << "Cancelled right after successful admission, query id=" :<< schedule_->query_id(); : // If query cancelled after being admitted, release Admission control resources. : exec_env_->admission_controller()->ReleaseQuery(*schedule_.get()); : // If cancellation was user initiated, then query_status would be OK, therefore : // set it to CANCELLED. : discard_result(UpdateQueryStatus(Status::CANCELLED)); : return; : } why is that needed? I guess it's really just an optimization to avoid Coordinator::Exec()/Cancel() for when the cancel happens in this small window, is that right? Hmm, or is the the log misleading and this path is taken when the cancel happens while queued? I think that's true, and if so, then it is worthwhile, but then you should reword the log/comments. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@1171 PS7, Line 1171:lock_guard l(lock_); : if (!UpdateQueryStatus(exec_status).ok()) return; : cancelled = is_cancelled_; : if (!cancelled) { : // This needs to be done while holding lock_ as it will ensure that coord_ is : // visible to the cancellation thread if the query is marked cancelled after exiting : // this critical section. : coord_exec_started_.Store(true); : } else { : VLOG_QUERY << "Cancelled right after starting the coordinator query id=" : << schedule_->query_id(); : discard_result(UpdateQueryStatus(Status::CANCELLED)); : // If there was already a non OK query status before we set it to CANCELLED then : // take it to update coord_. : cancellation_status = query_status(); : } : } As alluded to earlier, I think we might be able to simplfiying this by not setting the coord_ pointer until we do the final cancellation check. -- To view, visit http://gerrit.cloudera.org:8080/10060 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf Gerrit-Change-Number: 10060 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 17:45:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5216: Make admission control queuing async
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10060 ) Change subject: IMPALA-5216: Make admission control queuing async .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@671 PS7, Line 671: if (coord() != NULL) { : RETURN_IF_ERROR(coord()->Wait()); : RETURN_IF_ERROR(UpdateCatalog()); : } > now that Execute() doesn't set coord(), isn't this racy? i.e. this wait thr Oh, I guess that's what the Join() is for on line 660. Could you add a comment there saying something like: // Wait until the query has passed through admission control and is running. (Later when we clean up and clearly define the possible CLR states, I think this will become easier to remember). -- To view, visit http://gerrit.cloudera.org:8080/10060 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf Gerrit-Change-Number: 10060 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 17:50:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR](2.x) IMPALA-6972: Disable parallel dataload on MINICLUSTER PROFILE=2
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/10367 ) Change subject: IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6 Gerrit-Change-Number: 10367 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 18:00:09 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6972: Disable parallel dataload on MINICLUSTER PROFILE=2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10367 ) Change subject: IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2 .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2448/ -- To view, visit http://gerrit.cloudera.org:8080/10367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6 Gerrit-Change-Number: 10367 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 18:04:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6035: Add query options to limit reserved threads
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10365 Change subject: IMPALA-6035: Add query options to limit reserved threads .. IMPALA-6035: Add query options to limit reserved threads Adds two options: THREAD_RESERVATION_LIMIT and THREAD_RESERVATION_AGGREGATE_LIMIT, which are both enforced by admission control based on planner resource requirements and the schedule. The mechanism used is the same as the minimum reservation checks. THREAD_RESERVATION_LIMIT limits the total number of reserved threads in fragments scheduled on a single backend. THREAD_RESERVATION_AGGREGATE_LIMIT limits the sum of reserved threads across all fragments. This also slightly improves the minimum reservation error message to include the host name. Testing: Added end-to-end tests that exercise the code paths. Ran core tests. Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M testdata/workloads/functional-query/queries/QueryTest/admission-reject-min-reservation.test A testdata/workloads/functional-query/queries/QueryTest/resource-limits.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test A tests/query_test/test_resource_limits.py 12 files changed, 244 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/10365/3 -- To view, visit http://gerrit.cloudera.org:8080/10365 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007 Gerrit-Change-Number: 10365 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-6035: Add query options to limit reserved threads
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10365 ) Change subject: IMPALA-6035: Add query options to limit reserved threads .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/10365/3/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/10365/3/be/src/scheduling/query-schedule.h@61 PS3, Line 61: int64_t min_reservation_bytes = 0; This looked like a bug, because these were never explicitly initialised. Actually it isn't a bug because std::map guarantees primitives in values are zero-initialised if there isn't a default constructor: http://en.cppreference.com/w/cpp/container/map/operator_at -- To view, visit http://gerrit.cloudera.org:8080/10365 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007 Gerrit-Change-Number: 10365 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 18:18:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 ) Change subject: IMPALA-3307: Add support for IANA time-zone db .. Patch Set 5: (19 comments) Thanks for dealing with my previous comments, Attila! I filed some more but basically I'm fine with the changes and feel free to start involving someone with more experience on this topic. http://gerrit.cloudera.org:8080/#/c/9986/4/CMakeLists.txt File CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9986/4/CMakeLists.txt@281 PS4, Line 281: Cctz nit: CCTZ http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@139 PS4, Line 139: new_time_zone_(time_zone), new_tz_ >From reading the names of these 2 variables it's not clear what de difference >is. Can you have a new_time_zone_ and a new_time_zone_name_ or such? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@140 PS4, Line 140: /*overwrite*/ Do you need this comment? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@153 PS4, Line 153: Timezone * nit: Timezone* Expect..() http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/expr-test.cc@155 PS4, Line 155: new_tz_ = TimezoneDatabase::FindTimezone(new_time_zone_); I wonder if it makes sense to do this assignment in the constructor and then this function can be changed to something like "getTimezone" that simply returns the corresponding member. http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@503 PS4, Line 503: namespace Why did you need this namespace? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions-ir.cc@504 PS4, Line 504: nline What is the plan to get rid of the "Duplicate code" TODOs in this review? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc@55 PS4, Line 55: // This should raise some sort of error or at least return null. Hive just ignores it. Shouldn't this be a TODO? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timestamp-functions.cc@87 PS4, Line 87: // This should raise some sort of error or at least return null. Hive just ignores it. Same as above http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc File be/src/exprs/timezone_db-test.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@57 PS4, Line 57: TzAbbev nit: TzAbbrev? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@68 PS4, Line 68: // Abbreviations must start with an uppercase letter. If it has to start with an uppercase letter, can we add a test this with an input that would be valid if the first letter was uppercase? e.g. "pST", "pst", "singapore" etc. http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db-test.cc@105 PS4, Line 105: // Misformatted time-zone names. Can you again play around with upper vs lower case letters here? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@1 PS4, Line 1: // with the License. You may obtain a copy of the License at Hmm, is the top of the Apache comment missing? http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@23 PS4, Line 23: /// 'TimezoneDatabase' class contains functions to load and access the IANA time-zone Nice description, thx :) http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.h@91 PS4, Line 91: tz_seg nit: might be just my preference but tz_segment is still short and I think better to read. Up to you. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147 PS1, Line 147: > Returning a pointer just to be able to signal failure with nullptr seems co What I meant is that you could get rid of the offset_sec paramater if you changed the return value to int64_t* and you would still be able to detect failure if the return value is null. Not a big deal, though. Your choice. http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/4/be/src/exprs/timezone_db.cc@365 PS4, Line 365: hdfsFile hdfs_file = hdfsOpenFile( Just for my information: LoadZoneInfoFromHdfs() copies the zone info file to a local dir, meanwhile this function uses hdfsOpenFile to get the abbrev data. Is there a reason that they don't follow the same approac
[Impala-ASF-CR] Update version to 3.1.0-SNAPSHOT
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10360 ) Change subject: Update version to 3.1.0-SNAPSHOT .. Update version to 3.1.0-SNAPSHOT Change-Id: Ia2e2df73d27fa332d17fec4e9aa469ea91b14167 Reviewed-on: http://gerrit.cloudera.org:8080/10360 Reviewed-by: Michael Brown Tested-by: Impala Public Jenkins --- M bin/save-version.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Michael Brown: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10360 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia2e2df73d27fa332d17fec4e9aa469ea91b14167 Gerrit-Change-Number: 10360 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] Update version to 3.1.0-SNAPSHOT
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10360 ) Change subject: Update version to 3.1.0-SNAPSHOT .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10360 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2e2df73d27fa332d17fec4e9aa469ea91b14167 Gerrit-Change-Number: 10360 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Thu, 10 May 2018 18:55:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6966: sort table memory by size in catalogd web UI
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10292 ) Change subject: IMPALA-6966: sort table memory by size in catalogd web UI .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2449/ -- To view, visit http://gerrit.cloudera.org:8080/10292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85 Gerrit-Change-Number: 10292 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 10 May 2018 19:17:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6966: sort table memory by size in catalogd web UI
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/10292 ) Change subject: IMPALA-6966: sort table memory by size in catalogd web UI .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85 Gerrit-Change-Number: 10292 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 10 May 2018 19:17:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10354 ) Change subject: IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb Gerrit-Change-Number: 10354 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 19:27:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10354 ) Change subject: IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell .. IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell sqlparse-0.1.19 is the last version of sqlparse that supports Python 2.6. Testing: - Ran all end-to-end tests Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb Reviewed-on: http://gerrit.cloudera.org:8080/10354 Reviewed-by: David Knupp Tested-by: Impala Public Jenkins --- M LICENSE.txt M README.md M bin/rat_exclude_files.txt M shell/.gitignore D shell/ext-py/sqlparse-0.1.14/PKG-INFO D shell/ext-py/sqlparse-0.1.14/setup.cfg A shell/ext-py/sqlparse-0.1.19/.travis.yml R shell/ext-py/sqlparse-0.1.19/AUTHORS R shell/ext-py/sqlparse-0.1.19/CHANGES R shell/ext-py/sqlparse-0.1.19/COPYING R shell/ext-py/sqlparse-0.1.19/MANIFEST.in R shell/ext-py/sqlparse-0.1.19/README.rst R shell/ext-py/sqlparse-0.1.19/TODO R shell/ext-py/sqlparse-0.1.19/bin/sqlformat R shell/ext-py/sqlparse-0.1.19/docs/source/analyzing.rst R shell/ext-py/sqlparse-0.1.19/docs/source/api.rst R shell/ext-py/sqlparse-0.1.19/docs/source/changes.rst R shell/ext-py/sqlparse-0.1.19/docs/source/conf.py R shell/ext-py/sqlparse-0.1.19/docs/source/index.rst R shell/ext-py/sqlparse-0.1.19/docs/source/indices.rst R shell/ext-py/sqlparse-0.1.19/docs/source/intro.rst R shell/ext-py/sqlparse-0.1.19/docs/source/ui.rst R shell/ext-py/sqlparse-0.1.19/docs/sqlformat.1 R shell/ext-py/sqlparse-0.1.19/pytest.ini R shell/ext-py/sqlparse-0.1.19/setup.py R shell/ext-py/sqlparse-0.1.19/sqlparse/__init__.py R shell/ext-py/sqlparse-0.1.19/sqlparse/engine/__init__.py R shell/ext-py/sqlparse-0.1.19/sqlparse/engine/filter.py R shell/ext-py/sqlparse-0.1.19/sqlparse/engine/grouping.py R shell/ext-py/sqlparse-0.1.19/sqlparse/exceptions.py R shell/ext-py/sqlparse-0.1.19/sqlparse/filters.py R shell/ext-py/sqlparse-0.1.19/sqlparse/formatter.py R shell/ext-py/sqlparse-0.1.19/sqlparse/functions.py R shell/ext-py/sqlparse-0.1.19/sqlparse/keywords.py R shell/ext-py/sqlparse-0.1.19/sqlparse/lexer.py R shell/ext-py/sqlparse-0.1.19/sqlparse/pipeline.py R shell/ext-py/sqlparse-0.1.19/sqlparse/sql.py R shell/ext-py/sqlparse-0.1.19/sqlparse/tokens.py R shell/ext-py/sqlparse-0.1.19/sqlparse/utils.py R shell/ext-py/sqlparse-0.1.19/tests/__init__.py R shell/ext-py/sqlparse-0.1.19/tests/files/_Make_DirEntry.sql R shell/ext-py/sqlparse-0.1.19/tests/files/begintag.sql R shell/ext-py/sqlparse-0.1.19/tests/files/begintag_2.sql R shell/ext-py/sqlparse-0.1.19/tests/files/dashcomment.sql R shell/ext-py/sqlparse-0.1.19/tests/files/function.sql R shell/ext-py/sqlparse-0.1.19/tests/files/function_psql.sql R shell/ext-py/sqlparse-0.1.19/tests/files/function_psql2.sql R shell/ext-py/sqlparse-0.1.19/tests/files/function_psql3.sql R shell/ext-py/sqlparse-0.1.19/tests/files/huge_select.sql R shell/ext-py/sqlparse-0.1.19/tests/files/test_cp1251.sql R shell/ext-py/sqlparse-0.1.19/tests/test_filters.py R shell/ext-py/sqlparse-0.1.19/tests/test_format.py R shell/ext-py/sqlparse-0.1.19/tests/test_functions.py R shell/ext-py/sqlparse-0.1.19/tests/test_grouping.py R shell/ext-py/sqlparse-0.1.19/tests/test_parse.py R shell/ext-py/sqlparse-0.1.19/tests/test_pipeline.py R shell/ext-py/sqlparse-0.1.19/tests/test_regressions.py R shell/ext-py/sqlparse-0.1.19/tests/test_split.py R shell/ext-py/sqlparse-0.1.19/tests/test_tokenize.py R shell/ext-py/sqlparse-0.1.19/tests/utils.py R shell/ext-py/sqlparse-0.1.19/tox.ini 61 files changed, 558 insertions(+), 188 deletions(-) Approvals: David Knupp: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb Gerrit-Change-Number: 10354 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] test-with-docker: work with git worktree
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10335 ) Change subject: test-with-docker: work with git worktree .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10335/1/docker/test-with-docker.py File docker/test-with-docker.py: http://gerrit.cloudera.org:8080/#/c/10335/1/docker/test-with-docker.py@417 PS1, Line 417: self.git_common_dir = os.path.realpath( : _check_output(["git", "rev-parse", "--git-common-dir"]).strip()) When I run this on one of my normal checkouts, it does this: $ git rev-parse --git-common-dir --git-common-dir This is on git version 1.9.1, so maybe that is old. -- To view, visit http://gerrit.cloudera.org:8080/10335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9186e0b6f068aacc25f8d691508165c04329fa8b Gerrit-Change-Number: 10335 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 20:19:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7000: [DOCS] Correct info about dedicated executors
Alex Rodoni has posted comments on this change. ( http://gerrit.cloudera.org:8080/10357 ) Change subject: IMPALA-7000: [DOCS] Correct info about dedicated executors .. Patch Set 1: Hi Juan, Do you have any comment on this change? Thank you! -- To view, visit http://gerrit.cloudera.org:8080/10357 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b7e6c57188a41a45d5813882b6dbc37cf47cf1f Gerrit-Change-Number: 10357 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Comment-Date: Thu, 10 May 2018 20:19:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 ) Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. Patch Set 11: (1 comment) getting back to this change... the upcoming update includes the idea to better separate the split specs vs. concrete scan ranges. rebase changes caught this in the middle so the update also includes rebase changes (only affected HdfsScanNode.java). http://gerrit.cloudera.org:8080/#/c/8523/11/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/11/common/thrift/PlanNodes.thrift@228 PS11, Line 228: 4: optional THdfsFileSplitGeneratorSpec hdfs_file_split_generator_spec > will try it out. The additional changes are relatively mechanical. Since this design explicitly does not allow these specs to propagate further in the backend, I agree that we're better off going this route. -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 11 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 10 May 2018 21:42:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Hello Lars Volker, Dimitris Tsirogiannis, Alex Behm, Mostafa Mokhtar, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8523 to look at the new patch set (#12). Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. IMPALA-5931: Generates scan ranges in planner for s3/adls Currently, for filesystems that do not include physical block information (e.g., block replica locations, caching), synthetic blocks are generated and stored in the catalog when metadata is loaded. Example file systems for which this is done includes S3, ADLS, and local fs. This change avoids generating these blocks when metadata is loaded. Instead, scan ranges are directly generated from such files by the backend coordinator. Previously, all scan ranges were produced by the planner in HDFSScanNode in the frontend. Now, those files without block information are sent to the coordinator represented by a split specification that determines how the coordinator will create scan ranges to send to executors. This change reduces the space needed in the catalog and reduces the scan range data structures that are passed from the frontend to the backend when planning and coordinating a query. In addition a bug is avoided where non-splittable files were being split anyways to support the query parameter that places a limit on scan ranges. Testing: - added backend scheduler tests - mixed-filesystems test covers tables/queries with multiple fs's. - local fs tests cover the code paths in this change - all core tests pass when configured with s3 - manually tried larger local filesystem tables (tpch) with multiple partitions and observed the same scan ranges. - TODO: adls testing Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 --- M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/util/CMakeLists.txt A be/src/util/flat_buffer.cc A be/src/util/flat_buffer.h M common/thrift/Frontend.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 21 files changed, 718 insertions(+), 233 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/12 -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 12 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR](2.x) IMPALA-6972: Disable parallel dataload on MINICLUSTER PROFILE=2
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10367 ) Change subject: IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2 .. IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2 There is a Hive bug in Hive 1.1.0 that can result in a NullPointerException when doing parallel Hive operations (see IMPALA-6532). Since dataload goes parallel on Hive loads starting with IMPALA-6372, dataload can hit this error on Hive 1.1.0 (i.e. IMPALA_MINICLUSTER_PROFILE=2). This is impacting builds on the 2.x branch. This disables parallel dataload for IMPALA_MINICLUSTER_PROFILE=2. IMPALA_MINICLUSTER_PROFILE=3 uses a newer version of Hive that has a fix for this, so this continues to use parallel dataload for that case. Parallelism can be reenabled when Hive 1.1.0 gets the fix from Hive 2.1.1. Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6 Reviewed-on: http://gerrit.cloudera.org:8080/10306 Reviewed-by: Philip Zeyliger Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/10367 --- M bin/impala-config.sh M bin/load-data.py M testdata/bin/create-load-data.sh 3 files changed, 12 insertions(+), 1 deletion(-) Approvals: Philip Zeyliger: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: merged Gerrit-Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6 Gerrit-Change-Number: 10367 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR](2.x) IMPALA-6972: Disable parallel dataload on MINICLUSTER PROFILE=2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10367 ) Change subject: IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6 Gerrit-Change-Number: 10367 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 10 May 2018 21:48:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6997: Avoid redundant dumping in SetMemLimitExceeded()
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10364 ) Change subject: IMPALA-6997: Avoid redundant dumping in SetMemLimitExceeded() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10364/1/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/10364/1/be/src/runtime/runtime-state.cc@211 PS1, Line 211: if (!query_status_.ok()) return; This probably works in practice but we don't guarantee that Status is thread-safe. It's probably best to hold query_status_lock_ when checking this (or do something with atomics, but the lock approach seems fine to me). -- To view, visit http://gerrit.cloudera.org:8080/10364 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I92b87f370a68a2f695ebbc2520a98dd143730701 Gerrit-Change-Number: 10364 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 22:09:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5216: Make admission control queuing async
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10060 ) Change subject: IMPALA-5216: Make admission control queuing async .. Patch Set 7: (8 comments) I think these are my final comments for this iteration round. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.cc@112 PS7, Line 112: Q nit: don't capitalize queue but maybe use "Cancelled (queued)" which seems consistent with e.g. "Admitted (queued)" and "Timed out (queued)" which I think means the action happened while in the queue. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.cc@899 PS7, Line 899: false; use && rather than ?:false http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@481 PS7, Line 481: query-exec-state Is there a reason we can't update the group name to client-request-state? It might be just an oversight. (Maybe good to do a small commit in front of this one for that). http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@482 PS7, Line 482: SubmitToAdmissionController, this, : &admission_controller_thread_, I think it's a bit confusing that we call this the "admission controller" thread given that this thread also finishes the Exec path (i.e. Coordinator::Exec()). How about we rename ProcessQueryOrDmlRequest() to ExecAsyncQueryOrDmlRequest(). Then we rename SubmitToAdmissionController() to FinishExecQueryOrDmlRequest() or similar, and we call the thread the "exec-thread"? I think that follows the naming pattern elsewhere (e.g. WaitAsync()/Wait(), child-query ExecAsync()/ExecChildQueries() etc). http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@1148 PS7, Line 1148: if (!UpdateQueryStatus(admit_status).ok()) return; in the case that cancel happened while inside AdmitQuery(), AdmitQuery() can return CANCELLED, which then will set the status to CANCELLED here. Is that intentional? If so, it seems like we should just get rid of the cause != nullptr nonsense in Cancel() and always set the status at that point. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@1155 PS7, Line 1155: // set it to CANCELLED. isn't that a user visible behavior change? (though it's probably okay). but same point - why not just proactively set it to CANCELLED when Cancel() was called? we really need to clean up the state machine (and clarify the legal states/transitions) for this class soon. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.cc@1147 PS7, Line 1147: lock_guard l(lock_); : if (!UpdateQueryStatus(admit_status).ok()) return; : if (is_cancelled_) { : VLOG_QUERY << "Cancelled right after successful admission, query id=" :<< schedule_->query_id(); : // If query cancelled after being admitted, release Admission control resources. : exec_env_->admission_controller()->ReleaseQuery(*schedule_.get()); : // If cancellation was user initiated, then query_status would be OK, therefore : // set it to CANCELLED. : discard_result(UpdateQueryStatus(Status::CANCELLED)); : return; : } > why is that needed? I guess it's really just an optimization to avoid Coord After reading the admission controller changes, I see that this window is the smaller one since the promise would propagate the cancel and turn the status into CANCELLED. I think we need to simplify the cancellation logic. http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-hs2-server.cc@451 PS7, Line 451: if (!request_state->uses_admission_control()) { : request_state->UpdateNonErrorOperationState(TOperationState::RUNNING_STATE); : } maybe it'd be clearer to tuck this into CRS::Execute(). That way, this code doesn't really have to care as much about how the query will be executed. -- To view, visit http://gerrit.cloudera.org:8080/10060 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf Gerrit-Change-Number: 10060 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bik
[Impala-ASF-CR] IMPALA-6966: sort table memory by size in catalogd web UI
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10292 ) Change subject: IMPALA-6966: sort table memory by size in catalogd web UI .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85 Gerrit-Change-Number: 10292 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 10 May 2018 22:39:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6966: sort table memory by size in catalogd web UI
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10292 ) Change subject: IMPALA-6966: sort table memory by size in catalogd web UI .. IMPALA-6966: sort table memory by size in catalogd web UI This patch fix the sorting order in "Top-K Tables with Highest Memory Requirements" in which "Estimated memory" column is sorted as strings. Values got from the catalog-server are changed from pretty-printed strings to bytes numbers. So the web UI is able to sort and render them correctly. Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85 Reviewed-on: http://gerrit.cloudera.org:8080/10292 Reviewed-by: Dimitris Tsirogiannis Tested-by: Impala Public Jenkins --- M be/src/catalog/catalog-server.cc M www/catalog.tmpl 2 files changed, 17 insertions(+), 4 deletions(-) Approvals: Dimitris Tsirogiannis: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85 Gerrit-Change-Number: 10292 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-7003: Deflake erasure coding data loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10362 ) Change subject: IMPALA-7003: Deflake erasure coding data loading .. Patch Set 1: (1 comment) Did you test data loading on a Jenkins worker to make sure it's not broken? http://gerrit.cloudera.org:8080/#/c/10362/1/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl File testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl: http://gerrit.cloudera.org:8080/#/c/10362/1/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl@123 PS1, Line 123: Comment why we need this setting -- To view, visit http://gerrit.cloudera.org:8080/10362 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I219106cd3ec7ffab7a834700f2a722b165e5f66c Gerrit-Change-Number: 10362 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Thu, 10 May 2018 22:47:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6987: [DOCS] Update when INVALIDATE METADATA is required
Hello Balazs Jeszenszky, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10339 to look at the new patch set (#2). Change subject: IMPALA-6987: [DOCS] Update when INVALIDATE METADATA is required .. IMPALA-6987: [DOCS] Update when INVALIDATE METADATA is required Change-Id: I2124e14900d0f82569c061cc46006447bb054b36 --- M docs/shared/impala_common.xml M docs/topics/impala_invalidate_metadata.xml 2 files changed, 133 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/10339/2 -- To view, visit http://gerrit.cloudera.org:8080/10339 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2124e14900d0f82569c061cc46006447bb054b36 Gerrit-Change-Number: 10339 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-6987: [DOCS] Update when INVALIDATE METADATA is required
Alex Rodoni has posted comments on this change. ( http://gerrit.cloudera.org:8080/10339 ) Change subject: IMPALA-6987: [DOCS] Update when INVALIDATE METADATA is required .. Patch Set 2: The newly generated google doc is shared with you. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/10339 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2124e14900d0f82569c061cc46006447bb054b36 Gerrit-Change-Number: 10339 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 10 May 2018 23:06:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/10165/10/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: http://gerrit.cloudera.org:8080/#/c/10165/10/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@216 PS10, Line 216: public boolean isFileCompressionTypeSupported(String fileName, I took another look at this and realised we don't have any code coverage for calling this for compressed files. It's only used in "LOAD DATA" and we don't test that with compressed files. After digging in more, I think we can actually just delete this code and the check. The current scenarios are: * No file suffix -> this check passes * Recognised and supported compression suffix -> this check passes * Recognised and unsupported compression suffix -> this check fails * Unrecognised file suffix -> treated as NONE, the check passes This is inconsistent, in that in the unrecognised case we're "optimistic" and proceed assuming that it's uncompressed, but in the case where we don't have the plugin loaded yet, we fail the operation. "LOAD DATA" doesn't actually read the file contents and doesn't try to validate the files in any other ways. I think it would be best to just remove this check and proceed optimistically. -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 23:17:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/10165/10/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: http://gerrit.cloudera.org:8080/#/c/10165/10/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@216 PS10, Line 216: public boolean isFileCompressionTypeSupported(String fileName, > I took another look at this and realised we don't have any code coverage fo Doing this would also prevent a minor regression where loading, say, files with a ".lz4" suffix would work before this patch but not after this patch. -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 23:23:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6337: Fix infinite loop in Impala shell
Fredy Wijaya has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/9195 ) Change subject: IMPALA-6337: Fix infinite loop in Impala shell .. IMPALA-6337: Fix infinite loop in Impala shell This patch fixes a bug in sqlparse where sqlparse incorrectly splits a statement that has a new line inside double quotes. The bug in sqlparse causes Impala shell to go to infinite loop when a statement contains a new line inside double quotes. The patch in sqlparse is based on the upstream fix at https://github.com/andialbrecht/sqlparse/pull/396 Testing: - Added new end-to-end shell tests - Ran end-to-end shell tests Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b --- M shell/ext-py/sqlparse-0.1.19/sqlparse/lexer.py M shell/ext-py/sqlparse-0.1.19/tests/test_split.py M tests/shell/test_shell_interactive.py 3 files changed, 25 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/9195/13 -- To view, visit http://gerrit.cloudera.org:8080/9195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b Gerrit-Change-Number: 9195 Gerrit-PatchSet: 13 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6337: Fix infinite loop in Impala shell
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/9195 ) Change subject: IMPALA-6337: Fix infinite loop in Impala shell .. Patch Set 13: > Patch Set 12: > > (1 comment) > > > Yeah I think it's a good idea to upgrade sqlparse to the last version that > > supports Python 2.6 especially if we intend to use a vendored sqlparse. IMO > > it's cleaner to do the upgrade in a separate patch and the apply the bug fix > > in another patch. We can either upgrade first and apply the bug fix or apply > > the bug fix first and then upgrade. Either way works for me. Thoughts? > > I think upgrading first makes the most sense to me. > > And thanks for clarifying my other questions. The backported bugfix is now based on the upgraded sqlparse-0.1.19. Please re-review this CR again. -- To view, visit http://gerrit.cloudera.org:8080/9195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b Gerrit-Change-Number: 9195 Gerrit-PatchSet: 13 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 10 May 2018 23:26:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10158 to look at the new patch set (#19). Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state .. IMPALA-5384, part 2: Simplify Coordinator locking and clarify state The is the final change to clarify and break up the Coordinator's lock. The state machine for the coordinator is made explicit, distinguishing between executing state and multiple terminal states. Logic to transition into a terminal state is centralized in one location and executes exactly once for each coordinator object. Derived from a patch for IMPALA_5384 by Marcel Kornacker. Testing: - exhaustive functional tests - stress test on minicluster with memory overcommitment. Verified from the logs that this exercises all these paths: - successful queries - client requested cancellation - error from exec FInstances RPC - error reported asynchronously via report status RPC - eos before backend execution completed Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e --- M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/service/client-request-state.cc M be/src/service/impala-server.h M be/src/util/counting-barrier.h 6 files changed, 387 insertions(+), 391 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/19 -- To view, visit http://gerrit.cloudera.org:8080/10158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Gerrit-Change-Number: 10158 Gerrit-PatchSet: 19 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10158 ) Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.cc@599 PS19, Line 599: WaitForBackends(); somewhere along the way I broke Kudu DML. Moving this back outside the if-stmt fixes that. -- To view, visit http://gerrit.cloudera.org:8080/10158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Gerrit-Change-Number: 10158 Gerrit-PatchSet: 19 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 23:36:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Hello Zoltan Borok-Nagy, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10165 to look at the new patch set (#11). Change subject: IMPALA-6941: load more text scanner compression plugins .. IMPALA-6941: load more text scanner compression plugins Add extensions for LZ4 and ZSTD (which are supported by Hadoop). Even without a plugin this results in better behaviour because we don't try to treat the files with unknown extensions as uncompressed text. Also allow loading tables containing files with unsupported compression types. There was weird behaviour before we knew of the file extension but didn't support querying the table - the catalog would load the table but the impalad would fail processing the catalog update. The simplest way to fix it is to just allow loading the tables. Similarly, make the "LOAD DATA" operation more permissive - we can copy files into a directory even if we can't decompress them. Switch to always checking plugin version - running mismatched plugin is inherently unsafe. Testing: Positive case where LZO is loaded is exercised. Added coverage for negative case where LZO is disabled. Fixed test gaps: * Querying LZO table with LZO plugin not available. * Interacting with tables with known but unsupported text compressions. * Querying files with unknown compression suffixes (which are treated as uncompressed text). Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 --- M be/src/common/global-flags.cc M be/src/exec/CMakeLists.txt D be/src/exec/hdfs-lzo-text-scanner.cc D be/src/exec/hdfs-lzo-text-scanner.h A be/src/exec/hdfs-plugin-text-scanner.cc A be/src/exec/hdfs-plugin-text-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M common/fbs/CatalogObjects.fbs M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java A testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test A testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test A tests/custom_cluster/test_scanner_plugin.py M tests/metadata/test_partition_metadata.py 18 files changed, 458 insertions(+), 280 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/10165/11 -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6957: calc thread resource requirement in planner
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10256 ) Change subject: IMPALA-6957: calc thread resource requirement in planner .. Patch Set 6: (2 comments) FE and explain changes look good to me http://gerrit.cloudera.org:8080/#/c/10256/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/10256/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@261 PS6, Line 261: .setRequiredThreads(1).build() Why is this 1 and not getNumInstancesPerHost() http://gerrit.cloudera.org:8080/#/c/10256/4/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test File testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test: http://gerrit.cloudera.org:8080/#/c/10256/4/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test@64 PS4, Line 64: instances > I'm open to the idea but maybe we should decouple it from this change? I like short names. I don't think a longer name will necessarily make it clearer. Even if we call it "total-instances-estimate" I can definitely see users asking about its precise meaning. Many things here are estimates, so not sure it makes sense to add an explicit "estimate" prefix. -- To view, visit http://gerrit.cloudera.org:8080/10256 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I982837ef883457fa4d2adc3bdbdc727353469140 Gerrit-Change-Number: 10256 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 23:42:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7010: don't run memory usage tests on non-HDFS
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10370 Change subject: IMPALA-7010: don't run memory usage tests on non-HDFS .. IMPALA-7010: don't run memory usage tests on non-HDFS Moved a number of tests with tuned mem_limits. In some cases this required separating the tests from non-tuned functional tests. TestQueryMemLimit used very high and very low limits only, so seemed safe to run in all configurations. Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662 --- A testdata/workloads/functional-query/queries/QueryTest/insert-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test A testdata/workloads/functional-query/queries/QueryTest/kudu_insert_mem_limit.test A testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test M testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test R testdata/workloads/functional-query/queries/QueryTest/spilling-regression-exhaustive.test M tests/common/skip.py M tests/query_test/test_insert.py M tests/query_test/test_kudu.py M tests/query_test/test_mem_usage_scaling.py M tests/query_test/test_nested_types.py M tests/query_test/test_sort.py M tests/query_test/test_spilling.py 15 files changed, 183 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/10370/1 -- To view, visit http://gerrit.cloudera.org:8080/10370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662 Gerrit-Change-Number: 10370 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-6957: calc thread resource requirement in planner
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10256 ) Change subject: IMPALA-6957: calc thread resource requirement in planner .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/10256/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/10256/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@261 PS6, Line 261: .setRequiredThreads(1).build() > Why is this 1 and not getNumInstancesPerHost() The resource profile being computed here is for each instance of the fragment (see the method comment), computing the per-host value is done outside of this. http://gerrit.cloudera.org:8080/#/c/10256/4/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test File testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test: http://gerrit.cloudera.org:8080/#/c/10256/4/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test@64 PS4, Line 64: instances > I like short names. I don't think a longer name will necessarily make it cl Sounds like deferring it is a good idea :) -- To view, visit http://gerrit.cloudera.org:8080/10256 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I982837ef883457fa4d2adc3bdbdc727353469140 Gerrit-Change-Number: 10256 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 10 May 2018 23:51:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/10165/10/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: http://gerrit.cloudera.org:8080/#/c/10165/10/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@216 PS10, Line 216: public boolean isFileCompressionTypeSupported(String fileName, > Doing this would also prevent a minor regression where loading, say, files Done -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 23:52:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 11: The last PS includes a non-trivial change, so this really needs another quick look. Hopefully the change makes sense. -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 10 May 2018 23:53:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7010: don't run memory usage tests on non-HDFS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10370 ) Change subject: IMPALA-7010: don't run memory usage tests on non-HDFS .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662 Gerrit-Change-Number: 10370 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 11 May 2018 00:15:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5842: Write page index in Parquet files
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 ) Change subject: IMPALA-5842: Write page index in Parquet files .. Patch Set 16: Code-Review+1 (1 comment) LGTM aside from one bug http://gerrit.cloudera.org:8080/#/c/9693/16/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/9693/16/be/src/exec/hdfs-parquet-table-writer.cc@769 PS16, Line 769: page_index_memory_consumption_ += new_memory_allocation; Need to increment this after the TryConsume() succeeds, otherwise we'll release too much. -- To view, visit http://gerrit.cloudera.org:8080/9693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9 Gerrit-Change-Number: 9693 Gerrit-PatchSet: 16 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Anonymous Coward #248 Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 11 May 2018 00:17:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10158 ) Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state .. Patch Set 19: (12 comments) LGTM overall, just a few comments, some relating to style. http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.h@93 PS19, Line 93: /// 2. exec_state_lock_, filter_lock_, ExecSummary::lock (leafs) What about backend_states_init_lock_ ? http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.h@206 PS19, Line 206: lock_ comment requires an update. http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@325 PS18, Line 325: ExecState state const ref http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@350 PS18, Line 350: ExecState old_state, ExecState new_state These look like they're read only, so they should be passed as const refs. http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@357 PS18, Line 357: ExecState state const ref http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@360 PS18, Line 360: ExecState s const ref http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@127 PS14, Line 127: You mean Cancel() and UpdateBackendExecStatus()? Maybe this is not worth mentioning here and can be specified in those functions instead. http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@279 PS14, Line 279: _RESULTS: GetNext() set eos To be more explicit, we should say "... or when Cancel() is called." And we should document that Cancel() is called on an error in the appropriate place if we haven't done that already. http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@457 PS18, Line 457: ret_status Preferable to get rid of the automatic variable 'ret_status' and just return exec_status_ here instead. Every time we copy statuses around, we're calling the copy constructor that does a string copy. http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@462 PS18, Line 462: Status ret_status; Same here. http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@476 PS18, Line 476: DCHECK(exec_status_.ok()) << exec_status_; If an error happens after exec_state_ is already set to ExecState::RETURNED_RESULTS, do we still want to relay that in the logs? Eg: Limit queries. Currently, we will be adding that to the logs at L493 but we need to think through whether that's necessary, since calling HandleExecStateTransition() in L500 will also be a no-op in this case. http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@486 PS18, Line 486: !status.ok() && exec_status_.IsCancelled() (!status.ok() && !status.IsCancelled() && exec_status_.IsCancelled()) -- To view, visit http://gerrit.cloudera.org:8080/10158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Gerrit-Change-Number: 10158 Gerrit-PatchSet: 19 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 11 May 2018 00:33:59 + Gerrit-HasComments: Yes