[impala] branch master updated: IMPALA-10288: Implement DESCRIBE HISTORY for Iceberg tables
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

The following commit(s) were added to refs/heads/master by this push:
     new b66045c  IMPALA-10288: Implement DESCRIBE HISTORY for Iceberg tables
b66045c is described below

commit b66045c8a5d48c268a4dfad967021ff9bcbdd937
Author: Gabor Kaszab
AuthorDate: Thu Sep 17 15:43:55 2020 +0200

    IMPALA-10288: Implement DESCRIBE HISTORY for Iceberg tables

    DESCRIBE HISTORY works for Iceberg tables and displays the snapshot
    history of the table. An example output:

    DESCRIBE HISTORY iceberg_multi_snapshots;
    +----------------------------+---------------------+---------------------+---------------------+
    | creation_time              | snapshot_id         | parent_id           | is_current_ancestor |
    +----------------------------+---------------------+---------------------+---------------------+
    | 2020-10-13 14:01:07.234000 | 4400379706200951771 | NULL                | TRUE                |
    | 2020-10-13 14:01:19.307000 | 4221472712544505868 | 4400379706200951771 | TRUE                |
    +----------------------------+---------------------+---------------------+---------------------+

    The purpose here was for this new feature to produce output similar to
    what SparkSql returns for "SELECT * from tablename.history". See the
    "History" section of https://iceberg.apache.org/spark/#inspecting-tables

    Testing:
    - iceberg-negative.test was extended to check that DESCRIBE HISTORY is
      not applicable for non-Iceberg tables.
    - iceberg-table-history.test: Covers basic usage of DESCRIBE HISTORY.
      Tests on tables created with Impala and also with Spark.
Change-Id: I56a4b92c27e8e4a79109696cbae62735a00750e5
Reviewed-on: http://gerrit.cloudera.org:8080/16599
Reviewed-by: Zoltan Borok-Nagy
Reviewed-by: wangsheng
Tested-by: Impala Public Jenkins
---
 be/src/service/client-request-state.cc             | 29 ++
 be/src/service/frontend.cc                         |  6 ++
 be/src/service/frontend.h                          |  5 ++
 common/thrift/Frontend.thrift                      | 23
 fe/src/main/cup/sql-parser.cup                     | 25
 .../apache/impala/analysis/AnalysisContext.java    | 11 +++-
 .../impala/analysis/DescribeHistoryStmt.java       | 67 ++
 .../java/org/apache/impala/service/Frontend.java   | 46 +++
 .../org/apache/impala/service/JniFrontend.java     | 21 +++
 .../org/apache/impala/analysis/ParserTest.java     | 15 -
 testdata/data/README                               |  3 +-
 .../queries/QueryTest/iceberg-negative.test        |  5 ++
 .../queries/QueryTest/iceberg-table-history.test   | 20 +++
 tests/query_test/test_iceberg.py                   | 22 +++
 14 files changed, 283 insertions(+), 15 deletions(-)

diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc
index 7c5b023..763b7f5 100644
--- a/be/src/service/client-request-state.cc
+++ b/be/src/service/client-request-state.cc
@@ -30,6 +30,7 @@
 #include "catalog/catalog-service-client-wrapper.h"
 #include "common/status.h"
 #include "exec/kudu-util.h"
+#include "exprs/timezone_db.h"
 #include "kudu/rpc/rpc_controller.h"
 #include "rpc/rpc-mgr.inline.h"
 #include "runtime/coordinator.h"
@@ -38,6 +39,8 @@
 #include "runtime/query-driver.h"
 #include "runtime/row-batch.h"
 #include "runtime/runtime-state.h"
+#include "runtime/timestamp-value.h"
+#include "runtime/timestamp-value.inline.h"
 #include "scheduling/admission-control-client.h"
 #include "scheduling/scheduler.h"
 #include "service/frontend.h"
@@ -423,6 +426,32 @@ Status ClientRequestState::ExecLocalCatalogOp(
       result_metadata_ = response.schema;
       return Status::OK();
     }
+    case TCatalogOpType::DESCRIBE_HISTORY: {
+      // This operation is supported for Iceberg tables only.
+      const TDescribeHistoryParams& params = catalog_op.describe_history_params;
+      TGetTableHistoryResult result;
+      RETURN_IF_ERROR(frontend_->GetTableHistory(params, &result));
+
+      request_result_set_.reset(new vector<TResultRow>());
+      request_result_set_->resize(result.result.size());
+      for (int i = 0; i < result.result.size(); ++i) {
+        const TGetTableHistoryResultItem& item = result.result[i];
+        TResultRow& result_row = (*request_result_set_.get())[i];
+        result_row.__isset.colVals = true;
+        result_row.colVals.resize(4);
+        const Timezone* local_tz = TimezoneDatabase::FindTimezone(
+            query_options().timezone);
+        TimestampValue tv = TimestampValue::FromUnixTimeMicros(
+            item.creation_time * 1000, local_tz);
+        result_row.colVals[0].__set_string_val(tv.ToString());
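The C++ hunk above converts each snapshot's creation time (milliseconds since the Unix epoch) into the timestamp string that DESCRIBE HISTORY prints. A minimal Python sketch of the same conversion, shown in UTC for simplicity (the function name and sample input are illustrative, not part of Impala):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def format_snapshot_ts(creation_time_ms):
    """Render a snapshot creation time (ms since epoch) in the
    'YYYY-MM-DD HH:MM:SS.ffffff' form used by DESCRIBE HISTORY.
    timedelta arithmetic avoids float rounding of the milliseconds."""
    dt = EPOCH + timedelta(milliseconds=creation_time_ms)
    return dt.strftime("%Y-%m-%d %H:%M:%S.%f")
```

Note that the real implementation renders in the session timezone (`query_options().timezone`), not UTC.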
[impala] branch master updated: IMPALA-10334: test_stats_extrapolation output doesn't match on erasure coding build
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

The following commit(s) were added to refs/heads/master by this push:
     new 6493f87  IMPALA-10334: test_stats_extrapolation output doesn't match on erasure coding build
6493f87 is described below

commit 6493f8735731bb1d7beadf5e093ebd812d8ad8d2
Author: Qifan Chen
AuthorDate: Fri Nov 20 13:32:11 2020 -0500

    IMPALA-10334: test_stats_extrapolation output doesn't match on erasure
    coding build

    This patch skips test_stats_extrapolation for erasure coding builds.
    The reason is that an extra erasure coding information line can be
    included in the scan explain section when an HDFS table is erasure
    coded. This makes the explain output differ between a normal build
    and an erasure coding build.

    A new reason 'contain_full_explain' is added to SkipIfEC to
    facilitate this.

    Testing:
    Ran the erasure coding version of the EE and CLUSTER tests.
    Ran core tests.

Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
Reviewed-on: http://gerrit.cloudera.org:8080/16756
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 tests/common/skip.py                       | 2 ++
 tests/metadata/test_stats_extrapolation.py | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/tests/common/skip.py b/tests/common/skip.py
index b9aa6c8..a340360 100644
--- a/tests/common/skip.py
+++ b/tests/common/skip.py
@@ -193,6 +193,8 @@ class SkipIfEC:
       "features relying on local read do not work.")
   oom = pytest.mark.skipif(IS_EC, reason="Probably broken by HDFS-13540.")
   fix_later = pytest.mark.skipif(IS_EC, reason="It should work but doesn't.")
+  contain_full_explain = pytest.mark.skipif(IS_EC, reason="Contain full explain output "
+      "for hdfs tables.")

 class SkipIfDockerizedCluster:

diff --git a/tests/metadata/test_stats_extrapolation.py b/tests/metadata/test_stats_extrapolation.py
index 4dc14ff..8de917d 100644
--- a/tests/metadata/test_stats_extrapolation.py
+++ b/tests/metadata/test_stats_extrapolation.py
@@ -17,6 +17,7 @@
 from os import path
 from tests.common.impala_test_suite import ImpalaTestSuite
+from tests.common.skip import SkipIfEC
 from tests.common.test_dimensions import (
     create_exec_option_dimension,
     create_single_exec_option_dimension,
@@ -38,6 +39,7 @@ class TestStatsExtrapolation(ImpalaTestSuite):
     cls.ImpalaTestMatrix.add_dimension(
         create_uncompressed_text_dimension(cls.get_workload()))

+  @SkipIfEC.contain_full_explain
   def test_stats_extrapolation(self, vector, unique_database):
     vector.get_value('exec_option')['num_nodes'] = 1
     vector.get_value('exec_option')['explain_level'] = 2
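The new `contain_full_explain` marker follows the standard `pytest.mark.skipif` pattern. A self-contained sketch of how such a conditional-skip decorator behaves, using plain Python with no pytest dependency (all names here are illustrative stand-ins, not Impala's actual test plumbing):

```python
IS_EC = False  # stand-in for Impala's erasure-coding build flag

def skipif(condition, reason):
    """Tiny stand-in for pytest.mark.skipif: when the condition holds,
    the decorated test is replaced with a no-op reporting the reason;
    otherwise the test runs unchanged."""
    def decorate(fn):
        if not condition:
            return fn
        def skipped(*args, **kwargs):
            return "SKIPPED: " + reason
        return skipped
    return decorate

@skipif(IS_EC, reason="Contain full explain output for hdfs tables.")
def test_stats_extrapolation():
    return "ran"
```

With `IS_EC` false the test runs normally; flipping the flag makes every decorated test a reported skip, which is exactly how SkipIfEC gates whole groups of tests per build type.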
[impala] 02/02: IMPALA-10333: Fix utf-8 test failures when impala-shell using older thrift versions
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit cc8ecd0926633133dc2db291ac65c317da34bad7
Author: stiga-huang
AuthorDate: Mon Nov 23 11:30:26 2020 +0800

    IMPALA-10333: Fix utf-8 test failures when impala-shell using older
    thrift versions

    In some branches where impala-shell still uses an older version of
    thrift, e.g. thrift-0.9.3-p8, test_utf8_decoding_error_handling fails
    since the internal string representation in thrift versions lower
    than 0.10.0 is still bytes. Strings won't be decoded to unicode, so
    there won't be any decoding errors. The test expects some bytes that
    can't be decoded correctly to be replaced with U+FFFD, so it fails.

    This patch improves the test by also accepting results from older
    thrift versions, so it can be cherry-picked to older branches.

    Tests:
    - Verified the test in the master branch and in a downstream branch
      that still uses thrift-0.9.3-p8 in impala-shell.

Change-Id: Ieb0baa9b3a1480673af77f7cc35c05eacf4b449f
Reviewed-on: http://gerrit.cloudera.org:8080/16767
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
---
 tests/shell/test_shell_commandline.py | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tests/shell/test_shell_commandline.py b/tests/shell/test_shell_commandline.py
index 8ad2f07..a09e7f1 100644
--- a/tests/shell/test_shell_commandline.py
+++ b/tests/shell/test_shell_commandline.py
@@ -472,10 +472,20 @@ class TestImpalaShell(ImpalaTestSuite):
        characters."""
     result = run_impala_shell_cmd(vector, ['-B', '-q', "select substr('引擎', 1, 4)"])
     assert 'UnicodeDecodeError' not in result.stderr
-    assert '引�' in result.stdout
+    # Thrift changed its internal string representation from bytes to unicode in
+    # 0.10.0. The results differ when impala-shell uses different versions of Thrift.
+    # The UTF-8 encoded bytes of "引擎" are \xe5\xbc\x95\xe6\x93\x8e. The substr result
+    # gets the first 4 bytes. In thrift-0.9.3-p8, it will be raw bytes, i.e. "引\xe6".
+    # In thrift-0.11.0-p4, it will be decoded to utf-8 strings. The last byte can't be
+    # decoded correctly so it will be replaced with \xef\xbf\xbd, i.e. U+FFFD. The
+    # result is "引\xef\xbf\xbd". To make this test robust in all branches, here we
+    # just check the existence of "引".
+    assert '引' in result.stdout
     result = run_impala_shell_cmd(vector, ['-B', '-q', "select unhex('aa')"])
     assert 'UnicodeDecodeError' not in result.stderr
-    assert '�' in result.stdout
+    # Same as above, the result using thrift <0.10.0 is '\xaa'. The result using
+    # thrift >=0.10.0 is '\xef\xbf\xbd'.
+    assert '\xef\xbf\xbd' in result.stdout or '\xaa' in result.stdout

 def test_global_config_file(self, vector):
     """Test global and user configuration files."""
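The behavior difference described in the comments can be reproduced directly: slicing the UTF-8 bytes of "引擎" after 4 bytes cuts the second character in half, and decoding with the `replace` error handler substitutes U+FFFD for the dangling byte. A quick sketch, independent of thrift and impala-shell:

```python
# UTF-8 bytes of "引擎": e5 bc 95 (引) + e6 93 8e (擎)
raw = "引擎".encode("utf-8")

# Like substr('引擎', 1, 4): take the first 4 *bytes*, splitting 擎 mid-sequence.
first_four = raw[:4]  # b'\xe5\xbc\x95\xe6'

# thrift < 0.10.0 keeps strings as raw bytes, so the shell prints those bytes as-is.
# thrift >= 0.10.0 decodes to str; the dangling \xe6 becomes U+FFFD under 'replace'.
decoded = first_four.decode("utf-8", errors="replace")
```

Checking only for "引" in the output, as the updated test does, holds under both behaviors.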
[impala] branch master updated (ef109e3 -> cc8ecd0)
This is an automated email from the ASF dual-hosted git repository. stigahuang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from ef109e3 IMPALA-10156: test_unmatched_schema should use unique_database new acc3de4 IMPALA-10283: Fix IllegalStateException in applying incremental partition updates new cc8ecd0 IMPALA-10333: Fix utf-8 test failures when impala-shell using older thrift versions The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../java/org/apache/impala/catalog/HdfsTable.java | 4 +- .../org/apache/impala/catalog/ImpaladCatalog.java | 15 +- .../test_incremental_metadata_updates.py | 63 ++ tests/shell/test_shell_commandline.py | 14 - 4 files changed, 91 insertions(+), 5 deletions(-) create mode 100755 tests/custom_cluster/test_incremental_metadata_updates.py
[impala] 01/02: IMPALA-10283: Fix IllegalStateException in applying incremental partition updates
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit acc3de40fb6633af21f33fb51631a6b567191887
Author: stiga-huang
AuthorDate: Mon Oct 26 16:11:54 2020 +0800

    IMPALA-10283: Fix IllegalStateException in applying incremental
    partition updates

    When incremental metadata updates are enabled (the default), catalogd
    sends incremental partition updates based on the last sent table
    snapshot. Coordinators apply these partition updates on their
    existing table snapshots. Each partition update is delivered as a
    partition instance. Partition instances are identified by partition
    ids, and each partition instance is a snapshot of the metadata of a
    partition.

    When applying incremental partition updates, ImpaladCatalog#addTable()
    has a Precondition check assuming that new partition updates should
    not duplicate existing partition ids. The motivation of this check is
    to detect whether catalogd is sending duplicate partition updates.
    However, it can be hit when the coordinator has a newer version of
    the table than the last sent table snapshot in catalogd. This happens
    when two coordinators both execute DMLs on the same table (e.g.
    inserts into different partitions), and the DMLs finish within a
    catalog topic update time window. Note that a coordinator receives a
    table snapshot from catalogd as a response to its DML request. So one
    of the coordinators will have a table version that is lower than the
    latest version in catalogd but larger than the last sent table
    version in catalogd.

    For example, consider the following sequence of events on a table:
      t0: coord1 and coord2 both have the latest version as catalogd
      t1: coord1 executes a DML to add a partition p2
      t2: coord2 executes a DML to add another partition p3
      t3: catalogd sends a topic update with {p2, p3}
    t1 and t2 happen inside a topic-update window, so catalogd will send
    the update of {p2, p3}.
    The following table shows the table version and corresponding
    partition instances in each server:

    +----+---------------+--------------+---------------+
    |    | catalogd      | coordinator1 | coordinator2  |
    +----+---------------+--------------+---------------+
    | t0 | v0:{p1}       | v0:{p1}      | v0:{p1}       |
    | t1 | v1:{p1,p2}    | v1:{p1,p2}   | v0:{p1}       |
    | t2 | v2:{p1,p2,p3} | v1:{p1,p2}   | v2:{p1,p2,p3} |
    +----+---------------+--------------+---------------+

    At t3, coordinator2 will skip the table update since it already has
    a version equal to the one in the topic update. However, on
    coordinator1 the table version is smaller than v2, so it will apply
    the incremental updates of {p2,p3} and then hit the Precondition
    check complaining that p2 already exists.

    It's legal for a coordinator to have already received some partition
    instances in DML responses, so we can't assume that none of the
    partition updates in a topic update exist on the coordinator. This
    patch removes the Precondition check to accept this case.

    Tests:
    - Add a test to reproduce the scenario mentioned above. It fails
      without this patch.

Change-Id: I1657684f8853b76b1524475a3b3c35fa22a0e36e
Reviewed-on: http://gerrit.cloudera.org:8080/16649
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 .../java/org/apache/impala/catalog/HdfsTable.java  |  4 +-
 .../org/apache/impala/catalog/ImpaladCatalog.java  | 15 +-
 .../test_incremental_metadata_updates.py           | 63 ++
 3 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index 5d82fe1..92aabf4 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -877,8 +877,8 @@ public class HdfsTable extends Table implements FeFsTable {
   }

   /**
-   * Adds the partition to the HdfsTable. Skips if a partition with the same partition id
-   * already exists.
+   * Adds the partition to the HdfsTable. Returns false if a partition with the same
+   * partition id already exists.
   */
  public boolean addPartitionNoThrow(HdfsPartition partition) {
    if (partitionMap_.containsKey(partition.getId())) return false;

diff --git a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
index 85b447d..d1452a1 100644
--- a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
@@ -509,10
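The relaxed check in this patch can be sketched as a small merge routine: incoming partition instances whose ids already exist locally (because the coordinator saw them in a DML response) are simply skipped rather than treated as an error. The dict-based model and names below are illustrative, not Impala's actual Java code:

```python
def apply_partition_updates(local_parts, update):
    """Apply an incremental topic update to a coordinator's partition map.
    Returns the ids actually added; ids already present are tolerated,
    mirroring the removed Precondition check."""
    added = []
    for pid, meta in update.items():
        if pid in local_parts:
            continue  # already received via a DML response; not an error
        local_parts[pid] = meta
        added.append(pid)
    return added

# coordinator1 at t3: it already has p2 from its own DML response, so the
# topic update {p2, p3} must only add p3.
coord1 = {"p1": "v0", "p2": "v1"}
newly_added = apply_partition_updates(coord1, {"p2": "v1", "p3": "v2"})
```

With the old Precondition semantics, the duplicate p2 would have raised an IllegalStateException instead of being skipped.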