[impala] branch master updated: IMPALA-10288: Implement DESCRIBE HISTORY for Iceberg tables

2020-11-23 Thread tarmstrong
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
 new b66045c  IMPALA-10288: Implement DESCRIBE HISTORY for Iceberg tables
b66045c is described below

commit b66045c8a5d48c268a4dfad967021ff9bcbdd937
Author: Gabor Kaszab 
AuthorDate: Thu Sep 17 15:43:55 2020 +0200

IMPALA-10288: Implement DESCRIBE HISTORY for Iceberg tables

DESCRIBE HISTORY works for Iceberg tables and displays the snapshot
history of the table.

An example output:
DESCRIBE HISTORY iceberg_multi_snapshots;

+----------------------------+---------------------+---------------------+---------------------+
| creation_time              | snapshot_id         | parent_id           | is_current_ancestor |
+----------------------------+---------------------+---------------------+---------------------+
| 2020-10-13 14:01:07.234000 | 4400379706200951771 | NULL                | TRUE                |
| 2020-10-13 14:01:19.307000 | 4221472712544505868 | 4400379706200951771 | TRUE                |
+----------------------------+---------------------+---------------------+---------------------+

The purpose was for this new feature to produce output similar to what
Spark SQL returns for "SELECT * from tablename.history".
See the "History" section of
https://iceberg.apache.org/spark/#inspecting-tables

Testing:
  - iceberg-negative.test was extended to check that DESCRIBE HISTORY
is not applicable for non-Iceberg tables.
  - iceberg-table-history.test: Covers basic usage of DESCRIBE
HISTORY. Tests on tables created with Impala and also with Spark.

Change-Id: I56a4b92c27e8e4a79109696cbae62735a00750e5
Reviewed-on: http://gerrit.cloudera.org:8080/16599
Reviewed-by: Zoltan Borok-Nagy 
Reviewed-by: wangsheng 
Tested-by: Impala Public Jenkins 
---
 be/src/service/client-request-state.cc | 29 ++
 be/src/service/frontend.cc |  6 ++
 be/src/service/frontend.h  |  5 ++
 common/thrift/Frontend.thrift  | 23 
 fe/src/main/cup/sql-parser.cup | 25 
 .../apache/impala/analysis/AnalysisContext.java| 11 +++-
 .../impala/analysis/DescribeHistoryStmt.java   | 67 ++
 .../java/org/apache/impala/service/Frontend.java   | 46 +++
 .../org/apache/impala/service/JniFrontend.java | 21 +++
 .../org/apache/impala/analysis/ParserTest.java | 15 -
 testdata/data/README   |  3 +-
 .../queries/QueryTest/iceberg-negative.test|  5 ++
 .../queries/QueryTest/iceberg-table-history.test   | 20 +++
 tests/query_test/test_iceberg.py   | 22 +++
 14 files changed, 283 insertions(+), 15 deletions(-)

diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc
index 7c5b023..763b7f5 100644
--- a/be/src/service/client-request-state.cc
+++ b/be/src/service/client-request-state.cc
@@ -30,6 +30,7 @@
 #include "catalog/catalog-service-client-wrapper.h"
 #include "common/status.h"
 #include "exec/kudu-util.h"
+#include "exprs/timezone_db.h"
 #include "kudu/rpc/rpc_controller.h"
 #include "rpc/rpc-mgr.inline.h"
 #include "runtime/coordinator.h"
@@ -38,6 +39,8 @@
 #include "runtime/query-driver.h"
 #include "runtime/row-batch.h"
 #include "runtime/runtime-state.h"
+#include "runtime/timestamp-value.h"
+#include "runtime/timestamp-value.inline.h"
 #include "scheduling/admission-control-client.h"
 #include "scheduling/scheduler.h"
 #include "service/frontend.h"
@@ -423,6 +426,32 @@ Status ClientRequestState::ExecLocalCatalogOp(
   result_metadata_ = response.schema;
   return Status::OK();
 }
+    case TCatalogOpType::DESCRIBE_HISTORY: {
+      // This operation is supported for Iceberg tables only.
+      const TDescribeHistoryParams& params = catalog_op.describe_history_params;
+      TGetTableHistoryResult result;
+      RETURN_IF_ERROR(frontend_->GetTableHistory(params, &result));
+
+      request_result_set_.reset(new vector<TResultRow>());
+      request_result_set_->resize(result.result.size());
+      for (int i = 0; i < result.result.size(); ++i) {
+        const TGetTableHistoryResultItem& item = result.result[i];
+        TResultRow& result_row = (*request_result_set_.get())[i];
+        result_row.__isset.colVals = true;
+        result_row.colVals.resize(4);
+        const Timezone* local_tz = TimezoneDatabase::FindTimezone(
+            query_options().timezone);
+        TimestampValue tv = TimestampValue::FromUnixTimeMicros(
+            item.creation_time * 1000, local_tz);
+        result_row.colVals[0].__set_string_val(tv.ToString());
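
[Editor's note: the loop above scales the Iceberg snapshot's creation_time from epoch milliseconds to microseconds before building a display string. A minimal Python sketch of the same conversion, using UTC where the C++ code resolves the session's timezone via TimezoneDatabase:]

```python
from datetime import datetime, timedelta, timezone

def format_snapshot_time(creation_time_ms):
    # Same scaling as the C++ code: item.creation_time * 1000 (millis -> micros).
    micros = creation_time_ms * 1000
    # Integer split avoids float rounding of the microsecond component.
    base = datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc)
    dt = base + timedelta(microseconds=micros % 1_000_000)
    return dt.strftime("%Y-%m-%d %H:%M:%S.%f")

print(format_snapshot_time(1602597667234))  # 2020-10-13 14:01:07.234000
```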

[impala] branch master updated: IMPALA-10334: test_stats_extrapolation output doesn't match on erasure coding build

2020-11-23 Thread tarmstrong
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
 new 6493f87  IMPALA-10334: test_stats_extrapolation output doesn't match on erasure coding build
6493f87 is described below

commit 6493f8735731bb1d7beadf5e093ebd812d8ad8d2
Author: Qifan Chen 
AuthorDate: Fri Nov 20 13:32:11 2020 -0500

IMPALA-10334: test_stats_extrapolation output doesn't match on erasure coding build

This patch skips test_stats_extrapolation for erasure coding builds. The
reason is that an extra erasure coding information line can be included
in the scan explain section when an HDFS table is erasure coded. This
makes the explain output differ between a normal build and an erasure
coding build. A new reason 'contain_full_explain' is added to SkipIfEC
to facilitate this.
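
[Editor's note: the new reason follows the same pytest.mark.skipif pattern as the existing SkipIfEC markers. A minimal self-contained sketch; IS_EC is a stand-in flag here, whereas Impala derives it from the target filesystem configuration:]

```python
import pytest

IS_EC = False  # stand-in flag; Impala computes this from the cluster's filesystem setup

class SkipIfEC:
    # New skip reason: golden explain output gains an extra erasure-coding
    # line on EC builds, so full-explain comparisons can't match.
    contain_full_explain = pytest.mark.skipif(
        IS_EC, reason="Contain full explain output for hdfs tables.")

@SkipIfEC.contain_full_explain
def test_stats_extrapolation():
    # Body runs only when IS_EC is False; pytest skips it otherwise.
    return "ran"
```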

Testing:
  Ran the erasure coding version of the EE and CLUSTER tests.
  Ran core tests.

Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
Reviewed-on: http://gerrit.cloudera.org:8080/16756
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
 tests/common/skip.py   | 2 ++
 tests/metadata/test_stats_extrapolation.py | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/tests/common/skip.py b/tests/common/skip.py
index b9aa6c8..a340360 100644
--- a/tests/common/skip.py
+++ b/tests/common/skip.py
@@ -193,6 +193,8 @@ class SkipIfEC:
   "features relying on local read do not work.")
   oom = pytest.mark.skipif(IS_EC, reason="Probably broken by HDFS-13540.")
   fix_later = pytest.mark.skipif(IS_EC, reason="It should work but doesn't.")
+  contain_full_explain = pytest.mark.skipif(IS_EC, reason="Contain full explain output "
+      "for hdfs tables.")
 
 
 class SkipIfDockerizedCluster:
diff --git a/tests/metadata/test_stats_extrapolation.py 
b/tests/metadata/test_stats_extrapolation.py
index 4dc14ff..8de917d 100644
--- a/tests/metadata/test_stats_extrapolation.py
+++ b/tests/metadata/test_stats_extrapolation.py
@@ -17,6 +17,7 @@
 
 from os import path
 from tests.common.impala_test_suite import ImpalaTestSuite
+from tests.common.skip import SkipIfEC
 from tests.common.test_dimensions import (
 create_exec_option_dimension,
 create_single_exec_option_dimension,
@@ -38,6 +39,7 @@ class TestStatsExtrapolation(ImpalaTestSuite):
 cls.ImpalaTestMatrix.add_dimension(
 create_uncompressed_text_dimension(cls.get_workload()))
 
+  @SkipIfEC.contain_full_explain
   def test_stats_extrapolation(self, vector, unique_database):
 vector.get_value('exec_option')['num_nodes'] = 1
 vector.get_value('exec_option')['explain_level'] = 2



[impala] 02/02: IMPALA-10333: Fix utf-8 test failures when impala-shell using older thrift versions

2020-11-23 Thread stigahuang
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit cc8ecd0926633133dc2db291ac65c317da34bad7
Author: stiga-huang 
AuthorDate: Mon Nov 23 11:30:26 2020 +0800

IMPALA-10333: Fix utf-8 test failures when impala-shell using older thrift versions

On some branches, impala-shell still uses an older version of Thrift,
e.g. thrift-0.9.3-p8, and test_utf8_decoding_error_handling fails there,
since the internal string representation in Thrift versions lower than
0.10.0 is still bytes. Strings won't be decoded to unicode, so there
won't be any decoding errors. The test expects bytes that can't be
decoded correctly to be replaced with U+FFFD, so it fails.

This patch improves the test by also accepting the results produced with
older Thrift versions, so it can be cherry-picked to older branches.
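
[Editor's note: the two behaviors described above can be reproduced with plain Python byte handling; this is a standalone illustration, not impala-shell code:]

```python
# UTF-8 bytes of "引擎"; substr('引擎', 1, 4) returns only the first 4 bytes,
# splitting the second character in half.
raw = "引擎".encode("utf-8")   # b'\xe5\xbc\x95\xe6\x93\x8e'
first4 = raw[:4]               # b'\xe5\xbc\x95\xe6'

# Thrift >= 0.10.0 decodes strings to unicode; the dangling lead byte \xe6
# becomes U+FFFD (the replacement character).
decoded = first4.decode("utf-8", errors="replace")
print(decoded == "引\ufffd")   # True

# Thrift < 0.10.0 keeps strings as raw bytes, so no decoding happens at all
# and the test sees the original bytes.
print(first4 == b"\xe5\xbc\x95\xe6")  # True
```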

Tests:
 - Verified the test on the master branch and on a downstream branch
   that still uses thrift-0.9.3-p8 in impala-shell.

Change-Id: Ieb0baa9b3a1480673af77f7cc35c05eacf4b449f
Reviewed-on: http://gerrit.cloudera.org:8080/16767
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
 tests/shell/test_shell_commandline.py | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tests/shell/test_shell_commandline.py b/tests/shell/test_shell_commandline.py
index 8ad2f07..a09e7f1 100644
--- a/tests/shell/test_shell_commandline.py
+++ b/tests/shell/test_shell_commandline.py
@@ -472,10 +472,20 @@ class TestImpalaShell(ImpalaTestSuite):
     characters."""
     result = run_impala_shell_cmd(vector, ['-B', '-q', "select substr('引擎', 1, 4)"])
     assert 'UnicodeDecodeError' not in result.stderr
-    assert '引�' in result.stdout
+    # Thrift changed its internal string representation from bytes to unicode in
+    # 0.10.0. The results differ when impala-shell uses different versions of Thrift.
+    # The UTF-8 encoded bytes of "引擎" are \xe5\xbc\x95\xe6\x93\x8e. The substr result
+    # gets the first 4 bytes. In thrift-0.9.3-p8, it will be raw bytes, i.e. "引\xe6".
+    # In thrift-0.11.0-p4, it will be decoded to a utf-8 string. The last byte can't
+    # be decoded correctly so it will be replaced with \xef\xbf\xbd, i.e. U+FFFD. The
+    # result is "引\xef\xbf\xbd". To make this test robust in all branches, here we
+    # just check the existence of "引".
+    assert '引' in result.stdout
     result = run_impala_shell_cmd(vector, ['-B', '-q', "select unhex('aa')"])
     assert 'UnicodeDecodeError' not in result.stderr
-    assert '�' in result.stdout
+    # Same as above, the result using thrift <0.10.0 is '\xaa'. The result using
+    # thrift >=0.10.0 is '\xef\xbf\xbd'.
+    assert '\xef\xbf\xbd' in result.stdout or '\xaa' in result.stdout
 
   def test_global_config_file(self, vector):
 """Test global and user configuration files."""



[impala] branch master updated (ef109e3 -> cc8ecd0)

2020-11-23 Thread stigahuang
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


from ef109e3  IMPALA-10156: test_unmatched_schema should use unique_database
 new acc3de4  IMPALA-10283: Fix IllegalStateException in applying incremental partition updates
 new cc8ecd0  IMPALA-10333: Fix utf-8 test failures when impala-shell using older thrift versions

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../java/org/apache/impala/catalog/HdfsTable.java  |  4 +-
 .../org/apache/impala/catalog/ImpaladCatalog.java  | 15 +-
 .../test_incremental_metadata_updates.py   | 63 ++
 tests/shell/test_shell_commandline.py  | 14 -
 4 files changed, 91 insertions(+), 5 deletions(-)
 create mode 100755 tests/custom_cluster/test_incremental_metadata_updates.py



[impala] 01/02: IMPALA-10283: Fix IllegalStateException in applying incremental partition updates

2020-11-23 Thread stigahuang
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit acc3de40fb6633af21f33fb51631a6b567191887
Author: stiga-huang 
AuthorDate: Mon Oct 26 16:11:54 2020 +0800

IMPALA-10283: Fix IllegalStateException in applying incremental partition updates

When incremental metadata updates are enabled (by default), catalogd
sends incremental partition updates based on the last sent table
snapshot. Coordinators will apply these partition updates on their
existing table snapshots.

Each partition update is via a partition instance. Partition instances
are identified by partition ids. Each partition instance is a snapshot
of the metadata of a partition. When applying incremental partition
updates, ImpaladCatalog#addTable() has a Precondition check assuming
that new partition updates should not be duplicated with existing
partition ids.

The motivation of this check is to detect whether catalogd is sending
duplicate partition updates. However, it can also be hit when the
coordinator has a newer version of the table than the last sent table
snapshot in catalogd. This happens when two coordinators both execute
DMLs on the same table (e.g. inserts into different partitions), and the
DMLs finish within one catalog topic update time window. Note that a
coordinator receives a table snapshot from catalogd as a response to its
DML request. So one of the coordinators will have a table version that
is lower than the latest version in catalogd but higher than the last
sent table version in catalogd. For example, consider the following
sequence of events on a table:

t0: coord1 and coord2 both have the latest version as catalogd
t1: coord1 executes a DML to add a partition p2
t2: coord2 executes a DML to add another partition p3
t3: catalogd sends topic update with {p2, p3}

t1 and t2 happen inside a topic-update window. So catalogd will send the
update of {p2, p3}. The following table shows the table version and
corresponding partition instances in each server.
++---+--+---+
|| catalogd  | coordinator1 | coordinator2  |
++---+--+---+
| t0 | v0:{p1}   | v0:{p1}  | v0:{p1}   |
++---+--+---+
| t1 | v1:{p1,p2}| v1:{p1,p2}   | v0:{p1}   |
++---+--+---+
| t2 | v2:{p1,p2,p3} | v1:{p1,p2}   | v2:{p1,p2,p3} |
++---+--+---+
At t3, coordinator2 will skip the table update since it already has a
version equal to the one in the topic update. However, on coordinator1,
the table version is smaller than v2, so it will apply the incremental
updates of {p2,p3} and then hit the Precondition check complaining that
p2 already exists.

It's legal for a coordinator to have already received some partition
instances in DML responses. So we can't assume that all partition
updates in a topic update are absent from the coordinator. This patch
removes the Precondition check to accept this case.
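
[Editor's note: the accepted behavior can be sketched as a coordinator-side merge that tolerates already-known partition ids. A simplified model; the real logic lives in ImpaladCatalog and HdfsTable.addPartitionNoThrow, which already returns false on duplicates:]

```python
def apply_partition_updates(local_partitions, updates):
    """Merge incoming partition instances (keyed by partition id) into the
    coordinator's table snapshot. Duplicate ids are skipped rather than
    rejected, since a DML response may already have delivered them."""
    applied = []
    for pid, meta in updates.items():
        if pid in local_partitions:
            continue  # already received via a DML response; not an error
        local_partitions[pid] = meta
        applied.append(pid)
    return applied

# coordinator1 already holds p2 from its own DML response (v1:{p1,p2});
# the catalog topic update then carries {p2, p3}.
coord1 = {"p1": "v0", "p2": "v1"}
print(apply_partition_updates(coord1, {"p2": "v1", "p3": "v2"}))  # ['p3']
```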

Tests:
 - Add a test to reproduce the scenario mentioned above. It fails
   without this patch.

Change-Id: I1657684f8853b76b1524475a3b3c35fa22a0e36e
Reviewed-on: http://gerrit.cloudera.org:8080/16649
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
 .../java/org/apache/impala/catalog/HdfsTable.java  |  4 +-
 .../org/apache/impala/catalog/ImpaladCatalog.java  | 15 +-
 .../test_incremental_metadata_updates.py   | 63 ++
 3 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index 5d82fe1..92aabf4 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -877,8 +877,8 @@ public class HdfsTable extends Table implements FeFsTable {
   }
 
   /**
-   * Adds the partition to the HdfsTable. Skips if a partition with the same partition id
-   * already exists.
+   * Adds the partition to the HdfsTable. Returns false if a partition with the same
+   * partition id already exists.
*/
   public boolean addPartitionNoThrow(HdfsPartition partition) {
 if (partitionMap_.containsKey(partition.getId())) return false;
diff --git a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
index 85b447d..d1452a1 100644
--- a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
@@ -509,10