Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
......................................................................


Patch Set 5:

(26 comments)

Thanks for the review.

http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 305:   // parquet-mr.
> Comment that the FE guarantees that this is a valid timezone.
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 246:       DCHECK((is_timestamp_dependent_timezone_ && timezone_ == NULL) 
||
> simplify condition:
Done


Line 549:   /// Used to cache the timezone object corresponding to 
"parquet.mr.int96.write.zone"
> the "parquet.mr.int96.write.zone" table property
Done


Line 550:   /// table property to avoid repeated calls to 
'TimezoneDatabase::FindTimezone()'.
> no need to single-quote here
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 190:   // See if they specified a zone id
> Who is "they"? Suggest rephrasing
Done


Line 202:   return time_zone_ptr();
> return NULL?
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/BackendGflags.thrift
File common/thrift/BackendGflags.thrift:

Line 58:   // If true, all newly created Parquet tables will have the 
parquet.mr.int96.write.zone
> ... all newly created HDFS tables regardless of format ...
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 218:   9: optional string parquet_mr_write_zone;
> remove trailing semicolon for consistency
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

Line 169:   private static void analyzeParquetMrWriteZone(Table table,
> These cases need tests in AnalyzeDDL
Done. I also changed the exception message for consistency.


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 112:       throw new AnalysisException("Invalid Time Zone: " + timezone);
> Mention that the timezone is in the table property 'parquet.mr.int96.write.
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 183:     if 
(getTblProperties().containsKey(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE)) {
> These cases need tests in AnalyzeDDL
Done


Line 186:             "Only HDFS tables can use '%s' table property.",
> the '%s' table property
Done. I also changed the exception message for consistency.


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 54: import org.apache.impala.common.InternalException;
> not needed
Done


http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py
File tests/custom_cluster/test_parquet_timestamp_compatibility.py:

Line 25: class TestParquetTimestampCompatibility(CustomClusterTestSuite):
> Test needs a comment. In particular, what is covered here and what is cover
I've added a comment that describes tests in this file.

What do you mean by "integration tests not part of Impala tests"? I don't think 
we have any integration tests upstream.


Line 26:   TEST_DB_NAME = 'timezone_test_db'
> Why not use the unique_database fixture? That's the best practice.
Switched to unique_database.


Line 39:     self.client = impalad.service.create_beeswax_client()
> I think we already have a self.client from ImpalaTestSuite.
Removed creating a client from here.


Line 49:         
"/test-warehouse/alltypesagg_hive_13_1_parquet/alltypesagg_hive_13_1.parquet"
> We need the appropriate filesystem prefix here. This will break on local fs
Done


Line 56:     self.client.execute('USE ' + self.TEST_DB_NAME)
> Why? Better to avoid changing the session state
Removed it.


Line 59:     self.client.execute('INVALIDATE METADATA')
> why?
Removed it, it's not needed.


Line 61:   def _set_impala_table_timezone(self, timezone):
> simplify to pass table name as param
Done


Line 71:   @pytest.mark.execute_serially
> How long does this test take? I'm thinking we should only have minimal test
Moved tests that use startup options to 
./tests/custom_cluster/test_hive_parquet_timestamp_conversion.py


Line 81:     statement = '''ALTER TABLE hive_table
> Should be an analyzer test in AnalyzeDDL
Removed it from here.


Line 104:     select_from_impala_table = '''SELECT timestamp_col FROM 
impala_table WHERE id = 1'''
> Can you think of a way to validate all values in the table in bulk e.g. usi
Done


Line 113:     self._set_impala_table_timezone('EST')
> Add some comments about the behavior here. We are getting different values 
Done


Line 141:     statement = '''ALTER TABLE hive_table
> analyzer test is more appropriate (also fix elsewhere)
Done


Line 185:   def test_ddl(self, vector):
> We need a test where Hive sets a garbage timezone and Impala throws an erro
1. Added 'test_garbage_parquet_mr_write_zone' to test how Impala handles 
invalid time zone values. It works for now, but please note that in the future 
Hive might not let users set 'parquet.mr.int96.write.zone' to an invalid time 
zone.

2. Added 'test_conversion_to_tbl_prop_timezone' to 
test_hive_parquet_timestamp_conversion.py to test the scenario when the table 
property is set and the convert_legacy_hive_parquet_utc_timestamps option is 
used.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to