Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
......................................................................


Patch Set 13:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS13, Line 1630: 'addPartParams.isIf_not_exists()' is false
"IF NOT EXISTS is used "


PS13, Line 1683: have failed
"fail"


PS13, Line 1684: So, we need to
"In that case, we"


PS13, Line 1687: if (hmsPartitionsToAdd.size() == addedHmsPartitions.size()) {
               :         hmsPartitionsToAdd = addedHmsPartitions;
               :       } else {
               :         hmsPartitionsToAdd = getPartitionsFromHMS(msClient, 
tableName, hmsPartitionsToAdd,
               :             addedHmsPartitions);
               :       }
The semantics of getPartitionFromHMS() (L1720) and the way it is used here is 
kind of confusing. What you essentially need in the else case is to compute the 
set difference between hmsPartitionsToAdd and addHmsPartitions and retrieve 
from HMS the partiions in the result. The partitions can then be added to 
addedHmsPartitions list which is the one to process in the following operations 
(L1695-1709). 

L1687-1692 could be rewritten into something like:
if (hmsPartitionsToAdd.size() != addedHmsPartitions.size()) {
   List<Partition> difference = computeDifference(hmsPartitionsToAdd, 
addedHmsPartittions);
   addedHmsPartitions.addAll(getPartitionsFromHms(client, tableName, 
difference));
}

The names of the functions/variables can be improved but I think you get the 
idea.


PS13, Line 1715: Returns a list of partitions retrieved from HMS for each 
'hmsPartitionsToAdd' element.
               :    * - If 'addedHmsPartitions' contains a partition with the 
same partition values, no need
               :    * to get the partition object again from HMS. Use the one 
from 'addedHmsPartitions' instead.
               :    * - Otherwise, get the new Partition object directly from 
HMS.
               :    */
               :   private List<Partition> getPartitionsFromHMS(MetaStoreClient 
msClient, TableName tableName,
               :       List<Partition> hmsPartitionsToAdd, List<Partition> 
addedHmsPartitions)
This function and the way it is used is kind of confusing. What you essentially 
need is to compute the difference (set) between hmsPartitionsToAdd and 
addHmsPartitions and then retrieve these partitions from hms.


http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java:

PS13, Line 205: // Add multiple partitions.
              :     AnalyzesOk("alter table functional.alltypes add " +
              :         "partition(year=2050, month=10) " +
              :         "partition(year=2050, month=11) " +
              :         "partition(year=2050, month=12)");
              :     // Duplicate partition specifications.
              :     AnalysisError("alter table functional.alltypes add " +
              :         "partition(year=2050, month=10) " +
              :         "partition(year=2050, month=11) " +
              :         "partition(Month=10, YEAR=2050)",
              :         "Duplicate partition spec: (month=10, year=2050)");
              :     AnalysisError("alter table functional.alltypes add if not 
exists " +
              :         "partition(year=2050, month=10) " +
              :         "partition(year=2050, month=11) " +
              :         "partition(Month=10, YEAR=2050)",
              :         "Duplicate partition spec: (month=10, year=2050)");
              :     AnalysisError("alter table functional.alltypes add " +
              :         "partition(year=2050, month=10) " +
              :         "partition(year=2050, month=11) " +
              :         "partition(Month=10, YEAR=2050) location " +
              :         
"'hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10'",
              :         "Duplicate partition spec: (month=10, year=2050)");
              : 
              :     // Multiple partitions with locations and caching
              :     AnalyzesOk("alter table functional.alltypes add " +
              :         "partition(year=2050, month=10) location " +
              :         "'/test-warehouse/alltypes/y2050m10' cached in 
'testPool' " +
              :         "partition(year=2050, month=11) location " +
              :         
"'hdfs://localhost:20500/test-warehouse/alltypes/y2050m11' " +
              :         "cached in 'testPool' with replication = 7 "+
              :         "partition(year=2050, month=12) location " +
              :         "'file:///test-warehouse/alltypes/y2050m12' uncached");
              :     // One of the partitions points to an invalid URI
              :     AnalysisError("alter table functional.alltypes add " +
              :         "partition(year=2050, month=10) location " +
              :         "'/test-warehouse/alltypes/y2050m10' cached in 
'testPool' " +
              :         "partition(year=2050, month=11) location " +
              :         
"'hdfs://localhost:20500/test-warehouse/alltypes/y2050m11' " +
              :         "cached in 'testPool' with replication = 7 "+
              :         "partition(year=2050, month=12) location " +
              :         "'fil:///test-warehouse/alltypes/y2050m12' uncached",
              :         "No FileSystem for scheme: fil");
One easy way to increase test coverage could be to parameterize these test 
cases as we do in L60-61, using "IF NOT EXISTS" and "" as parameter values.


http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java:

Line 1236: 
Also, can you add a test case where the user doesn't have privileges to add 
partitions to a table?


http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
File fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java:

PS13, Line 1038: caching
Also a partition with location


http://gerrit.cloudera.org:8080/#/c/4144/13/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS13, Line 1037: ====
Another test case to consider is adding partitions to a table that is already 
cached. Make sure that partitions inherit the cacheop from the table.


http://gerrit.cloudera.org:8080/#/c/4144/13/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

PS13, Line 256: # Should now have privileges to add partitions
Add also a case where user doesn't have permissions to add partitions to a 
table?


http://gerrit.cloudera.org:8080/#/c/4144/13/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS13, Line 188: def impala_partition_names_and_caching(self, table_name):
There is a lot of duplicate code between this function and 
impala_partitions_names. Can't we have one function that is parameterized with 
the fields of interest?


PS13, Line 208: def assert_sql_error(self, engine, command, *strs_in_error):
Plz add comments. It's not clear to me how to use this function.


http://gerrit.cloudera.org:8080/#/c/4144/13/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

PS13, Line 118: def test_add_overlapping_partitions(self, vector, 
unique_database):
I think some of the test cases in this function probably belong to the 
hms_integration tests.


Line 121:     # Create a table in Impala.
You also need to add some additional cases:
1. partitions have actual data. For example, add partition in Hive, insert rows 
(hive), try to add the same partition in impala, verify the partitions are 
loaded and that the data exist (using a select stmt).
2. Add partitions using Impala, add some rows, run refresh/invalidate metadata, 
verify that the partitions exist.


PS13, Line 148: (['2'], '0B', '2'),
I don't think this is correct. You essentially updated partition x=2. Think of 
the following case in Hive: add a partition x= 2 and then try to add the same 
partition (x=2) with some additional params (e.g. location, caching) using the 
IF NOT EXIST clause. I believe Hive will not update the partition. So, we 
should see the same behavior here. You may also want to add a similar case 
where you try to add an existing partitions but with a different location.


-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to