Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION ......................................................................
Patch Set 13: (15 comments) http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: PS13, Line 1630: 'addPartParams.isIf_not_exists()' is false "IF NOT EXISTS is used " PS13, Line 1683: have failed "fail" PS13, Line 1684: So, we need to "In that case, we" PS13, Line 1687: if (hmsPartitionsToAdd.size() == addedHmsPartitions.size()) { : hmsPartitionsToAdd = addedHmsPartitions; : } else { : hmsPartitionsToAdd = getPartitionsFromHMS(msClient, tableName, hmsPartitionsToAdd, : addedHmsPartitions); : } The semantics of getPartitionFromHMS() (L1720) and the way it is used here is kind of confusing. What you essentially need in the else case is to compute the set difference between hmsPartitionsToAdd and addHmsPartitions and retrieve from HMS the partiions in the result. The partitions can then be added to addedHmsPartitions list which is the one to process in the following operations (L1695-1709). L1687-1692 could be rewritten into something like: if (hmsPartitionsToAdd.size() != addedHmsPartitions.size()) { List<Partition> difference = computeDifference(hmsPartitionsToAdd, addedHmsPartittions); addedHmsPartitions.addAll(getPartitionsFromHms(client, tableName, difference)); } The names of the functions/variables can be improved but I think you get the idea. PS13, Line 1715: Returns a list of partitions retrieved from HMS for each 'hmsPartitionsToAdd' element. : * - If 'addedHmsPartitions' contains a partition with the same partition values, no need : * to get the partition object again from HMS. Use the one from 'addedHmsPartitions' instead. : * - Otherwise, get the new Partition object directly from HMS. : */ : private List<Partition> getPartitionsFromHMS(MetaStoreClient msClient, TableName tableName, : List<Partition> hmsPartitionsToAdd, List<Partition> addedHmsPartitions) This function and the way it is used is kind of confusing. What you essentially need is to compute the difference (set) between hmsPartitionsToAdd and addHmsPartitions and then retrieve these partitions from hms. http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java: PS13, Line 205: // Add multiple partitions. : AnalyzesOk("alter table functional.alltypes add " + : "partition(year=2050, month=10) " + : "partition(year=2050, month=11) " + : "partition(year=2050, month=12)"); : // Duplicate partition specifications. : AnalysisError("alter table functional.alltypes add " + : "partition(year=2050, month=10) " + : "partition(year=2050, month=11) " + : "partition(Month=10, YEAR=2050)", : "Duplicate partition spec: (month=10, year=2050)"); : AnalysisError("alter table functional.alltypes add if not exists " + : "partition(year=2050, month=10) " + : "partition(year=2050, month=11) " + : "partition(Month=10, YEAR=2050)", : "Duplicate partition spec: (month=10, year=2050)"); : AnalysisError("alter table functional.alltypes add " + : "partition(year=2050, month=10) " + : "partition(year=2050, month=11) " + : "partition(Month=10, YEAR=2050) location " + : "'hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10'", : "Duplicate partition spec: (month=10, year=2050)"); : : // Multiple partitions with locations and caching : AnalyzesOk("alter table functional.alltypes add " + : "partition(year=2050, month=10) location " + : "'/test-warehouse/alltypes/y2050m10' cached in 'testPool' " + : "partition(year=2050, month=11) location " + : "'hdfs://localhost:20500/test-warehouse/alltypes/y2050m11' " + : "cached in 'testPool' with replication = 7 "+ : "partition(year=2050, month=12) location " + : "'file:///test-warehouse/alltypes/y2050m12' uncached"); : // One of the partitions points to an invalid URI : AnalysisError("alter table functional.alltypes add " + : "partition(year=2050, month=10) location " + : "'/test-warehouse/alltypes/y2050m10' cached in 'testPool' " + : "partition(year=2050, month=11) location " + : "'hdfs://localhost:20500/test-warehouse/alltypes/y2050m11' " + : "cached in 'testPool' with replication = 7 "+ : "partition(year=2050, month=12) location " + : "'fil:///test-warehouse/alltypes/y2050m12' uncached", : "No FileSystem for scheme: fil"); One easy way to increase test coverage could be to parameterize these test cases as we do in L60-61, using "IF NOT EXISTS" and "" as parameter values. http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java: Line 1236: Also, can you add a test case where the user doesn't have privileges to add partitions to a table? http://gerrit.cloudera.org:8080/#/c/4144/13/fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java File fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java: PS13, Line 1038: caching Also a partition with location http://gerrit.cloudera.org:8080/#/c/4144/13/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: PS13, Line 1037: ==== Another test case to consider is adding partitions to a table that is already cached. Make sure that partitions inherit the cacheop from the table. http://gerrit.cloudera.org:8080/#/c/4144/13/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: PS13, Line 256: # Should now have privileges to add partitions Add also a case where user doesn't have permissions to add partitions to a table? http://gerrit.cloudera.org:8080/#/c/4144/13/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: PS13, Line 188: def impala_partition_names_and_caching(self, table_name): There is a lot of duplicate code between this function and impala_partitions_names. Can't we have one function that is parameterized with the fields of interest? PS13, Line 208: def assert_sql_error(self, engine, command, *strs_in_error): Plz add comments. It's not clear to me how to use this function. http://gerrit.cloudera.org:8080/#/c/4144/13/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: PS13, Line 118: def test_add_overlapping_partitions(self, vector, unique_database): I think some of the test cases in this function probably belong to the hms_integration tests. Line 121: # Create a table in Impala. You also need to add some additional cases: 1. partitions have actual data. For example, add partition in Hive, insert rows (hive), try to add the same partition in impala, verify the partitions are loaded and that the data exist (using a select stmt). 2. Add partitions using Impala, add some rows, run refresh/invalidate metadata, verify that the partitions exist. PS13, Line 148: (['2'], '0B', '2'), I don't think this is correct. You essentially updated partition x=2. Think of the following case in Hive: add a partition x= 2 and then try to add the same partition (x=2) with some additional params (e.g. location, caching) using the IF NOT EXIST clause. I believe Hive will not update the partition. So, we should see the same behavior here. You may also want to add a similar case where you try to add an existing partitions but with a different location. -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-HasComments: Yes