Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location ...................................................................... Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1485 PS2, Line 1485: if (partitions == null) { > - I have seen tables with > 100k partitions. If we take this route, we'd ne This is an interesting topic and I spent some time to decide which one is better. How I reasoned not to do the looping approach is somewhat similar to Bharath's first bullet point: - In case we have an insert that affects multiple partitions (let's say this is N), and we have 100k existing partitions then we have to loop N*100k to find all the ones that have to be reloaded. - This looping has to be done even if we have no partitions that point to the same location. I saw this an overhead that is too big. On the other side I made some math how much extra memory storing this mapping would take: - AFAIK Strings in Java are stored in some common place and in case you have multiple String variables containing the same literal than in memory it's actually stored only once. So I assume it takes a size of a memory location (8byte) for each entry to store the keys for this internal map. - The HdfsPartitions are stored as reference so it's (# of partitions) * 8byte to store all of them. - I took 100k as the total # of partitions, and memory-vise the worst case when there is no overlap in locations between partitions. So it's 100k * 8byte for the locations and 100k * 8byte for the HdfsPartition references. Doing the math for a worst case scenario it's around 1,5MB plus the overhead of the HashMap and the HashSets. I'm not really aware of the overhead but my assumption is that the whole memory consumption including the overhead shouldn't be more than 3-4MB. I'm not a catalog expert, though and don't know what increase is still acceptable but for me the trade-offs for having an internal mapping seemed more consumable than for looping every time someone inserts something. Bharath, Sailesh, what do you think? Do you find my considerations reasonable? http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1487 PS2, Line 1487: partition.getLocation()) > partition.getLocation() to avoid unnecessary indirection ? Sure, done. http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py@159 PS1, Line 159: assert data.split('\t') == ['21', '6'] > I see. Did you get a chance to see how Apache Hive behaves in this case? I managed to check this in Hive: It also drops the folder of the partition no matter if other partitions points there. In Hive if you execute a select * on the table right after dropping the partition then apparently all the partitions are dropped that point to the same location. Note, in Impala this is the line that drops the folder: https://github.com/cloudera/Impala/blob/a48c2006890d7975974eeeabdbf797e7d63996b1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L2237 Taking into account that this is done through HiveClient I'm confident that until this point Impala is in line with Hive's functionality. Conclusion 1: I think then internally Impala should remove all the other partitions on the same location from it's catalog and not just the one requested explicitly by the 'drop partition' command. Conclusion 2: If we take this as the expected behavior (dropping all other partitions on the same location) we should document it somewhere. I don't see any reference to this edge case here: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/impala_alter_table.html -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Fri, 01 Jun 2018 15:48:50 +0000 Gerrit-HasComments: Yes