[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location

Gabor Kaszab (Code Review) Fri, 01 Jun 2018 08:49:14 -0700

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )


Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1485
PS2, Line 1485:       if (partitions == null) {
> - I have seen tables with > 100k partitions. If we take this route, we'd ne
This is an interesting topic and I spent some time to decide which one is 
better. How I reasoned not to do the looping approach is somewhat similar to 
Bharath's first bullet point:
- In case we have an insert that affects multiple partitions (let's say this is 
N), and we have 100k existing partitions then we have to loop N*100k to find 
all the ones that have to be reloaded.
- This looping has to be done even if we have no partitions that point to the 
same location.

I saw this an overhead that is too big.

On the other side I made some math how much extra memory storing this mapping 
would take:
- AFAIK Strings in Java are stored in some common place and in case you have 
multiple String variables containing the same literal than in memory it's 
actually stored only once. So I assume it takes a size of a memory location 
(8byte) for each entry to store the keys for this internal map.
- The HdfsPartitions are stored as reference so it's (# of partitions) * 8byte 
to store all of them.
- I took 100k as the total # of partitions, and memory-vise the worst case when 
there is no overlap in locations between partitions. So it's 100k * 8byte for 
the locations and 100k * 8byte for the HdfsPartition references.

Doing the math for a worst case scenario it's around 1,5MB plus the overhead of 
the HashMap and the HashSets. I'm not really aware of the overhead but my 
assumption is that the whole memory consumption including the overhead 
shouldn't be more than 3-4MB.
I'm not a catalog expert, though and don't know what increase is still 
acceptable but for me the trade-offs for having an internal mapping seemed more 
consumable than for looping every time someone inserts something.

Bharath, Sailesh, what do you think? Do you find my considerations reasonable?


http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1487
PS2, Line 1487: partition.getLocation())
> partition.getLocation() to avoid unnecessary indirection ?
Sure, done.


http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py@159
PS1, Line 159:       assert data.split('\t') == ['21', '6']
> I see. Did you get a chance to see how Apache Hive behaves in this case?
I managed to check this in Hive: It also drops the folder of the partition no 
matter if other partitions points there. In Hive if you execute a select * on 
the table right after dropping the partition then apparently all the partitions 
are dropped that point to the same location.

Note, in Impala this is the line that drops the folder:
https://github.com/cloudera/Impala/blob/a48c2006890d7975974eeeabdbf797e7d63996b1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L2237
Taking into account that this is done through HiveClient I'm confident that 
until this point Impala is in line with Hive's functionality.

Conclusion 1:
I think then internally Impala should remove all the other partitions on the 
same location from it's catalog and not just the one requested explicitly by 
the 'drop partition' command.

Conclusion 2:
If we take this as the expected behavior (dropping all other partitions on the 
same location) we should document it somewhere. I don't see any reference to 
this edge case here: 
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/impala_alter_table.html



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Jun 2018 15:48:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location

Reply via email to