Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 18:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1435
PS18, Line 1435: HdfsPartition partition: partitionMap_.values()
This loop constructs partitionsToUpdateFileMdByPath for the case where 
partitions are not explicitly specified (among other things). That map is a 
subset of locationToPartMap_ (loosely speaking). So we're already looping over 
all partitions here. If the block on L1446 can be modified to also build up 
such a map for explicitly specified partitions (e.g., when partitionsToUpdate 
is != null), that map can be passed into getPartitionsByPath (L1484) to be used 
instead of the table-wide locationToPartMap_.

Unless I've missed something, this looks like it adds little to current 
complexity for this code path and uses a small amount of additional memory when 
needed. I see no other use of getPartitionsByPath.


http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2252
PS18, Line 2252: for (HdfsPartition part : parts) {
if the path -> partition map is dropped, an additional loop would need to 
consider all partitions and check vs. the specified partitions for other 
partitions that share the same location. this can be sped up via small indexes 
(index the specified partitions by location). so we'd increase the cost here 
from scanning the specified partitions to scanning all partitions-- how much of 
a performance hit is this for 100k partitions and how performance critical is 
drop partitions?



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 18
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 16:30:43 +0000
Gerrit-HasComments: Yes

Reply via email to