[ 
https://issues.apache.org/jira/browse/HIVE-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhas Kumar Samanta updated HIVE-22054:
-----------------------------------------
    Description: 
When a partition of a managed table is dropped, we first delete the directory 
corresponding to the partition. We then recursively delete the parent 
directory as well if it has become empty. To perform this emptiness check, we 
call Warehouse::getContentSummary(), which in turn recursively checks all 
files and subdirectories. This is a costly operation when a directory has a 
lot of files or subdirectories, and the overhead is even more prominent on 
cloud-based file systems like S3. For an emptiness check, the recursion is 
also unnecessary.

This recursive listing was introduced as part of HIVE-5220. Code snippet for 
reference:
{code:java}
public boolean isEmpty(Path path) throws IOException, MetaException {
  ContentSummary contents = getFs(path).getContentSummary(path);
  if (contents != null && contents.getFileCount() == 0
      && contents.getDirectoryCount() == 1) {
    return true;
  }
  return false;
}

// Note: FileSystem::getContentSummary() performs a recursive listing.{code}
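
For comparison, here is a minimal sketch of a non-recursive emptiness check. It is not the proposed patch: it uses java.nio.file on a local file system so that it runs without Hadoop, and the class name DirectoryEmptiness is hypothetical. In Hive the same idea would presumably be expressed with a single-level Hadoop listing such as FileSystem::listStatus() or FileSystem::listStatusIterator(). Because the existing check requires zero files and no directory other than the path itself, "no direct children" should give the same answer.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only (not part of Hive).
final class DirectoryEmptiness {
    private DirectoryEmptiness() {}

    /** Returns true if dir has no direct children; never descends into subdirectories. */
    static boolean isEmpty(Path dir) throws IOException {
        // Open a lazy, single-level listing and stop at the first entry found.
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            return !children.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("partition-parent");
        System.out.println(isEmpty(parent));   // freshly created directory

        // An empty subdirectory still makes the parent non-empty, which matches
        // the getDirectoryCount() == 1 condition in the existing check.
        Files.createDirectory(parent.resolve("ds=2019-07-26"));
        System.out.println(isEmpty(parent));
    }
}
```

The cost of this check depends only on the first directory entry returned, not on the size of the subtree below the path.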

> Avoid recursive listing to check if a directory is empty
> --------------------------------------------------------
>
>                 Key: HIVE-22054
>                 URL: https://issues.apache.org/jira/browse/HIVE-22054
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.13.0, 1.2.0, 2.1.0, 3.1.1, 2.3.5
>            Reporter: Prabhas Kumar Samanta
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
