-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.


Bugs: HIVE-22411
    https://issues.apache.org/jira/browse/HIVE-22411


Repository: hive-git


Description
-------

Executing individual insert statements on a transactional table degrades write
performance on an S3 file system. Each insert creates a new delta directory.
After each insert, Hive calculates statistics such as the number of files in the
table and the total size of the table. To calculate these, it traverses the
directory recursively, issuing a separate listStatus call for each path. As a
result, the more delta directories there are, the longer it takes to calculate
the statistics.

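Therefore, insertion time grows linearly with the number of delta directories,
that is, with the number of inserts already performed on the table.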
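The cost described above can be sketched as a small in-memory simulation. This is a hypothetical model, not the Hive code: the `listStatus` method here only stands in for one Hadoop `FileSystem.listStatus` round trip, and the table path, delta directory names and file sizes are illustrative assumptions. It shows that a recursive traversal issues one listing call per directory, so after n inserts, computing the statistics costs n + 1 listing calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursiveListingCost {
    // Hypothetical in-memory stand-in for a file system: directory -> children.
    static final Map<String, List<String>> TREE = new HashMap<>();
    // Sizes of the (leaf) files, in bytes.
    static final Map<String, Long> FILE_SIZES = new HashMap<>();
    // Number of simulated listStatus calls (each one a remote request on S3).
    static int listCalls = 0;

    // Simulates a single listStatus round trip for one directory.
    static List<String> listStatus(String dir) {
        listCalls++;
        return TREE.getOrDefault(dir, List.of());
    }

    // Recursive traversal: one listStatus call per directory visited.
    // Returns {number of files, total bytes}.
    static long[] collectStats(String dir) {
        long files = 0;
        long bytes = 0;
        for (String child : listStatus(dir)) {
            if (TREE.containsKey(child)) {
                long[] sub = collectStats(child); // descend into a delta directory
                files += sub[0];
                bytes += sub[1];
            } else {
                files++;
                bytes += FILE_SIZES.get(child);
            }
        }
        return new long[] {files, bytes};
    }

    // Simulates one insert: a new delta directory holding one bucket file.
    // The directory naming is illustrative only.
    static void insert(String table, int n) {
        String delta = table + "/" + String.format("delta_%07d_%07d_0000", n, n);
        String bucket = delta + "/bucket_00000";
        TREE.computeIfAbsent(table, k -> new ArrayList<>()).add(delta);
        TREE.put(delta, new ArrayList<>(List.of(bucket)));
        FILE_SIZES.put(bucket, 100L);
    }

    public static void main(String[] args) {
        String table = "/warehouse/t";
        TREE.put(table, new ArrayList<>());
        for (int n = 1; n <= 5; n++) {
            insert(table, n);
            listCalls = 0;
            long[] stats = collectStats(table);
            System.out.println("insert " + n + ": files=" + stats[0]
                    + " bytes=" + stats[1] + " listStatus calls=" + listCalls);
        }
    }
}
```

Running `main` prints one line per insert; the first reports `listStatus calls=2` and the fifth reports `listStatus calls=6`. The call count grows by one with every new delta directory, and since each listing is a remote request on S3, that per-insert growth matches the linearly increasing insertion time reported above.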


Diffs
-----

  standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf
  standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java 155ecb18bf5


Diff: https://reviews.apache.org/r/71707/diff/1/


Testing
-------

Measured and plotted insertion time.


Thanks,

Attila Magyar
