-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
-----------------------------------------------------------
(Updated Nov. 7, 2019, 9:23 a.m.)


Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.


Changes
-------

addressing review comments


Bugs: HIVE-22411
    https://issues.apache.org/jira/browse/HIVE-22411


Repository: hive-git


Description
-------

Executing single insert statements on a transactional table affects write performance on an S3 file system. Each insert creates a new delta directory. After each insert, Hive calculates statistics such as the number of files in the table and the total size of the table. To calculate these, it traverses the table directory recursively, and during the recursion a separate listStatus call is executed for each path. As a result, the more delta directories there are, the longer it takes to calculate the statistics, so insertion time grows linearly with the number of delta directories.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/FileUtils.java 651b842f688
  common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 09343e56166
  standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 38e843aeacf
  standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java bf206fffc26


Diff: https://reviews.apache.org/r/71707/diff/3/

Changes: https://reviews.apache.org/r/71707/diff/2-3/


Testing
-------

measured and plotted insertion time


Thanks,

Attila Magyar
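
Illustration of the cost described in the Description: the sketch below is not code from the patch and does not use the Hadoop API. It assumes a hypothetical in-memory `CountingFs` that only counts listing calls, with made-up path names (`/t`, `delta_N`, `bucket_00000`). It shows how a traversal that issues one listing call per directory needs one more call for every delta directory added.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListingCostSketch {
    /** Hypothetical in-memory stand-in for a remote file system; it counts listing calls. */
    static final class CountingFs {
        private final Map<String, List<String>> children = new HashMap<>();
        private final Map<String, Long> fileSizes = new HashMap<>();
        int listCalls = 0;

        CountingFs(String root) {
            children.put(root, new ArrayList<>());
        }

        void addDir(String parent, String dir) {
            children.get(parent).add(dir);
            children.put(dir, new ArrayList<>());
        }

        void addFile(String parent, String file, long size) {
            children.get(parent).add(file);
            fileSizes.put(file, size);
        }

        boolean isDir(String path) {
            return children.containsKey(path);
        }

        long size(String path) {
            return fileSizes.get(path);
        }

        /** Each call models one round trip to the file system. */
        List<String> list(String dir) {
            listCalls++;
            return children.get(dir);
        }
    }

    /** Returns {numFiles, totalSize}, issuing one list call per directory visited. */
    static long[] collectStats(CountingFs fs, String dir) {
        long numFiles = 0;
        long totalSize = 0;
        for (String child : fs.list(dir)) {
            if (fs.isDir(child)) {
                long[] sub = collectStats(fs, child);
                numFiles += sub[0];
                totalSize += sub[1];
            } else {
                numFiles++;
                totalSize += fs.size(child);
            }
        }
        return new long[] {numFiles, totalSize};
    }

    /** Builds a table with the given number of delta directories and counts the list calls. */
    static int listCallsFor(int deltaDirs) {
        CountingFs fs = new CountingFs("/t");
        for (int i = 1; i <= deltaDirs; i++) {
            String delta = "/t/delta_" + i;
            fs.addDir("/t", delta);
            fs.addFile(delta, delta + "/bucket_00000", 100L);
        }
        collectStats(fs, "/t");
        return fs.listCalls;
    }

    public static void main(String[] args) {
        // Prints:
        // 1 delta dirs -> 2 list calls
        // 10 delta dirs -> 11 list calls
        // 100 delta dirs -> 101 list calls
        for (int n : new int[] {1, 10, 100}) {
            System.out.println(n + " delta dirs -> " + listCallsFor(n) + " list calls");
        }
    }
}
```

In this model the number of calls is always one for the table root plus one per delta directory. On a file system where each listing is a remote request, that would match the linear growth in insertion time reported above.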