Sahil Takiar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15998 )
Change subject: IMPALA-9777: Set hive.optimize.sort.dynamic.partition to true for dynamic inserts
......................................................................

IMPALA-9777: Set hive.optimize.sort.dynamic.partition to true for dynamic inserts

This sets hive.optimize.sort.dynamic.partition to true when loading
tpcds.store_sales. This option takes effect during Hive dynamic
partitioning inserts. It introduces a sort into the insert query so
that all data is sorted on the partition key. This allows each reducer
to open only a single file at a time when writing out files.

When this config is set to false, Hive writes to multiple partitions at
the same time, so a single Hive container has multiple file handles
open at once. This can lead to OOM issues on the Hive side as well as
disk space issues with HDFS. When a file is opened on HDFS, the
Namenode reserves an entire block for it, even if the resulting file is
smaller than a block. If there isn't enough disk space for all of the
file reservations, inserts start failing because HDFS reports that
there is not enough capacity on the cluster.

The change is only necessary when loading tpcds.store_sales. Adding it
to other dynamic partitioning inserts does not seem to be necessary.
It is likely that the issue only shows up when reading from an
unpartitioned table and inserting into a partitioned table. In this
case, loading tpcds.store_sales requires reading from
tpcds_unpartitioned.store_sales. The other dynamic partitioning inserts
all read from a partitioned table and write to a partitioned table.

This patch does not introduce a significant performance regression to
the runtime of data-load generation.

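The reasoning above about open file handles and block reservations can be illustrated with a small simulation. This is only a simplified model, not Hive's or HDFS's actual implementation: the function names, the synthetic partition keys, and the 128 MiB block size are assumptions made for illustration. It models one writer that must keep a partition's file open until it knows no more rows for that partition will arrive, which for sorted input is as soon as the key changes.

```python
import random

# Assumed HDFS block size for illustration only; real clusters configure this.
ASSUMED_BLOCK_SIZE_BYTES = 128 * 1024 * 1024

_NO_KEY = object()  # sentinel meaning "no row seen yet"


def peak_open_files(partition_keys, input_is_sorted):
    """Return the peak number of files one writer holds open at once.

    With sorted input, a change of key means the previous partition is
    complete, so its file can be closed. With unsorted input, rows for any
    partition may still arrive, so every file stays open until the end.
    """
    open_files = set()
    peak = 0
    previous = _NO_KEY
    for key in partition_keys:
        if input_is_sorted and key != previous:
            open_files.clear()  # previous partition is finished; close it
        open_files.add(key)
        peak = max(peak, len(open_files))
        previous = key
    return peak


def reserved_bytes(open_file_count, block_size=ASSUMED_BLOCK_SIZE_BYTES):
    """Space reserved if every open file holds one full block reservation."""
    return open_file_count * block_size


# Synthetic rows: 5000 rows spread at random over up to 200 partition keys.
rng = random.Random(0)
keys = [rng.randrange(200) for _ in range(5000)]

unsorted_peak = peak_open_files(keys, input_is_sorted=False)
sorted_peak = peak_open_files(sorted(keys), input_is_sorted=True)

print("unsorted: peak open files =", unsorted_peak,
      "reserved bytes =", reserved_bytes(unsorted_peak))
print("sorted:   peak open files =", sorted_peak,
      "reserved bytes =", reserved_bytes(sorted_peak))
```

In this model the unsorted writer's peak equals the number of distinct partitions it sees, while the sorted writer never holds more than one file open, which is the effect the commit message attributes to the sort.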
Testing:
* Ran core tests
* Ran core tests for Impala-EC

Change-Id: Ic2b7c0ec40a02da2640fae20cf640517fd1f4fef
Reviewed-on: http://gerrit.cloudera.org:8080/15998
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Sahil Takiar <[email protected]>
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Sahil Takiar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/15998
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic2b7c0ec40a02da2640fae20cf640517fd1f4fef
Gerrit-Change-Number: 15998
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
