Sahil Takiar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15998 )

Change subject: IMPALA-9777: Set hive.optimize.sort.dynamic.partition to true 
for dynamic inserts
......................................................................

IMPALA-9777: Set hive.optimize.sort.dynamic.partition to true for dynamic 
inserts

This sets hive.optimize.sort.dynamic.partition to true when loading
tpcds.store_sales. The option takes effect during Hive dynamic-partition
inserts. It adds a sort to the insert query so that all rows are sorted
on the partition key, which lets each reducer keep only a single file
open at a time while writing output.
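
As a rough illustration of why sorting on the partition key helps, the
sketch below models a single writer as a toy Python function. It is not
Hive code: peak_open_files, the 50 partitions, and the row count are all
hypothetical, and the unsorted writer is assumed to keep every file open
until the input ends.

```python
import random


def peak_open_files(partition_keys, sort_first):
    """Peak number of partition files one writer holds open at once."""
    keys = sorted(partition_keys) if sort_first else list(partition_keys)
    open_files = set()
    peak = 0
    for key in keys:
        if sort_first and key not in open_files:
            # Sorted input: a new key means the previous partition is
            # finished, so its file can be closed before the next opens.
            open_files.clear()
        # Unsorted input: any partition may reappear later, so this toy
        # writer keeps every file open until the end of the input.
        open_files.add(key)
        peak = max(peak, len(open_files))
    return peak


rng = random.Random(0)  # fixed seed so the run is repeatable
rows = [rng.randrange(50) for _ in range(10_000)]  # 50 hypothetical partitions

unsorted_peak = peak_open_files(rows, sort_first=False)
sorted_peak = peak_open_files(rows, sort_first=True)
print(f"unsorted peak: {unsorted_peak}, sorted peak: {sorted_peak}")
```

In this model the sorted writer never holds more than one file open,
while the unsorted writer ends up with one open file per distinct
partition key it has seen.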

When this config is set to false, Hive writes to multiple partitions at
the same time, so a single Hive container has multiple file handles open
at once. This can lead to OOM issues on the Hive side as well as disk
space issues in HDFS. When a file is opened on HDFS, the NameNode
reserves an entire block for it, even if the resulting file is smaller
than the block size. If there isn't enough disk space for all of the
reservations, inserts start failing because HDFS reports that there is
not enough capacity on the cluster.
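
A back-of-the-envelope calculation, taking the reservation behavior
described above at face value, shows how quickly the reserved space can
grow. Everything here is an assumption for illustration: the
reserved_bytes helper, the 1000 open files and 4 writers, the 128 MiB
block size and replication factor of 3 (the stock HDFS defaults, which
the test cluster may override), and the choice to count one reservation
per replica.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def reserved_bytes(open_files_per_writer, writers,
                   block_size=128 * MIB, replication=3):
    """Space reserved if every open file holds one full block per replica.

    Assumption: reservations scale with the replication factor; the
    commit message itself only mentions one block per open file.
    """
    return open_files_per_writer * writers * block_size * replication


# Hypothetical load: 4 writers, each with 1000 partition files open.
unsorted_reserved = reserved_bytes(open_files_per_writer=1000, writers=4)
# With sorted input, each writer has a single file open at a time.
sorted_reserved = reserved_bytes(open_files_per_writer=1, writers=4)

print(f"unsorted: {unsorted_reserved / GIB:.1f} GiB")
print(f"sorted:   {sorted_reserved / GIB:.1f} GiB")
```

Under these assumptions the unsorted case reserves 1500 GiB against
1.5 GiB for the sorted case, regardless of how small the final files
turn out to be.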

The change is only necessary when loading tpcds.store_sales. Adding it
to other dynamic partitioning inserts does not seem to be necessary. It
is likely that the issue only shows up when reading from an
unpartitioned table and inserting into a partitioned table. In this
case, loading tpcds.store_sales requires reading from
tpcds_unpartitioned.store_sales. The other dynamic partitioning inserts
all read from a partitioned table and write to a partitioned table.

This patch does not introduce a significant performance regression to
the runtime of data-load generation.

Testing:
* Ran core tests
* Ran core tests for Impala-EC

Change-Id: Ic2b7c0ec40a02da2640fae20cf640517fd1f4fef
Reviewed-on: http://gerrit.cloudera.org:8080/15998
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Sahil Takiar <[email protected]>
---
M testdata/datasets/tpcds/tpcds_schema_template.sql
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Sahil Takiar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/15998
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic2b7c0ec40a02da2640fae20cf640517fd1f4fef
Gerrit-Change-Number: 15998
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
