Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15980

to look at the new patch set (#2).

Change subject: IMPALA-9777: Use Impala to do text tpcds.store_sales load
......................................................................

IMPALA-9777: Use Impala to do text tpcds.store_sales load

tpcds.store_sales is populated by selecting from
tpcds.store_sales_unpartitioned. Currently, this runs the insert
statement via Hive. Since a large number of partitions are being
created, this holds a large number of files open for writing.
By an analysis of the namenode log, this peaks at over 450 open
files. The open files reserve disk space corresponding to the HDFS
block size, even though the resulting file is significantly smaller.
This currently requires dozens of GB of free disk space to run
successfully.

Impala's inserts are clustered. The input is sorted and the partitions
are created one by one. This means that it does not keep a large
number of files open. Using Impala for these inserts would reduce
the reserved disk space requirement.

This switches the inserts into the text version of tpcds.store_sales
to use Impala. It introduces a "LOAD_IMPALA" section that is executed
immediately after the Hive "LOAD" section.

The non-text versions of store_sales are not impacted. Since the
non-text versions are being created by selecting from the text version,
Hive can process one partition at a time and avoid keeping many files
open.

Testing:
 - Ran a core job
 - Processed namenode logs and verified reduced number
   of outstanding files
 - Ran an erasure coding job

Change-Id: Idfdfedd38a8001bdffd971cabd7df95020c88159
---
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/datasets/tpcds/tpcds_schema_template.sql
3 files changed, 26 insertions(+), 129 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/15980/2
--
To view, visit http://gerrit.cloudera.org:8080/15980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idfdfedd38a8001bdffd971cabd7df95020c88159
Gerrit-Change-Number: 15980
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
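
Illustrative note on the open-file argument in the commit message above: the following is a rough sketch, not code from this change. It models a writer that keeps one file open per partition until the input is exhausted, and compares it with a clustered writer that sorts by partition key and closes each file before opening the next. The 450-partition figure comes from the commit message; the 128 MiB block size, the function names, and the simplified writer model are assumptions made for illustration only.

```python
import random

# Assumption: 128 MiB HDFS block size. The commit message does not
# state the block size used by the test cluster.
ASSUMED_BLOCK_SIZE_BYTES = 128 * 1024 * 1024


def peak_open_files(partition_keys, clustered):
    """Return the peak number of partition files open at the same time.

    Simplified model (hypothetical, not Hive's or Impala's real writer):
    - unclustered: a file stays open from the first row of its partition
      until the end of the input;
    - clustered: rows are sorted by partition key, so the previous
      partition's file is closed as soon as the key changes.
    """
    rows = sorted(partition_keys) if clustered else list(partition_keys)
    open_files = set()
    peak = 0
    previous = None
    for key in rows:
        if clustered and previous is not None and key != previous:
            open_files.discard(previous)  # partition finished, close its file
        open_files.add(key)
        peak = max(peak, len(open_files))
        previous = key
    return peak


def reserved_gib(open_files, block_size_bytes=ASSUMED_BLOCK_SIZE_BYTES):
    """Disk space reserved if each open file reserves one full block."""
    return open_files * block_size_bytes / 2**30


# 450 partitions with three rows each, in shuffled order.
rng = random.Random(0)
rows = [p for p in range(450) for _ in range(3)]
rng.shuffle(rows)

unclustered_peak = peak_open_files(rows, clustered=False)
clustered_peak = peak_open_files(rows, clustered=True)

print("unclustered:", unclustered_peak, "files,", reserved_gib(unclustered_peak), "GiB reserved")
print("clustered:  ", clustered_peak, "files,", reserved_gib(clustered_peak), "GiB reserved")
```

Under these assumptions the unclustered model peaks at 450 open files (about 56 GiB reserved), which is in line with the "dozens of GB" described above, while the clustered model never holds more than one file open.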