Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8894

to look at the new patch set (#6).

Change subject: PREVIEW: IMPALA-6372: Go parallel for Hive dataload
......................................................................

PREVIEW: IMPALA-6372: Go parallel for Hive dataload

This changes generate-schema-statements.py to produce separate
SQL files for different file formats for Hive. This changes
load-data.py to go parallel on these separate Hive SQL files.
For correctness, the text version of all tables must be loaded
before any of the other file formats.

load-data.py runs DDLs to create the tables in Impala and goes
parallel. Currently, there are some minor dependencies so that
text tables must be created prior to creating the other table
formats. This changes the definitions of some tables in
testdata/datasets/functional/functional_schema_template.sql
to remove these dependencies. Now, the DDLs for the text tables
can run in parallel with the other file formats.

To unify the parallelism for Impala and Hive, load-data.py
now uses a single fixed-size pool of processes to run all
SQL files rather than spawning a thread per SQL file.

This currently switches Hive execution to go through Impyla's
HS2 support rather than Beeline. This part is in flux.

Speeding up Hive causes TPC-H to finish very quickly, while
TPC-DS and functional are still doing DDLs. TPC-H's invalidate
metadata can cause errors in TPC-DS or functional due to
IMPALA-5087. To avoid this, generate-schema-statements.py
generates a SQL file to invalidate metadata for each table
individually, and load-data.py uses this rather than a
universal invalidate.

This saves about 15-20 minutes on dataload (including for
GVO).
Change-Id: I34b71e6df3c8f23a5a31451280e35f4dc015a2fd
---
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/datasets/functional/functional_schema_template.sql
3 files changed, 137 insertions(+), 108 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8894/6
--
To view, visit http://gerrit.cloudera.org:8080/8894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I34b71e6df3c8f23a5a31451280e35f4dc015a2fd
Gerrit-Change-Number: 8894
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
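
For readers skimming the commit message, the scheduling idea it describes
(one fixed-size pool of processes for all SQL files, with the text-format
files loaded before any other format) might look roughly like the sketch
below. This is not the code in the patch: split_by_phase, run_in_phases,
the run_file callable, and the (file_format, path) input shape are all
hypothetical names chosen for illustration.

```python
import concurrent.futures


def split_by_phase(sql_files, first_format="text"):
    """Partition (file_format, path) pairs into text paths and all other paths."""
    first = [path for fmt, path in sql_files if fmt == first_format]
    rest = [path for fmt, path in sql_files if fmt != first_format]
    return first, rest


def run_in_phases(sql_files, run_file, pool_size=4,
                  executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Run every SQL file through one fixed-size pool, text files first.

    run_file is a hypothetical callable that executes a single SQL file
    and returns a result. With the default process pool it must be
    picklable, i.e. a module-level function.
    """
    first, rest = split_by_phase(sql_files)
    results = []
    with executor_cls(max_workers=pool_size) as pool:
        # Phase 1: text tables. list() blocks until every file has
        # finished and re-raises the first failure, if any.
        results.extend(list(pool.map(run_file, first)))
        # Phase 2: the remaining formats, which depend on the text data.
        results.extend(list(pool.map(run_file, rest)))
    return results
```

The pool size caps concurrency no matter how many SQL files exist, which
is the contrast with one thread per file. Passing
concurrent.futures.ThreadPoolExecutor as executor_cls allows a dry run
with a non-picklable run_file.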