Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10120 )
Change subject: IMPALA-6899: Optimize the HDFS commands used in dataload
......................................................................


Patch Set 3:

(3 comments)

Rebasing and going to do some test runs.

http://gerrit.cloudera.org:8080/#/c/10120/3/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/10120/3/testdata/bin/create-load-data.sh@156
PS3, Line 156:   hadoop fs -put -f ${IMPALA_HOME}/testdata/data/chars-formats.avro \
> Can you combine these (through line 163) too?
Changed this to use a temporary directory, and combined the schemas and chars_formats calls into a single HDFS call.

http://gerrit.cloudera.org:8080/#/c/10120/3/testdata/bin/create-load-data.sh@261
PS3, Line 261:   hadoop fs -rm -f ${FILESYSTEM_PREFIX}/test-warehouse/authz-policy.ini
> Wouldn't put overwrite this? i.e., I don't buy that this line is necessary.
Added the force option to put and removed the rm.

http://gerrit.cloudera.org:8080/#/c/10120/3/testdata/bin/create-load-data.sh@279
PS3, Line 279:   hadoop fs -rm -f /test-warehouse/alltypessmall/year=2009/month=1/_hidden \
> I think we overwrite this below; do we need to remove it?
Added the force option to put and removed the rm call.

-- 
To view, visit http://gerrit.cloudera.org:8080/10120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0934353329dc7312394fc4457ab8db2a272c6282
Gerrit-Change-Number: 10120
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Sat, 21 Apr 2018 00:05:52 +0000
Gerrit-HasComments: Yes