John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 )
Change subject: IMPALA-10552: External Frontend CTAS support ...................................................................... Patch Set 4: (3 comments) Thanks Joe for the feedback http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12 PS4, Line 12: - When this is set, the external frontend is responsible for : for moving and managing the results > I assume the external frontend is also handling any metadata operations as This is correct. I'll update the commit message to make that clearer. http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127 PS4, Line 127: if (HasExternalStagingDir()) { : // When an external FE has provided a staging path, we use it directly. : staging_dir_ = external_staging_dir_; : } else { : staging_dir_ = Substitute("$0/_impala_insert_staging/$1", : table_desc_->hdfs_base_dir(), PrintId(state->query_id(), "_")); : } > The external staging directory is a staging directory from the point of vie I can certainly change everything to be external output rather than external staging. I'll verify why I'm modifying staging_dir_ here - reading the code I assumed I initially did this so BuildHdfsFileNames would use it correctly to build tmp_hdfs_* variables but indeed I don't think these paths would utilize them. http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279 PS4, Line 279: // The 0 padding on base and delta is to match the behavior of Hive since various : // systems will expect a certain format for dynamic partition creation. Additionally : // include an 0 statement id for delta directory so various Hive AcidUtils detect : // the directory (such as AcidUtils.baseOrDeltaSubdir()). Multiple statements in a : // single transaction is not supported. : if (overwrite_) { : output_partition->final_hdfs_file_name_prefix += StringPrintf("/base_%07ld/", : write_id_); : } else { : output_partition->final_hdfs_file_name_prefix += StringPrintf( : "/delta_%07ld_%07ld_0000/", write_id_, write_id_); : } > Should this be triggered by its own variable independently from the externa It is a bit orthogonal. I think ideally Impala/Hive and external frontend implementations would standardize on something consistent. I took the paranoid approach of only applying this to external frontends as I do not want to cause any inadvertent regressions. I'm not sure it is currently worth it to plumb through a flag indicating which format is expected. If another implementation of a frontend shows up,I think it might be worth it. Feel free to push back if you feel strongly. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman <j...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: John Sherman <j...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Mon, 08 Mar 2021 22:15:51 +0000 Gerrit-HasComments: Yes