John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
......................................................................


Patch Set 4:

(3 comments)

Thanks Joe for the feedback

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12
PS4, Line 12:   - When this is set, the external frontend is responsible for
            :   for moving and managing the results
> I assume the external frontend is also handling any metadata operations as
This is correct. I'll update the commit message to make that clearer.


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127
PS4, Line 127:   if (HasExternalStagingDir()) {
             :     // When an external FE has provided a staging path, we use 
it directly.
             :     staging_dir_ = external_staging_dir_;
             :   } else {
             :     staging_dir_ = Substitute("$0/_impala_insert_staging/$1",
             :         table_desc_->hdfs_base_dir(), PrintId(state->query_id(), 
"_"));
             :   }
> The external staging directory is a staging directory from the point of vie
I can certainly change everything to be external output rather than external 
staging.

I'll verify why I'm modifying staging_dir_ here - reading the code I assumed I 
initially did this so BuildHdfsFileNames would use it correctly to build 
tmp_hdfs_* variables but indeed I don't think these paths would utilize them.


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279
PS4, Line 279:       // The 0 padding on base and delta is to match the 
behavior of Hive since various
             :       // systems will expect a certain format for dynamic 
partition creation. Additionally
             :       // include an 0 statement id for delta directory so 
various Hive AcidUtils detect
             :       // the directory (such as AcidUtils.baseOrDeltaSubdir()). 
Multiple statements in a
             :       // single transaction is not supported.
             :       if (overwrite_) {
             :         output_partition->final_hdfs_file_name_prefix += 
StringPrintf("/base_%07ld/",
             :             write_id_);
             :       } else {
             :         output_partition->final_hdfs_file_name_prefix += 
StringPrintf(
             :             "/delta_%07ld_%07ld_0000/", write_id_, write_id_);
             :       }
> Should this be triggered by its own variable independently from the externa
It is a bit orthogonal. I think ideally Impala/Hive and external frontend 
implementations would standardize on something consistent. I took the paranoid 
approach of only applying this to external frontends as I do not want to cause 
any inadvertent regressions. I'm not sure it is currently worth it to plumb 
through a flag indicating which format is expected. If another implementation 
of a frontend shows up,I think it might be worth it. Feel free to push back if 
you feel strongly.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman <j...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: John Sherman <j...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Mar 2021 22:15:51 +0000
Gerrit-HasComments: Yes

Reply via email to