[ https://issues.apache.org/jira/browse/HIVE-25990?focusedWorklogId=737418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-737418 ]

ASF GitHub Bot logged work on HIVE-25990:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Mar/22 08:21
            Start Date: 07/Mar/22 08:21
    Worklog Time Spent: 10m 
      Work Description: rbalamohan commented on a change in pull request #3058:
URL: https://github.com/apache/hive/pull/3058#discussion_r820468948



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##########
@@ -1514,6 +1523,18 @@ public static void mvFileToFinalPath(Path specPath, Configuration hconf,
     fs.delete(taskTmpPath, true);
   }
 
+  private static void createFileList(Set<FileStatus> filesKept, Path srcPath, Path targetPath, FileSystem fs)
+      throws IOException {
+    String files = srcPath.toString() + System.lineSeparator();
+    for (FileStatus file : filesKept) {
+      files += file.getPath().toString() + System.lineSeparator();
+    }
+    try (FSDataOutputStream outStream = fs.create(new Path(targetPath, BLOB_FILES_KEPT))) {
+      outStream.writeBytes(files);

Review comment:
       Why not loop through filesKept here and write each path directly to outStream? That would avoid the memory pressure of building up "files".
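The reviewer's suggestion can be sketched roughly as follows. This is a hypothetical illustration, not the PR's code: to stay runnable without Hadoop on the classpath, plain `String` paths stand in for `FileStatus` and a `java.io.OutputStream` stands in for `FSDataOutputStream`, and the class and method names are made up. Each path is written as it is visited, so memory no longer grows with an accumulated `String`, and the repeated `+=` concatenation (which copies the whole string on every iteration) disappears.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FileListWriter {
  // Hypothetical helper: streams the source path and then one kept file per
  // line straight to the output, instead of concatenating them into one String.
  // String paths and OutputStream stand in for Hadoop's FileStatus/FSDataOutputStream.
  static void writeFileList(String srcPath, Iterable<String> filesKept, OutputStream out)
      throws IOException {
    Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    writer.write(srcPath);
    writer.write(System.lineSeparator());
    for (String file : filesKept) {
      writer.write(file);
      writer.write(System.lineSeparator());
    }
    // Flush but do not close: the caller owns (and closes) the underlying stream,
    // as the try-with-resources in the diff does.
    writer.flush();
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    writeFileList("s3a://bucket/tmp",
        List.of("s3a://bucket/tmp/000000_0", "s3a://bucket/tmp/000001_0"), buf);
    System.out.print(buf.toString(StandardCharsets.UTF_8));
  }
}
```

The sketch encodes through a UTF-8 `Writer`; note that `DataOutputStream.writeBytes`, which the diff calls, keeps only the low-order byte of each character.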




-- 
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 737418)
    Time Spent: 20m  (was: 10m)

> Optimise multiple copies in case of CTAS in external tables for Object stores
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-25990
>                 URL: https://issues.apache.org/jira/browse/HIVE-25990
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Presently, for CTAS with external tables, there are two rename operations:
> one from tmp to _ext, and then from _ext to the actual target.
> On object stores, a rename results in an actual copy. Avoid this by skipping
> the rename from tmp to _ext and instead creating, in _ext, a list of the files
> to be copied; the move task can consume this list and copy the files
> directly from tmp to the actual target.
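The consuming side of this scheme might look roughly like the sketch below. It is a hypothetical illustration under stated assumptions, not Hive's move-task code: it assumes the list format written by `createFileList` in the diff (first line the source directory, then one kept file per line), uses local `java.nio.file` paths in place of an object store, and the class and method names are invented. Each listed file is copied once, directly from the tmp directory to its place under the final target, so no intermediate `_ext` copy is made.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FileListCopier {
  // Hypothetical consumer of a file list: line 1 is the source (tmp) directory,
  // the remaining lines are the files to keep. Local paths stand in for the
  // object store; returns the destination paths that were written.
  static List<Path> copyListedFiles(Path fileList, Path targetDir) throws IOException {
    List<String> lines = Files.readAllLines(fileList, StandardCharsets.UTF_8);
    List<Path> copied = new ArrayList<>();
    if (lines.isEmpty()) {
      return copied;
    }
    Path srcDir = Paths.get(lines.get(0));
    for (String line : lines.subList(1, lines.size())) {
      if (line.isEmpty()) {
        continue;
      }
      Path src = Paths.get(line);
      // Keep each file's position relative to the source directory.
      Path dest = targetDir.resolve(srcDir.relativize(src));
      Files.createDirectories(dest.getParent());
      // Single copy straight from tmp to the final location.
      Files.copy(src, dest);
      copied.add(dest);
    }
    return copied;
  }
}
```

Files present in the tmp directory but absent from the list are simply never touched, which mirrors the intent that only the "kept" files reach the target.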



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
