[ 
https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716215
 ]

ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:43
            Start Date: 27/Jan/22 06:43
    Worklog Time Spent: 10m 
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793293292



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -97,6 +101,40 @@ public MoveTask() {
     super();
   }
 
+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {
+        RemoteIterator<LocatedFileStatus> i = fs.listFiles(sourcePath, true);
+        String prefix = AbstractFileMergeOperator.UNION_SUDBIR_PREFIX;
+        while (i.hasNext()) {
+          Path path = i.next().getPath();
+          Path parent = path.getParent();
+          if (parent.getName().startsWith(prefix)) {
+            // We do rename by including the name of parent directory into the filename so that there are no clashes
+            // when we move the files to the parent directory. Ex. HIVE_UNION_SUBDIR_1/000000_0 -> 1_000000_0
+            String parentOfParent = parent.getParent().toString();
+            String parentNameSuffix = parent.getName().substring(prefix.length());
+
+            fs.rename(path, new Path(parentOfParent + "/" + parentNameSuffix + "_" + path.getName()));

Review comment:
       What happens if this filename is already in use?
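
       To make the concern concrete: the boolean returned by fs.rename(...) is
       not checked in the diff, and what a rename onto an existing file does
       depends on the filesystem implementation. Below is a minimal,
       self-contained sketch of one possible way to pick a non-clashing target
       name before renaming. It is not the PR's code. The class
       UnionFlattenNames, the method names, the "_copy_N" suffix, and the
       literal prefix value (inferred from the example in the code comment) are
       all assumptions made for illustration, and a plain predicate stands in
       for a call such as fs.exists(...).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

public class UnionFlattenNames {
  // Assumed value of the union subdirectory prefix, inferred from the
  // "HIVE_UNION_SUBDIR_1/000000_0 -> 1_000000_0" example in the diff.
  static final String PREFIX = "HIVE_UNION_SUBDIR_";

  // Builds the flattened file name the same way the diff does:
  // <suffix of parent directory name> + "_" + <original file name>.
  static String flattenedName(String parentDirName, String fileName) {
    return parentDirName.substring(PREFIX.length()) + "_" + fileName;
  }

  // Hypothetical clash handling: append "_copy_N" until the name is free.
  // "exists" stands in for a filesystem existence check.
  static String resolveClash(String candidate, Predicate<String> exists) {
    String name = candidate;
    int n = 1;
    while (exists.test(name)) {
      name = candidate + "_copy_" + n++;
    }
    return name;
  }

  public static void main(String[] args) {
    // Simulate a target directory that already contains 1_000000_0.
    Set<String> taken = new HashSet<>();
    taken.add("1_000000_0");

    String candidate = flattenedName("HIVE_UNION_SUBDIR_1", "000000_0");
    String target = resolveClash(candidate, taken::contains);
    System.out.println(target); // prints 1_000000_0_copy_1
  }
}
```

       Note that a check-then-rename sequence like this is not atomic, so a
       concurrent writer could still create the same name in between. Checking
       the result of the rename itself would still be needed.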




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 716215)
    Time Spent: 2h  (was: 1h 50m)

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Right now, when data is written into a table with the Tez engine and UNION ALL 
> is the last step of the query, Hive on Tez creates a subdirectory for each 
> branch of the UNION ALL.
> With this patch, the subdirectories are removed, and the files are renamed and 
> moved to the parent directory.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
