[ 
https://issues.apache.org/jira/browse/HIVE-28335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28335:
--------------------------------
    Description: 
This is a follow-up to HIVE-27884, where I ultimately decided to leave 
deleteOnExit as is because I didn't need to change it.

In the scope of this ticket, we need to check and remove all deleteOnExit calls 
that belong to Hadoop FileSystem objects (this doesn't necessarily apply to 
java.io.File.deleteOnExit calls):
{code}
grep -iRH "deleteOnExit" --include="*.java" | grep -v "test"
...
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        // in recent hadoop versions, use deleteOnExit to clean tmp files.
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        autoDelete = fs.deleteOnExit(fsp.outPaths[filesIdx]);
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/util/PathInfo.java:        fileSystem.deleteOnExit(dir);
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      parentDir.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        parentDir.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/ObjectContainer.java:        tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java:        autoDelete = fs.deleteOnExit(outPath);
{code}

For reference, see this commit from the previous ticket: 
[commit|https://github.com/apache/hive/pull/4882/commits/7a9d299f6994ca5a8c17486549103b25692b5cba]
It caused some HDFS counter differences in the q.out files, which need to be investigated.
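
One possible replacement pattern, sketched here only as an illustration: the ticket does not say what should replace the removed calls, so this is an assumption. The idea is to track temporary paths explicitly and delete them when the owning object is closed, rather than deferring cleanup to exit. The class name TempPathTracker and its methods are hypothetical and are not part of Hive or Hadoop. The sketch uses local files via java.nio.file so that it runs without Hadoop on the classpath; with a Hadoop FileSystem the analogous explicit call would be fs.delete(path, true).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Hive or Hadoop: remembers temporary paths
// and deletes them on close() instead of relying on deleteOnExit.
class TempPathTracker implements AutoCloseable {
  private final Deque<Path> tracked = new ArrayDeque<>();

  // Creates a temp file in the default temp directory and tracks it.
  Path createTemp(String prefix) throws IOException {
    Path p = Files.createTempFile(prefix, ".tmp");
    tracked.push(p);
    return p;
  }

  // Deletes tracked paths in reverse creation order. With a Hadoop
  // FileSystem, the analogous explicit call would be fs.delete(path, true).
  @Override
  public void close() throws IOException {
    while (!tracked.isEmpty()) {
      Files.deleteIfExists(tracked.pop());
    }
  }

  public static void main(String[] args) throws IOException {
    TempPathTracker tracker = new TempPathTracker();
    Path p = tracker.createTemp("rowcontainer");
    try {
      System.out.println("exists while open: " + Files.exists(p));
    } finally {
      tracker.close();
    }
    System.out.println("exists after close: " + Files.exists(p));
  }
}
```

With explicit cleanup like this, the lifetime of a temporary path is tied to the object that created it, so nothing accumulates in a delete-on-exit registry for the life of the process.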

  was:
This is a follow-up to HIVE-27884, where I ultimately decided to leave 
deleteOnExit as is because I didn't need to change it.

In the scope of this ticket, we need to check and remove all deleteOnExit calls 
that belong to Hadoop FileSystem objects (this doesn't necessarily apply to 
java.io.File.deleteOnExit calls):
{code}
grep -iRH "deleteOnExit" --include="*.java" | grep -v "test"
...
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        // in recent hadoop versions, use deleteOnExit to clean tmp files.
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        autoDelete = fs.deleteOnExit(fsp.outPaths[filesIdx]);
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/util/PathInfo.java:        fileSystem.deleteOnExit(dir);
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      parentDir.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        parentDir.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/ObjectContainer.java:        tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java:        autoDelete = fs.deleteOnExit(outPath);
{code}



> Review deleteOnExitUsage
> ------------------------
>
>                 Key: HIVE-28335
>                 URL: https://issues.apache.org/jira/browse/HIVE-28335
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> This is a follow-up to HIVE-27884, where I ultimately decided to leave 
> deleteOnExit as is because I didn't need to change it.
> In the scope of this ticket, we need to check and remove all deleteOnExit calls 
> that belong to Hadoop FileSystem objects (this doesn't necessarily apply to 
> java.io.File.deleteOnExit calls):
> {code}
> grep -iRH "deleteOnExit" --include="*.java" | grep -v "test"
> ...
> ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        // in recent hadoop versions, use deleteOnExit to clean tmp files.
> ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        autoDelete = fs.deleteOnExit(fsp.outPaths[filesIdx]);
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/util/PathInfo.java:        fileSystem.deleteOnExit(dir);
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      parentDir.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      tmpFile.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        parentDir.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        tmpFile.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/ObjectContainer.java:        tmpFile.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java:        autoDelete = fs.deleteOnExit(outPath);
> {code}
> For reference, see this commit from the previous ticket: 
> [commit|https://github.com/apache/hive/pull/4882/commits/7a9d299f6994ca5a8c17486549103b25692b5cba]
> It caused some HDFS counter differences in the q.out files, which need to be investigated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
