[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-14028:
------------------------------------
    Status: Open  (was: Patch Available)

> S3A block output streams don't delete temporary files in multipart uploads
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-14028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14028
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>         Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Assignee: Steve Loughran
>            Priority: Critical
>         Attachments: HADOOP-14028-branch-2-001.patch,
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch,
> HADOOP-14028-branch-2.8-004.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I
> was looking for after running into the same OOM problems) and don't see it
> cleaning up the disk-cached blocks.
>
> I'm generating a ~50GB file on an instance with ~6GB free when the process
> starts. My expectation is that local copies of the blocks would be deleted
> after those parts finish uploading, but I'm seeing more than 15 blocks in
> /tmp (and none of them have been deleted thus far).
>
> I see that DiskBlock deletes temporary files when closed, but is it closed
> after individual blocks have finished uploading or when the entire file has
> been fully written to the FS (full upload completed, including all parts)?
>
> As a temporary workaround to avoid running out of space, I'm listing files,
> sorting by atime, and deleting anything older than the first 20:
> `ls -ut | tail -n +21 | xargs rm`
>
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call
> > that the AWS httpclient makes on the input stream triggers the deletion.
> > Though there aren't tests for it, as I recall.
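
For readers unfamiliar with the pattern described in the comment above (a temporary block file that is removed when the stream reading it is closed), here is a minimal Java sketch. It is not the S3A implementation: DeleteOnCloseDemo, DeleteOnCloseInputStream and uploadAndCheckDeleted are hypothetical names, the "upload" is only a loop that drains the stream, and the "s3ablock-" temp-file prefix is an assumption made for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class DeleteOnCloseDemo {

    /** Hypothetical wrapper: deletes the backing file once the stream is closed. */
    static final class DeleteOnCloseInputStream extends FilterInputStream {
        private final File file;
        private boolean closed;

        DeleteOnCloseInputStream(File file) throws IOException {
            super(new FileInputStream(file));
            this.file = file;
        }

        @Override
        public synchronized void close() throws IOException {
            if (closed) {
                return; // repeated close() calls are a no-op
            }
            closed = true;
            try {
                super.close();
            } finally {
                // Remove the local block copy even if closing the stream failed.
                Files.deleteIfExists(file.toPath());
            }
        }
    }

    /**
     * Writes data to a temporary block file, reads it back through the
     * wrapper (standing in for an upload), and reports whether the file
     * is gone after close().
     */
    static boolean uploadAndCheckDeleted(byte[] data) throws IOException {
        File block = File.createTempFile("s3ablock-", ".tmp");
        Files.write(block.toPath(), data);
        try (InputStream in = new DeleteOnCloseInputStream(block)) {
            byte[] buffer = new byte[256];
            while (in.read(buffer) != -1) {
                // A real client would send these bytes over the network.
            }
        }
        return !block.exists();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("deleted after close: "
            + uploadAndCheckDeleted(new byte[1024]));
    }
}
```

With this design the cleanup depends entirely on whoever consumes the stream calling close(). If a consumer never closes the stream for some upload path, the block files accumulate, which is consistent with the symptom reported in this issue, though the sketch does not establish the actual cause.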