[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854968#comment-15854968 ]
Aaron Fabbri commented on HADOOP-14028:
---------------------------------------

Quick review of the v4 patch:

{noformat}
@@ -392,8 +391,15 @@ private void putObject() throws IOException {
     executorService.submit(new Callable<PutObjectResult>() {
       @Override
       public PutObjectResult call() throws Exception {
-        PutObjectResult result = fs.putObjectDirect(putObjectRequest);
-        block.close();
+        PutObjectResult result;
+        try {
+          // the put object call automatically closes the input
+          // stream afterwards.
+          result = writeOperationHelper.putObject(putObjectRequest);
+          block.close();
+        } finally {
+          closeCloseables(LOG, block);
+        }
         return
{noformat}

Do you still need the block.close() above the finally block?

{noformat}
-  /**
-   * Return the buffer to the pool after the stream is closed.
-   */
-  @Override
-  public synchronized void close() {
-    if (byteBuffer != null) {
-      LOG.debug("releasing buffer");
-      releaseBuffer(byteBuffer);
+    /**
+     * Return the buffer to the pool after the stream is closed.
+     */
+    @Override
+    public synchronized void close() {
+      LOG.debug("ByteBufferInputStream.close() for {}",
+          ByteBufferBlock.super.toString());
       byteBuffer = null;
     }
-  }
-  /**
-   * Verify that the stream is open.
-   * @throws IOException if the stream is closed
-   */
-  private void verifyOpen() throws IOException {
-    if (byteBuffer == null) {
-      throw new IOException(FSExceptionMessages.STREAM_IS_CLOSED);
{noformat}

This part of the diff was hard to read due to the indentation change, but did you eliminate the call to releaseBuffer(byteBuffer)? If so, can you explain that a bit?

> S3A block output streams don't delete temporary files in multipart uploads
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-14028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14028
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>        Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Assignee: Steve Loughran
>            Priority: Critical
>        Attachments: HADOOP-14028-branch-2-001.patch,
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch,
> HADOOP-14028-branch-2.8-004.patch
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I
> was looking for after running into the same OOM problems) and don't see it
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process
> starts. My expectation is that local copies of the blocks would be deleted
> after those parts finish uploading, but I'm seeing more than 15 blocks in
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed
> after individual blocks have finished uploading or when the entire file has
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files,
> sorting by atime, and deleting anything older than the first 20:
> `ls -ut | tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call
> > that the AWS httpclient makes on the input stream triggers the deletion.
> > Though there aren't tests for it, as I recall.
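
A note on the first question above: the explicit block.close() inside the try clause is redundant rather than harmful only if close() is idempotent, because on the success path the finally clause closes the block a second time. A minimal, self-contained sketch of that pattern; the Block class and closeCloseables() helper here are stand-ins, not the actual S3A code:

{noformat}
import java.io.Closeable;
import java.io.IOException;

/**
 * Sketch only: Block and closeCloseables() stand in for the real
 * S3A data block and cleanup helper.
 */
public class CloseTwiceSketch {

  /** Stand-in for an S3A data block whose close() is idempotent. */
  static class Block implements Closeable {
    private boolean closed;

    @Override
    public synchronized void close() {
      if (closed) {
        return;                       // second close is a no-op
      }
      closed = true;
      System.out.println("block closed, temporary data released");
    }
  }

  /** Stand-in cleanup helper: close everything, swallow failures. */
  static void closeCloseables(Closeable... closeables) {
    for (Closeable c : closeables) {
      if (c == null) {
        continue;
      }
      try {
        c.close();
      } catch (IOException e) {
        // log and continue: cleanup must not mask the upload's exception
      }
    }
  }

  static void putObject(Block block) throws IOException {
    try {
      // the upload would happen here; on success the explicit close()
      // runs, and then the finally clause closes the block again
      block.close();
    } finally {
      closeCloseables(block);         // guarantees cleanup on any path
    }
  }

  public static void main(String[] args) throws IOException {
    putObject(new Block());           // prints the close message once
  }
}
{noformat}

The design point is that the cleanup in finally swallows its own failures, so it never masks an exception thrown by the upload itself.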
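
On the second question: if close() only nulls the byteBuffer reference, a pooled buffer is never returned to the pool and can only be reclaimed by garbage collection. A minimal sketch of the pool-return pattern, with hypothetical names standing in for the real ByteBufferInputStream and its buffer pool:

{noformat}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Sketch only: all names are hypothetical stand-ins for the real
 * S3A ByteBufferInputStream and its buffer pool.
 */
public class PooledBufferSketch {

  /** Toy pool: reuse returned buffers, allocate when empty. */
  static class BufferPool {
    private final BlockingQueue<ByteBuffer> free = new LinkedBlockingQueue<>();

    ByteBuffer acquire(int size) {
      ByteBuffer b = free.poll();
      return b != null ? b : ByteBuffer.allocateDirect(size);
    }

    void release(ByteBuffer buffer) {
      buffer.clear();
      free.offer(buffer);             // buffer becomes reusable
    }
  }

  static class ByteBufferInputStream extends InputStream {
    private final BufferPool pool;
    private ByteBuffer byteBuffer;

    ByteBufferInputStream(BufferPool pool, ByteBuffer buffer) {
      this.pool = pool;
      this.byteBuffer = buffer;
    }

    @Override
    public int read() throws IOException {
      if (byteBuffer == null) {
        throw new IOException("stream is closed");
      }
      return byteBuffer.hasRemaining() ? byteBuffer.get() & 0xff : -1;
    }

    @Override
    public synchronized void close() {
      if (byteBuffer != null) {
        pool.release(byteBuffer);     // without this the buffer leaks;
        byteBuffer = null;            // nulling the reference alone only
      }                               // hands the buffer to the GC
    }
  }

  public static void main(String[] args) throws IOException {
    BufferPool pool = new BufferPool();
    ByteBuffer buf = pool.acquire(8);
    buf.put((byte) 42);
    buf.flip();
    try (InputStream in = new ByteBufferInputStream(pool, buf)) {
      System.out.println(in.read()); // 42; close() then recycles the buffer
    }
  }
}
{noformat}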
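
For the DiskBlock question in the quoted description: the pattern under discussion ties deletion of the temporary block file to the stream's close(), so disk space is reclaimed only if the uploader actually closes the per-part stream. A minimal sketch of that pattern with a hypothetical class, not the actual DiskBlock:

{noformat}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Sketch only: a hypothetical class, not the actual S3A DiskBlock.
 */
public class SelfDeletingStreamSketch extends InputStream {
  private final File file;
  private final FileInputStream in;

  public SelfDeletingStreamSketch(File file) throws IOException {
    this.file = file;
    this.in = new FileInputStream(file);
  }

  @Override
  public int read() throws IOException {
    return in.read();
  }

  @Override
  public void close() throws IOException {
    try {
      in.close();
    } finally {
      // if the uploader never calls close(), this delete never runs
      // and block files accumulate under the buffer directory
      file.delete();
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("block", ".bin");
    try (InputStream in = new SelfDeletingStreamSketch(tmp)) {
      in.read();                      // consume the (empty) block
    }
    System.out.println("file still exists? " + tmp.exists()); // false
  }
}
{noformat}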