[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesn't close DataBlocks object.

2023-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HADOOP-18873:

Labels: pull-request-available  (was: )

> ABFS: AbfsOutputStream doesn't close DataBlocks object.
> --
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What are the implications of not doing that?
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of 
> ByteArrayOutputStream: a wrapper around a byte array for populating and 
> reading the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the 
> directBuffer.
>  ## If there is nothing in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning the 
> buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool is rendered less useful, since each request creates a new 
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be 
> returned on GC. But if the process crashes, the memory is never returned and 
> can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data to upload is written. This 
> file gets deleted in startUpload().close().
>  
> startUpload() gives a BlockUploadData object, which provides a `toByteArray()` 
> method that is used in AbfsOutputStream to get the byte array in the dataBlock.
>  
> Method which uses the DataBlock object: 
> https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
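A minimal sketch of the close-after-upload pattern this issue calls for, under stated assumptions; it is illustrative only and is not the linked pull request. It assumes DataBlocks.DataBlock and BlockUploadData are Closeable (the issue describes `close` on both), and uploadPayload() is a hypothetical stand-in for the real append call in AbfsOutputStream:

{code:java}
// Illustrative sketch only, not the actual patch. Assumes
// org.apache.hadoop.fs.store.DataBlocks.DataBlock and BlockUploadData are
// Closeable; uploadPayload() is a hypothetical stand-in for the real upload.
import java.io.IOException;
import org.apache.hadoop.fs.store.DataBlocks;

class CloseBlockSketch {

  void uploadBlock(DataBlocks.DataBlock block) throws IOException {
    DataBlocks.BlockUploadData uploadData = block.startUpload();
    try {
      uploadPayload(uploadData.toByteArray());  // toByteArray(), as described above
    } finally {
      uploadData.close();  // per the issue, this deletes the DiskBlock temp file
      block.close();       // returns the pooled direct buffer / releases the array
    }
  }

  private void uploadPayload(byte[] bytes) throws IOException {
    // stand-in for the real network call that appends the bytes to the store
  }
}
{code}

With this shape the buffer or temp file is released even when the upload throws; a try-with-resources over both objects would give the same guarantee if they are Closeable.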






[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesn't close DataBlocks object.

2023-08-30 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18873:
---
Description: 
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What are the implications of not doing that?
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of 
ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the 
directBuffer.
 ## If there is nothing in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the 
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool is rendered less useful, since each request creates a new 
directBuffer from memory (see the pool sketch after this description).
 ### All the objects can be GCed and the allocated direct memory may be 
returned on GC. But if the process crashes, the memory is never returned and 
can cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data to upload is written. This 
file gets deleted in startUpload().close().

 

startUpload() gives a BlockUploadData object, which provides a `toByteArray()` 
method that is used in AbfsOutputStream to get the byte array in the dataBlock.

 

Method which uses the DataBlock object: 
https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
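As a side note on the ByteBufferBlock point above: the pool only helps when buffers are returned to it, which is exactly what `close` is expected to do. A small standalone sketch of that behaviour using org.apache.hadoop.util.DirectBufferPool (not the AbfsOutputStream code path; the 8 MB buffer size is an arbitrary example):

{code:java}
// Standalone illustration of why ByteBufferBlock.close() matters: a buffer only
// becomes reusable once it is returned to the DirectBufferPool. Not ABFS code;
// the buffer size is an arbitrary example value.
import java.nio.ByteBuffer;
import org.apache.hadoop.util.DirectBufferPool;

public class BufferPoolSketch {
  public static void main(String[] args) {
    DirectBufferPool pool = new DirectBufferPool();

    ByteBuffer first = pool.getBuffer(8 * 1024 * 1024);   // pool empty: allocates direct memory
    pool.returnBuffer(first);                             // what close() should trigger

    ByteBuffer second = pool.getBuffer(8 * 1024 * 1024);  // can now reuse the returned buffer
    System.out.println("reused pooled buffer: " + (first == second));

    // If returnBuffer() is never reached (the bug described above), every block
    // allocates a fresh direct buffer and the pool never gets a chance to help.
  }
}
{code}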

  was:
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What are the implications of not doing that?
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of 
ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the 
directBuffer.
 ## If there is nothing in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the 
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool is rendered less useful, since each request creates a new 
directBuffer from memory.
 ### All the objects can be GCed and the allocated direct memory may be 
returned on GC. But if the process crashes, the memory is never returned and 
can cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data to upload is written. This 
file gets deleted in startUpload().close().

 

startUpload() gives a BlockUploadData object, which provides a `toByteArray()` 
method that is used in AbfsOutputStream to get the byte array in the dataBlock.


> ABFS: AbfsOutputStream doesn't close DataBlocks object.
> --
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What are the implications of not doing that?
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of 
> ByteArrayOutputStream: a wrapper around a byte array for populating and 
> reading the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the 
> directBuffer.
>  ## If there is nothing in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning the 
> buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool is rendered less useful, since each request creates a new 
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be 
> returned on GC. But if the process crashes, the memory is never returned and 
> can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data to upload is written. This 
> file gets deleted in startUpload().close().
>  
> startUpload() gives a BlockUploadData object, which provides a `toByteArray()` 
> method that is used in AbfsOutputStream to get the byte array in the dataBlock.
>  
> Method which uses the DataBlock object: 
> https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/a

[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesn't close DataBlocks object.

2023-08-30 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18873:
---
Description: 
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What are the implications of not doing that?
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of 
ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the 
directBuffer.
 ## If there is nothing in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the 
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool is rendered less useful, since each request creates a new 
directBuffer from memory.
 ### All the objects can be GCed and the allocated direct memory may be 
returned on GC. But if the process crashes, the memory is never returned and 
can cause memory issues on the machine (illustrated in the sketch after this 
description).
 # DiskBlock:
 ## This creates a file on disk to which the data to upload is written. This 
file gets deleted in startUpload().close().

 

startUpload() gives a BlockUploadData object, which provides a `toByteArray()` 
method that is used in AbfsOutputStream to get the byte array in the dataBlock.
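On the memory point above: direct buffers that are allocated per request and never returned to a pool only release their native memory when the garbage collector happens to reclaim the ByteBuffer objects. A tiny standalone illustration of that (not ABFS code; the buffer size and loop count are arbitrary, and the JMX "direct" buffer pool is used only for reporting):

{code:java}
// Standalone illustration: direct buffers that are never returned to a pool
// keep their native memory until the GC eventually collects them. Not ABFS
// code; sizes and counts are arbitrary example values.
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemorySketch {
  public static void main(String[] args) throws Exception {
    BufferPoolMXBean direct = ManagementFactory
        .getPlatformMXBeans(BufferPoolMXBean.class).stream()
        .filter(b -> "direct".equals(b.getName()))
        .findFirst().orElseThrow(IllegalStateException::new);

    for (int i = 0; i < 32; i++) {
      ByteBuffer orphaned = ByteBuffer.allocateDirect(8 * 1024 * 1024);  // one per "block"
      orphaned.put(0, (byte) 1);
      // reference dropped here: nothing returns the buffer to a pool
    }
    System.out.println("direct memory used before GC: " + direct.getMemoryUsed());

    System.gc();  // only a GC cycle can free the orphaned native memory
    Thread.sleep(1000L);
    System.out.println("direct memory used after GC:  " + direct.getMemoryUsed());
  }
}
{code}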

  was:
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What are the implications of not doing that?
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of 
ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the 
directBuffer.
 ## If there is nothing in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the 
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool is rendered less useful, since each request creates a new 
directBuffer from memory.
 ### All the objects can be GCed and the allocated direct memory may be 
returned on GC. But if the process crashes, the memory is never returned and 
can cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data to upload is written. This 
file gets deleted in startUpload().close().


> ABFS: AbfsOutputStream doesn't close DataBlocks object.
> --
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What are the implications of not doing that?
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of 
> ByteArrayOutputStream: a wrapper around a byte array for populating and 
> reading the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the 
> directBuffer.
>  ## If there is nothing in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning the 
> buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool is rendered less useful, since each request creates a new 
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be 
> returned on GC. But if the process crashes, the memory is never returned and 
> can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data to upload is written. This 
> file gets deleted in startUpload().close().
>  
> startUpload() gives a BlockUploadData object, which provides a `toByteArray()` 
> method that is used in AbfsOutputStream to get the byte array in the dataBlock.


