[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesn't close DataBlocks object.
[ https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HADOOP-18873:
------------------------------------
    Labels: pull-request-available  (was: )

> ABFS: AbfsOutputStream doesn't close DataBlocks object.
> --------------------------------------------------------
>
>                 Key: HADOOP-18873
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18873
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.3.4
>            Reporter: Pranav Saxena
>            Assignee: Pranav Saxena
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the DataBlock object it creates for an upload.
>
> The implication of not doing that depends on which of the three DataBlocks implementations is in use:
> # ByteArrayBlock
> ## This creates a DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream): a wrapper around a byte array for populating and reading the data.
> ## This simply gets GCed, so nothing is leaked.
> # ByteBufferBlock
> ## There is a defined *DirectBufferPool* from which it requests a directBuffer.
> ## If there is nothing in the pool, a new directBuffer is allocated.
> ## The `close` method on this object has the responsibility of returning the buffer to the pool so it can be reused (a minimal sketch of this contract follows below).
> ## Since we are not calling `close`:
> ### The pool is rendered largely useless, since each request allocates a new directBuffer from memory.
> ### The objects can still be GCed and the allocated direct memory may be reclaimed on GC; but if the process crashes before that, the memory is never returned to the pool and can cause memory issues on the machine.
> # DiskBlock
> ## This creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close().
>
> startUpload() returns a BlockUploadData object whose `toByteArray()` method is used in AbfsOutputStream to get the byte array backing the DataBlock (see the cleanup sketch below).
>
> Method which uses the DataBlock object:
> https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
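>
> For illustration, here is a minimal sketch of the pooling contract that `close` is meant to honour. It uses org.apache.hadoop.util.DirectBufferPool; the class name PooledBlockSketch and its fields are invented for the example and are not the actual ByteBufferBlock code:
> {code:java}
> import java.nio.ByteBuffer;
> import org.apache.hadoop.util.DirectBufferPool;
>
> // Sketch of the ByteBufferBlock pooling contract: buffers come from a
> // shared pool, and close() must return them; otherwise every block
> // allocates fresh direct memory and the pool never reuses anything.
> class PooledBlockSketch {
>   private static final DirectBufferPool POOL = new DirectBufferPool();
>
>   private ByteBuffer buffer;
>
>   PooledBlockSketch(int blockSize) {
>     // Served from the pool when a buffer of this size is cached;
>     // otherwise a new direct buffer is allocated.
>     buffer = POOL.getBuffer(blockSize);
>   }
>
>   void close() {
>     if (buffer != null) {
>       // The step that is skipped when the block is never closed.
>       POOL.returnBuffer(buffer);
>       buffer = null;
>     }
>   }
> }
> {code}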
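>
> The fix implied above is to close the block once the upload bytes have been extracted. A hedged sketch of that pattern, assuming (as the description suggests) that both DataBlocks.DataBlock and DataBlocks.BlockUploadData are closeable; the helper name drainAndClose is invented for the example:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.store.DataBlocks;
>
> final class UploadCleanupSketch {
>   // Extract the bytes to upload, then close in a finally block so that
>   // ByteBufferBlock returns its direct buffer to the pool and DiskBlock
>   // deletes its on-disk temp file even if the upload preparation fails.
>   static byte[] drainAndClose(DataBlocks.DataBlock block) throws IOException {
>     DataBlocks.BlockUploadData uploadData = block.startUpload();
>     try {
>       return uploadData.toByteArray();
>     } finally {
>       uploadData.close();
>       block.close();
>     }
>   }
> }
> {code}
> The same cleanup could equally be written as a try-with-resources over both objects.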
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org