[jira] [Commented] (HDFS-14308) DFSStripedInputStream should implement unbuffer()

2019-02-26 Thread Joe McDonnell (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778595#comment-16778595
 ] 

Joe McDonnell commented on HDFS-14308:
--

[~knanasi] Yes, good point, DFSStripedInputStream does implement unbuffer(). I 
think the issue revolves around curStripeBuf. It is allocated from the 
BUFFER_POOL in resetCurStripeBuffer(true), but it doesn't get returned to the 
BUFFER_POOL until close(). closeCurrentBlockReaders() calls 
resetCurStripeBuffer(false), which clears curStripeBuf but does not return it 
to the BUFFER_POOL.

I will update the description / title.

> DFSStripedInputStream should implement unbuffer()
> -
>
> Key: HDFS-14308
> URL: https://issues.apache.org/jira/browse/HDFS-14308
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Joe McDonnell
>Priority: Major
> Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should implement the unbuffer() call. See HDFS-7694. "unbuffer()" is intended 
> to move an input stream to a lower memory state to support these caching use 
> cases. Both Impala and HBase call unbuffer() when a file handle is being 
> cached and potentially unused for significant chunks of time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14308) DFSStripedInputStream should implement unbuffer()

2019-02-26 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777837#comment-16777837
 ] 

Kitti Nanasi commented on HDFS-14308:
-

Thanks [~joemcdonnell] for reporting the issue!

DFSStripedInputStream does implement unbuffer as it derives from DFSInputStream 
which calls closeCurrentBlockReaders(); and that method clears all the block 
readers in DFSStripedInputStream, so that should be fine.

I think the issue might be that either CryptoInputStream or HdfsDataInputStream 
does not call the correct unbuffer method. But I have to check that to be sure.

> DFSStripedInputStream should implement unbuffer()
> -
>
> Key: HDFS-14308
> URL: https://issues.apache.org/jira/browse/HDFS-14308
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Joe McDonnell
>Priority: Major
> Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should implement the unbuffer() call. See HDFS-7694. "unbuffer()" is intended 
> to move an input stream to a lower memory state to support these caching use 
> cases. Both Impala and HBase call unbuffer() when a file handle is being 
> cached and potentially unused for significant chunks of time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14308) DFSStripedInputStream should implement unbuffer()

2019-02-21 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774616#comment-16774616
 ] 

Wei-Chiu Chuang commented on HDFS-14308:


[~knanasi] could you take a look?

> DFSStripedInputStream should implement unbuffer()
> -
>
> Key: HDFS-14308
> URL: https://issues.apache.org/jira/browse/HDFS-14308
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Joe McDonnell
>Priority: Major
> Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should implement the unbuffer() call. See HDFS-7694. "unbuffer()" is intended 
> to move an input stream to a lower memory state to support these caching use 
> cases. Both Impala and HBase call unbuffer() when a file handle is being 
> cached and potentially unused for significant chunks of time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14308) DFSStripedInputStream should implement unbuffer()

2019-02-21 Thread Joe McDonnell (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774576#comment-16774576
 ] 

Joe McDonnell commented on HDFS-14308:
--

Adding a link to the original IMPALA issue

> DFSStripedInputStream should implement unbuffer()
> -
>
> Key: HDFS-14308
> URL: https://issues.apache.org/jira/browse/HDFS-14308
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Joe McDonnell
>Priority: Major
> Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should implement the unbuffer() call. See HDFS-7694. "unbuffer()" is intended 
> to move an input stream to a lower memory state to support these caching use 
> cases. Both Impala and HBase call unbuffer() when a file handle is being 
> cached and potentially unused for significant chunks of time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org