Bryan Beaudreault created HBASE-28338:
-----------------------------------------

             Summary: Bounded leak of FSDataInputStream buffers from checksum 
switching
                 Key: HBASE-28338
                 URL: https://issues.apache.org/jira/browse/HBASE-28338
             Project: HBase
          Issue Type: Bug
            Reporter: Bryan Beaudreault


In FSDataInputStreamWrapper, the unbuffer() method caches an unbuffer instance 
the first time it is called. When an FSDataInputStreamWrapper is initialized, 
it has hbase checksum disabled.

In HFileInfo.initTrailerAndContext we get the stream, read the trailer, then 
call unbuffer(). At this point, hbase checksums have not been enabled yet via 
prepareForBlockReader, so the call to unbuffer() caches the current stream 
(the one in use while hbase checksums are disabled) as the unbuffer instance.

Later, in initMetaAndIndex, we do a similar thing. This time, 
prepareForBlockReader has been called, so we are now using hbase checksums. 
When initMetaAndIndex calls unbuffer(), it uses the old unbuffer instance, 
which was closed when we switched to hbase checksums. So that call does 
nothing, and the new input stream (the one without filesystem checksums, used 
with hbase checksums) is never unbuffered.
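
The sequence above can be reproduced with a small stand-alone model. This is 
only a sketch: FakeStream and WrapperSketch are hypothetical stand-ins written 
for illustration, not the real FSDataInputStreamWrapper code, and the method 
names only loosely mirror the ones mentioned in this issue.

```java
/** Runs the sequence described above against the simplified model. */
class UnbufferLeakSketch {
    public static void main(String[] args) {
        System.out.println("new stream still buffered: "
            + WrapperSketch.newStreamStillBuffered());
    }
}

/** Hypothetical stand-in for an input stream that holds a read buffer. */
class FakeStream {
    private boolean buffered = true; // pretend a read left a buffer allocated
    private boolean closed = false;

    void unbuffer() {
        if (closed) {
            return; // unbuffering an already-closed stream does nothing
        }
        buffered = false;
    }

    void close() {
        closed = true;
        buffered = false;
    }

    boolean isBuffered() {
        return buffered;
    }
}

/** Simplified model of a wrapper that caches its unbuffer target once. */
class WrapperSketch {
    private FakeStream stream = new FakeStream(); // used before hbase checksums
    private FakeStream cachedUnbufferTarget;      // set on first unbuffer()

    /** Models the stream swap that happens when hbase checksums are enabled. */
    void switchToHBaseChecksum() {
        FakeStream old = stream;
        stream = new FakeStream();
        old.close();
    }

    void unbuffer() {
        if (cachedUnbufferTarget == null) {
            cachedUnbufferTarget = stream; // cached once, never refreshed
        }
        cachedUnbufferTarget.unbuffer();
    }

    /** Trailer read, checksum switch, then meta/index read. */
    static boolean newStreamStillBuffered() {
        WrapperSketch w = new WrapperSketch();
        w.unbuffer();              // first call caches the original stream
        w.switchToHBaseChecksum(); // original stream is closed and replaced
        w.unbuffer();              // hits the closed stream; new one untouched
        return w.stream.isBuffered();
    }
}
```

In this model the second unbuffer() call is a no-op, so the replacement 
stream keeps its buffer, which is the behavior described above.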

I haven't seen this cause an issue with normal HDFS replication (though I 
haven't gone looking). It's very problematic for Erasure Coding because 
DFSStripedInputStream holds a large buffer (numDataBlocks * cellSize, so 6 MB 
for RS-6-3-1024k) that is only used for stream reads, not pread. The 
FSDataInputStreamWrapper we are talking about here is only used for pread in 
hbase, so those 6 MB buffers just hang around totally unused but unreclaimable. 
Since there is an input stream per StoreFile, this can add up very quickly on 
big servers.
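
To put a rough number on "add up very quickly", the arithmetic can be sketched 
as below. The per-stream size follows the numDataBlocks * cellSize formula 
above for RS-6-3-1024k; the StoreFile count of 10,000 is a made-up figure for 
illustration only, and the estimate assumes every open StoreFile stream 
retains one such buffer.

```java
/** Back-of-the-envelope estimate of memory held by unused striped buffers. */
class EcBufferEstimate {
    /** Buffer size per stream: numDataBlocks * cellSize. */
    static long bufferBytes(int numDataBlocks, long cellSizeBytes) {
        return numDataBlocks * cellSizeBytes;
    }

    /** Total retained bytes if each open StoreFile stream keeps one buffer. */
    static long totalBytes(long perStreamBytes, long openStoreFiles) {
        return perStreamBytes * openStoreFiles;
    }

    public static void main(String[] args) {
        long perStream = bufferBytes(6, 1024L * 1024L); // RS-6-3-1024k
        long storeFiles = 10_000L; // hypothetical count, not from this issue
        long total = totalBytes(perStream, storeFiles);
        System.out.println(perStream + " bytes per stream");
        System.out.println(total / (1024L * 1024L) + " MiB across "
            + storeFiles + " StoreFiles");
    }
}
```

With those assumed inputs, each stream holds 6 MiB and the total comes to 
60,000 MiB.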



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
