[ https://issues.apache.org/jira/browse/HBASE-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085032#comment-14085032 ]
Jeffrey Zhong commented on HBASE-11660:
---------------------------------------

The issue is that HBase code assumes InputStream.available() returns an accurate count of the bytes left to read. Quite a few other places in our IO operations rely on that. Even the ProtobufLogReader code has a place like the following:

{code}
available = this.inputStream.available();
LOG.info("available size from inputStream.available()=" + available);
if (available > 0 && available < size) {
  throw new EOFException("Available stream not enough for edit, " +
      "inputStream.available()= " + this.inputStream.available() + ", " +
      "entry size= " + size);
}
{code}

The change will also incur one more IO seek: we can no longer rely on available() to exit early, so after the change we force a last seek to trigger an EOF exception from the underlying input stream.

I'd suggest we do a wider evaluation to see whether the dependency on available() can be removed.

> Make WAL reader follow contract for java.io.InputStream.available()
> -------------------------------------------------------------------
>
>                 Key: HBASE-11660
>                 URL: https://issues.apache.org/jira/browse/HBASE-11660
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Eric Hanson
>            Priority: Minor
>         Attachments: hbase-11660.01.patch
>
>
> In the process of building support for running HBase on Microsoft Azure
> HDInsight, I hit an issue in the HBase WAL reading process that took a lot of
> time to debug. The WAL reading code depends on available() for the log
> InputStream never returning 0 until end of file. This is not the same as the
> contract in java.io.InputStream for available.
> To prevent future grief for others that may want to port HBase onto storage
> systems other than HDFS, I propose to change the HBase WAL reader so it does
> not assume that EOF has been reached when available() == 0. It instead would
> treat available only as described in InputStream, i.e. available() is merely
> the number of bytes that could be read from the stream without blocking. That
> could be 0 even before EOF.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
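
The contract discussed in this issue can be illustrated with a small, self-contained Java sketch. It is not HBase code: the class AvailableContractDemo, the wrapper ZeroAvailableStream, and the helpers readEntry and countEntries are hypothetical names made up for illustration, and fixed-size entries are an assumption made to keep the example short. The sketch wraps a stream so that available() always returns 0 (which the InputStream contract permits) and detects end of file only through the EOFException raised by DataInputStream.readFully().

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableContractDemo {

    /**
     * Hypothetical stream that always reports 0 available bytes.
     * The InputStream contract allows this even when data remains.
     */
    static class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;
        }
    }

    /**
     * Reads one entry of exactly {@code size} bytes (size must be > 0).
     * Returns null when the stream ends before a full entry is read.
     * EOF is detected by the read itself, never by available().
     */
    static byte[] readEntry(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        try {
            // DataInputStream does no buffering, so wrapping per call is safe.
            new DataInputStream(in).readFully(buf);
            return buf;
        } catch (EOFException e) {
            return null;
        }
    }

    /** Counts complete entries until readEntry signals end of stream. */
    static int countEntries(InputStream in, int size) throws IOException {
        int count = 0;
        while (readEntry(in, size) != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream in =
            new ZeroAvailableStream(new ByteArrayInputStream(new byte[12]));
        System.out.println("available() = " + in.available());       // prints 0
        System.out.println("entries read = " + countEntries(in, 4)); // prints 3
    }
}
```

A reader that stopped whenever available() == 0 would read no entries from this stream, while the loop above reads all three. The cost is the one noted in the comment: the final read attempt has to reach the underlying stream before EOF is known.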