[ https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524795#comment-14524795 ]
Hadoop QA commented on HDFS-6596:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12653150/HDFS-6596.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10608/console |

This message was automatically generated.

> Improve InputStream when read spans two blocks
> ----------------------------------------------
>
>                 Key: HDFS-6596
>                 URL: https://issues.apache.org/jira/browse/HDFS-6596
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>         Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch,
>                      HDFS-6596.2.patch, HDFS-6596.3.patch, HDFS-6596.3.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length)
> is implemented as follows:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that a read returns at most
> (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the
> caller must call read() a second time to complete the request, and must wait
> a second time to acquire the DFSInputStream lock (read() is synchronized for
> DFSInputStream). For latency-sensitive applications such as HBase, this
> becomes a latency pain point when many threads contend for the lock. We
> therefore propose looping internally in read() to do a best-effort read.
> The current implementation of pread (read(position, buffer, offset,
> length)) already loops internally to do a best-effort read, so we can
> refactor to support the same behavior in the normal read.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
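
For readers following the thread, the difference between the quoted single-block read and the proposed looping read can be sketched as below. This is only an illustration under stated assumptions, not the patch itself and not the real DFSInputStream: the class BlockReadSketch, its in-memory byte array, the fixed block size, and the method names readWithinBlock and readBestEffort are all hypothetical.

```java
/** Hypothetical stand-in for a block-structured stream; NOT the real DFSInputStream. */
final class BlockReadSketch {
    private final byte[] data;   // whole "file" kept in memory, for illustration only
    private final int blockSize; // fixed block size, an assumption of this sketch
    private int pos = 0;         // current read position

    BlockReadSketch(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.data = data;
        this.blockSize = blockSize;
    }

    /** Mirrors the quoted behavior: returns at most the bytes left in the current block. */
    synchronized int readWithinBlock(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1; // end of file
        }
        // Inclusive offset of the last byte in the current block, capped at end of file.
        long blockEnd = Math.min((long) (pos / blockSize + 1) * blockSize, data.length) - 1;
        int realLen = (int) Math.min(len, blockEnd - pos + 1L);
        System.arraycopy(data, pos, buf, off, realLen);
        pos += realLen;
        return realLen;
    }

    /** Proposed behavior: keep reading across block boundaries while holding the lock once. */
    synchronized int readBestEffort(byte[] buf, int off, int len) {
        int total = 0;
        while (total < len) {
            int n = readWithinBlock(buf, off + total, len - total);
            if (n < 0) {
                break; // end of file reached; return what was read so far
            }
            total += n;
        }
        // Follow the InputStream convention: -1 only when nothing could be read at EOF.
        return (total == 0 && len > 0) ? -1 : total;
    }
}
```

With a 10-byte file and a block size of 4, a 6-byte request starting at position 0 gets 4 bytes from readWithinBlock and needs a second call, while readBestEffort returns all 6 bytes after taking the monitor once.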