[ 
https://issues.apache.org/jira/browse/HDFS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008621#comment-14008621
 ] 

Liang Xie commented on HDFS-6286:
---------------------------------

bq. Yes, hedged reads only work for pread() now. We ought to extend it to all 
forms of read(). This will be a big latency win across the board, and not only 
for local reads.
It seems no separate issue has been filed for that yet, right?

Just a minor update: I wrote some code following my previously proposed approach 
and did a simple test; it works. Most of the changes are in 
BlockReaderLocal.read(). I replaced "dataIn.read(buf, off, len)" with:
{code}
      Callable<Integer> readCallable = new Callable<Integer>() {
        @Override
        public Integer call() throws Exception {
          return dataIn.read(buf, off, len);
        }
      };
      Future<Integer> future = null;
      try {
        future = localReadPool.submit(readCallable);
      } catch (RejectedExecutionException e) {
        // Catching a runtime exception is usually not ideal, but submit()
        // rejects the task if the pool is saturated or already shut down.
        LOG.warn("Local read task was rejected by the executor", e);
        throw new IOException(e);
      }
      long timeout = localReadTimeoutMs > 0 ? localReadTimeoutMs : 10000L;
      try {
        return future.get(timeout, TimeUnit.MILLISECONDS).intValue();
      } catch (InterruptedException e) {
        // Most likely interrupted by a concurrent close().
        LOG.warn("Interrupted while waiting for the local read", e);
        throw new IOException(e);
      } catch (ExecutionException e) {
        // The underlying read hit a real I/O error.
        LOG.warn("Local read failed", e);
        throw new IOException(e);
      } catch (TimeoutException e) {
        LOG.warn("read timeout:" + timeout + "ms, gc issue? bad disk?", e);
        throw new IOException(e);
      }
{code}
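
For reference, here is a rough sketch of how the localReadPool and timeout setting 
used above could be wired up. The config key name, default value and pool sizing 
below are just my assumptions for illustration, not part of the patch:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LocalReadTimeoutSketch {
  // Assumed config key and default timeout; the real patch may pick other names.
  public static final String DFS_CLIENT_LOCAL_READ_TIMEOUT_MS_KEY =
      "dfs.client.read.local.timeout.ms";
  public static final long DFS_CLIENT_LOCAL_READ_TIMEOUT_MS_DEFAULT = 10000L;

  // Bounded pool: with a SynchronousQueue and no backing queue, submit()
  // throws RejectedExecutionException once all maxThreads workers are busy,
  // which is exactly the case handled in the snippet above.
  static ExecutorService createLocalReadPool(int maxThreads) {
    return new ThreadPoolExecutor(0, maxThreads,
        60L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>());
  }
}
{code}
That way a stuck disk only ties up a bounded number of reader threads instead of 
letting blocked reads grow the pool without limit.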

> adding a timeout setting for local read io
> ------------------------------------------
>
>                 Key: HDFS-6286
>                 URL: https://issues.apache.org/jira/browse/HDFS-6286
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> Currently, if a write or a remote read lands on a sick disk, 
> DFSClient.hdfsTimeout gives the caller a bounded time before returning, but it 
> doesn't cover local reads. Take an HBase scan for example:
> DFSInputStream.read -> readWithStrategy -> readBuffer -> 
> BlockReaderLocal.read -> dataIn.read -> FileChannelImpl.read
> If it hits a bad disk, the low-level read I/O can take tens of seconds, and 
> what's worse, "DFSInputStream.read" holds a lock the whole time.
> To my knowledge there is no good mechanism to cancel a running read I/O 
> (please correct me if I'm wrong), so my suggestion is to wrap the read request 
> in a future and set a timeout on it; if the threshold is reached, we could 
> probably add the local node into deadNodes...
> Any thought?
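
To illustrate the last point of the description, the caller could treat a 
timed-out local read the same way as any other failed read: mark the local node 
dead and retry from another replica. A minimal sketch of that idea follows; the 
Reader interface, deadNodes map and failover method are simplified stand-ins for 
illustration, not the actual DFSInputStream code:
{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the real client classes, for illustration only.
class LocalReadFailoverSketch {
  interface Reader {
    int read(byte[] buf, int off, int len) throws IOException;
  }

  // Nodes whose reads recently failed or timed out; skip them for a while.
  private final Map<String, Long> deadNodes = new ConcurrentHashMap<String, Long>();

  int readWithFailover(Reader local, Reader remote, String localNodeId,
      byte[] buf, int off, int len) throws IOException {
    try {
      // The Future-wrapped local read with a timeout would run here.
      return local.read(buf, off, len);
    } catch (IOException e) {
      // Timeout or any other local read failure: remember the local node as
      // dead so subsequent reads go straight to a remote replica.
      deadNodes.put(localNodeId, System.currentTimeMillis());
      return remote.read(buf, off, len);
    }
  }
}
{code}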



--
This message was sent by Atlassian JIRA
(v6.2#6252)
