[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881341#comment-13881341 ]
Arpit Agarwal commented on HDFS-5776: ------------------------------------- I reviewed the v8 patch. The implementation of {{hedgedFetchBlockByteRange}} looks great. Nice use of synchronization tools to make the code easy to understand. {quote} how much? 5000? 500000? or sth else? i think if enabling this feature explicitly, the end user/application should know a little backgroud at least, right? since we don't have any knowledge about end-user's storage configuration, just image if they have a fast flash(with HDFS-2832 enabled), say fusionio, probably one real disk read only cost tens of microseconds, {quote} [~xieliang007] #2 and #4 from my previous comment remain unaddressed. Threads are not free. If you really want to provide a user configurable setting for the thread count there should be a limit on the order of 64/128. I leave the exact number to you. The best approach is to use a small multiple of the processor count. If an app is not well behaved then the absence of limits can create a positive feedback loop. The slower the storage layer the more threads will get created when the correct behavior under load should be back off. Please add a thread count limit or ideally let’s not expose this setting at all. The same goes for the delay. Please add a lower bound. The exact value is up to you. We can always revisit the value if it turns out to be a bottleneck. {quote} let's keep it there, it's more easier to understand for developer or end user. {quote} I don't think it helps to have these functions and as Jing pointed out there is no purpose for it. I think it would be best to leave a single config setting i.e. either a boolean or a thread count, and a single method {{#isHedgedReadsEnabled}} to query the status of the feature. {quote} yes, but the loop is very very light, only when some exceptions like AccessControlException/InvalidEncryptionKeyException/InvalidBlockTokenException happened, will do extra loop, and all those have a fast quit mechanism, like refetchToken/refetchEncryptionKey or disableLegacyBlockReaderLocal, so this loop will only be executed just a very few times ... yes, you had a misunderstanding here. that's why i catch IOException fbae around fetchBlockAt. If we don't catch here, there will be always new refetch from outside loop and will have a spin loop {quote} I still do not understand how you are guarding against multiple refetch. Previously these counters were initialized outside any loop, now they are being reinitialized inside a loop. {{chooseDataNode(LocatedBlock block)}} function looks redundant and should be removed. > Support 'hedged' reads in DFSClient > ----------------------------------- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client > Affects Versions: 3.0.0 > Reporter: Liang Xie > Assignee: Liang Xie > Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, > HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, > HDFS-5776.txt > > > This is a placeholder of hdfs related stuff backport from > https://issues.apache.org/jira/browse/HBASE-7509 > The quorum read ability should be helpful especially to optimize read outliers > we can utilize "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from client side(e.g. HBase), and by using DFSQuorumReadMetrics, we > could export the interested metric valus into client system(e.g. HBase's > regionserver metric). > The core logic is in pread code path, we decide to goto the original > fetchBlockByteRange or the new introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)