[ 
https://issues.apache.org/jira/browse/HDFS-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523684#comment-14523684
 ] 

Jing Zhao commented on HDFS-8281:
---------------------------------

Thanks for the comments, Zhe!

For 4&5, thanks for the explanation about the "short read"! Some of my thoughts 
here:
# At the current stage, I think our main use case is still sequential reads, 
and it's good to read in parallel to serve this kind of request so that we can 
achieve better throughput. This means that the basic unit for each individual 
read should still be a cell.
# The tradeoff here is between throughput and the worst-case latency of 
serving a single read request. A parallel read may get delayed by a 
slow/unavailable DN, but we always have to handle slow/unavailable DNs during 
the read anyway. The difference is the stripe size used for decoding: let's say 
each time we only return 64KB (for simplicity, assuming it all comes from the 
same DN); if that data is unavailable, the corresponding 64KB * 6 stripe has 
to be read. In the current case we read 256KB * 6 (and if the cell size is 
64KB, the two are actually the same).
# For the possible decoding case we need a buffer to keep the data that has 
been served. If reading a complete stripe becomes a real concern because of 
its latency, a simple improvement is to read less data into the buffer each 
time without changing the buffer size. But without detailed benchmark data 
I'm not sure whether we want to add this logic right now. I think this is 
something we must explore while doing the performance tests, and we can make 
the improvement as follow-on work.
# One question: why did we choose 256KB as the cell size instead of the 
original 64KB?
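
To make the arithmetic in point 2 concrete, here is a rough sketch. The class 
and method names are hypothetical (not from the patch), and the 6 data blocks 
per stripe is an assumption taken from the "* 6" above:

```java
// Hypothetical sketch only; none of these names exist in the HDFS code base.
final class StripeReadCost {
    // Assumption: 6 data blocks per stripe, matching the "* 6" in the comment.
    static final int DATA_BLOCKS = 6;

    /**
     * Bytes that have to be read to decode one unavailable unit:
     * one unit-sized chunk from each of dataBlocks blocks in the stripe.
     */
    static long decodeReadBytes(long unitBytes, int dataBlocks) {
        return unitBytes * dataBlocks;
    }

    public static void main(String[] args) {
        final long kb = 1024L;
        // Compare a 64KB read unit with a 256KB read unit (the current cell size).
        for (long unit : new long[] {64 * kb, 256 * kb}) {
            long total = decodeReadBytes(unit, DATA_BLOCKS);
            System.out.printf("unit=%dKB -> decode read=%dKB%n", unit / kb, total / kb);
        }
    }
}
```

So under these assumptions a decode costs 384KB with a 64KB unit versus 1536KB 
with a 256KB unit, which is the latency difference discussed in point 2.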

I will update the patch later to address comments 1~3.


> Erasure Coding: implement parallel stateful reading for striped layout
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8281
>                 URL: https://issues.apache.org/jira/browse/HDFS-8281
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-8281-HDFS-7285.001.patch, 
> HDFS-8281-HDFS-7285.001.patch, HDFS-8281.000.patch
>
>
> This jira aims to support parallel reading for stateful read in 
> {{DFSStripedInputStream}}.



