Hi Rakesh,

Thanks for sharing your thoughts and updates.

(a) In your last email, I assume you meant "... submitting read requests to
fetch "any" (instead of all) 'k' chunks (out of the k+m-x surviving
chunks)"?
Do you have any optimization in place to decide which data nodes will be
part of those 'k'?
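To make that concrete, below is a rough sketch of the kind of selection I
had in mind. It is not HDFS code; the class and field names are made up
purely for illustration:

// Hypothetical sketch: pick k of the surviving chunk locations for a
// degraded read, preferring the currently cheapest data nodes.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class HelperSelection {
    // Made-up holder for one surviving chunk of a stripe.
    static class ChunkLocation {
        int chunkIndex;          // position within the stripe (0..k+m-1)
        String dataNode;         // host serving this surviving chunk
        double recentReadMillis; // client-observed latency to that host
    }

    // Any k of the (k + m - x) survivors are enough to decode; take the
    // k locations with the lowest observed read latency.
    static List<ChunkLocation> pickKHelpers(List<ChunkLocation> survivors, int k) {
        return survivors.stream()
                .sorted(Comparator.comparingDouble(c -> c.recentReadMillis))
                .limit(k)
                .collect(Collectors.toList());
    }
}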

(b) Is any caching being done (as proposed for QFS in the previously
attached "PPR" paper)?

(c) When you mentioned that striping is being done, I assume it is probably
to reduce the chunk sizes and hence k*c?
Now, if my objects are large (e.g. super HD images), so that I would have to
get data from multiple stripes to rebuild an image before I can display it
to the client, do you think striping would still help?
Since I know that all the segments of an HD image will always be read
together, is there a possibility that by striping it and distributing it
across different nodes I am ignoring its spatial/temporal locality and
further increasing the associated delays?
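To illustrate what I mean, here is my own toy calculation of how a large
object maps onto stripes. The cell size and k below are numbers I am
assuming for the example, not HDFS defaults:

// Toy model of round-robin striping: an object is split into fixed-size
// cells laid out across k data nodes, so a large sequential read touches
// every data node in every stripe. Cell size and k are assumptions.
class StripeMath {
    static final int CELL_SIZE = 1 << 20; // assume 1 MB cells
    static final int K = 6;               // assume 6 data chunks per stripe

    // Which data node index (0..K-1) holds the cell containing this offset?
    static int nodeIndexFor(long byteOffset) {
        long cellNumber = byteOffset / CELL_SIZE;
        return (int) (cellNumber % K);
    }

    // How many stripes does an object of this size span?
    static long stripesFor(long objectSize) {
        long stripeBytes = (long) CELL_SIZE * K; // 6 MB of data per stripe
        return (objectSize + stripeBytes - 1) / stripeBytes;
    }

    public static void main(String[] args) {
        long hdImage = 48L << 20; // a 48 MB image
        System.out.println("stripes touched: " + stripesFor(hdImage));        // 8
        System.out.println("node for offset 5 MB: " + nodeIndexFor(5L << 20)); // 5
    }
}

So, under these assumed numbers, a single 48 MB image already spreads over
8 stripes and all 6 data nodes, which is the locality concern I am raising.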

Just wanted to know your thoughts.
I am looking forward to the future performance improvements in HDFS.

Regards,
R.

On Fri, Jul 22, 2016 at 8:52 AM, Rakesh Radhakrishnan <rake...@apache.org>
wrote:

> I'm adding one more point to the above. In my previous mail reply, I
> explained the striped block reconstruction task, which is triggered by the
> Namenode on identifying a missing/bad block. Similarly, in the case of an
> hdfs client read failure, the hdfs client currently submits read requests
> internally to fetch all 'k' chunks (belonging to the same stripe as the
> failed chunk) from k data nodes and performs the decoding to rebuild the
> lost data chunk at the client side.
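>
> Roughly, the client-side flow is the following (just a simplified sketch,
> not the actual DFSStripedInputStream code; a single XOR parity stands in
> for Reed-Solomon so the example stays small and runnable):
>
> // Simplified degraded read: gather the k surviving chunks of the stripe
> // and rebuild the missing one locally at the client.
> class DegradedReadSketch {
>     static byte[] rebuildMissing(byte[][] survivingChunks, int chunkSize) {
>         byte[] recovered = new byte[chunkSize];
>         for (byte[] chunk : survivingChunks) {
>             for (int j = 0; j < chunkSize; j++) {
>                 recovered[j] ^= chunk[j]; // XOR of the k survivors = lost chunk
>             }
>         }
>         return recovered; // with RS this step is a real decode, not an XOR
>     }
> }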
>
> Regards,
> Rakesh
>
> On Fri, Jul 22, 2016 at 5:43 PM, Rakesh Radhakrishnan <rake...@apache.org>
> wrote:
>
>> Hi Roy,
>>
>> Thanks for your interest in the hdfs erasure coding feature and for
>> helping us make it more attractive to users by sharing performance
>> improvement ideas.
>>
>> Presently, the reconstruction work is implemented in a centralized manner:
>> the reconstruction task is given to one data node (the first in the
>> pipeline). For example, with a (k, m) erasure code schema, assume one
>> chunk (say c bytes) is lost because of a disk or server failure; then
>> k * c bytes of data need to be retrieved from k servers to recover the
>> lost data. The reconstructing data node will fetch k chunks (belonging to
>> the same stripe as the failed chunk) from k different servers and perform
>> decoding to rebuild the lost data chunk. Yes, this k-factor increases the
>> network traffic and causes reconstruction to be very slow. IIUC, this
>> point came up during the implementation, but I think priority was given to
>> supporting the basic functionality first. I could see quite a few jira
>> tasks (HDFS-7717, HDFS-7344) which discuss distributing the coding work to
>> data nodes, including converting a file to a striped layout,
>> reconstruction, error handling, etc. But I feel there is still room for
>> discussing/implementing new approaches to get better performance results.
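>>
>> (To put a number on that k-factor, with figures I'm only assuming for
>> illustration: under RS(6,3) with 128 MB chunks, repairing a single lost
>> chunk pulls 6 * 128 MB = 768 MB across the network into the one
>> reconstructing data node, compared with 128 MB for copying a plain
>> replica.)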
>>
>> In the shared doc, it's mentioned that the Partial-Parallel-Repair
>> technique was successfully implemented on top of the Quantcast File System
>> (QFS) [30], which supports RS-based erasure coded storage, and showed
>> promising results. That is really encouraging for us. I haven't gone
>> through this doc deeply; it would be really great if you (or I, or some
>> other folks) could come up with thoughts on discussing/implementing
>> similar mechanisms in HDFS as well. Most likely we will kick off the
>> performance improvement activities after the much-awaited 3.0.0-alpha
>> release :)
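>>
>> For anyone skimming the paper, the core idea as I read it is that helper
>> nodes combine partial decode results pairwise along a tree instead of
>> shipping all k chunks to one reconstructing node, so the repair traffic is
>> spread across the helpers and finishes in roughly log2(k) aggregation
>> steps. A rough, self-contained sketch of that aggregation (XOR stands in
>> for the real GF(2^8) arithmetic; in real PPR each helper would first scale
>> its chunk by its decoding coefficient):
>>
>> // Combine helpers' partial results as a binary tree; each interior node
>> // merges exactly two partials and forwards one result up the tree.
>> class TreeRepairSketch {
>>     static byte[] combine(byte[][] partials, int lo, int hi) {
>>         if (hi - lo == 1) {
>>             return partials[lo];   // leaf: one helper's local partial term
>>         }
>>         int mid = (lo + hi) / 2;
>>         byte[] left = combine(partials, lo, mid);
>>         byte[] right = combine(partials, mid, hi);
>>         byte[] merged = new byte[left.length];
>>         for (int j = 0; j < left.length; j++) {
>>             merged[j] = (byte) (left[j] ^ right[j]); // pairwise merge
>>         }
>>         return merged;             // forwarded one hop toward the repair node
>>     }
>>     // combine(partials, 0, partials.length) yields the rebuilt chunk.
>> }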
>>
>> >>>> Also, I would like to know what others have done to sustain good
>> >>>> performance even under failures (other than keeping fail-over
>> replicas).
>> I don't have much insight into this part; probably some other folks can
>> pitch in and share their thoughts.
>>
>> Regards,
>> Rakesh
>>
>> On Fri, Jul 22, 2016 at 2:03 PM, Roy Leonard <roy.leonard...@gmail.com>
>> wrote:
>>
>>> Greetings!
>>>
>>> We are evaluating erasure coding on HDFS to reduce storage cost.
>>> However, the degraded read latency seems like a crucial bottleneck for
>>> our system.
>>> After exploring some strategies for alleviating the pain of degraded read
>>> latency,
>>> I found a "tree-like recovery" technique might be useful, as described in
>>> the following paper:
>>> "Partial-parallel-repair (PPR): a distributed technique for repairing
>>> erasure coded storage" (Eurosys-2016)
>>> http://dl.acm.org/citation.cfm?id=2901328
>>>
>>> My question is:
>>>
>>> Do you already have such tree-like recovery implemented in HDFS-EC? If
>>> not, do you have any plans to add a similar technique in the near future?
>>>
>>> Also, I would like to know what others have done to sustain good
>>> performance even under failures (other than keeping fail-over replicas).
>>>
>>> Regards,
>>> R.
>>>
>>
>>
>
