Hi Roy,

Thanks for your interest in the HDFS erasure coding feature, and for
helping us make it more attractive to users by sharing performance
improvement ideas.

Presently, the reconstruction work is implemented in a centralized
manner, in which the reconstruction task is given to one data
node (the first in the pipeline). For example, with a (k, m) erasure code
schema, assume one chunk (say c bytes) is lost because of a disk or server
failure; then k * c bytes of data need to be retrieved from k servers to
recover the lost data. The reconstructing data node will fetch k chunks
(belonging to the same stripe as the failed chunk) from k different
servers and perform decoding to rebuild the lost chunk. Yes, this k-factor
increase in network traffic causes reconstruction to be very slow. IIUC,
this point came up during the implementation, but I think priority was
given to supporting the basic functionality first. I can see quite a few
jira tasks (HDFS-7717, HDFS-7344) that discuss distributing the coding
work across data nodes, including converting a file to a striped layout,
reconstruction, error handling, etc. But I feel there is still room for
discussing/implementing new approaches to get better performance results.
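
To make the k-factor concrete, here is a tiny, self-contained sketch of
what the single reconstructing data node does today. Note this is my own
illustration, not HDFS code: the method name is hypothetical, and plain
XOR parity (the RS(k, 1) special case) stands in for the real
Reed-Solomon codec so the decode step stays readable.

public class CentralizedRepairSketch {

    static final int K = 6;          // data chunks per stripe, e.g. RS(6, 3)
    static final int CHUNK = 1024;   // chunk size c in bytes

    // Hypothetical stand-in for a remote read from another datanode;
    // in the real system this is a network transfer of c bytes.
    static byte[] fetchChunkFromRemoteDatanode(int i) {
        byte[] chunk = new byte[CHUNK];
        java.util.Arrays.fill(chunk, (byte) (i + 1));
        return chunk;
    }

    public static void main(String[] args) {
        long bytesOverNetwork = 0;
        byte[] rebuilt = new byte[CHUNK];

        // The one reconstructing datanode fetches k surviving chunks of
        // the same stripe and folds each into the decode buffer.
        for (int i = 0; i < K; i++) {
            byte[] chunk = fetchChunkFromRemoteDatanode(i);
            bytesOverNetwork += chunk.length;
            for (int b = 0; b < CHUNK; b++) {
                rebuilt[b] ^= chunk[b];   // XOR decode (RS(k, 1) case)
            }
        }

        // All k * c bytes converge on this single node: the k-factor.
        System.out.println("bytes pulled to one node = " + bytesOverNetwork
                + " (k * c = " + (long) K * CHUNK + ")");
    }
}

The bottleneck is exactly that last print: one node's ingress link has to
absorb k chunks, while the k source nodes each send only one.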

In the shared doc, it's mentioned that the Partial-Parallel-Repair
technique was successfully implemented on top of the Quantcast File System
(QFS) [30], which supports RS-based erasure coded storage, with promising
results. That's really encouraging for us. I haven't gone through the
paper deeply yet; it would be really great if you (or I, or some other
folks) could come up with thoughts on discussing/implementing similar
mechanisms in HDFS as well. Most likely, we will kick off the performance
improvement activities after the much-awaited 3.0.0-alpha release :)
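
For anyone who hasn't read the paper, my rough understanding of the core
idea is that each surviving node computes its partial decode term locally,
and the partials are combined pairwise up a tree, so no single link ever
carries more than one chunk. Below is a hedged sketch of that aggregation
pattern only (again with XOR standing in for real RS arithmetic; the
class and variable names are my own, not the paper's or QFS's):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreeRepairSketch {

    static final int K = 6;          // surviving chunks in the stripe
    static final int CHUNK = 1024;   // chunk size c in bytes

    public static void main(String[] args) {
        // Each of the k surviving datanodes computes a partial term
        // from its own chunk (locally, with no network cost).
        List<byte[]> partials = new ArrayList<>();
        for (int i = 0; i < K; i++) {
            byte[] chunk = new byte[CHUNK];
            Arrays.fill(chunk, (byte) (i + 1));
            partials.add(chunk);
        }

        int rounds = 0;
        // Pairwise combine: each round halves the number of partials,
        // and every transfer moves exactly one chunk (c bytes) per link.
        while (partials.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i + 1 < partials.size(); i += 2) {
                byte[] merged = partials.get(i).clone();
                byte[] other = partials.get(i + 1);
                for (int b = 0; b < CHUNK; b++) {
                    merged[b] ^= other[b];   // combine two partial results
                }
                next.add(merged);
            }
            if (partials.size() % 2 == 1) {
                // Odd partial waits for the next round.
                next.add(partials.get(partials.size() - 1));
            }
            partials = next;
            rounds++;
        }

        // About ceil(log2(k)) rounds, instead of one node pulling k
        // chunks over a single ingress link.
        System.out.println("repair finished in " + rounds + " tree rounds");
    }
}

With k = 6 this finishes in 3 rounds, which is why the paper reports the
repair time scaling with log(k) rather than k on the bottleneck link.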

>>>> Also, I would like to know what others have done to sustain good
>>>> performance even under failures (other than keeping fail-over
>>>> replicas).
I don't have much insight into this part; probably some other folks can
pitch in and share their thoughts.

Regards,
Rakesh

On Fri, Jul 22, 2016 at 2:03 PM, Roy Leonard <roy.leonard...@gmail.com>
wrote:

> Greetings!
>
> We are evaluating erasure coding on HDFS to reduce storage cost.
> However, the degraded read latency seems like a crucial bottleneck for
> our system. After exploring some strategies for alleviating the pain of
> degraded read latency, I found a "tree-like recovery" technique might be
> useful, as described in the following paper:
> "Partial-parallel-repair (PPR): a distributed technique for repairing
> erasure coded storage" (Eurosys-2016)
> http://dl.acm.org/citation.cfm?id=2901328
>
> My question is:
>
> Do you already have such tree-like recovery implemented in HDFS-EC? If
> not, do you have any plans to add a similar technique in the near future?
>
> Also, I would like to know what others have done to sustain good
> performance even under failures (other than keeping fail-over replicas).
>
> Regards,
> R.
>