I'm adding one more point to the above. In my previous mail, I explained
the striped block reconstruction task that is triggered by the Namenode
when it identifies a missing/bad block. Similarly, in the case of an HDFS
client read failure, the HDFS client currently submits read requests
internally to fetch all 'k' chunks (belonging to the same stripe as the
failed chunk) from k data nodes and then performs the decoding to rebuild
the lost data chunk on the client side.
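
Just to make that read path concrete, here is a rough sketch in Java (only
an illustration, not the actual striped-read client code; a single XOR
parity stands in for the real RS/GF(2^8) decoding, and the class and method
names are made up). The point is the read pattern: one lost chunk costs k
remote chunk reads before the client can decode.

  class StripedReadSketch {                        // illustrative only
    // The client fetches the k surviving chunks of the stripe and decodes
    // locally. XOR replaces the real RS math, but the traffic is the same:
    // k chunks pulled from k data nodes to rebuild one lost chunk.
    static byte[] rebuildLostChunk(byte[][] survivingChunks, int chunkSize) {
      byte[] recovered = new byte[chunkSize];
      for (byte[] chunk : survivingChunks) {       // k remote reads
        for (int i = 0; i < chunkSize; i++) {
          recovered[i] ^= chunk[i];                // combine byte by byte
        }
      }
      return recovered;
    }
  }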

Regards,
Rakesh

On Fri, Jul 22, 2016 at 5:43 PM, Rakesh Radhakrishnan <rake...@apache.org>
wrote:

> Hi Roy,
>
> Thanks for your interest in the HDFS erasure coding feature and for
> helping us make it more attractive to users by sharing performance
> improvement ideas.
>
> Presently, the reconstruction work is implemented in a centralized
> manner, in which the reconstruction task is given to one data node
> (the first in the pipeline). For example, with a (k, m) erasure code
> schema, assume one chunk (say c bytes) is lost because of a disk or
> server failure; then k * c bytes of data need to be retrieved from k
> servers to recover the lost data. The reconstructing data node will
> fetch k chunks (belonging to the same stripe as the failed chunk) from
> k different servers and perform decoding to rebuild the lost data
> chunk. Yes, this k-factor increases the network traffic and causes
> reconstruction to be very slow. IIUC, this point came up during the
> implementation, but I think priority was given to supporting the basic
> functionality first. I could see quite a few JIRA tasks, HDFS-7717 and
> HDFS-7344, which discussed distributing the coding work to data nodes,
> including converting a file to a striped layout, reconstruction, error
> handling, etc. But I feel there is still room for discussing and
> implementing new approaches to get better performance results.
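>
> To put a rough number on that k-factor (taking the RS(6,3) schema and a
> 128 MB block purely as an illustration): rebuilding one lost 128 MB block
> means pulling 6 x 128 MB = 768 MB from six other data nodes, all of it
> converging on the single reconstructing node, versus a 128 MB copy for a
> plain replica.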
>
> The shared doc mentions that the Partial-Parallel-Repair technique has
> been successfully implemented on top of the Quantcast File System (QFS)
> [30], which supports RS-based erasure coded storage, and that it showed
> promising results. That's really an encouraging factor for us. I haven't
> gone through the doc deeply yet; it would be really great if you (or I,
> or some other folks) could come up with thoughts on discussing and
> implementing a similar mechanism in HDFS as well. Most likely, we will
> kick-start the performance improvement activities after the much-awaited
> 3.0.0-alpha release :)
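>
> To sketch my reading of the PPR idea (again only an illustration, not an
> HDFS design; XOR stands in for the GF(2^8) linear combination a real RS
> decode would use, and the names are made up): since the lost chunk is a
> linear combination of the surviving chunks, partial combinations can be
> computed close to the data and merged pairwise up a tree, so no single
> node or link has to absorb all k chunks at once.
>
>   import java.util.List;
>
>   class PprRepairSketch {                         // illustrative only
>     // Merge partial repair results pairwise; in PPR each merge step
>     // would run on a different data node, so the k-fold fan-in at one
>     // node goes away. Assumes at least one surviving chunk is passed in.
>     static byte[] repairViaTree(List<byte[]> chunks, int chunkSize) {
>       if (chunks.size() == 1) {
>         return chunks.get(0);                     // leaf: locally read chunk
>       }
>       int mid = chunks.size() / 2;
>       byte[] left = repairViaTree(chunks.subList(0, mid), chunkSize);
>       byte[] right = repairViaTree(chunks.subList(mid, chunks.size()), chunkSize);
>       byte[] partial = new byte[chunkSize];
>       for (int i = 0; i < chunkSize; i++) {
>         partial[i] = (byte) (left[i] ^ right[i]); // combine two partials
>       }
>       return partial;
>     }
>   }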
>
> >>>> Also, I would like to know what others have done to sustain good
> >>>> performance even under failures (other than keeping fail-over
> >>>> replicas).
> I don't have much idea about this part; probably some other folks can
> pitch in and share their thoughts.
>
> Regards,
> Rakesh
>
> On Fri, Jul 22, 2016 at 2:03 PM, Roy Leonard <roy.leonard...@gmail.com>
> wrote:
>
>> Greetings!
>>
>> We are evaluating erasure coding on HDFS to reduce storage cost.
>> However, the degraded read latency seems like a crucial bottleneck for our
>> system.
>> After exploring some strategies for alleviating the pain of degraded read
>> latency,
>> I found that a "tree-like recovery" technique might be useful, as
>> described in the following paper:
>> "Partial-parallel-repair (PPR): a distributed technique for repairing
>> erasure coded storage" (Eurosys-2016)
>> http://dl.acm.org/citation.cfm?id=2901328
>>
>> My question is:
>>
>> Do you already have such a tree-like recovery implemented in HDFS-EC? If
>> not, do you have any plans to add a similar technique in the near future?
>>
>> Also, I would like to know what others have done to sustain good
>> performance even under failures (other than keeping fail-over replicas).
>>
>> Regards,
>> R.
>>
>
>
