[
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090068#comment-13090068
]
Konstantin Shvachko commented on HDFS-1108:
-------------------------------------------
bq. I and several others am working on approach #2
Thanks Todd for clarifying this. Could you please also share the design
document for your approach? I would like to learn the details and understand
why you choose an approach which at this point does not seem to me optimal for
the project.
bq. option 1 causes dataloss regardless of your opinions on HA
This is not data loss, Todd. This is a tradeoff between performance and the
persistence of data. With flush and sync, one can control when to choose
performance in favor of guaranteed persistence and when vice versa. Regardless
of my opinion on HA, is anybody asking to remove this flexibility?
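The flexibility referred to here has a plain-JDK analogue: writes are cheap and may sit in OS buffers, and the caller decides when to pay the fsync cost via an explicit force(). This is a minimal sketch of that performance-vs-persistence choice using java.nio, not the HDFS client API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushChoiceDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits", ".log");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            // Fast path: the write may sit in OS buffers -- good throughput,
            // but no durability guarantee yet.
            ch.write(ByteBuffer.wrap("OP_ADD_BLOCK\n".getBytes()));

            // The caller, not the storage layer, decides when persistence
            // matters and pays the fsync cost explicitly.
            ch.force(true); // analogous to sync: now guaranteed on disk
        }
        System.out.println(Files.size(p) > 0);
        Files.delete(p);
    }
}
```

The point of the analogy is that removing the explicit sync step would remove the caller's ability to make this choice per operation.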
bq. How do you differentiate between logSync() to that stream Vs stream to the
disk?
I do not differentiate between streams. As I said, the addBlock() transaction
should be treated the same way as setTimes(): it is logged (and batched) but
not synced. There is no consistency issue here. Transactions will eventually be
committed to the journal by another sync-able transaction or by a file close().
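The batching described above can be sketched as a minimal in-memory edit log. The names (logEdit, logSync, SketchEditLog) are hypothetical and this is not the NameNode implementation: logEdit appends a record without forcing it to disk, and a later logSync, triggered by any sync-able transaction, makes the whole batch durable in order.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical names, not the NameNode code): transactions
// are appended to an in-memory buffer by logEdit() and only made durable
// when some later caller invokes logSync().
class SketchEditLog {
    private final List<String> buffered = new ArrayList<>(); // not yet durable
    private final List<String> durable = new ArrayList<>();  // "on disk"

    // Append a transaction record; cheap, no fsync.
    void logEdit(String txn) {
        buffered.add(txn);
    }

    // Force everything buffered so far to stable storage, preserving order.
    void logSync() {
        durable.addAll(buffered);
        buffered.clear();
    }

    List<String> durableTxns() {
        return durable;
    }
}

public class SketchEditLogDemo {
    public static void main(String[] args) {
        SketchEditLog log = new SketchEditLog();
        log.logEdit("OP_ADD_BLOCK"); // logged, batched, NOT synced
        log.logEdit("OP_SET_TIMES"); // likewise batched
        log.logEdit("OP_CLOSE");     // a sync-able transaction...
        log.logSync();               // ...whose sync also commits the batch
        // All three records are now durable, in original order.
        System.out.println(log.durableTxns());
    }
}
```

An unsynced addBlock record is thus never reordered or lost relative to the transactions that follow it; it simply rides along with the next sync.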
bq. The approach 2 has different requirements that many are interested in. I
have repeated this many times.
Suresh, yes, you do repeat it. But you never answered MY question: which HA
approach are you implementing? As you can see, you have to make choices even on
issues that seemed to be common to all approaches.
I like Milind's idea about an implementation "without shared storage
assumption".
Sticking to the point:
- Without HA considerations, this patch removes the flexibility to choose
between performance and guaranteed persistence of data
- There should be a good reason for that
- An HA solution with shared storage seems to be the reason
- The community has not seen the design, has not discussed it, and is not aware
of why this one is better than the other three (or was it four) published in
different jiras.
> Log newly allocated blocks
> --------------------------
>
> Key: HDFS-1108
> URL: https://issues.apache.org/jira/browse/HDFS-1108
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt, hdfs-1108.txt
>
>
> The current HDFS design says that newly allocated blocks for a file are not
> persisted in the NN transaction log when the block is allocated. Instead, a
> hflush() or a close() on the file persists the blocks into the transaction
> log. It would be nice if we can immediately persist newly allocated blocks
> (as soon as they are allocated) for specific files.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira