[
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090068#comment-13090068
]
Konstantin Shvachko commented on HDFS-1108:
-------------------------------------------
bq. I and several others am working on approach #2
Thanks Todd for clarifying this. Could you please also share the design
document for your approach? I would like to learn the details and understand
why you choose an approach which at this point does not seem to me optimal for
the project.
bq. option 1 causes dataloss regardless of your opinions on HA
This is not data loss, Todd. This is a tradeoff between performance and the
persistence of data. With flush and sync, one can control when to choose
performance in favor of guaranteed persistence and when vice versa. Regardless
of my opinion on HA, is anybody asking to remove this flexibility?
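The flexibility referred to here has a plain-JDK analogue: writes are cheap and may sit in OS buffers, and the caller decides when to pay the fsync cost via an explicit force(). This is a minimal sketch of that performance-vs-persistence choice using java.nio, not the HDFS client API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushChoiceDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits", ".log");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            // Fast path: the write may sit in OS buffers -- good throughput,
            // but no durability guarantee yet.
            ch.write(ByteBuffer.wrap("OP_ADD_BLOCK\n".getBytes()));

            // The caller, not the storage layer, decides when persistence
            // matters and pays the fsync cost explicitly.
            ch.force(true); // analogous to sync: now guaranteed on disk
        }
        System.out.println(Files.size(p) > 0);
        Files.delete(p);
    }
}
```

The point of the analogy is that removing the explicit sync step would remove the caller's ability to make this choice per operation.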
bq. How do you differentiate between logSync() to that stream Vs stream to the
disk?
I do not differentiate between streams. As I said, the addBlock() transaction
should be treated the same way as setTimes(): it is logged (and batched) but
not synced. There is no consistency issue here. Transactions will eventually be
committed to the journal by another sync-able transaction or by a file close().
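The batching described above can be sketched as a minimal in-memory edit log. The names (logEdit, logSync, SketchEditLog) are hypothetical and this is not the NameNode implementation: logEdit appends a record without forcing it to disk, and a later logSync, triggered by any sync-able transaction, makes the whole batch durable in order.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical names, not the NameNode code): transactions
// are appended to an in-memory buffer by logEdit() and only made durable
// when some later caller invokes logSync().
class SketchEditLog {
    private final List<String> buffered = new ArrayList<>(); // not yet durable
    private final List<String> durable = new ArrayList<>();  // "on disk"

    // Append a transaction record; cheap, no fsync.
    void logEdit(String txn) {
        buffered.add(txn);
    }

    // Force everything buffered so far to stable storage, preserving order.
    void logSync() {
        durable.addAll(buffered);
        buffered.clear();
    }

    List<String> durableTxns() {
        return durable;
    }
}

public class SketchEditLogDemo {
    public static void main(String[] args) {
        SketchEditLog log = new SketchEditLog();
        log.logEdit("OP_ADD_BLOCK"); // logged, batched, NOT synced
        log.logEdit("OP_SET_TIMES"); // likewise batched
        log.logEdit("OP_CLOSE");     // a sync-able transaction...
        log.logSync();               // ...whose sync also commits the batch
        // All three records are now durable, in original order.
        System.out.println(log.durableTxns());
    }
}
```

An unsynced addBlock record is thus never reordered or lost relative to the transactions that follow it; it simply rides along with the next sync.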
bq. The approach 2 has different requirements that many are interested in. I
have repeated this many times.
Suresh, yes, you do repeat it. But you never answered MY question: which HA
approach are you implementing? As you can see, you have to make choices even on
issues that seemed to be common to all approaches.
I like Milind's idea about an implementation "without shared storage
assumption".
Sticking to the point:
- Without HA considerations, this patch removes the flexibility to choose
between performance and guaranteed persistence of data
- There should be a good reason for that
- An HA solution with shared storage seems to be the reason
- The community has not seen the design, has not discussed it, and is not aware
of why this one is better than the other three (or was it four) published in
different jiras.
> Log newly allocated blocks
> --------------------------
>
> Key: HDFS-1108
> URL: https://issues.apache.org/jira/browse/HDFS-1108
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt, hdfs-1108.txt
>
>
> The current HDFS design says that newly allocated blocks for a file are not
> persisted in the NN transaction log when the block is allocated. Instead, a
> hflush() or a close() on the file persists the blocks into the transaction
> log. It would be nice if we can immediately persist newly allocated blocks
> (as soon as they are allocated) for specific files.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira