On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> On Tue, Aug 15, 2006 at 02:33:27PM -0400, [EMAIL PROTECTED] wrote:
> >>>Are 'we' sure that such a setup can't lose any data?
> >>Yes. If you check the archives, you can even find the last time this was 
> >>discussed...
> >I looked last night (coincidence actually) and didn't find proof that
> >you cannot lose data.
> You aren't going to find proof, any more than you'll find proof that you 
> won't lose data if you do use a journalling fs. (Because there isn't 
> any.) Unfortunately, many people misunderstand what a metadata 
> journal does for you, and overstate its importance in this type of 
> application.

Yes, many people do. :-)

> >How do you deal with the file system structure being updated before the
> >data blocks are (re-)written?
> *That's what the postgres log is for.* If the latest xlog entries don't 
> make it to disk, they won't be replayed; if they didn't make it to 
> disk, the transaction would not have been reported as committed. An 
> application that understands filesystem semantics can guarantee data 
> integrity without metadata journaling.

No. This is not true. Updating the file system structure (inodes, indirect
blocks) touches a different part of the disk than the actual data. If
the file system structure is modified, say, to extend a file so that it
can hold more data, but the data itself is not written, then upon
recovery after a crash, with a file system such as ext2, or ext3 in
writeback mode, or xfs, it is possible that the end of the file, even the
postgres log file, will contain a random block of data from the disk. If
this random block of data happens to look like a valid xlog block, it may
be played back, and the database corrupted.

If the file system is only used for xlog data, the chance that such a
block looks like a valid xlog block increases, does it not?
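
To make the scenario concrete, here is a minimal sketch in Python of a log
reader that replays records until the first one that fails validation. The
record layout (offset, length, CRC32) and the names make_record and replay
are hypothetical, chosen for illustration; this is not PostgreSQL's actual
xlog format or code. It suggests that random garbage is likely to be
rejected by a checksum, while a well-formed record left over from an older
log file passes the checksum and is stopped only by an extra check on
where the record claims to belong.

```python
import struct
import zlib

# Hypothetical record header: expected file offset, payload length, CRC32.
HEADER = struct.Struct("<QII")

def make_record(offset, payload):
    """Build one log record that claims to live at `offset` in the log."""
    return HEADER.pack(offset, len(payload), zlib.crc32(payload)) + payload

def replay(log, check_offset=True):
    """Return the payloads accepted for replay, stopping at the first
    record that fails validation."""
    accepted = []
    pos = 0
    while pos + HEADER.size <= len(log):
        offset, length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        # Reject truncated records and records whose checksum does not match.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        # Reject well-formed records that belong to a different position,
        # for example a stale block from an older log file.
        if check_offset and offset != pos:
            break
        accepted.append(payload)
        pos = start + length
    return accepted

good = make_record(0, b"commit A")
stale = make_record(0, b"old txn")      # valid record from an older log file
garbage = b"\xde\xad\xbe\xef" * 8       # arbitrary non-log bytes

print(replay(good + garbage))                    # [b'commit A']
print(replay(good + stale, check_offset=False))  # [b'commit A', b'old txn']
print(replay(good + stale))                      # [b'commit A']
```

With only the checksum, the stale record is replayed; the position check
is what rejects it in this sketch. Whether a real log format carries an
equivalent check is a separate question that this example does not answer.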

> >>The bottom line is that the only reason you need a metadata journalling 
> >>filesystem is to save the fsck time when you come up. On a little 
> >>partition like xlog, that's not an issue.
> >fsck isn't only about time to fix. fsck is needed, because the file system
> >is broken. 
> fsck is needed to reconcile the metadata with the on-disk allocations. 
> To do that, it reads all the inodes and their corresponding directory 
> entries. The time to do that is proportional to the size of the 
> filesystem, hence the comment about time. fsck is not needed "because 
> the filesystem is broken", it's needed because the filesystem is marked 
> dirty. 

This is also wrong. fsck is needed because the file system is broken.

It takes time, because it doesn't have a journal to help it, therefore it
must look through the entire file system and guess what the problems are.
There are classes of problems, such as the one I describe above, for which
fsck *cannot* guess how to solve the problem. There is not enough
information available for it to deduce that anything is wrong at all.

The probability is low, for sure - but then, the chance of a file system
failure is already low.

Betting on ext2 + postgresql xlog has not been confirmed to me as reliable.

Telling me that journalling is misunderstood doesn't prove to me that you
understand it.

I don't mean to be offensive, but I won't accept what you say, as it is
not consistent with my understanding of how file systems work. :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]     
__________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/

