On Tue, Apr 18, 2017 at 07:31:34AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-04-17 15:39, Chris Murphy wrote:
> >On Mon, Apr 17, 2017 at 1:26 PM, Austin S. Hemmelgarn
> ><ahferro...@gmail.com> wrote:
> >>On 2017-04-17 14:34, Chris Murphy wrote:
[...]
> >>>>>It's almost like we need these things to not fsync at all, and just
> >>>>>rely on the filesystem commit time...
> >>>>
> >>>>
> >>>>Essentially yes, but that causes all kinds of other problems.
> >>>
> >>>
> >>>Drat.
> >>>
> >>Admittedly most of the problems are use-case specific (you can't afford to
> >>lose transactions in a financial database, for example, so it functionally
> >>has to call fsync after each transaction), but most of it stems from the
> >>fact that BTRFS is doing a lot of the same stuff that much of the 'problem'
> >>software is doing itself internally.
> >>
> >
> >Seems like the old way of doing things, and the staleness of the
> >internet, have colluded to create a lot of nervousness and misuse of
> >fsync. The very fact Btrfs needs a log tree to deal with fsync's in a
> >semi-sane way...
> Except that BTRFS is somewhat unusual.  Prior to this, the only
> 'mainstream' filesystem that provided most of these features was
> ZFS, and that does a good enough job that this doesn't matter.
> 
> For something like a database though, where you need ACID
> guarantees, you pretty much have to have COW semantics internally,
> and you have to force things to stable storage after each
> transaction that actually modifies data.  Looking at it another way,
> most database storage formats are essentially record-oriented
> filesystems (as opposed to block-oriented filesystems that most
> people think of).  This is part of why you see such similar access
> patterns in databases and VM disk images (even if the VM isn't
> running database software); they are essentially doing the same
> things at a low level.
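
[Illustrative aside, not part of the original message: a minimal sketch of
the "force things to stable storage after each transaction" pattern
described above. The function names and the one-record-per-line journal
layout are assumptions made for the example; real databases use far more
elaborate formats and usually batch commits.]

```python
import os


def append_transaction(path, record: bytes) -> None:
    """Append one record to a journal file and fsync before returning.

    Hypothetical helper: the caller only treats the transaction as
    committed once this function has returned without raising.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        # os.write() may write fewer bytes than requested, so loop
        # until the whole record has been handed to the kernel.
        view = memoryview(record)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's data and metadata to the
        # storage device instead of waiting for the periodic commit.
        os.fsync(fd)
    finally:
        os.close(fd)


def read_transactions(path):
    """Return the journal's records as a list of byte strings."""
    with open(path, "rb") as f:
        return f.read().splitlines()
```

Calling append_transaction(path, b"debit 10\n") once per transaction is the
access pattern under discussion: many small appends, each followed by its
own fsync, rather than relying on the filesystem commit interval.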

   I remember thinking, when I was learning about the internals of
btrfs, that it looked an awful lot like the high-level description of
the internals of Oracle which I'd just been learning about. Most of
the same pieces, doing mostly the same kinds of operations to achieve the
same effective results.

   Hugo.

-- 
Hugo Mills             | Don't worry, he's not drunk. He's like that all the
hugo@... carfax.org.uk | time.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                           A.H. Deakin
