On Tue, Apr 18, 2017 at 07:31:34AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-04-17 15:39, Chris Murphy wrote:
> >On Mon, Apr 17, 2017 at 1:26 PM, Austin S. Hemmelgarn
> ><ahferro...@gmail.com> wrote:
> >>On 2017-04-17 14:34, Chris Murphy wrote:
[...]
> >>>>>It's almost like we need these things to not fsync at all, and just
> >>>>>rely on the filesystem commit time...
> >>>>
> >>>>
> >>>>Essentially yes, but that causes all kinds of other problems.
> >>>
> >>>
> >>>Drat.
> >>>
> >>Admittedly most of the problems are use-case specific (you can't afford to
> >>lose transactions in a financial database, for example, so it functionally
> >>has to call fsync after each transaction), but most of it stems from the
> >>fact that BTRFS is doing a lot of the same stuff that much of the 'problem'
> >>software is doing itself internally.
> >>
> >
> >Seems like the old way of doing things, and the staleness of the
> >internet, have colluded to create a lot of nervousness and misuse of
> >fsync. The very fact Btrfs needs a log tree to deal with fsyncs in a
> >semi-sane way...
> Except that BTRFS is somewhat unusual. Prior to this, the only
> 'mainstream' filesystem that provided most of these features was
> ZFS, and that does a good enough job that this doesn't matter.
>
> For something like a database though, where you need ACID
> guarantees, you pretty much have to have COW semantics internally,
> and you have to force things to stable storage after each
> transaction that actually modifies data. Looking at it another way,
> most database storage formats are essentially record-oriented
> filesystems (as opposed to the block-oriented filesystems that most
> people think of). This is part of why you see such similar access
> patterns in databases and VM disk images (even if the VM isn't
> running database software): they are essentially doing the same
> things at a low level.
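To make the "force things to stable storage after each transaction" point concrete, here is a minimal sketch (not from the original thread). It assumes a simple append-only log file and a hypothetical commit_record() helper; a transaction is only treated as committed once os.fsync() has returned. A real database also fsyncs the containing directory when it creates a new file and layers its own journaling on top, which this sketch ignores.

```python
import os
import tempfile


def commit_record(path: str, record: bytes) -> None:
    """Append one record and force it to stable storage before returning.

    Hypothetical helper: the caller may only report the transaction as
    committed after this function returns without raising.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # os.write() may write fewer bytes than requested, so loop until done.
        view = memoryview(record)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file's data and metadata to the device: one fsync per
        # data-modifying transaction, as described in the thread.
        os.fsync(fd)
    finally:
        os.close(fd)


# Usage: two transactions, each durable before the next one starts.
log = os.path.join(tempfile.mkdtemp(), "txn.log")
commit_record(log, b"debit:100\n")
commit_record(log, b"credit:100\n")
```

The per-call fsync is exactly the cost being discussed: on a COW filesystem like BTRFS, each of those calls forces extra work (hence the log tree), even though the application is already doing its own ordering and durability bookkeeping.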
   I remember thinking, when I was learning about the internals of
btrfs, that it looked an awful lot like the high-level description of
the internals of Oracle, which I'd just been learning about. Most of
the same pieces, doing mostly the same kinds of operations to achieve
the same effective results.

   Hugo.

-- 
Hugo Mills             | Don't worry, he's not drunk. He's like that all the
hugo@... carfax.org.uk | time.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                         A.H. Deakin