On 2017-11-13 09:17, Hans van Kranenburg wrote:
> On 11/13/2017 01:41 AM, Qu Wenruo wrote:
>>
>> On 2017-11-13 06:01, Hans van Kranenburg wrote:
>>> On 11/12/2017 09:58 PM, Robert White wrote:
>>>> Is the commit interval monotonic, or is it seconds after sync?
>>>>
>>>> What I mean is that if I manually call sync(2) does the commit timer
>>>> reset? I'm thinking it does not, but I can imagine a workload where it
>>>> ideally would.
>>>
>>> The magic happens inside the transaction kernel thread:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/disk-io.c?h=v4.14#n1925
>>>
>>> You can see the delay being computed:
>>>   delay = HZ * fs_info->commit_interval;
>>>
>>> Almost at the end of the function, you see:
>>>   schedule_timeout(delay)
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?h=v4.14#n1676
>>>
>>> This schedule_timeout function sets a timer and then the thread goes to
>>> sleep. If nothing happens, the kernel will wake up the thread after the
>>> timer expires (can be later, but not earlier) and then it will redo the
>>> loop.
>>>
>>> If something else wakes up the transaction thread, the timer is
>>> discarded if it's not expired yet.
>>
>> So far so good.
>>
>>>
>>> So it works like you would want.
>>
>> Not exactly.
>
> Ah, interesting.
>
>> Sync or commit_transaction won't wake up transaction_kthread.
>>
>> transaction_kthread will mostly be woken by trans error, remount or
>> under certain case of btrfs_end_transaction.
>>
>> So manually sync will not (at least not always) interrupt commit interval.
>
> The fun thing is, when I just do sync, I see that the time it takes for
> a next generation bump to happen is reset (while doing something simple
> like touch x in a loop in another terminal).

Maybe something else is involved. You could dig a little further by
tracing which caller committed the transaction. I could be totally wrong
about this.

>
>> And even more, transaction_kthread will only commit transaction, which
>> means it will only ensure metadata consistent.
>>
>> It won't ensure buffered write to reach disk if its extent is not
>> allocated yet (delalloc).
>
> Hm, I have seen things like that in BTRFS_IOC_SYNC...
>
> Actually, I first responded on the timer reset question, because that
> one was easy to answer. I don't know if I want to descend the path
> further into (f)sync. I heard it can get really messy down there. :]

Yep, very messy.
It is not only messy within btrfs itself, but also tied to kernel memory
management.

And welcome to the hell of filesystem development.

Thanks,
Qu

>
>>
>> Thanks,
>> Qu
>>>
>>> You can test this yourself by looking at the "generation" number of your
>>> filesystem. It's in the output of btrfs inspect dump-super:
>>>
>>> This is the little test filesystem I just used:
>>>
>>> -# btrfs inspect dump-super /dev/dorothy/mekker | grep ^generation
>>> generation 35
>>>
>>> If you print the number in a loop, like every second, you can see it
>>> going up after a transaction happened. Now play around with other things
>>> and see when it changes.
>>>
>>>> (Again, this is purely theoretical, I have no such workload as I am
>>>> about to describe.)
>>>>
>>>> So suppose I have some sort of system, like a database, that I know will
>>>> do scattered writes and extends through some files and then call some
>>>> variant of sync(2). And I know that those sync() calls will be every
>>>> forty-to-sixty seconds because of reasons. It would be "neat" to be able
>>>> to set the commit=n to some high value, like 90, and then "normally" the
>>>> sync() behaviours would follow the application instead of the larger
>>>> commit interval.
>>>>
>>>> The value would be that the file system would tend _not_ to go into sync
>>>> while the application was still skittering about in the various files.
>>>>
>>>> Of course any other applications could call sync from their own contexts
>>>> for their own reasons. And there's an implicit fsync() on just about any
>>>> close() (at least if everything is doing its business "correctly")
>>>>
>>>> It may be a strange idea but I can think of some near realtime
>>>> applications that might be able to leverage a modicum of control over the
>>>> sync event. There is no API, and no strong reason to desire one, for
>>>> controlling the commit via (low privilege) applications.
>>>>
>>>> But if the plumbing exists, then having a mode where sync() or fsync()
>>>> (which I think causes a general sync because of the journal) resets the
>>>> commit timer could be really interesting.
>>>>
>>>> With any kind of delayed block choice/mapping it could actually reduce
>>>> the entropy of the individual files for repeated random small writes.
>>>> The application would have to be reasonably aware, of course.
>>>>
>>>> Since something is causing a sync() the commit=N guarantee is still
>>>> being met for the whole system for any N, but applications could tend to
>>>> avoid mid-write commits by planning their sync()s.
>>>>
>>>> Just a thought.
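
For anyone who wants to repeat the generation-watching test quoted above,
here is a minimal sketch in Python. It is only an illustration under some
assumptions: Python 3.7 or newer, btrfs-progs on PATH, root privileges,
and dump-super printing the superblock field as a line that starts with
"generation" followed by whitespace and a number, as in the quoted sample.
The helper names (parse_generation, read_generation, watch_generation) are
made up for this example and are not part of btrfs-progs.

```python
import re
import subprocess
import time

# Match only the top-level "generation" line, like `grep ^generation`,
# so fields such as chunk_root_generation are ignored.
_GENERATION_RE = re.compile(r"^generation\s+(\d+)\s*$", re.MULTILINE)


def parse_generation(dump_super_output: str) -> int:
    """Extract the superblock generation from dump-super text output."""
    match = _GENERATION_RE.search(dump_super_output)
    if match is None:
        raise ValueError("no 'generation' line found")
    return int(match.group(1))


def read_generation(device: str) -> int:
    """Run dump-super on the device (assumed to need root) and parse it."""
    result = subprocess.run(
        ["btrfs", "inspect-internal", "dump-super", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_generation(result.stdout)


def watch_generation(device: str, poll_seconds: float = 1.0) -> None:
    """Poll forever and report each generation change with the elapsed time."""
    last = read_generation(device)
    last_change = time.monotonic()
    print(f"generation {last}")
    while True:
        time.sleep(poll_seconds)
        current = read_generation(device)
        if current != last:
            now = time.monotonic()
            print(f"generation {last} -> {current} after {now - last_change:.1f}s")
            last, last_change = current, now
```

Running watch_generation("/dev/dorothy/mekker") in one terminal while
calling sync by hand in another shows the gap between generation bumps,
which can then be compared against the commit= mount option. Polling once
per second limits the resolution of the reported gap to about a second.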