On 25Dec2020 09:29, Steven D'Aprano <[email protected]> wrote:
>On Thu, Dec 24, 2020 at 12:15:08PM -0500, Michael A. Smith wrote:
>
>> With all the buffering that modern disks and filesystems do, a
>> specific question has come up a few times with respect to whether or
>> not data was actually written after flush. I think it would be pretty
>> useful for the standard library to have a variant in the io module
>> that would explicitly fsync on close.
>
>One argument against this idea is that "disks and file systems buffer
>for a reason, you should trust them, explicitly calling sync after every
>written file is just going to slow I/O down".
>
>Personally I don't believe this argument; I've been bitten many, many
>times until I learned to explicitly sync files, but it's an argument you
>should counter.

By contrast, I support this argument. The _vast_ majority of things
don't need to sync their data all the way to the hardware base substrate
(eg magnetic fields on spinning rust).

And on the whole, if I do care, I issue a single sync() call at the end
of a large task (typically interactively, at a prompt!) rather than
suffering a heap of performance-impairing stutters all the way through
some process, which is what many per-file syncs force.
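
A minimal sketch of that batch approach, assuming a Unix system where os.sync() exists. The function name write_all_then_sync and its contents mapping are hypothetical, for illustration only: each file is written with ordinary buffered I/O, and one sync is issued for the whole batch.

```python
import os
from pathlib import Path


def write_all_then_sync(directory, contents):
    """Hypothetical helper: write every file normally, then sync once at the end.

    `contents` maps file names to text. No per-file fsync is done: closing
    each file only hands its data to the kernel, which flushes it in its own time.
    """
    paths = []
    for name, text in contents.items():
        path = Path(directory) / name
        path.write_text(text)
        paths.append(path)
    # A single sync for the whole batch. os.sync() is Unix-only, hence the guard.
    if hasattr(os, "sync"):
        os.sync()
    return paths
```

The interactive equivalent is simply running sync(1) at the shell prompt once the task has finished.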

IMO, per-file syncs fall into the "policy" arena: aside from low-level
tools (example: fdisk, a disc partition editor), to my mind the purpose
of the kernel is to accept responsibility for my data when I hand it
off.

Perhaps for you that isn't enough; for me it normally is. And when it
isn't, I'll take steps myself, _outside_ the programme, to ensure the
sync or commit or off-site backup is complete when it matters. Thus the
policy is in my hands.

A tool which forces a per-file sync on every close, or even after
every write, is a performance killer. The faster our hardware, the less
that may seem to matter (and, conversely, the lower the risk, as the
ordinary kernel I/O flushing will catch up faster). But when the
hardware's slowness _is_ relevant, if I can't turn that off I have a
needlessly slow task.

The example which stands out in my own mind is when I was using Firefox
on a laptop with a spinning rust hard drive (and, being laptop
hardware, a low-power, physically slow piece of spinning rust). There was
once a setting to turn off SQLite's synchronous writes (used for history
and bookmarks). The cost of those synchronous writes was _visibly
obvious_ in the user experience, so I turned it off. As a matter of
policy, those data didn't need such care.
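
Firefox's own configuration isn't shown here, but SQLite itself exposes this durability knob as PRAGMA synchronous. A minimal sketch using Python's sqlite3 module; the function name open_history_db and its durable flag are hypothetical, chosen to show the sync policy as an explicit caller decision:

```python
import sqlite3


def open_history_db(path, durable=False):
    """Hypothetical helper: open an SQLite database with a caller-chosen sync policy.

    synchronous=FULL (2): SQLite syncs at critical moments so that a power
    loss will not corrupt the database. synchronous=OFF (0): SQLite hands
    writes to the OS and carries on without syncing, trading crash safety
    for speed.
    """
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = FULL" if durable else "PRAGMA synchronous = OFF")
    return conn
```

Reading the setting back with conn.execute("PRAGMA synchronous").fetchone()[0] reports 0 for OFF and 2 for FULL.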

So I'm resistant to this kind of thing because IMO it leads to an
attractive nuisance: overuse of sync or fsync for everything. And it
will usually not be exposed as a policy the user can adjust or disable.

My rule of thumb:

    If it can't be turned off, it's not a feature. - Karl Heuer

>Another argument is that even syncing your data doesn't mean that the
>data is actually written to disk, since the hardware can lie. On the
>other hand, I don't know what anyone can do, not even the kernel, in the
>face of deceitful hardware.

Aye.

But in principle, after a sync() or fsync() the kernel at least believes
the data are safely on the device. Hardware which lies, or which claims
saved data without having the resources to guarantee it (eg a small
battery to complete the writes if there's a power outage), is indeed
nasty.

>> You might be tempted to argue that this can be done very easily in
>> Python already, so why include it in the standard io module?

I would indeed. There _should_ be a small bar which at least causes the
programmer to think "do I really need this here?". I suppose an
"fsync=False" default parameter is a visible bar.
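
To make that bar concrete, a minimal sketch of an opt-in open() variant. The name open_maybe_synced and its fsync keyword are hypothetical, not part of the io module; with the default fsync=False it behaves like an ordinary open(), so the caller has to write fsync=True at the call site to pay for the sync.

```python
import contextlib
import os


@contextlib.contextmanager
def open_maybe_synced(path, mode="w", *, fsync=False, **kwargs):
    """Hypothetical open() variant that fsyncs on close only when asked to."""
    f = open(path, mode, **kwargs)
    try:
        yield f
        # Only reached if the with-block exits normally.
        if fsync:
            # Move Python's buffered data into the kernel first...
            f.flush()
            # ...then ask the kernel to commit this one file to stable storage.
            os.fsync(f.fileno())
    finally:
        f.close()
```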
[...]
>I mean, the obvious way is:
>
>    try:
>        with open(..., 'w') as f:
>            f.write("stuff")
>    finally:
>        os.sync()

An os.fsync(f.fileno()), called after f.flush() and before the file is
closed, is lower impact: os.sync() requests a sync of all filesystems.
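
A sketch of the quoted example adapted to that narrower call; the function name write_and_fsync is hypothetical. The fsync has to happen inside the with-block, while the descriptor is still open, and after f.flush() has moved Python's own buffer into the kernel.

```python
import os


def write_and_fsync(path, text):
    """Hypothetical helper: write text to path and sync only that file."""
    with open(path, "w") as f:
        f.write(text)
        # Flush Python's userspace buffer so the kernel actually holds the data...
        f.flush()
        # ...then sync just this file, rather than every filesystem as
        # os.sync() would.
        os.fsync(f.fileno())
```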
>so maybe all we really need is a "sync file" context manager.

Aye. Fully agree here, and frankly think this is a "write your own"
situation. Except, of course, that as with all "write your own" one- or
few-liners, there will be suboptimal or buggy ones released. Such as the
"overly wide sync" from your os.sync() above.

Personally I'm -1 on this. A context manager which does f.flush() and
then os.fsync(f.fileno()) seems plenty, and is easy to roll your own.
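
A minimal sketch of such a roll-your-own context manager using contextlib.contextmanager. The name fsync_on_exit is hypothetical; it takes an already-open file object and flushes and fsyncs it when the block exits normally. If the block raises, the exception propagates and no fsync is attempted.

```python
import contextlib
import os


@contextlib.contextmanager
def fsync_on_exit(f):
    """Hypothetical helper: flush and fsync an open file on normal exit."""
    yield f
    # Not reached if the with-block raised: the exception propagates
    # out of the yield instead.
    f.flush()
    os.fsync(f.fileno())
```

Used as "with open(path, 'w') as f, fsync_on_exit(f):", the context managers exit in reverse order, so the fsync runs before open() closes the file.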
Cheers,
Cameron Simpson <[email protected]>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/QSDOU4NA2YIZSOKM6OJKCBSEVMMXMRVZ/