On Sunday 06 March 2011, Thibaut VARENE wrote:
> On Sat, Mar 5, 2011 at 4:22 PM, Martin Steigerwald
> <mar...@lichtvoll.de> wrote:
> > Since, after losing my uptime records once again due to a crash while
> > testing kernels, I am so through with it that it is not even funny
> > anymore, since Theodore Ts'o said that one should not fear fsync() [1]
> > - especially not with Ext4 - and since I prefer not to lose my uptime
> > records over and over and over again, I built a test version using
> > fsync() at the location this bug report is about:
> As far as I understand it, upstream isn't willing to consider a fsync()
> patch which is /bad/ for all the previously mentioned reasons, which
> the post you quote only marginally addresses. Uptimed doesn't need
> atomicity nor durability: it keeps a *backup* of its previous
> database. Also, to put things in a little more perspective: uptimed
> doesn't only run on Linux. It runs on a variety of other platforms,
> where fsync() may have a greater cost than what Ted suggests. And
> finally, and probably more to the point:
> 
> FSYNC(2)                  BSD System Calls Manual                  FSYNC(2)
> 
> [...]
> 
>      Note that while fsync() will flush all data from the host to the
>      drive (i.e. the "permanent storage device"), the drive itself may
>      not physically write the data to the platters for quite some time
>      and it may be written in an out-of-order sequence.
> 
>      Specifically, if the drive loses power or the OS crashes, the
>      application may find that only some or none of their data was
>      written.  The disk drive may also re-order the data so that later
>      writes may be present, while earlier writes are not.

This suggests that BSD does not have any notion of barriers or explicit 
cache flushes, which is a pity - but to some extent it is also an OS 
problem. If the OS doesn't guarantee anything, how could the application?

Honestly, what's the point of an fsync() that doesn't work? They might as 
well make it a "return 0" implementation.

> This explains that even fsync() cannot /certify/ that the data will hit
> the disk in the event of a crash (this is especially true with today's
> larger on-disk caches).

Well, that the data is *immediately written* is not the point of it at 
all, as I understand it. But to my understanding, at least Linux will 
guarantee that the fsync() has completed *before* the file is renamed. 
And that is the only guarantee that matters here. That is what barriers 
were made for up to Linux 2.6.36, and what explicit cache flushes have 
been made for since kernel 2.6.37.

That is the whole point of it: a guaranteed order of writes down to the 
disk, including explicit flushes of the disk cache or FUA requests.

> I'm absolutely against such a patch, which is the wrong solution to
> this problem (and no, I'm not going to add a patch to tune for a
> specific filesystem - not everyone uses ext4 - especially not to work
> around system crashes, which, *again*, do not constitute a "normal use
> of the system").

Systems crash. They aren't perfect; that's a reality. Desktop machines 
where people test this and that are probably affected more often than 
servers, but a server can face a power outage or the like as well. Either 
software is written with something like that in mind, or it is broken. 
That is at least my opinion on the matter.

Rejecting a simple fix that potentially addresses a data loss issue like 
this doesn't help solve the problem. Maybe the fix could be put behind a 
conditional define so that it is only compiled in on Linux targets.

> As stated before, the correct solution would be to add another layer of
> checks during daemon startup, which would assert that the file it's
> reading is valid (i.e. to begin with "not empty" and "has parseable
> data"), and fall back to the backup copy otherwise. This, by design,
> is the correct approach and has /none/ of the drawbacks of your
> fsync() patch.
> 
> I would gladly review such a patch.

But that is at least an offer, and it would help BSD as well. Let's see 
what I do on the next rainy day during my holidays in Sevilla.

But until then I have a version that likely works, and I will keep my 
fork that way as long as it serves the purpose of keeping my uptime 
record data safe in an easy and simple way. It fixes the issue *now*, not 
at some point in the future. And at least I have done something effective 
about it in the time I had at hand.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
