Bug#482643: "unsafe" update of /var/spool/uptimed/records cause file losses

Thibaut VARENE Sat, 24 May 2008 05:35:32 -0700

tags 482643 moreinfo unreproducible
thanks

On Sat, May 24, 2008 at 9:57 AM, Raphael Manfredi
<[EMAIL PROTECTED]> wrote:


> Apparently, uptimed is not updating its /var/spool/uptimed/records file
> in a safe way: after a system crash, I have seen on many occasions fsck
> clearing the corresponding inode and losing all the uptime records.
>
> The update procedure should be something like:
>
>        rename /var/spool/uptimed/records as /var/spool/uptimed/records.old
>        write new data to /var/spool/uptimed/records.new
>        fsync()
>        rename /var/spool/uptimed/records.new as /var/spool/uptimed/records
>        unlink /var/spool/uptimed/records.old
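The five steps above can be sketched in C with plain POSIX calls. This is a minimal illustration, not uptimed code: `save_with_backup()` and `write_all()` are hypothetical names, the records are assumed to arrive as an already-formatted buffer, and paths are assumed to fit in 4096 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper following the proposed procedure:
 *   1. rename path -> path.old   (keep the last good copy)
 *   2. write the new data to path.new
 *   3. fsync() path.new
 *   4. rename path.new -> path
 *   5. unlink path.old
 * Returns 0 on success, -1 on error.
 */
int save_with_backup(const char *path, const char *data, size_t len)
{
    char old_path[4096], new_path[4096];
    int n1 = snprintf(old_path, sizeof old_path, "%s.old", path);
    int n2 = snprintf(new_path, sizeof new_path, "%s.new", path);
    int fd;

    if (n1 < 0 || n2 < 0 ||
        (size_t)n1 >= sizeof old_path || (size_t)n2 >= sizeof new_path)
        return -1;

    /* Step 1: a missing file (first run) is not an error. */
    if (rename(path, old_path) != 0 && errno != ENOENT)
        return -1;

    /* Steps 2 and 3: the data must reach the disk before it replaces anything. */
    fd = open(new_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, len) != 0 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 4: rename() replaces the name atomically. */
    if (rename(new_path, path) != 0)
        return -1;

    /* Step 5: the backup is no longer needed. */
    if (unlink(old_path) != 0 && errno != ENOENT)
        return -1;
    return 0;
}
```

If the process dies between steps 1 and 4, only records.old may be left, which is why the proposal pairs this with a startup check. The sketch also does not fsync() the containing directory, which on Linux is generally needed to make the renames themselves durable.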

It's pretty much what is done, save for the fsync():

void save_records(int max, time_t log_threshold) {
        FILE *f;
        Urec *u;
        int i = 0;

        f = fopen(FILE_RECORDS".tmp", "w");
        if (!f) {
                printf("uptimed: cannot write to %s\n", FILE_RECORDS);
                return;
        }

        for(u=urec_list; u; u = u->next) {
                /* Ignore everything below the threshold */
                if (u->utime >= log_threshold) {
                        fprintf(f, "%lu:%lu:%s\n", (unsigned long)u->utime,
                                (unsigned long)u->btime, u->sys);
                        /* Stop processing when we've logged the max number specified. */
                        if ((max > 0) && (++i >= max)) break;
                }
        }
        fclose(f);
        rename(FILE_RECORDS".tmp", FILE_RECORDS);
}
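For comparison, closing only the fsync() gap in the save_records() function above would not require restructuring it. A minimal sketch, assuming a hypothetical helper `commit_records()` that takes over the final fclose()/rename() pair and renames only once the data is known to be on disk:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the contents of f durable, close it, and only
 * then rename tmp_path over final_path. On any error the existing
 * final_path is left untouched and tmp_path is removed.
 * Returns 0 on success, -1 on error.
 */
int commit_records(FILE *f, const char *tmp_path, const char *final_path)
{
    int ok = 1;

    /* fflush() hands stdio's buffer to the kernel; fsync() then asks the
     * kernel to write the file's data to the storage device. fclose()
     * flushes the buffer but never calls fsync(). */
    if (fflush(f) != 0 || fsync(fileno(f)) != 0)
        ok = 0;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp_path);
        return -1;
    }
    return rename(tmp_path, final_path) == 0 ? 0 : -1;
}
```

In save_records() this would amount to replacing the last two statements with `commit_records(f, FILE_RECORDS".tmp", FILE_RECORDS);`, at the cost of one fsync() per save.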


In my opinion, you don't want a process calling fsync() every 60s anyway,
and certainly not a non-critical process such as uptimed.

> And upon startup, if a /var/spool/uptimed/records.old file is present
> it should be used as the main database because it means the above procedure
> was somehow interrupted by a crash.

Keeping the old records db as a fallback is indeed an idea worth
investigating. Cc'ing the upstream maintainer.
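The startup rule suggested above can be sketched as a small check run before the database is loaded. This is an illustration only: `choose_records_file()` is a hypothetical name, and it assumes the ".old" naming of the proposed save procedure.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical startup check: if "<path>.old" still exists, the save
 * sequence was interrupted, so the backup is treated as the copy to load.
 * Writes the chosen file name into out (size outlen) and returns 0,
 * or returns -1 if a name does not fit.
 */
int choose_records_file(const char *path, char *out, size_t outlen)
{
    char old_path[4096];
    int n = snprintf(old_path, sizeof old_path, "%s.old", path);

    if (n < 0 || (size_t)n >= sizeof old_path)
        return -1;
    n = snprintf(out, outlen, "%s",
                 access(old_path, F_OK) == 0 ? old_path : path);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

Preferring the backup may discard the most recent save when the crash happened after the final rename, which seems an acceptable trade for uptime records.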

> To minimise disruption, I've increased the interval between database saves
> to 600 seconds on my systems.  Note that my machines do not crash frequently,
> but when they do, it is usually because one of the IDE disks loses an
> interrupt and Linux then hangs, requiring a reboot (the software watchdog is
> of no help here; the hang seems to be at the kernel level).

Yeah, I've been planning to increase the default interval in a future
upload. 60s is insane.

> I'm flagging this bug as "important" because it somewhat defeats the
> purpose of having a tool record uptimes if a sudden crash causes years of
> history to get trashed. (The data is not valuable enough to justify a tape
> restore.)

Agreed. What filesystem are you using on /var?

Thanks

T-Bone


-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


