On Saturday, 19 March 2011, Thibaut VARENE wrote:
> On Sat, Mar 19, 2011 at 5:47 PM, Martin Steigerwald <mar...@lichtvoll.de> wrote:
> > On Sunday, 06 March 2011, Thibaut VARENE wrote:
> >> Would you be kind enough to test it? I don't crash my systems as
> >> often as you do, and they're setup in a way that apparently makes
> >> it impossible for me to reproduce this bug.
> >
> > wanted to integrate it into the package well - by splitting up to
> > different quilt patches, before adding this patch -, but then other
> > things
>
> Doesn't make sense to me: your patch was a 1-liner. Anyway...
>
> > I am not likely to invest much work into this the next time as I will
> > be holding trainings and do lots of other stuff as my holidays end
> > on Monday.
>
> Good for you. I take it it's a negative answer to my previous inquiry?
>
> > You wrote you build it already. Do you have that package still
> > available? Then I'd test it after I am convinced that the fsync()
> > based version does what it should.
>
> Well no, I don't have the test build anymore. Since you were able to
> test your own patch, I assumed you'd be capable of testing another
> one.
>
> > I am still not convinced that adding those checks alone is the
> > correct solution. The original problem is that the records file is
> > truncated to
>
> [snipped blah]
>
> > That said I am still willing to test whether those checks will work
> > as reliable as the fsync() did so far.
>
> Then please do so, and kindly report when you've done it.
>
> > I think a software should be written with the irregular case in mind
> > and that this is a key factor that differentiates mediocre or quite
> > good software from excellent software. Humans make errors, thus
> > computers, their power sources, and programs constructed and
> > developed by humans will fail, too. Software without that in mind is
> > asking for trouble.
>
> Whatever. Even though you have a point, it would be silly to knock
> down a fly with a hammer. When "fixes" for "irregular cases" get in
> the way of "regular cases functioning", there is a problem. uptimed
> must run on many platforms, just not on Linux/ext4.
>
> Bottomline: My patch affects uptimed at startup. Your patch affects
> uptimed on /each and every write/.
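
As a rough sketch of what such a startup-time check could look like (in Python for brevity, not uptimed's actual C code): the function name `refresh_backup` and the rule "skip the backup when the records file is missing or empty" are assumptions for illustration, not the patch under discussion.

```python
import os
import shutil


def refresh_backup(records: str, backup: str) -> bool:
    """Copy `records` over `backup` unless `records` is missing or empty.

    Hypothetical helper: returns True if the backup was refreshed,
    False if the existing backup was left untouched.
    """
    try:
        size = os.path.getsize(records)
    except OSError:
        # No readable records file: keep whatever backup exists.
        return False
    if size == 0:
        # A zero-length records file is treated as damaged, so the
        # (possibly good) backup is not overwritten with it.
        return False
    shutil.copy2(records, backup)
    return True
```

A check like this runs once per start and costs nothing during normal operation, but it only protects the backup; it does not stop the live file from being truncated in the first place.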

While I still think fsync() on Linux is a good thing, I have to admit
that my patch does *not* work. Maybe I should have called fsync() in all
places, but I am not convinced that this would have worked. Maybe with
current Ext4 the fsync() guarantee that I thought it gave is really
borked, even on Linux.

The console snippet below - partly stripped to 70 characters - also
clearly shows that a patch that tells uptimed to never overwrite its
backup with an empty file, like the one proposed, *is* necessary.

So my approach failed. But I wonder whether your approach would have
done more good than my daily backups, since uptimed doesn't do a regular
backup of the configuration, but only on stopping it, maybe also on
starting it. Thus I could easily have lost more than about one day of my
uptime statistics.

And I just cannot get my head around the idea that it isn't possible to
write a file of a few KiB in such a safe manner that it doesn't get
truncated. This is just insane.

Now trying to fix up the last boot record manually.

shambhala:~> uprecords
     #               Uptime | System
----------------------------+-------------------------------------------
->   1     0 days, 00:12:04 | Linux 2.6.38.5-tp42-snap  Wed May 11 20:20
----------------------------+-------------------------------------------
NewRec     0 days, 00:12:03 | since                     Wed May 11 20:20
    up     0 days, 00:12:04 | since                     Wed May 11 20:20
  down     0 days, 00:00:00 | since                     Wed May 11 20:20
   %up              100.000 | since                     Wed May 11 20:20
shambhala:~> cd /var/spool/uptimed
shambhala:/var/spool/uptimed> ls -lh
insgesamt 16K
-rw-r--r-- 1 daemon daemon  11 11. Mai 20:20 bootid
-rw-r--r-- 1 daemon daemon  62 11. Mai 20:31 records
-rw-rw-rw- 1 daemon daemon 757  4. Mär 21:10 records-2011-03-04
-rw-r--r-- 1 daemon daemon  62 11. Mai 20:26 records.old
shambhala:/var/spool/uptimed> /etc/init.d/uptimed stop
Stopping uptime daemon: uptimed.
shambhala:/var/spool/uptimed> cp -p /home/martin/Backup/uptimed/records-2011-05-10 .
shambhala:/var/spool/uptimed> ls -l
insgesamt 20
-rw-r--r-- 1 daemon daemon   11 11. Mai 20:20 bootid
-rw-r--r-- 1 daemon daemon   62 11. Mai 20:32 records
-rw-rw-rw- 1 daemon daemon  757  4. Mär 21:10 records-2011-03-04
-rw-rw-rw- 1 martin martin 3015 10. Mai 22:10 records-2011-05-10
-rw-r--r-- 1 daemon daemon   62 11. Mai 20:31 records.old
shambhala:/var/spool/uptimed> cp -p cp -p records.old records-2011-05-11
cp: angegebenes Ziel „records-2011-05-11“ ist kein Verzeichnis
shambhala:/var/spool/uptimed#1> cp -p records.old records-2011-05-11
shambhala:/var/spool/uptimed> diff -u records records.old
--- records     2011-05-11 20:32:43.995898664 +0200
+++ records.old 2011-05-11 20:31:13.308372746 +0200
@@ -1 +1 @@
-751:1305138013:Linux 2.6.38.5-tp42-snap-debug+resv-size-dirty
+661:1305138013:Linux 2.6.38.5-tp42-snap-debug+resv-size-dirty
shambhala:/var/spool/uptimed#1> cp records-2011-05-10 records
shambhala:/var/spool/uptimed> /etc/init.d/uptimed start
Starting uptime daemon: uptimed.
shambhala:/var/spool/uptimed> uprecords
     #               Uptime | System
----------------------------+-------------------------------------------
     1    18 days, 11:00:44 | Linux 2.6.37-tp42-rtime-  Thu Jan 13 12:44
     2    13 days, 20:58:39 | Linux 2.6.37-rc8-tp42     Thu Dec 30 15:44
     3    12 days, 00:05:29 | Linux 2.6.37-tp42-rtime-  Mon Jan 31 23:50
     4     9 days, 17:09:09 | Linux 2.6.37-tp42-rtime-  Sat Feb 12 23:57
     5     8 days, 20:53:21 | Linux 2.6.38.3-tp42-snap  Mon Apr 18 21:51
     6     8 days, 15:40:00 | Linux 2.6.37-tp42-rtime-  Tue Feb 22 21:18
     7     7 days, 20:04:48 | Linux 2.6.38-tp42-snapsh  Thu Mar 17 23:47
     8     7 days, 08:19:50 | Linux 2.6.37-rc7-tp42-at  Wed Dec 22 13:02
     9     6 days, 13:27:02 | Linux 2.6.38-rc7-tp42-sn  Tue Mar  8 10:23
    10     5 days, 23:32:20 | Linux 2.6.38.2-tp42-snap  Tue Mar 29 22:18
----------------------------+-------------------------------------------
->  44     0 days, 00:15:24 | Linux 2.6.38.5-tp42-snap  Wed May 11 20:20
----------------------------+-------------------------------------------
1up in     0 days, 00:00:55 | at                        Wed May 11 20:36
t10 in     5 days, 23:16:57 | at                        Tue May 17 19:52
no1 in    18 days, 10:45:21 | at                        Mon May 30 07:20
    up   149 days, 08:08:47 | since                     Sat Dec 11 13:27
  down     1 day , 21:59:49 | since                     Sat Dec 11 13:27
   %up               98.733 | since                     Sat Dec 11 13:27

(Well, the easiest thing would be to drop uptimed from my notebooks and
be done with it. I might consider that.)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
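
On the question above of how a small file can be written without risking truncation: the pattern usually suggested is to write a temporary file in the same directory, fsync() it, rename it over the target, and then fsync() the directory. Below is a minimal sketch in Python rather than uptimed's C, assuming a POSIX system; the function name `write_records_atomically` and the `.records-` temp prefix are made up for illustration, and this is not what uptimed or either patch actually does.

```python
import os
import tempfile


def write_records_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` via a temp file and rename (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the rename below cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".records-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents to disk first
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is recorded on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this ordering a reader should see either the complete old file or the complete new one, never an empty one, although how well that holds after a power loss still depends on the filesystem and the storage hardware honouring fsync(). The cost is one extra fsync() pair per write, which is the overhead objected to in the quoted text.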