Re: [paludis-user] Rebuilding everything with cave --resume-file is awfully slow
On 01/01/2011 01:50 AM, Ciaran McCreesh wrote:
> On Fri, 31 Dec 2010 12:51:57 +0100 Rodolphe Rocca wrote:
>> Of course every user would appreciate things going a little bit
>> faster if possible here :-)
>
> The next major release includes a workaround for some libstdc++
> stupidity that tends to make it try to write() one byte at a time.
> That'll make a fair difference there...

paludis-0.58 is much faster indeed :-) Thank you for this work!

___
paludis-user mailing list
paludis-user@lists.pioto.org
http://lists.pioto.org/mailman/listinfo/paludis-user
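To illustrate why writing one byte per write() call is costly, here is a minimal Python sketch. It is not the libstdc++ workaround itself and says nothing about how Paludis implements it; `CountingRaw`, `write_bytewise` and `write_buffered` are hypothetical names, and the 50,000-byte payload is an arbitrary stand-in for a resume file. It only counts how many low-level writes reach the underlying stream with and without a buffer in front of it.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a file descriptor that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def write_bytewise(raw, payload):
    # One low-level write per byte: the behaviour described in the message.
    for i in range(len(payload)):
        raw.write(payload[i:i + 1])


def write_buffered(raw, payload, buffer_size=8192):
    # Same byte-at-a-time producer, but a buffer batches the low-level writes.
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for i in range(len(payload)):
        buffered.write(payload[i:i + 1])
    buffered.flush()


payload = b"x" * 50_000

unbuffered = CountingRaw()
write_bytewise(unbuffered, payload)

batched = CountingRaw()
write_buffered(batched, payload)

print("unbuffered write calls:", unbuffered.calls)
print("buffered write calls:  ", batched.calls)
```

Both streams end up with identical contents; only the number of calls into the underlying stream differs, which is where the time goes when each call is a system call.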
Re: [paludis-user] Rebuilding everything with cave --resume-file is awfully slow
On 12/31/2010 08:59 AM, Ciaran McCreesh wrote:
> On Wed, 29 Dec 2010 12:41:20 +0100 Rodolphe Rocca wrote:
>> Each time an archive's integrity is checked, the resume file is written.
>
> No, it's written after a successful fetch.
>
>> Is everything written to the resume file really meaningful resume information?
>
> Yes.
>
>> The paludis client is many times faster here.
>
> The Paludis client doesn't store continue-on-failure information in there, which means it gets things wrong later on.

Thank you for your answer, Ciaran. Correctness is better than quick-and-dirtiness, I agree. Of course every user would appreciate things going a little bit faster if possible here :-)

>> Moreover, if I press CTRL-C, the resume file gets corrupted. Isn't there a signal handler catching this signal and waiting for the resume file to be completely written before exiting gracefully?
>
> No.

Is it a design choice or a question of priority?

>> NB: The reason why I want to press CTRL-C here is that the installation is stuck at etqw-data, waiting for me to insert a data DVD, but it fails to find it. So I want to modify the resume file by hand to remove etqw-data. Do you see any better way to handle this kind of situation?
>
> You can't edit resume files.

Any advice on how to handle this situation? At this time I don't know how to rebuild my system with cave without uninstalling a few packages like etqw-data...

-- 
Rodolphe Rocca
[paludis-user] Rebuilding everything with cave --resume-file is awfully slow
Hi,

rebuilding my whole system with the following command takes ages:

$ cave resolve --resume-file /var/tmp/cave-resume-XX installed-packages -e -x --continue-on-failure always

I have 1250 installed packages. Each time an archive's integrity is checked, the resume file is written.

[...]
Checking 'acct-6.5.4.tar.gz'... ok
Done fetch for sys-process/acct-6.5.4-r2:0::gentoo
Writing resume information to /var/tmp/cave-resume-1...
[...]

My resume file is about 50 MB; writing it takes 30 seconds each time. (1250 * 30) / 3600 is roughly 10 hours just to check file integrity!

Is it really necessary to rewrite the resume file after each integrity check? Is everything written to the resume file really meaningful resume information? The paludis client is many times faster here.

Moreover, if I press CTRL-C, the resume file gets corrupted. Isn't there a signal handler catching this signal and waiting for the resume file to be completely written before exiting gracefully?

NB: The reason why I want to press CTRL-C here is that the installation is stuck at etqw-data, waiting for me to insert a data DVD, but it fails to find it. So I want to modify the resume file by hand to remove etqw-data. Do you see any better way to handle this kind of situation?

-- 
Rodolphe Rocca
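The estimate in this message can be reproduced with a few lines of Python. The two input figures are the ones quoted above (1250 packages, 30 seconds per resume-file write); the function name is hypothetical, and the calculation assumes one resume-file write per package, as the message does.

```python
PACKAGES = 1250
SECONDS_PER_RESUME_WRITE = 30


def resume_write_overhead_hours(packages, seconds_per_write):
    """Total time spent rewriting the resume file, in hours."""
    return packages * seconds_per_write / 3600


hours = resume_write_overhead_hours(PACKAGES, SECONDS_PER_RESUME_WRITE)
print(f"{hours:.1f} hours")  # prints "10.4 hours"
```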
Re: [paludis-user] cave : libX11-1.4.0 issue
On 12/12/2010 03:10 PM, Ciaran McCreesh wrote:
> On Sun, 12 Dec 2010 11:43:33 +0100 Rodolphe Rocca wrote:
>> Looking at the vlc ebuild I can see:
>>
>>   opengl? ( virtual/opengl || (=x11-libs/libX11-1.3.99.901 ) )
>>
>> Which looks conceptually good since libX11-1.4 lost its xcb use flag, but... I'm not an ebuild syntax expert. There is a similar line in the pulseaudio ebuild. Is it an ebuild bug or a cave one?
>
> It's an ebuild bug. || ( a b ) means "prefer a, but b is fine too".

Mmhh, that's weird. Pulseaudio has been fixed in the tree. I've fixed vlc in a local repository. I've managed to reinstall both individually without trouble:

$ cave resolve pulseaudio vlc -x1

This, though, did not trigger the libX11-1.4.0 bump. Checking the deps:

$ paludis -q pulseaudio --show-deps | grep -A1 -B1 X11
|| ( >=x11-libs/libX11-1.4.0 =x11-libs/libX11-1.4.0 =x11-libs/libX11-1.3.99.901 =x11-libs/libX11-1.3.99.901
    Did not meet =x11-libs/libX11-1.4.0, never using existing, installing to / from target
    Did not meet >=x11-libs/libX11-1.3.99.901, use existing if possible, installing to / from media-libs/mesa
  * x11-libs/libX11-1.3.6:0::gentoo
    Did not meet =x11-libs/libX11-1.4.0, never using existing, installing to / from target
    Did not meet >=x11-libs/libX11-1.3.99.901, use existing if possible, installing to / from media-libs/mesa
  * x11-libs/libX11-1.4.0:0::gentoo
    Did not meet possible, installing to / from media-video/vlc
  * x11-libs/libX11-:0::x11
    Masked by repository
      Repository masked /var/paludis/repositories/x11/profiles/package.mask
      Don't let people install these accidentally
    Masked by user
    Did not meet =x11-libs/libX11-1.4.0, never using existing, installing to / from target
    Did not meet possible, installing to / from media-video/vlc
Re: [paludis-user] cave : libX11-1.4.0 issue
On 12/12/2010 03:10 PM, Ciaran McCreesh wrote:
> On Sun, 12 Dec 2010 11:43:33 +0100 Rodolphe Rocca wrote:
>> Looking at the vlc ebuild I can see:
>>
>>   opengl? ( virtual/opengl || (=x11-libs/libX11-1.3.99.901 ) )
>>
>> Which looks conceptually good since libX11-1.4 lost its xcb use flag, but... I'm not an ebuild syntax expert. There is a similar line in the pulseaudio ebuild. Is it an ebuild bug or a cave one?
>
> It's an ebuild bug. || ( a b ) means "prefer a, but b is fine too".

What would be the correct syntax then? I tried:

  opengl? ( virtual/opengl || (=x11-libs/libX11-1.3.99.901 ) )

But it looks like a real syntax error, not a semantic mistake.

Thank you for your help, so that I can complete:
http://bugs.gentoo.org/show_bug.cgi?id=348518

Oh, and feel free to add your word to this issue since you have much more ammo than I do on the subject.
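As a toy model of the sentence "|| ( a b ) means prefer a, but b is fine too", the following Python sketch picks the first alternative that can be satisfied. It is only an illustration of that one sentence, not cave's resolver: `pick_from_any_of` and the `is_satisfiable` callback are hypothetical, and real dependency resolution involves masks, slots and existing installs that are ignored here.

```python
def pick_from_any_of(alternatives, is_satisfiable):
    """Return the first alternative that can be satisfied, or None.

    Earlier entries are preferred; later ones are acceptable fallbacks.
    """
    for spec in alternatives:
        if is_satisfiable(spec):
            return spec
    return None


# "a" is preferred whenever it can be satisfied...
both_available = pick_from_any_of(["a", "b"], lambda spec: spec in {"a", "b"})

# ...but "b" is accepted when "a" cannot be.
only_b_available = pick_from_any_of(["a", "b"], lambda spec: spec in {"b"})

print(both_available, only_b_available)  # prints "a b"
```

Under this reading the order of the alternatives expresses a preference, which is why the order in which an ebuild lists them matters.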
Re: [paludis-user] cave : libX11-1.4.0 issue
On 12/12/2010 03:10 PM, Ciaran McCreesh wrote:
> On Sun, 12 Dec 2010 11:43:33 +0100 Rodolphe Rocca wrote:
>> Looking at the vlc ebuild I can see:
>>
>>   opengl? ( virtual/opengl || (=x11-libs/libX11-1.3.99.901 ) )
>>
>> Which looks conceptually good since libX11-1.4 lost its xcb use flag, but... I'm not an ebuild syntax expert. There is a similar line in the pulseaudio ebuild. Is it an ebuild bug or a cave one?
>
> It's an ebuild bug. || ( a b ) means "prefer a, but b is fine too".

Before filing an issue in Gentoo's bugzilla, I'd like to understand: if b is fine, why does cave reject it?

  * x11-libs/libX11-1.4.0:0::gentoo
    Did not meet possible, installing to / from media-sound/pulseaudio
    Did not meet possible, installing to / from media-video/vlc
[paludis-user] cave : libX11-1.4.0 issue
Hi,

trying to update my gentoo to libX11-1.4 / mesa-7.9, I'm facing this issue (paludis-0.56):

! x11-libs/libX11
    Reasons: target (installed-packages::installed), app-emulation/emul-linux-x86-xlibs, dev-dotnet/libgdiplus, 126 more
    Unsuitable candidates:
      * x11-libs/libX11-1.3.4:0::gentoo
        Did not meet >=x11-libs/libX11-1.3.99.901, use existing if possible, installing to / from media-libs/mesa
      * x11-libs/libX11-1.3.6:0::gentoo
        Did not meet >=x11-libs/libX11-1.3.99.901, use existing if possible, installing to / from media-libs/mesa
      * x11-libs/libX11-1.4.0:0::gentoo
        Masked by user
        Did not meet possible, installing to / from media-sound/pulseaudio
        Did not meet possible, installing to / from media-video/vlc
      * x11-libs/libX11-:0::x11
        Masked by repository
          Repository masked /var/paludis/repositories/x11/profiles/package.mask
          Don't let people install these accidentally
        Masked by user
        Did not meet possible, installing to / from media-sound/pulseaudio
        Did not meet possible, installing to / from media-video/vlc

Looking at the vlc ebuild I can see:

  opengl? ( virtual/opengl || ( >=x11-libs/libX11-1.3.99.901 ) )

Which looks conceptually good since libX11-1.4 lost its xcb use flag, but... I'm not an ebuild syntax expert. There is a similar line in the pulseaudio ebuild. Is it an ebuild bug or a cave one?

-- 
Rodolphe
Re: [paludis-user] Paludis, ext4, delayed allocation, zero file length
Richard Freeman wrote:
> Rodolphe Rocca wrote:
>> Ciaran McCreesh wrote:
>>> On Sun, 04 Oct 2009 15:42:27 +0200 Rodolphe Rocca wrote:
>>>> Maybe a forgotten fsync after writing the SLOT file?
>>>
>>> No, it's unrelated. We don't fsync anything, at all.
>>
>> I'm not a filesystem expert, but isn't that the problem?
>
> First - apologies for replying to an otherwise-closed thread. Glad to see that Ciaran isn't planning on making any changes.
>
> Since this didn't come up I figured I'd add it - many people consider the argument advanced in that blog you referenced bogus. That includes Linus Torvalds. I believe the most recent kernels include an ext4 patch that forces ordered writes by default, as was the case with ext3, as a result.
>
> Applications generally shouldn't use fsync at all, except in very specific circumstances. If the application implements its own transactions (particularly across a network where one server might go down and another might not), then it is appropriate to use fsync to ensure transactions are synchronized. That comes with a significant penalty to filesystem performance.
>
> As an example of why fsync shouldn't be used, consider mythtv. By default it fsyncs video recordings every couple of seconds. On a loaded system with RAID that means that the disks are constantly seeking. On my backend that resulted in video loss due to buffer overruns. When I edited out the fsyncs from the source, the problem went away, as the underlying device drivers could pool data in a sane way and not rewrite the RAID5 stripes every time 1/10th of a stripe changed.
>
> Filesystems should safely store files and avoid zeroing out modified files any time they are written to normally. It shouldn't be up to application developers to figure out the implementation-level details of the filesystems they are running on. Sure, it is best to not have a system crash at all. However, if it is going to fail there are better ways of doing it than what the ext4 team came up with. That is why Linus overrode them in the kernel.

Thanks for your answer, Richard. I'm inclined to share your opinion about a filesystem zeroing files upon a system crash. But this is an extreme and complex situation and there is still room for debate.

Outside of my system-crash context, it occurred to me today that the VDB repository not being updated (almost) atomically might be an issue beyond system crashes. Correct me if I'm wrong, but if _paludis_ crashes in the middle of a VDB update, /var/db/pkg will be left in an inconsistent state.

Fortunately, having been a paludis user for a long time, I happily admit that a paludis crash is very rare. Actually, the few paludis crashes that I had in the past all came from a broken environment :-) But in the end, it just happens. Isn't it an issue?
Re: [paludis-user] Paludis, ext4, delayed allocation, zero file length
Ciaran McCreesh wrote:
> On Sun, 04 Oct 2009 21:53:11 +0200
> Rodolphe Rocca wrote:
>>> or if you don't cleanly power off your computer after
>>> unmounting a filesystem, things will break.
>> but not this one.
>>
>> AFAIK when a fs is unmounted, a sync happens on the virtual device and
>> the unmount operation is blocking until all data has been flushed to
>> the disk (write cache). After what it's up to the hard drive firmware
>> to flush the write cache if it is enabled. So normally when unmount
>> returns, data is at least in the write cache of the drive. A power off
>> during the internal write cache flushing process will trigger data
>> loss. Not sure if the reboot will too.
>
> Unmounting a filesystem doesn't force the disk to really really write
> its contents properly. In the general case, there's absolutely nothing
> a computer can do to force data to be really really written, whatever
> that even means.

That's actually what I said...

>> Concerning the merged files I agree that not much more can be done to
>> make paludis more resilient to a system crash.
>>
>> But what about /var/cache/db ?
>
> Assuming you mean /var/db/pkg...

Right sorry.

>> What I'm thinking about is letting paludis work as much as possible
>> in a temporary vdb directory and rename this directory in the safest
>> possible way once everything is done.
>
> No point.
>
>> Next time paludis runs, it could be able to detect inconsistencies and
>> automatically fix them. Something like :
>
> A full VDB scan is slow, and might require permissions Paludis
> doesn't currently have. That's really not something we want to do.

Ok. The scan could be handled by an explicit option (--check-vdb).

>> Would it be insane ?
>
> Yes, it would.
>
> First, there is absolutely nothing whatsoever that userland software
> can do to deal with the user randomly powering off their computer.
>
> Second, this is a huge amount of effort to avoid something that is
> entirely caused by a particularly unpleasant case of user error.

Not a user error, a system or hardware error.

Anyway I surrender, thanks a lot for your answers, I understand your point of view.
Re: [paludis-user] Paludis, ext4, delayed allocation, zero file length
Ciaran McCreesh wrote:
> On Sun, 04 Oct 2009 18:25:14 +0200
> Rodolphe Rocca wrote:
>> I understand that in case of a machine crashing, a package gets
>> partially installed, as long as the package files content is either
>> the old one or the new one. This can potentially cause trouble, but
>> has a higher resilience than ending with some empty files in VDB
>> causing paludis to be unusable until a manual intervention
>> in /var/cache/db.
>
> No, it can also result in partially written new files being installed.
>
>> I tried to turn the auto_da_alloc ext4 mount option on as it is
>> supposed to fix the "zero file length" issue for some file replacing
>> patterns. A few minutes later I got a new crash and empty files
>> again. So I guess the corruption is unrelated or paludis uses a
>> rename pattern which is not detected by ext4.
>
> The problem is not a rename pattern, and it has nothing to do with
> allocation. It's quite simple: if you don't cleanly unmount a
> filesystem,

OK, I understand that...

> or if you don't cleanly power off your computer after
> unmounting a filesystem, things will break.

but not this one.

AFAIK when a fs is unmounted, a sync happens on the virtual device and the unmount operation blocks until all data has been flushed to the disk (write cache). After which it's up to the hard drive firmware to flush the write cache if it is enabled. So normally when unmount returns, data is at least in the write cache of the drive. A power off during the internal write cache flushing process will trigger data loss. Not sure if the reboot will too.

> Where Paludis does renames for merging, it does so to prevent a
> partially written executable from existing. If anyone tried to launch
> an executable when it was partially written, weird things would happen;
> using a rename removes that case, although there is a small amount of
> time between when the old executable is removed and the new one is
> renamed into place.
> Handling unclean unmounts is not a consideration.

Concerning the merged files, I agree that not much more can be done to make paludis more resilient to a system crash.

But what about /var/cache/db?

What I'm thinking about is letting paludis work as much as possible in a temporary vdb directory and rename this directory in the safest possible way once everything is done.

For some "paludis -i pkg" command:

  1. d1 = "/var/cache/db/cat/pkg"
  2. d2 = "/var/cache/db/cat/.pkgtmp"
  3. cp -R $d1/. $d2
  4. compile, merge files etc.
  5. update DB entries in $d2
  6. fsync files in $d2
  7. remove $d1
  8. rename $d2 $d1

I even imagine a mechanism that would make a crash happening between 7 and 8 recoverable. Something like:

  7. move $d1 $d11 (= /var/cache/db/cat/.pkg.original)
  8. rename $d2 $d1
  9. remove $d11

Next time paludis runs, it could detect inconsistencies and automatically fix them. Something like:

  if $d11 exists:
    if ! $d1 exists:
      # crash between 7 and 8
      mv $d11 $d1
    else
      # crash between 8 and 9
      remove $d11
    fi
  fi

I'm getting a bit tired so I probably miss some cases, but you get the picture.

Would it be insane?

>> Now I'm at the point where I disabled ext4 delayed allocation
>> (nodelalloc mount option). Let's see what happens.
>
> Things will still break if you randomly power off your computer. The
> only difference is that the breakage may display itself slightly
> differently. You can still end up with partially written or empty
> files; you may just not notice them as frequently.

Agreed.
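The proposal in this message (steps 7 to 9 plus the recovery check) can be sketched in Python as follows. This is only an illustration of the idea as stated in the message, not something Paludis does; the function names are hypothetical, the directory names mirror the ones in the message but live under a temporary directory, and step 6 (fsync) is deliberately left out to keep the sketch short.

```python
import os
import shutil
import tempfile


def swap_in(live, staged, backup):
    """Steps 7-9: park the live directory, rename the staged one in, drop the backup."""
    os.rename(live, backup)    # step 7
    os.rename(staged, live)    # step 8
    shutil.rmtree(backup)      # step 9


def recover(live, backup):
    """Repair whatever a crash between steps 7 and 9 left behind."""
    if not os.path.isdir(backup):
        return "consistent"
    if not os.path.isdir(live):
        os.rename(backup, live)        # crash between 7 and 8: restore the old entry
        return "restored"
    shutil.rmtree(backup)              # crash between 8 and 9: new entry is in place
    return "cleaned"


def write_slot(directory, value):
    with open(os.path.join(directory, "SLOT"), "w") as handle:
        handle.write(value)


def read_slot(directory):
    with open(os.path.join(directory, "SLOT")) as handle:
        return handle.read()


root = tempfile.mkdtemp()
live = os.path.join(root, "pkg")
staged = os.path.join(root, ".pkgtmp")
backup = os.path.join(root, ".pkg.original")

os.mkdir(live)
write_slot(live, "0")
shutil.copytree(live, staged)      # step 3: work on a copy
write_slot(staged, "1")            # step 5: update entries in the copy

os.rename(live, backup)            # simulate a crash right after step 7
after_crash = recover(live, backup)
slot_after_crash = read_slot(live)

swap_in(live, staged, backup)      # a later, uninterrupted run
after_swap = recover(live, backup)
slot_after_swap = read_slot(live)

shutil.rmtree(root)
print(after_crash, slot_after_crash, after_swap, slot_after_swap)
# prints "restored 0 consistent 1"
```

One case the message's recovery check does not cover, and this sketch inherits, is the leftover staged directory after a crash between steps 7 and 8: the old entry is restored, but `.pkgtmp` is still there and would have to be discarded or reused by the next run.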
Re: [paludis-user] Paludis, ext4, delayed allocation, zero file length
Ciaran McCreesh wrote:
> On Sun, 04 Oct 2009 17:25:48 +0200
> Rodolphe Rocca wrote:
>>> No, it's unrelated. We don't fsync anything, at all.
>> I'm not a filesystem expert but isn't it the problem ?
>
> No. There's absolutely no need to call fsync in normal cases. The only
> time you call fsync is if you need to get some kind of synchronisation
> between two different processes.
>
> Your problem is that you killed your system in the middle of doing
> something. That will cause breakage regardless of what an application
> does.

True. But there are different kinds of breakage.

I understand that, when a machine crashes, a package may end up partially installed, as long as each package file's content is either the old one or the new one. This can potentially cause trouble, but it is more resilient than ending up with some empty files in VDB that make paludis unusable until someone intervenes manually in /var/cache/db.

Notice I'm not saying it's necessarily paludis's fault. I'm just opening a discussion to hear your thoughts and share my experience with other potential users ;-)

For the record, a few days ago my /var was on reiserfs. I had several machine crashes of the same kind while running paludis (and other programs), but never ended up with such empty files, and it never broke paludis so badly that manual intervention was required beyond reinstalling the package or removing a temporary directory.

I tried to turn the auto_da_alloc ext4 mount option on, as it is supposed to fix the "zero file length" issue for some file-replacing patterns. A few minutes later I got a new crash and empty files again. So I guess the corruption is unrelated, or paludis uses a rename pattern which is not detected by ext4.

Now I'm at the point where I disabled ext4 delayed allocation (nodelalloc mount option). Let's see what happens.
Re: [paludis-user] Paludis, ext4, delayed allocation, zero file length
Ciaran McCreesh wrote:
> On Sun, 04 Oct 2009 15:42:27 +0200
> Rodolphe Rocca wrote:
>> After a reboot and a regular fsck, I found out that the
>> /var/db/pkg///SLOT was empty (size 0) for a few
>> packages that paludis was installing during the upgrade phase.
>>
>> This issue looks very much like the famous "ext4, delayed allocation
>> and zero file length" issue.
>> I've read many things about the subject, most of them written by the
>> author of ext4, Theodore Ts'o.
>>
>> What I've read from Theodore and the fact that the corruption always
>> happen on the same file (SLOT) makes me think that there may be an
>> issue in paludis.
>>
>> Maybe a forgotten fsync after writing the SLOT file ?
>
> No, it's unrelated. We don't fsync anything, at all.

I'm not a filesystem expert, but isn't that the problem? Am I misinterpreting this blog discussion (especially comment #7)?

http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
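For readers unfamiliar with what "fsync after writing" would look like in practice, here is a minimal Python sketch of the write-to-temporary-file, fsync, then rename idiom that this kind of discussion usually refers to. It is not Paludis code (Ciaran states above that Paludis does not fsync anything), and whether applications should do this at all is exactly what the thread disputes. `replace_file_durably` is a hypothetical helper, and the SLOT file here lives in a throwaway temporary directory.

```python
import os
import tempfile


def replace_file_durably(path, data):
    """Replace `path` so that a reader sees either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())     # ask the kernel to push the data to the device
        os.replace(tmp_path, path)     # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


demo_dir = tempfile.mkdtemp()
slot_path = os.path.join(demo_dir, "SLOT")

replace_file_durably(slot_path, b"0\n")
replace_file_durably(slot_path, b"1\n")

with open(slot_path, "rb") as handle:
    slot_contents = handle.read()

print(slot_contents)
```

The fsync call is the expensive part and the one Richard Freeman's mythtv example argues against; dropping it leaves the rename atomic with respect to other processes but says nothing about what survives a power loss.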
[paludis-user] Paludis, ext4, delayed allocation, zero file length
Hi guys,

I have a gentoo system with kernel gentoo-sources-2.6.31-r1. My /var partition is formatted as ext4 and mounted with default options (so delayed allocation is enabled).

While upgrading some packages my system crashed. The crash in itself is not related to paludis or ext4.

After a reboot and a regular fsck, I found out that the /var/db/pkg///SLOT was empty (size 0) for a few packages that paludis was installing during the upgrade phase.

This issue looks very much like the famous "ext4, delayed allocation and zero file length" issue. I've read many things about the subject, most of them written by the author of ext4, Theodore Ts'o.

What I've read from Theodore, and the fact that the corruption always happens on the same file (SLOT), makes me think that there may be an issue in paludis.

Maybe a forgotten fsync after writing the SLOT file?

Thanks!

Rodolphe Rocca
[paludis-user] Forcing reinstall of all packages in a given repository
Hi,

first and foremost, thank you for this great piece of software called paludis. I've been using it almost since it was first published and it just works :-)

Now I have a question that may quickly turn into an improvement wish.

I've just installed the enlightenment overlay, which provides many cvs ebuilds. How can I force paludis to reinstall all the already installed packages that belong to the enlightenment overlay?

I would have thought of something like:

paludis -i world --dl-reinstall always --repository enlightment

but it seems the --repository option is only used when issuing a --list-* command.

Rodolphe