On Sat, 2006-04-08 at 00:01, Rich McAllister wrote:
> Wow. Are you binary-searching the contents file? Given the absurdly
> restricted way the "wild card" in pkgchk -p works (it just stops
> comparing when it hits the asterisk) you could do that and still support
> the "wild card". (Not that there seems to be any point in maintaining
> this old undocumented behavior.)

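For illustration only, here is a minimal sketch of the idea in the quoted
text: a binary search over sorted path entries that still honours a "wild
card" by stopping the comparison at the first asterisk. It is not the
pkgchk or pkgadd code. The function name find_matches and the sample
paths are hypothetical, and it assumes the entries are already sorted by
path and held in memory as plain strings.

```python
from bisect import bisect_left


def find_matches(paths, pattern):
    """Return the entries in the sorted list `paths` that match `pattern`.

    Assumption: an asterisk simply ends the comparison, so anything in the
    pattern after the first '*' is ignored (a plain prefix match).
    """
    star = pattern.find("*")
    if star == -1:
        # No wild card: ordinary binary search for an exact match.
        i = bisect_left(paths, pattern)
        if i < len(paths) and paths[i] == pattern:
            return [paths[i]]
        return []

    # Wild card: binary-search for the first entry >= prefix, then walk
    # forward while entries still share the prefix. In a sorted list all
    # entries with a common prefix are adjacent.
    prefix = pattern[:star]
    i = bisect_left(paths, prefix)
    matches = []
    while i < len(paths) and paths[i].startswith(prefix):
        matches.append(paths[i])
        i += 1
    return matches


# Hypothetical sample entries standing in for contents-file paths.
sample_paths = sorted([
    "/usr/bin/ls",
    "/usr/bin/ln",
    "/usr/bin/man",
    "/usr/lib/libc.so.1",
    "/usr/sbin/pkgadd",
])

if __name__ == "__main__":
    print(find_matches(sample_paths, "/usr/bin/l*"))
```

The lookup costs one binary search plus one step per matching entry, so
a prefix query does not need to scan the whole list.
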
I could keep the behaviour, but that would involve writing new code
rather than simply using what's there already to good advantage.

> However, this bumps up against some current efforts to speed up
> pkgadd/pkgrm (and thus patchadd/patchrm which are built on top.) One of
> the ideas being looked at is to stop maintaining
> /var/sadm/install/contents as we know it.

Breaking the contents file would require alternative query mechanisms -
in other words, an enhanced pkgchk that is both more functional and
considerably faster than the current implementation.

In fact, the modifications I've made to enhance pkgchk (not just -l -p)
seem to speed up pkgrm by about 10-20%. And there's obviously more to be
had, as some of the code is terribly slow.

> (Right now every pkgadd and
> pkgrm has to read in all of /var/sadm/install/contents and write it all
> out at the end.)

But that isn't the limiting factor. Not currently anyway (by rights it
should be, but there are other problems). I haven't worked out what the
problem with pkgrm is: it's much slower than pkgadd, which doesn't seem
right, as it would be reasonable to think that unlink() would be faster
than creating all the data. One of the big problems with pkgrm is dealing
with the product registry, which is horribly inefficient.

> One idea is to keep each package's contents entry in
> that package's directory in /var/sadm/pkg. That way only a small file
> needs to be written on pkgadd, and pkgrm is real cheap. (Files that are
> owned by multiple packages need some special consideration, which there
> is still discussion on.) Keeping pkgchk -l -[p|P] working in this new
> regime isn't hard, but it will probably be a little slower than the
> current scheme (have to open 1000+ little files instead of one big one.)
> It does seem worth slowing down pkgchk -l a little to make
> pkgadd/pkgrm a lot faster, since pkgchk -l is a occasional thing while
> pkgadd/pkgrm are what makes install/upgrade/patching slow.

Not really.
Every pkgadd and pkgrm has to do the equivalent of pkgchk -l. In fact,
all I've done is use the binary search code that I think pkgadd uses. If
you make pkgchk -l -p slower than it currently is, then you're going to
make pkgadd go slower, not faster. (Although I would think that it would
be possible to make pkgchk go faster than it does at present with almost
any back-end.)

Do we understand exactly where all the time's going in the
install/upgrade/patch cycle? I've been doing a number of tests recently
(quite a lot, actually) and while I've firmly believed that maintaining
the contents file was the big problem for some years now, all the testing
I've done indicates otherwise. Not that it isn't a problem, but that it's
not the major problem, and there are a lot of other things that can be
done that will have much bigger payback. (At which point the contents
file does become the limiting factor.) Fixing the contents file gives you
10-20%; I'm interested in the other 80%.

I'll try and knock my notes into a more sensible shape.

-- 
-Peter Tribble
L.I.S., University of Hertfordshire - http://www.herts.ac.uk/
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
