I'm starting a separate thread, because I don't want to confuse this with the old db removal stuff.
Vasiliy commented on the need to get rid of the contents file. As usual, it sounds eminently reasonable that getting rid of the contents file is going to radically improve performance, right? OK, so I did some measurements. They're at the end. First, some commentary on Vasiliy's comments.

> We have huge opportunity to bust performance if we get rid of contents file -
> [...]
> This is next bottleneck
> need to be fixed to improve installation performance.

It's inefficient, for sure. I'm hoping we'll get to the point where it is actually the next bottleneck, but I think we're still a long way from there.

> I think we should keep this records locally for each package

Well, actually, we do already, e.g.

/var/sadm/pkg/SYMhisl/save/pspool/SYMhisl/pkgmap

although I'm not sure what uses it (I don't see it getting updated by patches). I think this is where editable files are stored so you can populate a zone cleanly.

> contents file which is in nevada over 1M

12Meg seems about typical. My home test machine is about 28Meg, but I have a lot of junk installed.

So, a quick test: how long does a random pkgadd and pkgrm take? This is on my W2100z with a 28M contents file, so any effect that the contents file has is going to be more noticeable on this system than on most.

pkgadd (cold): 3.97s
pkgadd (warm): 2.65s
pkgrm (cold): 17.28s
pkgrm (warm): 4.87s

The cold results are first time round.

OK, so I can install to an alternate root, so the contents file will be empty. This ought to be way quicker.

pkgadd: 1.20s
pkgrm: 0.53s

All these are essentially warm, so that's the comparison.

Now, two things are clear. The first is that, as expected, there is some improvement. For pkgadd, the contents file accounts for about half the time. (And the difference isn't far off the 1.4s it's expected to take to do two 28M writes to disk at the 40M/s it'll manage.) If we translate this to a more normal system then the typical effect of the contents file is about 20%.
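For anyone who wants to check the pkgadd arithmetic, here is a rough sketch in Python. It uses only the figures quoted above (28M file, two writes, 40M/s, and the warm 2.65s and 1.20s timings). The function names are made up for illustration, and the model is deliberately crude: it assumes the cost is purely sequential write time and ignores seeks and caching.

```python
# Back-of-envelope check of the pkgadd numbers quoted above.
# Assumption: contents file cost is modelled as pure sequential write time.

def write_time_s(file_mb, passes, disk_mb_per_s):
    """Seconds needed to write a file of file_mb megabytes `passes` times."""
    return file_mb * passes / disk_mb_per_s

def contents_share(with_contents_s, empty_contents_s):
    """Fraction of the run time attributable to the contents file."""
    return (with_contents_s - empty_contents_s) / with_contents_s

# Two 28M writes at 40M/s, as estimated in the text.
estimated_write_s = write_time_s(28, 2, 40)

# Warm pkgadd with the 28M contents file vs. an empty alternate root.
measured_overhead_s = 2.65 - 1.20
pkgadd_share = contents_share(2.65, 1.20)

print(f"estimated write time: {estimated_write_s:.2f}s")
print(f"measured overhead:    {measured_overhead_s:.2f}s")
print(f"share of pkgadd time: {pkgadd_share:.0%}")
```

The estimate comes out at 1.4s against a measured difference of about 1.45s, and the share is roughly 55%, which is where "about half" comes from.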
(For a system install the effect is on average half of that 20%, so about 10%, which is basically what I worked out before.)

The second is that pkgrm is much more sensitive. I need to work out why that is. One thing that pkgrm needs to do that pkgadd doesn't is to fully parse the contents file: it needs to parse every line to see if that pathname is in the given package, whereas pkgadd knows the list of filenames and can use a binary search to find them in the contents file because it's sorted. But that's only about 0.8s (I know that from how long pkginfo -l on the package takes).

The next test involved a simple patchadd. This is of a patch with a single file.

patchadd: 3.51s
patchrm: 14.07s

Ouch to that second one. OK, so the contents file effect is 40% here on the patchadd, and 10% on the patchrm case. And on a more typical system it will be something like half that. (And again note that removal is expensive.)

Conclusion: there's still a lot of work to do to improve the performance of the package and patch tools before we start looking at the contents file.

A couple of other observations:

1. The contents file is quite compressible. I got about a factor of 8 with gzip. Clearly there's quite a lot of redundant information in the file, so it should be feasible to come up with a way to use that redundancy to make the file much smaller. I'm not expecting a factor of 8, but a factor of 3 seems eminently reasonable.

2. Is the contents file in its current format causing functionality problems, as opposed to just the performance issues? For example, I would like to see stronger checksums, and the ability to describe ACLs and extended attributes.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
