On 2/4/07, Vasiliy <vassun at gmail.com> wrote:
> This is a great idea, measuring the contents file impact by using an
> alternate root populated with an empty contents file! As usual with
> great ideas, it is now hard to see how else it could be measured. A
> nice, simple method that anyone can run.
Yep. Looks great, doesn't it? Like many apparently simple ideas, this
one falls down when faced with the harshness of reality :-(

The important test is this: does the difference between the
full-system and alternate-root tests vary in the expected way as the
size of the contents file changes? And it doesn't. On my first system
the difference was almost exactly equal to the time taken to write
the contents file twice. On my second system, with faster disks and a
contents file 1/3 the size, the difference increased rather than
decreased. On a third system the whole-system times were actually
faster much of the time than the alternate-root times. Oh well, back
to the drawing board. (For anyone who wants to repeat the comparison,
it's sketched below.)

> But anyway, why did you come to this conclusion? It is clear from
> your tests that contents file handling accounts for most of the
> packaging time. And I assume that in real life we always have a cold
> pkgadd/pkgrm (the difference is mostly down to contents file access,
> since as I understand it the caching should not affect the delivery
> of new files by pkgadd). So for one package it will be
>
> pkgadd (cold): 3.97s
> pkgrm (cold): 17.28s
>
> versus
>
> pkgadd: 1.20s
> pkgrm: 0.53s
>
> This shows real disk operations while files are not buffered - the
> real impact of the slow contents file.

Well, no, because in fact we know what the cost of the contents file
is in these cases - about 1.4s, which is about right for the warm
pkgadd case.

> For me, the conclusion is that we could have pkgadd more than 3
> times faster and pkgrm more than 10 times faster.

Nope. What I've shown is that pretty much the best you can hope for
is to speed pkgadd up by a factor of 2, and on average the contents
file is only about 20% of the cost of pkgadd. (It depends on the size
of the package, of course. The simplest way to improve install times,
apart from using gzip instead of bzip, is to eliminate all the tiny
packages.) As for pkgrm, the cost of the contents file is the same,
yet pkgrm is much slower, so the achievable performance increase is
much smaller in percentage terms. What would be interesting is to
understand why pkgrm is so incredibly slow.

Another improvement would be to only write the contents file once. I
don't see why we have to rewrite the entire file twice.

> Without major changes to the tools, without revolutionary changes in
> packaging and patching, just break the contents file into pieces.

The snag there is that you trade efficient sequential I/O for large
numbers of random I/Os. Some operations will be quicker, but many
involve touching all the files, so you don't always win. Consider the
pkgrm case: you must read all the contents files to find which
pathnames are unique to this package (a rough sketch of that check
follows below). With 1000 packages that's going to add 10s or so to
the already slow case of doing it from cold, and you might save 1s by
reducing the writes. For pkgadd you also need to do the check for
shared pathnames, which again would be significantly more expensive.

> From a convenience point of view it would also be a big step
> forward. We could, for example, share package content information
> between zones, and we would not need to update each zone's contents
> file every time something is installed in the global zone's shared
> area.

Now that's an interesting point. And that could be a win, as you
would only have to pay the cost of the checks in the global zone (for
shared components anyway).

> So for 1 package in a 10-zone environment it will not be
> 10 zones x 2 x 12M.
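To repeat the full-system versus alternate-root comparison, this is
roughly all there is to it: seed an empty contents file into a
scratch alternate root, then install the same package both ways. A
sketch only - SUNWfoo stands in for whatever package you test with,
and a real run needs a properly populated alternate root and a
suitable admin file:

  #!/bin/ksh
  # Compare contents file cost: alternate root seeded with an empty
  # contents file versus the live system with the full one.
  ALTROOT=/altroot                          # scratch root for the test
  mkdir -p $ALTROOT/var/sadm/install
  : > $ALTROOT/var/sadm/install/contents    # empty contents file

  # Alternate root: contents file handling is (near) free.
  time pkgadd -n -R $ALTROOT -d /var/spool/pkg SUNWfoo

  # Live system: the full contents file is read and rewritten.
  time pkgadd -n -d /var/spool/pkg SUNWfoo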
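To make the pkgrm cost under a split scheme concrete, here is a rough
sketch of the shared-pathname check. The contents.d layout and the
one-file-per-package naming are pure invention for illustration -
nothing like it exists today - but the shape of the I/O is the point:
every per-package file gets opened and searched, for every pathname.

  #!/bin/ksh
  # Hypothetical layout: /var/sadm/install/contents.d/<PKGINST>, with
  # the pathname in the first field as in today's monolithic file.
  PKG=${1:?usage: shared-paths PKGINST}
  CDIR=/var/sadm/install/contents.d

  nawk '{print $1}' $CDIR/$PKG | while read path; do
      for f in $CDIR/*; do
          [ "$f" = "$CDIR/$PKG" ] && continue
          # One open and search per package, per pathname - lots of
          # random I/O in place of one sequential read.
          grep "^$path " $f >/dev/null 2>&1 &&
              echo "$path also in ${f##*/}"
      done
  done

With 1000 packages, that inner loop is exactly the "touching all the
files" cost described above.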
> Patches are a bit more complicated, because it depends on how many
> delta packages are inside the patch and how big those delta packages
> are: if there is only one, then we save as much as we would for one
> package; if there are 50 delta packages, then we save 50 times that.

Wouldn't it be a big win to treat a patch as a single unit and end up
just updating the system the once? That would be a much bigger win
than any fiddling with the contents file.

> In patchadd.ksh the dependency check was switched off, but most
> likely patchrm.ksh continues to do this check, and that is most
> likely the reason. However, patchrm just adds the backup packages,
> so it should not be much slower than patchadd.

Shouldn't, but it is. Significantly slower. And some patches are just
ridiculously slow. The java patch is one that comes to mind. 5
minutes to add it, and 10 minutes to back it out? The contents file
handling is about 5s of that.
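On the question of why patchrm (and pkgrm) is so slow, a cheap first
step is a system call summary of a whole run, children included.
Something like this - 123456-78 standing in for a real patch id -
would show whether the time goes on reads and writes, on forking, or
somewhere else entirely:

  #!/bin/ksh
  # Count and time every syscall made by patchrm and its children.
  truss -c -f -o /tmp/patchrm.syscalls patchrm 123456-78
  cat /tmp/patchrm.syscalls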
-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/