Data structures. For me it is obvious that best representation for patches to do most efficient dependency analysis is graph not a table. All relation should be direct link from one patch to another with backlinks. It is possible of course to implement it based on relational algebra but it will not be efficient - you have to do search every time you need follow the require relation or any other relation. I need to make this remark, because some people for some reason thought and even continue thinking that relational database is solution for all our install problems. It was 3 year long investigation not so far ago which conclude that relation database repository is the possible best way to keep and analyze packaging and patching data. It is possible to implement it this way, but it is slow and as we saw on practice, when this database "solution" was introduced in S10 development version on analyze for some time - extremely heavyweight in terms of disk space, memory and processor power, without real performance improvement.
Major requirement from zones is performance. Any installation over zones will take as many times as many zones we have and 1 min dependency check time acceptable for patching when you have no zones will turn in unacceptable 1 hour if you have 60 zones. So to keep installation time reasonable and match customer expectation (which is not realistic and usually they expect install work for 100 zones same amount of times as for system without zones, at least not 100 times longer). So major target was performance and so datastructure as well as everything is chosen to provide best performance possible. What attributes patch has. Patch_ID - which is patch identifier and consist of patch number and patch revision. Due to the lack of design for patches there is no defined syntax for Patch_ID and in general it may be any directory name. However, in real life it is six digit number and two digit revision. All references to patch in all system files are done using Patch_ID. So this is actually unique key - identifier for each patch. In Pdo.c it is represented as string all patch_id are located in array of strings and on top of this array there is hashtable which speed up access to the patch by Patch_ID very good. Much faster then index. This days when size of memory is ridiculous and no need to squeeze you data in graphic memory (because you memory is only 48K, but there is free 64K green bank of graphic memory...) I can use way bigger hashtable index then recommended by theory 30% - in pdo it is actually few times more then expected load. Hash function is specially created to be fast using addition and shifts and gives good dispersion. When I tuned hashtable it gives me something about 1 patch found from 3rd hit, 5 from second hits and all other few hundreds from first hits (of course this is statistical results). I guess I can compete indexed table in database with this. May be it is overkill, and more like assembler kind of programming, but it was fun for me (like back to late 80th) and performs very well. All the other attributes of patch are in general relations. Let take REQUIRE relation. In pkginfo it is Key REQUIRE followed by PatchID I can of course keep it as an string and any time I need to follow this relation do the search over PatchID array - it may take minimal number of comparisons etc. But it will be much better if this attribute will directly lead me to related patch - no search at all, even most efficient, just get it right away. If we have this direct link to required patch we may also need direct link back, to the patch which require this one. It is good and handy to have anyway, but it also need to be known if we removing patch (again following direct links work bit faster then request to relational database). All other relation are represented same way - not a PatchIDs but direct links to related patches with direct back-links. Those are: REQUIRE - REQUIRE_BY INCOMPATIBLE - INCOMPATIBLE_BY OBSOLETE - OBSOLETE_BY PREVIOUS - NEXT (revisions) PREVIOUS - NEXT pair is only single links - other links multiple by nature - in theory it may be several REQUIRE relation (plus with accumulation any of this require relation will be inherited). So each relation is an array of link. OBSOLETE_BY is single link, but I keep it as array just to be less confused and have all implementation symmetrical. In general, in real life there is no too much of this relation defined for patches. However our new proposal to get rid of accumulation and simplify patching to have patch per putback without revisions will require much greater use of require relation. Other important attribute of patch is type. Initial there are three types of patches SYSTEM - already installed and l found them in /var/sadm/pkg/*/pkginfo REQUESTED - patches requested to install and I found them in specified to patchadd location VIRTUAL - very important for analysis type of patches, they do not exist anywhere, but were refered by other patches I am generating this data structure while parsing pkginfo files from system and form patches - this one way parsing. When I read first patch record I create patch object for this record and for each related patch I create also Virtual patch records - they just marked Virtual. If I read patch record and find that this patch already exist as an virtual I mark it as real (SYSTEM or REQUESTED - depends what I am parsing currently). I will talk in detail about parsing later. To implement this I use unusual approach. All books will suggest in this case to have at least array of structures. And links between records have as pointers. I developed and get used to different technic for some time in my outside of the job, hobby programing for years (I did little organizer database for my PalmV and then Sony Clie and other things of this nature just for fun). Instead of having array of structures I have set of arrays. This way all links are not pointers but longs referring to array item with that index. In result this representation is extremely mobile and relocatable, also it is easy to understand debugging printouts. It is not tied to fixed structure and when I need I can introduce additional field just inside one procedure name space for example which I use with other field as part of the structure but then destroy - like depth field needed only for ordering during dependency check or circular dependency path. Best example of this mobility is mechanism of transactions in recursive patching - when I may commit or rollback results of recursive dependency check. Colors of the knots in tree during tree analysis I keep in array TYPES, first I memcpy this array at once and then run check if root patch is approved then I COMMIT this results by keeping marks as they are and destroy copy, if not I ROLLBACK - relink TYPES to copy and destroy current check results. vassun This message posted from opensolaris.org
