Re: [Rpm-maint] Rpm Database musings
On 03/11/2013 02:14 PM, Michael Schroeder wrote:
> On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote:
>> It has its advantages of course. Having headers spread in different
>> files would probably make some things easier but also slower, so you'd
>> really want to avoid having to go to the headers. I did a quick
>> test-case in python yesterday: reading through all the ~2160 headers
>> in my rpmdb with the current libdb implementation (with no signature
>> checking) takes about 0.11s, loading them from separate files takes
>> about 0.15s. Small numbers, but in percentages that's quite a lot.
>
> Is that with dropped caches (echo 3 > /proc/sys/vm/drop_caches)?

Heh, no :) That was with hot caches. Which of course is not the typical
situation unless you happen to hack package management software for a
living... With dropped caches it's about 11.5s for the libdb
implementation, circa 15.5s for the separate files. So relative
performance is the same, only now the numbers aren't that small anymore.

>>> Anyway, attached is a little Packages database implementation I did
>>> yesterday and today. The code is very careful not to destroy things
>>> if the database is corrupt, i.e. it makes sure that it does not
>>> overwrite data.
>>
>> Wow, that didn't take long. One might get the idea that you're even
>> more eager to get rid of BDB than I am :D Can't blame you for that...
>
> Well, I did it because A) it was a fun little hack and B) it's good to
> have something to verify our ideas.

Yup, it's highly useful to have something concrete as a starting point.
I've already refactored the rpmdb code a fair bit towards separating the
backend implementation from the "rpmdb" level. Doing that has been on my
TODO for ages, and I've occasionally been nipping around the edges, but
with a more concrete target now it might actually happen for real.
>> We could perhaps take some advantage of knowing how rpm does
>> transactions: erases always come after installs, so on upgrades there
>> are never free slots originating from the same transaction. So we
>> could just do lazy deletion: just flag the removed headers for erasure
>> but don't actually bother deleting and zeroing them, the next
>> transaction that occurs will do that. Should reduce the amount of data
>> needing fdatasync() as well.
>
> Yes, that could work. OTOH it makes crash recovery a bit harder.
>
>> Kinda related to the above: I don't see the header timestamp being
>> actually used for anything (but then I might've missed something).
>
> I added the timestamp so that when there was a crash and we need to
> scan the database and there are multiple good headers for the same
> pkgid, we know which one to take.

Right, makes sense.

- Panu -

___
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint
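The lazy-deletion idea from this exchange can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the attached implementation: the `Slot` and `SlotStore` classes and all of their method names are hypothetical, and a real backend would persist the erase flag and decide where fdatasync() is needed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    data: Optional[bytes] = None  # header blob, None when the slot is free
    erased: bool = False          # set by a lazy erase, cleared on reclaim


@dataclass
class SlotStore:
    """Hypothetical toy model of a header store with lazy deletion."""
    slots: List[Slot] = field(default_factory=list)

    def begin_transaction(self) -> int:
        """Reclaim slots flagged by earlier transactions; return the count."""
        reclaimed = 0
        for slot in self.slots:
            if slot.erased:
                slot.data = None      # the costly zeroing happens only here
                slot.erased = False
                reclaimed += 1
        return reclaimed

    def install(self, blob: bytes) -> int:
        """Store a header in the first free slot, appending if none is free."""
        for idx, slot in enumerate(self.slots):
            if slot.data is None:
                slot.data = blob
                return idx
        self.slots.append(Slot(data=blob))
        return len(self.slots) - 1

    def erase(self, idx: int) -> None:
        """Lazy erase: only flag the slot and leave its data in place."""
        self.slots[idx].erased = True

    def live_headers(self) -> List[bytes]:
        """Headers that are present and not flagged for erasure."""
        return [s.data for s in self.slots
                if s.data is not None and not s.erased]
```

Because a flagged slot still holds its data, it is never handed out as free within the same transaction, which matches the observation that installs precede erases. The trade-off raised in the reply still applies: after a crash, a scan has to cope with slots that are flagged but not yet cleared.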
Re: [Rpm-maint] FSM hooks for rpm plugin
> Yup, rpm pretty much has to trust its plugins. OTOH... this made me
> think of another related issue: it would actually be better to set the
> permissions etc before moving the file to its final location. Currently
> we first move the file and then start setting permissions, which means
> executables and all will for a short period of time have totally
> incorrect permissions, labels and all. So if you happen to execute that
> binary during that period, who knows what will happen: it could fail to
> execute at all, execute with wrong capabilities / labels etc.

Yes, this would be the safest way of doing it. But it isn't that bad in
the current scenario: if your security settings are proper (like the
labels of rpm itself etc.), no one would be able to even access the tmp
files before the proper labelling is in place. But I agree: doing it
right from the beginning is even better and removes the possibility of a
bad setup.

> Setting the permissions before moving would fix that and also avoid
> replacing a previous file at all in case we fail in one of the metadata
> steps. For the stock metadata the actual path makes no difference, but
> for security labels you'd want the final path though (to avoid having
> to figure out and strip the temp extension from the file), so it'd
> require passing two paths to the pre-commit hook: current and final.

Maybe it is the fact that I had to wake up at 3am today to come back to
Helsinki, but I don't understand why we need to know the final path for
security labelling. I don't think a file is labelled based on its
destination: it is more based on what is inside the package, the
manifest, device security policies, etc.

Best Regards,
Elena.
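The ordering proposed in the quoted text (set metadata on the temporary file, then move it into place) can be sketched as below. This is an illustration only and not rpm's FSM code: `install_file` and the two-argument `pre_commit_hook` callback are hypothetical names, and only permissions are handled here, whereas real code would also deal with ownership, capabilities and security labels.

```python
import os
import tempfile
from typing import Callable, Optional


def install_file(final_path: str, content: bytes, mode: int,
                 pre_commit_hook: Optional[Callable[[str, str], None]] = None
                 ) -> None:
    """Write to a temp file, set metadata, then rename into place."""
    directory = os.path.dirname(final_path) or "."
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Metadata is applied while the file is still under its temp name.
        os.chmod(tmp_path, mode)
        if pre_commit_hook is not None:
            # Hypothetical hook: receives the current and the final path.
            pre_commit_hook(tmp_path, final_path)
        # Only now does the file become visible under its final name.
        os.replace(tmp_path, final_path)
    except BaseException:
        # A failed metadata step leaves any previous file untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this ordering, a file never appears at its final path with wrong permissions, and a hook failure aborts before the previous file is replaced.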
Re: [Rpm-maint] Rpm Database musings
On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote:
> It has its advantages of course. Having headers spread in different
> files would probably make some things easier but also slower, so you'd
> really want to avoid having to go to the headers. I did a quick
> test-case in python yesterday: reading through all the ~2160 headers in
> my rpmdb with the current libdb implementation (with no signature
> checking) takes about 0.11s, loading them from separate files takes
> about 0.15s. Small numbers, but in percentages that's quite a lot.

Is that with dropped caches (echo 3 > /proc/sys/vm/drop_caches)?

>> Anyway, attached is a little Packages database implementation I did
>> yesterday and today. The code is very careful not to destroy things
>> if the database is corrupt, i.e. it makes sure that it does not
>> overwrite data.
>
> Wow, that didn't take long. One might get the idea that you're even
> more eager to get rid of BDB than I am :D Can't blame you for that...

Well, I did it because A) it was a fun little hack and B) it's good to
have something to verify our ideas.

> We could perhaps take some advantage of knowing how rpm does
> transactions: erases always come after installs, so on upgrades there
> are never free slots originating from the same transaction. So we could
> just do lazy deletion: just flag the removed headers for erasure but
> don't actually bother deleting and zeroing them, the next transaction
> that occurs will do that. Should reduce the amount of data needing
> fdatasync() as well.

Yes, that could work. OTOH it makes crash recovery a bit harder.

> Kinda related to the above: I don't see the header timestamp being
> actually used for anything (but then I might've missed something).

I added the timestamp so that when there was a crash and we need to scan
the database and there are multiple good headers for the same pkgid, we
know which one to take.

Cheers,
Michael.
--
Michael Schroeder m...@suse.de
SUSE LINUX Products GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
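The role of the header timestamp in crash recovery, as explained in the message above, might look roughly like this. It is a sketch under assumptions rather than the attached code: the `ScannedHeader` record and its field names are hypothetical, the scan is assumed to have already discarded damaged headers, and keeping the first-seen header on a timestamp tie is an arbitrary choice made here.

```python
from typing import Dict, Iterable, NamedTuple


class ScannedHeader(NamedTuple):
    """Hypothetical record for one intact header found by a recovery scan."""
    pkgid: int
    timestamp: int
    blob: bytes


def pick_latest(headers: Iterable[ScannedHeader]) -> Dict[int, ScannedHeader]:
    """For each pkgid, keep the good header with the highest timestamp."""
    latest: Dict[int, ScannedHeader] = {}
    for hdr in headers:
        current = latest.get(hdr.pkgid)
        # Strictly newer wins; on a tie the header seen first is kept.
        if current is None or hdr.timestamp > current.timestamp:
            latest[hdr.pkgid] = hdr
    return latest
```

For example, if a scan finds two intact headers for the same pkgid because a crash interrupted a rewrite, only the one with the later timestamp survives the selection.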
Re: [Rpm-maint] FSM hooks for rpm plugin
> A file is a hard-link if (S_ISREG(st->st_mode) && st->st_nlink > 1) is
> true. When erasing, we get this info from the filesystem so that
> remains accurate (the last one would be seen as the "real" file). On
> installation the stat struct of a file is made up by rpm, so we can
> pass whatever we want in there. Currently st_nlink for hardlinks equals
> the total number of links a file will have, but we can easily change
> that to the number of *current* links so that it better matches
> reality. Ie the first one will have st_nlink == 1 so it will be seen as
> the real file, the rest will st_nlink++ each. See attached patch for a
> quick implementation of this.

Oh, I guess I just wanted to say that after a file is installed there is
no way to determine which is the initial file and which is the hard
link. But indeed, in the rpm case rpm itself can indicate this during
installation (as your patch does), so I guess it is all fine then :)

Best Regards,
Elena.
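The hard-link test and the "current links" counting described in the quoted text can be expressed in a few lines. This is a sketch for illustration, not the attached patch: `is_hardlink` mirrors the quoted C condition using Python's `os.stat_result`, and `current_link_counts` is a hypothetical helper that only models the st_nlink values rpm would report for each member of a hard-link set, in installation order.

```python
import os
import stat
from typing import Iterator, List, Tuple


def is_hardlink(st: os.stat_result) -> bool:
    """Same condition as S_ISREG(st->st_mode) && st->st_nlink > 1."""
    return stat.S_ISREG(st.st_mode) and st.st_nlink > 1


def current_link_counts(paths: List[str]) -> Iterator[Tuple[str, int]]:
    """Yield (path, st_nlink) using the number of links created so far.

    The first path gets 1 and is therefore seen as the real file; every
    later path in the same hard-link set gets one more than the previous.
    """
    for count, path in enumerate(paths, start=1):
        yield path, count
```

Under this scheme only the first member of the set fails the hard-link test, which is what lets a plugin treat it as the real file during installation.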