Re: [Rpm-maint] Rpm Database musings

2013-03-11 Thread Panu Matilainen

On 03/11/2013 02:14 PM, Michael Schroeder wrote:
> On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote:
>> It has its advantages of course. Having headers spread in different files
>> would probably make some things easier but also slower, so you'd really
>> want to avoid having to go to the headers. I did a quick test-case in
>> python yesterday: reading through all the ~2160 headers in my rpmdb with
>> the current libdb implementation (with no signature checking) takes about
>> 0.11s, loading them from separate files takes about 0.15s. Small numbers
>> but in percentages that's quite a lot.
>
> Is that with dropped caches (echo 3 > /proc/sys/vm/drop_caches)?


Heh, no :) That was with hot caches. Which of course is not the typical 
situation unless you happen to hack package management software for a 
living...


With dropped caches it's about 11.5s for the libdb implementation, circa
15.5s for the separate files. So relative performance is the same, only
now the numbers aren't that small anymore.
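
To make "relative performance is the same" concrete, here is a small sketch that only redoes the arithmetic on the four timings quoted above. It is not the original test-case (which was not posted), and the helper name relative_overhead is made up for illustration.

```python
# Timings quoted in this thread (seconds) for reading ~2160 headers.
timings = {
    "hot caches": {"libdb": 0.11, "separate files": 0.15},
    "dropped caches": {"libdb": 11.5, "separate files": 15.5},
}


def relative_overhead(baseline, candidate):
    """Return how much slower candidate is than baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


for scenario, t in timings.items():
    pct = relative_overhead(t["libdb"], t["separate files"])
    print(f"{scenario}: separate files are about {pct:.0f}% slower")
```

Both cases come out at roughly a third slower (about 36% hot, about 35% cold), so dropping the caches scales the absolute cost by around 100x without changing the ratio.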


>>> Anyway, attached is a little Packages database implementation I did yesterday
>>> and today. The code is very careful not to destroy things if the database
>>> is corrupt, i.e. it makes sure that it does not overwrite data.
>>
>> Wow, that didn't take long. One might get the idea that you're even more
>> eager to get rid of BDB than I am :D Can't blame you for that...
>
> Well, I did it because A) it was a fun little hack and B) it's good
> to have something to verify our ideas.


Yup, it's highly useful to have something concrete as a starting point.
I've already refactored the rpmdb code a fair bit towards separating the
backend implementation from the "rpmdb" level. Doing that has been on my
TODO for ages, and I've occasionally been nipping around the edges, but with
a more concrete target now it might actually happen for real.



>> We could perhaps take some advantage of knowing how rpm does
>> transactions: erases always come after installs, so on upgrades there are
>> never free slots originating from the same transaction. So we could just do
>> lazy deletion: just flag the removed headers for erasure but don't actually
>> bother deleting and zeroing them, the next transaction that occurs will do
>> that. Should reduce the amount of data needing fdatasync() as well.
>
> Yes, that could work. OTOH it makes crash recovery a bit harder.
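
A toy in-memory model of the lazy deletion idea may help when reasoning about it. This is only a sketch under assumptions: the class ToyPackages, the three slot states and the method names are all invented here and say nothing about how the attached implementation lays out its slots or syncs to disk.

```python
from dataclasses import dataclass, field

FREE, USED, PENDING_ERASE = "free", "used", "pending-erase"


@dataclass
class Slot:
    state: str = FREE
    data: bytes = b""


@dataclass
class ToyPackages:
    """Hypothetical slot store; not the layout of the attached code."""
    slots: list = field(default_factory=list)

    def begin_transaction(self):
        """Reclaim slots flagged by an earlier transaction; return the count."""
        reclaimed = 0
        for slot in self.slots:
            if slot.state == PENDING_ERASE:
                slot.state, slot.data = FREE, b""  # the deferred delete + zero
                reclaimed += 1
        return reclaimed

    def install(self, header):
        """Store a header in the first free slot, appending one if needed."""
        for idx, slot in enumerate(self.slots):
            if slot.state == FREE:
                slot.state, slot.data = USED, header
                return idx
        self.slots.append(Slot(USED, header))
        return len(self.slots) - 1

    def erase(self, idx):
        """Lazy deletion: only flag the slot, leave its data untouched."""
        self.slots[idx].state = PENDING_ERASE


db = ToyPackages()
db.begin_transaction()
old = db.install(b"foo-1.0")

db.begin_transaction()            # the upgrade transaction
new = db.install(b"foo-2.0")      # install comes first...
db.erase(old)                     # ...erase last, so the slot is only flagged

reclaimed = db.begin_transaction()  # the next transaction does the cleanup
reused = db.install(b"bar-1.0")     # and can now reuse the old slot
```

Because the erase is the last step of the upgrade, flagging costs only a state change, and the old header stays intact until the following transaction. That is also why crash recovery gets harder: a scan can then find both the old and the new header looking valid.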


>> Kinda related to the above: I don't see the header timestamp being actually
>> used for anything (but then I might've missed something).
>
> I added the timestamp so that when there was a crash and we need to scan the
> database and there are multiple good headers for the same pkgid, we know which
> one to take.


Right, makes sense.
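
As a rough illustration of that recovery rule, the sketch below picks one header per pkgid from a scan. It assumes the most recent timestamp wins, which the thread implies but does not state, and the tuple layout (pkgid, timestamp, is_good, payload) is hypothetical rather than the on-disk format.

```python
def pick_current_headers(scanned):
    """Choose one good header per pkgid, preferring the newest timestamp.

    scanned is an iterable of (pkgid, timestamp, is_good, payload) tuples;
    this layout is an assumption made for the example.
    """
    best = {}
    for pkgid, timestamp, is_good, payload in scanned:
        if not is_good:
            continue  # corrupt or half-written headers never win
        if pkgid not in best or timestamp > best[pkgid][0]:
            best[pkgid] = (timestamp, payload)
    return {pkgid: payload for pkgid, (_, payload) in best.items()}


scan = [
    (1, 100, True, "foo-1.0"),
    (1, 200, True, "foo-2.0"),       # a newer good copy of the same pkgid
    (2, 50, True, "bar-1.0"),
    (2, 300, False, "bar-corrupt"),  # newer but failed the integrity check
]
recovered = pick_current_headers(scan)
```

Here recovered maps pkgid 1 to "foo-2.0" and pkgid 2 to "bar-1.0": the timestamp breaks ties between good copies, while a bad copy is ignored however new it is.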

- Panu -
___
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint


Re: [Rpm-maint] FSM hooks for rpm plugin

2013-03-11 Thread Reshetova, Elena
>Yup, rpm pretty much has to trust its plugins. OTOH... this made me think of
>another related issue: it would actually be better to set the permissions etc
>before moving the file to its final location. Currently we first move the
>file and then start setting permissions, which means executables and all will
>for a short period of time have totally incorrect permissions, labels and
>all. So if you happen to execute that binary during that period, who knows
>what will happen: it could fail to execute at all, execute with wrong
>capabilities / labels etc.

Yes, this would be the safest way of doing it. But it isn't that bad in the
current scenario: if your security settings are proper (such as the labels of
rpm itself, etc.), no one would even be able to access the tmp files before the
proper labelling is in place. But I agree: doing it right from the beginning is
even better and removes the possibility of a bad setup.
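
The ordering being proposed (metadata first, move last) can be sketched in a few lines. This is not rpm's FSM code: install_file is a made-up helper, and os.chmod stands in for every metadata step; security labels and capabilities are not handled here.

```python
import os
import tempfile


def install_file(final_path, data, mode):
    """Write data to a temp file, set its mode, then rename it into place."""
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Metadata step on the temp path: if this fails, whatever was
        # previously at final_path has not been replaced.
        os.chmod(tmp_path, mode)
        # Commit step: atomic on POSIX when both paths are on one filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this order the file never appears under its final name with the wrong mode, and a failure in a metadata step leaves the previous file in place.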

>Setting the permissions before moving would fix that and also avoid replacing
>a previous file at all in case we fail in one of the metadata steps. For
>the stock metadata the actual path makes no difference, but for security
>labels you'd want the final path though (to avoid having to figure out and
>strip the temp extension from the file), so it'd require passing two paths
>to the pre-commit hook: current and final.

Maybe it is the fact that I had to wake up at 3am today to come back to
Helsinki, but I don't understand why we need to know the final path for
security labelling. I don't think a file is labelled based on its destination:
it is more like based on what is inside the package, the manifest, device
security policies, etc.


Best Regards,
Elena.




Re: [Rpm-maint] Rpm Database musings

2013-03-11 Thread Michael Schroeder
On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote:
> It has its advantages of course. Having headers spread in different files 
> would probably make some things easier but also slower, so you'd really 
> want to avoid having to go to the headers. I did a quick test-case in 
> python yesterday: reading through all the ~2160 headers in my rpmdb with 
> the current libdb implementation (with no signature checking) takes about 
> 0.11s, loading them from separate files takes about 0.15s. Small numbers 
> but in percentages that's quite a lot.

Is that with dropped caches (echo 3 > /proc/sys/vm/drop_caches)?

>> Anyway, attached is a little Packages database implementation I did yesterday
>> and today. The code is very careful not to destroy things if the database
>> is corrupt, i.e. it makes sure that it does not overwrite data.
>
> Wow, that didn't take long. One might get the idea that you're even more 
> eager to get rid of BDB than I am :D Can't blame you for that...

Well, I did it because A) it was a fun little hack and B) it's good
to have something to verify our ideas.

> We could perhaps take some advantage of knowing how rpm does
> transactions: erases always come after installs, so on upgrades there are
> never free slots originating from the same transaction. So we could just do
> lazy deletion: just flag the removed headers for erasure but don't actually
> bother deleting and zeroing them, the next transaction that occurs will do 
> that. Should reduce the amount of data needing fdatasync() as well.

Yes, that could work. OTOH it makes crash recovery a bit harder.

> Kinda related to the above: I don't see the header timestamp being actually
> used for anything (but then I might've missed something).

I added the timestamp so that when there was a crash and we need to scan the
database and there are multiple good headers for the same pkgid, we know which
one to take.

Cheers,
  Michael.

-- 
Michael Schroeder   m...@suse.de
SUSE LINUX Products GmbH,  GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}


Re: [Rpm-maint] FSM hooks for rpm plugin

2013-03-11 Thread Reshetova, Elena
>A file is a hard-link if (S_ISREG(st->st_mode) && st->st_nlink > 1) is true.
>When erasing, we get this info from filesystem so that remains accurate (the
>last one would be seen as the "real" file). On installation the stat struct
>of a file is made up by rpm, so we can pass whatever we want in there.
>Currently st_nlink for hardlinks equals the total number of links a file
>will have, but we can easily change that to the number of *current* links so
>that it better matches reality. I.e. the first one will have st_nlink == 1 so
>it will be seen as the real file, the rest will st_nlink++ each. See attached
>patch for a quick implementation of this.

Oh, I guess I just wanted to say that after a file is installed there is no
way to determine which is the initial file and which is the hard link, but
indeed in the rpm case it can be indicated by rpm during installation (as your
patch does), so I guess it is all fine, then :)
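
For reference, the quoted check and the proposed counting can be tried out against a real filesystem. This is a sketch, not the attached patch: is_hardlink mirrors the quoted C condition, and nlink_as_installed is a hypothetical helper showing the st_nlink values a hook would see if links were counted as they are created.

```python
import os
import stat
import tempfile


def is_hardlink(st):
    """Mirror of the quoted test: regular file with more than one link."""
    return stat.S_ISREG(st.st_mode) and st.st_nlink > 1


def nlink_as_installed(total_links):
    """st_nlink per link if counted as created: 1 for the first, then ++."""
    return list(range(1, total_links + 1))


with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "first")
    second = os.path.join(d, "second")
    with open(first, "w") as f:
        f.write("payload\n")
    before = is_hardlink(os.stat(first))  # a single link so far
    os.link(first, second)
    after = is_hardlink(os.stat(first))   # both names now report st_nlink == 2

# With the proposed counting only the first link looks like a plain file.
seen_as_hardlink = [n > 1 for n in nlink_as_installed(3)]
```

Once both names exist the filesystem reports the same link count for each, which is the point made above: only rpm, at install time, can say which one came first.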

Best Regards,
Elena.





