On 09/11/2013 08:09 PM, Michael Schroeder wrote:

Hi rpm Maintainers,

Hi,

Sorry for the late reply. Trying to finally get back in business, moved to a new house and life's been "slightly" chaotic for the past weeks. I was expecting Florian to comment on this as the pool chunks and dummy entries were added by him :)


I think I've found two bugs in the rpmstrPoolRehash() function:

  1) IMHO there's an off-by-one in the for loop: pool->offs_size is
     the last used id, thus it should be "<=" instead of "<".

Yeah, I think you're right. rpmstrPoolRehash() is not much used in practise but a bug is a bug...


  2) the function should to skip the "dummy" entries that are put
     at the end of each chunk.

A chunk looks like this:

foo\0bar\0...\0baz\0
^    ^    ^    ^    ^

The dummy entry is there to make rpmstrPoolStrlen() work.
Putting it in the hash is wrong. I've changed the code so that:

Ugh, yes the dummies dont belong in the hash...


- \0 is written to where the dummy entries point (not strictly
   needed as chunks are allocated with calloc, but nevertheless
   good style).
- the rpmstrPoolRehash() loop checks if the string is of size
   zero (true for dummy entries). If that's the case it checks
   if the next string does not start after the \0, if that's
   also true it is a dummy entry.


There are different (and easier) ways to fix this:

- you can always put an empty string into the pool, it would always
   have id 1. This simplifies the dummy entry check to:
       if (i != 1 && str[0] == 0)

I thought of that (I know that's what libsolv's strpool does) but the idea was to allow using the pool for unique string count and the like, including empty strings.

- you could get rid of the dummy entries and remove the
   rpmstrPoolStrlen() function. It's only used 5 times in the code
   and calling strlen() on the returned string does not cost much.

strlen() can get quite expensive when done by the masses, calculating string lenghts is one of the biggest time-consumers in headerImport(), except in the newish "fast" mode where offset calculations are used. Avoiding strlen() style string-walk makes quite a difference there and IIRC rpmstrPoolStrlen() does make a difference in eg fingerprinting, but dont have numbers at hand.

Anyway, I tend to agree on getting rid of the dummies - just special case rpmstrPoolLen() for the last entry, it'll still be constant time for most of the uses (there aren't that many in the codebase but I'd expect the number to grow as time goes on)

Hmm, I just see that the code in rpmfc.c also loops over the ids
including the dummy entries. Oh my. rpmfcApply() should at least
ignore "" entries.

The code just assumes strpool is not buggy :)

And it seems to modify the string returned
from the strpool. Oh my again. (But using strtol() to convert
a "const char *" to a "char *" is clever.)

Oh ugh, hadn't noticed that at all. Not the first case of accidental (I hope!) conversion of const char * to char * in rpm either... AFAICS it's not harmful in practise in this particular case as the modifications only occur after all additions have been done and the pool thrown away but yeah it needs fixing.

        - Panu -


Cheers,
   Michael.



_______________________________________________
Rpm-maint mailing list
[email protected]
http://lists.rpm.org/mailman/listinfo/rpm-maint


_______________________________________________
Rpm-maint mailing list
[email protected]
http://lists.rpm.org/mailman/listinfo/rpm-maint

Reply via email to