Re: [Rpm-maint] Problems in rpmstrPoolRehash function

Panu Matilainen Tue, 08 Oct 2013 02:15:36 -0700

On 09/11/2013 08:09 PM, Michael Schroeder wrote:


> Hi rpm Maintainers,

Hi,

Sorry for the late reply. I'm trying to finally get back in business: I moved to a new house and life's been "slightly" chaotic for the past weeks. I was expecting Florian to comment on this, as the pool chunks and dummy entries were added by him :)


> I think I've found two bugs in the rpmstrPoolRehash() function:
>
>   1) IMHO there's an off-by-one in the for loop: pool->offs_size is
>      the last used id, thus it should be "<=" instead of "<".

Yeah, I think you're right. rpmstrPoolRehash() is not used much in practice, but a bug is a bug...
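To make the off-by-one concrete, here is a minimal C sketch. It is a toy model under stated assumptions, not rpm's rpmstrpool.c: ids are 1-based and offs_size is the last used id, as described above, while the names rh_pool, rh_rehash() and rh_lookup() and the djb2 hash with linear probing are hypothetical choices made only to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not rpm's real struct: ids are 1-based and
 * offs_size is the LAST USED id, so valid ids are 1..offs_size. */
#define RH_HASH_SIZE 16   /* power of two, larger than the id count */

typedef struct {
    const char **offs;           /* offs[id] -> string, offs[0] unused */
    size_t offs_size;            /* last used id */
    size_t hash[RH_HASH_SIZE];   /* each slot holds an id; 0 means empty */
} rh_pool;

static unsigned rh_strhash(const char *s)
{
    unsigned h = 5381;           /* djb2, picked only for brevity */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static void rh_insert(rh_pool *pool, size_t id)
{
    unsigned slot = rh_strhash(pool->offs[id]) & (RH_HASH_SIZE - 1);
    while (pool->hash[slot] != 0)            /* linear probing */
        slot = (slot + 1) & (RH_HASH_SIZE - 1);
    pool->hash[slot] = id;
}

/* Rebuild the hash from scratch, which is rpmstrPoolRehash()'s job. */
static void rh_rehash(rh_pool *pool)
{
    assert(pool->offs_size < RH_HASH_SIZE);  /* keep at least one slot free */
    memset(pool->hash, 0, sizeof(pool->hash));
    /* "<=", not "<": offs_size is itself a valid id, so a strict
     * comparison would leave the newest string out of the hash. */
    for (size_t i = 1; i <= pool->offs_size; i++)
        rh_insert(pool, i);
}

/* Return the id of s, or 0 if it is not in the hash. */
static size_t rh_lookup(const rh_pool *pool, const char *s)
{
    unsigned slot = rh_strhash(s) & (RH_HASH_SIZE - 1);
    while (pool->hash[slot] != 0) {
        size_t id = pool->hash[slot];
        if (strcmp(pool->offs[id], s) == 0)
            return id;
        slot = (slot + 1) & (RH_HASH_SIZE - 1);
    }
    return 0;
}
```

With "<" instead of "<=" in rh_rehash(), a lookup of "baz" (id 3 in a three-string pool) would return 0 after a rehash, even though the string is still stored in offs[].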


>   2) the function should skip the "dummy" entries that are put
>      at the end of each chunk.
>
> A chunk looks like this:
>
> foo\0bar\0...\0baz\0
> ^    ^    ^    ^    ^
>
> The dummy entry is there to make rpmstrPoolStrlen() work.
> Putting it in the hash is wrong. I've changed the code so that:


Ugh, yes, the dummies don't belong in the hash...


> - \0 is written to where the dummy entries point (not strictly
>    needed as chunks are allocated with calloc, but nevertheless
>    good style).
> - the rpmstrPoolRehash() loop checks if the string is of size
>    zero (true for dummy entries). If so, it checks that the next
>    string does not start right after that \0; if that also holds,
>    it is a dummy entry.
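A minimal C sketch of the chunk layout and the dummy check described above, under stated assumptions: it is a toy model with fixed-size chunk arrays instead of calloc'ed ones, and the cp_* names are hypothetical, not rpm's rpmstrpool.c. It shows why the dummy makes a constant-time length possible and how the zero-length-plus-successor test tells a dummy apart from a genuine "" during a rehash-style walk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CP_CHUNK_SIZE 8     /* tiny on purpose, to force chunk switches */
#define CP_MAX_CHUNKS 4
#define CP_MAX_IDS 16

/* Hypothetical toy of the chunked layout, not rpm's real struct.
 * offs[id] points at string id (ids 1..offs_size, dummies included);
 * offs[offs_size + 1] is the trailing dummy of the active chunk.
 * Each row has two spare bytes: one for the '\0' a dummy points at, and
 * one so that "dummy + 1" stays inside the same row. */
typedef struct {
    char chunks[CP_MAX_CHUNKS][CP_CHUNK_SIZE + 2];  /* zeroed, like calloc */
    size_t nchunks;
    char *offs[CP_MAX_IDS + 2];
    size_t offs_size;                               /* last used id */
} chunked_pool;

static void cp_init(chunked_pool *p)
{
    memset(p, 0, sizeof(*p));
    p->nchunks = 1;
    p->offs[1] = p->chunks[0];      /* trailing dummy of the empty pool */
}

static size_t cp_add(chunked_pool *p, const char *s)
{
    size_t len = strlen(s);
    char *cur = p->chunks[p->nchunks - 1];
    char *next = p->offs[p->offs_size + 1];         /* next free byte */

    assert(len + 1 <= CP_CHUNK_SIZE);
    if ((size_t)(next - cur) + len + 1 > CP_CHUNK_SIZE) {
        /* Doesn't fit: the trailing dummy keeps its id, gets an explicit
         * '\0', and a new chunk is started. */
        assert(p->nchunks < CP_MAX_CHUNKS);
        *next = '\0';
        p->offs_size++;
        next = p->chunks[p->nchunks++];
    }
    assert(p->offs_size < CP_MAX_IDS);
    memcpy(next, s, len + 1);
    p->offs[++p->offs_size] = next;
    p->offs[p->offs_size + 1] = next + len + 1;     /* new trailing dummy */
    return p->offs_size;
}

/* Zero length, and the next entry does not start right after this '\0'.
 * A genuine "" is always followed immediately by the next string or by
 * a dummy in the same chunk. */
static int cp_is_dummy(const chunked_pool *p, size_t id)
{
    const char *s = p->offs[id];
    return s[0] == '\0' && p->offs[id + 1] != s + 1;
}

/* Constant-time length: distance to the next entry minus the '\0'. The
 * last string of a chunk relies on the dummy right behind it. */
static size_t cp_strlen(const chunked_pool *p, size_t id)
{
    assert(id >= 1 && id <= p->offs_size && !cp_is_dummy(p, id));
    return (size_t)(p->offs[id + 1] - p->offs[id]) - 1;
}

/* Rehash-style walk over 1..offs_size that skips dummies. */
static size_t cp_count_strings(const chunked_pool *p)
{
    size_t n = 0;
    for (size_t i = 1; i <= p->offs_size; i++)
        if (!cp_is_dummy(p, i))
            n++;                    /* a real rehash would insert id i here */
    return n;
}
```

Adding "foo", "" and "bar" to a fresh pool puts the first two in chunk 0; "bar" does not fit in the remaining bytes, so id 3 becomes a dummy and "bar" gets id 4. cp_is_dummy() then flags id 3 but not the genuine empty string at id 2.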


> There are different (and easier) ways to fix this:
>
> - you can always put an empty string into the pool; it would always
>    have id 1. This simplifies the dummy entry check to:
>        if (i != 1 && str[0] == 0)

I thought of that (I know that's what libsolv's strpool does), but the idea was to allow using the pool for unique string counts and the like, including empty strings.
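For comparison, a minimal sketch of the libsolv-style alternative, assuming a toy interning pool: the names seeded_pool, sp_intern(), sp_init() and sp_is_dummy() are hypothetical, and lookup is a linear scan purely for brevity. Because "" is inserted first and lookups deduplicate, the only genuine empty string can never have an id other than 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SP_MAX_IDS 16

/* Hypothetical toy interning pool; not rpm's or libsolv's real code. */
typedef struct {
    const char *strs[SP_MAX_IDS + 1];   /* strs[id]; ids start at 1 */
    size_t last;                        /* last used id */
} seeded_pool;

/* Return the existing id of s, or append it. The caller keeps s alive. */
static size_t sp_intern(seeded_pool *p, const char *s)
{
    for (size_t id = 1; id <= p->last; id++)
        if (strcmp(p->strs[id], s) == 0)
            return id;                  /* already present: reuse its id */
    assert(p->last < SP_MAX_IDS);
    p->strs[++p->last] = s;
    return p->last;
}

/* Seed "" first, so the one genuine empty string is always id 1. */
static void sp_init(seeded_pool *p)
{
    p->last = 0;
    sp_intern(p, "");
}

/* With "" pinned at id 1, any other zero-length entry must be a dummy. */
static int sp_is_dummy(size_t id, const char *str)
{
    return id != 1 && str[0] == '\0';
}
```

The trade-off is the one raised in the reply: the pool then always contains "", so a unique-string count taken from it includes the empty string whether or not the input had one.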

> - you could get rid of the dummy entries and remove the
>    rpmstrPoolStrlen() function. It's only used 5 times in the code
>    and calling strlen() on the returned string does not cost much.

strlen() can get quite expensive when done en masse: calculating string lengths is one of the biggest time consumers in headerImport(), except in the newish "fast" mode where offset calculations are used. Avoiding a strlen()-style string walk makes quite a difference there, and IIRC rpmstrPoolStrlen() does make a difference in e.g. fingerprinting, but I don't have numbers at hand.

Anyway, I tend to agree on getting rid of the dummies: just special-case rpmstrPoolStrlen() for the last entry. It'll still be constant time for most of the uses (there aren't that many in the codebase, but I'd expect the number to grow as time goes on).
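A minimal sketch of the dummy-free variant, assuming a single chunk addressed by byte offsets (nd_pool and nd_strlen() are hypothetical names, not rpm API): lengths stay a constant-time subtraction for every id except the last one, which falls back to strlen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dummy-free layout with a single chunk: offs[id] is the
 * byte offset of string id (ids 1..last) and nothing follows the final
 * string's '\0'. */
typedef struct {
    const char *chunk;
    const size_t *offs;     /* offs[0] unused */
    size_t last;            /* last used id */
} nd_pool;

static size_t nd_strlen(const nd_pool *p, size_t id)
{
    assert(id >= 1 && id <= p->last);
    if (id < p->last)
        /* Common case: the next string starts right after our '\0'. */
        return p->offs[id + 1] - p->offs[id] - 1;
    /* Last entry: there is no successor to subtract, so walk it. */
    return strlen(p->chunk + p->offs[id]);
}
```

This sketch has only one chunk. With several chunks, the last string of each chunk would presumably need the same fallback, since its successor lives in a different allocation and subtracting the two offsets is meaningless.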

> Hmm, I just noticed that the code in rpmfc.c also loops over the ids
> including the dummy entries. Oh my. rpmfcApply() should at least
> ignore "" entries.


The code just assumes strpool is not buggy :)

> And it seems to modify the string returned
> from the strpool. Oh my again. (But using strtol() to convert
> a "const char *" to a "char *" is clever.)

Oh ugh, I hadn't noticed that at all. Not the first case of accidental (I hope!) conversion of const char * to char * in rpm either... AFAICS it's not harmful in practice in this particular case, as the modifications only occur after all additions have been done and the pool has been thrown away, but yeah, it needs fixing.
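A minimal sketch of a const-correct alternative, assuming a hypothetical "<index> <payload>" entry format (this is not claimed to be rpmfc.c's actual format, and parse_entry() is an illustrative name). strtol() takes a char ** end pointer, so calling it on a const char * hands back a non-const pointer into the same storage; that is the "clever" conversion mentioned above. Here the end pointer is treated as const again right away, and anything that needs modifying is copied out first.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a hypothetical "<index> <payload>" entry without writing to the
 * (conceptually read-only) pool string. Returns 0 on success, -1 on a
 * malformed entry or a payload buffer that is too small. */
static int parse_entry(const char *entry, long *index,
                       char *payload, size_t payload_size)
{
    char *end_rw;   /* strtol() insists on char **, even for const input */
    const char *end;
    const char *rest;
    size_t n;
    long v;

    assert(entry != NULL && index != NULL && payload != NULL);
    errno = 0;
    v = strtol(entry, &end_rw, 10);
    end = end_rw;   /* immediately treat the result as const again */
    if (end == entry || errno == ERANGE || *end != ' ')
        return -1;

    rest = end + 1;
    n = strlen(rest);
    if (n + 1 > payload_size)
        return -1;
    memcpy(payload, rest, n + 1);   /* modify this copy, never the pool */
    *index = v;
    return 0;
}
```

The caller is then free to edit payload in place, while the string owned by the pool stays untouched.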


        - Panu -


> Cheers,
>    Michael.



_______________________________________________
Rpm-maint mailing list
[email protected]
http://lists.rpm.org/mailman/listinfo/rpm-maint

