On 09/11/2013 08:09 PM, Michael Schroeder wrote:
Hi rpm Maintainers,
Hi,
Sorry for the late reply. Trying to finally get back in business, moved
to a new house and life's been "slightly" chaotic for the past weeks. I
was expecting Florian to comment on this as the pool chunks and dummy
entries were added by him :)
I think I've found two bugs in the rpmstrPoolRehash() function:
1) IMHO there's an off-by-one in the for loop: pool->offs_size is
the last used id, thus it should be "<=" instead of "<".
Yeah, I think you're right. rpmstrPoolRehash() is not much used in
practise but a bug is a bug...
2) the function should to skip the "dummy" entries that are put
at the end of each chunk.
A chunk looks like this:
foo\0bar\0...\0baz\0
^ ^ ^ ^ ^
The dummy entry is there to make rpmstrPoolStrlen() work.
Putting it in the hash is wrong. I've changed the code so that:
Ugh, yes the dummies dont belong in the hash...
- \0 is written to where the dummy entries point (not strictly
needed as chunks are allocated with calloc, but nevertheless
good style).
- the rpmstrPoolRehash() loop checks if the string is of size
zero (true for dummy entries). If that's the case it checks
if the next string does not start after the \0, if that's
also true it is a dummy entry.
There are different (and easier) ways to fix this:
- you can always put an empty string into the pool, it would always
have id 1. This simplifies the dummy entry check to:
if (i != 1 && str[0] == 0)
I thought of that (I know that's what libsolv's strpool does) but the
idea was to allow using the pool for unique string count and the like,
including empty strings.
- you could get rid of the dummy entries and remove the
rpmstrPoolStrlen() function. It's only used 5 times in the code
and calling strlen() on the returned string does not cost much.
strlen() can get quite expensive when done by the masses, calculating
string lenghts is one of the biggest time-consumers in headerImport(),
except in the newish "fast" mode where offset calculations are used.
Avoiding strlen() style string-walk makes quite a difference there and
IIRC rpmstrPoolStrlen() does make a difference in eg fingerprinting, but
dont have numbers at hand.
Anyway, I tend to agree on getting rid of the dummies - just special
case rpmstrPoolLen() for the last entry, it'll still be constant time
for most of the uses (there aren't that many in the codebase but I'd
expect the number to grow as time goes on)
Hmm, I just see that the code in rpmfc.c also loops over the ids
including the dummy entries. Oh my. rpmfcApply() should at least
ignore "" entries.
The code just assumes strpool is not buggy :)
And it seems to modify the string returned
from the strpool. Oh my again. (But using strtol() to convert
a "const char *" to a "char *" is clever.)
Oh ugh, hadn't noticed that at all. Not the first case of accidental (I
hope!) conversion of const char * to char * in rpm either...
AFAICS it's not harmful in practise in this particular case as the
modifications only occur after all additions have been done and the pool
thrown away but yeah it needs fixing.
- Panu -
Cheers,
Michael.
_______________________________________________
Rpm-maint mailing list
[email protected]
http://lists.rpm.org/mailman/listinfo/rpm-maint
_______________________________________________
Rpm-maint mailing list
[email protected]
http://lists.rpm.org/mailman/listinfo/rpm-maint