On Mar 1, 2011, at 11:04 AM, Paul Eggleton wrote:

> Hi there,
>
> In Poky we're currently seeing a crash of "zypper search" in conjunction with
> rpm 5.4.0 [1]. Using valgrind I tracked the issue down to rpmio/mire.c line
> 361:
>
>     mire->preg = xcalloc(1, sizeof(*mire->preg));
>
> If I hack this line to specify 64 as the size (the expected sizeof(regex_t)
> for x86_64, as opposed to 24 reported by valgrind) then the crash disappears
> and valgrind stops reporting invalid memory accesses.
>

OK.
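For readers following the 64-versus-24 observation, here is a minimal sketch of the arithmetic involved when a buffer is sized from one definition of a struct and then filled in by code compiled against a larger definition. The two typedefs and both helper functions are hypothetical stand-ins invented for illustration; they are not the real regex_t from <regex.h> or <pcreposix.h>, and none of this is code from rpmio/mire.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two headers that both define "regex_t".
 * Neither is a real definition; only the size difference matters here. */
typedef struct { void *handle; size_t nsub; size_t erroffset; } small_regex_t;
typedef struct { unsigned char opaque[64]; } large_regex_t;

/* Bytes a callee would touch beyond the caller's allocation when the
 * caller sized the buffer from one definition and the callee assumes
 * another. Returns 0 when the allocation is large enough. */
static size_t overrun_bytes(size_t allocated, size_t expected)
{
    return expected > allocated ? expected - allocated : 0;
}

/* Overrun for "allocate the small struct, let the callee write the
 * large one", which is the pattern suspected in the report. */
static size_t mismatch_overrun(void)
{
    return overrun_bytes(sizeof(small_regex_t), sizeof(large_regex_t));
}
```

With the sizes quoted in the report, overrun_bytes(24, 64) is 40: those are the bytes valgrind would flag as invalid accesses, and hard-coding 64 in the allocation makes the overrun zero without explaining why the two sizes disagree.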

(aside for context)
The reference counting on the mire object is quite tricky. All the memory
management in RPM is quite tricky: most allocations come with "reference
counting", and that can't be helped.

But what is specifically painful wrt the mire object is that sometimes a mire
object is a scalar, and sometimes it's an array. The hard design problem is this:
    Does the reference count apply to the array or the element in the array?
and (what is not yet implemented cleanly/correctly @rpm5.org) I've wired up
the same reference count (through a pointer) so that the array and all of its
elements "share" the same reference count. That "works" iff there are
programming constraints that MUST be obeyed, which is likely not the case
with zypper.

BTW: could you dig a bit and find out how/why zypper is actually segfaulting?
A simple gdb backtrace will suffice, don't bother digging deeper. I can find
all the info necessary if I have the gdb backtrace (or equivalently, the
valgrind spewage) for the crash that you are fixing.

> I don't have much knowledge of the rpm codebase, but a bit of header grepping
> shows me that libpcre's pcreposix.h has a regex_t which differs quite
> considerably from regex_t in regex.h (and matches the smaller size reported
> by valgrind), and therefore I strongly suspect that the culprit is that
> pcre's regex_t is being used when allocating the struct in mire.c which is
> then passed to regcomp. FWIW we are enabling pcre support at configure time.
>

(another aside for context)
RPM had to commit to a *RE dialect => PCRE. And PCRE support is MANDATORY
@rpm5.org. But I fully expect "Have it your own way!" to eventually 2nd guess
the choice of MANDATORY. So the code continues to use the POSIX *RE API, and
the MANDATORY PCRE is achieved by using <pcreposix.h>.

None of the above is obvious from reading code. Apologies.

> I could hack this to work, but since we may have dueling headers here the
> solution might not be trivial. Any suggestions?
>

If you want to try a hack, be my guest.
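
The shared reference count described in the first aside above can be sketched roughly as follows. This is an illustration under stated assumptions, not RPM's code: the pat_t type and the pat_array_new, pat_link and pat_unlink functions are hypothetical names, and the only point is to show one counter, reached through a pointer, being shared by an array and every element in it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a miRE-like object. */
typedef struct pat_s {
    int *nrefs;           /* counter shared by the array and all elements */
    const char *pattern;  /* unused here; present to suggest a payload */
} pat_t;

/* Allocate n elements that all point at one counter, starting at 1. */
static pat_t *pat_array_new(size_t n)
{
    pat_t *arr = calloc(n, sizeof(*arr));
    int *count = malloc(sizeof(*count));
    if (arr == NULL || count == NULL) {
        free(arr);
        free(count);
        return NULL;
    }
    *count = 1;
    for (size_t i = 0; i < n; i++)
        arr[i].nrefs = count;
    return arr;
}

/* Take a reference through any element; the whole array is pinned. */
static pat_t *pat_link(pat_t *p)
{
    assert(p != NULL && p->nrefs != NULL);
    ++*p->nrefs;
    return p;
}

/* Drop a reference through any element. The caller must pass the array
 * base so the storage can be released. Returns 1 if it was freed. */
static int pat_unlink(pat_t *arr, pat_t *p)
{
    if (--*p->nrefs > 0)
        return 0;
    free(p->nrefs);
    free(arr);
    return 1;
}
```

The constraint this design imposes is that every pat_link is matched by exactly one pat_unlink and that the array base is what finally gets freed. One unlink too many, through any element, releases the storage under every other holder. The real miRE implementation carries considerably more state than this sketch.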

The current miRE implementation is not to my liking and is tortured beyond
belief. But the simple rule(s) regarding miRE object reference counts and
PCRE wireup are these:

    1) You SHOULD be able to see all the necessary details to debug a miRE
    issue if you add --miredebug from the CLI or set (programmatically)
        static int _mire_debug = -1;
    The " ++ " and " -- " lines are the reference counts being changed,
    and (in the case of segfaults/memleaks) a reference count ++/-- has
    gone AWOL somehow.

    2) All code using the miRE API MUST do
        #include <pcreposix.h>
    because there's a risk of symbol pollution. This is tricky with
    out-of-tree builds like zypper (that may be peeking into miRE
    internals, or that may need #define's as arguments).

    The Right Thing To Do (but "Have it your own way!" RPM building voids
    whatever suggestions I might make) is
        a) build RPM --with-pcre=internal
        b) supply /usr/include/rpm/pcreposix.h for the internal pcre.

Please note that
    #include <mire.h>
is not any interface I wish to export from RPM's API whatsoever. But there's
an endless need to compile with RPM and so <mire.h> flips out and is pulled
back into RPM regular as clockwork.

> Thanks,
> Paul
>
> [1] http://bugzilla.pokylinux.org/show_bug.cgi?id=721
>

... checking ...

Comment #3 at the bug report indicates the problem is during configuration
(where miRE is used to parse patterns out of /etc/rpm/platform).

Short answer: try deleting everything but the 1st line in /etc/rpm/platform,
and I'll bet your segfault disappears.

Hmmm ... you got a bunch of issues trying to initialize rpm through
configuration. Usually there's only a single flaw.

Short answer: if zypper is initializing -lrpmlib multiple times, well, that's
asking for trouble because of the reference counting confusion between array
and array element that is currently implemented @rpm5.org. There's no need
to repeatedly re-initialize -lrpmlib.

But the real hint is that "regcomp", not a wrapped symbol ("pcre_regcomp" or
something), is in the stack backtrace.
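
To illustrate why an unwrapped name in a backtrace is a useful hint, here is a small sketch of how a preprocessor macro decides which function a call site binds to. All four function names are hypothetical and deliberately avoid the real regcomp; the two "implementations" just return a label so the effect is visible.

```c
#include <assert.h>
#include <string.h>

/* Two hypothetical implementations behind one API name, standing in for
 * a libc routine and a wrapper library's routine of the same name. */
static const char *regcomp_like(void)           { return "libc"; }
static const char *pcreposix_regcomp_like(void) { return "pcreposix"; }

/* Compiled WITHOUT the wrapping macro in effect: the call keeps its
 * plain name and reaches the libc-flavoured function. */
static const char *caller_without_define(void)
{
    return regcomp_like();
}

/* From here on the preprocessor rewrites the plain name, the same way
 * a wrapper header renames its entry points. */
#define regcomp_like pcreposix_regcomp_like

/* Compiled WITH the macro in effect: the same source text now calls
 * the wrapped function. */
static const char *caller_with_define(void)
{
    return regcomp_like();
}

#undef regcomp_like
```

The renaming happens per translation unit at compile time, so code built without the header (or with a different header) keeps calling the plain name. A backtrace that shows the plain name therefore suggests that at least one caller was compiled without the wrapping macro in effect.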

Which forces me to ...

(yet another obscure aside)
The emulation provided by <pcreposix.h> isn't perfect because
    a) #define's differ from POSIX
    b) it's an API/ABI emulation only. E.g. the emulation is perfectly
    happy parsing PCRE dialect regexes passed through the POSIX
    emulated API.

Short answer: I believe you have symbol pollution because of the way that
zypper <-> pcre <-> rpm are being built separately.

There's gory details ... checking ... in rpm-devel archives near here
    http://rpm5.org/community/rpm-devel/3429.htm

Short answer: Your build(s) need to see these #define's (from the bottom of
<pcreposix.h> in pcre/pcreposix.h internal to RPM) to wrap the symbols in the
POSIX *RE emulation and prevent the symbol pollution somehow:

    #define regcomp pcreposix_regcomp
    #define regexec pcreposix_regexec
    #define regerror pcreposix_regerror
    #define regfree pcreposix_regfree

Yes I'd dearly like to see a better "fix", but I ain't gonna hold my breath.
No patch from @rpm5.org will ever be accepted by zypper devel's afaik, and
the ultimate flaw is/was in <pcreposix.h> from PCRE.

hth

73 de Jeff

> --
>
> Paul Eggleton
> Intel Open Source Technology Centre (UK)
> ______________________________________________________________________
> RPM Package Manager                                    http://rpm5.org
> Developer Communication List                        rpm-devel@rpm5.org