Re: [Rpm-maint] [PATCH] Improve macro table performance

Panu Matilainen Thu, 14 Feb 2013 10:40:53 -0800

On 02/13/2013 08:34 PM, Alexey Tourbin wrote:

On Wed, Feb 13, 2013 at 12:59:02PM +0200, Panu Matilainen wrote:

There's an assumption that the table must be sorted, because
rpmDumpMacroTable (used by 'rpm --showrc') prints entries in sorted
order. It is not immediately obvious whether this is actually
important, and hence whether the assumption can be dropped.


--showrc is just a diagnostic tool and its output order is mere
cosmetics. If somebody cares enough, one could always sort the hash
table keys for output, performance is irrelevant for that operation.

Meanwhile,
I argue that the change is worth applying anyway, because it removes
qsort(3) stuff and makes use the underlying data structure in a more
concise and explicit manner - such that it will be easier to try
another data structure.


I haven't seriously looked but its entirely possible this would
indeed make it easier to switch to a hash-table (which to me is
clearly the most "natural" data structure for macros). In the
meanwhile, this *removes* a fair amount of code and is much faster,
what's not to like?


I have another reason for going with a sorted table: introducing bsearch
macro (see below).

Well, a couple of things:

Also, macro table entries are now allocated in a single chunk.


Not so much the change itself but the fact unrelated changes are
combined into one commit. If you find yourself typing "also", "in
addition" or the like into commit messages, most of the time its a
sign that the change should go into a separate commit :)


Generally, yes.  In another commit, I found myself grouping a few
changes together only to come up with a simpler commit subject, like
"Optimize readLine routine".  This includes a few loosely coupled
optimizations to readLine and related static routines; these
optimizations can indeed be separated, and probably should be.
But sometimes there is a trade-off between separating independent
changes and grouping related changes into a single logical commit.

In my personal experience, every time I let myself or somebody else belazy with splitting changes, it comes back to bite me sooner or later :)Of course there are exceptions to everything, and sometimes its simplynot possible to sanely split a large change.

Also the change as its done here obscures macro entry data structure
somewhat:

     char *name;         /*!< Macro name. */
     char *opts;         /*!< Macro parameters (a la getopt) */
     int used;           /*!< No. of expansions. */
     int level;          /*!< Scoping level. */
     char body[];        /*!< Macro body. */

"name" and "opts" look like they are individual pointers that should
be freed separately while they're actually pointers to within
body[]. If you want to go this way, use something like arena[] for
the storage area and make name, opts and body separate "const char
*" pointers into the arena so its clear one shouldn't go freeing
them. Readability is more important than optimizing away one pointer
per macro entry.


I argue that this data structure makes sense, and is probably optimal.

- Each macro has its own body, which is almost always non-empty (macros
   with logically empty body are usually defined via %nil).  This rules
   out the possibility for sharing a body with another macro, and this
   makes useless any further optimization for null/empty body.  Thus a
   pointer is not needed for body, and it makes sense to allocate the
   body on behalf of flexible array member, at a known/fixed offset.

- Macro name can be shared: when pushed for the first time, the name is
   allocated on behalf of flexible array, after the body.  Otherwise, the
   name is shared with the "underlying" macro.  Hence the pointer is
   inevitable.

- Options are even more tricky: simple macros have opts set to NULL;
   macros with arguments like foo() have opts point to a statically
   allocated string ""; finally, macros like foo(o:) have opts point into the
   flexible array member, somewhere after the body.

Note that sharing macro name existed before, which indicates an
intention and a half-hearted attempt to cut down on malloc.  I bring
this attempt to the logical conclusion: no further optimizations are
possible (in this respect).

As for arena[], only body[] is tightly coupled with the macro; name and
opts have a fair chance to point somewhere else, and it's none of
somebody's business where they actually point to.  This is how the
arrangement can be understood, and it makes sense then to leave the
body[] as a flexible array member.

Hence I don't buy the argument that it is not clear whether the pointers
should be freed.  There have already been some complications with
freeing a shared name, and these complications are now gone: a macro
entry is freed with a single free(me) call (and there's a comment which
ensures that "me" actually comes in a single chunk).  Furthermore, the
only place where a macro needs to be freed is in delMacro(), and the
only place where the macro is allocated is addMacro().  All allocations
and all data structure manipulations are now local to these two
complementary routines.  In the rest of the code, the macros and the
table are used essentially in read-only mode.

Yes I see what it does, after staring it for a bit. My initial reactionhowever was:

- Mm, isn't this leaking memory from opts/name?
- No... but then where are they stored?
- Pointer into body? Wtf has the body got to do with this?

But perhaps you misunderstood what I meant: I'm not complaining aboutthe way things are stored in one malloc blob, this is just a naming /clarity thing. On top of your patch:


--- a/rpmio/macro.c
+++ b/rpmio/macro.c
@@ -37,11 +37,12 @@ extern int optind;
 /*! The structure used to store a macro. */
 struct rpmMacroEntry_s {
     struct rpmMacroEntry_s *prev;/*!< Macro entry stack. */
-    char *name;        /*!< Macro name. */
-    char *opts;        /*!< Macro parameters (a la getopt) */
+    const char *name;          /*!< Macro name. */
+    const char *opts;          /*!< Macro parameters (a la getopt). */
+    const char *body;  /*!< Macro body. */
     int used;           /*!< No. of expansions. */
     int level;          /*!< Scoping level. */
-    char body[];       /*!< Macro body. */
+    char arena[];      /*!< String arena. */
 };

 /*! The structure used to store the set of macros in a context. */
@@ -1339,7 +1340,7 @@ addMacro(rpmMacroContext mc,
        /* entry with shared name */
        me = xmalloc(mesize);
        /* copy body */
-       p = me->body;
+       me->body = p = me->arena;
        if (blen)
            memcpy(p, b, blen + 1);
        else
@@ -1363,7 +1364,7 @@ addMacro(rpmMacroContext mc,
        size_t nlen = strlen(n);
        me = xmalloc(mesize + nlen + 1);
        /* copy body */
-       p = me->body;
+       me->body = p = me->arena;
        if (blen)
            memcpy(p, b, blen + 1);
        else

That way name, body and opts become self-explanatory by just looking atthe struct declaration: they're const so they're pointers to "somewhereelse, arena probably" and must not be freed or otherwise messed with,and the compiler will complain if you try to do so.

So, please split it into two parts:
- one to eliminate qsort() + bsearch()
- one to change the macro entry allocation (and see above)


I'll recheck if it is possible to separate the data structure and the
allocation logic, but it seems that it is not, since the data structure
and the allocation are now tightly coupled.  The skeleton of addMacro()
now looks like this:

        findEntry
        if (found)
            allocate entry with shared name
        else
            prepare slot
            allocate entry with flexible name
        allocate/initialize the rest
        insert into the slot

If it is not possible to produce two patches that commute (i.e. can be
applied in any order), it is probably best to go with a single patch.

On the surface it seems like it shouldn't be particularly hard to split,but I didn't look at it that carefully and might well be missingsomething. It's not the end of the world or worth putting a huge effortinto, but if its *reasonably* splittable, please do.

Also one cosmetic nit: in addMacro(), this "else" belongs to the
line above with the closing brace:
+    else {
+       /* extend macro table */
+       const int delta = 256;


I usually prefer "uncuddled elses" (this comes from Perl, and the term
comes from perlstyle manpage).  However, I'm willing to adjust to the
received style.

Now to the bsearch macro.  I've been having some second thoughts about
the patch: is it okay to plug custom bsearch code like this?  Bsearch
logic is pretty straightforward, but that's still quite a few lines,
and there are other bsearch calls throughout the code.

Do we even save much by inlining bsearch?  On the one hand, since
bsearch iterates only log(n) times, which is roughly 10 (for 1024
elements), there might seem to be just too little stuff to cut down on.
On the other hand, it's been estimated that fully inlined bsearch/qsort
with very simple comparison (i.e. array of integers) is 4 times faster;
it can be 2 times faster on the array of short strings.  So, on the
other hand, if you search routine is 2 times slower than it can be,
it's a great deal.

So I end up with trying to tailor a (reusable) BSEARCH macro, and there
are quite some details to get it right.  It will eliminate findEntry()
function, since findEntry is completely reduced to bsearch, and gcc
already creates a few copies of findEntry due to constant argument
propagation.

I wouldn't worry about the added lines too much, as despite the custombsearch() this patch actually reduces the lines of code, and is faster.And since it might be replaced with a hash table in not-too-distantfuture...


        - Panu -
_______________________________________________
Rpm-maint mailing list
[email protected]
http://lists.rpm.org/mailman/listinfo/rpm-maint

Re: [Rpm-maint] [PATCH] Improve macro table performance

Reply via email to