Re: [pkg-discuss] [review] package performance improvements

Shawn Walker Thu, 23 Feb 2012 08:26:29 -0800

On 02/22/12 18:27, Edward Pilatowicz wrote:

On Wed, Feb 22, 2012 at 02:04:03PM -0800, Shawn Walker wrote:

On 02/21/12 17:46, Edward Pilatowicz wrote:

On Wed, Feb 15, 2012 at 09:04:03AM -0800, Shawn Walker wrote:

Greetings,


The following webrev contains general changes to improve the overall
performance of the package system:

   7145683 explore general pkg performance improvements

webrev:
   https://cr.opensolaris.org/action/browse/pkg/swalker/pkg-7145863/webrev/


i'm really impressed with all the performance you've squeezed out of
this code.  here are my comments:

----------
general:

...

- in the .c files, since it takes me a while to remember how python c
   bindings work, it'd be nice to have some comments that say:
        in the case of an error we call "return (NULL)", which will
        generate an exception.  in the case of success we call
        Py_RETURN_NONE.


That's not quite right though; the calling conventions differ
depending on the routine and situation involved.

Returning (NULL) by itself won't generate an exception.  If an
exception is already set, or in other error scenarios, a caller is
expected to return (NULL).  But again, even that isn't always true
(for example, in module init).

As for success, no, the value returned is dependent on the function.
It's just that functions are expected in non-error scenarios to
return *something* other than (NULL) (again this isn't always true,
for things like module init).  In these particular cases, since
something has to be returned, and None would be the normal default.


yeah.  i was more specifically thinking about the init routines where
this holds true.  (as mentioned more below, any comment hints or
reminders about how these functions should be working is always
appreciated.)


I'll think about this some and then add a few comments in some key places.

...

- in init_actions(), why Py_DECREF() the error classes right after
   looking them up?  (seems you'd want to keep a reference to them since
   you still have globals pointing to them.


So Python 2.x unfortunately has no module shutdown/cleanup hooks [1].

Since I know that these objects won't be garbage collected until the
actions module is unloaded, and since I can't DECREF them when
_actions is unloaded, I simply DECREF them ahead of time so garbage
collection on exit is clean.


please add a comment explaining this.


Done.

----------
src/modules/actions/_common.c

- in set_invalid_action_error(), instead of calling:

        PyObject_GetAttrString(pkg_actions,<class name>)

   for each error class when you go to generate an error, i'd recommend
   looking up all the error error class in init_common() ahead of time.
   the reason is that these errors paths are infrequently used so moving
   some of this code to init_common() will make it easier to catch any
   errors in the code.  (which makes maintenance in the future easier as
   well.)


I could see that except that we have test cases for every one of
these errors (and should continue to), so I don't see any real
benefit to looking them up ahead of time.  Certainly not concerned
about the performance aspects for this path.  I'd still have to
lookup the errors in a data structure or do an if/else if
comparison, so that doesn't seems simpler either.


perhaps because it makes it more like the _generic.c module that looks
up globals ahead of time?  :)


A pattern that I'm not yet convinced is worth replicating.

...

----------
src/modules/actions/generic.py

- if "hardlinks must follow all filesystem-modifying actions", why are
   they followed by "link" actions.

- user/group actions seem like filesystem modifying actions (since they
   update files) so why are they before "dir" actions?  (or perhaps i
   just don't understand the comment?)


For the record; I didn't change the order, but I agree the comment or
the order should match the other.

The user/group actions must be before dir/file actions so we can
chown the file/dir actions to the user/group that we might be
delivering at the same time.

Since I don't have a deathwish, I'm going to leave the order alone,
but here's the new comment:

# Directories must precede all filesystem object actions;
# hardlinks must follow all filesystem object actions (except links).
# Note that usr/group actions precede file actions; this implies that
# /etc/group and /etc/passwd file ownership needs to be part of initial
# contents of those files.


assuming i understand this now, could you update this to something like:

# Directories must precede all filesystem object actions;
# hardlinks must follow all filesystem object actions (except links).
# Note that usr/group actions precede file actions (so that we know
# about all the users and groups that other actions may be delivering
# content for); this implies that /etc/group and /etc/passwd file
# ownership needs to be part of initial contents of those files.

It's not so much so that we know about them as that the system permitsthem because they're written into the various /etc files first. Wecertainly analyse all actions before execution to determine whichusers/groups are being delivered during planning early on.


I'll assume that's what you meant here...

- in set_data(), the first thing it does is:
        if data is None:
   but _generic_init() won't invoke set_data() if data == None.


Correct, _generic_init() doesn't bother calling set_data() if it
doesn't need to as that would be much slower.  I've added a comment
to _common.c:_generic_init.


you could also remove "if data is None:" from set_data() (or replace it
with an "assert data != None").

No; as this function is not private and may be used by externalconsumers to change the data of an action after the fact. There's norequirement to construct an action with its data.

In the event that data is None, we just want to set the appropriateattribute and then drive on.


In particular, pkgsend (directly) and pkgdepend (indirectly) do this .

...

- in consolidate_attrs(), i thought you removed the ability for
   attribute values to be sets?  (it seems you did in __str__().)


Per the comments in other places, this ensures that the attribute
values for the variants are converted to lists from sets.  (See
comment in modules/flavor/base.py as an example).

If you'd prefer that we never provide set objects for the attribute
values so there's not even the slightest potential for a window
where we could fail, I'm willing to change modules/flavor/base.py
and modules/publish/dependencies.py to instead simply convert the
attributes themselves ahead of time.


i'm still a bit confused.

in consolidate_attrs(), we used to check for property values that were a
list.  you updated this so that we check for lists or sets.  (implying
to me that attributes can be lists or sets.)


Yes, making consolidate_attrs() a special case.

but in __str__(), we used to check if property values were a list or
sets, but you updated this so that now we only check for lists.  (and it
seems like __str__() will fail if it encounters a attribute which is a
set.)


Correct, because I didn't want actions to normally use sets.

this seems inconsistent to me.  from your comments it sounds like
attributes can indeed be sets, but does this mean there is some
undocumented convention that users must call consolidate_attrs() to
convert all the attributes to lists before using other functions like
__str__()?  (if this is the case perhaps there should be a local
"consolidated" variable within the action class so it can self
consolidate when necessary?)

Yes, I admit it is inconsistent. So I'll undo this and fix the sillyconsumers that are using sets to only use lists.

...

----------
src/modules/client/imageplan.py

- in __get_directories(), why does it matter in which order you update
   the "dirs" set?


It doesn't matter in which order I update, but it does matter how
many items I pass to update().  Constructing a set() object using a
generator with potentially many items is faster than calling update
with a set of many items that are needed to expand the set().

So I call update() with the smaller, constant set of data.  Again,
we're talking about milliseconds here 8)


how about adding a comment which says:

# it's faster to build a large set and make a small update to it than to
# do the reverse.


Done.

- why bother with:
        aname = act.name
        aattrs = act.attrs
        ...
                oattrs = oact.attrs


See earlier example in get_varcet_keys() above; we avoid a few extra
attribute retrievals since Python doesn't optimise this for us.  So
often, assigning a reference to an attribute and then re-using that
is faster.


in that case, there's another set of nested for loops in the same
function just below this one that seems like it could use the same
optimization for the same variables

The thing is, it's not always a win. I experimented with some of theother cases you're referring to and it ended up being a slight loss.Yay for may bizarreness.

----------
src/modules/flavor/base.py
src/modules/publish/dependencies.py

- is there any reason not to always invoke consolidate_attrs implicitly
   from the generic Action __init__() method (or perhaps just the
   DependencyAction __init__() method)?


Other than the substantially unnecessary overhead?  See my earlier
explanation above.


as per my reply, i guess i'm still confused as to the purpose of this
consolidation...

The original intent of the consolidate_attrs was to eliminate duplicatevalues in attributes. I was abusing it to also ensure attribute valueswere lists and not sets.


-Shawn

_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [review] package performance improvements

Reply via email to