Re: [pkg-discuss] [review] package performance improvements

Edward Pilatowicz Tue, 21 Feb 2012 17:47:47 -0800

On Wed, Feb 15, 2012 at 09:04:03AM -0800, Shawn Walker wrote:
> Greetings,
>
> The following webrev contains general changes to improve the overall
> performance of the package system:
>
>   7145683 explore general pkg performance improvements
>
> webrev:
>   https://cr.opensolaris.org/action/browse/pkg/swalker/pkg-7145863/webrev/
>
>


i'm really impressed with all the performance you've squeezed out of
this code.  here are my comments:

----------
general:

- for classes which have methods defined via:
        YYY.ZZZ = types.MethodType(...)

  it might be nice to have a mention of these methods at the top of the
  class.  (because folks searching for those methods may not find them.)

- in the .c files, since it takes me a while to remember how python c
  bindings work, it'd be nice to have some comments that say:
        in the case of an error we call "return (NULL)", which will
        generate an exception.  in the case of success we call
        Py_RETURN_NONE.

- i have a bit of concern that some of your performance gains could be
  slowly lost in the future due to innocuous looking code changes.  i
  wonder if perhaps the test suite should have an automated mechanism
  for running benchmarks and comparing them against a known baseline?
  (then perhaps these benchmarks could be run regularly via hudson for
  every putback).

----------
src/modules/_varcet.c

- you could reduce indentation and make the code easier to read by
  doing:

        if (strncmp(as, "facet.", 6) != 0)
                continue
        ...
        if (facet != NULL) {
                /* Facets are currently OR'd. */
                ret = Py_False;
                continue
        }
        ...
        if (strncmp(as, "variant.", 8) != 0)
                continue

- in _allow_facet(), for consistency why not just do:
        if (facet == Py_True) {
                CLEANUP_FREFS;
                Py_RETURN_TRUE;
        }

----------
src/modules/actions/__init__.py

- why do the new InvalidActionError derived error classes use "*args"
  as a parameter instead of having usefully named arguments?  (i see
  that InvalidActionError and other InvalidActionError derived classes
  do it, but i don't see any comments explaining why they do this.)

- why not import fromstr() at the top of the file where we normally
  import stuff?

----------
src/modules/actions/_actions.c

- in fromstr(), rather than doing:
        PyErr_SetString(PyExc_TypeError, "unknown action type");
  shouldn't you raise an UnknownActionError?

- in general, i really prefer data driven code.  so for example, in both
  init_actions() and fromstr() you have hard coded all the action names.
  i'd would find it cleaner if there was one sorted array of action
  names, and that array was used in init_actions() to generate a table
  of pointers that could subsequently be used in fromstr().

  also, is this same sorted action list duplicated in depend.py (see
  comment below for depend.py)

- in init_actions(), shouldn't you call Py_DECREF(sys_modules) if the
  lookup for pkg_actions fails?

- in init_actions(), why Py_DECREF() the error classes right after
  looking them up?  (seems you'd want to keep a reference to them since
  you still have globals pointing to them.

- in init_actions(), shouldn't cache_class() do:
        Py_DECREF(action_types);
  before calling return?

----------
src/modules/actions/_common.c

- in set_invalid_action_error(), why not move the call to
  PyImport_ImportModule() into init_common()?

- in set_invalid_action_error(), instead of calling:

        PyObject_GetAttrString(pkg_actions, <class name>)

  for each error class when you go to generate an error, i'd recommend
  looking up all the error error class in init_common() ahead of time.
  the reason is that these errors paths are infrequently used so moving
  some of this code to init_common() will make it easier to catch any
  errors in the code.  (which makes maintenance in the future easier as
  well.)

- in set_invalid_action_error(), does the call to PyErr_SetObject() end
  up invoking a __init__() routine for the exc type object?  if so it'd
  be great to get some comments stating that.

- in _generic_init(), CLEANUP_GREFS() seems kinda pointless since it
  only expands to one line of code.

- in _generic_init(), all your strncmp() calls should have their 'n'
  value incremented by one.  ex:
                strncmp(ns, "set", 3)
  should be:
                strncmp(ns, "set", 4)
  otherwise you're testing for a prefix substring match and not a full
  string match.

- in _generic_init(), could you add a comment mentioning why it's ok
  for "set" and "signature" actions to not have their key_value defined?

- _file_init() ends up calling PyArg_UnpackTuple() twice.  why not avoid
  this by creating _generic_init_common() that kaes a "action" and
  "data" pointer and gets called from both _file_init() and
  _generic_init() after calling PyArg_UnpackTuple().

----------
src/modules/actions/attribute.py

- i think you broke the "convenience" name=value code by removing the
  check for "name" and "value".  (it looks like we always raise an
  exception if we enter this code path.)  if i'm reading the code right
  it might be nice to have a test case for this code path.

----------
src/modules/actions/depend.py

- please add a comment above known_types explaining that it is sorted
  for performance reasons and mention how you determined the sort order
  (so that no one else comes along and resorts it in alphabetic order,
  and so that the next person creating a new dependency type can
  somehow determine where their new entry should be placed.)

  does this sorted list overlap with the sorted actions hard coded into
  generic.c`fromstr() (and generic.c`init_common())?  if so is there any
  way that this could all be reduced into one list?

----------
src/modules/actions/generic.py

- if "hardlinks must follow all filesystem-modifying actions", why are
  they followed by "link" actions.

- user/group actions seem like filesystem modifying actions (since they
  update files) so why are they before "dir" actions?  (or perhaps i
  just don't understand the comment?)

- in set_data(), the first thing it does is:
        if data is None:
  but _generic_init() won't invoke set_data() if data == None.

- in __cmp__(), it'd be nice if you could scope the "except
  AttributeError" handling more narrowly (not to include the call to
  compare). there's another similar comment below for catalog.py.

- in get_varcet_keys(), walking through self.attrs twice speeds things
  up?

- in different(), it seems odd that you raise an assertion if a file
  action is comparted to another action, but you don't raise an
  assertion if another action is compared to a file action.  perhaps you
  could make this more consistent by doing something like:

        shash = ohash = None
        try:
                shash = self.hash
                ohash = other.hash
                ...
        except AttributeError:
                if shash or ohash:
                        raise AssertionError(...)

- in consolidate_attrs(), i thought you removed the ability for
  attribute values to be sets?  (it seems you did in __str__().)

----------
src/modules/catalog.py

- in general, feel free to call me paranoid, but putting function calls
  in try blocks like this worries me:

        try:
                yield f, self.__gen_actions(f, entry["actions"],
                    excludes)
        except KeyError:
                ...

  the reason is that if any code in __gen_actions() generates a KeyError
  (which is a pretty common exception) we'll unexpectedly catch that
  here.  i'd much rather see something like:

        try:
                actions = entry["actions"]:
        except KeyError:
                ...
        else:
                yield f, self.__gen_actions(f, actions, excludes)

----------
src/modules/client/imageplan.py

- in __get_directories(), why does it matter in which order you update
  the "dirs" set?

- why bother with:
        aname = act.name
        aattrs = act.attrs
        ...
                oattrs = oact.attrs

----------
src/modules/client/pkg_solver.py

- in __comb_newer_fmris(), you could reduce indentation and make the code
  easier to read by doing:
        if tp in self.__cache:
                return self.__cache[tp]


- in __comb_common(), the elif below could probably just be an else:
        if not mver:
                ...
        elif mver:
                ...

----------
src/modules/fmri.py

- there are two separate:
        if veridx != None:
                ...
        else
                ...
  comparisons in __init__().  why not just have one and set self.version
  and self.pkg_name in the same block?

----------
src/modules/manifest.py

- instead of:
        if aname == "set" and attrs["name"] == "authority":
                ...
        elif aname == "hardlink":
  why not do:
        if aname == "set":
                if attrs["name"] == "authority":
                        ...
        elif aname == "hardlink":
  that way the "elif" won't get invoked for most set actions.

- could you add a comment explaining why we need to lstrip() hardlink
  paths?

----------
src/modules/flavor/base.py
src/modules/publish/dependencies.py

- is there any reason not to always invoke consolidate_attrs implicitly
  from the generic Action __init__() method (or perhaps just the
  DependencyAction __init__() method)?

----------
src/tests/api/t_manifest.py

- why are you deleting the test_variants() test case?

- if your going to delete the test_variants() test case, it's the only
  consumer of self.m6_contents so you should delete that as well.

----------
src/tests/cli/t_pkgdep.py

- i have no idea why you removed "%(pfx)s.path=isadir/pfoo2/baz" from
  double_double_stdout.

----------
src/tests/perf/fmribench.py

- for "fmri create (parts)", why are you specifying build_release if
  that information is already included in the specified version string?
  (what happens if build_release doesn't match the build_release value
  embedded in version?)
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [review] package performance improvements

Reply via email to