On Thu, Aug 27, 2009 at 09:19:08PM -0500, Mike Gerdts wrote:
> Time in read_dict_file improved by over 20% (17.9 sec -> 14.2 sec) by
> inlining one tiny function.  Overall execution time improved by 13%
> (33.9 sec -> 29.6 sec).

This is a good improvement, but I think there are some other things going
on in this code too.

The profiler is a good first pass at figuring out what's going on, but
it adds a lot of overhead to the process.

Here's the timing reported by the profiler:

5743405 function calls (5739751 primitive calls) in 26.670 CPU seconds

Here's the timing without:

real        8.830243431
user        8.435688975
sys         0.305382919

So the profiler is slowing the process down by about 3.2x.
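For anyone who wants to reproduce that kind of comparison, here's a minimal sketch: time a call-heavy workload bare, then time the same workload under cProfile, and look at the ratio. The workload here is a toy stand-in, not pkg code.

```python
import cProfile
import time

def work():
    # Call-heavy toy workload; function calls are what cProfile instruments,
    # so they dominate the profiler's overhead.
    total = 0
    for i in range(200000):
        total += len(str(i))
    return total

t0 = time.perf_counter()
work()
bare = time.perf_counter() - t0

t0 = time.perf_counter()
cProfile.Profile().runcall(work)
profiled = time.perf_counter() - t0

print("bare %.3fs, profiled %.3fs, overhead %.1fx" % (bare, profiled, profiled / bare))
```

The exact factor depends on how call-dense the code is; the denser the calls, the bigger the profiler's multiplier.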

I prefer to sort this output two different ways.

The first is by number of calls:


   104736    0.440    0.000    1.820    0.000 file.py:351(generate_indices)
   128691    1.360    0.000    2.430    0.000 file.py:55(__init__)
   136467    0.350    0.000    1.930    0.000 posixpath.py:110(basename)
   136469    0.940    0.000    1.580    0.000 posixpath.py:74(split)
   137895    0.260    0.000    0.260    0.000 :0(rfind)
   148364    0.280    0.000    0.280    0.000 :0(has_key)
150517/148344    1.790    0.000    2.770    0.000 manifest.py:380(__handle_list)
   153783    0.220    0.000    0.220    0.000 :0(rstrip)
165768/165460    0.200    0.000    0.200    0.000 :0(len)
   177389    0.430    0.000    2.260    0.000 generic.py:529(include_this)
   177389    1.510    0.000    1.830    0.000 variant.py:74(allow_action)
   177390    0.580    0.000    0.580    0.000 :0(tell)
   177399    0.760    0.000    0.760    0.000 :0(_fromstr)
   177399    0.890    0.000    1.120    0.000 generic.py:96(__init__)
   177399    1.660    0.000    5.760    0.000 __init__.py:153(fromstr)
   177412    0.330    0.000    0.330    0.000 :0(keys)
   177869    0.440    0.000    0.440    0.000 :0(strip)
   178268    0.630    0.000    0.630    0.000 :0(readline)
   181606    0.230    0.000    0.230    0.000 :0(get)
   300941    0.530    0.000    0.530    0.000 :0(lstrip)
   324351    0.430    0.000    0.430    0.000 search_storage.py:657(__unquote)
   347831    0.600    0.000    0.600    0.000 :0(split)
   417795    0.930    0.000    0.930    0.000 :0(isinstance)
   426242    0.860    0.000    0.860    0.000 :0(append)
   644414    1.110    0.000    1.110    0.000 :0(match)

The second is by total time:

    18053    0.100    0.000    0.400    0.000 link.py:121(generate_indices)
    20261    0.130    0.000    0.220    0.000 link.py:48(__init__)
    15522    0.160    0.000    0.240    0.000 directory.py:47(__init__)
     4215    0.170    0.000    0.270    0.000 depend.py:89(clean_fmri)
165768/165460    0.200    0.000    0.200    0.000 :0(len)
   153783    0.220    0.000    0.220    0.000 :0(rstrip)
   181606    0.230    0.000    0.230    0.000 :0(get)
   137895    0.260    0.000    0.260    0.000 :0(rfind)
   148364    0.280    0.000    0.280    0.000 :0(has_key)
        1    0.290    0.290   22.960   22.960 query_parser.py:156(search)
   177412    0.330    0.000    0.330    0.000 :0(keys)
   136467    0.350    0.000    1.930    0.000 posixpath.py:110(basename)
   177389    0.430    0.000    2.260    0.000 generic.py:529(include_this)
   324351    0.430    0.000    0.430    0.000 search_storage.py:657(__unquote)
   104736    0.440    0.000    1.820    0.000 file.py:351(generate_indices)
   177869    0.440    0.000    0.440    0.000 :0(strip)
   300941    0.530    0.000    0.530    0.000 :0(lstrip)
   177390    0.580    0.000    0.580    0.000 :0(tell)
   347831    0.600    0.000    0.600    0.000 :0(split)
   178268    0.630    0.000    0.630    0.000 :0(readline)
   177399    0.760    0.000    0.760    0.000 :0(_fromstr)
   426242    0.860    0.000    0.860    0.000 :0(append)
   177399    0.890    0.000    1.120    0.000 generic.py:96(__init__)
        2    0.920    0.460    2.020    1.010 choose.py:10(choose)
   417795    0.930    0.000    0.930    0.000 :0(isinstance)
   136469    0.940    0.000    1.580    0.000 posixpath.py:74(split)
   644414    1.110    0.000    1.110    0.000 :0(match)
   128691    1.360    0.000    2.430    0.000 file.py:55(__init__)
         1    1.400    1.400    2.360    2.360 search_storage.py:671(read_dict_file)
         1    1.450    1.450   22.660   22.660 query_parser.py:208(_search_fast_update)
   177389    1.510    0.000    1.830    0.000 variant.py:74(allow_action)
   177399    1.660    0.000    5.760    0.000 __init__.py:153(fromstr)
150517/148344    1.790    0.000    2.770    0.000 manifest.py:380(__handle_list)
      704    3.390    0.005   19.230    0.027 manifest.py:360(search_dict)
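Both orderings above come straight out of pstats; here's a self-contained sketch of producing them, using a toy workload in place of the search code:

```python
import cProfile
import io
import pstats

def work():
    # Small stand-in workload so the profile has something to show.
    return sorted(str(i) for i in range(1000))

pr = cProfile.Profile()
pr.runcall(work)

out = io.StringIO()
stats = pstats.Stats(pr, stream=out)
stats.sort_stats("ncalls").print_stats(5)   # first view: by number of calls
stats.sort_stats("tottime").print_stats(5)  # second view: by total (self) time
print(out.getvalue())
```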

At least on x86, it looks like we're spending a lot of time on string
manipulation and action creation.  The :0(whatever) entries are string
builtin routines, and the file/__init__/generic __init__ routines are
the constructors for actions.

We've optimized these before.  I'm not sure how much performance is left
on the table there.
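For context, the kind of win that inlining one tiny function gives (the 20% quoted at the top) is easy to demonstrate. This is a hypothetical helper in the spirit of a small __unquote-style routine, not the real pkg code; the point is just the per-call overhead:

```python
import timeit

# Hypothetical helper modeled loosely on a tiny unquoting function; the real
# pkg code differs.  This illustrates the cost of the call itself.
def unquote(s):
    return s.replace("%25", "%")

def with_helper(lines):
    return [unquote(l) for l in lines]

def inlined(lines):
    # Identical work with the helper's body inlined into the loop.
    return [l.replace("%25", "%") for l in lines]

lines = ["usr%25bin%25ls"] * 10000
assert with_helper(lines) == inlined(lines)
print("helper :", timeit.timeit(lambda: with_helper(lines), number=50))
print("inlined:", timeit.timeit(lambda: inlined(lines), number=50))
```

When the helper's body is this small, the call overhead is a large fraction of the total, which is why inlining shows up so clearly in hot paths.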

The __unquote function shows up as getting called very frequently, but in
this output search_dict actually accounts for far more of the time, even
though it doesn't have as many leaf calls.  In fact, since we're seeing a
lot of time accrue under _search_fast_update, but not under
read_dict_file, it looks like search_dict may actually be a better
candidate for optimization.  (Though the unquote fix looks useful, to be
sure.)

As it turns out, we can verify this with dtrace.

              libc.so.1`lseek64+0x7
              libpython2.4.so.1.0`file_tell+0x34
              libpython2.4.so.1.0`call_function+0x315
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2b11
                [ /usr/lib/python2.4/vendor-packages/pkg/manifest.py:449 (search_dict) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x903
              libpython2.4.so.1.0`fast_function+0x164
              libpython2.4.so.1.0`call_function+0xcf
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2b11
                [ /usr/lib/python2.4/vendor-packages/pkg/client/query_parser.py:240 (_search_fast_update) ]
              libpython2.4.so.1.0`fast_function+0xf8
              libpython2.4.so.1.0`call_function+0xcf
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2b11
                [ /usr/lib/python2.4/vendor-packages/pkg/client/query_parser.py:187 (search) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x903
              libpython2.4.so.1.0`function_call+0x153
              libpython2.4.so.1.0`PyObject_Call+0x20
              libpython2.4.so.1.0`ext_do_call+0x114
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2bba
                [ /usr/lib/python2.4/vendor-packages/pkg/query_parser.py:896 (search) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x903
              libpython2.4.so.1.0`function_call+0x153
              libpython2.4.so.1.0`PyObject_Call+0x20
              libpython2.4.so.1.0`instancemethod_call+0x12c
              libpython2.4.so.1.0`PyObject_Call+0x20
              libpython2.4.so.1.0`ext_do_call+0x114
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2bba
                [ /usr/lib/python2.4/vendor-packages/pkg/client/query_parser.py:84 (search) ]
              libpython2.4.so.1.0`gen_iternext+0x48
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2104
                [ /usr/lib/python2.4/vendor-packages/pkg/client/api.py:1065 (<generator expression>) ]
              libpython2.4.so.1.0`gen_iternext+0x48
              libpython2.4.so.1.0`PyIter_Next+0x16
              itertools.so`chain_next+0x22
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2104
                [ /usr/bin/pkg:1080 (search) ]
              libpython2.4.so.1.0`fast_function+0xf8
              libpython2.4.so.1.0`call_function+0xcf
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2b11
                [ /usr/bin/pkg:2547 (main_func) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x903
              libpython2.4.so.1.0`function_call+0x153
              libpython2.4.so.1.0`PyObject_Call+0x20
              libpython2.4.so.1.0`ext_do_call+0x114
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2bba
                [ /usr/bin/pkg:2599 (handle_errors) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x903
              libpython2.4.so.1.0`fast_function+0x164
              libpython2.4.so.1.0`call_function+0xcf
              libpython2.4.so.1.0`PyEval_EvalFrame+0x2b11
                [ /usr/bin/pkg:2711 (?) ]
              libpython2.4.so.1.0`PyEval_EvalCodeEx+0x903
              libpython2.4.so.1.0`PyEval_EvalCode+0x22
              libpython2.4.so.1.0`run_node+0x39
              libpython2.4.so.1.0`PyRun_SimpleFileExFlags+0x14d
              libpython2.4.so.1.0`Py_Main+0x86b
              isapython2.4`_start+0x7d
              508

This is the most frequently occurring stack when using the profile
provider to look at search -l ls.  Almost all of the other stacks
include _search_fast_update, so this appears to confirm what the
profiler's output showed.  My guess is that by optimizing search_dict,
you might see better results.

Another possibility is to pick off some of the most frequently called
functions from the first list, and see if we can reduce their usage in
hot paths.

In fact, DTrace has highlighted another interesting case for us.  It
looks like this code calls lseek frequently.

We can trace the syscalls:

  open64                                                     12224612
  read                                                       27717232
  llseek                                                    194982874

These values are in nanoseconds, so we're spending about 0.03 seconds
reading the data and about 0.19 seconds seeking around in the file.
That's a lot of time for the computer, but it's not very much compared
to the overall 8 seconds the operation takes.
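The file_tell -> lseek64 frames in the stack above suggest one likely source of those seeks: calling tell() on a buffered file for every record. Tracking the offset by hand avoids the syscall entirely. This is a hypothetical sketch (the names and shapes are illustrative, not the real search code):

```python
import os
import tempfile

def offsets_with_tell(f):
    # tell() on a buffered file can trigger a seek syscall per record.
    offs = []
    while True:
        offs.append(f.tell())
        if not f.readline():
            return offs[:-1]

def offsets_by_hand(f):
    # Same result, no tell(): in binary mode len(line) is a byte count,
    # so we can accumulate the offset ourselves.
    offs, pos = [], 0
    for line in f:
        offs.append(pos)
        pos += len(line)
    return offs

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"alpha\nbeta\ngamma\n")
with open(path, "rb") as f:
    a = offsets_with_tell(f)
with open(path, "rb") as f:
    b = offsets_by_hand(f)
os.unlink(path)
assert a == b
print(a)  # -> [0, 6, 11]
```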

My guess is that we can squeeze some performance out of search_dict.

However, the most gain may come from optimizing the indexes for the
types of searches that you want to perform.  The existing search system
is generic and has a language that lets you write a complex series of
queries that is run against a wide variety of different bits of
indexable content.  This is great, but if we know that there are common
queries that users are going to perform, it would probably make more
sense to build separate indexes for the common types of queries.  The
rpm example that was given earlier has a rather specific meaning.  There
are different query types that can be given to the -q option, and I
would imagine that they've optimized a number of these to be fast.
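To make the idea concrete, here's a sketch of a specialized index for one common query shape ("which packages deliver a file with this basename?"). All the names here are illustrative; this is not the pkg(5) index format:

```python
import os
from collections import defaultdict

def build_basename_index(manifests):
    """manifests: {pkg_name: [delivered paths]} -> {basename: set(pkg_names)}.

    Built once at indexing time, so the common query becomes a single
    dict lookup instead of a pass through the generic query machinery.
    """
    idx = defaultdict(set)
    for pkg, paths in manifests.items():
        for path in paths:
            idx[os.path.basename(path)].add(pkg)
    return idx

manifests = {
    "core-os": ["/usr/bin/ls", "/usr/bin/cat"],
    "gnu-coreutils": ["/usr/gnu/bin/ls"],
}
idx = build_basename_index(manifests)
print(sorted(idx["ls"]))  # -> ['core-os', 'gnu-coreutils']
```

The generic engine stays available for arbitrary queries; the specialized index just fast-paths the queries people actually run.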

The binary format is an interesting idea.  Mmap isn't as fast when used
with ZFS.  Python has a series of classes that let it write binary
representations of objects out to disk.  We've looked at this in the
past, but found, I think, that the overhead of converting the binary
blobs into objects was almost as high as constructing the objects
out of our own string representation.  This is why I'm inclined to
suggest that optimizing a particular index for a type of query might
actually be more fruitful than a binary blob.  If we don't know what
we're looking for in memory, it's still going to be expensive to
linearly scan that mmap'd range.
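The binary-representation classes I'm thinking of are pickle/marshal. Here's a rough sketch of the comparison described above: deserializing a pickled dict versus re-parsing a simple one-record-per-line text format. (The text format here is made up for illustration; it's not the actual index format.)

```python
import pickle
import timeit

# Toy index: token -> list of package names.
record = {"token%d" % i: ["pkg/a", "pkg/b"] for i in range(1000)}
blob = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
text = "\n".join("%s %s" % (k, ",".join(v)) for k, v in record.items())

def from_pickle():
    return pickle.loads(blob)

def from_text():
    # Parse the simple "token pkg,pkg" line format back into a dict.
    return {k: v.split(",") for k, v in
            (line.split(" ", 1) for line in text.splitlines())}

assert from_pickle() == from_text()
print("pickle:", timeit.timeit(from_pickle, number=50))
print("text  :", timeit.timeit(from_text, number=50))
```

The relative numbers will vary by workload, but this is the shape of the measurement that led us away from binary blobs before.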

I think we should take a closer look at search_dict, and then also
figure out what common questions people are trying to ask using search.

-j
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
