On Tue, Aug 25, 2009 at 8:24 PM, <[email protected]> wrote:
> Subject: Re: [on-discuss] [pkg-discuss] [osol-discuss] ON/SXCE bi-weekly
>        schedule not valid anymore?
>
> On Tue, Aug 25, 2009 at 07:50:19PM -0500, Mike Gerdts wrote:
>> Silly example, but is representative of something that needs to be
>> done from time to time.
>>
>> Solaris 10:
>>
>> $ uname -srvi
>> SunOS 5.10 Generic_141414-02 SUNW,SPARC-Enterprise-T5120
>>
>> $ ptime grep -w ls /var/sadm/install/contents
>> [snip]
>> real        2.771
>> user        2.653
>> sys         0.115
>>
>> OpenSolaris:
>>
>> $ uname -srvi
>> SunOS 5.11 snv_111b SUNW,SPARC-Enterprise-T5120
>>
>> $ ptime pkg search -l ls
>> [snip]
>> real       34.130866085
>> user       30.690454076
>> sys         0.774897050
>
> This one is a bit tough, since there isn't a SysV packaging command to
> find the package that a file belongs to.
>
> That said, the performance is a lot better on x86, where on an Ultra 27
> with nv121, I get the following for a local search:
>
> real        8.613470157
> user        8.323895802
> sys         0.280899637

Ugh.  If we accept the fallacy that performance scales linearly with
clock speed, that means that if my T5120 were running at 4.8 GHz I
would see similar performance.  The specs I have access to say that
your CPUs (with significantly more sophisticated execution units) are
running somewhere between 2.6 and 3.3 GHz.  I'm not sure that we're
seeing much more here than the speed of individual execution units
within the cores.

Let's compare to a packaging system that we need to be competitive
with.  On a just revived Fedora 9 instance in virtualbox on a 1.8 GHz
Core2 Duo:

# time rpm -qf /bin/ls
coreutils-6.10.35.fc9.i386

real    0m0.259s
user   0m0.029s
sys    0m0.106s

An operation that traverses the entire rpm database (rpm -qal | grep
-w ls) completes in less than 42 seconds, with only 3.3 seconds of
user time.  Oddly enough, the time-consuming part of that is the task
switching involved in sending data to grep: if I send the output to
/dev/null instead, it completes in just over 10 seconds.  When grep is
reading from stdin and writing a very small amount to stdout, it
shouldn't take 20+ seconds of kernel time.  (Now I guess I need to
fire up an OpenSolaris virtualbox to compare context switching
efficiency...)
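As a rough way to isolate that pipe/context-switch cost (a sketch, not
the rpm test itself), one can time writing the same data to /dev/null
versus through a pipe to a child process that just consumes its stdin:

```python
import os
import subprocess
import sys
import time

N = 200_000
LINE = b"usr/bin/ls f none 0755 root bin\n"

def produce(fh):
    # emit N contents-style lines, like "rpm -qal" streaming its output
    for _ in range(N):
        fh.write(LINE)

# Case 1: no consumer process -- write straight to /dev/null.
t0 = time.time()
with open(os.devnull, "wb") as devnull:
    produce(devnull)
t_null = time.time() - t0

# Case 2: the same data through a pipe to a child that just reads its
# stdin, so every pipe-buffer fill forces scheduling between processes.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"],
    stdin=subprocess.PIPE)
t0 = time.time()
produce(child.stdin)
child.stdin.close()
child.wait()
t_pipe = time.time() - t0

print("devnull %.3fs  pipe %.3fs" % (t_null, t_pipe))
```

The gap between the two timings is roughly the scheduling and copy
overhead the pipe adds, which is the quantity worth comparing between
Linux and OpenSolaris here.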

>> My experience with hardware that I can order from Sun today says that
>> the new software takes 12x longer for equivalent tasks.  There is a
>> very high startup cost with pkg that does not exist with pkgadd.  With
>> pkgadd I don't think too far ahead to be sure to group as many
>> operations into one invocation as possible.  When I use pkg, I most
>> certainly try to lump as many operations as possible into each
>> invocation to avoid this startup penalty.
>
> Is this 12x on SPARC, or in all cases?  Most of the analysis that I've
> done so far has been on x86, and is generally quite fast.

Your example of 8 seconds for a search is not fast.  It is 33x slower
than the equivalent rpm operation on much slower hardware.  On the
same hardware, my guess is that rpm is 50x faster.

The sparc system I am running on has 1337 packages installed.  "pkg
list" takes about 33.6 seconds.  An x86 system (Athlon dual core 1.9
GHz) running build 117 that has 777 packages installed completes "pkg
list" in about 10.0 seconds.   In other words, the x86 system at 1.58x
the clock speed processes 2x the number of packages per second.

> We also haven't performed any optimization for startup costs yet.  There
> are some options available today, and some that we hope will be
> available in the future.

Based on some profiling I did[1], it looks like read_dict_file is a
culprit for poor performance with "pkg search -l ls".

1. http://onlamp.com/pub/a/python/2005/12/15/profiling.html
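For context, the listing below is cProfile output sorted by internal
time; a minimal harness looks roughly like the following (the profiled
function here is a simplified stand-in for pkg's parsing loop, not the
real code):

```python
import cProfile
import io
import pstats

def read_dict_file(lines):
    # simplified stand-in for pkg's token/offset parsing loop
    d = {}
    for line in lines:
        token, offset = line.split(" ")
        d[token] = int(offset)
    return d

lines = ["token%d %d" % (i, i) for i in range(100_000)]

prof = cProfile.Profile()
prof.enable()
read_dict_file(lines)
prof.disable()

buf = io.StringIO()
# "tottime" orders by time spent in the function body itself,
# which is what "Ordered by: internal time" means in the listing
pstats.Stats(prof, stream=buf).sort_stats("tottime").print_stats(5)
print(buf.getvalue())
```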

         842695 function calls (842377 primitive calls) in 33.868 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   17.913   17.913   21.051   21.051
/usr/lib/python2.4/vendor-packages/pkg/search_storage.py:661(read_dict_file)
        2    3.181    1.591    3.182    1.591
/usr/lib/python2.4/vendor-packages/pkg/choose.py:10(choose)
   691110    3.137    0.000    3.138    0.000
/usr/lib/python2.4/vendor-packages/pkg/search_storage.py:647(__unquote)
       16    1.440    0.090    7.525    0.470
/usr/lib/python2.4/vendor-packages/pkg/manifest.py:359(search_dict)
    16234    1.423    0.000    1.423    0.000
/usr/lib/python2.4/vendor-packages/pkg/client/variant.py:74(allow_action)
...

That is, over 50% of the total time was spent within the following
code, not including the functions it called.

   661          def read_dict_file(self):
   662                  """Reads in a dictionary stored in with an entity
   663                  and its number on each line.
   664                  """
   665                  if self.should_reread():
   666                          self._dict.clear()
   667                          for line in self._file_handle:
   668                                  res = line.split(" ")
   669                                  token = self.__unquote(res[0])
   670                                  offset = int(res[1])
   671                                  self._dict[token] = offset

I can't help but think that parsing text on every invocation is the
wrong approach.  A binary format that can be mmap'd and used in place
would probably shave more than 17 of the 17.9 seconds off of this
function.
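To make the idea concrete, here is a minimal sketch of such a binary
index.  This is not pkg's on-disk format; the fixed-width 64-byte
tokens are an assumption made purely for simplicity (real tokens vary
in length, so a production format would want a string table), but the
mmap-plus-binary-search idea is the same:

```python
import mmap
import struct

# Layout: an 8-byte entry count, then sorted fixed-size records of
# (64-byte NUL-padded token, 8-byte offset).
HEADER = struct.Struct("<q")
RECORD = struct.Struct("<64sq")

def write_index(path, entries):
    """Write a {token: offset} dict as a sorted binary index."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(entries)))
        for token, offset in sorted(entries.items()):
            f.write(RECORD.pack(token.encode(), offset))

def lookup(path, token):
    """Binary-search the mmap'd index; no per-line text parsing."""
    key = token.encode()
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        (count,) = HEADER.unpack_from(mm, 0)
        lo, hi = 0, count
        while lo < hi:
            mid = (lo + hi) // 2
            tok, off = RECORD.unpack_from(mm, HEADER.size + mid * RECORD.size)
            tok = tok.rstrip(b"\0")
            if tok == key:
                return off
            if tok < key:
                lo = mid + 1
            else:
                hi = mid
        return None
```

The kernel pages in only the parts of the index the binary search
touches, and repeated searches hit already-warm pages, so there is no
multi-second parse of the whole dictionary on every invocation.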

>> When I look at the publicly disclosed/speculated road map for CMT
>> systems, I don't see things improving for the simple operations
>> without fixing the software.  I eagerly await the SAT solver and any
>> other improvements that are in the works.
>
> We know that there are some things that need to be fixed for CMT systems
> now.  We're also aware that more analysis needs to be performed before an
> enterprise release gets shipped.  We've talked about making
> decompression and hash verification occur in parallel.  There has also
> been some discussion about finding and removing recursive algorithms,
> because the spill/fill traps on SPARC are quite costly.

Today was my second day of Python performance analysis.  In the
process, I tried psyco[2].  With it in place, pkg spent about 36% of
its time in traps.  "pkg search -l ls" performance degraded by about
7x.

2. http://psyco.sourceforge.net/

Trapstat showed:

# trapstat 10 1
vct name                |     cpu0     cpu1     cpu2     cpu3
------------------------+------------------------------------
  9 immu-miss           |        0       48        5       72
 20 fp-disabled         |        0        0        0        0
 24 cleanwin            |   116540      393    65420    13737
 31 dmmu-miss           |        1      136       92      245
 34 unalign             |    87318        0    48793     9499
...
 ac spill-asuser-32-cln |    87416      264    48956     9993
 b0 spill-asuser-64-cln |       40       39        0        0
...

Is this indicative of the problem you mention, or another one?

>
>> Right now I'm not complaining - I know the software is young and the
>> primary development platform is x86 where the regression isn't so
>> apparent.  Once I start hearing that there aren't big performance
>> improvements coming, I will start opening support calls if the
>> performance is still worse than before.
>
> Thanks for being reasonable.  It doesn't make a lot of sense for us to
> spend lots of time optimizing performance in portions of our code that
> are going to change soon.  Once the feature set has stabilized more, our
> expectation is that we'll have more time to find and fix performance
> problems that are platform specific.  That said, we continue to make
> algorithmic improvements that should benefit all platforms.
>
> -j
>

Are any of the improvements possibly aimed at an mmap'able database?
I believe this is key to rpm's speed.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss