I found the "connection stall" problem on TX. It was systematically occurring
since I optimized mq_service() to send less data. Adding tracing to the
compressing TX layer lead me to the solution. The service routine needs
to saturate the lower layer or flush its whole data.
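To make that concrete, here is a minimal sketch of the servicing rule.
All the names and types below are invented for the example; this is not
the actual mq.c / tx_deflate.c code:

#include <stddef.h>
#include <sys/types.h>

/*
 * Illustration only: the structures and the tx_write() / tx_srv_disable()
 * calls are invented for this sketch, they are not the real GTKG
 * interfaces.
 */

struct pdu {
    char *data;             /* Message bytes */
    size_t len;             /* Total length */
    size_t sent;            /* Bytes already handed to the lower layer */
    struct pdu *next;
};

struct mqueue {
    struct pdu *head;       /* Oldest unsent message */
    void *lower;            /* Opaque lower TX layer (e.g. the deflater) */
};

extern ssize_t tx_write(void *lower, const char *data, size_t len);
extern void tx_srv_disable(void *lower);

static void
mq_service_sketch(struct mqueue *q)
{
    while (q->head != NULL) {
        struct pdu *p = q->head;
        ssize_t n = tx_write(q->lower, p->data + p->sent, p->len - p->sent);

        if (n <= 0)
            return;             /* Saturated or error: we'll be called back */

        p->sent += n;
        if (p->sent < p->len)
            return;             /* Partial write: lower layer is full now */

        q->head = p->next;      /* Message fully passed down, take the next */
    }

    tx_srv_disable(q->lower);   /* Queue flushed: stop the service calls */
}

The important part is the loop: we only return when the lower layer
refuses data or when the queue is empty, so either the lower layer still
holds data (and will service us again once it drains) or we have nothing
left to send.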
I'm currently doing some profiling, and have been able to optimize most
of the bottlenecks. That does not mean CPU usage will drop significantly
if you use GTK2, though: I'm only optimizing GTKG's core routines.
Here's a sample of what profiling said before I started:
    % cumulative     self                  self      total
 time    seconds  seconds       calls   us/call    us/call  name
46.32     641.20   641.20                                   qlink_cmp
20.83     929.55   288.35  3094965107      0.09       0.09  gmsg_cmp
 6.47    1019.05    89.50     1914867     46.74      47.45  make_room
 3.48    1067.26    48.21      200775    240.12     279.74  mq_service
 2.43    1100.86    33.60      372429     90.22      90.22  ev_link
 1.30    1118.92    18.06     1625592     11.11      18.24  filter_apply
 1.02    1133.06    14.14       11754   1202.99    1202.99  qrt_apply_patch
 0.86    1144.97    11.91       70077    169.96     706.39  search_results
 0.86    1156.86    11.89     2958510      4.02       4.02  SHA1ProcessMessageBlock
 0.79    1167.78    10.92   106371927      0.10       0.21  SHA1Input
 0.71    1177.56     9.78    13059343      0.75       0.75  pattern_qsearch
 0.66    1186.66     9.10      412229     22.08      23.30  qrt_build_query_target
 0.59    1194.87     8.21                                   dl_server_retry_cmp
 0.57    1202.75     7.88         142  55492.96  198226.32  qrt_dump
Notice the obscene amount of time spent in the gmsg_cmp and qlink_cmp
routines. Those were exercised during flow control.
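To see why: under flow control the queue can swell to thousands of
messages, and each message that enters or leaves it ends up being
compared against many others. The costly pattern amounts to something
like this (an illustration using GLib lists, not the actual mq.c code):

#include <glib.h>

/*
 * Illustration only: the struct and msg_prio_cmp() stand for whatever
 * comparison the real queue performs (gmsg_cmp / qlink_cmp).  The point
 * is that one sorted insertion costs O(n) comparisons, so a queue that
 * fills up to n messages costs O(n^2) comparisons overall.
 */

struct qmsg_sketch {
    gint prio;                      /* Message priority */
};

static gint
msg_prio_cmp(gconstpointer a, gconstpointer b)
{
    const struct qmsg_sketch *x = a, *y = b;

    return x->prio - y->prio;
}

GList *
enqueue_sorted(GList *queue, struct qmsg_sketch *msg)
{
    /* Walks the queue linearly to find the insertion spot */
    return g_list_insert_sorted(queue, msg, msg_prio_cmp);
}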
Yesterday evening, the profiling had turned into:
    % cumulative     self                  self      total
 time    seconds  seconds       calls   ms/call    ms/call  name
11.17      78.92    78.92                                   dl_server_retry_cmp
 8.38     138.13    59.21     5545088      0.01       0.01  search_has_muid
 6.91     186.93    48.80    11891432      0.00       0.00  SHA1ProcessMessageBlock
 5.98     229.17    42.24     3526551      0.01       0.02  filter_apply
 4.38     260.10    30.93      166372      0.19       0.19  ev_link
 3.36     283.83    23.73    19755618      0.00       0.00  SHA1Input
 3.17     306.25    22.42    27921284      0.00       0.00  pattern_qsearch
 2.50     323.93    17.68     3656995      0.00       0.01  make_room
 1.92     337.51    13.58   155249357      0.00       0.00  gm_snprintf
 1.91     351.04    13.53      779884      0.02       0.02  qrt_build_query_target
 1.41     361.01     9.97     1290439      0.01       0.01  qlink_remove
See how high SHA1ProcessMessageBlock has climbed?
The next target was dl_server_retry_cmp, and I optimized that this evening,
as well as filter_apply and search_has_muid.
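The search_has_muid() change is the simplest of the three: keep the MUIDs
of the queries we sent in a hash table instead of scanning a list. Along
these lines (illustrative code with a made-up hash function, not the
actual search.c):

#include <string.h>
#include <glib.h>

#define MUID_SIZE 16                /* A Gnutella MUID is 16 bytes */

static guint
muid_hash(gconstpointer key)
{
    const guchar *m = key;
    guint h = 0;
    gint i;

    for (i = 0; i < MUID_SIZE; i++)
        h = (h << 5) - h + m[i];    /* Simple rolling hash */

    return h;
}

static gboolean
muid_eq(gconstpointer a, gconstpointer b)
{
    return 0 == memcmp(a, b, MUID_SIZE);
}

/* The table would be created with g_hash_table_new(muid_hash, muid_eq) */

gboolean
search_has_muid_sketch(GHashTable *sent_muids, const gchar *muid)
{
    /* O(1) on average instead of O(number of MUIDs sent so far) */
    return NULL != g_hash_table_lookup(sent_muids, muid);
}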
Here's what it looks like now, with a 34-minute run:
    % cumulative     self                  self      total
 time    seconds  seconds       calls   us/call    us/call  name
 9.43       2.34     2.34      204177     11.46      18.69  filter_apply
 4.71       3.51     1.17     1618350      0.72       0.72  pattern_qsearch
 3.55       4.39     0.88     8722846      0.10       0.10  gm_snprintf
 3.34       5.22     0.83      165175      5.02       6.11  make_room
 3.26       6.03     0.81       13555     59.76      59.76  ev_link
 2.34       6.61     0.58       30057     19.30      48.93  mq_service
 2.26       7.17     0.56       45706     12.25      12.84  qrt_build_query_target
 1.85       7.63     0.46       15793     29.13      29.53  gui_update_download_clear
 1.73       8.06     0.43      418255      1.03       1.03  base32_encode_exactly
 1.73       8.49     0.43      216915      1.98       1.98  base32_decode_alphabet
 1.57       8.88     0.39        2028    192.31     552.16  download_pickup_queued
 1.53       9.26     0.38      610417      0.62       0.62  strlower
 1.49       9.63     0.37       53078      6.97       6.97  host_is_connected
 1.45       9.99     0.36      144966      2.48      90.60  node_parse
 1.45      10.35     0.36        4752     75.76     353.56  download_store
 1.41      10.70     0.35       60232      5.81       5.81  qlink_remove
 1.41      11.05     0.35       46400      7.54      18.24  st_search
 1.29      11.37     0.32      234436      1.36       1.92  qlink_cmp
 1.25      11.68     0.31      968268      0.32       0.33  zalloc
 1.25      11.99     0.31       87556      3.54      24.20  search_request
The hot spot is now filter_apply. I've tried to optimize it, but it is
not obvious how, and my "optimized" version was actually slower, so I
backed out my changes.
It would be interesting to compile GTK2 and GLib with profiling and run
GTKG, to see where GTK/GLib spend their time... I'm only interested in
making the core fast and efficient, but the GUI guys might find this
useful to do.
I can't really optimize pattern_qsearch() further, but I think I can
optimize ev_link a bit. At the usage levels seen here, it won't make much
of a difference, though: gaining even 50% on a routine that accounts for
3.26% of the time is only a 1.63% overall gain.
And now the change list:
- (2003.02.28 - RAM)
* The `dl_by_time' list was split into a hashed array, to cut down the number
of item comparisons we do when inserting, at the cost of more work when
traversing the list (a small sketch of the idea follows the change list).
* Optimized filter_apply() by pre-computing the length of the matching text
at filter "compilation" time, to avoid endless strlen() calls.
* Optimized search_has_muid(): sent MUIDs are now stored in a hash table.
* Turned an old g_return_if_fail() left over from legacy code into a g_assert()
(tx_link.c).
* Added some debugging traces whilst chasing the "connection freeze" problem
(tx_deflate.c).
* Fixed mq_service(): we need to saturate the lower layer whilst running,
lest we risk never being called again once the lower layer has finished
flushing its data, which freezes the TX side.
- (2003.02.27 - RAM)
* Optimized message queue to avoid huge CPU utilization during flow-control.
* Optimized mq_service() to avoid useless I/O vector building.
* Don't encode '/' in the query portion of URLs.
- (2003.02.27 - Richard)
* event.[ch]: Added as a replacement for listener.h. For now it's used only
in the property system. The bonus over listener.h: control of callback
frequency on a per-listener basis.
* Added separate property to control debug level of the property system.
* props.tpl: Upgraded to use event.[ch], regenerated property sources.
* prop.[ch]:
- Upgraded to use event.[ch]
- Added prop_get_by_name(): look up a property by name.
- Added prop_to_string(): converts a property to its string representation;
doesn't work for all property types yet.
* settings_gui.c: Upgraded property_map to make use of the new frequency
control feature of prop.[ch]
* shell.c:
- No longer show auth cookie in debug output
- New command: "PRINT [property]" to display current value of a property
Raphael