I found the "connection stall" problem on TX. It was systematically occurring
since I optimized mq_service() to send less data. Adding tracing to the
compressing TX layer lead me to the solution. The service routine needs
to saturate the lower layer or flush its whole data.
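To make that concrete, here is a minimal sketch of the servicing rule.
All the names and types below are invented for the example; this is not
the actual mq.c / tx_deflate.c code:

#include <stddef.h>
#include <sys/types.h>

/*
 * Illustration only: the structures and the tx_write() / tx_srv_disable()
 * calls are invented for this sketch, they are not the real GTKG
 * interfaces.
 */

struct pdu {
    char *data;             /* Message bytes */
    size_t len;             /* Total length */
    size_t sent;            /* Bytes already handed to the lower layer */
    struct pdu *next;
};

struct mqueue {
    struct pdu *head;       /* Oldest unsent message */
    void *lower;            /* Opaque lower TX layer (e.g. the deflater) */
};

extern ssize_t tx_write(void *lower, const char *data, size_t len);
extern void tx_srv_disable(void *lower);

static void
mq_service_sketch(struct mqueue *q)
{
    while (q->head != NULL) {
        struct pdu *p = q->head;
        ssize_t n = tx_write(q->lower, p->data + p->sent, p->len - p->sent);

        if (n <= 0)
            return;             /* Saturated or error: we'll be called back */

        p->sent += n;
        if (p->sent < p->len)
            return;             /* Partial write: lower layer is full now */

        q->head = p->next;      /* Message fully passed down, take the next */
    }

    tx_srv_disable(q->lower);   /* Queue flushed: stop the service calls */
}

The important part is the loop: we only return when the lower layer
refuses data or when the queue is empty, so either the lower layer still
holds data (and will service us again once it drains) or we have nothing
left to send.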
I'm currently doing some profiling, and have been able to optimize most
of the bottlenecks. That does not mean CPU usage will drop significantly
if you use GTK2, though: I'm only optimizing GTKG's core routines.
Here's a sample of what profiling said before I started:
    % cumulative     self                  self      total
 time    seconds  seconds       calls   us/call    us/call  name
46.32     641.20   641.20                                   qlink_cmp
20.83     929.55   288.35  3094965107      0.09       0.09  gmsg_cmp
 6.47    1019.05    89.50     1914867     46.74      47.45  make_room
 3.48    1067.26    48.21      200775    240.12     279.74  mq_service
 2.43    1100.86    33.60      372429     90.22      90.22  ev_link
 1.30    1118.92    18.06     1625592     11.11      18.24  filter_apply
 1.02    1133.06    14.14       11754   1202.99    1202.99  qrt_apply_patch
 0.86    1144.97    11.91       70077    169.96     706.39  search_results
 0.86    1156.86    11.89     2958510      4.02       4.02  SHA1ProcessMessageBlock
 0.79    1167.78    10.92   106371927      0.10       0.21  SHA1Input
 0.71    1177.56     9.78    13059343      0.75       0.75  pattern_qsearch
 0.66    1186.66     9.10      412229     22.08      23.30  qrt_build_query_target
 0.59    1194.87     8.21                                   dl_server_retry_cmp
 0.57    1202.75     7.88         142  55492.96  198226.32  qrt_dump
Notice the obscene amount of time spent in the gmsg_cmp and qlink_cmp
routines. Those were exercised during flow control.
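To see why: under flow control the queue can swell to thousands of
messages, and each message that enters or leaves it ends up being
compared against many others. The costly pattern amounts to something
like this (an illustration using GLib lists, not the actual mq.c code):

#include <glib.h>

/*
 * Illustration only: the struct and msg_prio_cmp() stand for whatever
 * comparison the real queue performs (gmsg_cmp / qlink_cmp).  The point
 * is that one sorted insertion costs O(n) comparisons, so a queue that
 * fills up to n messages costs O(n^2) comparisons overall.
 */

struct qmsg_sketch {
    gint prio;                      /* Message priority */
};

static gint
msg_prio_cmp(gconstpointer a, gconstpointer b)
{
    const struct qmsg_sketch *x = a, *y = b;

    return x->prio - y->prio;
}

GList *
enqueue_sorted(GList *queue, struct qmsg_sketch *msg)
{
    /* Walks the queue linearly to find the insertion spot */
    return g_list_insert_sorted(queue, msg, msg_prio_cmp);
}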
Yesterday evening, the profiling had turned into:
    % cumulative     self                  self      total
 time    seconds  seconds       calls   ms/call    ms/call  name
11.17      78.92    78.92                                   dl_server_retry_cmp
 8.38     138.13    59.21     5545088      0.01       0.01  search_has_muid
 6.91     186.93    48.80    11891432      0.00       0.00  SHA1ProcessMessageBlock
 5.98     229.17    42.24     3526551      0.01       0.02  filter_apply
 4.38     260.10    30.93      166372      0.19       0.19  ev_link
 3.36     283.83    23.73    19755618      0.00       0.00  SHA1Input
 3.17     306.25    22.42    27921284      0.00       0.00  pattern_qsearch
 2.50     323.93    17.68     3656995      0.00       0.01  make_room
 1.92     337.51    13.58   155249357      0.00       0.00  gm_snprintf
 1.91     351.04    13.53      779884      0.02       0.02  qrt_build_query_target
 1.41     361.01     9.97     1290439      0.01       0.01  qlink_remove
See how high SHA1ProcessMessageBlock has climbed?
The next target was dl_server_retry_cmp, and I optimized that this evening,
as well as filter_apply and search_has_muid.
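The search_has_muid() change is the simplest of the three: keep the MUIDs
of the queries we sent in a hash table instead of scanning a list. Along
these lines (illustrative code with a made-up hash function, not the
actual search.c):

#include <string.h>
#include <glib.h>

#define MUID_SIZE 16                /* A Gnutella MUID is 16 bytes */

static guint
muid_hash(gconstpointer key)
{
    const guchar *m = key;
    guint h = 0;
    gint i;

    for (i = 0; i < MUID_SIZE; i++)
        h = (h << 5) - h + m[i];    /* Simple rolling hash */

    return h;
}

static gboolean
muid_eq(gconstpointer a, gconstpointer b)
{
    return 0 == memcmp(a, b, MUID_SIZE);
}

/* The table would be created with g_hash_table_new(muid_hash, muid_eq) */

gboolean
search_has_muid_sketch(GHashTable *sent_muids, const gchar *muid)
{
    /* O(1) on average instead of O(number of MUIDs sent so far) */
    return NULL != g_hash_table_lookup(sent_muids, muid);
}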
Here's what it looks like now, with a 34-minute run:
    % cumulative     self                  self      total
 time    seconds  seconds       calls   us/call    us/call  name
 9.43       2.34     2.34      204177     11.46      18.69  filter_apply
 4.71       3.51     1.17     1618350      0.72       0.72  pattern_qsearch
 3.55       4.39     0.88     8722846      0.10       0.10  gm_snprintf
 3.34       5.22     0.83      165175      5.02       6.11  make_room
 3.26       6.03     0.81       13555     59.76      59.76  ev_link
 2.34       6.61     0.58       30057     19.30      48.93  mq_service
 2.26       7.17     0.56       45706     12.25      12.84  qrt_build_query_target
 1.85       7.63     0.46       15793     29.13      29.53  gui_update_download_clear
 1.73       8.06     0.43      418255      1.03       1.03  base32_encode_exactly
 1.73       8.49     0.43      216915      1.98       1.98  base32_decode_alphabet
 1.57       8.88     0.39        2028    192.31     552.16  download_pickup_queued
 1.53       9.26     0.38      610417      0.62       0.62  strlower
 1.49       9.63     0.37       53078      6.97       6.97  host_is_connected
 1.45       9.99     0.36      144966      2.48      90.60  node_parse
 1.45      10.35     0.36        4752     75.76     353.56  download_store
 1.41      10.70     0.35       60232      5.81       5.81  qlink_remove
 1.41      11.05     0.35       46400      7.54      18.24  st_search
 1.29      11.37     0.32      234436      1.36       1.92  qlink_cmp
 1.25      11.68     0.31      968268      0.32       0.33  zalloc
 1.25      11.99     0.31       87556      3.54      24.20  search_request
The hot spot is now filter_apply. I've tried to optimize it, but it is
not obvious how, and my "optimized" version was actually slower, so I
backed out my changes.
It would be interesting to compile GTK2 and GLib with profiling and run
GTKG, to see where GTK/GLib spend their time... I'm only interested in
making the core fast and efficient, but the GUI guys might find this
useful to do.
I can't really optimize pattern_qsearch() further, but I think I can
optimize ev_link a bit. At the usage levels seen here, it won't make much
of a difference, though: gaining even 50% on a routine that accounts for
3.26% of the time is only a 1.63% overall gain.
And now the change list:
- (2003.02.28 - RAM)
* The `dl_by_time' list was split into a hashed array, to cut down the number
of item comparisons we do when inserting, at the cost of more work when
traversing the list (a small sketch of the idea follows the change list).
* Optimized filter_apply() by pre-computing the length of the matching text
at filter "compilation" time, to avoid endless strlen() calls.
* Optimized search_has_muid(): sent MUIDs are now stored in a hash table.
* Turned an old g_return_if_fail() left over from legacy code into a g_assert()
(tx_link.c).
* Added some debugging traces whilst chasing the "connection freeze" problem
(tx_deflate.c).
* Fixed mq_service(): we need to saturate the lower layer whilst running,
lest we risk never being called again once the lower layer has finished
flushing its data, which freezes the TX side.
- (2003.02.27 - RAM)
* Optimized message queue to avoid huge CPU utilization during flow-control.
* Optimized mq_service() to avoid useless I/O vector building.
* Don't encode '/' in the query portion of URLs.
- (2003.02.27 - Richard)
* event.[ch]: Added as a replacement for listener.h. For now it's used only
in the property system. The bonus over listener.h: control of callback
frequency on a per-listener basis.
* Added separate property to control debug level of the property system.
* props.tpl: Upgraded to use event.[ch], regenerated property sources.
* prop.[ch]:
- Upgraded to use event.[ch]
- Added prop_get_by_name(): look up a property by name.
- Added prop_to_string(): converts a property to its string representation;
doesn't work for all property types yet.
* settings_gui.c: Upgraded property_map to make use of the new frequency
control feature of prop.[ch]
* shell.c:
- No longer show auth cookie in debug output
- New command: "PRINT [property]" to display current value of a property
Raphael