Hey.

I've been using the pth library (version 1.3.7) for a while and i
think it's pretty cool. 

i did find something which i think is a bug: non-blocking connect
in threads does not seem to be handled properly. what i mean is,
if i use pth_connect() on a blocking file descriptor, everything
works the way it is supposed to. but if i set the file descriptor
to non-blocking mode, things stop working with pth_connect(), and
i have to use connect() instead. this effectively prevents me from
using the soft or hard syscall mapping features, because with
either of them enabled i lose access to the normal connect() call.

i haven't really checked the pth source code to see what causes
this problem, but the obvious fix seems to be to make
pth_connect() check whether the file descriptor is already in
non-blocking mode, and if so, simply invoke connect() with no
other handling. at least the "wrapped" (soft or hard) syscalls
should do this. other calls besides connect() may have the same
problem. i have not checked, but i think they probably do, and if
so it should be fixed there as well.
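
to illustrate the fix i have in mind, here is a minimal sketch. it is
not pth code: fd_is_nonblocking() and connect_wrapper() are made-up
names, and the threaded_connect callback only stands in for whatever
pth_connect() does for blocking descriptors. the only real calls used
are fcntl() and connect().

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 1 if fd is in non-blocking mode, 0 if it is blocking,
 * and -1 if the flags cannot be read (errno is set by fcntl()). */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Hypothetical wrapper: if the caller already put the descriptor in
 * non-blocking mode, hand the call straight to connect() so the caller
 * sees the usual EINPROGRESS behaviour. Otherwise (including when the
 * flags cannot be read) use the thread-aware path, which is passed in
 * as a callback here only to keep the sketch self-contained. */
static int connect_wrapper(int fd, const struct sockaddr *addr,
                           socklen_t addrlen,
                           int (*threaded_connect)(int,
                                                   const struct sockaddr *,
                                                   socklen_t))
{
    if (fd_is_nonblocking(fd) == 1)
        return connect(fd, addr, addrlen);
    return threaded_connect(fd, addr, addrlen);
}
```

the same check could be put in front of the other wrapped calls, if
they turn out to have the same problem.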

another thing that needs work is that pth uses too much cpu time.
after profiling my application, i found the following:

        1. calls to sigprocmask() made by libpth occupied more than
60% of the total running time of my application. these were totally
unnecessary, since my application does not handle any signals at all
(SIGPIPE is ignored at start up and other signals simply do not
occur).

        2. calls to the functions pth_util_fts_test() and
pth_util_fts_merge() also took up far too much time. in fact they
took more time than all of the other functions of the libpth scheduler.

for the first problem, i would suggest you do something to avoid
calling sigprocmask() without need. perhaps you can remember the
current signal mask in a variable, and if the new signal mask you're
trying to set is identical to the current one, just don't set it. to
make sure the user does not change the signal mask without you
knowing, you could add a soft or hard system call wrapper for
sigprocmask(), similar to the ones for the blocking functions.

for the second problem, there are two things you can do:
        
        1. take "if" statements out of the "for" loops, and create three
separate "for" loops instead. this ought to provide some speed up.

        2. manipulate file descriptor sets a byte (or even better, an
integer) at a time using bitwise operations, instead of manipulating
them bit by bit. this will provide tremendous speed up. i know it's
not "portable", but it will work on most systems. you can test whether
it works in the configure process, and if so, enable this feature with
some #define in the source. if it doesn't work, you can still fall
back on the old method.
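
a sketch of both suggestions together. this is not the pth code, and
all function names are made up. the word-wise version assumes that
fd_set is a plain array of unsigned long sized words with descriptor
fd stored as bit (fd % wordbits) of word (fd / wordbits), and that
nfd <= FD_SETSIZE. that layout is exactly what the configure test
would have to verify.

```c
#include <limits.h>
#include <stddef.h>
#include <sys/select.h>

/* Reference version: OR src into dst one descriptor at a time. */
static void fds_merge_bitwise(int nfd, fd_set *src, fd_set *dst)
{
    int fd;
    for (fd = 0; fd < nfd; fd++)
        if (FD_ISSET(fd, src))
            FD_SET(fd, dst);
}

/* Word-at-a-time version (non-portable, see the layout assumption). */
static void fds_merge_wordwise(int nfd, fd_set *src, fd_set *dst)
{
    const unsigned long *s = (const unsigned long *)src;
    unsigned long *d = (unsigned long *)dst;
    const size_t bits = sizeof(unsigned long) * CHAR_BIT;
    size_t full, rem, i;

    if (nfd <= 0)
        return;
    full = (size_t)nfd / bits;          /* words lying entirely below nfd */
    rem  = (size_t)nfd % bits;          /* descriptors in the partial word */
    for (i = 0; i < full; i++)
        d[i] |= s[i];
    if (rem != 0)                       /* copy only the bits below nfd */
        d[full] |= s[full] & ((1UL << rem) - 1UL);
}

/* The NULL tests are done once per set instead of once per descriptor,
 * so each set gets its own tight loop. */
static void fds_merge3(int nfd,
                       fd_set *rin, fd_set *rout,
                       fd_set *win, fd_set *wout,
                       fd_set *ein, fd_set *eout)
{
    if (rin != NULL) fds_merge_wordwise(nfd, rin, rout);
    if (win != NULL) fds_merge_wordwise(nfd, win, wout);
    if (ein != NULL) fds_merge_wordwise(nfd, ein, eout);
}
```

a configure test could fill a few sets, run both versions, and enable
the word-wise one only if the results are identical.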

well, i think the description of the problems is clear enough.
if not, let me know what you don't understand. 

i should say that cpu usage is a serious concern for me, and probably
for other people too. with the current version of libpth, my threaded
server application could only handle up to 100 clients at a time, and
it used 100% cpu time for that. when i modified libpth to fix the
problems described above, it could handle more than 200 clients at a
time without even loading the cpu to 100%. the source patches that i
made are not suitable for inclusion in the official libpth, but i'm
sure you could make much better fixes than i did.

have fun.
______________________________________________________________________
GNU Portable Threads (Pth)            http://www.gnu.org/software/pth/
User Support Mailing List                            [EMAIL PROTECTED]
Automated List Manager (Majordomo)           [EMAIL PROTECTED]
