Thanks, Guy, I'll try this debugging. However, before that I want to make
sure I'm using the correct mechanism for the timeout.
For example, within the thread that needs to wait 10 seconds, the sample
code is as follows:
struct timespec tmspec;
int retcode;
clock_gettime(CLOCK_REALTIME, &tmspec);
tmspec.tv_sec += (seconds to wait until timeout);
Then within the loop: lock the mutex, retcode =
pthread_cond_timedwait(&cond, &mutex, &tmspec);
unlock the mutex, then check if retcode == ETIMEDOUT and process on.
Does this seem to be correct, specifically the seconds I add to
tmspec.tv_sec?
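For reference, here is a minimal compilable sketch of the mechanism
described above. The helper names deadline_from_now and wait_up_to are
made up for illustration, and the timeout is taken in milliseconds so
that the tv_nsec carry is visible; adding whole seconds to tv_sec alone
needs no carry. It assumes the condition variable uses its default
clock (CLOCK_REALTIME).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: fill *ts with the absolute CLOCK_REALTIME time
 * that lies `ms` milliseconds from now. pthread_cond_timedwait expects
 * an absolute time, not a relative interval. */
void deadline_from_now(struct timespec *ts, long ms)
{
    assert(ts != NULL && ms >= 0);
    clock_gettime(CLOCK_REALTIME, ts);
    ts->tv_sec += ms / 1000;
    ts->tv_nsec += (ms % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L) {   /* keep tv_nsec below one second */
        ts->tv_sec += 1;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Hypothetical helper: block the calling thread on its own cond/mutex
 * for up to `ms` milliseconds. Returns ETIMEDOUT when the time ran out,
 * or 0 if the thread was woken earlier (signal or spurious wakeup). */
int wait_up_to(pthread_cond_t *cond, pthread_mutex_t *mutex, long ms)
{
    struct timespec tmspec;
    int retcode;

    deadline_from_now(&tmspec, ms);
    pthread_mutex_lock(mutex);
    retcode = pthread_cond_timedwait(cond, mutex, &tmspec);
    pthread_mutex_unlock(mutex);
    return retcode;
}
```

A thread that only wants a 10 second pause would call
wait_up_to(&cond, &mutex, 10000) and treat ETIMEDOUT as the normal case.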
Thanks, Rafi.
-----Original Message-----
From: guy keren [mailto:[EMAIL PROTECTED]
Sent: Sunday, November 11, 2007 12:23 AM
To: Rafi Cohen
Cc: 'Gilad Ben-Yossef'; linux-il@cs.huji.ac.il
Subject: Re: concurrent timers on linux
i think you have a simple bug in your code that causes the behaviour
you're talking about.
i would suggest that, as an exercise, you write a program with only 2
threads, have one of them wait (with pthread_cond_timedwait) for 10
seconds and then print a message with the thread ID and current time,
and the second thread wait for 15 seconds and print a similar message.
run this code in a loop, and see if you can get the threads to work as
expected (i.e. one prints a message every 10 seconds, the other prints
it every 15 seconds).
i guess that if you get this working properly, you'll be able to see why
your program is not working as expected.
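A rough sketch of such an exercise follows. It is only one possible
way to write it: struct waiter, waiter_main and run_two_waiters are
names invented here, each thread gets its own mutex and condition
variable, and a new absolute deadline is computed for every round.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-thread settings for the exercise. */
struct waiter {
    int id;             /* label printed with each message */
    time_t period_sec;  /* timeout of each wait, in seconds */
    int rounds;         /* how many timeouts to wait for */
    int timeouts;       /* result: waits that ended with ETIMEDOUT */
};

static void *waiter_main(void *arg)
{
    struct waiter *w = arg;
    pthread_mutex_t mutex;   /* private to this thread */
    pthread_cond_t cond;     /* private to this thread */

    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cond, NULL);

    for (int i = 0; i < w->rounds; i++) {
        struct timespec deadline;
        int rc;

        /* New absolute deadline for this round: now + period. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += w->period_sec;

        pthread_mutex_lock(&mutex);
        /* Nobody signals this condition, so a return of 0 can only be
         * a spurious wakeup: wait again for the same deadline. */
        do {
            rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
        } while (rc == 0);
        pthread_mutex_unlock(&mutex);

        if (rc == ETIMEDOUT) {
            w->timeouts++;
            printf("thread %d: timeout at %ld\n", w->id, (long)time(NULL));
        }
    }

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return NULL;
}

/* Run both waiters concurrently and wait for them to finish.
 * Returns 0 on success, -1 if a thread could not be created. */
int run_two_waiters(struct waiter *a, struct waiter *b)
{
    pthread_t ta, tb;

    assert(a != NULL && b != NULL);
    if (pthread_create(&ta, NULL, waiter_main, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, waiter_main, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

For the exercise as described, period_sec would be 10 for one waiter
and 15 for the other. If the two sets of messages come out at their own
intervals, the waits are independent.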
--guy
Rafi Cohen wrote:
Hi Gilad, first, thanks for your efforts to help.
I'll try to give a brief explanation of what I'm trying to do. Well,
I'll not use the word "timer", but "timeout". In any case, I'll be glad
to hear from you if, still, timers were not the correct wording here.
I'm working as a freelancer for a company involved in cellular
communications and I was asked to write a kind of connections manager
application.
My application has to be able to manage concurrent (parallel)
connections to some multicell systems, which may be roughly defined as
cellular centrals.
Those multicell systems exchange messages with my application for some
tasks to be done by my application. In addition, there is a kind of
software (let's call it client software) that knows how to communicate
with those multicell systems through my application. So, basically,
such client software may be used on various other computers at the same
time and should be connected to my application concurrently.
And yet also, my application has to connect to an ftp server to upload
some kind of files.
As I said, all connections have to be concurrent and must not block
one another.
I chose the multithreaded approach for this, where each connection is a
thread.
Yeah, I know, many people would advise using non-blocking sockets here,
but this is my weak part and I was in a hurry, so I decided to postpone
this learning curve to a later occasion and chose multiple threads.
Why timeouts? For example, a timeout to block the ftp thread, after
which it reconnects to the ftp server for an upload.
In case there is no connection to any of the multicell systems, a
timeout to wait and then retry the connection.
In case of a connection with what I called client software, I have a
very simple ping-pong protocol to make sure there is a connection
between this software and my application, and again, after my
application sends a "ping" it should wait for an adjustable time (by
default 10 seconds) for the pong reply.
So, as you see, various timeouts and all of them should be able to
proceed concurrently and unaffected by any other one.
Unfortunately, what I see in my case is that the most recent timeout
"takes the lead" on all other blocked threads at that time and affects
their timeouts.
So for example, if the recent timeout was the one for the ftp and it is
for a couple of hours, all the other threads that were blocked for their
timeouts remain blocked for as long as the last one.
And here lies my problem. If you say that the system has a single timer
then that may well be the problem, and each thread needs to have its
own independent timer (timer again?).
So, I do think there is a problem with my strategy.
If you say that pthread_cond_timedwait blocks the whole system, this is
bad. I intended to block a single thread, and that's what I thought it
should do.
Concerning the clock_gettime function, there is an alternative of
CLOCK_THREAD_CPUTIME_ID as its first argument. I did not try it yet,
but maybe this could lead to a better solution.
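A small sketch may help when weighing that alternative.
CLOCK_THREAD_CPUTIME_ID counts CPU time consumed by the calling thread,
so it barely advances while the thread is blocked, and its values are
not wall-clock timestamps. A condition variable with default attributes
measures its timeout against CLOCK_REALTIME, so a deadline built from
the CPU-time clock would not mean what is intended. The function name
compare_clocks below is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds between two timespec values, as a double. */
static double elapsed(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Hypothetical helper: sleep for sleep_ms milliseconds and report how
 * far the wall clock and the per-thread CPU clock each advanced. */
void compare_clocks(long sleep_ms, double *wall_s, double *cpu_s)
{
    struct timespec w0, w1, c0, c1;
    struct timespec nap;

    assert(sleep_ms >= 0 && wall_s != NULL && cpu_s != NULL);
    nap.tv_sec = sleep_ms / 1000;
    nap.tv_nsec = (sleep_ms % 1000) * 1000000L;

    clock_gettime(CLOCK_REALTIME, &w0);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c0);
    nanosleep(&nap, NULL);   /* the thread is blocked, not using the CPU */
    clock_gettime(CLOCK_REALTIME, &w1);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);

    *wall_s = elapsed(&w0, &w1);
    *cpu_s = elapsed(&c0, &c1);
}
```

After a 200 ms sleep, wall_s should come out near 0.2 while cpu_s stays
close to zero, which suggests the CPU-time clock is not the one to use
for a timeout.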
Ah, very lengthy message, still clear I hope. I'll be glad to continue
and receive assistance.
Thanks Gilad, Rafi.
-----Original Message-----
From: Gilad Ben-Yossef [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 10, 2007 10:23 AM
To: Rafi Cohen
Cc: linux-il@cs.huji.ac.il
Subject: Re: concurrent timers on linux
Rafi Cohen wrote:
My application is a multithreaded one, in which each thread has its own
timer and tasks upon timeout. Each timer may be different (varies from
10 seconds in one case to 24 hours in another).
Each timer should also be independent and the threads should run
concurrently without actually affecting other threads and timers.
POSIX only supplies a single timer counting in real time for each
process.
Multiple timers are either created by the programmer based on this
single timer using a timer heap or the same done by the threading
library pthread create_timer.
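To illustrate what a timer heap could look like, here is a minimal
sketch of a binary min-heap keyed on expiry time. All names
(timer_entry, timer_heap, heap_push, heap_pop) and the fixed capacity
are assumptions made for this example, not anything taken from a
particular library. The idea is that whoever drives the single timer
always arms it for the earliest entry.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_TIMERS 64   /* arbitrary capacity for this sketch */

/* One pending timeout: when it expires and whom it belongs to. */
struct timer_entry {
    time_t expires;   /* absolute expiry time, in seconds */
    int owner;        /* hypothetical id of the connection or task */
};

struct timer_heap {
    struct timer_entry items[MAX_TIMERS];
    size_t count;     /* must start at 0 */
};

static void swap_entries(struct timer_entry *a, struct timer_entry *b)
{
    struct timer_entry tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Insert a timeout. Returns 0 on success, -1 if the heap is full. */
int heap_push(struct timer_heap *h, time_t expires, int owner)
{
    assert(h != NULL);
    if (h->count == MAX_TIMERS)
        return -1;
    size_t i = h->count++;
    h->items[i].expires = expires;
    h->items[i].owner = owner;
    /* Sift up until the parent expires no later than this entry. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->items[parent].expires <= h->items[i].expires)
            break;
        swap_entries(&h->items[parent], &h->items[i]);
        i = parent;
    }
    return 0;
}

/* Remove the earliest timeout into *out. Returns 0, or -1 if empty. */
int heap_pop(struct timer_heap *h, struct timer_entry *out)
{
    assert(h != NULL && out != NULL);
    if (h->count == 0)
        return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->count];
    /* Sift down: swap with the earlier child while out of order. */
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, smallest = i;
        if (left < h->count &&
            h->items[left].expires < h->items[smallest].expires)
            smallest = left;
        if (right < h->count &&
            h->items[right].expires < h->items[smallest].expires)
            smallest = right;
        if (smallest == i)
            break;
        swap_entries(&h->items[i], &h->items[smallest]);
        i = smallest;
    }
    return 0;
}
```

With a 24 hour, a 10 second and a 2 hour timeout pushed in that order,
heap_pop hands them back as 10 seconds, 2 hours, 24 hours.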
It seems that there still is some effect of some timers on others and I
don't achieve the goal of total independence of timers.
There is just one timer.
Here is the strategy I decided to use, by the way, my code is in C:
Each thread uses in a loop the function pthread_cond_timedwait. Part
of them may exit either upon signal or time out and part of them
actually waits until timeout.
This is not a timer. It's a blocking system call with a timeout.
The problem I think I see is the timer adjustment prior to using
pthread_cond_timedwait.
What timer? you're using a blocking system call with a timeout.
I should also note that each thread has its own unique mutex and
condition variable for this function. The time adjustment is done by
calling clock_gettime with CLOCK_REALTIME as its first parameter.
This is called before each use of pthread_cond_timedwait, and immediately
after the call to clock_gettime I add the timeout seconds to the tv_sec
field of the timespec structure.
Now, to my questions:
1. Does this strategy seem correct, if not please give other ideas.
That's how it is supposed to be used
2. Specifically, is the function clock_gettime the correct one to
use, should its first parameter be the one I use and should it be
called indeed before each pthread_cond_timedwait within the loop, or
only once before the loop?
Anyway, clock_gettime is the correct one. CLOCK_REALTIME is correct and
you need to call it only once before entering the loop.
The man page has an excellent code example:
http://linux.die.net/man/3/pthread_cond_timedwait
Search for "Timed Condition Wait"
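For readers without the man page at hand, the pattern can be sketched
roughly as follows: the absolute deadline is computed once, and the
wait is repeated in a loop while a predicate is false, so an early or
spurious wakeup resumes waiting for the same deadline. struct event and
the event_* functions are names invented for this sketch. A fresh
timeout period (for example the next ping) still needs a fresh
deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state: a flag guarded by its own mutex/condvar. */
struct event {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int ready;   /* the predicate: nonzero once the event has happened */
};

void event_init(struct event *ev)
{
    assert(ev != NULL);
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->ready = 0;
}

/* Wait until the event is signalled or `seconds` have passed.
 * Returns 0 if the event arrived, ETIMEDOUT if the deadline passed. */
int event_wait(struct event *ev, time_t seconds)
{
    struct timespec deadline;
    int rc = 0;

    assert(ev != NULL);
    /* Computed once: every re-wait below uses the same absolute time. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += seconds;

    pthread_mutex_lock(&ev->mutex);
    while (!ev->ready && rc == 0)
        rc = pthread_cond_timedwait(&ev->cond, &ev->mutex, &deadline);
    if (ev->ready) {
        ev->ready = 0;   /* consume the event */
        rc = 0;
    }
    pthread_mutex_unlock(&ev->mutex);
    return rc;
}

/* Mark the event as ready and wake the waiting thread. */
void event_signal(struct event *ev)
{
    assert(ev != NULL);
    pthread_mutex_lock(&ev->mutex);
    ev->ready = 1;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);
}
```

In the ping-pong case, the thread that sent the "ping" would call
event_wait(&ev, 10) and whichever code receives the pong would call
event_signal(&ev).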
I'll be glad to have detailed ideas, in case you think I'm wrong
here.
I don't think I understand what you're trying to achieve exactly.
3. Shachar (Shemes), you pointed me once to the libevent library as
an alternative. I looked into this library and was very willing to
use it. However, I understood from its documentation that it is not
threadsafe. Therefore, it seems not to be the right idea to use it
for concurrent timers.
Timers?
"You keep saying that word. I don't think you know what it means..."
:-)
Am I wrong here? I'll be glad to stand corrected and use such option
instead of the strategy I mentioned above. Any assistance will be
most appreciated.
A description of what you are trying to do will do wonders here, I
think.
And why threads, anyway?
Note: I raise programming issues here from time to time and get good
answers most of the time. However, probably there are more linux users
than programmers here. So, if you think there should be a better forum
or mailing list to raise such questions, then please let me know. I
do think however, that there are really knowledgeable people here that
may and I believe will help the best they can.
This happens to be the most technical Linux oriented mailing list in
Israel.
gilad