Thanks, Guy, I'll try this debugging. However, before that I want to make
sure I'm using the correct mechanism for the timeout.
For example, within the thread that needs to wait 10 seconds, the sample
code is as follows:

    struct timespec tmspec;
    int retcode;

    clock_gettime(CLOCK_REALTIME, &tmspec);
    tmspec.tv_sec += (seconds to wait until timeout);

Then, within the loop:

    lock mutex
    retcode = pthread_cond_timedwait(&cond, &mutex, &tmspec);
    unlock mutex
    if retcode == ETIMEDOUT, process on

Does this seem to be correct, specifically the seconds I add to
tmspec.tv_sec?
Thanks, Rafi.

-----Original Message-----
From: guy keren [mailto:[EMAIL PROTECTED]
Sent: Sunday, November 11, 2007 12:23 AM
To: Rafi Cohen
Cc: 'Gilad Ben-Yossef'; linux-il@cs.huji.ac.il
Subject: Re: concurrent timers on linux

i think you have a simple bug in your code that causes the behaviour
you're talking about.

i would suggest that, as an exercise, you write a program with only 2
threads, have one of them wait (with pthread_cond_timedwait) for 10
seconds and then print a message with the thread ID and current time,
and the second thread wait for 15 seconds and print a similar message.
run this code in a loop, and see if you can get the threads to work as
expected (i.e. one prints a message every 10 seconds, the other prints
it every 15 seconds).

i guess that if you get this working properly, you'll be able to see why
your program is not working as expected.

--guy

Rafi Cohen wrote:
> Hi Gilad, first, thanks for your efforts to help.
> I'll try to give a brief explanation of what I'm trying to do. Well,
> I'll not use the word "timer", but "timeout". In any case, I'll be
> glad to hear from you if, still, timers were not the correct wording
> here.
> I'm working as a freelancer for a company involved in cellular
> communications and I was asked to write a kind of connections manager
> application.
> My application has to be able to manage concurrent (parallel)
> connections to some multicell systems, which may be roughly defined
> as cellular centrals.
> Those multicell systems exchange messages with my application for some
> tasks to be done by my application.
> In addition, there is a kind of software (let's call it client
> software) that knows how to communicate with those multicell systems
> through my application. So, basically, such client software may be
> used on various other computers at the same time and should be
> connected to my application concurrently.
> In addition, my application has to connect to an ftp server to upload
> some kind of files.
> As I said, all connections have to be concurrent and must not block
> one another.
> I chose the multithreaded approach for this, where each connection is
> a thread.
> Yeah, I know, many people would advise using non-blocking sockets
> here, but this is my weak part and I was in a hurry, so I decided to
> postpone this learning curve to a later occasion and chose
> multithreading.
> Why timeouts? For example, a timeout to block the ftp thread, after
> which there is a reconnection to the ftp server for upload.
> In case there is no connection to any of the multicell systems, a
> timeout to wait and then retry the connection.
> In the case of a connection with what I called client software, I have
> a very simple ping-pong protocol to make sure there is a connection
> between this software and my application; after my application sends
> a "ping" it should wait for an adjustable time (by default 10
> seconds) for the pong reply.
> So, as you see, there are various timeouts, and all of them should be
> able to proceed concurrently and unaffected by any other one.
> Unfortunately, what I see in my case is that the most recent timeout
> "takes the lead" over all other threads blocked at that time and
> affects their timeouts.
> So, for example, if the most recent timeout was the one for the ftp
> and it is for a couple of hours, all the other threads that were
> blocked on their timeouts remain blocked for as long as the last one.
> And here lies my problem.
> If you say that the system has a single timer then that may well be
> the problem, and each thread needs to have its own independent timer
> (timer again?).
> So, I do think there is a problem with my strategy.
> If you say that pthread_cond_timedwait blocks the whole system, this
> is bad. I intended to block a single thread, and that's what I
> thought it should do.
> Concerning the clock_gettime function, there is an alternative of
> CLOCK_THREAD_CPUTIME_ID as its first argument. I did not try it yet,
> but maybe this could lead to a better solution.
> Ah, a very lengthy message, still clear I hope. I'll be glad to
> continue and receive assistance.
> Thanks Gilad, Rafi.
>
> -----Original Message-----
> From: Gilad Ben-Yossef [mailto:[EMAIL PROTECTED]
> Sent: Saturday, November 10, 2007 10:23 AM
> To: Rafi Cohen
> Cc: linux-il@cs.huji.ac.il
> Subject: Re: concurrent timers on linux
>
>
> Rafi Cohen wrote:
>
>> My application is a multithreaded one, in which each thread has its
>> own timer and tasks upon timeout. Each timer may be different (varies
>> from 10 seconds in one case to 24 hours in another).
>> Each timer should also be independent and the threads should run
>> concurrently without actually affecting other threads and timers.
>
> POSIX only supplies a single timer counting in real time for each
> process.
>
> Multiple timers are either created by the programmer based on this
> single timer using a timer heap or the same done by the threading
> library pthread create_timer.
>
>> It seems that there still is some effect of some timers on others and
>> I don't achieve the goal of total independence of timers.
>
> There is just one timer.
>
>> Here is the strategy I decided to use (by the way, my code is in C):
>> Each thread uses in a loop the function pthread_cond_timedwait. Part
>> of them may exit either upon signal or timeout and part of them
>> actually wait until timeout.
>
> This is not a timer. It's a blocking system call with a timeout.
>
>> The problem I think I see is the timer adjustment prior to using
>> pthread_cond_timedwait.
>
> What timer? you're using a blocking system call with a timeout.
>
>> I should also note that each thread has its own unique mutex and
>> condition variables for this function. The time adjustment is done by
>> calling clock_gettime with CLOCK_REALTIME as its first parameter.
>> This is called before each use of pthread_cond_timedwait, and
>> immediately after the call to clock_gettime I add the timeout seconds
>> to the tv_sec field of the timespec structure.
>> Now, to my questions:
>> 1. Does this strategy seem correct? If not, please give other ideas.
>
> That's how it is supposed to be used
>
>> 2. Specifically, is the function clock_gettime the correct one to
>> use, should its first parameter be the one I use, and should it
>> indeed be called before each pthread_cond_timedwait within the loop,
>> or only once before the loop?
>
> Anyway, clock_gettime is the correct one. CLOCK_REALTIME is correct
> and you need to call it only once before entering the loop.
>
> The man page has an excellent code example:
>
> http://linux.die.net/man/3/pthread_cond_timedwait
>
> Search for "Timed Condition Wait"
>
>> I'll be glad to have detailed ideas, in case you think I'm wrong
>> here.
>
> I don't think I understand what you're trying to achieve exactly.
>
>> 3. Shachar (Shemes), you once pointed me to the libevent library as
>> an alternative. I looked into this library and was very willing to
>> use it. However, I understood from its documentation that it is not
>> threadsafe. Therefore, it seems not to be the right idea to use it
>> for concurrent timers.
>
> Timers?
> "You keep saying that word. I don't think you know what it means..."
> :-)
>
>> Am I wrong here? I'll be glad to stand corrected and use such an
>> option instead of the strategy I mentioned above. Any assistance will
>> be most appreciated.
>
> A description of what you are trying to do will do wonders here, I
> think.
>
> And why threads, anyway?
>
>> Note: I raise programming issues here from time to time and get good
>> answers most of the time. However, probably there are more linux
>> users than programmers here. So, if you think there should be a
>> better forum or mailing list to raise such questions, then please let
>> me know. I do think however, that there are here really knowledgeable
>> people that may and I believe will help the best they can.
>
> This happens to be the most technical Linux oriented mailing list in
> Israel.
>
> gilad