> > Kern: I can commit this if it's okay with you?
>
> Thanks for finding this, but please do not commit the above patch at least for
> the moment. The problem is indeed a race condition as you found. Your fix
> does significantly reduce the window for it to happen but does not eliminate
> it.
>
> The correct solution is to put the jq mutex lock/unlock (i.e. P/V) around the
> test for max_workers at line 357 of jobq.c and to move the incrementing of
> the count as you have done. However, it is even more complicated because
> you only removed one of the many decrements of num_workers from jobq_server,
> and it is not so easy to remove them all. In addition for the jobq_server to
> work correctly, it must be able to release and reset the lock. Some careful
> thought needs to go into this. Possibly the solution is to just pull out the
> main P() and V()s from the jobq_server routine on entry and exit but to leave
> the 3 places in the code where the lock is temporarily released.
>
> Unless I hear otherwise from you, I will assume that you want to create the
> patch.
>
The calls to start_server are already protected by P() and V(), so everything
in start_server is fine. I moved the increment from the thread to start_server
to make the 'test then increment' an atomic operation, which is where the race
is occurring. The decrement there is only to reverse the increment, and is
again atomic. All other decrements are already protected by P() and V() as far
as I can see.

James
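
For illustration, here is a minimal sketch of the 'test then increment'
pattern described above. It uses plain pthreads and is not the actual jobq.c
code: the names jobq_sketch, worker and start_worker_locked are hypothetical,
and the mutex merely stands in for the jq mutex taken by P() and V(). The
point it tries to show is that the limit test and the increment both happen
while the caller holds the lock, and that the only decrement in the start
routine undoes a reservation when thread creation fails.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the job queue; not Bacula's real structure. */
typedef struct {
    pthread_mutex_t mutex;  /* plays the role of the jq mutex (P/V) */
    int num_workers;        /* worker threads running or reserved */
    int max_workers;        /* upper limit on worker threads */
} jobq_sketch;

/* Worker thread: gives its slot back under the lock when it finishes. */
static void *worker(void *arg)
{
    jobq_sketch *jq = arg;

    /* ... job processing would go here ... */

    pthread_mutex_lock(&jq->mutex);      /* P() */
    assert(jq->num_workers > 0);
    jq->num_workers--;
    pthread_mutex_unlock(&jq->mutex);    /* V() */
    return NULL;
}

/*
 * Caller must already hold jq->mutex. Because the test and the increment
 * both run under that lock, two callers cannot both pass the limit check
 * for the last free slot.
 */
static bool start_worker_locked(jobq_sketch *jq, pthread_t *tid)
{
    if (jq->num_workers >= jq->max_workers) {
        return false;                    /* limit reached, start nothing */
    }
    jq->num_workers++;                   /* reserve the slot before the thread runs */
    if (pthread_create(tid, NULL, worker, jq) != 0) {
        jq->num_workers--;               /* undo the reservation, still under the lock */
        return false;
    }
    return true;
}
```

If the increment were instead done by the new thread itself, a second caller
could pass the limit test before the first thread had been scheduled, which
is the window the patch is meant to close.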
