> > Kern: I can commit this if it's okay with you?
> 
> Thanks for finding this, but please do not commit the above patch at
least for
> the moment.  The problem is indeed a race condition as you found.
Your fix
> does significantly reduce the window for it to happen but does not
eliminate
> it.
> 
> The correct solution is to put the jq mutex lock/unlock (i.e. P/V)
around the
> test for max_workers at line 357 of jobq.c and to move the
incrementing of
> the count as you have done.   However, it is even more complicated
because
> you only removed one of the many decrements of num_workers from
jobq_server,
> and it is not so easy to remove them all.  In addition for the
jobq_server to
> work correctly, it must be able to release and reset the lock.  Some
careful
> thought needs to go into this.  Possibly the solution is to just pull
out the
> main P() and V()s from the jobq_server routine on entry and exit but
to leave
> the 3 places in the code where the lock is temporarily released.
> 
> Unless I hear otherwise from you, I will assume that you want to
create the
> patch.
> 

The calls to start_server are already protected by P() and V() so
everything in start_server is fine.

I moved the increment from the thread to start_server to make the 'test
then increment' an atomic operation, which is where the race is
occurring. The decrement there is only to reverse the increment, and is
again atomic.

All other decrements are already protected by P() and V() as far as I
can see.

James


------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables 
unlimited royalty-free distribution of the report engine 
for externally facing server and web deployment. 
http://p.sf.net/sfu/businessobjects
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Reply via email to