Perhaps I'm missing the point here, but I'm not sure why B should lock() the list and then wait for somebody else to alter the list through gwlist_remove_producer().

It sound perfectly reasonable to me to lock() and later unlock() the list in gwlist_remove_producer(). Furthermore, putting unlock as the first line in gwlist_consume does not sound logical to me.
So, I see no need to change anything in gwlib.
AFAIK, the whole problem arises from problematic usage of gwlib by B. Correct me if I'm wrong, but, perhaps...

B does:
   lock(list); /* atomic lock */

       list->single_operation_lock->owner = -1;
unlock(list); /*let the list be altered from some other thread, and then wait for that alteration*/ pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex);


There is, however, the basic question behind the whole problem: why is list->single_operation_lock->owner set to -1? If that change was the point of lock()ing and unlock()ing the list, perhaps a change in software design could help...

Andrija Petrovic

Andreas Fink wrote:
We found a severe bug in gwlib.

We have the following scenario:

A  calls debug("xxx",0,"xxxx") which does :

                gwlist_add_producer(writers);

and continues but doesnt reach yet this line:

                gwlist_remove_producer(writers);


at this point the list "writers" is empty but has writers->num_producers=1

B does:
    lock(list); /* atomic lock */

        list->single_operation_lock->owner = -1;
pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex);

so it waits until A is calling gwlist_remove_producer()

and wait until A completes.


Now A is calling this:

void gwlist_remove_producer(List *list)
{
    lock(list);
    gw_assert(list->num_producers > 0);
    --list->num_producers;
    pthread_cond_broadcast(&list->nonempty);
    unlock(list);
}

and gets locked up because the list's atomic lock is locked by B.


C now has a new debug message and gets stopped at  gwlist_produce().


In other words, every process who wants to write to debug log gets stuck.

Now there is different solutions to this.
Our approach would be to do in gwlist_consume() to do this:


unlock(list);
pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex);
        lock(list);


Any other ideas?
maybe no atomic lock around gwlist_remove_producer() ?


Andreas Fink

Fink Consulting GmbH
Global Networks Schweiz AG
BebbiCell AG

---------------------------------------------------------------
Tel: +41-61-6666330 Fax: +41-61-6666331  Mobile: +41-79-2457333
Address: Clarastrasse 3, 4058 Basel, Switzerland
E-Mail:  [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
www.finkconsulting.com www.global-networks.ch www.bebbicell <http://www.bebbicell>.ch
---------------------------------------------------------------
ICQ: 8239353 MSN: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> AIM: smsrelay Skype: andreasfink
Yahoo: finkconsulting SMS: +41792457333






Reply via email to