I think I've finally figured out what's going wrong in my module but am unsure 
what to do about it.

The module runs on apache 2.2.22 with mpm prefork. Occasionally I am seeing 
corruption of the output bucket brigade, primarily the ring pointers 
(link->next and link->prev) ending up with strange values.

Having spent some time reading the source, I believe that apache provides no 
protection to these pointers, and it's inherently unsafe for a bucket brigade 
to be used by more than one thread (even if you are careful with allocators), 
unless all callers provide their own mutex protection. As apache itself uses 
the output bucket brigade without mutex protection, the output bucket brigade 
can never be written to by other threads, and therefore ap_fwrite (to this 
brigade) can never be safely called by any thread other than the main thread. 
First question: is this correct?

My module is currently structured as follows.

The main thread creates another thread for each request (the requests are long 
running websocket connections).

The main thread does the following:

  while (!done)
  {
    /* Blocking read */
    apr_brigade_create;
    ap_get_brigade;
    apr_brigade_flatten;

    /* Do stuff with the data */
    blocking_socket_write;
  }

The spawned thread does the following:

  while (!done)
  {
    /* Blocking read */
    blocking_socket_read;

    /* Do stuff with the data */
    ap_fwrite(output_bucket_brigade);
  }
    
Now, what I believe is happening is as follows. The blocking read in the main 
thread at some point calls select(), and does not only do a read, but also 
a write of the data in the output bucket brigade. This removes a bucket from 
the ring. If this happens at the same time as the ap_fwrite in the spawned 
thread adds something to the output ring, two threads will be accessing the 
ring pointers at once.
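
To make the suspected race concrete, here is a minimal sketch that replays one 
bad interleaving by hand, in a single thread. It is an assumption-laden model, 
not Apache code: struct node and the ring_* functions are hypothetical 
stand-ins for a bucket and for the ring operations, and the real apr_bucket 
layout and APR ring macros differ. The only point is that an insert and a 
remove are each several separate pointer stores, so interleaving them can 
leave next and prev disagreeing.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bucket; NOT the real apr_bucket layout. */
struct node {
    struct node *next;
    struct node *prev;
};

/* An empty ring is a sentinel that points at itself in both directions. */
static void ring_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert at the tail: four separate stores, with no locking of any kind. */
static void ring_insert_tail(struct node *head, struct node *n)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink a node: two separate stores into its neighbours. */
static void ring_remove(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Walk forward from the sentinel and check that every next link is mirrored
 * by the matching prev link. The walk is bounded to 16 steps so that a
 * damaged ring cannot make it loop forever. Returns 1 if consistent. */
static int ring_is_consistent(const struct node *head)
{
    const struct node *p = head;
    int steps;

    for (steps = 0; steps < 16; steps++) {
        if (p->next->prev != p)
            return 0;
        p = p->next;
        if (p == head)
            return 1;
    }
    return 0;
}

/* Replays one interleaving: the "writer" starts appending b, the "flusher"
 * removes a in the middle, then the writer finishes with a stale tail. */
static int replay_bad_interleaving(void)
{
    struct node head, a, b;
    struct node *stale_tail;

    ring_init(&head);
    ring_insert_tail(&head, &a);

    /* Writer: first half of an insert; the current tail (a) is read here. */
    stale_tail = head.prev;
    b.next = &head;
    b.prev = stale_tail;

    /* Flusher: removes a, so the ring is now empty. */
    ring_remove(&a);

    /* Writer: second half of the insert, using the tail it read earlier. */
    stale_tail->next = &b;
    head.prev = &b;

    /* head.next still points at head, but head.prev points at b. */
    return ring_is_consistent(&head);
}
```

In this replay the forward walk from the sentinel never reaches b, while the 
backward link does, which looks a lot like the "strange values" I am seeing.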

What I can't figure out is how to fix this.

I can't put in a mutex to protect the ring pointers, because the access to the 
ring pointers by apache is outside of my module.

I can't hold a mutex across the blocking read in the main thread, because 
then my module couldn't write data to the output bucket brigade while there 
is no input from the apache client; as the apache client may itself be 
waiting for data to be sent to it, this could cause deadlock.

And I can't see an obvious way to do the read in a non-blocking way.
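
For illustration only: if the read could be made non-blocking, the main thread 
could be the sole user of the output brigade, with the spawned thread handing 
its data over through a queue protected by a mutex that is only ever held 
briefly. The sketch below shows just that hand-off. Everything in it 
(struct handoff_queue, queue_push, queue_pop, the fixed sizes) is hypothetical 
and not part of Apache or APR, and it does not address the non-blocking read 
itself.

```c
#include <pthread.h>
#include <string.h>

#define QUEUE_SLOTS 8    /* arbitrary size chosen for the sketch */
#define CHUNK_MAX   256  /* arbitrary size chosen for the sketch */

/* Hypothetical hand-off queue; none of these names come from Apache/APR. */
struct handoff_queue {
    pthread_mutex_t lock;
    char data[QUEUE_SLOTS][CHUNK_MAX];
    size_t len[QUEUE_SLOTS];
    unsigned head;   /* index of the oldest filled slot */
    unsigned count;  /* number of filled slots */
};

static void queue_init(struct handoff_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = 0;
    q->count = 0;
}

/* Would be called by the spawned thread in place of ap_fwrite.
 * Returns 0 on success, -1 if the queue is full or n is 0 or too large. */
static int queue_push(struct handoff_queue *q, const char *buf, size_t n)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (n > 0 && n <= CHUNK_MAX && q->count < QUEUE_SLOTS) {
        unsigned slot = (q->head + q->count) % QUEUE_SLOTS;
        memcpy(q->data[slot], buf, n);
        q->len[slot] = n;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Would be called by the main thread between non-blocking reads; the main
 * thread would then write the chunk to the output brigade itself.
 * out must hold CHUNK_MAX bytes. Returns the chunk length, 0 if empty. */
static size_t queue_pop(struct handoff_queue *q, char *out)
{
    size_t n = 0;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        n = q->len[q->head];
        memcpy(out, q->data[q->head], n);
        q->head = (q->head + 1) % QUEUE_SLOTS;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

The mutex here is never held across a blocking call, so it would avoid the 
deadlock described above, but only if the main thread can poll for input 
rather than block on it.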

Any ideas?

-- 
Alex Bligh
