On 18.05.2011 16:19, Steven Dake wrote:
On 05/18/2011 06:27 AM, Jerome Flesch wrote:
---
  lib/coroipcc.c |   15 +++++++++++++--
  1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/coroipcc.c b/lib/coroipcc.c
index 34db980..c03925c 100644
--- a/lib/coroipcc.c
+++ b/lib/coroipcc.c
@@ -847,6 +847,17 @@ coroipcc_dispatch_get (
                return (error);
        }

+       if (shared_mem_dispatch_bytes_left (ipc_instance)>  
(ipc_instance->dispatch_size/2)) {
+               /*
+                * Notify coroipcs to flush any pending dispatch messages
+                */
+               res = ipc_sem_post (ipc_instance->control_buffer, 
SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT);
+               if (res != CS_OK) {
+                       error = CS_ERR_LIBRARY;
+                       goto error_put;
+               }
+       }
+
        *data = NULL;


Could you explain this change in more detail?  These types of semaphores
should only be triggered during a request, or a flush should happen, or
we should exit.

Basically, without this patch, I had pretty often an issue: My client process ended up blocked on cpg_dispatch_get() (on L876:poll()), and Corosync's thread was waiting on this semaphore (exec/coroipcs.c:pthread_ipc_consumer():L682). Flow control appeared to be stuck enabled on Corosync side.

The context was the following:
- The client process is only reading messages from CPG, never writing
- The client process uses cpg_dispatch() in blocking mode (CS_DISPATCH_BLOCKING)
- --enable-small-memory-footprint was used when compiling Corosync
- The client process is slower than Corosync. On our systems, Corosync has a nice of -18, and the client process of -5 only, and our client has some work to do each time it receives a message. - Messages received by the client have a size close the one of the SHM (-> slightly less than 64KB)

I think the problem is that, in lib/cpg.c, between cpg_dispatch_get() cpg_dispatch_put(), user callbacks are called, and they take quite a long time (mine has currently a bunch of malloc()/realloc() to do). During their execution, Corosync may try to stack other messages in the SHM. But if it is full, it will go wait on the semaphore SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT instead. Thing is, once user callbacks are finished, coroipcc_dispatch_put() is called and mark a part of the SHM as freed. However, when coroipcc_dispatch_get() will be called again, currently, it will immediately call poll(), without waking up Corosync first. Which ends up as described above: blocked.

Also, it makes sense to me to let Corosync stack new messages as soon as possible. This is why I have left in coroipcc_dispatch_get() the other call to ipc_sem_post(SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT) (right after the one to socket_recv()).


Regards
-steve
        ufds.fd = ipc_instance->fd;
@@ -874,11 +885,11 @@ coroipcc_dispatch_get (
        error = socket_recv (ipc_instance->fd,&buf, 1);
        assert (error == CS_OK);

-       if (shared_mem_dispatch_bytes_left (ipc_instance)>  500000) {
+       if (shared_mem_dispatch_bytes_left (ipc_instance)>  
(ipc_instance->dispatch_size/2)) {
                /*
                 * Notify coroipcs to flush any pending dispatch messages
                 */
-               
+

Can you remoe this whitespace change and submit this change as a
separate patch.  Then it should be suitable for merging.
                res = ipc_sem_post (ipc_instance->control_buffer, 
SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT);
                if (res != CS_OK) {
                        error = CS_ERR_LIBRARY;




Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais

Reply via email to