On 18.05.2011 16:19, Steven Dake wrote:
Basically, without this patch, I had pretty often an issue: My client process ended up blocked on cpg_dispatch_get() (on L876:poll()), and Corosync's thread was waiting on this semaphore (exec/coroipcs.c:pthread_ipc_consumer():L682). Flow control appeared to be stuck enabled on Corosync side.On 05/18/2011 06:27 AM, Jerome Flesch wrote:--- lib/coroipcc.c | 15 +++++++++++++-- 1 files changed, 13 insertions(+), 2 deletions(-)diff --git a/lib/coroipcc.c b/lib/coroipcc.c index 34db980..c03925c 100644 --- a/lib/coroipcc.c +++ b/lib/coroipcc.c @@ -847,6 +847,17 @@ coroipcc_dispatch_get ( return (error); } + if (shared_mem_dispatch_bytes_left (ipc_instance)> (ipc_instance->dispatch_size/2)) { + /* + * Notify coroipcs to flush any pending dispatch messages + */ + res = ipc_sem_post (ipc_instance->control_buffer, SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT); + if (res != CS_OK) { + error = CS_ERR_LIBRARY; + goto error_put; + } + } + *data = NULL;Could you explain this change in more detail? These types of semaphores should only be triggered during a request, or a flush should happen, or we should exit.
The context was the following: - The client process is only reading messages from CPG, never writing- The client process uses cpg_dispatch() in blocking mode (CS_DISPATCH_BLOCKING)
- --enable-small-memory-footprint was used when compiling Corosync- The client process is slower than Corosync. On our systems, Corosync has a nice of -18, and the client process of -5 only, and our client has some work to do each time it receives a message. - Messages received by the client have a size close the one of the SHM (-> slightly less than 64KB)
I think the problem is that, in lib/cpg.c, between cpg_dispatch_get() cpg_dispatch_put(), user callbacks are called, and they take quite a long time (mine has currently a bunch of malloc()/realloc() to do). During their execution, Corosync may try to stack other messages in the SHM. But if it is full, it will go wait on the semaphore SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT instead. Thing is, once user callbacks are finished, coroipcc_dispatch_put() is called and mark a part of the SHM as freed. However, when coroipcc_dispatch_get() will be called again, currently, it will immediately call poll(), without waking up Corosync first. Which ends up as described above: blocked.
Also, it makes sense to me to let Corosync stack new messages as soon as possible. This is why I have left in coroipcc_dispatch_get() the other call to ipc_sem_post(SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT) (right after the one to socket_recv()).
Regards -steveufds.fd = ipc_instance->fd; @@ -874,11 +885,11 @@ coroipcc_dispatch_get ( error = socket_recv (ipc_instance->fd,&buf, 1); assert (error == CS_OK); - if (shared_mem_dispatch_bytes_left (ipc_instance)> 500000) { + if (shared_mem_dispatch_bytes_left (ipc_instance)> (ipc_instance->dispatch_size/2)) { /* * Notify coroipcs to flush any pending dispatch messages */ - +Can you remoe this whitespace change and submit this change as a separate patch. Then it should be suitable for merging.res = ipc_sem_post (ipc_instance->control_buffer, SEMAPHORE_REQUEST_OR_FLUSH_OR_EXIT); if (res != CS_OK) { error = CS_ERR_LIBRARY;
smime.p7s
Description: S/MIME Cryptographic Signature
_______________________________________________ Openais mailing list Openais@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/openais