May I repost this, and see if there's any feedback?

Thanks,

Michele

---------- Forwarded message ----------
From: Michele Vallisneri <[email protected]>
Date: Thu, Nov 7, 2013 at 3:26 PM
Subject: Re: [Dmtcp-forum] Problem with ipc/sysv plugin
To: Kapil Arya <[email protected]>

Thanks Kapil!

I can send a couple of simple executables that reproduce the problem. It
arises when one of them creates the shmem with "shmget(key,size,flags)" and
the other accesses it with "shmget(key,0,0)", which is allowed. If the
second executable ends up being the checkpoint leader, I think DMTCP uses
the wrong information (a size of 0) to reconnect to the shmem post-restart.
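
As a rough illustration of that access pattern, here is a single-process
sketch. The function names and the key derivation are made up for the
example; in the real setup the two shmget calls live in separate
executables.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Placeholder key for the example only: derived from the pid so that
// repeated runs are unlikely to collide with an existing segment.
key_t example_key() {
    return static_cast<key_t>(0x444D0000 + (getpid() & 0xFFFF));
}

// Hypothetical helper: creates a segment the way the "server" does, then
// looks it up the way the "client" does. Returns true if both calls
// resolve to the same segment id.
bool same_segment_via_zero_size_lookup(key_t key, std::size_t size) {
    assert(size > 0);  // the creating side must pass a real size

    // Server side: create the segment with an explicit size.
    int server_id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
    if (server_id == -1) {
        std::perror("shmget (create)");
        return false;
    }

    // Client side: size 0 and flags 0 look up an already existing segment.
    int client_id = shmget(key, 0, 0);
    if (client_id == -1) {
        std::perror("shmget (lookup)");
    }

    // Mark the segment for removal so the example leaves nothing behind.
    shmctl(server_id, IPC_RMID, nullptr);
    return client_id == server_id;
}
```

Calling same_segment_via_zero_size_lookup(example_key(), 4096) should
return true: the client never learns the size from its own shmget call,
which is why a size of 0 can end up recorded on that side.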

However, in the meantime I've experimented a little, and it seems to me
that the problem is fixed with this simple change:

diff -r tmp/dmtcp-2.0/plugin/ipc/sysv/sysvipc.cpp dmtcp-2.0/plugin/ipc/sysv/sysvipc.cpp
515c515
<   if (key == -1) {
---
>   if (key == -1 || size == 0) {

...so that dmtcp::ShmSegment::ShmSegment calls shmctl to get the actual
size if it is passed a size argument of 0. Does this make sense to you?
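
The idea behind the change can be sketched outside DMTCP as follows. This
is not the plugin's code: effective_segment_size is a hypothetical helper
that only shows how shmctl with IPC_STAT recovers the real size when the
caller passed 0.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper, not taken from DMTCP: returns the size to record
// for a segment. If the caller passed a non-zero size it is used as-is;
// if the caller passed 0 (as in shmget(key,0,0)), the kernel is asked for
// the actual size. Returns 0 if shmctl fails.
std::size_t effective_segment_size(int shmid, std::size_t requested_size) {
    assert(shmid >= 0);  // must be a valid id returned by shmget

    if (requested_size != 0) {
        return requested_size;
    }

    struct shmid_ds info;
    if (shmctl(shmid, IPC_STAT, &info) == -1) {
        std::perror("shmctl (IPC_STAT)");
        return 0;
    }
    return info.shm_segsz;  // size of the segment in bytes
}
```

With a helper like this, a process that attached via shmget(key,0,0)
would still end up with the creator's size rather than 0.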

Michele


On Wed, Nov 6, 2013 at 6:37 PM, Kapil Arya <[email protected]> wrote:

> Hi Michele,
>
> Would it be possible for you to provide us a smaller test case to
> reproduce the failure locally? This will help us in finding the solution
> rather quickly. If not, we can try some alternate route.
>
> Best,
> Kapil
>
>
> On Mon, Nov 4, 2013 at 9:06 PM, Michele Vallisneri <[email protected]> wrote:
>
>> Dear DMTCP maintainers,
>>
>> thanks for the great job you're doing on this piece of software.
>>
>> I'm trying to checkpoint a pair of client/server applications that
>> communicate by way of sysv shared memory. The server creates and writes to
>> a shared memory block, the client attaches to it and reads from it.
>>
>> I'm having intermittent problems where one of the processes fails to
>> "shmat" the shared memory segment to its own address space, so I get an
>> error from the "JASSERT (_real_shmat...)" line in
>> dmtcp::ShmSegment::postRestart().
>>
>> I suspect (but I'm not sure) that this is because upon checkpointing the
>> client is sometimes made checkpoint leader, and information about the
>> segment is then obtained from the client's "shmget", which has a size of
>> zero (the client does not know it in advance). Does this sound like a
>> possible explanation?
>>
>> Thanks a lot for your help!
>>
>> Michele
>>
>>
>> _______________________________________________
>> Dmtcp-forum mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>
>>
>