Hi,
> Hi, Brad and Rafael,
>
> Brad, thanks for sending the message to the GASNet community. That's very
> helpful. I'm going to try the work-arounds.
>
> Let me ask a question about the "stride" issue.
> Here is part of my problematic code. If you want to see the whole code,
> please take a look at my previous mail.
>
> const zero: int(32) = 0;
> var tile_array_indices = {zero..tileSize-1,zero..tileSize-1};
> class Tile {
> var tile_array: [tile_array_indices] real;
> }
>
> var temp = lkji_tiles(i,j,zero).tile_array;
>
> I would like to make sure there is no stride code in my program.
If you disable the bulkComms optimization, no strided comms are used. That was
point 1 in the list from Rafael Asenjo:
1.- Not using bulkComms optimization (-suseBulkTransferStride=false
-suseBulkTransfer=false). --> Slower comms.
That is the quickest fix to apply but, as it says, the comms will be a lot
slower. You only need to pass these arguments:
-suseBulkTransferStride=false -suseBulkTransfer=false
when compiling with chpl.
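As a rough sketch of what that looks like on the command line (myprog.chpl and
the output name myprog are placeholder names, not taken from your code; the two
flags are exactly the ones quoted above):

```shell
# The two flags from point 1: both disable the bulk-transfer optimizations.
CHPL_FLAGS="-suseBulkTransferStride=false -suseBulkTransfer=false"

# myprog.chpl / myprog are hypothetical names used only for illustration.
CHPL_CMD="chpl $CHPL_FLAGS -o myprog myprog.chpl"

# Compile only if chpl and the source file are actually present;
# otherwise just show the command that would be run.
if command -v chpl >/dev/null 2>&1 && [ -f myprog.chpl ]; then
    $CHPL_CMD
else
    echo "would run: $CHPL_CMD"
fi
```

The flags are compile-time settings, so the program has to be rebuilt with
them; nothing changes at launch time.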
> Am I right in thinking the Chapel compiler compiles "var temp =
> lkji_tiles(i,j,zero).tile_array;" to just one contiguous bulk transfer?
> Anyway, I'll generate a GASNet trace of my problematic code and take a look
> at it. Also, I'll try C+GASNet tests.
As Brad said, getting to the bottom of the problem is very hard, even for
the GASNet people...
Changing the value of GASNET_VIS_AMPIPE doesn't help.
Also, in my tests I tried the failing communication on its own, and in
isolation it worked fine, so it seems that some sequence of events, after
doing some strided transfers, triggers the bug.
A complete explanation seems to be here:
https://hpcrdm.lbl.gov/pipermail/upc-users/2013-April/001267.html
IMHO ibv needs a way to request a flush of its cache of virtual to physical
memory addresses...
Greets,
Rafael
> Thanks,
>
> Akihiro
>
> On Jan 29, 2014, at 6:36 PM, Brad Chamberlain wrote:
>
>>
>> One more note:
>>
>>> The strided code allocates and frees buffers for pack/unpack which can
>>> trigger the bug.
>>> One additional work around is to disable the problematic pack/unpack
>>> implementation by setting env var GASNET_VIS_AMPIPE=0
>>
>> -Brad
>>
>>
>> On Wed, 29 Jan 2014, Brad Chamberlain wrote:
>>
>>>
>>> Hi Rafael and Akihiro --
>>>
>>> I should've been a bit more patient in my response. After passing along
>>> the "strided" bit, the response came back:
>>>
>>>> The implementation of strided operations is KNOWN to trigger the bug I
>>>> referenced. So, have a look at the work-arounds in the bug report.
>>>
>>> which is here:
>>>
>>> https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=495
>>>
>>> -Brad
>>>
>>>
>>> On Wed, 29 Jan 2014, Brad Chamberlain wrote:
>>>
>>>> Hi Rafael and Akihiro --
>>>> It sounds as though there isn't a known issue w.r.t. large ibv conduit
>>>> messages (I didn't catch the importance of 'strided', so sent that along
>>>> in a second message, but don't expect it'll change the response). They
>>>> wrote:
>>>>> We don't have a known error with long messages, but do have one with
>>>>> respect to free() which might be the problem. If we (ibv-conduit via
>>>>> firehose) cache a dynamic memory registration (especially problematic w/
>>>>> SEGMENT_EVERYTHING), then it is possible that memory is free()d and a
>>>>> later malloc() gets the same virtual address. If that happens then ibv
>>>>> may end up performing RDMA from the physical pages corresponding to the
>>>>> PREVIOUS association for the virtual address (NOTE: the pages are
>>>>> ref-counted and thus NOT truly free and NOT mapped into some other
>>>>> process). See https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=495 for
>>>>> some details on that bug. The work-arounds are to disable mmap-based
>>>>> malloc(), or disable firehose.
>>>> To me, this doesn't sound like the same thing, but I'm far enough away
>>>> from the problem that you may recognize something that I'm not.
>>>> Assuming it isn't, how hard would it be to put together a small C+GASNet
>>>> test that exhibits this issue? Would it be as simple as sending a large
>>>> buffer in strided mode? In a loop?
>>>> As long as I was bothering them, I also asked whether there was a way to
>>>> sanity check that an executable was built with debugging on (since there
>>>> are so many ways that we could get this wrong) and got the response:
>>>>> As for checking for debugging support, the preprocessor token
>>>>> GASNET_CONFIG_STRING will tell you a lot about the configuration you've
>>>>> compiled with. We do some name-shifting to ensure you can't link with a
>>>>> library of a different configuration than you've compiled with.
>>>>> Alternatively if you have the "ident" utility for finding RCS strings,
>>>>> applying it to the executable file will extract lots of configuration
>>>>> bits. The value of GASNET_CONFIG_STRING will follow "$GASNetConfig:". You
>>>>> can fake that with:
>>>>> $ perl -n -ln044 -e 'print if /GASNetConfig:/' -- a.out
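If perl is not at hand, a similar check can be sketched with grep. This is
only a sketch: it assumes the embedded string follows the usual RCS ident
shape "$GASNetConfig: ... $" with no "$" inside the value, and that your grep
supports -a and -o (GNU grep does). The function name gasnet_config is made up
for this example.

```shell
# Print every "$GASNetConfig: ... $" ident string embedded in a file.
# -a treats the binary as text; -o prints only the matching part.
gasnet_config() {
    grep -a -o '\$GASNetConfig:[^$]*\$' -- "$1"
}

# Example: inspect a.out if it exists in the current directory.
if [ -f a.out ]; then
    gasnet_config a.out
fi
```

Whatever the function prints should make it clear whether the executable was
built with debugging on.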
>>>> Thanks,
>>>> -Brad
>>>
>>
>
--
Rafael Larrosa Jiménez
Centro de Supercomputación y Bioinformática - http://www.scbi.uma.es
Universidad de Málaga
EMAIL: [email protected] Edificio de Bioinnovación
TELEF: + 34951952788 C/ Severo Ochoa 34
FAX : +34951952792 Parque Tecnológico de Andalucía
29590 Málaga
(SPAIN)
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers