Hi,

> Hi, Brad and Rafael,
> 
> Brad, thanks for sending your message to the GASNet community. That's very helpful.
> I'm going to try the work-arounds.
> 
> Let me ask a question about the "stride" issue.
> Here is part of my problematic code. If you want to see the whole code, please 
> take a look at my previous mail.
> 
> const zero: int(32) = 0;
> var tile_array_indices = {zero..tileSize-1,zero..tileSize-1};
> class Tile {
>   var tile_array: [tile_array_indices] real;
> }
> 
> var temp = lkji_tiles(i,j,zero).tile_array;
> 
> I would like to make sure there is no stride code in my program. 

If you disable the bulkComms optimization, no strided comms are used. That was 
point 1 in the list from Rafael Asenjo:

1.- Not using bulkComms optimization (-suseBulkTransferStride=false 
-suseBulkTransfer=false). —> Slower comms.

That is the quickest fix to apply but, as it says, the comms will be a lot 
slower. You only need to pass these arguments:

-suseBulkTransferStride=false -suseBulkTransfer=false

when compiling with chpl.
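
As a rough sketch, assuming your program lives in a file called myprog.chpl 
(a made-up name, use your own), the compile line could be put together like 
this. The SRC, FLAGS and CMD variable names are only for illustration; the two 
flags are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical source file name; replace it with your own program.
SRC="myprog.chpl"

# The two config settings quoted above; both are set to false so that
# neither bulk-transfer optimization is used.
FLAGS="-suseBulkTransferStride=false -suseBulkTransfer=false"

# Assemble the compile command and print it for inspection.
CMD="chpl $FLAGS $SRC"
echo "$CMD"
```

Once the printed line looks right, you can run it as is from the directory 
that holds the source file.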

> Am I right in thinking the Chapel compiler compiles "var temp = 
> lkji_tiles(i,j,zero).tile_array;" to just one contiguous bulk transfer?

> Anyway, I'll generate gasnet trace of my problematic code and take a look at 
> it. Also, I'll try C+gasnet tests.

As Brad said, getting to the bottom of the problem is very hard, even for the 
GASNet people...

The value of GASNET_VIS_AMPIPE doesn't help.

Also, in my tests I ran the failing communication on its own and it worked 
fine, so it seemed that some sequence of events, after doing some strided 
transfers, triggered the bug.

A complete explanation seems to be here:

https://hpcrdm.lbl.gov/pipermail/upc-users/2013-April/001267.html

IMHO ibv needs a way to request a flush of its cache of virtual-to-physical 
memory address mappings...

Greets,

Rafael

> Thanks,
> 
> Akihiro
> 
> On Jan 29, 2014, at 6:36 PM, Brad Chamberlain wrote:
> 
>> 
>> One more note:
>> 
>>> The strided code allocates and frees buffers for pack/unpack which can
>>> trigger the bug.
>>> One additional work around is to disable the problematic pack/unpack
>>> implementation by setting env var GASNET_VIS_AMPIPE=0
>> 
>> -Brad
>> 
>> 
>> On Wed, 29 Jan 2014, Brad Chamberlain wrote:
>> 
>>> 
>>> Hi Rafael and Akihiro --
>>> 
>>> I should've been a bit more patient in my response.  After passing along 
>>> the "strided" bit, the response came back:
>>> 
>>>> The implementation of strided operations is KNOWN to trigger the bug I 
>>>> referenced. So, have a look at the work-arounds in the bug report.
>>> 
>>> which is here:
>>> 
>>> https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=495
>>> 
>>> -Brad
>>> 
>>> 
>>> On Wed, 29 Jan 2014, Brad Chamberlain wrote:
>>> 
>>>> Hi Rafael and Akihiro --
>>>> It sounds as though there isn't a known issue w.r.t. large ibv conduit 
>>>> messages (I didn't catch the importance of 'strided', so sent that along 
>>>> in a second message, but don't expect it'll change the response).  They
>>>> wrote:
>>>>> We don't have a known error with long messages, but do have one with 
>>>>> respect to free() which might be the problem.  If we (ibv-conduit via 
>>>>> firehose) cache a dynamic memory registration (especially problematic w/ 
>>>>> SEGMENT_EVERYTHING), then it is possible that memory is free()d and a 
>>>>> later malloc() gets the same virtual address.  If that happens then ibv 
>>>>> may end up performing RDMA from the physical pages corresponding to the 
>>>>> PREVIOUS association for the virtual address (NOTE: the pages are 
>>>>> ref-counted and thus NOT truly free and NOT mapped into some other 
>>>>> process). See https://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=495 for 
>>>>> some details on that bug. The work-arounds are to disable mmap-based 
>>>>> malloc(), or disable firehose.
>>>> To me, this doesn't sound like the same thing, but I'm far enough away 
>>>> from the problem that you may recognize something that I'm not.
>>>> Assuming it doesn't, how hard would it be to put together a small C+GASNet 
>>>> test that exhibits this issue?  Would it be as simple as sending a large 
>>>> buffer in strided mode?  In a loop?
>>>> As long as I was bothering them, I also asked whether there was a way to 
>>>> sanity check that an executable was built with debugging on (since there 
>>>> are so many ways that we could get this wrong) and got the response:
>>>>> As for checking for debugging support, the preprocessor token 
>>>>> GASNET_CONFIG_STRING will tell you a lot about the configuration you've 
>>>>> compiled with.  We do some name-shifting to ensure you can't link with a 
>>>>> library of a different configuration than the one you've compiled with.
>>>>> Alternatively if you have the "ident" utility for finding RCS strings, 
>>>>> applying it to the executable file will extract lots of configuration 
>>>>> bits. The value of GASNET_CONFIG_STRING will follow "$GASNetConfig:" You 
>>>>> can fake that with:
>>>>> $ perl -n -ln044 -e 'print if /GASNetConfig:/' -- a.out
>>>> Thanks,
>>>> -Brad
>>> 
>> 
> 

-- 
Rafael Larrosa Jiménez
Centro de Supercomputación y Bioinformática - http://www.scbi.uma.es
Universidad de Málaga

EMAIL: [email protected]
TELEF: +34951952788
FAX  : +34951952792

Edificio de Bioinnovación
C/ Severo Ochoa 34
Parque Tecnológico de Andalucía
29590 Málaga (SPAIN)



_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers