Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-16 Thread Doron Shoham
Hi,

The gather hangs only in the linear_sync algorithm but works with the
basic_linear and binomial algorithms.
The gather algorithm is chosen dynamically depending on block size and
communicator size.
So, in the beginning, the binomial algorithm is chosen (communicator size
is larger than 60).
When the message size increases, the linear_sync algorithm is chosen
(with small_segment_size).
When debugging on the cluster I saw that the linear_sync function is
called in an endless loop with a segment size of 1024.
This explains why the hang occurs in the middle of the run.
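For reference, the selection behaves roughly like the sketch below. This is a
minimal paraphrase, not the 1.4.3 source: the function and constant names are
mine, the 60-rank cutoff and 1024-byte segment come from the run described
above, and the block-size cutoff is purely illustrative.

    #include <stddef.h>

    enum gather_alg { GATHER_BASIC_LINEAR, GATHER_BINOMIAL, GATHER_LINEAR_SYNC };

    /* block_size: bytes contributed per rank; comm_size: number of ranks. */
    static enum gather_alg choose_gather(size_t block_size, int comm_size,
                                         int *segment_size)
    {
        const size_t block_size_cutoff  = 6000;  /* illustrative value only */
        const int    comm_size_cutoff   = 60;    /* "larger than 60" above  */
        const int    small_segment_size = 1024;  /* segment seen looping    */

        if (block_size > block_size_cutoff) {
            *segment_size = small_segment_size;  /* segmented, synchronized */
            return GATHER_LINEAR_SYNC;           /* linear gather           */
        }
        if (comm_size > comm_size_cutoff) {
            return GATHER_BINOMIAL;              /* binomial tree gather    */
        }
        return GATHER_BASIC_LINEAR;              /* plain linear gather     */
    }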

I still don't understand why RDMACM solves it or what causes
linear_sync to hang.

Again, in 1.5 it doesn't hang (maybe the timing is different?).
I'm still trying to understand what the differences are in those areas
between 1.4.3 and 1.5.


BTW,
Choosing RDMACM fixes hangs and performance issues in all collective operations.
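Another way to narrow it down, independent of the CPC, is to pin the gather
algorithm from the command line with the coll tuned MCA parameters, e.g.:

    mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
        -mca coll_tuned_use_dynamic_rules 1 \
        -mca coll_tuned_gather_algorithm 2 \
        imb/src/IMB-MPI1 gather -npmin 64

If I remember the switches correctly, something like
"ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned" lists the
available gather algorithms and their numeric IDs, so please double-check the
value-to-algorithm mapping on the build in question before relying on it.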

Thanks,
Doron


On Thu, Jan 13, 2011 at 9:44 PM, Shamis, Pavel  wrote:
> RDMACM creates the same QPs with the same tunings as OOB, so I don't see how
> the CPC could affect performance.
>
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
>
>
>
>
>
>
> On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote:
>
>> +1 on what Pasha said -- if using rdmacm fixes the problem, then there's 
>> something else nefarious going on...
>>
>> You might want to check padb with your hangs to see where all the processes 
>> are hung to see if anything obvious jumps out.  I'd be surprised if there's 
>> a bug in the oob cpc; it's been around for a long, long time; it should be 
>> pretty stable.
>>
>> Do we create QP's differently between oob and rdmacm, such that perhaps they 
>> are "better" (maybe better routed, or using a different SL, or ...) when 
>> created via rdmacm?
>>
>>
>> On Jan 12, 2011, at 12:12 PM, Shamis, Pavel wrote:
>>
>>> RDMACM and OOB cannot affect the performance of this benchmark, since they
>>> are not involved in communication. So I'm not sure that the performance
>>> changes that you see are related to connection manager changes.
>>> About oob - I'm not aware of any hang issues there; the code is very, very
>>> old and we have not touched it for a long time.
>>>
>>> Regards,
>>>
>>> Pavel (Pasha) Shamis
>>> ---
>>> Application Performance Tools Group
>>> Computer Science and Math Division
>>> Oak Ridge National Laboratory
>>> Email: sham...@ornl.gov
>>>
>>>
>>>
>>>
>>>
>>> On Jan 12, 2011, at 8:45 AM, Doron Shoham wrote:
>>>
 Hi,

 For the first problem, I can see that when using rdmacm instead of oob as
 the openib CPC, I get much better performance results (and no hangs!).

 mpirun -display-map -np 64 -machinefile voltairenodes -mca btl
 sm,self,openib -mca btl_openib_connect_rdmacm_priority 100
 imb/src/IMB-MPI1 gather -npmin 64


 #bytes      #repetitions        t_min[usec]     t_max[usec]     t_avg[usec]
 0       1000        0.04        0.05        0.05
 1       1000        19.64       19.69       19.67
 2       1000        19.97       20.02       19.99
 4       1000        21.86       21.96       21.89
 8       1000        22.87       22.94       22.90
 16      1000        24.71       24.80       24.76
 32      1000        27.23       27.32       27.27
 64      1000        30.96       31.06       31.01
 128     1000        36.96       37.08       37.02
 256     1000        42.64       42.79       42.72
 512     1000        60.32       60.59       60.46
 1024    1000        82.44       82.74       82.59
 2048    1000        497.66      499.62      498.70
 4096    1000        684.15      686.47      685.33
 8192    519         544.07      546.68      545.85
 16384   519         653.20      656.23      655.27
 32768   519         704.48      707.55      706.60
 65536   519         918.00      922.12      920.86
 131072  320         2414.08     2422.17     2418.20
 262144  160         4198.25     4227.58     4213.19
 524288  80          7333.04     7503.99     7438.18
 1048576 40          13692.60    14150.20    13948.75
 2097152 20          30377.34    32679.15    31779.86
 4194304 10          61416.70    71012.50    68380.04

 How can the oob cause the hang? Isn't it only used to bring up the
 connections?
 Does the oob play any part after the connections are made?

 Thanks,
 Doron

 On Tue, Jan 11, 2011 at 2:58 PM, Doron Shoham  wrote:
>
> Hi
>
> All machines in the setup are iDataPlex with Nehalem, 12 cores per node and
> 24GB memory.
>
>
>
> · Problem 1 – OMPI 1.4.3 hangs in gather:
>
>
>
> I’m trying to run the IMB gather benchmark with OMPI 1.4.3 (vanilla).
>
> It happens when np >= 64 and the message size exceeds 4k:
>
>>>

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-16 Thread Shamis, Pavel
Well, then I would suspect the rdmacm vs. oob QP configuration. They are
supposed to be the same, but there is probably a bug there and the rdmacm QP
tuning somehow differs from oob; that is a potential cause of the performance
differences that you see.
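A quick way to compare the two paths, assuming the parameter names are the
same on your build (please double-check with ompi_info), is to dump the
receive-queue specification and to run the same benchmark with each CPC
selected explicitly:

    ompi_info --param btl openib | grep receive_queues

    mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
        -mca btl_openib_cpc_include oob    imb/src/IMB-MPI1 gather -npmin 64
    mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
        -mca btl_openib_cpc_include rdmacm imb/src/IMB-MPI1 gather -npmin 64

If the QP setup really is identical, the two runs should only differ in
connection establishment, not in steady-state gather times.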

Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 16, 2011, at 4:12 AM, Doron Shoham wrote:

> Hi,
>
> The gather hangs only in the linear_sync algorithm but works with the
> basic_linear and binomial algorithms.
> The gather algorithm is chosen dynamically depending on block size and
> communicator size.
> So, in the beginning, the binomial algorithm is chosen (communicator size
> is larger than 60).
> When the message size increases, the linear_sync algorithm is chosen
> (with small_segment_size).
> When debugging on the cluster I saw that the linear_sync function is
> called in an endless loop with a segment size of 1024.
> This explains why the hang occurs in the middle of the run.
>
> I still don't understand why RDMACM solves it or what causes
> linear_sync to hang.
>
> Again, in 1.5 it doesn't hang (maybe the timing is different?).
> I'm still trying to understand what the differences are in those areas
> between 1.4.3 and 1.5.
>
>
> BTW,
> Choosing RDMACM fixes hangs and performance issues in all collective 
> operations.
>
> Thanks,
> Doron
>
>
> On Thu, Jan 13, 2011 at 9:44 PM, Shamis, Pavel  wrote:
>> RDMACM creates the same QPs with the same tunings as OOB, so I don't see how
>> the CPC could affect performance.
>>
>> Pavel (Pasha) Shamis
>> ---
>> Application Performance Tools Group
>> Computer Science and Math Division
>> Oak Ridge National Laboratory
>>
>>
>>
>>
>>
>>
>> On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote:
>>
>>> +1 on what Pasha said -- if using rdmacm fixes the problem, then there's 
>>> something else nefarious going on...
>>>
>>> You might want to check padb with your hangs to see where all the processes 
>>> are hung to see if anything obvious jumps out.  I'd be surprised if there's 
>>> a bug in the oob cpc; it's been around for a long, long time; it should be 
>>> pretty stable.
>>>
>>> Do we create QP's differently between oob and rdmacm, such that perhaps 
>>> they are "better" (maybe better routed, or using a different SL, or ...) 
>>> when created via rdmacm?
>>>
>>>
>>> On Jan 12, 2011, at 12:12 PM, Shamis, Pavel wrote:
>>>
 RDMACM and OOB cannot affect the performance of this benchmark, since they
 are not involved in communication. So I'm not sure that the performance
 changes that you see are related to connection manager changes.
 About oob - I'm not aware of any hang issues there; the code is very, very
 old and we have not touched it for a long time.

 Regards,

 Pavel (Pasha) Shamis
 ---
 Application Performance Tools Group
 Computer Science and Math Division
 Oak Ridge National Laboratory
 Email: sham...@ornl.gov





 On Jan 12, 2011, at 8:45 AM, Doron Shoham wrote:

> Hi,
>
> For the first problem, I can see that when using rdmacm instead of oob as
> the openib CPC, I get much better performance results (and no hangs!).
>
> mpirun -display-map -np 64 -machinefile voltairenodes -mca btl
> sm,self,openib -mca btl_openib_connect_rdmacm_priority 100
> imb/src/IMB-MPI1 gather -npmin 64
>
>
> #bytes      #repetitions        t_min[usec]     t_max[usec]     t_avg[usec]
> 0       1000        0.04        0.05        0.05
> 1       1000        19.64       19.69       19.67
> 2       1000        19.97       20.02       19.99
> 4       1000        21.86       21.96       21.89
> 8       1000        22.87       22.94       22.90
> 16      1000        24.71       24.80       24.76
> 32      1000        27.23       27.32       27.27
> 64      1000        30.96       31.06       31.01
> 128     1000        36.96       37.08       37.02
> 256     1000        42.64       42.79       42.72
> 512     1000        60.32       60.59       60.46
> 1024    1000        82.44       82.74       82.59
> 2048    1000        497.66      499.62      498.70
> 4096    1000        684.15      686.47      685.33
> 8192    519         544.07      546.68      545.85
> 16384   519         653.20      656.23      655.27
> 32768   519         704.48      707.55      706.60
> 65536   519         918.00      922.12      920.86
> 131072  320         2414.08     2422.17     2418.20
> 262144  160         4198.25     4227.58     4213.19
> 524288  80          7333.04     7503.99     7438.18
> 1048576 40          13692.60    14150.20    13948.75
> 2097152 20          30377.34    32679.15    31779.86
> 4194304 10          61416.70    71012.50    68380.04