Re: [OMPI users] running "openmpi" with "knem"

2012-12-01 Thread Brice Goglin
On 01/12/2012 12:45, Leta Melkamu wrote:
> Hello there, 
>
> I have some questions about using knem with Open MPI; everything works fine.
> However, the usage of the knem flags when running my Open MPI program
> is not entirely clear to me.
> Is something like --mca btl_sm_knem_dma_min 4860 enough, or do I have to
> add more flags like --mca btl_sm_eager_limit 4276 in the same run? Or
> can you please suggest a good documentation link about knem flag
> usage? I have tried to look around but found no good info regarding this
> stuff. Otherwise I will end up testing each command with different
> flag values for each run.

What are you trying to do?

* Use knem for direct copy through the kernel for medium/large messages
(common use case):
"--mca btl_sm_use_knem 1" is enough. You can "cat /dev/knem" before and
after a run to see that the knem counters have increased, which means OMPI
successfully passed some copy requests to knem.

* Use knem for short messages:
"--mca btl_sm_eager_limit 4276" may help. But I am not sure that's a
good idea since knem was designed for large messages.

* Offload knem copies to I/OAT hardware on Intel servers:
that's what "btl_sm_knem_dma_min" is for. Not sure you really want to do
that either; it's not very beneficial on current Intel servers.

Brice
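
A minimal sketch of the three cases above, written in Python for illustration.
The helper names, the program name "./my_app" and the process count are
hypothetical; only the MCA parameter names, the example values and the
"cat /dev/knem" check come from this thread.

```python
def knem_mpirun_command(program, nprocs, eager_limit=None, dma_min=None):
    """Build an mpirun argument list that enables knem in the sm BTL."""
    cmd = ["mpirun", "-n", str(nprocs), "--mca", "btl_sm_use_knem", "1"]
    if eager_limit is not None:
        # Only relevant if knem should also be tried for shorter messages.
        cmd += ["--mca", "btl_sm_eager_limit", str(eager_limit)]
    if dma_min is not None:
        # Only relevant when offloading copies to I/OAT hardware.
        cmd += ["--mca", "btl_sm_knem_dma_min", str(dma_min)]
    cmd.append(program)
    return cmd


def read_knem_counters(path="/dev/knem"):
    """Return the text of /dev/knem, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None


# Common case: knem for medium/large messages only.
print(" ".join(knem_mpirun_command("./my_app", 4)))
```

Calling read_knem_counters() before and after the run and comparing the two
results is the scripted equivalent of the manual "cat /dev/knem" check.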



Re: [OMPI users] cluster with iOS or Android devices?

2012-12-01 Thread Reuti
On 30.11.2012 at 07:16, shiny knight wrote:

> Thanks for all your replies.
> 
> As of now I have access to 3 iOS devices and 1 Android device, so if possible
> I would rather pursue the iOS route.
> 
> So it seems that there is not yet a simple way to do so on these devices 
> (thanks for the paper you posted, Dominik); I will have to look deeper into the 
> project that you mentioned and wait for some official release (at least for 
> the Android side).
> 
> I may install a Linux distro on a virtual machine; I mostly work on OSX, so it 
> should not be that bad (OSX allows me to work with both Android and iOS 
> hassle-free; that's why I had the thought to use my devices for MPI).
> 
> Beatty: My idea is to use the devices only when plugged in; I was reading a 
> paper about how to use MPI and dynamically change the number of nodes 
> attached, while crunching data for a process. So it would be possible to add 
> and remove nodes on the fly, and was trying to apply it to a portable device 
> (http://www.cs.rpi.edu/~szymansk/papers/ppam05.pdf) before realizing that 
> there is no MPI implementation for them.

NB: AFAICS this paper refers to the IOS from Cisco, not iOS from Apple.

-- Reuti


> I would never envision a system where a user has a device in his pocket that 
> is actually doing "something" behind his back. Mine was a simple case of 
> having devices sitting on my desk, which I use to test my apps, and I could 
> use these devices in a more productive way while I have them tethered to my 
> main machine (which is the main server where MPI development is done).
> 
> Would you mind elaborating on the approach that you mentioned? I have never used 
> Xgrid, so I am not sure how your solution would work.
> 
> Thanks!
> 
> Lou
> 
> 
> On Nov 29, 2012, at 4:14 PM, Beatty, Daniel D CIV NAVAIR, 474300D wrote:
> 
>> Greetings ladies and gentlemen,
>> There is one alternative approach, and this is a pseudo-cloud-based MPI.  The
>> idea is that the MPI node list is adjusted via the cloud, similar to the way
>> Xgrid's Bonjour used to do it for Xgrid.
>> 
>> In this case, it is applying an MPI notion to the OpenCL codelets.  There
>> are obvious issues with security, battery life, etc.  There is considerable
>> room for discussion as far as expectations go.  Do jobs run free if the device is
>> plugged in?  If the device is in the pocket, can the user switch to power
>> conservation / cooler pockets?  What constitutes fairness?  Do owners have a
>> right to be biased in judgement?   These are tough questions that I think I
>> will have to provide fair assurances for.  After all, everyone likes to
>> think they are in control of what they put in their pocket.
>> 
>> V/R,
>> Dan
>> 
>> 
>> On 11/28/12 3:06 PM, "Dominik Goeddeke" wrote:
>> 
>>> shameless plug: 
>>> http://www.mathematik.tu-dortmund.de/~goeddeke/pubs/pdf/Goeddeke_2012_EEV.pdf
>>> 
>>> In the MontBlanc project (www.montblanc-project.eu), a lot of folks from
>>> all around Europe look into exactly this. Together with a few
>>> colleagues, we have been honoured to get access to an early prototype
>>> system. The runs for the paper above (accepted in JCP as of last week)
>>> have been carried out with MPICH2 back in June, but OpenMPI also worked
>>> flawlessly except for some issues with SLURM integration at the time we
>>> did those tests.
>>> 
>>> The bottom line is: the prototype machine (128 Tegra2's) ran standard
>>> Ubuntu, and since Android is essentially Linux, it should not be too
>>> hard to get the system you envision up and running, Shiny Knight.
>>> 
>>> Cheers,
>>> 
>>> Dominik
>>> 
>>> 
>>> On 11/29/2012 12:00 AM, Vincent Diepeveen wrote:
>>>> You might want to post to the Beowulf mailing list (see cc),
>>>> and you will want to install Linux, of course.
>>>>
>>>> OpenFabrics releases Open MPI, yet it only works on a limited number of
>>>> distributions; the most important thing is having
>>>> the correct kernel (usually an old kernel).
>>>>
>>>> I'm going to try to get it to work on Debian soon.
>>>>
>>>>
>>>>
>>>> On Nov 28, 2012, at 11:50 PM, shiny knight wrote:
>>>>
>>>>> I was looking for some info about an MPI port for iOS or Android devices.
>>>>>
>>>>> I have some old devices that may prove useful if I could
>>>>> include them in my computation scheme.
>>>>>
>>>>> OpenCL runs on iOS and Android, so I was wondering if there is any
>>>>> way to have an old iPhone/phone or iPad/tablet run MPI.
>>>>>
>>>>> I tried to look everywhere, but I didn't find anything that says that
>>>>> it is possible, nor have I found any practical example.
>>>>>
>>>>> Thanks!
>>>>> ___
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
 

[OMPI users] Lustre hints via environment variables/runtime parameters

2012-12-01 Thread Eric Chamberland

Hi,

I am using Open MPI 1.6.3 with Lustre.  I can change the stripe count via 
"striping_unit", but if I try to change the stripe size via 
"striping_factor", all my options are ignored and fall back on the 
default values.


Here is what I do:

=
setenv ROMIO_HINTS /home/ericc/romio-hints

cat $ROMIO_HINTS
striping_unit 16
striping_factor 1048576

rm temp ; mpirun -n 3 idx2 ; lfs getstripe temp

temp
lmm_stripe_count:   1
lmm_stripe_size:    65536
lmm_stripe_offset:  28
    obdidx       objid        objid   group
        28    23877295    0x16c56af       0

=

If I remove the "striping_factor 1048576" entry in my hint file, I get this:

=
cat $ROMIO_HINTS
striping_unit 16
#striping_factor 1048576

rm temp ; mpirun -n 3 idx2 ; lfs getstripe temp

temp
lmm_stripe_count:   36
lmm_stripe_size:    65536
lmm_stripe_offset:  21
    obdidx       objid        objid   group
        21    27078098    0x19d2dd2       0
         5    26516786    0x1949d32       0
        18    26272707    0x190e3c3       0
         2    22198271    0x152b7ff       0
        14    24302770    0x172d4b2       0
        16    20970007    0x13ffa17       0
        28    23877307    0x16c56bb       0
         6    25726827    0x1888f6b       0
        31    23623835    0x168789b       0
        23    24231103    0x171bcbf       0
        34    23963185    0x16da631       0
         3    23462711    0x1660337       0
        13    27515658    0x1a3db0a       0
        26    23502238    0x1669d9e       0
         7    26708491    0x1978a0b       0
        32    21946148    0x14edf24       0
        17    26912937    0x19aa8a9       0
         4    24586434    0x17728c2       0
        27    23277776    0x16330d0       0
         9    23634614    0x168a2b6       0
        11    25769779    0x1893733       0
        33    24915767    0x17c2f37       0
        29    20790315    0x13d3c2b       0
         8    25647332    0x18758e4       0
        20    26938873    0x19b0df9       0
        19    26182463    0x18f833f       0
        12    25346469    0x182c1a5       0
        15    25681819    0x187df9b       0
        24    23898261    0x16ca895       0
        10    26554081    0x1952ee1       0
        25    23512409    0x166c559       0
         0    28428909    0x1b1ca6d       0
        30    23953009    0x16d7e71       0
        22    24117691    0x17001bb       0
        35    20972494    0x14003ce       0
         1    25492821    0x184fd55       0
=

And if I don't put anything in the file, I get this:

=
cat $ROMIO_HINTS
#striping_unit 16
#striping_factor 1048576

rm temp ; mpirun -n 3 idx2 ; lfs getstripe temp

temp
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_stripe_offset:  18
    obdidx       objid        objid   group
        18    26272802    0x190e422       0

=

These are the default values of our Lustre file system.

Any idea?

Thanks,

Eric
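
For reference, the MPI standard's reserved file hints describe
"striping_factor" as the number of I/O devices to stripe across (the stripe
count) and "striping_unit" as the striping unit in bytes (the stripe size),
which is the reverse of how the two are described in this message, so the
values in the hints file may be worth checking against that. The sketch below
assumes that mapping is what applies here (this is not verified in the
thread). The helper names are hypothetical; the hints-file layout and the
"lfs getstripe" field names are taken from the output above.

```python
import re


def romio_hints(stripe_count, stripe_size_bytes):
    """Return hints-file text, one 'key value' pair per line."""
    # Assumption: striping_factor = stripe count, striping_unit = bytes.
    return (f"striping_factor {stripe_count}\n"
            f"striping_unit {stripe_size_bytes}\n")


def parse_getstripe(text):
    """Extract the lmm_* summary fields from 'lfs getstripe' output."""
    pattern = re.compile(r"^\s*(lmm_\w+):\s*(-?\d+)\s*$", re.MULTILINE)
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(text)}


# Summary lines of the last run shown above (file system defaults).
SAMPLE = """\
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_stripe_offset:  18
"""

layout = parse_getstripe(SAMPLE)
print(romio_hints(16, 1048576), end="")
print(layout["lmm_stripe_count"], layout["lmm_stripe_size"])
```

Comparing the parsed layout with the values written to the hints file gives a
quick check of whether a given hint was honoured or ignored.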



[OMPI users] running "openmpi" with "knem"

2012-12-01 Thread Leta Melkamu
Hello there,

I have some questions about using knem with Open MPI; everything works fine.
However, the usage of the knem flags when running my Open MPI program is
not entirely clear to me.
Is something like --mca btl_sm_knem_dma_min 4860 enough, or do I have to add
more flags like --mca btl_sm_eager_limit 4276 in the same run? Or can you
please suggest a good documentation link about knem flag usage? I have
tried to look around but found no good info regarding this stuff. Otherwise I
will end up testing each command with different flag values for each run.

thanks in advance.


[OMPI users] vtrun/otf question

2012-12-01 Thread Jaroslaw Slawinski
Hello everybody, this is my first post.

I needed to analyze the communication among nodes in a CFD code, so I
ran it with vtrun under mpiexec.
Next, I dumped the data (otfdump) and summed up the message volumes
for the Send and Receive lines.
My results astonished me: the total sent does not equal the total received.
Below I present a very small, 4-process problem, but it occurs in
every run for any number of processes.
This is the sum for SendMessage: the first column is the sender, the second
is the receiver, the third is the volume in bytes.

0 0 0
0 1 33575534
0 2 17178610
0 3 17881624
1 0 75900050
1 1 0
1 2 9510508
1 3 20961830
2 0 39807134
2 1 9937288
2 2 0
2 3 30328578
3 0 32415748
3 1 33226154
3 2 55062442
3 3 0

For ReceiveMessage: the first column is the receiver, the second is the
sender, the third is the volume in bytes:

0 0 0
0 1 57682570
0 2 30912474
0 3 28154684
1 0 43260014
1 1 0
1 2 9937288
1 3 37073342
2 0 21455674
2 1 9510508
2 2 0
2 3 62425238
3 0 20559492
3 1 19374170
3 2 27494694
3 3 0

Comparing the two, you can see that the reported volumes match only between
ranks 1 and 2 (in both directions). But what about the others?
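
The comparison can be reproduced with a short Python sketch. The numbers are
copied from the two listings above; the function and variable names are only
illustrative and are not part of otfdump or Vampir.

```python
SEND_TEXT = """
0 0 0
0 1 33575534
0 2 17178610
0 3 17881624
1 0 75900050
1 1 0
1 2 9510508
1 3 20961830
2 0 39807134
2 1 9937288
2 2 0
2 3 30328578
3 0 32415748
3 1 33226154
3 2 55062442
3 3 0
"""

RECV_TEXT = """
0 0 0
0 1 57682570
0 2 30912474
0 3 28154684
1 0 43260014
1 1 0
1 2 9937288
1 3 37073342
2 0 21455674
2 1 9510508
2 2 0
2 3 62425238
3 0 20559492
3 1 19374170
3 2 27494694
3 3 0
"""


def parse(text):
    """Map (first column, second column) -> volume in bytes."""
    table = {}
    for line in text.strip().splitlines():
        a, b, volume = (int(field) for field in line.split())
        table[(a, b)] = volume
    return table


def compare(send, recv):
    """List (sender, receiver, sent, received) for each off-diagonal pair."""
    rows = []
    for (sender, receiver), sent in sorted(send.items()):
        if sender != receiver:
            # The receive listing is keyed (receiver, sender).
            rows.append((sender, receiver, sent, recv[(receiver, sender)]))
    return rows


rows = compare(parse(SEND_TEXT), parse(RECV_TEXT))
matching = [(s, r) for s, r, sent, received in rows if sent == received]
for s, r, sent, received in rows:
    print(f"{s} -> {r}: sent {sent:>9}  received {received:>9}  "
          f"diff {sent - received:>10}")
```

Only the pairs in `matching` have identical send and receive totals; every
other pair shows a nonzero difference.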

I correlated the data with Vampir for this 4-process case, and it shows an
aggregated message volume taken partly from SendMessage and partly from
ReceiveMessage. In the table below the data are in MiB; the letter in
brackets indicates whether the value matches the Send (S) or Receive (R)
part I got from OTF.

        p0           p1           p2           p3
p0      -            32.02(S)     16.383(S)    17.053(S)
p1      55.01(R)     -            9.07(R/S)    18.477(R)
p2      29.48(R)     9.477(R/S)   -            26.221(R)
p3      26.85(R)     31.687(S)    52.512(S)    -
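
The S/R labels in the table can be checked with a small Python sketch,
assuming 1 MiB = 2**20 bytes and that table rows are senders and columns are
receivers (both assumptions are consistent with the byte counts listed
earlier). The function name and the tolerance are illustrative choices.

```python
MIB = 2 ** 20


def classify(vampir_mib, sent_bytes, received_bytes, tol=0.005):
    """Say which OTF total (Send, Receive or both) a Vampir value matches."""
    matches_send = abs(sent_bytes / MIB - vampir_mib) < tol
    matches_recv = abs(received_bytes / MIB - vampir_mib) < tol
    if matches_send and matches_recv:
        return "R/S"
    if matches_send:
        return "S"
    if matches_recv:
        return "R"
    return "?"


# (sender, receiver): (Vampir MiB, SendMessage bytes, ReceiveMessage bytes)
samples = {
    (0, 1): (32.02, 33575534, 43260014),
    (1, 0): (55.01, 75900050, 57682570),
    (1, 2): (9.07, 9510508, 9510508),
}

labels = {pair: classify(*values) for pair, values in samples.items()}
print(labels)
```

Only three of the twelve pairs are included here; the remaining entries can
be added to `samples` in the same way.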

Can anybody explain this, please? Probably I am doing something wrong, or I
do not understand how to interpret the data in OTF. Could otfdump be working
incorrectly? Or Vampir?

Best regards
jaross