Hi,
I experience hanging of tests ( latency ) since r19010
Best Regards
Lenny.
Is this related to r1378?
On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote:
Hi,
I experience hanging of tests ( latency ) since r19010
Best Regards
Lenny.
On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote:
Is this related to r1378?
Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket.
On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote:
Hi,
I experience hanging of tests ( latency ) since r19010
Best Regards
Lenny.
I believe it is.
On 7/28/08, Jeff Squyres wrote:
>
> On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote:
>
> Is this related to r1378?
>>
>
> Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket.
>
>
> On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote:
>>
>> Hi,
>>>
>>> I experience hanging of tests ( latency ) since r19010
It could also be something new. Brad and I noted on Fri that IB was
locking up as soon as we tried any cross-node communications. Hadn't
seen that before, and at least I haven't explored it further - planned
to do so today.
On Jul 28, 2008, at 6:01 AM, Lenny Verkhovsky wrote:
I believe it is.
Just got this warning today while trying to test IB connections. Last
I checked, 32 was indeed smaller than 192...
--
WARNING: rd_win specification is non optimal. For maximum performance
it is
advisable to configure rd_w
I'm getting the following when I try and build 1.3 from SVN:
gcc -DHAVE_CONFIG_H -I. -I../../adio/include -DOMPI_BUILDING=1 -I/Users/greg/Documents/workspaces/ptp_head/ompi/ompi/mca/io/romio/romio/../../../../.. -I/Users/greg/Documents/workspaces/ptp_head/ompi/ompi/mca/io/romio/romio/../..
Blast. Looks like a problem with the new ROMIO I brought in last week.
I'll fix shortly; thanks for the heads-up.
On Jul 28, 2008, at 9:36 AM, Greg Watson wrote:
I'm getting the following when I try and build 1.3 from SVN:
gcc -DHAVE_CONFIG_H -I. -I../../adio/include -DOMPI_BUILDING=1 -I/
With the update on #1400, I think we're ready to push the MCA base
changes to the SVN trunk. Speak now if you object, or forever hold
your peace. The most notable parts of this commit:
- add "register" function to mca_base_component_t
- converted coll:basic and paffinity:linux and paffini
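As a rough illustration of the idea (the function name and return convention below are my assumptions, not necessarily what the commit uses), the new hook lets a component register its MCA parameters in a dedicated callback instead of doing it inside open():

/* Hypothetical sketch of a component-side "register" callback. */
static int coll_basic_register(void)
{
    /* Register this component's MCA parameters (e.g. a priority value)
     * here, before the component's open function is ever called. */
    return 0;  /* success */
}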
I checked this out some more and I believe it is ticket #1378 related.
We lock up if SM is included in the BTL's, which is what I had done on
my test. If I ^sm, I can run fine.
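For reference, that works out to something like the following (the test program name is just a placeholder):

mpirun -np 2 --mca btl sm,openib,self ./mpi_test     <-- hangs for me
mpirun -np 2 --mca btl ^sm ./mpi_test                <-- runs fine

i.e., the "^" syntax excludes the listed BTL(s) and lets everything else be considered.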
On Jul 28, 2008, at 6:41 AM, Ralph Castain wrote:
It could also be something new. Brad and I noted on Fri that IB
It seems that the error crept into the help file.
Index: ompi/mca/btl/openib/help-mpi-btl-openib.txt
===
--- ompi/mca/btl/openib/help-mpi-btl-openib.txt (revision 19054)
+++ ompi/mca/btl/openib/help-mpi-btl-openib.txt (working copy)
@@
I failed to run on different nodes or on the same node via self,openib
On 7/28/08, Ralph Castain wrote:
>
> I checked this out some more and I believe it is ticket #1378 related. We
> lock up if SM is included in the BTL's, which is what I had done on my test.
> If I ^sm, I can run fine.
>
> On
On Mon, Jul 28, 2008 at 05:14:29PM +0300, Lenny Verkhovsky wrote:
> -advisable to configure rd_win smaller then (rd_num - rd_low), but currently
> +advisable to configure rd_win bigger then (rd_num - rd_low), but currently
^ a   (i.e. "then" should read "than")
--
Cluster and Metacomputi
WHAT: Rename MCA DSO filenames from "mca_<framework>_<component>.so"
to "libmca_<framework>_<component>.so" (backwards compatibility can be
preserved if we want it; see below)
WHY: Allows simplifying component Makefile.am's
WHEN: No real rush; just wanted to get the idea out there (does *not*
need to be before v1.3; more explanation
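To make the proposal concrete with one hypothetical example, the openib BTL DSO would go from

mca_btl_openib.so   ->   libmca_btl_openib.so

(and likewise for every other framework/component pair).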
I think Lenny is pointing out that "smaller" got changed to "bigger",
too. :-)
Looking at the test in the code (btl_openib_component.c):
if ((rd_num - rd_low) > rd_win) {
orte_show_help("help-mpi-btl-openib.txt", "non optimal rd_win",
               tru
On Jul 28, 2008, at 8:22 AM, Jeff Squyres wrote:
I think Lenny is pointing out that "smaller" got changed to
"bigger", too. :-)
Looking at the test in the code (btl_openib_component.c):
if ((rd_num - rd_low) > rd_win) {
orte_show_help("help-mpi-btl-openib.txt", "n
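Plugging in the numbers from Ralph's earlier mail (presumably rd_win = 32 and rd_num - rd_low = 192; the individual rd_num/rd_low values below are made up to match that difference):

/* illustration only */
int rd_num = 256, rd_low = 64, rd_win = 32;
if ((rd_num - rd_low) > rd_win) {
    /* 192 > 32, so the warning fires even though rd_win (32) is already
     * "smaller than" rd_num - rd_low (192) -- which is why the old help
     * text was confusing. */
}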
My experience is the same as Lenny's. I've tested on x86_64 and ppc64
systems and tests using --mca btl openib,self hang in all cases.
--brad
2008/7/28 Lenny Verkhovsky
> I failed to run on different nodes or on the same node via self,openib
>
>
>
> On 7/28/08, Ralph Castain wrote:
>>
>> I c
Interesting - you are quite correct and I should have been more
precise. I ran with -mca btl openib and it worked. So having just
openib seems to be okay.
On Jul 28, 2008, at 8:37 AM, Brad Benton wrote:
My experience is the same as Lenny's. I've tested on x86_64 and
ppc64 systems and tes
FWIW, all my MTT runs are hanging as well.
On Jul 28, 2008, at 10:37 AM, Brad Benton wrote:
My experience is the same as Lenny's. I've tested on x86_64 and
ppc64 systems and tests using --mca btl openib,self hang in all
cases.
--brad
2008/7/28 Lenny Verkhovsky
I failed to run on diffe
only openib works for me too,
but Glebs said to me once that it's illegal and I always need to use self
btl.
On 7/28/08, Jeff Squyres wrote:
>
> FWIW, all my MTT runs are hanging as well.
>
>
> On Jul 28, 2008, at 10:37 AM, Brad Benton wrote:
>
> My experience is the same as Lenny's. I've teste
On Jul 28, 2008, at 8:52 AM, Lenny Verkhovsky wrote:
only openib works for me too,
but Glebs said to me once that it's illegal and I always need to use
self btl.
Don't know - could be true. But if that is true, then we should check
to see if that condition is met and error out - with a
Looking into it a bit more, the situation is a little convoluted.
I've filed https://svn.open-mpi.org/trac/ompi/ticket/1419; followups
will occur there.
On Jul 28, 2008, at 9:42 AM, Jeff Squyres wrote:
Blast. Looks like a problem with the new ROMIO I brought in last
week.
I'll fix sho
Just an FYI for those of you working with slot_lists.
Lenny, Jeff and I have changed the mca param associated with how you
specify the slot list you want the rank_file mapper to use. This was
done to avoid the possibility of ORTE processes such as mpirun and
orted accidentally binding thems
Per an earlier telecon, I have modified the hostfile behavior slightly
to allow hostfiles to subdivide allocations.
Briefly: given an allocation, we allow users to specify --hostfile on
a per-app_context basis. In this mode, the hostfile info is used to
filter the nodes that will be used fo
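A hypothetical illustration (node and file names made up): given an allocation of node01-node04, something like

mpirun --hostfile hosts_a -np 2 ./app_a : --hostfile hosts_b -np 2 ./app_b

where hosts_a lists node01/node02 and hosts_b lists node03/node04, would restrict each app_context to the nodes named in its own hostfile.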
I'm a little bit lost here. You're stating that openib,self doesn't
work while openib does? In other words that adding self to the BTL
leads to deadlocks?
george.
PS: Btw, it is not supposed to work at all, except in the case where
openib handles internal messages (where the source and de
I just re-tested to confirm, and that is correct.
-mca btl openib          works
-mca btl openib,self     hangs
-mca btl openib,sm       works
On Jul 28, 2008, at 9:49 AM, George Bosilca wrote:
I'm a little bit lost here. You're stating that openib,self doesn't
w
Interesting. The self is only used for local communications. I don't
expect that any benchmark executes such communications, but apparently
I was wrong. Please let me know the failing test, I will take a look
this evening.
Thanks,
george.
On Jul 28, 2008, at 5:56 PM, Ralph Castain wro
On Jul 28, 2008, at 12:03 PM, George Bosilca wrote:
Interesting. The self is only used for local communications. I don't
expect that any benchmark executes such communications, but
apparently I was wrong. Please let me know the failing test, I will
take a look this evening.
FWIW, my manual
My test wasn't a benchmark - I was just testing with a little program
that calls mpi_init, mpi_barrier, and mpi_finalize.
A test with just mpi_init/finalize works fine, so it looks like we
simply hang when trying to communicate. This also only happens on
multi-node operations.
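For what it's worth, a minimal reproducer along those lines (my own sketch, not Ralph's actual test program) would be:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);  /* hangs here once processes actually
                                     have to communicate across nodes */
    MPI_Finalize();
    return 0;
}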
On Jul 28,
On Jul 28, 2008, at 11:05 AM, Ralph Castain wrote:
only openib works for me too,
but Glebs said to me once that it's illegal and I always need to
use self btl.
Don't know - could be true. But if that is true, then we should
check to see if that condition is met and error out - with an
My only concern is how this will interact with PLPA.
Say two Open MPI jobs each use "half" the cores (slots) on a
particular node... how would they be able to bind themselves to
a disjoint set of cores? I'm not asking you to solve this Ralph, I'm
just pointing it out so we can maybe warn users th
Jeff Squyres wrote:
On Jul 28, 2008, at 12:03 PM, George Bosilca wrote:
Interesting. The self is only used for local communications. I don't
expect that any benchmark executes such communications, but apparently
I was wrong. Please let me know the failing test, I will take a look
this evening.
Actually, this is true today regardless of this change. If two
separate mpirun invocations share a node and attempt to use paffinity,
they will conflict with each other. The problem isn't caused by the
hostfile sub-allocation. The problem is that the two mpiruns have no
knowledge of each ot
I think I fixed the parallel debugger attach stuff in an hg -- can
interested parties test it out at their own sites before I bring it
back to the SVN trunk? It should be working for both Allinea DDT and
TotalView.
HG:
http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/debugger-stuff/
Ti
On Mon, Jul 28, 2008 at 12:08 PM, Terry Dontje wrote:
> Jeff Squyres wrote:
>
>> On Jul 28, 2008, at 12:03 PM, George Bosilca wrote:
>>
>> Interesting. The self is only used for local communications. I don't
>>> expect that any benchmark execute such communications, but apparently I was
>>> wron
Since the trunk has now been bumped to MCA v2.0, and all frameworks
have also been bumped to v2.0, are these two #defines relevant anymore:
MCA_BTL_BASE_VERSION_1_0_1
MCA_BTL_BASE_VERSION_1_0_0
I know there was at least one BTL being developed at an organization
that may not have kept up wit
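For an out-of-tree component that has to straddle both interface versions, one (hypothetical) way to cope would be a compatibility guard -- note the 2.0 macro name here is my assumption based on the version bump described above:

#if defined(MCA_BTL_BASE_VERSION_2_0_0)
#  define MY_BTL_BASE_VERSION MCA_BTL_BASE_VERSION_2_0_0
#elif defined(MCA_BTL_BASE_VERSION_1_0_1)
#  define MY_BTL_BASE_VERSION MCA_BTL_BASE_VERSION_1_0_1
#else
#  define MY_BTL_BASE_VERSION MCA_BTL_BASE_VERSION_1_0_0
#endif

Whether keeping the old defines around is worth it is exactly the question being asked here.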