On 6/19/08 3:31 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Yo Ralph --
> 
> Is the "bad" grpcomm component both new and the default?  Further, is
> the old "basic" grpcomm component now the non-default / testing
> component?

Yes to both

> 
> If so, I wonder if what happened was that Pasha did an "svn up", but
> without re-running autogen/configure, he wouldn't have seen the new
> "bad" component and therefore was falling back on the old "basic"
> component that is now the non-default / testing component...?
> 

Could be - though I thought that doing a "make" in that situation would
force a re-autogen/configure when it saw a new component?

Of course, if he didn't do a "make" at the top level, and he is in a dynamic
build, then maybe OMPI wouldn't figure out that something was different...

Don't know - but we have had problems with svn missing things in the past
too, so it could be a number of things.

<shrug>
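
For the record, the safest thing after an "svn up" that pulls in a new
component is to redo the whole sequence by hand - a minimal sketch,
assuming you are at the top of the source tree (the --prefix is just an
example):

  ./autogen.sh
  ./configure --prefix=$HOME/ompi-trunk-install
  make -j 4 all install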

> 
> On Jun 19, 2008, at 4:21 PM, Pavel Shamis (Pasha) wrote:
> 
>> I did a fresh checkout and everything works well.
>> So it looks like some "svn up" screwed up my checkout.
>> Ralph, thanks for the help!
>> 
>> Ralph H Castain wrote:
>>> Hmmm...something isn't right, Pasha. There is simply no way you should
>>> be encountering this error. You are picking up the wrong grpcomm module.
>>> 
>>> I went ahead and fixed the grpcomm/basic module, but as I note in the
>>> commit message, that is now an experimental area. The grpcomm/bad
>>> module is the default for that reason.
>>> 
>>> Check to ensure you have the orte/mca/grpcomm/bad directory, and that
>>> it is getting built. My guess is that you have a corrupted checkout or
>>> build and that the component is either missing or not getting built.
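
A quick way to check this - a sketch, assuming you run it from the top of
the source tree with the matching install on your PATH:

  ls orte/mca/grpcomm/bad    # is the component source present?
  ompi_info | grep grpcomm   # does the built library know about it?

ompi_info should list a "bad" grpcomm component; if it only shows
"basic", the new component never made it into the build.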
>>> 
>>> 
>>> On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)"
>>> <pa...@dev.mellanox.co.il> wrote:
>>> 
>>> 
>>>> Ralph H Castain wrote:
>>>> 
>>>>> I can't find anything wrong so far. I'm waiting in a queue on Odin
>>>>> to try there since Jeff indicated you are using rsh as a launcher,
>>>>> and that's the only access I have to such an environment. Guess Odin
>>>>> is being pounded because the queue isn't going anywhere.
>>>>> 
>>>> I use ssh; here is the command line:
>>>> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
>>>> ./osu_benchmarks-3.0/osu_latency
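
(One way to rule component selection in or out is to pin the grpcomm
framework on the command line - a sketch reusing the command above:

  ./bin/mpirun -np 2 -H sw214,sw214 -mca grpcomm bad \
      -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency

If that runs while the default selection fails, the wrong module is being
picked up.)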
>>>> 
>>>>> Meantime, I'm building on RoadRunner and will test there (TM
>>>>> environment).
>>>>> 
>>>>> 
>>>>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)"
>>>>> <pa...@dev.mellanox.co.il> wrote:
>>>>> 
>>>>> 
>>>>>>> You'll have to tell us something more than that, Pasha. What kind
>>>>>>> of environment, what rev level were you at, etc.
>>>>>>> 
>>>>>> Ahh, sorry :) I run on Linux x86_64, SLES 10 SP1, Open MPI
>>>>>> 1.3a1r18682M, OFED 1.3.1.
>>>>>> Pasha.
>>>>>> 
>>>>>>> So far as I know, the trunk is fine.
>>>>>>> 
>>>>>>> 
>>>>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)"
>>>>>>> <pa...@dev.mellanox.co.il> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> I tried to run the trunk on my machines and I got the following error:
>>>>>>>> 
>>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read
>>>>>>>> past end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read
>>>>>>>> past end of buffer in file grpcomm_basic_module.c at line 560
>>>>>>>> [sw214:04365]
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> It looks like MPI_INIT failed for some reason; your parallel
>>>>>>>> process is likely to abort.  There are many reasons that a parallel
>>>>>>>> process can fail during MPI_INIT; some of which are due to
>>>>>>>> configuration or environment problems.  This failure appears to be
>>>>>>>> an internal failure; here's some additional information (which may
>>>>>>>> only be relevant to an Open MPI developer):
>>>>>>>> 
>>>>>>>>   orte_grpcomm_modex failed
>>>>>>>>   --> Returned "Data unpack would read past end of buffer" (-26)
>>>>>>>>       instead of "Success" (0)