Hmmm... something isn't right, Pasha. There is simply no way you should be encountering this error. You are picking up the wrong grpcomm module.
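A quick way to confirm which grpcomm component you are actually getting (the paths here assume you are running from your install prefix, and that the usual <framework>_base_verbose parameter is hooked up for grpcomm):

    # List the grpcomm components ompi_info can see
    ./bin/ompi_info | grep grpcomm

    # Ask the selection logic to report what it picks at runtime
    ./bin/mpirun -np 2 -H sw214,sw214 -mca grpcomm_base_verbose 10 \
        -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency

If the component built as a DSO (i.e., you didn't configure with --disable-dlopen), you should also see something like mca_grpcomm_bad.so under lib/openmpi in the install tree. Forcing it explicitly with "-mca grpcomm bad" is another way to see whether the bad component is there at all.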
I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
default for that reason. Check to ensure you have the orte/mca/grpcomm/bad
directory, and that it is getting built. My guess is that you have a
corrupted checkout or build and that the component is either missing or not
getting built.

On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

> Ralph H Castain wrote:
>> I can't find anything wrong so far. I'm waiting in a queue on Odin to try
>> there since Jeff indicated you are using rsh as a launcher, and that's
>> the only access I have to such an environment. Guess Odin is being
>> pounded, because the queue isn't going anywhere.
>>
> I use ssh. Here is the command line:
>
> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self \
>     ./osu_benchmarks-3.0/osu_latency
>
>> Meantime, I'm building on RoadRunner and will test there (TM
>> environment).
>>
>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
>> wrote:
>>
>>>> You'll have to tell us something more than that, Pasha. What kind of
>>>> environment, what rev level were you at, etc.
>>>>
>>> Ahh, sorry :) I run on Linux x86_64 SLES 10 SP1, Open MPI 1.3a1r18682M,
>>> OFED 1.3.1.
>>> Pasha
>>>
>>>> So far as I know, the trunk is fine.
>>>>
>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
>>>> wrote:
>>>>
>>>>> I tried to run the trunk on my machines and got the following error:
>>>>>
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file grpcomm_basic_module.c at line 560
>>>>> [sw214:04365]
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>> orte_grpcomm_modex failed
>>>>> --> Returned "Data unpack would read past end of buffer" (-26) instead
>>>>> of "Success" (0)