Re: [OMPI users] Antw: Re: mpirun not working on more than one node
Thanks, that's it! It would have been straightforward, but there are a lot of things to consider when setting up a cluster for the first time, and a lot to keep track of. Anyway, thanks for your help.

>>> Ralph Castain 18.11.2009 15:57 >>>
Bingo! This is why we ask for info on how you configure OMPI :-)

You need to rebuild OMPI with --enable-heterogeneous. Because there is additional overhead associated with running hetero configurations, and so few people do so, it is disabled by default.
Re: [OMPI users] Antw: Re: mpirun not working on more than one node
Bingo! This is why we ask for info on how you configure OMPI :-)

You need to rebuild OMPI with --enable-heterogeneous. Because there is additional overhead associated with running hetero configurations, and so few people do so, it is disabled by default.

On Nov 18, 2009, at 2:55 AM, Laurin Müller wrote:
> [...]
> BUT when I start it on node1 with more than 16 processes and the
> hostfile, I get these errors:
> [...]
> This installation of Open MPI was configured without support for
> heterogeneous architectures, but at least one node in the allocation
> was detected to have a different architecture. The detected node was:
>
> Node: bioclust
>
> In order to operate in a heterogeneous environment, please reconfigure
> Open MPI with --enable-heterogeneous.
> [...]
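For anyone following along, the rebuild described above might look roughly like the sketch below. The --enable-heterogeneous flag comes from the error message itself; the function names, the source directory, and the installation prefix are illustrative assumptions, not part of the original thread. The check relies on the "Heterogeneous support" line that ompi_info prints.

```shell
#!/bin/sh
# Sketch only: the function names and both arguments of
# rebuild_with_hetero are hypothetical, chosen for illustration.

# Succeeds if the ompi_info output read from stdin reports that this
# Open MPI installation was built with heterogeneous support.
has_hetero_support() {
    grep -qi 'heterogeneous support: *yes'
}

# Reconfigure and rebuild Open MPI from an unpacked source tree.
#   $1 = source directory (assumption: wherever the tarball was unpacked)
#   $2 = installation prefix (assumption: same prefix on every node)
rebuild_with_hetero() {
    (
        cd "$1" || exit 1
        ./configure --prefix="$2" --enable-heterogeneous &&
            make all install
    )
}
```

One possible use, to be repeated on every node in the hostfile: `ompi_info | has_hetero_support || rebuild_with_hetero "$OMPI_SRC" "$OMPI_PREFIX"`, where both variables are placeholders for your own paths.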
[OMPI users] Antw: Re: mpirun not working on more than one node
Now I have the same Open MPI versions: 1.3.2.

Recalculated on both nodes, and it works again on each node separately:

node1:
cluster@bioclust:/mnt/projects/PS3Cluster/Benchmark$ mpirun --version
mpirun (Open MPI) 1.3.2
cluster@bioclust:/mnt/projects/PS3Cluster/Benchmark$ mpirun --hostfile /etc/openmpi/openmpi-default-hostfile -np 4 /mnt/projects/PS3Cluster/Benchmark/pi
Input number of intervals:
20
1: pi = 0.798498008827023
2: pi = 0.773339953424083
3: pi = 0.747089984650041
0: pi = 0.822248040052981
pi = 3.141175986954128

node2 (PS3):
root@kasimir:/mnt/projects/PS3Cluster/Benchmark# mpirun --version
mpirun (Open MPI) 1.3.2
[...]
root@kasimir:/mnt/projects/PS3Cluster/Benchmark# mpirun -np 2 pi
Input number of intervals:
20
0: pi = 1.595587993477064
1: pi = 1.545587993477064
pi = 3.141175986954128

BUT when I start it on node1 with more than 16 processes and the hostfile, I get these errors:

cluster@bioclust:/mnt/projects/PS3Cluster/Benchmark$ mpirun --hostfile /etc/openmpi/openmpi-default-hostfile -np 17 /mnt/projects/PS3Cluster/Benchmark/pi
--
This installation of Open MPI was configured without support for
heterogeneous architectures, but at least one node in the allocation
was detected to have a different architecture. The detected node was:

Node: bioclust

In order to operate in a heterogeneous environment, please reconfigure
Open MPI with --enable-heterogeneous.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_proc_set_arch failed
  --> Returned "Not supported" (-8) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1239] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1240] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1241] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1242] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1244] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1245] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1246] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1247] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1248] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1250] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[bioclust:1251] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialize
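Since the error above points at differing architectures, a quick way to see whether the nodes in a hostfile are a mixed set is to compare `uname -m` across them. The sketch below assumes passwordless ssh to each host; the function names are made up for illustration, and `uname -m` is only a rough proxy for whatever Open MPI actually compares internally.

```shell
#!/bin/sh
# Sketch only: list_archs and count_archs are hypothetical helper names.

# Print one "host arch" line per host given as an argument.
# Assumes ssh to each host works without a password prompt.
list_archs() {
    for host in "$@"; do
        printf '%s %s\n' "$host" "$(ssh -n "$host" uname -m)"
    done
}

# Read "host arch" lines from stdin and print the number of distinct
# architectures; a result above 1 suggests a heterogeneous set of nodes.
count_archs() {
    awk '{ print $2 }' | sort -u | wc -l | tr -d ' '
}
```

For the two machines in this thread that could be run as `list_archs bioclust kasimir | count_archs`.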