[OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end
Hi,

I am testing the 2.x release candidate for the first time. I get a segmentation violation in MPI_File_write_all_end(MPI_File fh, const void *buf, MPI_Status *status). The "special" thing may be that in the faulty test cases there are processes that haven't written anything, so they have a zero-length buffer and the second parameter (buf) passed is a null pointer. Until now this was a valid call; has it changed?

Thanks,
Eric

FWIW: we have been running our test suite (~2000 nightly tests) successfully against openmpi-1.{6,8,10}.* and MPICH for many years...
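A minimal sketch of the call pattern described above (the file name and datatype are hypothetical; the point is that a rank with nothing to write still participates in the split collective, passing buf = NULL and count = 0):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File   fh;
        MPI_Status status;
        int        rank, value = 42;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* "testfile" is a placeholder name */
        MPI_File_open(MPI_COMM_WORLD, "testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        if (rank == 0) {
            /* rank 0 actually writes one int */
            MPI_File_write_all_begin(fh, &value, 1, MPI_INT);
            MPI_File_write_all_end(fh, &value, &status);
        } else {
            /* the other ranks have nothing to write: zero count and a
               NULL buffer -- accepted by the 1.x series, crashes here */
            MPI_File_write_all_begin(fh, NULL, 0, MPI_INT);
            MPI_File_write_all_end(fh, NULL, &status);
        }

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }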
Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end
The default MPI I/O library has changed in the 2.x release to OMPIO for most file systems. I can look into that problem; any chance to get access to the test suite that you mentioned?

Thanks
Edgar
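(One way to check whether the switch to OMPIO is implicated, assuming the 2.x build also ships the older ROMIO component -- named romio314 in the 2.0 series -- is to force the previous I/O library at run time:

    mpirun --mca io romio314 -np 2 ./write_all_end_test

where ./write_all_end_test is a hypothetical reproducer binary. If the crash disappears under ROMIO, the regression is specific to OMPIO.)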
Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end
Hi,

On 08/07/16 12:52 PM, Edgar Gabriel wrote:
> The default MPI I/O library has changed in the 2.x release to OMPIO for most file systems.

OK, I am now doing I/O on my own hard drive... but I can test over NFS easily. For Lustre, I will have to produce a reduced example out of our test suite...

> I can look into that problem; any chance to get access to the test suite that you mentioned?

Yikes! Sounds interesting, but difficult to realize... Our in-house code is not public... :/

However, I proposed (to myself) to add a nightly compilation of openmpi (see http://www.open-mpi.org/community/lists/users/2016/06/29515.php) so I can report problems before releases are made...

Anyway, I will work on a little script to automate the MPI+PETSc+InHouseCode combination, so I can give you feedback as soon as you propose a patch to test... Hoping this will be convenient enough for you...

Thanks!
Eric
Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end
OK, but just so I can construct a test case: basically, what you are doing is

    MPI_File_write_all_begin(fh, NULL, 0, some_datatype);
    MPI_File_write_all_end(fh, NULL, &status);

Is this correct?

Thanks
Edgar
Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end
On 08/07/16 01:44 PM, Edgar Gabriel wrote:
> OK, but just so I can construct a test case: basically, what you are doing is MPI_File_write_all_begin(fh, NULL, 0, some_datatype); MPI_File_write_all_end(fh, NULL, &status); -- is this correct?

Yes, but with 2 processes: rank 0 writes something, but rank 1 does not...

Other info: rank 0 didn't wait for rank 1 after MPI_File_write_all_end, so it continued on to the next MPI_File_write_all_begin with a different datatype, but on the same file...

Thanks!
Eric
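Putting both details together, the sketch after the first message could be extended, before MPI_File_close, with a second split collective that rank 0 reaches immediately and that uses a different datatype on the same file (a hypothetical fragment of that program, not standalone code):

    double dvals[4] = {1.0, 2.0, 3.0, 4.0};

    if (rank == 0) {
        /* rank 0 does not wait for rank 1; it goes straight on to the
           next split collective, this time with a different datatype */
        MPI_File_write_all_begin(fh, dvals, 4, MPI_DOUBLE);
        MPI_File_write_all_end(fh, dvals, &status);
    } else {
        /* rank 1 again participates with nothing to write */
        MPI_File_write_all_begin(fh, NULL, 0, MPI_DOUBLE);
        MPI_File_write_all_end(fh, NULL, &status);
    }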
Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end
I think I found the problem. I have filed a PR against master, and if that passes I will file a PR for the 2.x branch.

Thanks!
Edgar