Re: [OMPI users] ompio on Lustre
"Gabriel, Edgar" writes:

> a) if we detect a Lustre file system without flock support, we can
> print out an error message.  Completely disabling MPI I/O is not
> possible in the ompio architecture at the moment: the Lustre component
> can disqualify itself, but the generic Unix FS component would kick in
> in that case and execution would still continue.  To be more precise,
> the query function of the Lustre component has no way to return
> anything other than "I am interested to run" or "I am not interested
> to run".
>
> b) I can add an MCA parameter that would allow the Lustre component to
> abort execution of the job entirely.  While this parameter would
> probably be set to 'false' by default, a system administrator could
> configure it to 'true' on a particular platform.

Assuming the operations which didn't fail for me are actually OK with
noflock (and maybe they're not in other circumstances), can't you just
do the same as ROMIO and fail with an explanation on just the ones that
will fail without flock?  That seems best from a user's point of view if
there's an advantage to using OMPIO rather than ROMIO.

I guess it might be clearer which operations are problematic if I
understood what in fs/lustre requires flock mounts and what the full
semantics of the option are, which seem to be more than is documented.
Thanks for looking into it, anyhow.

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
Re: [OMPI users] ompio on Lustre
"Latham, Robert J." writes:

> it's hard to implement fcntl-lock-free versions of Atomic mode and
> Shared file pointer, so file systems like PVFS don't support those
> modes (and return an error indicating such at open time).

Ah.  For some reason I thought PVFS had the support to pass the tests
somehow, but it's been quite a while since I used it.

> You can run lock-free for noncontiguous writes, though at a significant
> performance cost.  In ROMIO we can disable data sieving for writes by
> setting the hint "romio_ds_write" to "disable", which will fall back to
> piece-wise operations.  Could be OK if you know your noncontiguous
> accesses are only a little bit noncontiguous.

Does that mean it could actually support more operations (without
failing due to missing flock)?

Of course, I realize one should just use flock mounts with Lustre, as I
used to.  I don't remember this stuff being written down explicitly
anywhere, though -- is it somewhere?  Thanks for the info.
Re: [OMPI users] ompio on Lustre
Dave,

Thank you for your detailed report and testing, that is indeed very
helpful.  We will definitely have to do something.  Here is what I think
would be potentially doable.

a) if we detect a Lustre file system without flock support, we can
print out an error message.  Completely disabling MPI I/O is not
possible in the ompio architecture at the moment: the Lustre component
can disqualify itself, but the generic Unix FS component would kick in
in that case and execution would still continue.  To be more precise,
the query function of the Lustre component has no way to return
anything other than "I am interested to run" or "I am not interested
to run".

b) I can add an MCA parameter that would allow the Lustre component to
abort execution of the job entirely.  While this parameter would
probably be set to 'false' by default, a system administrator could
configure it to 'true' on a particular platform.

I will also discuss this with a couple of other people in the next
couple of days.

Thanks
Edgar

> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Dave Love
> Sent: Monday, October 15, 2018 4:22 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] ompio on Lustre
>
> For what it's worth, I found the following from running ROMIO's tests
> with OMPIO on Lustre mounted without flock (or localflock).  I used 48
> processes on two nodes with Lustre for tests which don't require a
> specific number.
>
> OMPIO fails tests atomicity, misc, and error on ext4; it additionally
> fails noncontig_coll2, fp, shared_fp, and ordered_fp on Lustre/noflock.
>
> On Lustre/noflock, ROMIO fails on atomicity, i_noncontig, noncontig,
> shared_fp, ordered_fp, and error.
>
> Please can OMPIO be changed to fail in the same way as ROMIO (with a
> clear message) for the operations it can't support without flock?
> Otherwise it looks as if you can potentially get invalid data, or at
> least waste time debugging other errors.
> I'd debug the common failure on the "error" test, but ptrace is
> disabled on the system.
>
> In case anyone else is in the same boat and can't get mounts changed,
> I suggested staging data to and from a PVFS2^WOrangeFS ephemeral
> filesystem on jobs' TMPDIR local mounts if they will fit.  Of course
> other libraries will potentially corrupt data on nolock mounts.
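[Editor's sketch: if the abort knob Edgar proposes in (b) were added, a
site could enable it in the system-wide MCA parameter file.  The
parameter name below is made up for illustration; the message above
proposes the behaviour but names no knob.]

```conf
# $prefix/etc/openmpi-mca-params.conf
# "fs_lustre_abort_on_noflock" is a hypothetical name for the proposed
# parameter; check ompi_info for the real one once it exists.
fs_lustre_abort_on_noflock = true
```

A user could equally override such a setting per job with mpirun's
--mca option.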
Re: [OMPI users] ompio on Lustre
On Mon, 2018-10-15 at 12:21 +0100, Dave Love wrote:
> For what it's worth, I found the following from running ROMIO's tests
> with OMPIO on Lustre mounted without flock (or localflock).  I used 48
> processes on two nodes with Lustre for tests which don't require a
> specific number.
>
> OMPIO fails tests atomicity, misc, and error on ext4; it additionally
> fails noncontig_coll2, fp, shared_fp, and ordered_fp on
> Lustre/noflock.
>
> On Lustre/noflock, ROMIO fails on atomicity, i_noncontig, noncontig,
> shared_fp, ordered_fp, and error.
>
> Please can OMPIO be changed to fail in the same way as ROMIO (with a
> clear message) for the operations it can't support without flock?
> Otherwise it looks as if you can potentially get invalid data, or at
> least waste time debugging other errors.
>
> I'd debug the common failure on the "error" test, but ptrace is
> disabled on the system.
>
> In case anyone else is in the same boat and can't get mounts changed,
> I suggested staging data to and from a PVFS2^WOrangeFS ephemeral
> filesystem on jobs' TMPDIR local mounts if they will fit.  Of course
> other libraries will potentially corrupt data on nolock mounts.

ROMIO uses fcntl locks for Atomic mode, Shared file pointer updates,
and to prevent false sharing in the data sieving optimization for
noncontiguous writes.

It's hard to implement fcntl-lock-free versions of Atomic mode and
Shared file pointer, so file systems like PVFS don't support those
modes (and return an error indicating such at open time).

You can run lock-free for noncontiguous writes, though at a significant
performance cost.  In ROMIO we can disable data sieving for writes by
setting the hint "romio_ds_write" to "disable", which will fall back to
piece-wise operations.  Could be OK if you know your noncontiguous
accesses are only a little bit noncontiguous.

Perhaps OMPIO has a similar option, but I am not familiar with its
tuning knobs.
==rob
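[Editor's sketch: the "romio_ds_write" hint Rob mentions can be set
without touching application code through ROMIO's hints-file mechanism,
assuming the MPI library's ROMIO build honours the ROMIO_HINTS
environment variable.  The file name, process count, and application
name below are illustrative, not from this thread.]

```shell
# Write a ROMIO hints file that disables data-sieving writes, and point
# ROMIO_HINTS at it so ROMIO picks it up at MPI_File_open time.
cat > romio.hints <<'EOF'
romio_ds_write disable
EOF
export ROMIO_HINTS=$PWD/romio.hints
# The application then runs unchanged, e.g.:
#   mpirun -np 48 --mca io romio314 ./noncontig_app
grep -c 'romio_ds_write disable' romio.hints   # prints 1
```

The same hint can also be set per file through the MPI_Info object
passed to MPI_File_open, which is preferable when only some files need
it.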
Re: [OMPI users] ompio on Lustre
For what it's worth, I found the following from running ROMIO's tests
with OMPIO on Lustre mounted without flock (or localflock).  I used 48
processes on two nodes with Lustre for tests which don't require a
specific number.

OMPIO fails tests atomicity, misc, and error on ext4; it additionally
fails noncontig_coll2, fp, shared_fp, and ordered_fp on Lustre/noflock.

On Lustre/noflock, ROMIO fails on atomicity, i_noncontig, noncontig,
shared_fp, ordered_fp, and error.

Please can OMPIO be changed to fail in the same way as ROMIO (with a
clear message) for the operations it can't support without flock?
Otherwise it looks as if you can potentially get invalid data, or at
least waste time debugging other errors.

I'd debug the common failure on the "error" test, but ptrace is
disabled on the system.

In case anyone else is in the same boat and can't get mounts changed, I
suggested staging data to and from a PVFS2^WOrangeFS ephemeral
filesystem on jobs' TMPDIR local mounts if they will fit.  Of course
other libraries will potentially corrupt data on nolock mounts.
Re: [OMPI users] ompio on Lustre
Well, good question.  To be fair, the test passes if you run it with a
lower number of processes.  In addition, a couple of years back I had a
discussion about that with one of the HDF5 developers, and it seemed to
be OK to run it this way.

That being said, after thinking about it a bit, I think the fix to
properly support it is relatively easy at this point; I will try to
make it work in the next couple of days (there was a big chunk of code
brought in for another fix last fall, and I think we actually have
everything in place to properly support the atomicity operations).

Edgar

> -----Original Message-----
> From: Dave Love [mailto:dave.l...@manchester.ac.uk]
> Sent: Wednesday, October 10, 2018 3:46 AM
> To: Gabriel, Edgar
> Cc: Open MPI Users
> Subject: Re: [OMPI users] ompio on Lustre
>
> "Gabriel, Edgar" writes:
>
> > Ok, thanks.  I usually run these tests with 4 or 8 processes, but
> > the major item is that atomicity is one of the areas that are not
> > well supported in ompio (along with data representations), so a
> > failure in those tests is not entirely surprising.
>
> If it's not expected to work, could it be made to return a helpful
> error, rather than just not working properly?
Re: [OMPI users] ompio on Lustre
"Gabriel, Edgar" writes:

> Ok, thanks.  I usually run these tests with 4 or 8 processes, but the
> major item is that atomicity is one of the areas that are not well
> supported in ompio (along with data representations), so a failure in
> those tests is not entirely surprising.

If it's not expected to work, could it be made to return a helpful
error, rather than just not working properly?
Re: [OMPI users] ompio on Lustre
Ok, thanks.  I usually run these tests with 4 or 8 processes, but the
major item is that atomicity is one of the areas that are not well
supported in ompio (along with data representations), so a failure in
those tests is not entirely surprising.  Most of the work to support
atomicity properly is actually in place, but we didn't have the
manpower (and, to be honest, the requests) to finish that work.

Thanks
Edgar

> -----Original Message-----
> From: Dave Love [mailto:dave.l...@manchester.ac.uk]
> Sent: Tuesday, October 9, 2018 7:05 AM
> To: Gabriel, Edgar
> Cc: Open MPI Users
> Subject: Re: [OMPI users] ompio on Lustre
>
> "Gabriel, Edgar" writes:
>
> > Hm, thanks for the report, I will look into this.  I did not run the
> > romio tests, but the hdf5 tests are run regularly, and with 3.1.2
> > you should not have any problems on a regular unix fs.  How many
> > processes did you use, and which tests did you run specifically?
> > The main tests that I execute from their parallel testsuite are
> > testphdf5 and t_shapesame.
>
> Using OMPI 3.1.2, in the hdf5 testpar directory I ran this as a
> 24-core SMP job (so 24 processes), where $TMPDIR is on ext4:
>
>   export HDF5_PARAPREFIX=$TMPDIR
>   make check RUNPARALLEL='mpirun'
>
> It stopped after testphdf5 spewed "Atomicity Test Failed" errors.
Re: [OMPI users] ompio on Lustre
"Gabriel, Edgar" writes:

> Hm, thanks for the report, I will look into this.  I did not run the
> romio tests, but the hdf5 tests are run regularly, and with 3.1.2 you
> should not have any problems on a regular unix fs.  How many processes
> did you use, and which tests did you run specifically?  The main tests
> that I execute from their parallel testsuite are testphdf5 and
> t_shapesame.

Using OMPI 3.1.2, in the hdf5 testpar directory I ran this as a 24-core
SMP job (so 24 processes), where $TMPDIR is on ext4:

  export HDF5_PARAPREFIX=$TMPDIR
  make check RUNPARALLEL='mpirun'

It stopped after testphdf5 spewed "Atomicity Test Failed" errors.
Re: [OMPI users] ompio on Lustre
Hm, thanks for the report, I will look into this.  I did not run the
romio tests, but the hdf5 tests are run regularly, and with 3.1.2 you
should not have any problems on a regular unix fs.  How many processes
did you use, and which tests did you run specifically?  The main tests
that I execute from their parallel testsuite are testphdf5 and
t_shapesame.  I will also look into the testmpio that you mentioned in
the next couple of days.

Thanks
Edgar

> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Dave Love
> Sent: Monday, October 8, 2018 10:20 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] ompio on Lustre
>
> I said I'd report back about trying ompio on lustre mounted without
> flock.
>
> I couldn't immediately figure out how to run MTT.  I tried the
> parallel hdf5 tests from hdf5 1.10.3, but I got errors with that even
> with the relevant environment variable to put the files on (local)
> /tmp.  Then it occurred to me rather late that romio would have tests.
> Using the "runtests" script modified to use "--mca io ompio" in the
> romio/test directory from ompi 3.1.2 on no-flock-mounted Lustre, after
> building the tests with an installed ompi-3.1.2, it did this and
> apparently hung at the end:
>
> Testing simple.c
>    No Errors
> Testing async.c
>    No Errors
> Testing async-multiple.c
>    No Errors
> Testing atomicity.c
> Process 3: readbuf[118] is 0, should be 10
> Process 2: readbuf[65] is 0, should be 10
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> Process 1: readbuf[145] is 0, should be 10
> Testing coll_test.c
>    No Errors
> Testing excl.c
> error opening file test
> error opening file test
> error opening file test
>
> Then I ran on local /tmp as a sanity check and still got errors:
>
> Testing I/O functions
> Testing simple.c
>    No Errors
> Testing async.c
>    No Errors
> Testing async-multiple.c
>    No Errors
> Testing atomicity.c
> Process 2: readbuf[155] is 0, should be 10
> Process 1: readbuf[128] is 0, should be 10
> Process 3: readbuf[128] is 0, should be 10
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> Testing coll_test.c
>    No Errors
> Testing excl.c
>    No Errors
> Testing file_info.c
>    No Errors
> Testing i_noncontig.c
>    No Errors
> Testing noncontig.c
>    No Errors
> Testing noncontig_coll.c
>    No Errors
> Testing noncontig_coll2.c
>    No Errors
> Testing aggregation1
>    No Errors
> Testing aggregation2
>    No Errors
> Testing hindexed
>    No Errors
> Testing misc.c
> file pointer posn = 265, should be 10
> byte offset = 3020, should be 1080
> file pointer posn = 265, should be 10
> byte offset = 3020, should be 1080
> file pointer posn = 265, should be 10
> byte offset = 3020, should be 1080
> file pointer posn in bytes = 3280, should be 1000
> file pointer posn = 265, should be 10
> byte offset = 3020, should be 1080
> file pointer posn in bytes = 3280, should be 1000
> file pointer posn in bytes = 3280, should be 1000
> file pointer posn in bytes = 3280, should be 1000
> Found 12 errors
> Testing shared_fp.c
>    No Errors
> Testing ordered_fp.c
>    No Errors
> Testing split_coll.c
>    No Errors
> Testing psimple.c
>    No Errors
> Testing error.c
> File set view did not return an error
>    Found 1 errors
> Testing status.c
>    No Errors
> Testing types_with_zeros
>    No Errors
> Testing darray_read
Re: [OMPI users] ompio on Lustre
I said I'd report back about trying ompio on lustre mounted without
flock.

I couldn't immediately figure out how to run MTT.  I tried the parallel
hdf5 tests from hdf5 1.10.3, but I got errors with that even with the
relevant environment variable to put the files on (local) /tmp.  Then
it occurred to me rather late that romio would have tests.  Using the
"runtests" script modified to use "--mca io ompio" in the romio/test
directory from ompi 3.1.2 on no-flock-mounted Lustre, after building
the tests with an installed ompi-3.1.2, it did this and apparently hung
at the end:

Testing simple.c
   No Errors
Testing async.c
   No Errors
Testing async-multiple.c
   No Errors
Testing atomicity.c
Process 3: readbuf[118] is 0, should be 10
Process 2: readbuf[65] is 0, should be 10
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Process 1: readbuf[145] is 0, should be 10
Testing coll_test.c
   No Errors
Testing excl.c
error opening file test
error opening file test
error opening file test

Then I ran on local /tmp as a sanity check and still got errors:

Testing I/O functions
Testing simple.c
   No Errors
Testing async.c
   No Errors
Testing async-multiple.c
   No Errors
Testing atomicity.c
Process 2: readbuf[155] is 0, should be 10
Process 1: readbuf[128] is 0, should be 10
Process 3: readbuf[128] is 0, should be 10
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Testing coll_test.c
   No Errors
Testing excl.c
   No Errors
Testing file_info.c
   No Errors
Testing i_noncontig.c
   No Errors
Testing noncontig.c
   No Errors
Testing noncontig_coll.c
   No Errors
Testing noncontig_coll2.c
   No Errors
Testing aggregation1
   No Errors
Testing aggregation2
   No Errors
Testing hindexed
   No Errors
Testing misc.c
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn in bytes = 3280, should be 1000
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn in bytes = 3280, should be 1000
file pointer posn in bytes = 3280, should be 1000
file pointer posn in bytes = 3280, should be 1000
Found 12 errors
Testing shared_fp.c
   No Errors
Testing ordered_fp.c
   No Errors
Testing split_coll.c
   No Errors
Testing psimple.c
   No Errors
Testing error.c
File set view did not return an error
   Found 1 errors
Testing status.c
   No Errors
Testing types_with_zeros
   No Errors
Testing darray_read
   No Errors

I even got an error with romio on /tmp (modifying the script to use
mpirun --mca io romio314):

Testing error.c
Unexpected error message MPI_ERR_ARG: invalid argument of some other kind
Found 1 errors
Re: [OMPI users] ompio on Lustre
"Gabriel, Edgar" writes:

> It was originally for performance reasons, but this should be fixed at
> this point.  I am not aware of correctness problems.
>
> However, let me try to clarify your question: what precisely do you
> mean by "MPI I/O on Lustre mounts without flock"?  Was the Lustre
> filesystem mounted without flock?

No, it wasn't (and romio complains).

> If yes, that could lead to some problems; we had that on our Lustre
> installation for a while, and problems were occurring even without MPI
> I/O in that case (although I do not recall all the details, just that
> we had to change the mount options).

Yes, without at least localflock you might expect problems with things
like bdb and sqlite, but I couldn't see any file locking calls in the
Lustre component.  If it is a problem, shouldn't the component fail
without it, like romio does?  I have suggested ephemeral PVFS^WOrangeFS,
but I doubt that will be thought useful.

> Maybe just take a testsuite (either ours or HDF5), make sure to run it
> in a multi-node configuration and see whether it works correctly.

For some reason I didn't think MTT, if that's what you mean, was
available, but I see it is; I'll see if I can drive it when I have a
chance.  Tests from HDF5 might be easiest, thanks for the suggestion.
I'd tried with ANL's "testmpio", which was the only thing I found
immediately, but it threw up errors even on a local filesystem, at
which stage I thought it was best to ask...

I'll report back if I get useful results.
Re: [OMPI users] ompio on Lustre
It was originally for performance reasons, but this should be fixed at
this point.  I am not aware of correctness problems.

However, let me try to clarify your question: what precisely do you
mean by "MPI I/O on Lustre mounts without flock"?  Was the Lustre
filesystem mounted without flock?  If yes, that could lead to some
problems; we had that on our Lustre installation for a while, and
problems were occurring even without MPI I/O in that case (although I
do not recall all the details, just that we had to change the mount
options).  Maybe just take a testsuite (either ours or HDF5), make sure
to run it in a multi-node configuration and see whether it works
correctly.

Thanks
Edgar

> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Dave Love
> Sent: Friday, October 5, 2018 5:15 AM
> To: users@lists.open-mpi.org
> Subject: [OMPI users] ompio on Lustre
>
> Is romio preferred over ompio on Lustre for performance or
> correctness?  If it's relevant, the context is MPI-IO on Lustre
> mounts without flock, which ompio doesn't seem to require.
> Thanks.
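[Editor's sketch: whether a given Lustre mount has flock enabled can be
read off /proc/mounts.  The snippet below runs against made-up sample
lines so it is self-contained; on a real system feed it the output of
`cat /proc/mounts` instead.  The device and mount point are
illustrative.]

```shell
# Sample /proc/mounts contents; only the lustre line's options field
# ($4) is inspected.  The anchored regex avoids matching "localflock"
# when looking for "flock".
mounts='10.1.0.2@o2ib:/lfs /mnt/lustre lustre rw,flock,lazystatfs 0 0
/dev/sda1 / ext4 rw,relatime 0 0'
echo "$mounts" | awk '$3 == "lustre" {
  if ($4 ~ /(^|,)flock(,|$)/)           print $2 ": flock"
  else if ($4 ~ /(^|,)localflock(,|$)/) print $2 ": localflock only"
  else                                  print $2 ": no flock"
}'
# prints: /mnt/lustre: flock
```

Note that localflock only provides locking within a single node, which
is not enough for multi-node MPI-IO atomicity.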
[OMPI users] ompio on Lustre
Is romio preferred over ompio on Lustre for performance or correctness?
If it's relevant, the context is MPI-IO on Lustre mounts without flock,
which ompio doesn't seem to require.
Thanks.