Hi Elaine,

We installed the new PVFS2 version with the fix that you pointed out, and
ran the same application (mpi-tile-io) that we had been running earlier
with 2.9.0.
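For context on the I/O pattern (this also relates to Becky's earlier
question about what the code does with files and directories):
mpi-tile-io has every process write one tile of a 2D dataset into a
single shared output file using MPI-IO collective writes. Below is a
much-simplified sketch of that pattern, not the benchmark's actual
source; the file path and tile size are made up for illustration, and it
uses a contiguous per-rank layout rather than the benchmark's strided 2D
tiling.

/* Simplified sketch of a tiled collective write (not mpi-tile-io's
 * actual source). Each rank writes one tile into a shared file. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int tile_bytes = 1 << 20;       /* illustrative tile size      */
    char *tile = calloc(tile_bytes, 1);   /* this rank's tile data       */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/mnt/pvfs2/tile_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* all ranks write their tiles at disjoint offsets, collectively */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * tile_bytes,
                          tile, tile_bytes, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(tile);
    MPI_Finalize();
    return 0;
}

In our runs the output file sits in the PVFS2-mounted directory where
the locktest files later show up.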
This time the 100-process case ran successfully. However, for higher
numbers of processes we still ended up with those locktest files in the
output directory, although this time the servers remained functional and
did not crash, unlike with 2.9.0.

I have shared the server logs from server 1, server 13 and server 15 on
Google Drive, since they contain some significant messages. Please let me
know if you need the other logs as well, or if you are unable to access
these. (Just for reference: the locktest files were generated at
Feb 14 18:07, and messages after 17:53 on Feb 14 are probably the
relevant ones, since that is around when the application was launched.)

We would appreciate it if you could let us know what might have caused
this.

Thanks,
Jyothi

On Fri, Feb 6, 2015 at 3:20 PM, Elaine Quarles <[email protected]> wrote:
> Thank you for sending the log. It does confirm that we have a fix for
> this problem in testing. If you would like to go ahead and try this
> fix before the official release, you can get the code from the svn
> repository at
> http://www.orangefs.org/svn/orangefs/branches/elaine.bugfix.trunk.
>
> Thanks,
> Elaine
>
> On Fri, Feb 6, 2015 at 3:55 PM, Mangala Jyothi Bhaskar
> <[email protected]> wrote:
> > Thank you for the quick response.
> >
> > To answer Becky's question: yes, these problems do occur even when
> > using a clean filesystem.
> >
> > I have now attached both the conf file and the server log from
> > crill-013 that Elaine asked for.
> >
> > Thanks for your time,
> > Jyothi
> >
> > On Fri, Feb 6, 2015 at 2:40 PM, Elaine Quarles <[email protected]> wrote:
> >>
> >> It looks like this was caused by an issue with 2.9.0 that we recently
> >> discovered. A fix is in testing. If you could send the log from the
> >> server running on crill-013, that would help confirm whether this is,
> >> indeed, the known problem.
> >>
> >> Thanks,
> >> Elaine
> >>
> >> On Fri, Feb 6, 2015 at 3:25 PM, Becky Ligon <[email protected]> wrote:
> >> > Jyothi:
> >> >
> >> > Can you also send us your config file?
> >> >
> >> > QUESTION: Do you get these problems when using a clean filesystem?
> >> > I think the answer is yes based on your email, but please confirm.
> >> >
> >> > It looks like you are having problems with removing files and/or
> >> > directories. Can you give me some idea of what your code is doing
> >> > in terms of creating and/or deleting files and directories?
> >> >
> >> > Becky
> >> >
> >> > On Fri, Feb 6, 2015 at 2:41 PM, Mangala Jyothi Bhaskar
> >> > <[email protected]> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> We have a cluster with PVFS installed, with 16 nodes configured as
> >> >> I/O and metadata servers. We used PVFS 2.8.2 for many years without
> >> >> any major problems. Recently we upgraded the cluster software, and
> >> >> at the same time upgraded PVFS to OrangeFS 2.9.0. The cluster is
> >> >> now running kernel 3.11.
> >> >>
> >> >> We have been running some parallel I/O tests using OpenMPI and have
> >> >> observed an issue when writing with higher numbers of processes
> >> >> (up to about 100 processes we do not see this issue).
> >> >>
> >> >> We are running a case of MPI Tile I/O with 100 processes. This
> >> >> leads to a file system crash, as one of the servers fails. We did
> >> >> not observe this issue with PVFS 2.8.2.
> >> >>
> >> >> Also, after having restored the file system, we see a lot of
> >> >> locktest files (like the ones shown below) in the output folder,
> >> >> which makes the folder unusable for any further work. These files
> >> >> cannot be deleted unless the metadata is deleted and the storage
> >> >> is re-created.
> >> >>
> >> >> Please find attached the server log from the server that crashed
> >> >> for more details.
> >> >>
> >> >> -rw-r--r-- 1 mjbhaskar users 0 Feb 5 17:27 output_256_16_16_2048_1600_64_testpvfs.txt.locktest.104
> >> >> -rw-r--r-- 1 mjbhaskar users 0 Feb 5 17:27 output_256_16_16_2048_1600_64_testpvfs.txt.locktest.100
> >> >> -rw-r--r-- 1 mjbhaskar users 0 Feb 5 17:27 output_256_16_16_2048_1600_64_testpvfs.txt.locktest.1
> >> >> -rw-r--r-- 1 mjbhaskar users 0 Feb 5 17:27 output_256_16_16_2048_1600_64_testpvfs.txt.locktest.230
> >> >> -rw-r--r-- 1 mjbhaskar users 0 Feb 5 17:27 output_256_16_16_2048_1600_64_testpvfs.txt.locktest.68
> >> >> -rw-r--r-- 1 mjbhaskar users 0 Feb 5 17:27 output_256_16_16_2048_1600_64_testpvfs.txt.locktest.138
> >> >>
> >> >> We would appreciate it if you could let us know what might have
> >> >> caused this, or how to debug the problem.
> >> >>
> >> >> Thanks,
> >> >> Jyothi
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
