Hi Ken,

I think it's more than just a file contention issue. On hopper@nersc, setting DVS_MAXNODES to 14 helped a lot. But even before I set it, I was able to run with 480 processes reading the same data file (the 17*768*1152 data set with 324 time steps), whereas with the "bad" one (768*1152 with 9855 time steps) I had problems with just 24 processes.
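A quick back-of-the-envelope comparison of the two data sets makes the contrast more concrete. This is a minimal Python sketch, assuming 4-byte floats in both files (the thread only states that the 9855-step file holds a float variable, `pr`); the helper name `dataset_stats` is hypothetical.

```python
# Back-of-the-envelope comparison of the two data sets discussed in this thread.
# Assumption: every value is a 4-byte float (stated for `pr` in the 9855-step
# file; assumed here for the 17-level file as well).
FLOAT_BYTES = 4


def dataset_stats(spatial_shape, n_steps, item_bytes=FLOAT_BYTES):
    """Return (values per time step, bytes per time step, total bytes)."""
    per_step = 1
    for n in spatial_shape:
        per_step *= n
    step_bytes = per_step * item_bytes
    return per_step, step_bytes, step_bytes * n_steps


good = dataset_stats((17, 768, 1152), 324)   # the data set that scales fine
bad = dataset_stats((768, 1152), 9855)       # the "bad" data set

for name, (vals, step_b, total_b) in (("17*768*1152, 324 steps", good),
                                      ("768*1152, 9855 steps", bad)):
    print(f"{name}: {vals:,} values/step, "
          f"{step_b / 2**20:.1f} MiB/step, {total_b / 2**30:.1f} GiB total")
```

Under that assumption, the 9855-step file comes to about 32.5 GiB, consistent with the "around 33 GB" mentioned later in the thread, while each of its time steps is only about 3.4 MiB, compared with about 57.4 MiB per time step in the 17-level file.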
I have some things I want to try out, but I think you're right that using a parallel netCDF library should help a lot, as long as it doesn't cause conflicts.

Thanks,
Andy

On Wed, Feb 6, 2013 at 5:20 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:
> This does not surprise me. The current version of the netCDF reader
> only uses the basic interface for accessing files, which is basically a
> serial interface. You are probably getting a lot of file request
> contention.
>
> At the time I wrote the netCDF reader, parallel versions were just
> coming online. I think it would be relatively straightforward to update
> the reader to use collective parallel calls from a parallel netCDF library.
> Unfortunately, I have lost track of the status of the parallel netCDF
> libraries and file formats. Last I looked, there were actually two parallel
> netCDF libraries and formats. One version directly added collective
> parallel calls to the library. The other changed the format to use HDF5
> under the covers and use the parallel calls therein. These two libraries
> use different file formats, and I don't think they are compatible with
> each other. Also, it might be the case for one or both libraries that you
> cannot read the data in parallel if it was not written in parallel or
> was written with an older version of netCDF.
>
> -Ken
>
> From: Andy Bauer <andy.ba...@kitware.com>
> Date: Wednesday, February 6, 2013 10:38 AM
> To: "paraview@paraview.org" <paraview@paraview.org>, Kenneth Moreland <kmo...@sandia.gov>
> Subject: [EXTERNAL] vtkNetCDFCFReader parallel performance
>
> Hi Ken,
>
> I'm having some performance issues with a fairly large NetCDF file using
> the vtkNetCDFCFReader. Its dimensions are 768 lat, 1152 lon and 9855
> time steps (no elevation dimension). It has one float variable with these
> dimensions -- pr(time, lat, lon). This results in a file of around 33 GB.
> I'm running on hopper, and for small numbers of processes (at most 24, which
> is the number of cores per node) the run time seems to increase
> dramatically as I add more processes. The tests I did read in the first 2
> time steps and did nothing else. The results are below, but they weren't
> done too rigorously:
>
> numprocs -- time
> 1 -- 1:22
> 2 -- 1:52
> 4 -- 7:52
> 8 -- 5:34
> 16 -- 10:46
> 22 -- 10:37
> 24 -- didn't complete on hopper's "regular" nodes with 32 GB of memory, but
> I was able to run it in a reasonable amount of time on hopper's big-memory
> nodes with 64 GB of memory.
>
> I have the data in a reasonable place on hopper. I'm still playing around
> with settings (things get a bit better if I set DVS_MAXNODES --
> http://www.nersc.gov/users/computational-systems/hopper/performance-and-optimization/hopperdvs/),
> but this seems a bit weird, as I'm not having any problems like this with a
> data set that has spatial dimensions of 17*768*1152 with 324 time steps.
>
> Any quick thoughts on this? I'm still investigating but was hoping you
> could point out if I'm doing anything stupid.
>
> Thanks,
> Andy
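To quantify how much the run time grows in the timing table above, the ratios relative to the 1-process run can be computed directly. This is a rough Python sketch: the table does not give units, so each entry is treated as two base-60 fields (for example min:sec), which leaves the ratios unchanged if the entries are actually h:mm. The 24-process run is omitted because it did not complete on the regular nodes, and, as the message says, the measurements were not rigorous. The names `timings`, `to_units` and `slowdown` are illustrative only.

```python
# Relative slowdown of the quoted timing runs, compared with the 1-process run.
# The table does not state its units; each entry is assumed to be two base-60
# fields (e.g. min:sec). The ratios come out the same if they are h:mm instead.
timings = {1: "1:22", 2: "1:52", 4: "7:52", 8: "5:34", 16: "10:46", 22: "10:37"}


def to_units(stamp):
    """Convert 'a:b' into a*60 + b (seconds if the stamp is min:sec)."""
    major, minor = stamp.split(":")
    return int(major) * 60 + int(minor)


baseline = to_units(timings[1])
slowdown = {n: to_units(t) / baseline for n, t in timings.items()}

for n, ratio in slowdown.items():
    print(f"{n:>2} procs: {ratio:.2f}x the 1-process time")
```

By this measure, 16 processes take roughly 7.9 times as long as 1 process, and the 4-process run (about 5.8x) is slower than the 8-process run (about 4.1x), so the growth is not monotonic.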