I agree about the dependencies. I'm a bit worried about having to deal with both a vtkPNetCDFReader and a vtkPNetCDFCFReader, but parallel performance for climate data sets is important enough that I think it's worth updating.
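For reference, the read-on-process-0-then-broadcast pattern from item (1) in the quoted message below can be sketched in a few lines. This is only an illustration in Python, not the VTK reader code. It assumes an mpi4py-style communicator (one that provides Get_rank and bcast); read_metadata, SerialComm, fake_read_metadata and the file name pr.nc are hypothetical names made up for the example.

```python
def load_metadata(comm, path, read_metadata):
    """Read metadata on rank 0 only and broadcast it to every rank.

    comm is assumed to behave like an mpi4py communicator, i.e. it offers
    Get_rank() and bcast(obj, root=...). read_metadata is a hypothetical
    callable standing in for the many small header reads.
    """
    meta = None
    if comm.Get_rank() == 0:
        # Only one process hits the file system for the small reads.
        meta = read_metadata(path)
    # Every rank, including rank 0, gets the broadcast value back.
    return comm.bcast(meta, root=0)


class SerialComm:
    """Hypothetical single-process stand-in so the sketch runs without MPI."""

    def Get_rank(self):
        return 0

    def bcast(self, obj, root=0):
        return obj


def fake_read_metadata(path):
    # Placeholder for the real header reads; dimensions taken from the thread.
    return {"path": path, "dims": {"time": 9855, "lat": 768, "lon": 1152}}


if __name__ == "__main__":
    # With mpi4py installed one would pass MPI.COMM_WORLD instead.
    meta = load_metadata(SerialComm(), "pr.nc", fake_read_metadata)
    print(meta["dims"])
```

The point of the design is that the number of small metadata reads stays constant as the process count grows, at the cost of one collective call, which is why it adds an MPI dependency to whatever class does it.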
Thanks,
Andy

On Thu, Feb 21, 2013 at 4:46 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:

> Both of these sound fine to me. The only caveat is that (1) is a
> parallel-specific improvement that requires communication. I don't
> remember the dependencies of vtkNetCDFCFReader or the library it is in, but
> I would hesitate to change the dependencies. It might be necessary to make
> a vtkPNetCDFCFReader subclass.
>
> -Ken
>
> From: Andy Bauer <andy.ba...@kitware.com>
> Date: Thursday, February 21, 2013 12:38 PM
> To: Burlen Loring <blor...@lbl.gov>
> Cc: Kenneth Moreland <kmo...@sandia.gov>, "paraview@paraview.org" <
> paraview@paraview.org>
> Subject: Re: [Paraview] [EXTERNAL] vtkNetCDFCFReader parallel performance
>
> I've been looking at the code and there's a bunch of small reads in
> order to get all of the meta-data and set things up according to the CF
> conventions. I'm going through now just making process 0 do this work and
> broadcast it. I'm getting decent speedups on this for 240 processes on
> hopper where the runs go from taking about 58 seconds down to 38 seconds.
>
> Any objections to some significant refactoring of the reader? The 2 things
> I want to try are:
>
> 1) read in the meta data on process 0 and broadcast to the other processes
>
> 2) reduce the number of file opens and closes in the reader at the expense
> of keeping the file pointer open.
>
> Thanks,
> Andy
>
> On Thu, Feb 7, 2013 at 4:33 PM, Burlen Loring <blor...@lbl.gov> wrote:
>
>> Hi Andy,
>>
>> data that small should be fairly fast, and nersc's global scratch
>> shouldn't blink when 24 procs access a file in read only mode. maybe PV is
>> reading all the data on a single process (or worse all of them) then doing a
>> redistribution behind the scenes?? That would certainly explain your
>> results. either way good luck.
>>
>> Burlen
>>
>>
>> On 02/07/2013 09:51 AM, Andy Bauer wrote:
>>
>> Hi Burlen,
>>
>> I got the data from a different user and that's where he put the data. I
>> thought about copying it to $SCRATCH. I just thought though that it was
>> really funky, since reading in data that is under 4 MB for a single
>> time step should be pretty fast when I only have 24 processes asking
>> for data. I was thinking that using the scratch space would just be
>> covering up some deeper problem too in that I want to scale up to much more
>> than 24 processes. After all, any run that can't scale beyond 24 processes
>> shouldn't be running on Hopper anyways!
>>
>> Andy
>>
>> On Thu, Feb 7, 2013 at 11:57 AM, Burlen Loring <blor...@lbl.gov> wrote:
>>
>>> Hi Andy,
>>>
>>> do you have a strong reason for using the global scratch fs? if not you
>>> may have better luck using hopper's dedicated lustre scratch. Spec quote
>>> > 2x bandwidth[*]. In reality I'm sure it depends on the number of users
>>> hammering it at the time in question. may help to use lustre scratch while
>>> you're working on parallelization of the netcdf readers.
>>>
>>> Burlen
>>>
>>> *
>>> http://www.nersc.gov/users/computational-systems/hopper/file-storage-and-i-o/
>>>
>>>
>>> On 02/06/2013 03:35 PM, Andy Bauer wrote:
>>>
>>> Hi Ken,
>>>
>>> I think it's more than just a file contention issue. On hopper@nersc I
>>> did set DVS_MAXNODES to 14 and that helped out a lot. Without that set
>>> before I was able to run with 480 processes accessing the same data file
>>> (the 17*768*1152 with 324 time steps data set) but with the "bad" one
>>> that was 768*1152 with 9855 time steps I had problems with just 24
>>> processes.
>>>
>>> I have some things which I want to try out but I think you're right that
>>> using a parallel netcdf library should help a lot, if it doesn't cause
>>> conflicts.
>>>
>>> Thanks,
>>> Andy
>>>
>>> On Wed, Feb 6, 2013 at 5:20 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:
>>>
>>>> This does not surprise me. The current version of the netCDF reader
>>>> only uses the basic interface for accessing files, which is basically a
>>>> serial interface. You are probably getting a lot of file request
>>>> contention.
>>>>
>>>> At the time I wrote the netCDF reader, parallel versions were just
>>>> coming online. I think it would be relatively straightforward to update
>>>> the reader to use collective parallel calls from a parallel netCDF library.
>>>> Unfortunately, I have lost track of the status of the parallel netCDF
>>>> library and file formats. Last I looked, there were actually two parallel
>>>> netCDF libraries and formats. One version directly added collective
>>>> parallel calls to the library. The other changed the format to use hdf5
>>>> under the covers and use the parallel calls therein. These two libraries
>>>> use different formats for the files and I don't think are compatible with
>>>> each other. Also, it might be the case for one or both libraries that you
>>>> cannot read the data in parallel if it was not written in parallel or
>>>> written in an older version of netCDF.
>>>>
>>>> -Ken
>>>>
>>>> From: Andy Bauer <andy.ba...@kitware.com>
>>>> Date: Wednesday, February 6, 2013 10:38 AM
>>>> To: "paraview@paraview.org" <paraview@paraview.org>, Kenneth Moreland <
>>>> kmo...@sandia.gov>
>>>> Subject: [EXTERNAL] vtkNetCDFCFReader parallel performance
>>>>
>>>> Hi Ken,
>>>>
>>>> I'm having some performance issues with a fairly large NetCDF file
>>>> using the vtkNetCDFCFReader. The dimensions of it are 768 lat, 1152 lon and
>>>> 9855 time steps (no elevation dimension). It has one float variable with
>>>> these dimensions -- pr(time, lat, lon). This results in a file around 33
>>>> GB. I'm running on hopper with small numbers of processes (at most 24,
>>>> which is the number of cores per node) and the run time seems to increase
>>>> dramatically as I add more processes. The tests I did read in the first 2
>>>> time steps and did nothing else. The results are below but weren't done too
>>>> rigorously:
>>>>
>>>> numprocs -- time
>>>> 1 -- 1:22
>>>> 2 -- 1:52
>>>> 4 -- 7:52
>>>> 8 -- 5:34
>>>> 16 -- 10:46
>>>> 22 -- 10:37
>>>> 24 -- didn't complete on hopper's "regular" node with 32 GB of memory
>>>> but I was able to run it in a reasonable amount of time on hopper's big
>>>> memory nodes with 64 GB of memory.
>>>>
>>>> I have the data in a reasonable place on hopper. I'm still playing
>>>> around with settings (things get a bit better if I set DVS_MAXNODES --
>>>> http://www.nersc.gov/users/computational-systems/hopper/performance-and-optimization/hopperdvs/)
>>>> but this seems a bit weird as I'm not having any problems like this on a
>>>> data set that has spatial dimensions of 17*768*1152 with 324 time steps.
>>>>
>>>> Any quick thoughts on this? I'm still investigating but was hoping you
>>>> could point out if I'm doing anything stupid.
>>>>
>>>> Thanks,
>>>> Andy
>>>>
>>>
>>> _______________________________________________
>>> Powered by www.kitware.com
>>>
>>> Visit other Kitware open-source projects at
>>> http://www.kitware.com/opensource/opensource.html
>>>
>>> Please keep messages on-topic and check the ParaView Wiki at:
>>> http://paraview.org/Wiki/ParaView
>>>
>>> Follow this link to subscribe/unsubscribe:
>>> http://www.paraview.org/mailman/listinfo/paraview
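
As a quick sanity check on the sizes quoted in the thread, the raw data volume follows directly from the dimensions. The sketch below assumes a 4-byte netCDF float and ignores the file header and any other variables; variable_bytes is a helper name invented for the example.

```python
BYTES_PER_FLOAT = 4  # a netCDF "float" is 32 bits


def variable_bytes(*dims):
    """Raw size in bytes of a float variable with the given dimensions."""
    count = 1
    for d in dims:
        count *= d
    return count * BYTES_PER_FLOAT


# One time step of pr(time, lat, lon): 768 lat x 1152 lon.
one_step = variable_bytes(768, 1152)
# The whole variable: 9855 time steps.
whole_file = variable_bytes(9855, 768, 1152)
# The data set that behaved well: 17*768*1152 with 324 time steps.
other_set = variable_bytes(324, 17, 768, 1152)

print(f"one time step : {one_step / 2**20:.3f} MiB")
print(f"768x1152x9855 : {whole_file / 2**30:.2f} GiB")
print(f"17x768x1152x324: {other_set / 2**30:.2f} GiB")
```

This is consistent with the "under 4 MB for a single time step" and "around 33 GB" figures mentioned above, so reading the first 2 time steps only needs a few MB of actual data.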