Hello List,

Since I'm also dealing a bit with data sets holding "particle collections" through time, I'd like to contribute some thoughts on this. Our primary use here at SINTEF is for inputs to and results from oil drift simulations, so we're dealing mostly with the sea surface and below, although similar data sets appear just as applicable up in the atmosphere as well. (Particle collections and bounding polygons for the ash cloud from the Eyjafjallajökull eruption spring to mind as a fairly recent example.)
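To make the "particle collections through time" idea concrete, here is a minimal sketch of one way such data can be laid out: a contiguous ragged array, where all particle records are flattened into one array and a per-time-step count says how many belong to each step. This is just an illustration under my own assumptions; the names (counts, flat_x, particles_at) are mine and not from any convention.

```python
# Sketch of a "contiguous ragged array" layout for particles over time.
# All values of one property are stored in a single flat array; a per-time-step
# count variable records how many particles exist at each step.
# All names here are illustrative, not from any spec.

from itertools import accumulate

# Number of particles present at each of 4 time steps (varies over time).
counts = [2, 3, 0, 1]

# One property (e.g. x-position) for every particle, flattened over time.
flat_x = [10.0, 11.0,        # time step 0: 2 particles
          20.0, 21.0, 22.0,  # time step 1: 3 particles
                             # time step 2: no particles
          30.0]              # time step 3: 1 particle

# Start offset of each time step's slice within the flat array.
starts = [0] + list(accumulate(counts))

def particles_at(t):
    """Return the property values of all particles present at time step t."""
    return flat_x[starts[t]:starts[t] + counts[t]]
```

Reading "the set of particles at a given time step" is then a single contiguous slice, which matches the sequential-by-time access pattern discussed below; following one particle across steps is the expensive direction.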
On 04.11.2010 03:19, John Caron wrote:

> 1) It seems clear that at each time step, you need to write out the
> data for whatever particles currently exist.

This is a very fair assessment in our case. One could generalize a bit more: we have data organized as a series of time steps (as the primary dimension). At each time step we have a number of "data" to store, of various sizes and types. Particles are but one of these kinds of objects. Most of them can probably be treated similarly to particles, though, where a fixed set of properties describing each object can simply be represented by a separate netCDF variable per property. A nastier example would be representing an oil slick's shape and position with a polygon: the number of vertices of that polygon would be highly variable through time. (This is a typical GIS-like representation.)

> I assume that if you wanted to break up the data for a very
> long run, you would partition by time, ie time steps 1-1000,
> 1001-2000, etc. would be in separate files.

How one decides to partition can, I think, depend a lot on the application. Sometimes splitting on data type can be more appropriate. In a recent case I had, the data were to be transferred to a client computer over the Internet for viewing locally. There, reducing the content of the file to the absolute minimum set of properties (those the client needed in order to visualize) became paramount. Even a fast Internet connection does have bandwidth limitations... :-)

> 2) Apparently the common read pattern is to retrieve the set of
> particles at a given time step. If so, that makes things easier.

Yes, often sequentially by time as well.

> 3) I assume that you want to be able to figure out an individual
> particle's trajectory, even if that doesn't need to be optimized
> for speed.

Not my primary need, but if an object is "tracked" like that, it would not be unlikely that the trajectory might need to be accessed "interactively", e.g. while a user is viewing a visualization of the data directly on screen. Does that count as "optimized for speed"?

> 1) is the avg number (Navg) of particles that exist much smaller,
> or approx the same as the max number (Nmax) that exist at one time
> step?

This varies a lot. Sometimes it is as you suggest, but sometimes there may be only a few. Sometimes there isn't any defined Nmax either (dynamic implementations), or such a limit can be difficult to know beforehand. Even where an Nmax is set, would it be unreasonable to require the _same_ value to be used every time if the netCDF dataset was accumulated through _multiple_ simulation runs?

> 2) How much data is associated with each particle at a given
> time step (just an estimate needed here - 10 bytes? 1000 bytes?)

In our case this varies a lot with the type of particle and how the simulation was set up. A quick assessment indicates that some need only 16 bytes per particle, while others may currently require up to 824 bytes. (This does not account for shared info like the time itself, which we don't store per particle.) It also wouldn't be very atypical for this amount to then be multiplied by, say, 20000 particles per time step.

Hope that provides some useful ideas of the real-life needs! :-)

-- 
Regards,
 -+- Ben Hetland <ben.a.hetl...@sintef.no> -+-
Opinions expressed are my own, not necessarily those of my employer.

_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
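[Editor's note: a back-of-the-envelope check on the per-particle figures quoted in the message (16 to 824 bytes per particle, around 20000 particles per time step). The numbers come from the message itself; the variable names are illustrative.]

```python
# Rough per-time-step storage implied by the figures quoted above.
bytes_per_particle_min = 16
bytes_per_particle_max = 824
particles_per_step = 20_000

per_step_min = bytes_per_particle_min * particles_per_step  # 320000 bytes
per_step_max = bytes_per_particle_max * particles_per_step  # 16480000 bytes, ~15.7 MiB

print(per_step_min, per_step_max)
```

So a single time step can span from a few hundred kilobytes up to roughly 16 MB, before multiplying by the number of time steps in a run.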