On 11/25/2011 5:01 AM, Ute Brönner wrote:
Hi folks,
I kind of lost track of our latest discussions and had the feeling
that this was partly outside the mailing group;
yes, it was -- we had some discussion among a subset of the CF list that
was interested in particle model output.
so I will try to sum up what we were discussing.
In our group, we've settled on a format for the GNOME model (at least for
now, we needed to use something) based on the discussion -- I've been
remiss at posting about it to the larger group -- I was waiting for the time
to write it up a bit more clearly. More on that soon...
My latest try was to produce NetCDF for a
particle trajectory, trying to write out the concentration grid, which
resulted in an 11 GB netCDF-3 file :-(
when you say "grid" I'm wondering what you mean -- particle tracks don't
produce a grid of data -- maybe we're mixing issues here?
So we have different motivations for discussing particle trajectories
and netCDF-4.
First question: Does anybody know if, and if so when, writing netCDF-4
will be incorporated into the NetCDF Java library? Or will we use
Python, with the help of Jython etc.
(http://www.slideshare.net/onyame/mixing-python-and-java), to write
netCDF-4?
I'm not sure mixing Python and Java is going to help here -- the Python
libs use the C libs -- so mixing C and Java would probably be a better
bet, if you need Java. Jython isn't going to get you C-based Python
packages. (JEPP might, as mentioned in that talk -- though if the goal
is functionality that really comes from C, straight JNI might make more
sense.)
Second question: Is there a de facto standard / proposal for writing
particle trajectory data which could be CF:featureType:<whatever we
agree on>? The suggestion below is not suitable because: 1) we don't
track a particle the whole time; it may disappear and show up again
later, and if I have 1000 particles in time step 1 and 1000 in time
step 2, we cannot be sure these 1000 are the same as before.
This was the whole point of the "ragged array" approach -- so that's
covered.
2) I cannot know the number of time steps in advance.
OK -- that is a challenge -- if we know neither the number of time
steps, nor the number of particles in advance, then we, by definition,
need two unspecified dimensions. I understand netcdf4 allows this -- may
be a good reason to go that route.
One question, though -- with the proposed ragged-array format,
the time dimension is only used in one place -- for the
int rowSize(time) (or "particleCount", or whatever we want to call it) variable.
Is it possible, in netCDF-3, to write the big array with the UNLIMITED
dimension, then specify the time dimension and associated variable at
the end? Or does it all need to be defined at the start?
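(A minimal sketch of that experiment, using Whitaker's netCDF4 Python module
with the classic format -- file and variable names are just placeholders, and
whether adding the fixed time variable after the fact is efficient is exactly
the open question:)

import numpy as np
from netCDF4 import Dataset

nc = Dataset("particles3.nc", "w", format="NETCDF3_CLASSIC")
nc.createDimension("obs", None)            # the single UNLIMITED dimension
lon = nc.createVariable("lon", "f4", ("obs",))
lat = nc.createVariable("lat", "f4", ("obs",))

# write the big ragged arrays time step by time step
lon[0:1000] = np.zeros(1000)
lat[0:1000] = np.zeros(1000)

# only now do we know how many time steps there were;
# the library re-enters define mode behind the scenes to add these
nc.createDimension("time", 1)
rowSize = nc.createVariable("rowSize", "i4", ("time",))
rowSize[:] = [1000]
nc.close()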
and I might have:

int number_particles_per_timestep(time) ;
    :units = "1" ;
    :long_name = "number particles per current timestep" ;
    :CF:ragged_row_count = "particle" ;
That some of you need to know which spill a particle came from may
be solved with a 3rd dimension, spill:

dimensions:
    spill = 3 ;
unless the spills all have the same number of particles at any given
time, that's not going to work.
Our solution is to have an "ID" variable for each particle, so they can
be isolated -- this can be used to track a given particle over time, and
also mapped to other data, like which spill it came from, etc.
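(A quick sketch of how that ID gets used on the read side -- the file and
variable names here are placeholders, not an agreed convention:)

import numpy as np
from netCDF4 import Dataset

nc = Dataset("gnome_particles.nc")          # placeholder file name
pid = nc.variables["id"][:]                 # one id per obs record
lon = nc.variables["longitude"][:]
lat = nc.variables["latitude"][:]

# follow a single particle through time
track = (pid == 42)
track_lon, track_lat = lon[track], lat[track]
nc.close()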
    // or how many one has
    particle = UNLIMITED ;  // because it may change each time step
actually, UNLIMITED doesn't help if it's going to change each time step
(hence the ragged array solution) -- but it is required, as we often
don't know how many particles are going to be used in the end.
how would one write this? With coordinates or as a hierarchical data
structure? At least we need the ability to use several unlimited
dimensions and the ragged-array feature.
apparently, yes.
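(A sketch of what that would look like in netCDF-4, which allows more than one
UNLIMITED dimension -- the variable names follow the ragged-array proposal
below, the rest is just illustration:)

from netCDF4 import Dataset

nc = Dataset("ragged4.nc", "w", format="NETCDF4")
nc.createDimension("time", None)   # UNLIMITED
nc.createDimension("obs", None)    # also UNLIMITED -- fine in netCDF-4
nc.createVariable("time", "f8", ("time",))
nc.createVariable("rowSize", "i4", ("time",))
nc.createVariable("lon", "f4", ("obs",))
nc.createVariable("lat", "f4", ("obs",))
nc.close()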
Third question: How can we compress big netCDF-3 files? Or is it
smarter to go for netCDF-4 directly, with hierarchical data?
I do think compression and hierarchical data structure are separate
issues. netCDF-4 is certainly the easy way to get compression; IIUC, to
compress netCDF-3, you need to do it before/after file reading/writing
-- so that's helpful for storing and transmitting the data, but you still need
to deal with the big files at some stage.
(or has anyone adapted a netcdf lib to use on-the-fly compression (like
with libz)? -- that would be cool)
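(For what it's worth, netCDF-4 does essentially that: the HDF5 layer applies
zlib/deflate compression per variable, transparently on read and write. A
minimal sketch with the netCDF4 Python module -- names are only for
illustration:)

import numpy as np
from netCDF4 import Dataset

nc = Dataset("compressed.nc", "w", format="NETCDF4")
nc.createDimension("obs", None)
lon = nc.createVariable("lon", "f4", ("obs",), zlib=True, complevel=4)
lon[0:100000] = np.zeros(100000)   # compressed transparently on write
nc.close()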
Hoping to get the discussion going again, and that we agree on a standard
quite soon!
yes, thanks for reviving it!
-Chris
Have a nice weekend!
Best, Ute
-------- Original Message --------
Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation data in CF 1.4)
Date: Fri, 19 Nov 2010 04:15:35 +0100
From: John Caron <ca...@unidata.ucar.edu>
To: cf-metadata@cgd.ucar.edu
I'm thinking that we need a new feature type for this. I'm calling it
"particleTrack" but there's probably a better name.
My reasoning is that the nested table representation of trajectories
is:
Table { traj_id; Table { time; lat, lon, z; data; } }
but this case has the inner and outer table inverted:
Table { time; Table { particle_id; lat, lon, z; data; data2; } }
So, following that line of thought, the possibilities in CDL are:
1) If the average number of particles is ~ the max number of particles at any
time step, then one could use multidimensional arrays:

dimensions:
    maxParticles = 1000 ;
    time = 7777 ;   // may be UNLIMITED
variables:
    double time(time) ;
    int particle_id(time, maxParticles) ;
    float lon(time, maxParticles) ;
    float lat(time, maxParticles) ;
    float z(time, maxParticles) ;
    float data(time, maxParticles) ;
attributes:
    :featureType = "particleTrack" ;
note maxParticles is the max number of particles at any one time
step, not the total number of particle tracks. The particle trajectories have
to be found by examining the values of particle_id(time, maxParticles).
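(To make "found by examining the values" concrete, a reader-side sketch using
the variable names above -- the file name is a placeholder:)

import numpy as np
from netCDF4 import Dataset

nc = Dataset("option1.nc")                 # placeholder file name
pid = nc.variables["particle_id"][:]       # shape (time, maxParticles)
lon = nc.variables["lon"][:]
lat = nc.variables["lat"][:]

t_idx, p_idx = np.where(pid == 42)         # entries where particle 42 appears
traj_lon = lon[t_idx, p_idx]
traj_lat = lat[t_idx, p_idx]
nc.close()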
2) The CDL of the ragged case would look like:
dimensions:
    obs = 500000 ;   // UNLIMITED
    time = 7777 ;
variables:
    int time(time) ;
    int rowSize(time) ;
    int particle_id(obs) ;
    float lon(obs) ;
    float lat(obs) ;
    float z(obs) ;
    float data(obs) ;
attributes:
    :featureType = "particleTrack" ;
in this case, you don't have to know the max number of particles at
any one time step, but you do need to know the number of time steps
beforehand. The particle trajectories have to be found by examining
the values of particle_id(obs). The particles at time step i are
contained in the obs variables between start(i) and start(i) +
rowSize(i).
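(The same thing for the ragged case, as a sketch: start(i) is just the
cumulative sum of rowSize, and each time step is a contiguous slice of the obs
variables. The file name is a placeholder:)

import numpy as np
from netCDF4 import Dataset

nc = Dataset("option2.nc")                 # placeholder file name
rowSize = nc.variables["rowSize"][:]
lon = nc.variables["lon"][:]

start = np.concatenate(([0], np.cumsum(rowSize)[:-1]))
i = 10                                     # pick a time step
lon_at_i = lon[start[i]: start[i] + rowSize[i]]
nc.close()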
these layouts are optimized for processing all particles at a given
time, and for sequentially processing time steps. If one wanted to
process particle trajectories, that would be much slower. If you
needed to do that a lot, you might want to rewrite the file. A more
sophisticated application, possibly a server, could write an index to
speed it up.
-----Original Message-----
From: rsign...@gmail.com [mailto:rsign...@gmail.com] On Behalf Of Rich Signell
Sent: Thursday, 18 August 2011 19:04
To: Christopher Barker
Cc: Ute Brönner; Ben Hetland; Mark Reed; Nils Rune Bodsberg; CJ Beegle-Krause; Caitlin O'Connor; Alex Hadjilambris; Rob Hetland
Subject: Re: netcdf for particle trajectories
Chris,
so I'll make part of my homework to deliver you a Python
script using Whitaker's NetCDF4 that writes a sample file.
How did this go, Rich?
Yes, I took Rob Hetland's Python short course, and yes, I wrote a
small example showing how to take NetCDF3 particle tracking output
and create a compressed NetCDF4 file with chunking. I just forgot to
send it. ;-)
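(That script isn't attached here; the sketch below is only a guess at what that
kind of NetCDF3-to-compressed-NetCDF4 conversion looks like with the netCDF4
module -- file names and compression level are placeholders, and it copies only
dimensions, variables, and data, not attributes:)

from netCDF4 import Dataset

src = Dataset("particles_nc3.nc")                    # placeholder input
dst = Dataset("particles_nc4.nc", "w", format="NETCDF4")

# copy dimensions, keeping any unlimited ones unlimited
for name, dim in src.dimensions.items():
    dst.createDimension(name, None if dim.isunlimited() else len(dim))

# copy variables, turning on zlib compression (the data gets chunked as well)
for name, var in src.variables.items():
    out = dst.createVariable(name, var.dtype, var.dimensions,
                             zlib=True, complevel=4)
    out[:] = var[:]

src.close()
dst.close()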
Note: You can get an OPeNDAP-enabled NetCDF4 Python module for both 32-
and 64-bit Windows from: http://www.lfd.uci.edu/~gohlke/pythonlibs/
-Rich
We're getting closer to a prototype file (i.e. we've got GNOME
writing something, but it still needs some tweaking). I'll send out
an example when I think we're close.
One new issue:
In GNOME, we have the concept of any number of "spills" -- each
spill is a set of particles that usually share some properties.
So we're trying to figure out how to capture that. Two ideas:
1) each spill is a unique set of data -- but I think that it would
only be possible to do this by using a convention on the variable
names:
data_1 particle_count_1 longitude_1 latitude_1 ...
data_2 particle_count_2 longitude_2 latitude_2 ...
That seems pretty ugly. Could netcdf4's "hierarchical data" help us
here? Maybe this provides the motivation to use it.
Option two:
put all the particles in one big array, but identify the different
"spills" by particle ID:
ID_range_1 = 0-1000
ID_range_2 = 1000-2000
...
then they could get split up by the client software, if desired,
or the separate spills could be ignored, and it could all be
treated as one.
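(A reader-side sketch of option two, using the example ranges above and
treating them as half-open [start, end) ranges -- the arrays here are dummies
standing in for the file's particle id / position variables:)

import numpy as np

id_ranges = {"spill_1": (0, 1000), "spill_2": (1000, 2000)}

def particles_in_spill(pid, lon, lat, lo, hi):
    """Select the records whose particle id falls in [lo, hi)."""
    mask = (pid >= lo) & (pid < hi)
    return lon[mask], lat[mask]

pid = np.arange(2000)                       # dummy ids, just to exercise it
lon = np.zeros(2000)
lat = np.zeros(2000)
lon_1, lat_1 = particles_in_spill(pid, lon, lat, *id_ranges["spill_1"])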
-- thoughts?
--
Dr. Richard P. Signell  (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
chris.bar...@noaa.gov
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata