Hi Ute:

On 11/25/2011 6:01 AM, Ute Brönner wrote:
Hi folks,

I kind of lost track of our latest discussions and had the feeling that this 
was partly outside the mailing group; so I will try to sum up what we were 
discussing.
My latest attempt was to produce NetCDF for particle trajectories, writing out 
the concentration grid, which resulted in an 11 GB netCDF3 file :-(

So we have different motivations for discussing particle trajectories and netCDF4.

First question:
Does anybody know whether, and if so when, writing netCDF4 will be incorporated into 
the NetCDF Java library? Or will we use Python with the help of Jython etc. 
(http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?

I'm intending to incorporate the netcdf-4 C library into the netcdf-java library using JNI. I'm hoping to have something working in the next few months, but we'll see. This will be an optional component, and will obviously make portability an issue. If you want to use Python, probably the one to use is Jeff Whitaker's at http://code.google.com/p/netcdf4-python/, which is also an interface to the netcdf-4 C library.

Second question:
Is there a de facto standard / proposal for writing Particle Trajectory Data which 
could be CF:featureType:<whatever we agree on>? The suggestion below is not 
suitable because:
1) we don't track a particle the whole time, it may disappear and show up again 
later, but if I have 1000 particles in time step 1 and 1000 in time step 2 we 
cannot be sure these 1000 are the same as before.
2) I cannot know the number of time steps in advance.


I think it's time to start using netcdf-4 for large collections of point data which need to be compressed. Instead of first making a standard, we need to try out the possibilities and see how they perform. I think you want to use Structures, as well as multiple unlimited dimensions. With netcdf-4, we don't need the ragged array mechanism; that's only needed to overcome the limitations of the classic model.

Has anyone started down this path? If so, can you post example netcdf-4 files?

I would like something like
dimensions:
    particle = UNLIMITED; //because it may change each time step
    time = UNLIMITED; // because I don't know

then every variable is like
latitude (particle, time)
longitude (particle, time)

and I might have
int number_particles_per_timestep(time);
      :units = "1";
      :long_name = "number particles per current timestep";
      :CF:ragged_row_count = "particle";

The need some of you have to know which spill a particle came from may be solved 
with a third dimension, spill:
dimensions:
    spill = 3; // or how many one has
    particle = UNLIMITED; //because it may change each time step
    time = UNLIMITED; // because I don't know

particle (spill, time)

then every variable is like
latitude (particle)
longitude (particle)

How would one write this? With coordinates or as a hierarchical data structure?
At least we need the ability to use several unlimited dimensions and the 
ragged-array feature.

Third question:
How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 
directly with hierarchical data? As in my example above, I would need to write 
out an 11 GB file and then deflate it as described here 
http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html
  or with Rich's script; but is that really necessary?


Hoping to get the discussion going again and that we agree on a standard quite 
soon!
Have a nice weekend!

Best,
Ute

-------- Original Message --------
Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation 
data in CF 1.4)
Date: Fri, 19 Nov 2010 04:15:35 +0100
From: John Caron<ca...@unidata.ucar.edu>
To: cf-metadata@cgd.ucar.edu<cf-metadata@cgd.ucar.edu>

I'm thinking that we need a new feature type for this. I'm calling it 
"particleTrack" but there's probably a better name.

My reasoning is that the nested table representation of trajectories is:

Table {
    traj_id;
    Table {
       time;
       lat, lon, z;
       data;
    }
}

but this case has the inner and outer table inverted:

Table {
    time;
    Table {
       particle_id;
       lat, lon, z;
       data;
       data2;
    }
}

So, following that line of thought, the possibilities in CDL are:

1) If the average number of particles ~ max number of particles at any time step, then 
one could use multidimensional arrays:

dimensions:
    maxParticles = 1000 ;
    time = 7777 ; // may be UNLIMITED

variables:

    double time(time) ;

    int particle_id(time, maxParticles) ;
    float lon(time, maxParticles) ;
    float lat(time, maxParticles) ;
    float z(time, maxParticles) ;
    float data(time, maxParticles) ;

attributes:
    :featureType = "particleTrack";

Note that maxParticles is the max number of particles at any one time step, not 
the total number of particle tracks. The particle trajectories have to be found 
by examining the values of particle_id(time, maxParticles).
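
As a rough illustration of "examining the values of particle_id", the following pure-Python sketch groups samples by particle id for the padded (time, maxParticles) layout. The fill value -1 for unused slots and the function name are assumptions made for this example, not part of the proposal, and plain lists stand in for the netCDF variables.

```python
from collections import defaultdict

FILL = -1  # assumed fill value marking unused slots in particle_id


def trajectories_from_padded(times, particle_id, lon, lat):
    """Group (time, lon, lat) samples by particle id.

    particle_id, lon and lat are indexed [time][slot], mirroring the
    (time, maxParticles) variables in the CDL above.
    """
    tracks = defaultdict(list)
    for t, ids_row, lon_row, lat_row in zip(times, particle_id, lon, lat):
        for pid, x, y in zip(ids_row, lon_row, lat_row):
            if pid != FILL:  # skip padding slots
                tracks[pid].append((t, x, y))
    return dict(tracks)


# Three time steps, maxParticles = 3; particle 7 changes slot between steps.
times = [0.0, 1.0, 2.0]
particle_id = [[7, 8, FILL], [8, 9, 7], [9, FILL, FILL]]
lon = [[10.0, 11.0, 0.0], [11.5, 12.0, 10.5], [12.5, 0.0, 0.0]]
lat = [[60.0, 61.0, 0.0], [61.5, 62.0, 60.5], [62.5, 0.0, 0.0]]

tracks = trajectories_from_padded(times, particle_id, lon, lat)
```

Note that a particle is not tied to a fixed slot, which is why the ids have to be scanned at every time step.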

2) The CDL of the ragged case would look like:

dimensions:
    obs = 500000; // UNLIMITED
    time = 7777 ;

variables:
    int time(time) ;
    int rowSize(time) ;

    int particle_id(obs) ;
    float lon(obs) ;
    float lat(obs) ;
    float z(obs) ;
    float data(obs) ;

attributes:
    :featureType = "particleTrack";

In this case, you don't have to know the max number of particles at any one time 
step, but you do need to know the number of time steps beforehand. The particle 
trajectories have to be found by examining the values of particle_id(obs). The 
particles at time step i are contained in the obs variables from start(i) up to, 
but not including, start(i) + rowSize(i), where start(i) is the sum of rowSize 
over all earlier time steps.
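
A minimal sketch of that lookup, assuming start(i) is the running sum of rowSize over the earlier time steps (it is not stored in the file). The helper names are made up for this example, and plain lists stand in for the obs variables.

```python
from itertools import accumulate


def row_starts(row_size):
    """start(i) = sum of row_size[0..i-1], so start(0) is 0."""
    return [0] + list(accumulate(row_size))[:-1]


def particles_at(i, row_size, *obs_vars):
    """Return the slice of each obs variable that belongs to time step i."""
    start = row_starts(row_size)[i]
    stop = start + row_size[i]  # exclusive upper bound
    return [var[start:stop] for var in obs_vars]


# Three time steps holding 2, 3 and 1 particles respectively.
row_size = [2, 3, 1]
particle_id = [7, 8, 8, 9, 7, 9]
lon = [10.0, 11.0, 11.5, 12.0, 10.5, 12.5]

starts = row_starts(row_size)
ids_step1, lon_step1 = particles_at(1, row_size, particle_id, lon)
```

For sequential processing one would compute row_starts once and reuse it rather than recomputing it for every time step.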

These layouts are optimized for processing all particles at a given time, and 
for sequentially processing time steps. If one wanted to process particle 
trajectories, that would be much slower. If you needed to do it a lot, you might 
want to rewrite the file. A more sophisticated application, possibly a server, 
could write an index to speed it up.
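
One possible shape for such an index, sketched in pure Python for the ragged layout: a single pass over rowSize and particle_id records, per particle, which time steps and obs positions belong to it. The function name and the in-memory dict are assumptions for illustration; a real application would presumably persist the index somewhere.

```python
from collections import defaultdict


def build_track_index(row_size, particle_id):
    """Map each particle id to a list of (time_index, obs_index) pairs."""
    index = defaultdict(list)
    obs = 0
    for t, count in enumerate(row_size):
        for _ in range(count):
            index[particle_id[obs]].append((t, obs))
            obs += 1
    return dict(index)


row_size = [2, 3, 1]
particle_id = [7, 8, 8, 9, 7, 9]
lon = [10.0, 11.0, 11.5, 12.0, 10.5, 12.5]

index = build_track_index(row_size, particle_id)
# Longitudes along the track of particle 7, in time order.
lon_track_7 = [lon[obs] for _, obs in index[7]]
```

With the index in hand, reading one trajectory touches only the obs positions of that particle instead of scanning the whole file.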


_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


-----Original Message-----
From: rsign...@gmail.com [mailto:rsign...@gmail.com] On Behalf Of Rich Signell
Sent: Thursday, 18 August 2011 19:04
To: Christopher Barker
Cc: Ute Brönner; Ben Hetland; Mark Reed; Nils Rune Bodsberg; CJ Beegle-Krause; 
Caitlin O'Connor; Alex Hadjilambris; Rob Hetland
Subject: Re: netcdf for particle trajectories

Chris,


so I'll make part of my homework to deliver you a Python script
using Whitaker's NetCDF4 that writes a sample file.
How did this go, Rich?
Yes, I took Rob Hetland's Python short course, and yes, I wrote a small example 
showing how to take NetCDF3 particle tracking output and create a compressed 
NetCDF4 file with chunking.  I just forgot to send it.  ;-)

Note: You can get an OPeNDAP-enabled NetCDF4 Python module for both 32- and 
64-bit Windows from:
http://www.lfd.uci.edu/~gohlke/pythonlibs/

-Rich
We're getting closer to a prototype file (i.e. we've got GNOME writing
something, but it still needs some tweaking). I'll send out an example
when I think we're close.

One new issue:

In GNOME, we have the concept of any number of "spills" -- each spill
is a set of particles that usually share some properties.

So we're trying to figure out how to capture that. Two ideas:

1) each spill is a unique set of data -- but I think that it would only
be possible to do this by using a convention on the variable names:

data_1
particle_count_1
longitude_1
latitude_1
...

data_2
particle_count_2
longitude_2
latitude_2
...

That seems pretty ugly. Could netcdf4's "hierarchical data" help us here?
Maybe this provides the motivation to use it.

Option two:

put all the particles in one big array, but identify the different "spills"
by particle ID:

ID_range_1 = 0-1000
ID_range_2 = 1000-2000
...

then they could get split up by the client software, if desired, or
the separate spills could be ignored, and it could all be treated as one.
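
A small sketch of how client software might do that split. It assumes the ID ranges are half-open, so that ID 1000 belongs to the second spill; the ranges as written above do not say which spill owns the boundary value. The bounds list and function names are hypothetical.

```python
from bisect import bisect_right

# Assumed half-open ranges: spill k owns IDs in [bounds[k], bounds[k + 1]).
SPILL_BOUNDS = [0, 1000, 2000]


def spill_of(pid, bounds=SPILL_BOUNDS):
    """Return the zero-based spill number that owns particle id pid."""
    if not bounds[0] <= pid < bounds[-1]:
        raise ValueError(f"particle id {pid} is outside all spill ranges")
    return bisect_right(bounds, pid) - 1


def split_by_spill(particle_ids, bounds=SPILL_BOUNDS):
    """Return, per spill, the positions in particle_ids that belong to it."""
    groups = [[] for _ in range(len(bounds) - 1)]
    for pos, pid in enumerate(particle_ids):
        groups[spill_of(pid, bounds)].append(pos)
    return groups


groups = split_by_spill([5, 1000, 999, 1500])
```

A client that does not care about spills can simply skip this step and treat the array as one set of particles.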

-- thoughts?


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov



--
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598