Re: [CF-metadata] FW: netcdf for particle trajectories

2012-01-12 Thread Ute Brönner
Rodrigo,

Thank you for your mail and suggestions!
While I agree that it might be useful to have grid information as well as
particle trajectory information in the same file, I would suggest keeping them
in separate files,
especially as we are aiming to agree on a standard for particle trajectories,
which might not always be related to oil. Also, I am not sure whether the others
want to provide grid information at all.

It is interesting that you use HDF5 directly instead of NetCDF. Is
that due to a lack of functionality in the NetCDF libraries, or is there another
reason? Are you willing to share an example?

Best,
Ute


Re: [CF-metadata] FW: netcdf for particle trajectories

2012-01-12 Thread Rodrigo Fernandes
Hi again,

The decision to have the MOHID model handle HDF files instead of NetCDF was
taken by MARETEC probably 12 years ago (before I joined MARETEC), and
I think it was taken because of NetCDF's limitations at that time: HDF
offered compression and a hierarchical
structure.
Meanwhile, NetCDF became an established standard among modellers, and instead of
changing MOHID's inputs or outputs, we developed tools to convert
HDF files to NetCDF and vice versa.
MOHID is now entering a new stage: it is being
prepared to handle either NetCDF or HDF files, as an end-user option.
I have just put an example of an old Lagrangian output at the following temporary
link:
http://ge.tt/8KzvhDC (6.51 MB)
Our Lagrangian model is being reformulated, because some outputs (like
weathering processes) are not included in the Lagrangian files, only in ASCII
outputs.
If you need further details, just ask.

One other issue: let me correct some information that I sent in the Word
document attached to my last email. I proposed a time coordinate based on an array
with 6 columns, but I know that's not the standard in the CF Conventions; the
standard is time (in seconds, for example) since an initial reference. I just
forgot to include this. I think that both time formats could be included,
because in graphical user interfaces that handle
NetCDF files, the "seconds since 1992-10-8 15:15:42.5 -6:00" specification
is annoying and complex to handle: the initial time reference is variable,
as are the units. (In fact, one of the problems with the CF conventions is that
they are so comprehensive, and include so many options, that we can
do almost anything. This is an obstacle to building new software tools that
handle NetCDF files from THREDDS catalogues, for example, as we are doing in
the ARCOPOL and EASYCO projects.) I suppose that for particle files we should be
stricter...
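
To make the two time encodings mentioned above concrete, here is a minimal Python sketch that converts between a 6-column time row and a CF-style "seconds since a reference" value. The column order (year, month, day, hour, minute, second), the UTC default and the function names are assumptions made for illustration; they are not taken from MOHID or from the Word document.

```python
from datetime import datetime, timedelta, timezone

# Reference instant from the example units string quoted above,
# written out by hand rather than parsed from the attribute text.
REFERENCE = datetime(1992, 10, 8, 15, 15, 42, 500000,
                     tzinfo=timezone(timedelta(hours=-6)))


def six_columns_to_seconds(row, reference=REFERENCE, tz=timezone.utc):
    """Convert [year, month, day, hour, minute, second] to seconds since reference.

    The row is assumed to be expressed in the time zone `tz`.
    """
    year, month, day, hour, minute, second = row
    stamp = datetime(int(year), int(month), int(day),
                     int(hour), int(minute), tzinfo=tz)
    stamp += timedelta(seconds=float(second))  # keeps fractional seconds
    return (stamp - reference).total_seconds()


def seconds_to_six_columns(seconds, reference=REFERENCE, tz=timezone.utc):
    """Inverse conversion: seconds since reference back to a 6-column row."""
    stamp = (reference + timedelta(seconds=seconds)).astimezone(tz)
    return [stamp.year, stamp.month, stamp.day, stamp.hour, stamp.minute,
            stamp.second + stamp.microsecond / 1e6]
```

A file could then store only the "seconds since" variable, and a user interface could derive the 6-column form on demand, which keeps the file within the existing convention.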

Best Regards
Rodrigo


Re: [CF-metadata] FW: netcdf for particle trajectories

2012-01-10 Thread Rodrigo Fernandes
Hi everyone,
I suppose most of you (except Mark) don't know me. I was introduced to
this discussion group by Mark, whom I met at the Kuwait oil spill
modelling working group, where I was presenting our group's work on
oil spill modelling with MOHID (www.mohid.com) and risk management
in the Atlantic Area, through some EU projects. I'm an oil spill modeller
from MARETEC (www.maretec.org) at IST university (Portugal), and it is a
pleasure to participate in your discussion. Sorry for the late feedback.

I hope not to increase the entropy of the discussion, but I feel that
you are probably more focussed on some technical details than on the
properties needed (it's as if you are more focussed on the how than on the what).
I have some trouble following some points of your discussion, because we
produce our outputs in HDF5, although we can easily convert our input and
output files to and from NetCDF.
I just hope you don't limit the outputs and standards needed because
of technical details or limitations of the file formats (such as the lack of
hierarchical structures). To avoid this, I think NetCDF4 should be adopted.
I'm not a specialist in file formats and their specifics;
however, I think I have a clear idea of what I need as output from a
particle tracking model for oil spills.
I'm sending my idea in a general format, as a table in the attached Word
document. I propose a hierarchical structure, which I think is definitely
more convenient.

Additionally, I think some other properties could be considered for
inclusion in the particle tracking outputs: the surface wind velocity,
water temperature and current velocity used by each Lagrangian particle
could also be interesting, and probably the particle velocity too. This was
discussed and adopted as a common standard output in a European project
(ECOOP), and I think some oil spill models, like
MOTHY (from Météo-France), also have this natively.

Best regards
Rodrigo Fernandes


Rodrigo Fernandes
MARETEC - Instituto Superior Técnico
Secção de Ambiente e Energia - Departamento de Engenharia Mecânica
Avenida Rovisco Pais
1049 - 001 Lisboa - Portugal
Tel. +351 218 419 434 - Fax: +351 218 419 423
www.mohid.com
www.maretec.org


Re: [CF-metadata] FW: netcdf for particle trajectories

2011-12-05 Thread Jonathan Gregory
Dear Chris

 The only place the time dimension is used is for the time and
 row_size variables, so I can't help thinking there could be a work
 around. Rich Signell suggested you could just use an
 as-big-as-you-might-need time coordinate -- which is one option.

yes, I think that's a good and easy solution.

Get well soon!

Best wishes

Jonathan
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] FW: netcdf for particle trajectories

2011-12-02 Thread Chris Barker

Sorry I haven't sent what I promised yet -- I've been home sick.

On 12/2/11 1:37 AM, Jonathan Gregory wrote:

I agree, this is a case for a ragged array, with two unlimited dimensions,
logically speaking. However such a thing can be accommodated in the new
discrete geometry conventions

http://www.unidata.ucar.edu/staff/caron/public/CFch9-may10.pdf

This describes two ragged array representations, which are intended for this
purpose. Only one netCDF unlimited dimension is used; the two logical axes are
combined into one netCDF axis.



int row_size(time) ;
   row_size:long_name = "number of particles for this time" ;
   row_size:sample_dimension = "obs" ;
double time(obs) ;
   time:standard_name = "time" ;
   time:units = "days since 1970-01-01 00:00:00" ;
int particle(obs) ;
   particle:long_name = "particle number" ;
float quantity(obs) ;
   quantity:long_name = "one of the physical properties of the particle" ;
   quantity:coordinates = "lat lon alt particle" ;
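
As a rough illustration of how a reader might walk this layout, the Python sketch below turns row_size into one slice per time step along the shared obs axis. The sample values and the helper name row_slices are made up for the example; a real reader would take the arrays from the file.

```python
from itertools import accumulate


def row_slices(row_size):
    """Turn per-time-step particle counts into slices along the obs axis."""
    ends = list(accumulate(row_size))
    starts = [0] + ends[:-1]
    return [slice(start, end) for start, end in zip(starts, ends)]


# Made-up sample data: three time steps holding 3, 2 and 4 particles.
row_size = [3, 2, 4]
particle = [0, 1, 2, 0, 2, 0, 2, 3, 4]
quantity = [1.0, 1.1, 1.2, 0.9, 1.3, 0.8, 1.4, 2.0, 2.1]

slices = row_slices(row_size)
for step, sl in enumerate(slices):
    # Every obs-dimensioned variable is cut with the same slice.
    print(step, particle[sl], quantity[sl])
```

Only row_size is needed to find a time step, so the obs axis can stay the single unlimited dimension.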


I'm sorry, I wasn't quite clear -- what you give below is pretty much 
what we've arrived at, but I had already considered combining the two 
logical axes into one unlimited dimension.


The conflict Ute presented was that the number of time steps required 
isn't known at the start of the model run. So the time dimension would 
need to be unlimited as well -- thus two unlimited dimensions.


The only place the time dimension is used is for the time and row_size 
variables, so I can't help thinking there could be a workaround. Rich 
Signell suggested you could just use an as-big-as-you-might-need time 
coordinate -- which is one option.
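
A minimal sketch of the as-big-as-you-might-need idea, in plain Python: the time axis is allocated up front, unused entries carry a fill value, and the valid part is trimmed once the run ends. The fill value, the class name and the overflow behaviour are assumptions for illustration, not part of any proposal in this thread.

```python
FILL = -1  # hypothetical fill value marking time steps that were never written


class PreallocatedSteps:
    """row_size variable on a fixed-length time axis chosen before the run."""

    def __init__(self, max_steps):
        self.row_size = [FILL] * max_steps
        self.used = 0

    def record(self, n_particles):
        """Store the particle count of the next time step."""
        if self.used >= len(self.row_size):
            raise OverflowError("time axis too small; choose a larger max_steps")
        self.row_size[self.used] = n_particles
        self.used += 1

    def trimmed(self):
        """Counts for the steps actually run, e.g. for a final rewrite."""
        return self.row_size[:self.used]
```

The cost is a guess at max_steps: too small and the run overflows, too large and the file carries unused entries until it is rewritten.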


If I ever feel better -- more later, but I think we are close.



If there is also
invariant information for each particle, a large enough particle dimension
would also be needed and various auxiliary coord vars of this dimension.


That would be a trick if you don't know how many particles you're 
starting with -- another nice use for an unlimited dimension...


Thanks,
  -Chris




--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [CF-metadata] FW: netcdf for particle trajectories

2011-11-29 Thread Jonathan Gregory
Dear Chris

 I think it's time to start using netcdf-4 for large collections of point
 data which need to be compressed. Instead of first making a standard, we
 need to try out the possibilities and see how it performs.
 
 That may be true, time to move on eventually! However, you can use
 netcdf4 for compression, but still use a netcdf3 compatible data
 model, so I'd like to see netcdf4-only features used only if they
 really are necessary to get the data model we need.

That could be done if you can represent the data using a new kind of
featureType to be added to the CF chapter on discrete sampling geometries,
which will be included in CF 1.6 (coming soon). The text for the discrete
sampling geometry chapter is at
http://www.unidata.ucar.edu/staff/caron/public/CFch9-feb25_jg.pdf

Best wishes

Jonathan


Re: [CF-metadata] FW: netcdf for particle trajectories

2011-11-29 Thread Jonathan Gregory
Sorry, quite right, this is correct.

 http://www.unidata.ucar.edu/staff/caron/public/CFch9-may10.pdf

Thanks, Rich. J


Re: [CF-metadata] FW: netcdf for particle trajectories

2011-11-29 Thread Chris.Barker

On 11/29/11 4:15 AM, Jonathan Gregory wrote:

That could be done if you can represent the data using a new kind of
featureType to be added to the CF chapter on discrete sampling geometries,
which will be included in CF 1.6 (coming soon). The text for the discrete
sampling geometry chapter is at
http://www.unidata.ucar.edu/staff/caron/public/CFch9-feb25_jg.pdf


Sorry that the discussion about this has been so disjointed, but I think 
our needs cannot be met with something as simple as a new feature type.


We've had a bit of discussion about this, both on and off this list, but 
I don't think anyone has kept good notes of the main points raised. I'll 
try to write up a proposal soon, but briefly:


The goal is to store the output of particle tracking models. These are 
used to model the advection and dispersion of various substances in a 
flow field: oil spills, larval transport, pollutants in the atmosphere, etc.


Some key features:

 * In general, what is of interest is a collection (100s to 10,000s, or 
more... ) of particles, rather than any one individual particle. Thus, it 
is more likely that the user might ask:


where are all the particles at time T?

than:

How did particle X travel over time?

This has consequences on how one stores the data, so that either 
question can be asked but the first is the more efficient one.


 * particles can have many associated attributes (properties, etc) that 
change over time.


 * Some models create a set of particles at one time, then track them 
for the duration of the run -- that is the easy case. But many models 
create and destroy particles as the model runs -- adding particles when 
increased resolution is desired, removing them as they move out of the 
domain, or are destroyed by physical processes.


This is a key issue -- it is not so straightforward how to store them 
when their numbers change, and when you don't know at the start of the 
model run how many particles there will be at any given time, or even 
the maximum number of particles.


With discussion, we had come to something of a consensus that in order 
to accommodate these needs, a ragged array approach would work. i.e. a 
2-d table of sorts, with one row for each time step, and where each row 
might be any length. There appears to be something of a standard for 
this in CF already, and we have attempted to use that (more later).
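
The following Python sketch shows one way such a table could behave: rows are appended as time steps are produced, asking for all particles at one step is a single slice, and following one particle over time needs a scan over the stored IDs. The class and method names are hypothetical, and the flat lists stand in for the variables of a file.

```python
class RaggedTracks:
    """Flat storage: all particles of step 0, then step 1, and so on."""

    def __init__(self):
        self.row_size = []      # number of particles in each time step
        self.particle_id = []   # one entry per stored observation
        self.lon = []
        self.lat = []

    def append_step(self, ids, lons, lats):
        """Add one time step; its length may differ from earlier steps."""
        if not (len(ids) == len(lons) == len(lats)):
            raise ValueError("ids, lons and lats must have the same length")
        self.row_size.append(len(ids))
        self.particle_id.extend(ids)
        self.lon.extend(lons)
        self.lat.extend(lats)

    def snapshot(self, step):
        """Where are all the particles at this step? One contiguous slice."""
        start = sum(self.row_size[:step])
        end = start + self.row_size[step]
        return list(zip(self.particle_id[start:end],
                        self.lon[start:end], self.lat[start:end]))

    def track(self, pid):
        """How did particle pid travel over time? Requires scanning every step."""
        path, start = [], 0
        for step, count in enumerate(self.row_size):
            for i in range(start, start + count):
                if self.particle_id[i] == pid:
                    path.append((step, self.lon[i], self.lat[i]))
            start += count
        return path
```

This ordering favours the per-time-step question, in line with the priorities listed above; the per-particle question stays possible but is slower.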


We've got a version of this working now in our software, but...

The trick that Ute has brought up is that you may know neither how many 
particles there will be, nor how many time steps. Thus you would like to 
have two unlimited dimensions, which netcdf3 does not support. We've 
accomplished it because we know how many time steps will be run before 
we start.


My first thought is that we could use exactly the same format as has been 
discussed already, but make it optional to use netcdf4, and an unlimited 
time dimension. Presumably these files could be easily converted, after 
the fact, to a netcdf3 format, as the number of time steps would then be 
known.


About netcdf 3 vs. 4 -- it seems netcdf4 has some nice features; after 
all, it was developed for a reason. However, it does not appear to 
have been widely adopted yet. Still, maybe we really shouldn't bend 
over backwards to fit a data model to netcdf3 anymore -- it's a 
chicken-and-egg problem, maybe time to make some eggs.


For our part, we use the netcdf4 lib with Python anyway, though our 
C/C++ code is all using netcdf3 -- the burden of compiling the hdf libs 
is something we choose to avoid, though it's not that big a deal.


Anyway -- more soon, I hope.

-Chris




Re: [CF-metadata] FW: netcdf for particle trajectories

2011-11-28 Thread Chris Barker

On 11/25/2011 5:01 AM, Ute Brönner wrote:

Hi folks,

I kind of lost track of our latest discussions and had the feeling
that this was partly outside the mailing group;


yes, it was -- we had some discussion among a subset of the CF list that 
was interested in particle model output.



so I will try to sum up what we were discussing.


In our group, we've settled on a format for the GNOME model (at least for 
now, we needed to use something) based on the discussion -- I've been 
remiss at posting about it to the larger group -- I was waiting for the time 
to write it up a bit more clearly. More on that soon...



My latest try was to produce NetCDF for
particle trajectory trying to write out the concentration grid which
resulted in an 11GB netCDF3 file :-(


when you say grid I'm wondering what you mean -- particle tracks don't 
produce a grid of data -- maybe we're mixing issues here?



So we have different motivations for discussing particle trajectory
and netcdf4.

First question: Does anybody know if and if yes, when writing netCDF4
will be incorporated into the NetCDF Java library? Or will we use
Python with the help of Jython etc.
(http://www.slideshare.net/onyame/mixing-python-and-java) to write
netCDF4?


I'm not sure mixing Python and Java is going to help here -- the Python 
libs use the C libs -- so mixing C and Java would probably be a better 
bet, if you need Java. Jython isn't going to get you C-based Python 
packages. (JEPP might, as mentioned in that talk -- though if the goal 
is functionality that really comes from C, straight JNI might make more 
sense)



Second question: Is there a de facto standard / proposal for writing
Particle Trajectory Data which could be CF:featureType:whatever we
agree on? The suggestion below is not suitable because: 1) we don't
track a particle the whole time, it may disappear and show up again
later, but if I have 1000 particles in time step 1 and 1000 in time
step 2 we cannot be sure these 1000 are the same as before.


This was the whole point of the ragged array approach -- so that's 
covered.


2) I cannot know the number of time steps in advance.

OK -- that is a challenge -- if we know neither the number of time 
steps, nor the number of particles in advance, then we, by definition, 
need two unspecified dimensions. I understand netcdf4 allows this -- may 
be a good reason to go that route.


One question, though -- with the proposed ragged_array-specified format, 
the time dimension is only used in one place - for the: int 
rowSize(time) (or particleCount, or whatever we want to call it) variable.


Is it possible, in netcdf3, to write the big array, with the UNLIMITED 
dimension, then specify the time dimension and associated variable at 
the end? Or does it all need to be defined at the start?



and I might have
int number_particles_per_timestep(time);
  :units = 1;
  :long_name = number particles per current timestep;
  :CF:ragged_row_count = particle;



That some of you need to know which spill a particle came from, may
be solved with a 3rd dimension spill
dimensions:
spill = 3;


unless the spills all have the same number of particles at any given 
time, that's not going to work.


Our solution is to have an ID variable for each particle, so they can 
be isolated -- this can be used to track a given particle over time, and 
also mapped to other data, like which spill it came from, etc.
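
As a small illustration of the ID approach, the Python sketch below groups the particles of one time step by spill through a lookup table keyed on particle ID, with no spill dimension. The table contents, the spill names and the function name are invented for the example.

```python
from collections import defaultdict

# Hypothetical lookup, stored once per file: particle ID -> spill it came from.
spill_of = {1: "spill_A", 2: "spill_A", 3: "spill_B", 4: "spill_C"}


def group_by_spill(ids, values, spill_of):
    """Split one time step's particles by spill using only their IDs."""
    groups = defaultdict(list)
    for pid, value in zip(ids, values):
        groups[spill_of[pid]].append((pid, value))
    return dict(groups)


# One made-up time step in which particle 2 is absent.
print(group_by_spill([1, 3, 4], [0.5, 0.7, 0.9], spill_of))
```

Because the mapping lives beside the data rather than in a dimension, each spill can hold a different number of particles at every time step.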



// or how many one has
particle = UNLIMITED; //because it may change each time step


actually UNLIMITED doesn't help if it's going to change each time step 
(hence the ragged array solution) -- but it is required as we often 
don't know how many particles are going to be used in the end.



how would one write this? With coordinates or as hierarchical data
structure? At least we need the ability to use several unlimited
dimensions and the ragged-array feature.


apparently, yes.


Third question: How can we compress big netCDF3 files? Or is it
smarter to go for netCDF4 directly with hierarchical data.


I do think compression and hierarchical data structure are separate 
issues. netcdf4 is certainly the easy way to get compression. IIUC, to 
compress netcdf3, you need to do it before/after file reading/writing 
-- so helpful for storing and transmitting the data, but you still need 
to deal with the big files at some stage.


(or has anyone adapted a netcdf lib to use on-the-fly compression (like 
with libz)? -- that would be cool)
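
The point about compressing before or after reading and writing can be sketched with Python's standard zlib module. This is only an illustration on an in-memory byte buffer, not a netCDF file, and the fill value and array sizes are assumptions.

```python
import zlib
from array import array

FILL = -999.0  # hypothetical fill value for empty particle slots


def padded_row(values, width):
    """Pad one time step to a fixed width, as a rectangular layout requires."""
    return list(values) + [FILL] * (width - len(values))


# Made-up data: three time steps, 1000 particle slots each, mostly empty.
rows = [padded_row([1.5, 2.5], 1000), padded_row([3.5], 1000), padded_row([], 1000)]
flat = array("d", [value for row in rows for value in row])

raw = flat.tobytes()
packed = zlib.compress(raw, 6)  # what would be stored or transmitted

# Reading the data back restores the full padded array, fill values included.
restored = array("d")
restored.frombytes(zlib.decompress(packed))

print(len(raw), len(packed), len(restored))
```

The packed bytes are far smaller than the raw ones, but any program that uses the data still works on the full-size array after decompression.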



Hoping to get up the discussion again and that we agree on a standard
quite soon!


yes, thanks for reviving it!

-Chris



 Have a nice weekend!


Best, Ute

-Original Message-
Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation data in CF 1.4)
Date: Fri, 19 Nov 2010 04:15:35 +0100
From: John Caron ca...@unidata.ucar.edu
To: cf-metadata@cgd.ucar.edu

I'm thinking that we need a new feature type for this. I'm calling it
particleTrack but there's probably a better name.

My 

Re: [CF-metadata] FW: netcdf for particle trajectories

2011-11-28 Thread Chris Barker

On 11/26/2011 9:14 AM, John Caron wrote:

I'm intending to incorporate the netcdf-4 C library into the netcdf-java
library using JNI. I'm hoping to have something working in the next few
months, but we'll see. This will be an optional component, and will
obviously make portability an issue.


Good idea, none the less.


If you want to use Python, probably
the one to use is Jeff Whittaker's at
http://code.google.com/p/netcdf4-python/, which is also an interface to
the netcdf-4 C library.


yes -- I think that's the best option for Python -- it's nicely done.


I think it's time to start using netcdf-4 for large collections of point
data which need to be compressed. Instead of first making a standard, we
need to try out the possibilities and see how it performs.


That may be true, time to move on eventually! However, you can use 
netcdf4 for compression, but still use a netcdf3 compatible data model, 
so I'd like to see netcdf4-only features used only if they really are 
necessary to get the data model we need.



I think you
want to use Structures, as well as multiple unlimited dimensions. With
netcdf, we don't need the ragged array mechanism - that's only needed to
overcome the limitations of the classic model.


Can you tell us more? how do you express a ragged_array in netcdf 4? 
Variable length user-defined types, maybe?


This is all a bit frustrating, as we've had a fair bit of discussion, 
and I thought we had settled on ragged arrays, and I don't think anyone said 
"this would all be so much easier in netcdf 4".


Ute: I'm a bit confused -- was your 11 GB file a result of using the 
ragged array approach, or of using the rectangular array with LOTS of 
empty values approach?


I don't think compression is the answer to the problem of how to store 
what is naturally a ragged array -- partly because it simply doesn't 
appeal to me aesthetically, but also because it hides, and moves, the 
problem -- the tools really should understand and be able to work with 
the fact that the number of particles is not the same at all times, and 
we don't want to have the client apps deal with a lot of empty arrays 
either.
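
To put rough numbers on the difference between the two layouts, the Python sketch below counts how many values each one stores for the same run. The counting rules (padding to the widest step for the rectangular case, one extra row_size entry per step for the ragged case) and the sample counts are assumptions for illustration.

```python
def rectangular_values(row_size, n_variables):
    """Values stored when every time step is padded to the widest step."""
    return len(row_size) * max(row_size, default=0) * n_variables


def ragged_values(row_size, n_variables):
    """Values stored along a shared obs axis, plus one count per time step."""
    return sum(row_size) * n_variables + len(row_size)


# Made-up run: particle numbers peak briefly, then most particles are removed.
row_size = [10, 1000, 20]
print(rectangular_values(row_size, 3), ragged_values(row_size, 3))
```

The gap widens as the peak particle count moves further from the typical count, which is one plausible way a padded file becomes very large.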


Note that it seems netcdf4 groups could be handy for dealing with 
multiple spills.


More soon on our current solution.

-Chris




Re: [CF-metadata] FW: netcdf for particle trajectories

2011-11-26 Thread John Caron

Hi Ute:

On 11/25/2011 6:01 AM, Ute Brönner wrote:

Hi folks,

I kind of lost track of our latest discussions and had the feeling that this 
was partly outside the mailing group; so I will try to sum up what we were 
discussing.
My latest try was to produce NetCDF for particle trajectory trying to write out 
the concentration grid which resulted in an 11GB netCDF3 file :-(

So we have different motivations for discussing particle trajectory and netcdf4.

First question:
Does anybody know if and if yes, when writing netCDF4 will be incorporated into 
the NetCDF Java library? Or will we use Python with the help of Jython etc. 
(http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?


I'm intending to incorporate the netcdf-4 C library into the netcdf-java 
library using JNI. I'm hoping to have something working in the next few 
months, but we'll see. This will be an optional component, and will 
obviously make portability an issue. If you want to use Python, probably 
the one to use is Jeff Whitaker's at 
http://code.google.com/p/netcdf4-python/, which is also an interface to 
the netcdf-4 C library.



Second question:
Is there a de facto standard / proposal for writing Particle Trajectory Data which 
could be CF:featureType:whatever we agree on? The suggestion below is not 
suitable because:
1) we don't track a particle the whole time, it may disappear and show up again 
later, but if I have 1000 particles in time step 1 and 1000 in time step 2 we 
cannot be sure these 1000 are the same as before.
2) I cannot know the number of time steps in advance.



I think it's time to start using netcdf-4 for large collections of point 
data which need to be compressed. Instead of first making a standard, we 
need to try out the possibilities and see how they perform. I think you 
want to use Structures, as well as multiple unlimited dimensions. With 
netcdf-4, we don't need the ragged array mechanism - that's only needed to 
overcome the limitations of the classic model.


Has anyone started down this path? If so, can you post example netcdf-4 
files?



I would like something like
dimensions:
particle = UNLIMITED; //because it may change each time step
time = UNLIMITED; // because I don't know

then every variable is like
latitude (particle, time)
longitude (particle, time)

and I might have
int number_particles_per_timestep(time);
  :units = 1;
  :long_name = number particles per current timestep;
  :CF:ragged_row_count = particle;

Some of you need to know which spill a particle came from; this may be solved 
with a 3rd dimension, spill:
dimensions:
spill = 3; // or how many one has
particle = UNLIMITED; //because it may change each time step
time = UNLIMITED; // because I don't know

particle (spill, time)

then every variable is like
latitude (particle)
longitude (particle)

How would one write this? With coordinates or as a hierarchical data structure?
At least we need the ability to use several unlimited dimensions and the 
ragged-array feature.

Third question:
How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 
directly with hierarchical data? As in my example above, I would need to write 
out an 11 GB file and then deflate it as described here 
http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html
  or with Rich's script; but is that really necessary?


Hoping to get the discussion going again and that we agree on a standard quite 
soon!
Have a nice weekend!

Best,
Ute

-------- Original Message --------
Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation 
data in CF 1.4)
Date: Fri, 19 Nov 2010 04:15:35 +0100
From: John Caron ca...@unidata.ucar.edu
To: cf-metadata@cgd.ucar.edu cf-metadata@cgd.ucar.edu

I'm thinking that we need a new feature type for this. I'm calling it 
particleTrack but there's probably a better name.

My reasoning is that the nested table representation of trajectories is:

Table {
traj_id;
Table {
   time;
   lat, lon, z;
   data;
}
}

but this case has the inner and outer table inverted:

Table {
time;
Table {
   particle_id;
   lat, lon, z;
   data;
   data2;
}
}

So, following that line of thought, the possibilities in CDL are:

1) If avg number of particles ~ max number of particles at any time step, then 
one could use multidimensional arrays:

dimensions:
maxParticles = 1000 ;
time =  ; // may be UNLIMITED

variables:

double time(time) ;

int particle_id(time, maxParticles) ;
float lon(time, maxParticles) ;
float lat(time, maxParticles) ;
float z(time, maxParticles) ;
float data(time, maxParticles) ;

attributes:
:featureType = particleTrack;

Note that maxParticles is the max number of particles at any one time step, not 
total particle tracks. The particle trajectories have to be found by examining 
the values of particle_id(time, maxParticles).

2) The CDL of the ragged case would 

[CF-metadata] FW: netcdf for particle trajectories

2011-11-25 Thread Ute Brönner
Hi folks,

I kind of lost track of our latest discussions and had the feeling that this 
was partly outside the mailing group; so I will try to sum up what we were 
discussing.
My latest try was to produce NetCDF for particle trajectories, trying to write out 
the concentration grid, which resulted in an 11 GB netCDF3 file :-(

So we have different motivations for discussing particle trajectories and netcdf4.

First question: 
Does anybody know if, and if so when, writing netCDF4 will be incorporated into 
the NetCDF Java library? Or will we use Python with the help of Jython etc. 
(http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?

Second question:
Is there a de facto standard / proposal for writing Particle Trajectory Data 
which could be CF:featureType: whatever we agree on? The suggestion below is 
not suitable because:
1) we don't track a particle the whole time, it may disappear and show up again 
later, but if I have 1000 particles in time step 1 and 1000 in time step 2 we 
cannot be sure these 1000 are the same as before.
2) I cannot know the number of time steps in advance.

I would like something like
dimensions:
   particle = UNLIMITED; //because it may change each time step
   time = UNLIMITED; // because I don't know

then every variable is like 
latitude (particle, time)
longitude (particle, time)

and I might have
int number_particles_per_timestep(time);
 :units = 1;
 :long_name = number particles per current timestep;
 :CF:ragged_row_count = particle;

Some of you need to know which spill a particle came from; this may be solved 
with a 3rd dimension, spill:
dimensions:
   spill = 3; // or how many one has
   particle = UNLIMITED; //because it may change each time step
   time = UNLIMITED; // because I don't know

particle (spill, time)

then every variable is like 
latitude (particle)
longitude (particle)

How would one write this? With coordinates or as a hierarchical data structure?
At least we need the ability to use several unlimited dimensions and the 
ragged-array feature.

Third question:
How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 
directly with hierarchical data? As in my example above, I would need to write 
out an 11 GB file and then deflate it as described here 
http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html
  or with Rich's script; but is that really necessary?


Hoping to get the discussion going again and that we agree on a standard quite 
soon!
Have a nice weekend!

Best,
Ute

-------- Original Message --------
Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation 
data in CF 1.4)
Date: Fri, 19 Nov 2010 04:15:35 +0100
From: John Caron ca...@unidata.ucar.edu
To: cf-metadata@cgd.ucar.edu cf-metadata@cgd.ucar.edu

I'm thinking that we need a new feature type for this. I'm calling it 
particleTrack but there's probably a better name.

My reasoning is that the nested table representation of trajectories is:

Table {
   traj_id;
   Table {
  time;
  lat, lon, z;
  data;
   }
}

but this case has the inner and outer table inverted:

Table {
   time;
   Table {
  particle_id;
  lat, lon, z;
  data;
  data2;
   }
}

So, following that line of thought, the possibilities in CDL are:

1) If avg number of particles ~ max number of particles at any time step, then 
one could use multidimensional arrays:

dimensions:
   maxParticles = 1000 ;
   time =  ; // may be UNLIMITED

variables:

   double time(time) ;

   int particle_id(time, maxParticles) ;
   float lon(time, maxParticles) ;
   float lat(time, maxParticles) ;
   float z(time, maxParticles) ;
   float data(time, maxParticles) ;

attributes:
   :featureType = particleTrack;

Note that maxParticles is the max number of particles at any one time step, not 
total particle tracks. The particle trajectories have to be found by examining 
the values of particle_id(time, maxParticles).

2) The CDL of the ragged case would look like:

dimensions:
   obs = 50; // UNLIMITED
   time =  ;

variables:
   int time(time) ;
   int rowSize(time) ;

   int particle_id(obs) ;
   float lon(obs) ;
   float lat(obs) ;
   float z(obs) ;
   float data(obs) ;

attributes:
   :featureType = particleTrack;

In this case, you don't have to know the max number of particles at any one time 
step, but you do need to know the number of time steps beforehand. The particle 
trajectories have to be found by examining the values of particle_id(obs). The 
particles at time step i are contained in the obs variables from start(i) up to, 
but not including, start(i) + rowSize(i), where start(i) is the sum of rowSize 
over all earlier time steps.
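
The indexing in the ragged case can be sketched in a few lines of plain 
Python. This is only an illustration under assumptions: the lists stand in for 
variables read from a file, the values are invented, and particles_at is a 
hypothetical helper name. start is not stored in the file; it is derived here 
as the running sum of rowSize.

```python
from itertools import accumulate

# Hypothetical contents of a small ragged file (illustrative values only).
row_size = [2, 3, 1]                    # rowSize(time): particles per time step
particle_id = [7, 9, 7, 9, 11, 11]      # particle_id(obs)
lon = [4.0, 4.5, 4.1, 4.6, 5.0, 5.2]    # lon(obs)

# start[i] is the number of observations written before time step i.
start = [0] + list(accumulate(row_size))[:-1]

def particles_at(step):
    """Return (particle_id, lon) pairs for one time step."""
    lo = start[step]
    hi = lo + row_size[step]            # exclusive upper bound
    return list(zip(particle_id[lo:hi], lon[lo:hi]))

step1 = particles_at(1)
print(start, step1)
```

Reading one time step touches a single contiguous slice of each obs variable, 
which is why this layout suits time-step-wise processing.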

These layouts are optimized for processing all particles at a given time, and 
for sequentially processing time steps. Processing individual particle 
trajectories will be much slower. If you need to do it a lot, you might 
want to rewrite the file. A more sophisticated application, possibly a server, 
could write an index to speed it up.
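
One possible form of such an index, sketched in plain Python under 
assumptions: the lists stand in for the rowSize, particle_id and lat variables, 
the values are invented, and build_track_index is a hypothetical helper, not an 
existing library function. It makes one pass over the file and records where 
each particle appears.

```python
from collections import defaultdict

# Hypothetical contents of a small ragged file (illustrative values only).
row_size = [2, 3, 1]                        # rowSize(time)
particle_id = [7, 9, 7, 9, 11, 11]          # particle_id(obs)
lat = [60.0, 60.1, 60.2, 60.3, 60.4, 60.5]  # lat(obs)

def build_track_index(row_size, particle_id):
    """Map each particle id to a list of (time_step, obs_index) pairs."""
    index = defaultdict(list)
    obs = 0
    for step, n in enumerate(row_size):
        for _ in range(n):
            index[particle_id[obs]].append((step, obs))
            obs += 1
    return dict(index)

index = build_track_index(row_size, particle_id)
# Trajectory of particle 7 as (time_step, lat) pairs.
track_7 = [(step, lat[obs]) for step, obs in index[7]]
print(track_7)
```

Once the index exists, a single trajectory can be read without scanning every 
time step again, at the cost of keeping the index in step with the file.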



Re: [CF-metadata] FW: netcdf for particle trajectories

2011-11-25 Thread Etienne Tourigny
You might prefer to try Nujan instead of mixing python and netcdf,
although variables are limited to 2GB

http://www.ral.ucar.edu/~steves/nujan.html


On Fri, Nov 25, 2011 at 11:01 AM, Ute Brönner ute.broen...@sintef.no wrote:
 Hi folks,

 I kind of lost track of our latest discussions and had the feeling that this 
 was partly outside the mailing group; so I will try to sum up what we were 
 discussing.
 My latest try was to produce NetCDF for particle trajectories, trying to write 
 out the concentration grid, which resulted in an 11 GB netCDF3 file :-(

 So we have different motivations for discussing particle trajectories and 
 netcdf4.

 First question:
 Does anybody know if, and if so when, writing netCDF4 will be incorporated 
 into the NetCDF Java library? Or will we use Python with the help of Jython 
 etc. (http://www.slideshare.net/onyame/mixing-python-and-java) to write 
 netCDF4?

 Second question:
 Is there a de facto standard / proposal for writing Particle Trajectory Data 
 which could be CF:featureType: whatever we agree on? The suggestion below 
 is not suitable because:
 1) we don't track a particle the whole time, it may disappear and show up 
 again later, but if I have 1000 particles in time step 1 and 1000 in time 
 step 2 we cannot be sure these 1000 are the same as before.
 2) I cannot know the number of time steps in advance.

 I would like something like
 dimensions:
   particle = UNLIMITED; //because it may change each time step
   time = UNLIMITED; // because I don't know

 then every variable is like
 latitude (particle, time)
 longitude (particle, time)

 and I might have
 int number_particles_per_timestep(time);
     :units = 1;
     :long_name = number particles per current timestep;
     :CF:ragged_row_count = particle;

 Some of you need to know which spill a particle came from; this may be solved 
 with a 3rd dimension, spill:
 dimensions:
   spill = 3; // or how many one has
   particle = UNLIMITED; //because it may change each time step
   time = UNLIMITED; // because I don't know

 particle (spill, time)

 then every variable is like
 latitude (particle)
 longitude (particle)

 How would one write this? With coordinates or as a hierarchical data structure?
 At least we need the ability to use several unlimited dimensions and the 
 ragged-array feature.

 Third question:
 How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 
 directly with hierarchical data? As in my example above, I would need to write 
 out an 11 GB file and then deflate it as described here 
 http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html
   or with Rich's script; but is that really necessary?


 Hoping to get the discussion going again and that we agree on a standard quite 
 soon!
 Have a nice weekend!

 Best,
 Ute

 -------- Original Message --------
 Subject: [CF-metadata] Particle Track Feature Type (was: Re: point 
 observation data in CF 1.4)
 Date: Fri, 19 Nov 2010 04:15:35 +0100
 From: John Caron ca...@unidata.ucar.edu
 To: cf-metadata@cgd.ucar.edu cf-metadata@cgd.ucar.edu

 I'm thinking that we need a new feature type for this. I'm calling it 
 particleTrack but there's probably a better name.

 My reasoning is that the nested table representation of trajectories is:

 Table {
   traj_id;
   Table {
      time;
      lat, lon, z;
      data;
   }
 }

 but this case has the inner and outer table inverted:

 Table {
   time;
   Table {
      particle_id;
      lat, lon, z;
      data;
      data2;
   }
 }

 So, following that line of thought, the possibilities in CDL are:

 1) If avg number of particles ~ max number of particles at any time step, 
 then one could use multidimensional arrays:

 dimensions:
   maxParticles = 1000 ;
   time =  ; // may be UNLIMITED

 variables:

   double time(time) ;

   int particle_id(time, maxParticles) ;
   float lon(time, maxParticles) ;
   float lat(time, maxParticles) ;
   float z(time, maxParticles) ;
   float data(time, maxParticles) ;

 attributes:
   :featureType = particleTrack;

 Note that maxParticles is the max number of particles at any one time step, not 
 total particle tracks. The particle trajectories have to be found by 
 examining the values of particle_id(time, maxParticles).

 2) The CDL of the ragged case would look like:

 dimensions:
   obs = 50; // UNLIMITED
   time =  ;

 variables:
   int time(time) ;
   int rowSize(time) ;

   int particle_id(obs) ;
   float lon(obs) ;
   float lat(obs) ;
   float z(obs) ;
   float data(obs) ;

 attributes:
   :featureType = particleTrack;

 In this case, you don't have to know the max number of particles at any one 
 time step, but you do need to know the number of time steps beforehand. The 
 particle trajectories have to be found by examining the values of 
 particle_id(obs). The particles at time step i are contained in the obs 
 variables from start(i) up to, but not including, start(i) + rowSize(i), 
 where start(i) is the sum of rowSize over all earlier time steps.

 these layouts are optimized for processing all particles at a given