Re: [CF-metadata] FW: netcdf for particle trajectories
Rodrigo,

Thank you for your mail and suggestions! While I agree that it might be useful to have grid information as well as particle trajectory information in the same file, I would suggest keeping them in separate files, especially as we are aiming to agree on a standard for particle trajectories, which might not always be related to oil. Also, I am not sure whether the others want to provide grid information at all.

It is interesting that you use HDF5 directly instead of NetCDF. Is that due to a lack of functionality in the NetCDF libraries, or what is the reason for it? Are you willing to share an example?

Best,
Ute

-----Original Message-----
From: Rodrigo Fernandes [mailto:rodrigo.mare...@ist.utl.pt]
Sent: Tuesday, 10 January 2012 20:09
To: 'Chris Barker'; Ute Brönner
Cc: CF-metadata@cgd.ucar.edu; Ben Hetland; Mark Reed; Nils Rune Bodsberg; 'CJ Beegle-Krause'; 'Caitlin O'Connor'; 'Alex Hadjilambris'; 'Rob Hetland'; rsign...@gmail.com
Subject: RE: FW: netcdf for particle trajectories

Hi everyone,

I suppose most of you (except Mark) don't know me. I was introduced to this discussion group by Mark, whom I met at the Kuwait oil spill modelling working group, where I was presenting our group's work on oil spill modelling with MOHID (www.mohid.com) and risk management in the Atlantic Area, through some EU projects. I'm an oil spill modeller from MARETEC (www.maretec.org) at IST university (Portugal), and it is a pleasure to participate in your discussion. Sorry for the late feedback.

I hope not to increase the entropy in the discussion, but I feel that you are probably more focused on some technical details than on the properties needed (more focused on the how than on the what). I have some trouble following some points of your discussion, because we produce our outputs in HDF5, although we can easily convert our input and output files to and from NetCDF.
Re: [CF-metadata] FW: netcdf for particle trajectories
Hi again,

The decision to have the MOHID model handle HDF files instead of NetCDF was taken by MARETEC probably 12 years ago (before I joined MARETEC), and I think it was taken because of NetCDF limitations at that time: HDF offered compression and a hierarchical structure. Meanwhile, NetCDF became a complete standard among modellers, and instead of changing MOHID inputs or outputs, we developed tools to convert HDF files to NetCDF and vice versa. MOHID is now entering a new stage: it is being prepared to handle either NetCDF or HDF files, as an end-user option.

I have put an example of an old Lagrangian output at the following temporary link: http://ge.tt/8KzvhDC (6.51 MB)

Our Lagrangian model is being reformulated, because some outputs (like weathering processes) are not included in the Lagrangian files, only in ASCII outputs. If you need further details, just ask.

On another issue, let me correct some information I sent in the Word document in my last email: I proposed a time coordinate based on an array with 6 columns, but I know that is not standard in the CF Conventions; the standard is time (in seconds, for example) since an initial reference. I just forgot to include this. I think both time formats could be included, because in graphical user interfaces that handle NetCDF files, the "seconds since 1992-10-8 15:15:42.5 -6:00" specification is annoying and complex to handle: the initial time reference is variable, as are the units. (In fact, one of the problems with the CF conventions is that they are so comprehensive, and include so many options, that we can do almost anything. This is an obstacle to building new software tools that handle NetCDF files from THREDDS catalogues, for example, as we are doing in the ARCOPOL and EASYCO projects.) I suppose that for particle files we should be more strict...
Best regards,
Rodrigo

-----Original Message-----
From: Ute Brönner [mailto:ute.broen...@sintef.no]
Sent: Thursday, 12 January 2012 10:35
To: Rodrigo Fernandes; 'Chris Barker'
Cc: CF-metadata@cgd.ucar.edu; Ben Hetland; Mark Reed; Nils Rune Bodsberg; 'CJ Beegle-Krause'; 'Caitlin O'Connor'; 'Alex Hadjilambris'; 'Rob Hetland'; rsign...@gmail.com
Subject: RE: FW: netcdf for particle trajectories

Rodrigo,

Thank you for your mail and suggestions! While I agree that it might be useful to have grid information as well as particle trajectory information in the same file, I would suggest keeping them in separate files, especially as we are aiming to agree on a standard for particle trajectories, which might not always be related to oil. Also, I am not sure whether the others want to provide grid information at all.

It is interesting that you use HDF5 directly instead of NetCDF. Is that due to a lack of functionality in the NetCDF libraries, or what is the reason for it? Are you willing to share an example?

Best,
Ute

-----Original Message-----
From: Rodrigo Fernandes [mailto:rodrigo.mare...@ist.utl.pt]
Sent: Tuesday, 10 January 2012 20:09
To: 'Chris Barker'; Ute Brönner
Cc: CF-metadata@cgd.ucar.edu; Ben Hetland; Mark Reed; Nils Rune Bodsberg; 'CJ Beegle-Krause'; 'Caitlin O'Connor'; 'Alex Hadjilambris'; 'Rob Hetland'; rsign...@gmail.com
Subject: RE: FW: netcdf for particle trajectories

Hi everyone,

I suppose most of you (except Mark) don't know me. I was introduced to this discussion group by Mark, whom I met at the Kuwait oil spill modelling working group, where I was presenting our group's work on oil spill modelling with MOHID (www.mohid.com) and risk management in the Atlantic Area, through some EU projects. I'm an oil spill modeller from MARETEC (www.maretec.org) at IST university (Portugal), and it is a pleasure to participate in your discussion. Sorry for the late feedback.
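A minimal sketch of the two time encodings contrasted in this message: a six-column date array and a single number counted from a reference time. The function names are hypothetical, the six columns are assumed to be year, month, day, hour, minute, second in UTC, and the reference time is simply the one from the example units string quoted above. None of this is taken from MOHID or from the CF specification.

```python
from datetime import datetime, timedelta, timezone

# Reference time from the example units string
# "seconds since 1992-10-8 15:15:42.5 -6:00" (used here only for illustration).
REFERENCE = datetime(1992, 10, 8, 15, 15, 42, 500000,
                     tzinfo=timezone(timedelta(hours=-6)))


def six_columns_to_seconds(row, reference=REFERENCE):
    """Convert [year, month, day, hour, minute, second] (assumed UTC)
    to seconds elapsed since the reference time."""
    year, month, day, hour, minute, second = row
    # Add the (possibly fractional) seconds as a timedelta to avoid
    # rounding problems when splitting them into whole and micro parts.
    stamp = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    stamp += timedelta(seconds=second)
    return (stamp - reference).total_seconds()


def seconds_to_six_columns(seconds, reference=REFERENCE):
    """Convert seconds since the reference time back to a six-column UTC row."""
    stamp = (reference + timedelta(seconds=seconds)).astimezone(timezone.utc)
    return [stamp.year, stamp.month, stamp.day, stamp.hour, stamp.minute,
            stamp.second + stamp.microsecond / 1_000_000]


if __name__ == "__main__":
    row = [2012, 1, 12, 10, 35, 0.0]
    elapsed = six_columns_to_seconds(row)
    print(elapsed, seconds_to_six_columns(elapsed))
```

Storing only the elapsed value keeps the file compact, while a reader that prefers the six-column form can derive it, provided the reference time and units are parsed from the attribute first.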
Re: [CF-metadata] FW: netcdf for particle trajectories
Hi everyone,

I suppose most of you (except Mark) don't know me. I was introduced to this discussion group by Mark, whom I met at the Kuwait oil spill modelling working group, where I was presenting our group's work on oil spill modelling with MOHID (www.mohid.com) and risk management in the Atlantic Area, through some EU projects. I'm an oil spill modeller from MARETEC (www.maretec.org) at IST university (Portugal), and it is a pleasure to participate in your discussion. Sorry for the late feedback.

I hope not to increase the entropy in the discussion, but I feel that you are probably more focused on some technical details than on the properties needed (more focused on the how than on the what). I have some trouble following some points of your discussion, because we produce our outputs in HDF5, although we can easily convert our input and output files to and from NetCDF. I just hope you don't impose limits on the outputs and standards needed because of technical details or limitations of the file formats (such as hierarchical structures). To avoid this, I think NetCDF4 should be adopted.

I'm not a specialist in file formats and their specifics; however, I think I have a clear idea of what I need as output from a particle tracking model for oil spills. I'm sending my idea in a general format, as a table in the attached Word document. I propose a hierarchical structure, which I think is definitely more convenient.

Additionally, I think some other properties could be considered for inclusion in the particle tracking outputs: the surface wind velocity, water temperature and current velocity used by each Lagrangian particle could also be interesting, and probably the particle velocity as well. This was discussed and adopted as a common standard output in a European project (ECOOP), and I think some oil spill models have this natively, like MOTHY (from Météo-France).
Best regards,
Rodrigo Fernandes

Rodrigo Fernandes
MARETEC - Instituto Superior Técnico
Secção de Ambiente e Energia - Departamento de Engenharia Mecânica
Avenida Rovisco Pais
1049 - 001 Lisboa - Portugal
Tel. +351 218 419 434 - Fax: +351 218 419 423
www.mohid.com
www.maretec.org

-----Original Message-----
From: Chris Barker [mailto:chris.bar...@noaa.gov]
Sent: Monday, 28 November 2011 19:41
To: Ute Brönner
Cc: CF-metadata@cgd.ucar.edu; Ben Hetland; Mark Reed; Nils Rune Bodsberg; CJ Beegle-Krause (cj.beegle-kra...@noaa.gov); Caitlin O'Connor (caitlin.ocon...@noaa.gov); Alex Hadjilambris (alex.hadjilamb...@noaa.gov); Rob Hetland (hetl...@tamu.edu); Rodrigo Fernandes; rsign...@gmail.com
Subject: Re: FW: netcdf for particle trajectories

On 11/25/2011 5:01 AM, Ute Brönner wrote:
> Hi folks, I kind of lost track of our latest discussions and had the feeling that this was partly outside the mailing group;

yes, it was -- we had some discussion among a subset of the CF list that was interested in particle model output.

> so I will try to sum up what we were discussing.

In our group, we've settled on a format for the GNOME model (at least for now, we needed to use something) based on the discussion -- I've been remiss at posting about it to the larger group -- I was waiting for the time to write it up a bit more clearly. More on that soon...

> My latest try was to produce NetCDF for particle trajectories, trying to write out the concentration grid, which resulted in an 11 GB netCDF3 file :-(

when you say grid I'm wondering what you mean -- particle tracks don't produce a grid of data -- maybe we're mixing issues here?

> So we have different motivations for discussing particle trajectories and netcdf4. First question: Does anybody know if, and if yes when, writing netCDF4 will be incorporated into the NetCDF Java library? Or will we use Python with the help of Jython etc. (http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?
Re: [CF-metadata] FW: netcdf for particle trajectories
Dear Chris

> The only place the time dimension is used is for the time and row_size variables, so I can't help thinking there could be a workaround. Rich Signell suggested you could just use an as-big-as-you-might-need time coordinate -- which is one option.

Yes, I think that's a good and easy solution.

Get well soon!

Best wishes
Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Re: [CF-metadata] FW: netcdf for particle trajectories
Sorry I haven't sent what I promised yet -- I've been home sick.

On 12/2/11 1:37 AM, Jonathan Gregory wrote:
> I agree, this is a case for a ragged array, with two unlimited dimensions, logically speaking. However such a thing can be accommodated in the new discrete geometry conventions
> http://www.unidata.ucar.edu/staff/caron/public/CFch9-may10.pdf
> This describes two ragged array representations, which are intended for this purpose. Only one netCDF unlimited dimension is used; the two logical axes are combined into one netCDF axis.
>
>   int row_size(time) ;
>     row_size:long_name = "number of particles for this time" ;
>     row_size:sample_dimension = "obs" ;
>   double time(obs) ;
>     time:standard_name = "time" ;
>     time:units = "days since 1970-01-01 00:00:00" ;
>   int particle(obs) ;
>     particle:long_name = "particle number" ;
>   float quantity(obs) ;
>     quantity:long_name = "one of the physical properties of the particle" ;
>     quantity:coordinates = "lat lon alt particle" ;

I'm sorry, I wasn't quite clear -- what you give below is pretty much what we've arrived at, but I had already considered combining the two logical axes into one unlimited dimension. The conflict Ute presented was that he doesn't know at the start of the model run the number of time steps that are required. So he'd like the time dimension to be unlimited as well -- thus two unlimited dimensions.

The only place the time dimension is used is for the time and row_size variables, so I can't help thinking there could be a workaround. Rich Signell suggested you could just use an as-big-as-you-might-need time coordinate -- which is one option.

If I ever feel better -- more later, but I think we are close.

> If there is also invariant information for each particle, a large enough particle dimension would also be needed and various auxiliary coord vars of this dimension.

That would be a trick if you don't know how many particles you're starting with -- another nice use for an unlimited dimension...

Thanks,
-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
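A minimal sketch of how the row_size variable in the ragged layout above could be used to find all particles at one time step. Plain Python lists stand in for the netCDF variables; the function names and the sample values are hypothetical, and the layout is assumed to be contiguous (all observations of step 0, then all of step 1, and so on).

```python
def row_offsets(row_size):
    """Start index along the obs dimension for each time step."""
    offsets, total = [], 0
    for count in row_size:
        offsets.append(total)
        total += count
    return offsets


def particles_at(step, row_size, *obs_variables):
    """Slice every obs-dimensioned variable down to one time step."""
    start = sum(row_size[:step])
    stop = start + row_size[step]
    return [variable[start:stop] for variable in obs_variables]


# Hypothetical data: 3 particles at step 0, 2 at step 1, 4 at step 2.
row_size = [3, 2, 4]
particle = [0, 1, 2, 0, 2, 0, 2, 3, 4]
lat = [10.0, 10.1, 10.2, 10.05, 10.25, 10.1, 10.3, 11.0, 11.1]

if __name__ == "__main__":
    ids, lats = particles_at(1, row_size, particle, lat)
    print(row_offsets(row_size), ids, lats)
```

Only row_size (and the time coordinate) is indexed by time step; everything else grows along the single obs axis, which is why one unlimited dimension can be enough when the number of steps is known or over-allocated.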
Re: [CF-metadata] FW: netcdf for particle trajectories
Dear Chris

> > I think it's time to start using netcdf-4 for large collections of point data which need to be compressed. Instead of first making a standard, we need to try out the possibilities and see how it performs.
>
> That may be true, time to move on eventually! However, you can use netcdf4 for compression, but still use a netcdf3 compatible data model, so I'd like to see netcdf4-only features used only if they really are necessary to get the data model we need.

That could be done if you can represent the data using a new kind of featureType to be added to the CF chapter on discrete sampling geometries, which will be included in CF 1.6 (coming soon). The text for the discrete sampling geometry chapter is at
http://www.unidata.ucar.edu/staff/caron/public/CFch9-feb25_jg.pdf

Best wishes
Jonathan
Re: [CF-metadata] FW: netcdf for particle trajectories
Sorry, quite right, this is correct.
http://www.unidata.ucar.edu/staff/caron/public/CFch9-may10.pdf
Thanks, Rich.

J
Re: [CF-metadata] FW: netcdf for particle trajectories
On 11/29/11 4:15 AM, Jonathan Gregory wrote:
> That could be done if you can represent the data using a new kind of featureType to be added to the CF chapter on discrete sampling geometries, which will be included in CF 1.6 (coming soon). The text for the discrete sampling geometry chapter is at
> http://www.unidata.ucar.edu/staff/caron/public/CFch9-feb25_jg.pdf

Sorry that the discussion about this has been so disjointed, but I think our needs cannot be met with something as simple as a new feature type. We've had a bit of discussion about this, both on and off this list, but I don't think anyone has kept good notes of the main points raised. I'll try to write up a proposal soon, but briefly:

The goal is to store the output of particle tracking models. These are used to model the advection and dispersion of various substances in a flow field: oil spills, larval transport, pollutants in the atmosphere, etc.

Some key features:

* In general, what is of interest is a collection (100s to 10,000s, or more...) of particles, rather than any one individual particle. Thus, it is more likely that the user will ask "where are all the particles at time T?" than "how did particle X travel over time?" This has consequences for how one stores the data: either question can be asked, but the first should be the more efficient one.

* Particles can have many associated attributes (properties, etc.) that change over time.

* Some models create a set of particles at one time, then track them for the duration of the run -- that is the easy case. But many models create and destroy particles as the model runs -- adding particles when increased resolution is desired, removing them as they move out of the domain or are destroyed by physical processes. This is a key issue -- it is not so straightforward to store them when their numbers change, and when you don't know at the start of the model run how many particles there will be at any given time, or even the maximum number of particles.
With discussion, we had come to something of a consensus that, in order to accommodate these needs, a ragged array approach would work, i.e. a 2-d table of sorts, with one row for each time step, where each row might be any length. There appears to be something of a standard for this in CF already, and we have attempted to use that (more later). We've got a version of this working now in our software, but...

The trick that Ute has brought up is that you may know neither how many particles there will be, nor how many time steps. Thus you would like to have two unlimited dimensions, which netcdf3 does not support. We've accomplished it because we know how many time steps will be run before we start.

My first thought is that we could use exactly the same format as has been discussed already, but make it optional to use netcdf4 and an unlimited time dimension. Presumably these files could be easily converted, after the fact, to a netcdf3 format, as the number of time steps would then be known.

About netcdf 3 vs. 4 -- it seems netcdf4 has some nice features; after all, it was developed for a reason. However, it does not appear to have been widely adopted yet. Still, maybe we really shouldn't bend over backwards to fit a data model to netcdf3 anymore -- it's a chicken and egg problem, maybe time to make some eggs.

For our part, we use the netcdf4 lib with Python anyway, though our C/C++ code is all using netcdf3 -- the burden of compiling the hdf libs is something we choose to avoid, though it's not that big a deal.

Anyway -- more soon, I hope.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
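A minimal sketch of the ragged "one row per time step" table described in this message, written as an in-memory structure that can grow without knowing the number of steps or particles in advance. The class and method names are hypothetical and plain lists stand in for netCDF variables; it is not the GNOME format.

```python
class RaggedTrack:
    """One variable-length row of particles per time step."""

    def __init__(self):
        self.time = []         # one entry per time step
        self.row_size = []     # number of particles in each time step
        self.particle_id = []  # one entry per observation
        self.lon = []
        self.lat = []

    def append_step(self, time, particles):
        """Append one time step; particles is an iterable of (id, lon, lat)."""
        particles = list(particles)
        self.time.append(time)
        self.row_size.append(len(particles))
        for pid, lon, lat in particles:
            self.particle_id.append(pid)
            self.lon.append(lon)
            self.lat.append(lat)

    def trajectory(self, pid):
        """Return [(time, lon, lat), ...] for one particle by scanning every row."""
        found, start = [], 0
        for time, count in zip(self.time, self.row_size):
            for i in range(start, start + count):
                if self.particle_id[i] == pid:
                    found.append((time, self.lon[i], self.lat[i]))
            start += count
        return found


if __name__ == "__main__":
    track = RaggedTrack()
    track.append_step(0.0, [(1, 5.0, 60.0), (2, 5.1, 60.1)])
    # Particle 1 has left the domain; particles 3 and 4 are newly released.
    track.append_step(3600.0, [(2, 5.2, 60.2), (3, 5.3, 60.3), (4, 5.4, 60.4)])
    print(track.row_size, track.trajectory(2))
```

Reading one time step is a single slice, while following one particle requires a scan over all rows, which matches the stated priority that "where are all the particles at time T?" should be the cheaper question.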
Re: [CF-metadata] FW: netcdf for particle trajectories
On 11/25/2011 5:01 AM, Ute Brönner wrote:
> Hi folks, I kind of lost track of our latest discussions and had the feeling that this was partly outside the mailing group;

yes, it was -- we had some discussion among a subset of the CF list that was interested in particle model output.

> so I will try to sum up what we were discussing.

In our group, we've settled on a format for the GNOME model (at least for now, we needed to use something) based on the discussion -- I've been remiss at posting about it to the larger group -- I was waiting for the time to write it up a bit more clearly. More on that soon...

> My latest try was to produce NetCDF for particle trajectories, trying to write out the concentration grid, which resulted in an 11 GB netCDF3 file :-(

when you say grid I'm wondering what you mean -- particle tracks don't produce a grid of data -- maybe we're mixing issues here?

> So we have different motivations for discussing particle trajectories and netcdf4. First question: Does anybody know if, and if yes when, writing netCDF4 will be incorporated into the NetCDF Java library? Or will we use Python with the help of Jython etc. (http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?

I'm not sure mixing Python and Java is going to help here -- the Python libs use the C libs -- so mixing C and Java would probably be a better bet, if you need Java. Jython isn't going to get you C-based Python packages. (JEPP might, as mentioned in that talk -- though if the goal is functionality that really comes from C, straight JNI might make more sense.)

> Second question: Is there a de facto standard / proposal for writing Particle Trajectory Data which could be CF:featureType:whatever we agree on? The suggestion below is not suitable because: 1) we don't track a particle the whole time, it may disappear and show up again later, but if I have 1000 particles in time step 1 and 1000 in time step 2 we cannot be sure these 1000 are the same as before.
This was the whole point of the ragged array approach -- so that's covered.

> 2) I cannot know the number of time steps in advance.

OK -- that is a challenge -- if we know neither the number of time steps nor the number of particles in advance, then we, by definition, need two unspecified dimensions. I understand netcdf4 allows this -- may be a good reason to go that route.

One question, though -- with the proposed ragged_array-specified format, the time dimension is only used in one place, for the

  int rowSize(time)

(or particleCount, or whatever we want to call it) variable. Is it possible, in netcdf3, to write the big array, with the UNLIMITED dimension, then specify the time dimension and associated variable at the end? Or does it all need to be defined at the start?

> and I might have
>
>   int number_particles_per_timestep(time);
>     :units = 1;
>     :long_name = "number particles per current timestep";
>     :CF:ragged_row_count = "particle";
>
> That some of you need to know which spill a particle came from may be solved with a 3rd dimension, spill:
>
>   dimensions:
>     spill = 3;

unless the spills all have the same number of particles at any given time, that's not going to work. Our solution is to have an ID variable for each particle, so they can be isolated -- this can be used to track a given particle over time, and also mapped to other data, like which spill it came from, etc.

>     // or how many one has
>     particle = UNLIMITED; // because it may change each time step

actually UNLIMITED doesn't help if it's going to change each time step (hence the ragged array solution) -- but it is required as we often don't know how many particles are going to be used in the end.

> How would one write this? With coordinates or as a hierarchical data structure? At least we need the ability to use several unlimited dimensions and the ragged-array feature.

apparently, yes.

> Third question: How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 directly with hierarchical data?
I do think compression and hierarchical data structure are separate issues. netcdf4 is certainly the easy way to get compression. IIUC, to compress netcdf3, you need to do it before/after file reading/writing -- so helpful for storing and transmitting the data, but you still need to deal with the big files at some stage. (Or has anyone adapted a netcdf lib to use on-the-fly compression (like with libz)? That would be cool.)

> Hoping to get up the discussion again and that we agree on a standard quite soon!

yes, thanks for reviving it!

-Chris

> Have a nice weekend!
>
> Best, Ute
>
> -------- Original Message --------
> Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation data in CF 1.4)
> Date: Fri, 19 Nov 2010 04:15:35 +0100
> From: John Caron <ca...@unidata.ucar.edu>
> To: cf-metadata@cgd.ucar.edu <cf-metadata@cgd.ucar.edu>
>
> I'm thinking that we need a new feature type for this. I'm calling it particleTrack but there's probably a better name.
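A minimal sketch of the point made above about compressing outside the file format: the data can be deflated after writing and inflated before reading, but the full-size array still has to exist at some stage. Packed float32 bytes stand in for a netCDF3 variable, the fill value is an assumption (the usual netCDF default for floats), and the function name and sizes are hypothetical.

```python
import struct
import zlib

# Assumed fill value for unused slots (netCDF's usual default for float).
FILL_VALUE = 9.96921e36


def padded_bytes(counts, max_particles):
    """Pack one float32 row per time step, padded to max_particles with fill values."""
    chunks = []
    for step, count in enumerate(counts):
        if count > max_particles:
            raise ValueError("row longer than max_particles")
        row = [float(step * max_particles + i) for i in range(count)]
        row += [FILL_VALUE] * (max_particles - count)
        chunks.append(struct.pack(f"<{max_particles}f", *row))
    return b"".join(chunks)


if __name__ == "__main__":
    # Five time steps with very different particle counts, padded to 1000.
    raw = padded_bytes([10, 200, 5, 0, 50], 1000)
    compressed = zlib.compress(raw, 9)  # done after "writing"
    restored = zlib.decompress(compressed)  # needed again before "reading"
    print(len(raw), len(compressed), restored == raw)
```

The compressed copy is much smaller for storage and transfer, yet any tool that reads the data works on the restored, full-size bytes, which is the limitation described in the reply.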
Re: [CF-metadata] FW: netcdf for particle trajectories
On 11/26/2011 9:14 AM, John Caron wrote:
> I'm intending to incorporate the netcdf-4 C library into the netcdf-java library using JNI. I'm hoping to have something working in the next few months, but we'll see. This will be an optional component, and will obviously make portability an issue.

Good idea, nonetheless.

> If you want to use Python, probably the one to use is Jeff Whitaker's at http://code.google.com/p/netcdf4-python/, which is also an interface to the netcdf-4 C library.

yes -- I think that's the best option for Python -- it's nicely done.

> I think it's time to start using netcdf-4 for large collections of point data which need to be compressed. Instead of first making a standard, we need to try out the possibilities and see how it performs.

That may be true, time to move on eventually! However, you can use netcdf4 for compression, but still use a netcdf3 compatible data model, so I'd like to see netcdf4-only features used only if they really are necessary to get the data model we need.

> I think you want to use Structures, as well as multiple unlimited dimensions. With netcdf, we don't need the ragged array mechanism - that's only needed to overcome the limitations of the classic model.

Can you tell us more? How do you express a ragged_array in netcdf 4? Variable length user-defined types, maybe?

This is all a bit frustrating, as we've had a fair bit of discussion, and I thought we had settled on ragged arrays, and I don't think anyone said "this would all be so much easier in netcdf 4".

Ute: I'm a bit confused -- was your 11 GB file a result of using the ragged array approach, or of using the rectangular array with LOTS of empty values approach?
I don't think compression is the answer to the problem of how to store what is naturally a ragged array -- partly because it simply doesn't appeal to me aesthetically, but also because it hides, and moves, the problem -- the tools really should understand and be able to work with the fact that the number of particles is not the same at all times, and we don't want the client apps to have to deal with a lot of empty arrays either.

Note that it seems netcdf4 groups could be handy for dealing with multiple spills.

More soon on our current solution.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
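A minimal sketch comparing the two layouts mentioned in this message: a rectangular array padded with empty values versus a ragged array. It counts stored cells for each and shows what a client has to do to find one particle in the padded layout. The function names, the fill marker and the sample IDs are hypothetical.

```python
# Hypothetical marker for an empty slot in the padded particle_id array.
FILL_ID = -1


def to_padded(rows, max_particles, fill=FILL_ID):
    """Pad each time step's list of particle IDs to a fixed width."""
    padded = []
    for row in rows:
        if len(row) > max_particles:
            raise ValueError("row longer than max_particles")
        padded.append(list(row) + [fill] * (max_particles - len(row)))
    return padded


def storage_cells(rows):
    """Return (ragged_cells, padded_cells) needed to store the rows."""
    ragged = sum(len(row) for row in rows)
    padded = len(rows) * max((len(row) for row in rows), default=0)
    return ragged, padded


def find_particle(padded_ids, pid):
    """Locate a particle as (time_step, column) pairs, skipping empty slots."""
    hits = []
    for step, row in enumerate(padded_ids):
        for column, value in enumerate(row):
            if value == pid:
                hits.append((step, column))
    return hits


if __name__ == "__main__":
    rows = [[0, 1, 2], [0, 2], [0, 2, 3, 4]]
    print(storage_cells(rows))
    print(find_particle(to_padded(rows, 4), 2))
```

The gap between the two cell counts grows with the spread between the typical and the maximum number of particles per step, and the padded layout forces every reader to recognise and skip the fill marker.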
Re: [CF-metadata] FW: netcdf for particle trajectories
Hi Ute:

On 11/25/2011 6:01 AM, Ute Brönner wrote:
> Hi folks,
>
> I kind of lost track of our latest discussions and had the feeling that this was partly outside the mailing group; so I will try to sum up what we were discussing.
>
> My latest try was to produce NetCDF for particle trajectories, trying to write out the concentration grid, which resulted in an 11 GB netCDF3 file :-( So we have different motivations for discussing particle trajectories and netcdf4.
>
> First question: Does anybody know if, and if yes when, writing netCDF4 will be incorporated into the NetCDF Java library? Or will we use Python with the help of Jython etc. (http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?

I'm intending to incorporate the netcdf-4 C library into the netcdf-java library using JNI. I'm hoping to have something working in the next few months, but we'll see. This will be an optional component, and will obviously make portability an issue.

If you want to use Python, probably the one to use is Jeff Whitaker's at http://code.google.com/p/netcdf4-python/, which is also an interface to the netcdf-4 C library.

> Second question: Is there a de facto standard / proposal for writing Particle Trajectory Data which could be CF:featureType:whatever we agree on? The suggestion below is not suitable because:
> 1) we don't track a particle the whole time, it may disappear and show up again later, but if I have 1000 particles in time step 1 and 1000 in time step 2 we cannot be sure these 1000 are the same as before.
> 2) I cannot know the number of time steps in advance.

I think it's time to start using netcdf-4 for large collections of point data which need to be compressed. Instead of first making a standard, we need to try out the possibilities and see how it performs.

I think you want to use Structures, as well as multiple unlimited dimensions. With netcdf, we don't need the ragged array mechanism - that's only needed to overcome the limitations of the classic model.
Has anyone started down this path? If so, can you post example netcdf-4 files?

> I would like something like
>
>     dimensions:
>         particle = UNLIMITED; // because it may change each time step
>         time = UNLIMITED; // because I don't know
>
> then every variable is like
>
>     latitude (particle, time)
>     longitude (particle, time)
>
> and I might have
>
>     int number_particles_per_timestep(time);
>         :units = 1;
>         :long_name = number particles per current timestep;
>         :CF:ragged_row_count = particle;
>
> That some of you need to know which spill a particle came from may be solved with a third dimension, spill:
>
>     dimensions:
>         spill = 3; // or how many one has
>         particle = UNLIMITED; // because it may change each time step
>         time = UNLIMITED; // because I don't know
>
>     particle (spill, time)
>
> then every variable is like
>
>     latitude (particle)
>     longitude (particle)
>
> How would one write this? With coordinates or as a hierarchical data structure? At least we need the ability to use several unlimited dimensions and the ragged-array feature.
>
> Third question: How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 directly with hierarchical data? As in my example above, I would need to write out an 11 GB file and then deflate it as described here http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html or with Rich's script; but is that really necessary?
>
> Hoping to get the discussion going again and that we agree on a standard quite soon!
>
> Have a nice weekend!
> Best, Ute
>
> -------- Original Message --------
> Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation data in CF 1.4)
> Date: Fri, 19 Nov 2010 04:15:35 +0100
> From: John Caron <ca...@unidata.ucar.edu>
> To: cf-metadata@cgd.ucar.edu <cf-metadata@cgd.ucar.edu>
>
> I'm thinking that we need a new feature type for this. I'm calling it particleTrack but there's probably a better name.
> My reasoning is that the nested table representation of trajectories is:
>
>     Table {
>       traj_id;
>       Table {
>         time;
>         lat, lon, z;
>         data;
>       }
>     }
>
> but this case has the inner and outer table inverted:
>
>     Table {
>       time;
>       Table {
>         particle_id;
>         lat, lon, z;
>         data;
>         data2;
>       }
>     }
>
> So, following that line of thought, the possibilities in CDL are:
>
> 1) If avg number of particles ~ max number of particles at any time step, then one could use multidimensional arrays:
>
>     dimensions:
>       maxParticles = 1000 ;
>       time = ; // may be UNLIMITED
>     variables:
>       double time(time) ;
>       int particle_id(time, maxParticles) ;
>       float lon(time, maxParticles) ;
>       float lat(time, maxParticles) ;
>       float z(time, maxParticles) ;
>       float data(time, maxParticles) ;
>     attributes:
>       :featureType = particleTrack;
>
> Note that maxParticles is the max number of particles at any one time step, not the total number of particle tracks. The particle trajectories have to be found by examining the values of particle_id(time, maxParticles).
>
> 2) The CDL of the ragged case would
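
A minimal sketch of how trajectories could be recovered from the fixed-size layout in option 1, in plain Python with nested lists standing in for the netCDF variables. The fill value -1 for unused slots and the helper name trajectories_from_padded are assumptions made for illustration; the proposal does not say how empty slots are marked.

```python
from collections import defaultdict

FILL = -1  # assumed marker for unused slots in particle_id (not in the proposal)


def trajectories_from_padded(time, particle_id, lon, lat):
    """Regroup time-major arrays shaped (time, maxParticles) into per-particle tracks."""
    tracks = defaultdict(list)
    for t_index, t in enumerate(time):
        for slot, pid in enumerate(particle_id[t_index]):
            if pid == FILL:
                continue  # empty slot at this time step
            tracks[pid].append((t, lon[t_index][slot], lat[t_index][slot]))
    return dict(tracks)


# Three time steps, maxParticles = 3.
# Particle 7 is absent at the second step and shows up again at the third.
time = [0.0, 1.0, 2.0]
particle_id = [[7, 8, FILL], [8, FILL, FILL], [8, 7, 9]]
lon = [[10.0, 11.0, 0.0], [11.1, 0.0, 0.0], [11.2, 10.3, 12.0]]
lat = [[60.0, 61.0, 0.0], [61.1, 0.0, 0.0], [61.2, 60.3, 62.0]]

tracks = trajectories_from_padded(time, particle_id, lon, lat)
```

Because a particle's slot can change between time steps, the regrouping has to go by the stored particle_id values and not by slot position, which is why every time step is scanned.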
[CF-metadata] FW: netcdf for particle trajectories
Hi folks,

I kind of lost track of our latest discussions and had the feeling that this was partly outside the mailing group, so I will try to sum up what we were discussing. My latest try was to produce NetCDF for particle trajectories, trying to write out the concentration grid, which resulted in an 11 GB netCDF3 file :-( So we have different motivations for discussing particle trajectories and netCDF4.

First question: Does anybody know if, and if so when, writing netCDF4 will be incorporated into the NetCDF Java library? Or will we use Python with the help of Jython etc. (http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?

Second question: Is there a de facto standard / proposal for writing particle trajectory data which could be CF:featureType: whatever we agree on? The suggestion below is not suitable because:
1) we don't track a particle the whole time; it may disappear and show up again later, but if I have 1000 particles in time step 1 and 1000 in time step 2 we cannot be sure these 1000 are the same as before.
2) I cannot know the number of time steps in advance.

I would like something like

    dimensions:
        particle = UNLIMITED; // because it may change each time step
        time = UNLIMITED; // because I don't know

then every variable is like

    latitude (particle, time)
    longitude (particle, time)

and I might have

    int number_particles_per_timestep(time);
        :units = 1;
        :long_name = number particles per current timestep;
        :CF:ragged_row_count = particle;

That some of you need to know which spill a particle came from may be solved with a third dimension, spill:

    dimensions:
        spill = 3; // or how many one has
        particle = UNLIMITED; // because it may change each time step
        time = UNLIMITED; // because I don't know

    particle (spill, time)

then every variable is like

    latitude (particle)
    longitude (particle)

How would one write this? With coordinates or as a hierarchical data structure? At least we need the ability to use several unlimited dimensions and the ragged-array feature.
Third question: How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 directly with hierarchical data? As in my example above, I would need to write out an 11 GB file and then deflate it as described here http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html or with Rich's script; but is that really necessary?

Hoping to get the discussion going again and that we agree on a standard quite soon!

Have a nice weekend!
Best, Ute

-------- Original Message --------
Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation data in CF 1.4)
Date: Fri, 19 Nov 2010 04:15:35 +0100
From: John Caron <ca...@unidata.ucar.edu>
To: cf-metadata@cgd.ucar.edu <cf-metadata@cgd.ucar.edu>

I'm thinking that we need a new feature type for this. I'm calling it particleTrack but there's probably a better name. My reasoning is that the nested table representation of trajectories is:

    Table {
      traj_id;
      Table {
        time;
        lat, lon, z;
        data;
      }
    }

but this case has the inner and outer table inverted:

    Table {
      time;
      Table {
        particle_id;
        lat, lon, z;
        data;
        data2;
      }
    }

So, following that line of thought, the possibilities in CDL are:

1) If avg number of particles ~ max number of particles at any time step, then one could use multidimensional arrays:

    dimensions:
      maxParticles = 1000 ;
      time = ; // may be UNLIMITED
    variables:
      double time(time) ;
      int particle_id(time, maxParticles) ;
      float lon(time, maxParticles) ;
      float lat(time, maxParticles) ;
      float z(time, maxParticles) ;
      float data(time, maxParticles) ;
    attributes:
      :featureType = particleTrack;

Note that maxParticles is the max number of particles at any one time step, not the total number of particle tracks. The particle trajectories have to be found by examining the values of particle_id(time, maxParticles).
2) The CDL of the ragged case would look like:

    dimensions:
      obs = 50; // UNLIMITED
      time = ;
    variables:
      int time(time) ;
      int rowSize(time) ;
      int particle_id(obs) ;
      float lon(obs) ;
      float lat(obs) ;
      float z(obs) ;
      float data(obs) ;
    attributes:
      :featureType = particleTrack;

In this case, you don't have to know the max number of particles at any one time step, but you do need to know the number of time steps beforehand. The particle trajectories have to be found by examining the values of particle_id(obs). The particles at time step i are contained in the obs variables from start(i) to start(i) + rowSize(i).

These layouts are optimized for processing all particles at a given time, and for sequentially processing time steps. If one wanted to process particle trajectories, that would be much slower. If you needed to do it a lot, you might want to rewrite the file. A more sophisticated application, possibly a server, could write an index to speed it up.
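
A minimal sketch of reading the ragged layout, in plain Python with lists standing in for the netCDF variables. The CDL above has no start variable, so this assumes start(i) is the running sum of rowSize over the earlier time steps and that the end index start(i) + rowSize(i) is exclusive. The helper names row_starts, particles_at and trajectory_of are hypothetical.

```python
def row_starts(row_size):
    """Assumed definition: start(i) = row_size[0] + ... + row_size[i-1]."""
    starts, total = [], 0
    for count in row_size:
        starts.append(total)
        total += count
    return starts


def particles_at(i, row_size, particle_id, lon, lat):
    """All particles present at time step i (the access this layout favours)."""
    start = row_starts(row_size)[i]
    stop = start + row_size[i]  # exclusive end index
    return list(zip(particle_id[start:stop], lon[start:stop], lat[start:stop]))


def trajectory_of(pid, time, row_size, particle_id, lon, lat):
    """One particle's track; needs a scan over every observation."""
    track = []
    for i, start in enumerate(row_starts(row_size)):
        for k in range(start, start + row_size[i]):
            if particle_id[k] == pid:
                track.append((time[i], lon[k], lat[k]))
    return track


# Three time steps holding 2, 1 and 3 particles, stored contiguously along obs.
time = [0, 1, 2]
row_size = [2, 1, 3]
particle_id = [7, 8, 8, 8, 7, 9]
lon = [10.0, 11.0, 11.1, 11.2, 10.3, 12.0]
lat = [60.0, 61.0, 61.1, 61.2, 60.3, 62.0]

step2 = particles_at(2, row_size, particle_id, lon, lat)
track7 = trajectory_of(7, time, row_size, particle_id, lon, lat)
```

Reading one time step touches a single contiguous slice, while following one particle walks the whole obs dimension, which matches the remark that trajectory processing is the slow path in this layout.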
Re: [CF-metadata] FW: netcdf for particle trajectories
You might prefer to try Nujan instead of mixing python and netcdf, although variables are limited to 2 GB: http://www.ral.ucar.edu/~steves/nujan.html

On Fri, Nov 25, 2011 at 11:01 AM, Ute Brönner <ute.broen...@sintef.no> wrote:
> Hi folks,
>
> I kind of lost track of our latest discussions and had the feeling that this was partly outside the mailing group, so I will try to sum up what we were discussing. My latest try was to produce NetCDF for particle trajectories, trying to write out the concentration grid, which resulted in an 11 GB netCDF3 file :-( So we have different motivations for discussing particle trajectories and netCDF4.
>
> First question: Does anybody know if, and if so when, writing netCDF4 will be incorporated into the NetCDF Java library? Or will we use Python with the help of Jython etc. (http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF4?
>
> Second question: Is there a de facto standard / proposal for writing particle trajectory data which could be CF:featureType: whatever we agree on? The suggestion below is not suitable because:
> 1) we don't track a particle the whole time; it may disappear and show up again later, but if I have 1000 particles in time step 1 and 1000 in time step 2 we cannot be sure these 1000 are the same as before.
> 2) I cannot know the number of time steps in advance.
>
> I would like something like
>
>     dimensions:
>         particle = UNLIMITED; // because it may change each time step
>         time = UNLIMITED; // because I don't know
>
> then every variable is like
>
>     latitude (particle, time)
>     longitude (particle, time)
>
> and I might have
>
>     int number_particles_per_timestep(time);
>         :units = 1;
>         :long_name = number particles per current timestep;
>         :CF:ragged_row_count = particle;
>
> That some of you need to know which spill a particle came from may be solved with a third dimension, spill:
>
>     dimensions:
>         spill = 3; // or how many one has
>         particle = UNLIMITED; // because it may change each time step
>         time = UNLIMITED; // because I don't know
>
>     particle (spill, time)
>
> then every variable is like
>
>     latitude (particle)
>     longitude (particle)
>
> How would one write this? With coordinates or as a hierarchical data structure? At least we need the ability to use several unlimited dimensions and the ragged-array feature.
>
> Third question: How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 directly with hierarchical data? As in my example above, I would need to write out an 11 GB file and then deflate it as described here http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html or with Rich's script; but is that really necessary?
>
> Hoping to get the discussion going again and that we agree on a standard quite soon!
>
> Have a nice weekend!
> Best, Ute
>
> -------- Original Message --------
> Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation data in CF 1.4)
> Date: Fri, 19 Nov 2010 04:15:35 +0100
> From: John Caron <ca...@unidata.ucar.edu>
> To: cf-metadata@cgd.ucar.edu <cf-metadata@cgd.ucar.edu>
>
> I'm thinking that we need a new feature type for this. I'm calling it particleTrack but there's probably a better name.
> My reasoning is that the nested table representation of trajectories is:
>
>     Table {
>       traj_id;
>       Table {
>         time;
>         lat, lon, z;
>         data;
>       }
>     }
>
> but this case has the inner and outer table inverted:
>
>     Table {
>       time;
>       Table {
>         particle_id;
>         lat, lon, z;
>         data;
>         data2;
>       }
>     }
>
> So, following that line of thought, the possibilities in CDL are:
>
> 1) If avg number of particles ~ max number of particles at any time step, then one could use multidimensional arrays:
>
>     dimensions:
>       maxParticles = 1000 ;
>       time = ; // may be UNLIMITED
>     variables:
>       double time(time) ;
>       int particle_id(time, maxParticles) ;
>       float lon(time, maxParticles) ;
>       float lat(time, maxParticles) ;
>       float z(time, maxParticles) ;
>       float data(time, maxParticles) ;
>     attributes:
>       :featureType = particleTrack;
>
> Note that maxParticles is the max number of particles at any one time step, not the total number of particle tracks. The particle trajectories have to be found by examining the values of particle_id(time, maxParticles).
>
> 2) The CDL of the ragged case would look like:
>
>     dimensions:
>       obs = 50; // UNLIMITED
>       time = ;
>     variables:
>       int time(time) ;
>       int rowSize(time) ;
>       int particle_id(obs) ;
>       float lon(obs) ;
>       float lat(obs) ;
>       float z(obs) ;
>       float data(obs) ;
>     attributes:
>       :featureType = particleTrack;
>
> In this case, you don't have to know the max number of particles at any one time step, but you do need to know the number of time steps beforehand. The particle trajectories have to be found by examining the values of particle_id(obs). The particles at time step i are contained in the obs variables from start(i) to start(i) + rowSize(i).
>
> These layouts are optimized for processing all particles at a given