Re: [CF-metadata] Swath observational data

2009-11-20 Thread John Caron

Raskin, Rob (388M) wrote:

While the Point observational conventions document is undergoing final review, 
I want to initiate a discussion on a complementary topic - Swath observational 
conventions. This model addresses satellite observational measurements and 
potentially airborne measurements.

The Swath conceptual model is essentially a grid in spacecraft coordinates. One dimension of this grid (along_track) follows 
the path of the satellite. Normally there are one or two additional dimensions: cross_track and/or vertical. The 
cross_track dimension is perpendicular to the satellite path, as the instrument typically makes side views of the 
surface rather than just at the nadir. The vertical dimension is present when a vertical profiler instrument is used. 
CF:FeatureType will need to account for each possible combination of these 2-D and 3-D swaths.

Typically, time is explicitly stored and associated only with the along-track dimension. Spatial resolution generally will differ in the along_track and cross_track directions. 

Orbits are not mapped to files in any consistent way: a file might correspond to a complete orbit, a half-orbit, or some other extent. However, it is common to explicitly consider yet another dimension: satellite_node, with values ascending (crosses equator going northward) and descending (crosses equator going southward).


Common satellites are in sun-synchronous polar orbits such that the ascending node 
remains at a near constant Local Solar Time (LST), while the descending node remains at a 
near constant LST shifted by 12 hours. For example, the ascending node may be at 6am LST 
and the descending node at 6pm LST. Often gridded data products are produced from these 
swaths, with separate grids corresponding to the AM and PM cases. A new CF time 
representation for LST is required to indicate that the global data are all 
at a time such as 6am LST.
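
As a rough illustration of the LST idea (not a proposed CF time representation), mean local solar time can be approximated from UTC and longitude at 15 degrees per hour. This sketch ignores the equation of time, and the function name is hypothetical:

```python
def mean_local_solar_time(utc_hours, lon_degrees_east):
    """Approximate mean local solar time in hours, in [0, 24).

    Assumes 15 degrees of longitude per hour and ignores the
    equation of time, so it is only a first-order estimate.
    """
    return (utc_hours + lon_degrees_east / 15.0) % 24.0


# The same local solar time is observed at different UTC times
# depending on longitude.
print(mean_local_solar_time(0.0, 90.0))    # 90E at 00:00 UTC
print(mean_local_solar_time(12.0, -90.0))  # 90W at 12:00 UTC
```

Both calls describe a 6am LST observation, which is why a gridded AM product can hold one LST while spanning many UTC times.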

Unrelated to the swath geometry, some measurements use spectral band as an independent variable, as they 
sample at multiple channels. This capability requires a new standard name for 
spectral_band or spectral_channel with values that may be numeric, a numeric range, 
or string.

Swath data include many new dependent variables that correspond to engineering 
parameters of the retrieval rather than geophysical parameters (point spread 
function is a common example). If these names are standardized at all, they 
should be indicated as being of the engineering type.

In the case of an airborne (rather than satellite) measurement, there is more commonality 
with the trajectory representation from the Point observation model. Hence, 
the focus here is on spacecraft measurements.

Finally, on an unrelated note, I have semantically mapped the entire CF 
Standard Name list to an ontological representation. But that is the subject of 
a separate communication.

-Rob



Hi Rob, thanks for starting this up.

We have done some preliminary thinking about the swath feature type in the 
CDM data model, though we don't have any implementations.

A prototype coordinate system would look something like:  


dimensions:
 scan = 1234;
 xscan = 987;
 wavelength = 123;

variables:
 double lon(scan, xscan);
 double lat(scan, xscan);
 double alt(scan, xscan);
 double time(scan);
 double wavelength(wavelength);

 byte data(scan, xscan);
   data:coordinates = "lon lat time alt";

 byte spectral(scan, xscan, wavelength);
   spectral:coordinates = "lon lat time alt";


I think this should handle zigzags or grids, although perhaps adding a scan 
strategy attribute would be good.
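
A minimal Python sketch of how the prototype's coordinates attach to one sample, assuming tiny made-up array sizes and values. The helper name sample_coordinates is hypothetical and not part of any CDM or CF API:

```python
# Toy swath: 3 scans along track, 2 pixels across track (values are invented).
N_SCAN, N_XSCAN = 3, 2

# 2-D coordinates, indexed [scan][xscan], as in lon(scan, xscan) etc.
lon = [[10.0 + 0.5 * x + 0.1 * s for x in range(N_XSCAN)] for s in range(N_SCAN)]
lat = [[50.0 + 0.2 * s for _ in range(N_XSCAN)] for s in range(N_SCAN)]
alt = [[0.0 for _ in range(N_XSCAN)] for _ in range(N_SCAN)]

# 1-D coordinate: one time per scan line, as in time(scan).
time = [0.0, 1.5, 3.0]


def sample_coordinates(scan, xscan):
    """Return (lon, lat, time, alt) for data(scan, xscan).

    Hypothetical helper: lon, lat and alt vary with both indices,
    while time depends only on the along-track (scan) index.
    """
    return (lon[scan][xscan], lat[scan][xscan], time[scan], alt[scan][xscan])


coords = sample_coordinates(2, 1)
print(coords)
```

Nothing here assumes a regular grid: because lon and lat are stored per sample, a zigzag scan only changes the values, not the structure.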

The geometry of each point is an interesting wrinkle, and may need some new 
conventions. Would a rotated ellipse work (3 params) or do we need a more 
general polygon? Does it have to be specified per point, or can it be common to 
all points? I would imagine that quick visualizers might ignore the details of 
this (essentially assuming a tessellating grid), but more sophisticated and 
specialized tools would need this.


___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


[CF-metadata] Multiple file datasets (was: Swath observational data)

2009-11-20 Thread John Caron

This topic deserves its own heading, so here it is.

Perhaps we should gather current practices and ideas. I think Balaji's gridspec 
has a proposal about this. Can anyone summarize what SAFE does?

I'm imagining how this is actually used, e.g.:

float data(y,x);
 data:coordinates = l...@file1 l...@file2;





John Graybeal wrote:

I like Bryan's recommendation for a UUID or similar.

Now I'm going to be annoying and suggest the UUID *could* be a URI, or 
these days, an IRI (International ..).


And I think the way of 'locating' the file should be neither in 
packaging nor in local resolution; it should be in global namespace 
resolution.  This is the way of the future, and is already more 
'permanent' than either packaging or local resolution, IMHO.


There is one form of URI in particular that is already resolvable: a 
URL.  OK, that's an old song, but I'm gonna stick to it for a while 
longer.  That form meets all the other requirements: it can be 
registered in a resolver, it can be guaranteed unique (to the same 
authority level as a UUID, anyway), and it is a unique string that can 
be used to validate the link).  And it has the obvious benefit of being 
resolvable right now, for as long as the domain is held and properly 
maintained (Good URLs don't die).


Since the last paragraph risks starting another unique identifier war, I 
promise not to re-engage unless someone asks me to. Meanwhile, I like


John


On Nov 19, 2009, at 22:23, Bryan Lawrence wrote:


On Thursday 19 November 2009 19:40:08 Jonathan Gregory wrote:

...  In  some cases, referencing attributes such as
 coordinates and ancillary_variables would, ideally, point to a
 variable in a different dataset.


This is a general problem to which CF doesn't have a solution because it was
conceived as a convention for single netCDF files. However we need a solution
as often several files should be treated as a single dataset.

If the files don't overlap i.e. their contents are complementary, I think it
should be satisfactory to allow variables in one file to be pointed to by name
from another file, with no other mechanism being required within the file. I
don't like the idea of naming one file within another file, as that would be
very fragile. Instead, I think the file aggregation should be implied by
simply defining the group of files which are to be treated as one file e.g.
by putting them in one directory.


It's the old ones that are the best ones :-) :-)  this issue keeps on 
coming back ... :-) :-) and we keep trying to ignore it ...


I think we agree that an actual physical filename including path is 
useless. We need both  a relative link which relies on the 
preservation of a group of files in a particular arrangement ...  AND 
an internal identifier so more robust linking mechanisms can be used 
when (if) the data ends up in a managed environment.


I think it's crucial in this situation to ensure that each file has a 
unique identifier within it (created, for example, with uuid), because 
all solutions which rely on packaging are fragile (SAFE is probably 
better than most), but the bottom line is that users move files around 
... and we need some way of ensuring that we/they can validate the 
links that are in place are the ones that were originally intended.


So relative links would also include the identifier of the intended 
target as well as the relative path in operating system agnostic terms.


That identifier can be used in two ways: a) to validate the link (my 
software can always check that the variable that I just opened 
following a link from another one is the one that was expected by 
checking the container identifier), and b) to produce an identifier 
resolver service for the situation where the packaging has had to be 
broken (which might occur for performance reasons or ...)
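
The link validation described above could be sketched roughly as follows. This is only an illustration using in-memory dictionaries in place of netCDF files; the attribute name file_id, the link layout, and the function follow_link are all assumptions, not anything CF defines:

```python
import uuid

# Simulated files: relative path -> contents with an embedded identifier.
# The attribute name "file_id" is an assumption for illustration.
geo_id = str(uuid.uuid4())
files = {
    "geo.nc": {"file_id": geo_id, "variables": {"lat": [1.0, 2.0]}},
}

# A link records the relative path, the variable name, and the identifier
# of the file it was originally meant to point at.
link = {"path": "geo.nc", "variable": "lat", "target_id": geo_id}


def follow_link(link, files):
    """Open the linked variable, checking the container identifier first."""
    target = files.get(link["path"])
    if target is None:
        raise FileNotFoundError(link["path"])
    if target["file_id"] != link["target_id"]:
        raise ValueError("linked file is not the one originally intended")
    return target["variables"][link["variable"]]


print(follow_link(link, files))
```

If a user replaces geo.nc with a different file of the same name, the identifier check fails instead of silently returning the wrong variable.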


CF could recommend something like this ...

Bryan

--
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848;
Web: home.badc.rl.ac.uk/lawrence



--
I have my new work email address: jgrayb...@ucsd.edu
--

John Graybeal   mailto:jgrayb...@ucsd.edu
phone: 858-534-2162
Development Manager
Ocean Observatories Initiative Cyberinfrastructure Project: 
http://ci.oceanobservatories.org

Marine Metadata Interoperability Project: http://marinemetadata.org



Re: [CF-metadata] Multiple file datasets (was: Swath observational data)

2009-11-20 Thread Stephen Emsley
 Can anyone summarize what SAFE does?

I will give it a shot as I brought it up in the first place!

The Standard Archive Format for Europe (SAFE) was developed as a common format 
for archiving to ensure long-term preservation of EO data holdings, both 
historical and operational. The SAFE website [www.esa.int/safe] is the official 
ESA maintained site for the maintenance and distribution of the standard 
format, specification, XML-schemas and tools.

SAFE is a specialisation of the XML Formatted Data Unit (XFDU), a CCSDS 
(Consultative Committee for Space Data Systems) recommended standard for the 
packaging of data and metadata to facilitate information transfer and 
archiving. Every SAFE product is an XFDU package. SAFE is a specialisation of 
XFDU, which defines a restriction of the generic XFDU package. SAFE inherits 
its main structure from XFDU packaging format and defines high level 
constraints and new rules for Earth Observation ground segment data products.

A SAFE product wraps, or references, data and associates that data with 
metadata, both global and local. SAFE product metadata contains basic 
information, such as the acquisition period, platform and sensor identification 
and a processing history to ensure traceability. For each included, or external 
referenced, dataset another layer of associated metadata may be attached 
providing orbit and geo-location information, quality information and 
representational information.

Basically a SAFE product is a directory. At the top level is a manifest file, 
written in XML, that provides both a map of the contained data sets, defines 
the relationships between these datasets, and contains global metadata (such as 
platform name, acquisition period etc.). There is a set of required metadata 
defined by the SAFE specialisation (e.g. there is an ENVISAT specialisation, 
further restricted to apply to, say, MERIS, and still further specialised to, 
say, Level 1 processed products).

The contained datasets are collections of records. They are of three types:

Measurement Data Sets: These are typically binary format files and, in our 
case, will be netCDF-CF files. As an example we will have 46 measurement data 
products and each will be stored as a netCDF file (data record) along with a 
data record containing associated quality information and another containing 
status flags.

Annotation Data Sets: These contain metadata and common data. Although to be 
decided in the case of Sentinel 3 Level 2 we are considering storing a common 
set of coordinate data that is applicable to subsets of the measurement data. 
The manifest file will provide the association between specific measurement 
datasets and the associated coordinate data.

Representation Data Sets: These are XML Schema descriptions of the measurement 
and annotation datasets. Firstly it is a key concept for OAIS digital 
preservation and secondarily third party applications may use these for 
displaying / accessing the corresponding measurement data sets. I appreciate 
that it might seem a little 'belt-and-braces' to have an XML schema for a 
netCDF file (which is by nature self-describing) but that is how the SAFE 
people have decided to include netCDF into the convention.

There is a further type of data which can be considered as resources. These may 
be, for instance, data required for the generation of the end-user data 
products. For instance, for Level 2 data products they would include the Level 
1 input products and possibly, for instance, ECMWF data required for processing 
(although the latter might equally be an annotation dataset). These resources 
are not packaged inside a SAFE container but are referenced (in the manifest 
file) using a URI.

All of these taken together are a SAFE package.

I hope that this provides a reasonably informative overview. The SAFE website 
is the place to go for more detailed info.

Steve


---
Dr Stephen Emsley       
Tel: +44 (0)1752 764 289
  ARGANS Limited  
Mobile: +44 (0)7912 515 418


-Original Message-
From: cf-metadata-boun...@cgd.ucar.edu 
[mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of John Caron
Sent: 20 November 2009 12:30
To: cf-metadata@cgd.ucar.edu
Subject: [CF-metadata] Multiple file datasets (was: Swath observational data)


Re: [CF-metadata] [CF Metadata] #37: Conventions for Point Observation Data

2009-11-20 Thread John Caron

Martina Stockhause wrote:

Hi John,

right thanks, we could describe several z coordinates. In our case with
z dimensions:

dimensions:
station = 8 ;
time = UNLIMITED ;
lon = 1;
lat = 1;
z0 = 1;     // e.g. VTP in 110 m
z1 = 7;     // MINERVA
z2 = 1050;  // MRR (rain radar)

The constructors of meteorological instruments weren't able to design an
instrument beautiful enough to be called APHRODITE, yet.

Nevertheless we probably will stay with separated files for each
instrument type to keep the files simple and their contents close to our
provided ASCII versions.

Thanks a lot,
Martina


Yes, very good. Just checking that if you wanted to store 
multiple instruments in one file, the proposal would work.



Re: [CF-metadata] Multiple file datasets (was: Swath observational data)

2009-11-20 Thread Stephen Emsley
Sorry, my mistake:

http://earth.esa.int/SAFE

is the correct address for the SAFE web site


---
Dr Stephen Emsley       
Tel: +44 (0)1752 764 289
  ARGANS Limited  
Mobile: +44 (0)7912 515 418



Re: [CF-metadata] Multiple file datasets (was: Swath observational data)

2009-11-20 Thread V. Balaji

The gridspec indeed had a proposal about this. Clearly it was a bit
off-topic, but some mechanism of referring to other files was needed. It
consists of an attribute called a link_spec, which has attributes of a
baseURL, a relative pathname, and a checksum for verifying whether the
external file being referenced is indeed the one you're looking for.
There wasn't a special v...@link syntax, but I don't see why it couldn't
have had one.
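
A small sketch of how such a link might be resolved and checked. The dictionary keys, the example URL, and the choice of SHA-256 are assumptions for illustration; the gridspec text quoted here does not name a checksum algorithm:

```python
import hashlib
from urllib.parse import urljoin


def resolve_link(base_url, relative_path):
    """Combine a base URL and a relative pathname into one location."""
    return urljoin(base_url, relative_path)


def checksum_matches(data, expected_hex):
    """Check that the bytes of the external file match the recorded checksum.

    SHA-256 is an assumption; the thread does not name an algorithm.
    """
    return hashlib.sha256(data).hexdigest() == expected_hex


# Hypothetical link_spec contents (names and URL are illustrative only).
payload = b"pretend these are the bytes of grid.nc"
link_spec = {
    "baseURL": "http://example.org/data/",
    "relative_path": "grids/grid.nc",
    "checksum": hashlib.sha256(payload).hexdigest(),
}

print(resolve_link(link_spec["baseURL"], link_spec["relative_path"]))
print(checksum_matches(payload, link_spec["checksum"]))
```

The checksum plays the same role as the embedded identifier discussed earlier in the thread: it lets software confirm that the file found at the location is the one that was meant.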

CMIP5 is proposing a simplified variant on the link_spec. A file
can have a global attribute associated_files, whose entries are also
formed out of a baseURL and relative pathnames. The only permitted
associated_files are gridspec, and cell areas and volumes that may
be used in cell_methods.

Other approaches have been proposed in this forum, most notably on Trac
#24 and #27, the common_concept thread and Benno's namespace thread.

SAFE has been explained already in this thread.

I agree with John, it would be good to consider this problem in
isolation, without the baggage of gridspecs or common concepts or
namespaces.


Re: [CF-metadata] Multiple file datasets

2009-11-20 Thread John Caron

Thanks, Steve, for the summary.

A quick perusal of the SAFE spec for our purposes indicates that the referenced 
file is a full path HTTP URL:

 The fileLocation element specifies an HTTP GET URL to request the latest version 
of data from an online registry/repository.

I suppose we are interested only in local netCDF files?



Re: [CF-metadata] Swath observational data

2009-11-20 Thread tjn98
Dear All,

I think there may be two distinct cases here:

1) Local cross-referencing, where it is only necessary to establish
   a relationship within a well-defined grouping of files,

2) Referencing to a universal resource, such as a specific file
   held on a server.

For the former, it should only be necessary that every NetCDF file
within the grouping holds the same unique identifier (this could
be the product or group name, or an ID string from a managed source).
Satellite swath products, where they have this sort of structure,
almost always fall in the first category. In general, a user would
want to use his or her local copy of a file, rather than re-download
a remote file.

This may be redundant by now, but my thoughts were that:

1) We only consider whether we can extend cross-referencing within
   a local scope,

2) All related files within the scope should contain the same unique
   identifier, perhaps a global attribute named something like
   "cross_reference_ID".

3) Referenced variable names within the scope should be unique and so
   do not need modifiers. An alternative is that modifiers are not
   needed in references by default, but could be included to
   disambiguate variables - perhaps in a form like "geo:latitude"
   where geo.nc is the file containing the required latitude variable.

If the attribute contains an empty string or is absent, CF compliant
systems only look for referenced variables within the same file, as
at the moment. If present, the system is allowed to search other files
within a limited scope, containing the same ID.

One possibility is that scope could be modified with, perhaps,
unix-like relative directory prefixes to the ID, so that

  :cross_reference_ID = "my_unique_id";

refers just to files in the same directory, whereas

  :cross_reference_ID = "../*/my_unique_id";

refers to all files held under the parent directory and its
subdirectories, and so on.
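
To make the proposal concrete, here is a rough Python sketch of how a reader might interpret the ID and the optional variable modifier. Both function names are hypothetical, and the mapping from the modifier "geo" to the file name "geo.nc" simply follows the example given above:

```python
import posixpath


def split_cross_reference_id(value):
    """Split an ID like "../*/my_unique_id" into (scope, identifier).

    An ID without a directory prefix gets scope ".", meaning the
    directory of the current file.
    """
    scope, identifier = posixpath.split(value)
    return (scope or ".", identifier)


def split_variable_reference(ref):
    """Split "geo:latitude" into ("geo.nc", "latitude").

    A reference without a modifier returns (None, name), meaning any
    file in scope that carries the same ID may be searched.
    """
    if ":" in ref:
        modifier, name = ref.split(":", 1)
        return (modifier + ".nc", name)
    return (None, ref)


print(split_cross_reference_id("my_unique_id"))
print(split_cross_reference_id("../*/my_unique_id"))
print(split_variable_reference("geo:latitude"))
```

A real implementation would then expand the scope pattern on disk and keep only the files whose global attribute carries the same identifier.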

If the purpose of the ID is only to disambiguate local files, then
form and integrity of the ID string itself could probably be left
to the discretion of the data provider, since it would only need to
be checked within a defined scope. More rigorous implementations
are a bit beyond my experience.

Anybody who's interested can find the SAFE format definition at
earth.esa.int/SAFE. You should probably enjoy UML diagrams to
appreciate it fully. Note that the format doesn't discuss NetCDF
in particular - this is just the format that Sentinel-3 is adopting
for its data containers.

  Tim.




On 20/11/2009 06:23, Bryan Lawrence bryan.lawre...@stfc.ac.uk wrote:

 On Thursday 19 November 2009 19:40:08 Jonathan Gregory wrote:
 ... In some cases, referencing attributes such as
 coordinates and ancillary_variables would, ideally, point to a
 variable in a different dataset.
  
  This is a general problem to which CF doesn't have a solution because it
  was conceived as a convention for single netCDF files. However we need a
  solution as often several files should be treated as a single dataset.
  
  If the files don't overlap i.e. their contents are complementary, I think
  it should be satisfactory to allow variables in one file to be pointed to
  by name from another file, with no other mechanism being required within
  the file. I don't like the idea of naming one file within another file, as
  that would be very fragile. Instead, I think the file aggregation should
  be implied by simply defining the group of files which are to be treated
  as one file e.g. by putting them in one directory.
 
 It's the old ones that are the best ones :-) :-)  this issue keeps on coming
 back ... :-) :-) and we keep trying to ignore it ...
 
 I think we agree that an actual physical filename including path is useless.
 We need both a relative link which relies on the preservation of a group of
 files in a particular arrangement ...  AND an internal identifier so more
 robust linking mechanisms can be used when (if) the data ends up in a managed
 environment.
 
 I think it's crucial in this situation to ensure that each file has a unique
 identifier within it (created, for example, with uuid), because all solutions
 which rely on packaging are fragile (SAFE is probably better than most), but
 the bottom line is that users move files around ... and we need some way of
 ensuring that we/they can validate the links that are in place are the ones
 that were originally intended.
 
 So relative links would also include the identifier of the intended target as
 well as the relative path in operating system agnostic terms.
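
 A minimal Python sketch of such a link, assuming each file stores its
 identifier in a global attribute. The attribute name "file_id" and the
 Link structure are hypothetical, chosen only for illustration; the
 target's attributes are passed in as a plain dictionary.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A cross-file reference: where to look, and what we expect to find."""
    relative_path: str  # POSIX-style, e.g. "../geo/geo.nc"
    variable: str       # name of the referenced variable
    target_id: str      # identifier expected inside the target file


def new_file_id():
    """Create a unique identifier to store in a file when it is written."""
    return str(uuid.uuid4())


def validate(link, target_attrs):
    """True if the opened target carries the identifier the link expects.

    target_attrs is the dictionary of global attributes read from the file
    found by following link.relative_path.
    """
    return target_attrs.get("file_id") == link.target_id
```

 If a user has moved or replaced the target file, validate() returns
 False and the software can refuse to follow the link.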
 
 That identifier can be used in two ways: a) to validate the link (my software can
 always check that the variable that I just opened following a link from
 another one is the one that was expected by checking the container
 identifier), and b) to produce an identifier resolver service for the
 situation where the packaging has had to be broken (which might occur for
 performance reasons 

Re: [CF-metadata] Swath observational data

2009-11-20 Thread Thomas Lavergne
Dear John,

- John Caron ca...@unidata.ucar.edu wrote:

 The geometry of each point is an interesting wrinkle, and may need
 some new conventions. Would a rotated ellipse work (3 params) or do we
 need a more general polygon? Does it have to be specified per point,
 or can it be common to all points? I would imagine that quick
 visualizers might ignore the details of this (essentially assuming a
 tessellating grid), but more sophisticated and specialized tools would
 need this.

I do not think the FOV (field of view) of a single point should be described as 
projected on the Earth's surface (rotated ellipse and/or polygon), if this is what 
you meant. It should come as a response function of angular incoming radiation. 
This response function might be a formula (2D Gaussian, weighted sum of 2D 
Gaussians, etc.) or given as a look-up table. The Earth-projected geometry 
will then be a function of the view angle, Earth topography, integration 
(photon counting) period, etc. We should definitely be able to have the response 
function vary within the scan array.
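
To make the formula case concrete, here is a minimal Python sketch of an angular response function expressed as a 2D Gaussian and as a weighted sum of 2D Gaussians. The function names, the choice of angular offsets from the boresight as arguments, and the lack of normalisation are all assumptions made for illustration, not a proposed convention.

```python
import math


def gaussian_response(d_along, d_cross, sigma_along, sigma_cross):
    """Unnormalised 2D Gaussian response at an angular offset from boresight.

    d_along, d_cross: angular offsets (same unit as the sigmas).
    """
    return math.exp(
        -0.5 * ((d_along / sigma_along) ** 2 + (d_cross / sigma_cross) ** 2)
    )


def weighted_sum_response(d_along, d_cross, components):
    """Weighted sum of 2D Gaussians.

    components: iterable of (weight, sigma_along, sigma_cross) tuples.
    """
    return sum(
        w * gaussian_response(d_along, d_cross, sa, sc)
        for w, sa, sc in components
    )


# A response function that varies within the scan array can be held as
# one parameter set per scan position (values here are made up).
per_position = [
    [(1.0, 0.010, 0.010)],                      # near nadir
    [(0.8, 0.010, 0.014), (0.2, 0.03, 0.03)],   # towards the swath edge
]
```

Storing one parameter set per scan position, as in per_position, is one way the response could vary across the scan; a look-up table would replace the formula with tabulated values.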

I think we are entering a terribly complex (and interesting) subject when 
defining a Feature for these space- and airborne observational data. The 
question is then where we should put the limit in complexity, and what the 
scope is: do we aim at encoding the spacecraft instrument engineer's point of view 
or the geophysical data user's point of view? 

Cheers,
Thomas
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] CF point observation Conventions

2009-11-20 Thread Nan Galbraith




Thanks, Roy. 

There's something not quite symmetrical in this, either - maybe it's 
"just" terminology, maybe not. 

A time series is conceptually identical to a profile, just "turned on
its side" so time is the single incrementing dimension, instead of depth.
The difference turns out to be important in the proposal mainly because
of the way we'd aggregate profiles vs time series.

A point, in my lexicon, is an atomic unit, a single measurement at a
single x,y,z,t. Is there a "single point" in your feature types? Why
assign the term point to a set of measurements with single x, y, and z
and progressing t, as opposed to a set of measurements with single
x, y, t values but varying z?

Cheers - 
Nan


Lowry, Roy K wrote:

  
..

The feature terms we use for observational data in BODC are:

Profile - single set of measurements with single (by assumption) x, y, t values but varying spatial z.  An example is a single, fully processed (i.e. binned) CTD cast.

Profile collection - an aggregation of profiles into a single data object.  An example is all the CTDs from a section or a cruise.

Profile series - a set of measurements with single x, y, a fixed set of spatial z values and progressing t. An example is a single moored ADCP deployment record.

Point - a set of measurements with single x, y, and z and progressing t.  Example is a single moored recording current meter record.

Point collection - an aggregation of point features in a single container.  Example is all the records from all the current meters on a mooring or deployed on a cruise.

Spectrum series - a set of measurements with single x, y, a fixed set of non-spatial z values and progressing t. An example is a power spectrum time series from a wave recorder.

2D-trajectory - a set of measurements with variable x, y, t and a single  spatial z.  Example is the thermosalinograph record from a cruise.

3D-trajectory - a set of measurements with variable x, y, z and t.  Example is the thermosalinograph record from an AUV mission.  It is also applicable to a yo-yo CTD station, mirroring Chris's comments on atmospheric "profiles" with variant x,y.
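
One way to read the single-feature terms above is as a lookup keyed on which coordinates vary within a feature. The Python sketch below is only an illustration: the table and the classify function are hypothetical, the collection types are left out, and it assumes that a 3D-trajectory is meant to have varying z as well as x, y and t.

```python
# Coordinates that vary within one feature -> BODC feature term.
FEATURES = {
    frozenset("z"): "Profile",
    frozenset("zt"): "Profile series",
    frozenset("t"): "Point",
    frozenset("xyt"): "2D-trajectory",
    frozenset("xyzt"): "3D-trajectory",  # assumption: z varies here
}


def classify(varying, z_is_spatial=True):
    """Return the feature term for a set of varying coordinates.

    varying: iterable of coordinate names drawn from 'x', 'y', 'z', 't'.
    z_is_spatial: False when z is a non-spatial axis such as frequency.
    Returns None when no term in the list matches.
    """
    name = FEATURES.get(frozenset(varying))
    if name == "Profile series" and not z_is_spatial:
        return "Spectrum series"
    return name
```

For example, classify({"z", "t"}) gives "Profile series", while the same call with z_is_spatial=False gives "Spectrum series".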

I think that Nan and most of the observational oceanographic community recognise these concepts and consequently, if a mapping from them to your feature definitions is maintained then it will help keep us on board.

Note that the difference between 'point' and 'point collection' is important to me as an observational data manager, which is a different perspective to an observational data ingestor.

Cheers, Roy.


  
  

--
***
* Nan Galbraith  (508) 289-2444 *
* Upper Ocean Processes Group  Mail Stop 29 *
* Woods Hole Oceanographic Institution*
* Woods Hole, MA 02543*
***
  











Re: [CF-metadata] CF point observation Conventions

2009-11-20 Thread Lowry, Roy K
Hi Nan,

It's a terminology issue. The feature type terms were coined for local use 
under pressure (so much pressure that I even failed to consult the CSML feature 
type names!) to describe data in one of our data schemas, which doesn't include 
single instantaneous measurements.

It's the concepts that are the important thing, which we identify by neutral 
keys.  I'm quite happy to use different terms to describe the concepts 
providing the concept definitions match exactly. The only reason I exposed them 
was to ensure CF didn't head off into concepts that didn't map. Getting a set 
of terms for these concepts that are universally agreed would be worthwhile.  
Bringing our local terms into line with CSML would be an obvious first step, 
which I'll try and do next week (currently on travel) in conjunction with 
checking through John Caron's mappings to the proposed CF feature types.

Meanwhile if you've any further suggestions for change (or additional 
observational feature types you'd like to see) let me know and I'll do my best 
to fall into line.

Cheers, Roy.

From: cf-metadata-boun...@cgd.ucar.edu [cf-metadata-boun...@cgd.ucar.edu] On 
Behalf Of Nan Galbraith [ngalbra...@whoi.edu]
Sent: 20 November 2009 16:26
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] CF point observation Conventions

Thanks, Roy.

There's something not quite symmetrical in this, either - maybe it's
just terminology, maybe not.

A time series is conceptually identical to a profile, just turned on its side
so time is the single incrementing dimension, instead of depth.  The difference
turns out to be important in the proposal mainly because of the way we'd
aggregate profiles vs time series.

A point, in my lexicon, is an atomic unit, a single measurement at a single
x,y,z,t. Is there a single point in your feature types? Why assign the term
point to a set of measurements with single x, y, and z and progressing t, as
opposed to a set of measurements with single x, y, t values but varying z?

Cheers -
Nan


Lowry, Roy K wrote:

..

The feature terms we use for observational data in BODC are:

Profile - single set of measurements with single (by assumption) x, y, t values 
but varying spatial z.  An example is a single, fully processed (i.e. binned) 
CTD cast.

Profile collection - an aggregation of profiles into a single data object.  An 
example is all the CTDs from a section or a cruise.

Profile series - a set of measurements with single x, y, a fixed set of spatial z 
values and progressing t. An example is a single moored ADCP deployment record.

Point - a set of measurements with single x, y, and z and progressing t.  
Example is a single moored recording current meter record.

Point collection - an aggregation of point features in a single container.  
Example is all the records from all the current meters on a mooring or deployed 
on a cruise.

Spectrum series - a set of measurements with single x, y, a fixed set of 
non-spatial z values and progressing t. An example is a power spectrum time 
series from a wave recorder.

2D-trajectory - a set of measurements with variable x, y, t and a single  
spatial z.  Example is the thermosalinograph record from a cruise.

3D-trajectory - a set of measurements with variable x, y, z and t.  Example is 
the thermosalinograph record from an AUV mission.  It is also 
applicable to a yo-yo CTD station, mirroring Chris's comments on atmospheric 
profiles with variant x,y.

I think that Nan and most of the observational oceanographic community 
recognise these concepts and consequently, if a mapping from them to your feature 
definitions is maintained then it will help keep us on board.

Note that the difference between 'point' and 'point collection' is important to 
me as an observational data manager, which is a different perspective to an 
observational data ingestor.

Cheers, Roy.









___