Hi All,

I think Steve's email (below) is a fair summary of how I see the current state 
of the discussion too.

In order to move the discussion forward, I have put forward below a simple 
strawman suggestion that is very limited, but which I think would capture the 
most useful piece of hierarchies with minimal impact on CF.   Note that credit 
for many of the elements should go to other people who have previously proposed 
them - my main contribution is to stick my neck out and try to make the case 
:-).

1) CF file structures stay 'flat'.
2) Allow an _optional_ hierarchy attribute for variables.
3) CF would define the attribute name  and the rules for the attribute.   I 
expect it would be something like: 'hierarchy = root.trunk.branch.leaf'

Key comments:
a) Since the hierarchy attribute is optional, backwards and forwards 
compatibility should be automatic (except, possibly, for updating CF checkers), 
i.e. no change is necessary for people who don't want to use it.
b) An external tool could easily parse a CF file, or set of files, that 
contains the hierarchy attributes to generate an external hierarchy structure 
that can then be used to decide how to further process the data.
c) The external hierarchy could easily be regenerated to keep it consistent 
with the underlying data files.
d) The hierarchy metadata should be human readable.
e) All variable CF attributes would stay with the variables (as currently), i.e. 
no inheritance of CF attributes (to maintain compatibility).   The common 
attributes that I think inheritance would be most useful for are history 
attributes, and since CF doesn't control history attributes (AFAIK) this would 
be allowed.
f) So why not let individuals add their own such syntax? Defining the syntax of 
the hierarchy will allow general CF tools to be extended (if their developers 
want to), and set the stage for further expansion into hierarchies if experience 
shows that a lot of people are using the hierarchy syntax and start asking for more.
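
To illustrate comment (b), here is a minimal sketch of how an external tool 
might turn the attributes into a tree. It assumes the attribute is called 
'hierarchy' and uses the dot-separated form suggested in (3); neither is 
settled. The function name, the '_variables' key and the example variable 
names are hypothetical, and the plain dict stands in for attributes that a 
real tool would read from the file(s) with a netCDF library.

```python
from typing import Dict, Mapping

LEAVES = "_variables"  # hypothetical key holding the variable names at a node


def build_hierarchy(variables: Mapping[str, Mapping[str, str]]) -> Dict:
    """Build a nested dict from optional dot-separated 'hierarchy' attributes."""
    tree: Dict = {}
    for name, attrs in variables.items():
        path = attrs.get("hierarchy")
        if path is None:
            # Variables without the attribute stay flat at the top level.
            tree.setdefault(LEAVES, []).append(name)
            continue
        node = tree
        for part in path.split("."):
            # Walk down the path, creating intermediate nodes as needed.
            node = node.setdefault(part, {})
        node.setdefault(LEAVES, []).append(name)
    return tree


# Hypothetical variables standing in for attributes read from a flat CF file.
example = {
    "tas_model_a": {"hierarchy": "ensemble.model_a", "units": "K"},
    "tas_model_b": {"hierarchy": "ensemble.model_b", "units": "K"},
    "lat": {"units": "degrees_north"},
}
tree = build_hierarchy(example)
```

Because the tree is derived entirely from the files, it can be thrown away and 
rebuilt at any time, which is the point of comment (c).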

In my opinion, the benefits of this extension would exceed the minimal costs of 
extending the CF standard.

Let the slings and arrows fly ;-).

Best wishes,

    Philip

-----------------------------------------------------------------------
Dr Philip Cameron-Smith, p...@llnl.gov, Lawrence Livermore National Lab.
-----------------------------------------------------------------------


From: CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Steve 
Hankin
Sent: Wednesday, September 25, 2013 12:34 PM
To: Charlie Zender
Cc: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Are ensembles a compelling use case for 
"group-aware" metadata? (CZ)


On 9/24/2013 9:45 PM, Charlie Zender wrote:
It is not my place to determine whether there is a consensus, or how close we 
are, but it's clear to me there is no consensus yet. Bryan Lawrence, Steve 
Hankin, Jonathan Gregory, Karl Taylor, and Philip Cameron-Smith are not "on 
board". I hope they will speak-up and say if they concur that maintaining the 
status quo (flat files) is best (period), or whether they do wish to extend CF 
to hierarchies (starting now), or the additional information they would need to 
decide.

Hi Charlie et al.,

Since you have asked ...  I have heard two points that seemed to bolster 
Bryan's point of view that the multi-model use case is "great but not 
compelling".  (See a more positive spin at the end.)

  1.  file size.   Model outputs today are typically too large for even a 
single variable from a single model to be packaged in a single file.  
Addressing a model ensemble multiplies the size barrier by the ensemble size, 
N.   Thus the use of groups to package a model ensemble applies only to the 
cases where the user is interested in quite a small subset of the model domain, 
or perhaps in pre-processed, data-reduced versions of the models.   A gut 
estimate is that single-file solutions, like netCDF4 groups, address 25% or less 
of the stated use case.   We could argue over that number, but it seems likely to 
remain on the low side of 50%.  (Issues of THREDDS-aggregating files bearing 
groups also deserve to be discussed and understood.  What works?  What doesn't?)
  2.  The problems of the "suitcase packing" metaphor were invoked time and 
again, further narrowing the applicability of the use case.  The sweet spot 
that was identified is the case of a single user desiring a particular subset 
from a single data provider.   Essentially a multi-model ensemble encoded using 
netCDF4 groups would offer a standardized "shopping basket" with advantages 
that will be enjoyed by some high powered analysis users.

For this narrower use case I couldn't help asking myself how the cost/benefit 
found through the use of netCDF4 groups compares with the cost/benefit of 
simply zip-packaging the individual CF model files.   There is almost no cost 
to this alternative.  Tools to pack and unpack zip files are universal, have 
UIs embedded into common OSes, and offer APIs that permit ensemble analysis to 
be done on the zip file as a unit at similar programming effort to the use of 
netCDF4 groups.  Comprehension and acceptance of the zip alternative  on the 
part of user communities would likely be instantaneous -- hardly even a point 
to generate discussion.  Zip files do not address more specialized use cases, 
like a desire to view the ensemble as a 2-level hierarchy of models each 
providing multiple scenarios, but the "suitcase" metaphor discussions have 
pointed out the diminishing returns that accrue as the packing strategy is made 
more complex.
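
As a rough illustration of treating the zip file as a unit, the sketch below 
uses Python's standard zipfile module to visit every ".nc" member of an 
archive without unpacking it to disk. The function name and member names are 
hypothetical, and the members here are placeholder bytes rather than real 
netCDF data; a real analysis would hand each member to a netCDF library 
instead of just measuring its size.

```python
import io
import zipfile
from typing import Dict


def member_sizes(archive) -> Dict[str, int]:
    """Return the uncompressed size in bytes of each .nc member of a zip archive."""
    sizes = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.filename.endswith(".nc"):
                # Read the member directly from the archive.
                with zf.open(info) as member:
                    sizes[info.filename] = len(member.read())
    return sizes


# Build a small in-memory archive with placeholder bytes (not real netCDF data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("model_a.nc", b"placeholder-a")
    zf.writestr("model_b.nc", b"placeholder-bb")
    zf.writestr("README.txt", b"notes")
buffer.seek(0)
sizes = member_sizes(buffer)
```
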
The tipping point for me is not whether a particular group of users would find 
value in a particular enhancement.  It is whether the overall cost/benefit 
considerations -- the expanded complexity, the need to enhance applications, 
the loss of interoperability, etc., versus the breadth of users and the benefits 
they will enjoy -- clearly motivate a change.   My personal vote is that thus 
far the arguments fall well short of this tipping point.  But maybe there are 
other use cases to be explored.  Perhaps in aggregate they may tip the 
cost/benefit analysis.  What about the "group of satellite swaths" scenario? -- 
a feature collection use case.  AFAIK CF remains weak at addressing this need 
thus far.  (If we pursue this line of discussion we should add the 
'cf_satellite' list onto the thread.  That community may have new work on this 
topic to discuss.)

    - Steve
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
