On 5/23/2013 9:00 AM, John Graybeal wrote:
+1 Martin. I am bugged (not the technical term) by the conclusions here, which 
seem to be: because people design systems badly, I must constrain my own system 
to accommodate their failures.

Hi John,

The flip side of this argument is even more compelling: _building features into your interoperability framework that you can see in advance are going to be misused is obviously the wrong thing to do._ When incorrect metadata becomes commonplace, it does worse damage than gaps in the metadata; it casts a shadow of doubt over the entire framework.

The thrust of Martin's arguments seems right on to me. Let's look for solutions that can provide the desired metadata and be robust, too. For example, THREDDS and ncISO provide us with a level of abstraction above the attributes found in the physical files. If the min and max values for a given dataset are stable (a fact probably known to the creator of the dataset), then by all means encode the values as global attributes. If they are not stable, then omit them from the files; turn to TDS and ncISO to create (and cache) these metadata values.
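The stable-vs-unstable rule can be sketched in a few lines (pure Python for illustration only; the function and the `actual_min`/`actual_max` attribute names are assumptions here, not an official CF or netCDF API):

```python
def embed_summary_attributes(global_attrs, data_values, values_are_stable):
    """Write min/max into the file's global attributes only when the
    dataset creator knows the values will not change; otherwise leave
    them out and let a service layer (TDS/ncISO) compute and cache them."""
    if values_are_stable:
        global_attrs["actual_min"] = min(data_values)
        global_attrs["actual_max"] = max(data_values)
    # If not stable, deliberately omit the attributes: a gap in the
    # metadata is safer than a value that silently goes stale.
    return global_attrs
```

The design choice is the one argued above: the file carries only metadata that cannot be invalidated by later operations, and everything else is delegated to the service layer.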

Caching is a key issue here, given the cost of re-computing metadata such as actual min and max values. A Web Accessible Folder of ISO metadata *can be* an adequate cache, as long as last-modified dates are available and are carefully tracked. Making last-modified dates universally available is arguably one of the key issues in finding a robust solution to this dilemma. It is on the TDS to-do list, we understand. (We need to bring HYRAX, PyDAP, etc. into this conversation, too.)
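The caching logic described above can be sketched as follows (illustrative Python only; `MetadataCache`, `compute_fn`, and the string-valued last-modified stamps are hypothetical stand-ins for a TDS/ncISO-style service, not real APIs):

```python
class MetadataCache:
    """Minimal sketch of a Web-Accessible-Folder-style metadata cache,
    keyed on the dataset's last-modified stamp."""

    def __init__(self, compute_fn):
        self.compute_fn = compute_fn  # expensive: e.g. scanning a 1 TB variable
        self._entries = {}            # dataset_id -> (last_modified, metadata)

    def get(self, dataset_id, last_modified):
        entry = self._entries.get(dataset_id)
        if entry is not None and entry[0] == last_modified:
            return entry[1]           # cache hit: skip the expensive re-scan
        # Dataset changed (or never seen): recompute and re-cache.
        metadata = self.compute_fn(dataset_id)
        self._entries[dataset_id] = (last_modified, metadata)
        return metadata
```

This is exactly why universally available last-modified dates matter: without them, the cache has no cheap way to know whether its stored min/max values are still trustworthy.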

    - Steve


The use cases for storing the summary information with the file are: (A) It's 
faster to access, which in some circumstances affects a user (or the cost of 
computer cycles), whether due to large files or lots of files.  (B) In some 
circumstances (I don't have a netCDF file mangler app sitting at hand), it's the 
only reasonable way to access it.

If someone is writing a subsetting or aggregating utility, and that utility is 
blindly copying over every metadata item it sees, then a whole lot of metadata 
is going to be wrong (Publisher, Provenance, Last Updated, Time and/or 
Geospatial Range, Min/Max Values, Licensing Permission, to name a few). This 
metadata isn't fragile; it's a function of the content. The person who writes 
the transform utility must either create all new metadata, or understand the 
kind of metadata they are copying over and make any necessary changes.

John

On May 23, 2013, at 08:10, "Schultz, Martin" <m.schu...@fz-juelich.de> wrote:

...  but computing min & max on the fly can also be very expensive.
We have aggregated model output datasets where each variable is more
than 1TB!
Sure, I can see that that's useful metadata about the dataset, and that
there's value in caching it somewhere.  I just don't think it belongs with
the metadata inside the netcdf file. What's the use case for storing it
there?
Dear all,

      that may be an issue of "style", or more technically speaking, of the way you set up 
your system(s). I do think there is a use for this as soon as you take a file out of an interoperable 
context. However, it's a very good and valid point that this information can (very) easily 
get corrupted. Thus it may be good to define some way of marking "fragile" metadata (i.e. 
metadata that can be corrupted by slicing or aggregating data from a file -- maybe even from 
several files). In fact, this is related to the issue of tracking metadata information in the data 
model -- that has been brought up in the trac ticket but was referred to the implementation...
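The idea of marking fragile metadata could look something like the following (a hypothetical convention sketched for discussion only; neither the `fragile_attributes` attribute nor these helpers exist in CF or any netCDF library):

```python
def mark_fragile(global_attrs, fragile_names):
    """Record, in the file itself, which attributes are content-dependent,
    so a generic slicing/aggregating tool knows to drop or recompute them."""
    global_attrs["fragile_attributes"] = " ".join(sorted(fragile_names))
    return global_attrs

def drop_fragile(global_attrs):
    """What a well-behaved subsetting tool would do: strip every attribute
    the file has flagged as fragile (and the marker itself)."""
    fragile = set(global_attrs.get("fragile_attributes", "").split())
    return {k: v for k, v in global_attrs.items()
            if k not in fragile and k != "fragile_attributes"}
```

The attraction of such a convention is that the knowledge of what is fragile travels with the file, so a generic utility need not hard-code a list of attribute names.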

Cheers,

Martin




------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


---------------
John Graybeal
Marine Metadata Interoperability Project: http://marinemetadata.org
grayb...@marinemetadata.org



