On 5/23/2013 9:00 AM, John Graybeal wrote:
+1 Martin. I am bugged (not the technical term) by the conclusions here, which
seem to be: Because people design systems badly, I must constrain my own system
to accommodate their failures.
Hi John,
The flip side of this argument is even more compelling: _building
features into your interoperability framework that you can see in
advance will be misused is obviously the wrong thing to do._
When incorrect metadata becomes commonplace, it does more damage than
gaps in the metadata; it casts a shadow of doubt over the entire framework.
The thrust of Martin's arguments seems right on to me. Let's look for
solutions that can provide the desired metadata and be robust, too. For
example, THREDDS and ncISO provide us with a level of abstraction above
the attributes found in the physical files. If the min and max values
for a given dataset are stable (a fact probably known to the creator of
the dataset), then by all means encode the values as global attributes.
If they are not stable, then omit them from the files; turn to TDS and
ncISO to create (and cache) these metadata values.
Caching is a key issue here, given the cost of re-computing metadata
such as actual min and max values. A Web Accessible Folder of ISO
metadata *can be* an adequate cache, as long as last-modified dates are
available and are carefully tracked. Making lastModified dates
universally available is arguably one of the key issues in finding a
robust solution to this dilemma. It is on the TDS to-do list, we
understand. (We need to bring HYRAX, PyDAP, etc. into this
conversation, too.)
- Steve
The use cases for storing the summary information with the file are: (A) It's
faster to access, which in some circumstances affects a user (or the cost of
compute cycles), whether due to large files or lots of files. (B) In some
circumstances (if I don't have a netCDF file-mangling app at hand), it's the
only reasonable way to access it.
If someone is writing a subsetting or aggregating utility, and that utility is
blindly copying over every metadata item it sees, then a whole lot of metadata
is going to be wrong (Publisher, Provenance, Last Updated, Time and/or
Geospatial Range, Min/Max Values, Licensing Permission, to name a few). This
metadata isn't fragile; it's a function of the content. The person who writes
the transform utility must either create all new metadata, or understand the
kind of metadata they are copying over and make any necessary changes.
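John's point, that a transform utility must not blindly copy content-dependent attributes, can be sketched as a simple filter. The attribute names below are examples modeled on his list (some in the style of ACDD names); the exact set, and the choice to drop rather than recompute, are assumptions for illustration only.

```python
# Illustrative sketch: when copying attributes into a derived (subset or
# aggregated) file, drop the content-dependent ones, so the result is a
# gap in the metadata rather than a confidently wrong value.
FRAGILE_ATTRIBUTES = {
    "publisher", "provenance", "last_updated",
    "time_coverage_start", "time_coverage_end",
    "geospatial_lat_min", "geospatial_lat_max",
    "actual_min", "actual_max", "license",
}

def copy_safe_attributes(attrs):
    """Return only the attributes that survive a subset/aggregate transform."""
    return {k: v for k, v in attrs.items()
            if k.lower() not in FRAGILE_ATTRIBUTES}
```

A more careful utility would recompute the dropped values from the new content instead of omitting them, but either choice beats copying them verbatim.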
John
On May 23, 2013, at 08:10, "Schultz, Martin" <m.schu...@fz-juelich.de> wrote:
... but computing min & max on the fly can also be very expensive.
We have aggregated model output datasets where each variable is more
than 1TB!
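Martin's cost concern, that a 1 TB variable cannot be scanned casually, is usually addressed with a single streaming pass over chunks, so the min and max are found without ever holding the whole variable in memory. A minimal pure-Python sketch, assuming the data arrives as an iterable of chunks:

```python
def streaming_min_max(chunks):
    """One pass over an iterable of chunks (e.g. lists or arrays of
    numbers), tracking the running min and max; memory use is bounded
    by one chunk at a time rather than the full variable."""
    vmin = vmax = None
    for chunk in chunks:
        for v in chunk:
            if vmin is None or v < vmin:
                vmin = v
            if vmax is None or v > vmax:
                vmax = v
    return (vmin, vmax)
```

Even so, one pass over a terabyte is expensive, which is exactly why the computed result needs to be cached somewhere rather than recomputed on demand.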
Sure, I can see that that's useful metadata about the dataset, and that
there's value in caching it somewhere. I just don't think it belongs with
the metadata inside the netcdf file. What's the use case for storing it
there?
Dear all,
that may be an issue of "style" or, more technically speaking, of the way you set up
your system(s). I do think there is a use for this as soon as you take a file out of an interoperable
context. However, it's a very good and valid point that this information can (very) easily
get corrupted. Thus it may be good to define some way of marking "fragile" metadata (i.e.
metadata that can be corrupted by slicing or aggregating data from a file -- maybe even from
several files). In fact this is related to the issue of tracking metadata information in the data
model -- that has been brought up in the Trac ticket but was referred to the implementation...
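Martin's suggestion of marking fragile metadata could be realized with a convention as simple as a global attribute listing the fragile names, which a transform utility consults before copying. Both the attribute name `fragile_attributes` and the comma-separated encoding below are hypothetical, not an existing CF convention:

```python
def drop_fragile(attrs):
    """Remove every attribute named in the (hypothetical)
    'fragile_attributes' global attribute, plus the marker itself,
    before copying attributes into a derived file."""
    fragile = {name.strip()
               for name in attrs.get("fragile_attributes", "").split(",")
               if name.strip()}
    return {k: v for k, v in attrs.items()
            if k not in fragile and k != "fragile_attributes"}
```

The advantage over a hard-coded list is that the data producer, who knows which values are functions of the content, declares the fragility rather than leaving every downstream tool to guess.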
Cheers,
Martin
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the district court of Dueren, no. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Board of Directors: Prof. Dr. Achim Bachem (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
---------------
John Graybeal
Marine Metadata Interoperability Project: http://marinemetadata.org
grayb...@marinemetadata.org