Re: [CF-metadata] Branching "history"

2011-09-06 Thread Nan Galbraith

You're good, Roy!

There's a new section (4 pages, maybe more) in the current Argo user's
manual about their history implementation.

It looks like they're using character variables to store structured,
global-level provenance information. It doesn't look like it would expand
easily to allow "branched" metadata for files that have multiple inputs,
partly because the various history records are numbered and tied to
pre-defined processing levels. It's interesting, though! Another approach
altogether.

argodatamgt.org/content/download/4729/34634/file/argo-dm-user-manual-version-2.3.pdf

Cheers - Nan



On 9/6/11 12:46 PM, Lowry, Roy K. wrote:

> Hi All,
>
> Something in the back of my mind from a project in which I have peripheral
> engagement (so may be a red herring or worse).  Didn't Argo put a lot of effort
> into a NetCDF history encoding for their format?
>
> Cheers, Roy.

--
***
* Nan Galbraith                 (508) 289-2444 *
* Upper Ocean Processes Group     Mail Stop 29 *
* Woods Hole Oceanographic Institution         *
* Woods Hole, MA 02543                         *
***



___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] Branching "history"

2011-09-06 Thread Lowry, Roy K.
Hi All,

Something in the back of my mind from a project in which I have peripheral 
engagement (so may be a red herring or worse).  Didn't Argo put a lot of effort 
into a NetCDF history encoding for their format?

Cheers, Roy.

This message (and any attachments) is for the recipient only. NERC
is subject to the Freedom of Information Act 2000 and the contents
of this email and any reply you make may be disclosed by NERC unless
it is exempt from release under the Act. Any material supplied to
NERC may be stored in an electronic records management system.


Re: [CF-metadata] Branching "history"

2011-09-06 Thread Nan Galbraith

Hi all -

> Are there any existing practices (either established or experimental)
> for the use of the "history" attribute when dealing with complex,
> branching processing histories?
>
> Given the "history" attribute is only intended to be human readable, I
> suspect the answer is "no". In which case, what would be more palatable:
> inventing a new syntax, or throwing away everything prior to the last
> linear sequence?

I'm not sure how many existing practices there are, but a new
syntax is *definitely* preferable to loss of this information.

The standard lets you continuously append processing 'events' to
the global history, using a timestamp; in theory you can append
all the branching histories, adding information about which
component was modified (maybe going from datestamp/action to
datestamp/component/action ?). The component identifiers would
need to include enough information to let the user know what
slice, and what variable, was modified by each action.
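
Nan's datestamp/component/action idea, applied to a history merged from several inputs, could look roughly like the Python sketch that follows. Everything in it is an illustrative assumption rather than part of NUG or CF: the `/` separator, ISO 8601 UTC timestamps in one fixed format (so they sort chronologically as plain strings), the hypothetical `input:variable[slice]` component notation, and the names `HistoryEvent`, `merge_branches` and `events_for`.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical component notation: optional "input:" label, variable name,
# optional [slice]. Input labels must not contain ':', '[' or '/'.
_COMPONENT = re.compile(r"(?:(?P<label>[^:\[]+):)?(?P<var>[^\[]+)(?:\[.*\])?")


@dataclass(frozen=True)
class HistoryEvent:
    datestamp: str  # assumed ISO 8601 UTC, e.g. "2011-09-05T05:38:00Z"
    component: str  # e.g. "temp[0:100]" or, after merging, "a:temp[0:100]"
    action: str     # free text: program name and arguments

    def to_line(self) -> str:
        # Only the action may contain '/', otherwise the line cannot be parsed back.
        if "/" in self.datestamp or "/" in self.component:
            raise ValueError("datestamp and component must not contain '/'")
        return f"{self.datestamp}/{self.component}/{self.action}"

    @classmethod
    def from_line(cls, line: str) -> "HistoryEvent":
        # Split on the first two separators only, so the action keeps any '/'.
        datestamp, component, action = line.split("/", 2)
        return cls(datestamp, component, action)

    def variable(self) -> str:
        """Return the variable name from the component identifier."""
        match = _COMPONENT.fullmatch(self.component)
        if match is None:
            raise ValueError(f"unrecognised component: {self.component!r}")
        return match.group("var")


def merge_branches(branches: Dict[str, List[HistoryEvent]]) -> str:
    """Merge per-input event lists into one history, oldest event first.

    Each component is prefixed with its (hypothetical) input label, so the
    events of every branch are kept, not just the last linear sequence.
    """
    merged = [
        HistoryEvent(e.datestamp, f"{label}:{e.component}", e.action)
        for label, events in branches.items()
        for e in events
    ]
    # Timestamps in one fixed ISO 8601 UTC format sort chronologically as strings.
    merged.sort(key=lambda e: e.datestamp)
    return "\n".join(e.to_line() for e in merged)


def events_for(history: str, variable: str) -> List[HistoryEvent]:
    """Return the events in a history that touched the given variable."""
    events = [HistoryEvent.from_line(line) for line in history.splitlines() if line.strip()]
    return [e for e in events if e.variable() == variable]
```

For example, merging a branch `a` event on `temp[0:100]` dated 2011-09-05 with a branch `b` event on `psal` dated 2011-09-04 puts the `b` line first, and `events_for(history, "temp")` returns only the `a` event. Because every line keeps its input label, nothing before the last linear sequence has to be thrown away.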

John's SSDS example shows an interesting way to accumulate
this information, which could be expanded for branched provenance.
It's not quite the same as the NetCDF-defined history attribute,
but it looks like a great way to make this field machine readable.

By the way, here's the definition, from the NetCDF Users' Guide:

> history
>
> A global attribute for an audit trail. This is a character array with
> a line for each invocation of a program that has modified the
> dataset. Well-behaved generic netCDF applications should append a
> line containing: date, time of day, user name, program name and
> command arguments.

and a snippet from the CF standard:

> 2.6.2. Description of file contents
>
> The NUG defines title and history to be global attributes. We wish
> to allow the newly defined attributes, i.e., institution, source,
> references, and comment, to be either global or assigned to
> individual variables. When an attribute appears both globally and as
> a variable attribute, the variable's version has precedence. ...
>
> history
>
> Provides an audit trail for modifications to the original data.
> Well-behaved generic netCDF filters will automatically append their
> name and the parameters with which they were invoked to the global
> history attribute of an input netCDF file. We recommend that each
> line begin with a timestamp indicating the date and time of day that
> the program was executed.
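
The NUG and CF wording amounts to a simple append rule. A minimal Python sketch, working on the attribute's string value only (reading and writing the attribute with a netCDF library is left out): the name `append_history`, the ISO 8601 UTC timestamp and the space-separated field order are assumptions, since the quoted text lists the fields but not an exact layout.

```python
import getpass
import shlex
from datetime import datetime, timezone
from typing import List, Optional


def append_history(history: str, argv: List[str],
                   when: Optional[datetime] = None,
                   user: Optional[str] = None) -> str:
    """Return `history` with one new audit-trail line appended.

    One line per program invocation: a timestamp first (as CF recommends),
    then user name, program name and command arguments (as NUG lists them).
    `when`, if given, should be timezone-aware.
    """
    when = when or datetime.now(timezone.utc)
    user = user or getpass.getuser()
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # argv[0] is the program name; shlex.join quotes arguments containing spaces.
    line = f"{stamp} {user} {shlex.join(argv)}"
    return f"{history}\n{line}" if history else line
```

Each call adds exactly one line, so a file that went through a linear chain of filters ends up with one timestamped line per invocation; the flat list only stops describing the processing once several inputs are combined.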


- Nan



Re: [CF-metadata] Branching "history"

2011-09-05 Thread John Graybeal
Hi Richard,

We are considering similar questions for the OOI Cyberinfrastructure.  

I am wondering why you say the history attribute is only intended to be human 
readable. (I'm not an expert on netCDF, so this may be a doofus question.)  I 
couldn't find any language that says that; some of the conventions suggest 
machine-readable is just fine. I'd prefer a machine-readable history, so 
long as it's still human-readable.

In the COARDS profile 
  http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
it says "Although not mandatory the attribute 'history' is recommended to 
record the evolution of the data contained within a netCDF file. Applications 
which process netCDF data can append their information to the history 
attribute."

In the netCDF Attribute Convention for Dataset Discovery,
  http://www.unidata.ucar.edu/software/netcdf-java/formats/DataDiscoveryAttConvention.html
the history attribute is described as "Provides an audit trail for 
modifications to the original data." 

As an example, I know that MBARI puts a fairly complete history in its NetCDF 
files; see
  http://dods.mbari.org/opendap/data/ssdsdata/deployments/m0/200701/OS_MBARI-M0_20070130_R_TS.nc.info
and
  https://confluence.oceanobservatories.org/display/CIDev/Define+use+case+for+data+provenance
In the sample, this history is not contained in the history attribute, but in 
an attribute called ssds_provenance.  I don't think that example is branched, 
but they may have some that are, and to me that format looks like it could 
express branches trivially. (And it is both readable and machine-parseable, 
too.  Kudos to Mike McCann and the MBARI team.)

If the MBARI syntax is parseable and there are not competing syntaxes, that 
would not be a bad proposal in my book. 

Since people use all sorts of things for history, we might want the first line 
to specify the syntax/convention being used.
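
John's suggestion of a first line that names the syntax could be handled along these lines. The marker `history_syntax:` and the functions `split_history` and `with_syntax` are hypothetical; no convention quoted in this thread defines them. A reader that does not recognise the declared syntax (or finds none) can still treat every line as free text, so the attribute stays human-readable.

```python
from typing import List, Optional, Tuple

# Hypothetical marker for a first line that names the history syntax in use.
SYNTAX_PREFIX = "history_syntax:"


def split_history(history: str) -> Tuple[Optional[str], List[str]]:
    """Split a history string into (declared syntax, event lines).

    If the first non-blank line starts with SYNTAX_PREFIX, the rest of that
    line is returned as the syntax name; otherwise the syntax is None and
    every line is treated as a free-text event.
    """
    lines = [line for line in history.splitlines() if line.strip()]
    if lines and lines[0].startswith(SYNTAX_PREFIX):
        return lines[0][len(SYNTAX_PREFIX):].strip(), lines[1:]
    return None, lines


def with_syntax(history: str, syntax: str) -> str:
    """Prepend a syntax declaration unless the history already has one."""
    declared, _ = split_history(history)
    if declared is not None:
        # Refuse to silently relabel a history written in another syntax.
        if declared != syntax:
            raise ValueError(f"history already declares syntax {declared!r}")
        return history
    header = f"{SYNTAX_PREFIX} {syntax}"
    return f"{header}\n{history}" if history else header
```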

John


On Sep 5, 2011, at 05:38, Hattersley, Richard wrote:

> Dear all,
> 
> Are there any existing practices (either established or experimental) for the 
> use of the "history" attribute when dealing with complex, branching 
> processing histories?
> 
> Given the "history" attribute is only intended to be human readable, I 
> suspect the answer is "no". In which case, what would be more palatable: 
> inventing a new syntax, or throwing away everything prior to the last linear 
> sequence?



John Graybeal    
phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: 
http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org   



[CF-metadata] Branching "history"

2011-09-05 Thread Hattersley, Richard
Dear all,

Are there any existing practices (either established or experimental) for the 
use of the "history" attribute when dealing with complex, branching processing 
histories?

Given the "history" attribute is only intended to be human readable, I suspect 
the answer is "no". In which case, what would be more palatable: inventing a 
new syntax, or throwing away everything prior to the last linear sequence?


Richard Hattersley  AVD  Iris Technical Lead
Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
Tel: +44 (0)1392 885702  Fax: +44 (0)1392 885681
Email: richard.hatters...@metoffice.gov.uk  Website: www.metoffice.gov.uk
