Dear Jonathan,

A couple of points to back up Seth. First, the primary domain of the CF 
conventions is ocean-atmosphere modelling. Whilst this has expanded into 
observational data in the atmosphere and the ocean, it has never to my 
knowledge been extended to seismics. My only experience with seismic data has 
been through the EU GeoSeas project, and there the policy was to adopt seismic 
industry standards, such as SEG-Y, for the data files.


Secondly, the whole point of CF from my perspective is to provide the 
foundation for generic tools to display data from multiple disciplines by 
establishing patterns of co-ordinates, data and semantics. Deviations from 
these principles for specific domains would, in my opinion, seriously degrade 
the value of CF as an interoperability tool.


Cheers, Roy.


Please note that I partially retired on 01/11/2015. I am now only working 7.5 
hours a week and can only guarantee e-mail response on Wednesdays, my day in 
the office. All vocabulary queries should be sent to enquir...@bodc.ac.uk. 
Please also use this e-mail if your requirement is urgent.


________________________________
From: CF-metadata <cf-metadata-boun...@cgd.ucar.edu> on behalf of Maccarthy, 
Jonathan K <jkm...@lanl.gov>
Sent: 10 April 2017 17:54
To: Seth McGinnis
Cc: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] high sample rate (seismic) data conventions

Hi Seth,

Thanks for the very helpful response.  I can understand the argument for 
explicit coordinates, as opposed to using formulae; I think it solves several 
problems.  The assumption of a uniform sample rate for the length of a 
continuous time series is deeply ingrained in most seismic software, however.  
Changing that assumption may lead to other problems (but maybe not!).  A 
single channel can produce 40-100 4-byte samples per second, which works out 
to something like 5-12 GB per channel per year uncompressed.  Commonly, dozens 
of channels are used at once, though some of them may share time coordinates.  
It sounds like this use case is similar in volume to what you've worked with, 
and may be worth trying out.
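
(For reference, a quick back-of-the-envelope check of those figures, sketched 
in Python; the rates and sample width are the ones quoted above:)

    # Uncompressed storage for one channel over a year at seismic rates:
    # 40-100 samples/s, 4 bytes per sample.
    SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s

    for rate_hz in (40, 100):
        bytes_per_year = rate_hz * 4 * SECONDS_PER_YEAR
        print(f"{rate_hz} Hz: {bytes_per_year / 1e9:.1f} GB/channel/year")

    # Output: 40 Hz -> ~5.0 GB, 100 Hz -> ~12.6 GB per channel per year,
    # consistent with the 5-12 GB range above.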

Just to be clear, however, would I be correct in saying that CF has no accepted 
way of representing the data as I've described?

Thanks again,
Jonathan

On Apr 7, 2017, at 4:43 PM, Seth McGinnis <mcgin...@ucar.edu> wrote:

Hi Jonathan,

I would interpret the CF stance as being that the value of having
explicit coordinate variables and other ancillary data accompany the
data outweighs the cost of increased storage.

There are some cases where CF bends away from that for the sake of
practicality (see, e.g., the discussion about external file references
for cell_bounds in CMIP5), but overall, my sense is that the community
feels that it's better to have things explicitly written out in the file
than to provide them implicitly via a formula.
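
(To make the explicit-coordinate approach concrete, here is a minimal sketch 
using the Python netCDF4 package. It is only an illustration, not an 
established CF pattern for seismic data; the variable names, units, and epoch 
are assumptions:)

    import numpy as np
    from netCDF4 import Dataset

    rate_hz = 100.0
    samples = np.zeros(360_000, dtype="f4")  # one hour of dummy data at 100 Hz

    with Dataset("trace.nc", "w") as nc:
        nc.Conventions = "CF-1.6"
        nc.createDimension("time", len(samples))

        # Explicit per-sample time coordinate, written out in full.
        t = nc.createVariable("time", "f8", ("time",))
        t.standard_name = "time"
        t.units = "seconds since 2017-04-07 00:00:00 UTC"
        t[:] = np.arange(len(samples)) / rate_hz

        # "ground_velocity" is an illustrative name, not a CF standard name.
        v = nc.createVariable("ground_velocity", "f4", ("time",))
        v.long_name = "seismic channel samples"
        v[:] = samples

(With this layout, a gap or a change in sample rate shows up directly in the 
time values rather than silently violating an implicit assumption.)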

Based on my personal experiences, I think this is the right approach.
(In fact, I take it even further: I prefer to avoid data compression
entirely and to keep like data with like as much as possible, rather
than splitting big files into smaller pieces.)

I have endured far, far more suffering and toil from (a) trying to
figure out what's wrong with a file that violates some implicit
assumption (like "there are never gaps in the time coordinate") and (b)
dealing with the complications of various tactics for keeping file sizes
small than I ever have from storing and working with very large files.

YMMV, of course.  What are your data volumes like?  I'm working at the
terabyte scale, and as long as my file sizes stay under a few dozen GB,
I don't really even bother thinking about anything that affects the file
size by less than an order of magnitude.

Cheers,

Seth McGinnis

----
NARCCAP / NA-CORDEX Data Manager
RISC - IMAGe - CISL - NCAR
----


On 4/7/17 9:55 AM, Maccarthy, Jonathan K wrote:
Hi all,

I’m curious about the suitability of the CF metadata conventions for
seismic sensor data.  I’ve done a bit of searching, but can’t find
any mention of how the CF conventions would store high-sample-rate
sensor data.  I do see descriptions of time series conventions, where
hourly or daily sensor samples are stored along with their
timestamps, but storing an individual timestamp for each sample from a
high-sample-rate sensor would unnecessarily double the storage.
Seismic formats typically don’t store time vectors, but instead just
store vectors of samples with an associated start time and sampling
rate.
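
(By contrast, a sketch of the implicit representation described above, where 
the time vector is reconstructed on demand from the start time and sampling 
rate; the names are illustrative:)

    import numpy as np

    start_time = 0.0       # seconds since some epoch
    sampling_rate = 100.0  # Hz
    samples = np.zeros(1000, dtype="f4")

    # Reconstruct per-sample times only when needed: t_i = t0 + i / fs.
    times = start_time + np.arange(len(samples)) / sampling_rate

    # Writing `times` into the file alongside the 4-byte samples is the
    # extra storage being weighed against CF's explicit-coordinate approach.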

Could someone please point me towards a discussion or existing
conventions on this topic?  Any help or suggestion is appreciated.

Best,
Jon


