Thanks, Jonathan and Roy. It seems we have several kinds of
'problem variables' to deal with:
- Components of geophysical variables (like voltage and temperatures
from a radiometer, or the sensor temperature from an oxygen probe;
these can be useful in troubleshooting or recalculating the geophysical
variables),
- QC kinds of parameters (like percent good, or error velocity, from an
ADCP),
- Raw instrument output that could be converted directly into
geophysical data. I'm not sure if something like rain level is an
example of this, or if the kinds of data that Roy mentioned are
different in some way.
I think that if it's raw data which can be described more precisely
as being the output of a particular kind of sensor, and it is in
physical units, we should give it its own standard name; in such
cases, the raw data would have more of a standard meaning, and
standard algorithms could be applied to derive geophysical
quantities from it, I imagine.
As it turns out, I think this is the case for my rain gauge example. A
more careful search of the standard names using "any of" precipitation,
rain, and evaporation turned up one that seems to work for the
instrument's raw output (the precipitation level in the gauge):
lwe_thickness_of_precipitation_amount: The construction
lwe_thickness_of_X_amount or _content means the vertical
extent of a layer of liquid water having the same mass per unit area.
I suspect that this name was not intended for raw rain gauge data,
but that it would be alright to use it anyway.
A possible way to deal with raw data would be to regard it as a kind
of ancillary data and use a standard name modifier to indicate it (CF
3.3 and appendix C) e.g. raw_data. In your case the standard_name
attribute would then contain "rainfall_rate raw_data". In Appendix C
we could specify that the units are 1 i.e. dimensionless if there is
a raw_data modifier.
I think this, or something like it, would have been a good way to
handle oxygen sensor temperatures, instead of assigning them the
standard name temperature_of_sensor_for_oxygen_in_sea_water.
As Roy says, this would work only for variables that are in both raw
and processed form. And, for some instruments, there are multiple
components that should be carried along. We could use something
like this for longwave radiation sensor components, but would need
multiple modifiers, something like:
surface_downwelling_longwave_flux_in_air
... raw_data_thermopile_voltage
... raw_data_dome_temperature
... raw_data_case_temperature
Having sensor outputs from a radiometer "attached to" longwave
radiation would be useful, especially if it can be done in a way that
preserves the units of the temperatures and thermopile voltage.
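
To make the idea concrete, here is a minimal Python sketch of how the
radiometer components might be "attached to" the longwave flux using the
ancillary_variables attribute (CF 3.4). It uses a plain dictionary as a
stand-in for a netCDF file. The variable names (lw_flux, thermopile_v,
dome_temp, case_temp), the units, and the helper ancillary_of are
assumptions made up for illustration, and the raw_data_* modifiers are
only the suggestion above, not anything defined in CF.

```python
# Plain-dict stand-in for a netCDF file: variable name -> attributes.
# Variable names and units are illustrative assumptions; the raw_data_*
# modifiers are the proposal from this thread, not part of CF.
FLUX_NAME = "surface_downwelling_longwave_flux_in_air"

variables = {
    "lw_flux": {
        "standard_name": FLUX_NAME,
        "units": "W m-2",
        # CF 3.4: blank-separated list of ancillary variable names.
        "ancillary_variables": "thermopile_v dome_temp case_temp",
    },
    "thermopile_v": {
        "standard_name": FLUX_NAME + " raw_data_thermopile_voltage",
        "units": "V",
    },
    "dome_temp": {
        "standard_name": FLUX_NAME + " raw_data_dome_temperature",
        "units": "K",
    },
    "case_temp": {
        "standard_name": FLUX_NAME + " raw_data_case_temperature",
        "units": "K",
    },
}


def ancillary_of(variables, name):
    """Return {ancillary variable name: attributes} for one data variable."""
    listed = variables[name].get("ancillary_variables", "").split()
    return {anc: variables[anc] for anc in listed}


for anc_name, attrs in ancillary_of(variables, "lw_flux").items():
    print(anc_name, attrs["units"])
```

Each component keeps its own units here, which is the property asked for
above; whether CF would allow a modifier to carry units unrelated to
those of the parent quantity is exactly the open question.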
Maybe all these variables will need standard names after all. I'd like
to know what anyone else thinks.
Cheers - Nan
Thanks Jonathan,
Another gap in my CF knowledge exposed. My reaction to your posting
was based on the perspective of somebody who is going to have to
semantically link a file of data in CF with data in another format.
RDF is my weapon of choice, which requires the ability to reference
concepts in both datasets as URLs. Whilst %20 is possible for a
space, it's best avoided. So, at some stage I think we need to
revisit the syntax of modified Standard Names (hyphen as a
separator??).
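
For what it's worth, the URL issue can be illustrated with a short
Python sketch. The base URL and the function concept_url are
hypothetical, chosen only to show the difference between
percent-encoding the space and using a hyphen as the separator; no real
vocabulary service is implied.

```python
from urllib.parse import quote

# Hypothetical vocabulary base URL, for illustration only.
BASE = "http://example.org/standard_name/"


def concept_url(standard_name_attr, separator=None):
    """Build a URL for a standard_name attribute value.

    With separator=None the name and modifier are joined by a space,
    which is then percent-encoded. Otherwise they are joined by the
    given separator (for example a hyphen).
    """
    parts = standard_name_attr.split()
    joined = " ".join(parts) if separator is None else separator.join(parts)
    return BASE + quote(joined)


print(concept_url("rainfall_rate raw_data"))
print(concept_url("rainfall_rate raw_data", separator="-"))
```

The first form ends in rainfall_rate%20raw_data and the second in
rainfall_rate-raw_data; the second is the easier one to reference from
RDF.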
However, this side track isn't helping closure of Nan's issue, which
I fully understand from data files that land on our doorstep. I
think your view is based on the assumption that the raw data have
corresponding processed data. If only! Frequently we get data from
a complex package of sensors from scientists who are only interested
in a subset. The rest are exactly as they came off the data logger.
However, even in this state they have value for certain applications
and so need labelling.
As far as the raw_data qualifier goes, the $64,000 question
(to Nan I guess) is whether the labelling support required for raw
data needs to support quantitative as well as qualitative use cases.
If the answer is no, then the raw_data qualifier specifying values to
be dimensionless is acceptable. Otherwise, we'll need to set up a
distinct Standard Name for each raw channel variant. The more I
think about it, the more I see this as the safer option.
Cheers, Roy.
Jonathan Gregory <[email protected]> 04/13/09 8:28 AM >>>
Dear Roy
The space is deliberate. The standard_name attribute consists of a standard
name followed optionally by a modifier. We introduced this syntax to allow us
to define ancillary data of various sorts e.g. a quality flag or a standard
error, without requiring a new set of standard names. See CF standard section
3.3. Perhaps we should have put it in a different attribute; this decision was
made years ago and I can't remember the discussion.
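
As a rough illustration of this syntax, a minimal Python sketch that
splits a standard_name attribute value into the standard name and the
optional modifier might look like the following. The function name
split_standard_name is made up for the example, and raw_data is the
modifier proposed in this thread, not one defined in CF Appendix C.

```python
def split_standard_name(attr):
    """Split a standard_name attribute into (standard name, modifier).

    The attribute is a standard name optionally followed by one
    modifier, separated by white space (CF 3.3). The modifier is
    returned as None when absent.
    """
    parts = attr.split()
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected 'name' or 'name modifier', got {attr!r}")


# standard_error is an existing kind of ancillary data mentioned above;
# raw_data is only the modifier proposed in this thread.
print(split_standard_name("air_temperature"))
print(split_standard_name("air_temperature standard_error"))
print(split_standard_name("rainfall_rate raw_data"))
```

This sketch does not check the name or the modifier against the
standard name table or Appendix C; it only shows how the space acts as
the separator.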
I made the raw_data suggestion thinking that, in the case where you have a
geophysical quantity, and you also want to save the raw data (perhaps that is
the case Nan describes), the raw data could be regarded as ancillary
information (a bit like a standard error). This mechanism, with a standard
name modifier and perhaps using ancillary_variables to point to it (CF 3.4),
might be suitable then. The modifier implies the units of the standard name
have been transformed in a certain way. We could therefore specify them to be
dimensionless for raw data, as that is a special case of transformation i.e.
replace units u with 1. They could be in the same units as the geophysical
quantity; that would need a different standard name modifier, which might be
appropriate for uncalibrated data. However, they could not be in different
units unrelated to those of the geophysical quantity.
This suggestion doesn't give enough information for the data to be processed.
It's just a way of labelling raw data as such.
If you want to identify the raw data as being a specific output from a
particular kind of instrument, I think it's much better to give a standard name
that indicates precisely what it is i.e. more specific than "rain gauge
raw data". Then the user could work out how to process it.
Cheers
Jonathan
--
*******************************************************
* Nan Galbraith (508) 289-2444 *
* Upper Ocean Processes Group Mail Stop 29 *
* Woods Hole Oceanographic Institution *
* Woods Hole, MA 02543 *
*******************************************************
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata