On 4/2/2013 1:13 AM, Cameron-Smith, Philip wrote:

Hi Steve,

I think your suggestion has merit. One question: would your suggestion involve any other changes, e.g. to std_name modifiers or cell_methods?


Hi Philip,

Ken has pointed out that our discussions are clouding the specific guidance that he needs (sorry Ken :'( ), so I've modified the title of this email to indicate a different thread. In the interest of orderly email discussion (is that an oxymoron?) I suggest that we defer discussions of specific syntax to see if we have a consensus on the larger question: are the encodings of standard_name and cell_methods OK as they are, or do we have a problem of sufficient magnitude that it requires further thought?

My vote is for seeking minimal tweaks that will reduce the chances that users are misled or confused by the current encodings.

    - Steve

If nothing else, it would be good to put something in the CF documentation that explains what is going on, and why (perhaps along the lines of my email?) so that next time we have this discussion it will be shorter.

Best wishes,

Philip

-----------------------------------------------------------------------

Dr Philip Cameron-Smith, p...@llnl.gov, Lawrence Livermore National Lab.

-----------------------------------------------------------------------

*From:*CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] *On Behalf Of *Steve Hankin
*Sent:* Monday, April 01, 2013 9:13 AM
*To:* Jonathan Gregory
*Cc:* cf-metadata@cgd.ucar.edu
*Subject:* Re: [CF-metadata] Question from NODC about interplay of standard name modifiers, cell_methods, etc.

Hi All,

All interesting questions are questions of balance. This discussion raises interesting questions. What are the issues we are balancing?

  * On the one side is *technical precision*:  how to correctly
    describe the transformations that have been applied
  * Balancing this is *usability*:  end users need to easily
    understand and use the data in these files

Our current encoding (standard_name and cell_methods) does well on technical precision and poorly on usability. A user who selects a variable with standard_name "sea water temperature", downloads it, and then realizes only after looking at a plot that it is a variance of sea water temperature, will understandably feel that she has been misled. Blaming the user for ignorance or the designer of search engines for neglect is not a balanced outlook imho. We can foresee this problem (as demonstrated by this thread). _It is our responsibility as designers to minimize the opportunities for confusion._

How can we strike this balance? That's the (entirely constructive) topic that I'd lobby we should be addressing. I've included an off-the-cuff proposal below in a P.S. I'm sure there are better ideas out there.

    - Steve

P.S. One proposal: in all cases where a significant transformation (to be defined) has been applied to the data after it has been measured, the standard_name gets a generic modifier, say "(transformed)".
            ==> *"sea water temperature (transformed)"*
This will serve as a signal that forewarns users that the variable is not simply "sea water temperature".
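
To make the P.S. concrete, here is a minimal sketch in Python of how a tool might apply such a marker. Everything in it is an assumption for illustration: the helper name `flagged_standard_name` is hypothetical, and "significant transformation" is taken to mean any cell method other than "point", which the proposal itself leaves to be defined.

```python
import re

# Hypothetical helper for illustration; not part of CF or any CF library.
# Assumption: a "significant transformation" is any cell method other
# than "point". The proposal leaves the real definition open.

def flagged_standard_name(attrs, marker="(transformed)"):
    """Return the standard_name, with a marker appended if the variable
    carries any cell method other than "point"."""
    name = attrs.get("standard_name", "")
    # Drop parenthesised comments such as "(interval: 1 hr)".
    text = re.sub(r"\([^)]*\)", " ", attrs.get("cell_methods", ""))
    tokens = text.split()
    # A method is the first plain token that follows an "axis:" token.
    methods = [tok for prev, tok in zip(tokens, tokens[1:])
               if prev.endswith(":") and not tok.endswith(":")]
    if any(m != "point" for m in methods):
        return f"{name} {marker}"
    return name

variance = {"standard_name": "sea_water_temperature",
            "cell_methods": "time: variance"}
print(flagged_standard_name(variance))
```

A search interface could show this flagged label in place of the bare standard_name, so the user sees the warning before downloading.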

------------------------------------------------------------------------

On 3/30/2013 6:22 AM, Jonathan Gregory wrote:

    Dear all

    I think Philip's posting points out that this disagreement is partly
    caused by a confusion. I agree with his distinction of two cases.

    Perhaps in Ken's use case, the standard deviation describes the spread of
    a number of measurements that are regarded as samples from a population.
    The difference between the samples is random error, not a dependence on
    time or space that is of interest to the user of the dataset. This also
    sounds like what Nan means by, "We collect in situ data, and I know that
    MANY of our instruments output the mean of several measurements, few do
    single spot samples." If the instrument itself does not output the
    individual measurements, the variation among them as a function of time
    or space is obviously of no geophysical interest.

    I agree that this standard deviation is a kind of measurement property.
    As Ken says, the standard error is usually calculated as the sample
    standard deviation divided by the square root of the number in the
    sample. However I appreciate that you might wish to report the standard
    deviation instead of the standard error. To do this, I agree that we
    would need a new standard_name modifier, which I suggest should be
    sample_standard_deviation to avoid its being confused with any other
    kind of standard deviation.
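
The relation between the two quantities can be sketched in a few lines of Python. This is only an illustration: the function name `sample_spread` is hypothetical, and it assumes the sample standard deviation uses the n-1 denominator, which the thread does not specify.

```python
import math
import statistics

# Hypothetical helper for illustration only.
# Assumption: "sample standard deviation" means the n-1 (Bessel-corrected)
# estimate, which is what statistics.stdev computes.

def sample_spread(samples):
    """Return (sample standard deviation, standard error, sample size)."""
    n = len(samples)
    sd = statistics.stdev(samples)   # spread of the individual samples
    se = sd / math.sqrt(n)           # uncertainty of the sample mean
    return sd, se, n

sd, se, n = sample_spread([2.0, 4.0, 4.0, 6.0])
print(sd, se, n)
```

Given the sample size, either quantity can be recovered from the other, which is why reporting the number of observations alongside one of them is enough.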

    Perhaps that is the answer I should have given to Ken's first question,
    instead of asking whether it was a temporal or a spatial standard
    deviation. In fact it is neither.

    Going on to the wider question, I agree with Ken that a mean is just as
    much a statistical operation as a standard deviation. Only a point
    measurement (which is also one of the cell_methods of Appendix E) is the
    "true" geophysical quantity. All the other methods are statistical ways
    of representing variation of that quantity within the cell. It probably
    doesn't seem surprising to regard a mean as the "same" quantity, nor the
    mode and median perhaps, but maybe you begin to feel uncomfortable when
    moving on to the maximum and minimum, the range (absolute difference
    between max and min, which is going to be added as a cell_method in the
    next version of CF, https://cf-pcmdi.llnl.gov/trac/ticket/65), and
    finally the standard deviation and the variance, the last of which has
    different units. All these methods belong to the same family, and it
    seems to me it would be arbitrary and therefore unsatisfactory to choose
    a certain level of surprise or discomfort in order to decide when it was
    no longer the "same" geophysical quantity. The only solution, I think, is
    for everyone to learn that the standard name is only a *part* of the
    description of the attribute, as John says.

    John asked whether the difference between cell_methods and standard_name
    modifiers could be clearly stated. My understanding is that standard_name
    modifiers denote ancillary variables (they were introduced to CF at the
    same time), whose purpose is to provide metadata about the individual
    values of another data variable (start of section 3.4), while cell
    methods indicate the statistical methods whereby the data values
    represent variation within cells. This is a difference, but it might not
    be sufficiently obvious to require them to be different features in CF.

    The sample standard deviation *could* be represented by cell_methods, if
    we introduced a notional axis of sample, to index the samples, and then
    collapsed it to size one, like we do for a time-mean or a zonal mean. The
    sample standard deviation would then be described in cell_methods as
    "sample: standard_deviation". Likewise, the number of observations could
    be regarded as a cell_method that counted the size of its (sample) axis.
    If we do not wish to maintain the difference, we could simplify the
    standard by abolishing standard_name modifiers and creating some new
    cell_methods, some of which might be of a different kind from before
    because they wouldn't refer to a particular axis.
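
As a rough illustration of the idea above, the following Python sketch splits a cell_methods string into (axes, method) pairs, so that a notional "sample" axis would be read in the same way as "time". The function name `parse_cell_methods` is hypothetical, the "sample" axis is only the suggestion made in this message and not an existing CF feature, and the parser is deliberately simplified: it drops parenthesised comments and ignores qualifiers such as "within years".

```python
import re

# Hypothetical, simplified parser for illustration; not a CF library API.
# The "sample" axis below is notional, as suggested in this thread.

def parse_cell_methods(cell_methods):
    """Split a cell_methods string into (axes, method) pairs."""
    # Drop parenthesised comments such as "(interval: 1 hr)".
    text = re.sub(r"\([^)]*\)", " ", cell_methods)
    entries, axes = [], []
    for token in text.split():
        if token.endswith(":"):
            axes.append(token[:-1])            # an axis name, e.g. "time"
        elif axes:
            entries.append((tuple(axes), token))  # first word after axes
            axes = []
        # Any other token is a qualifier ("within", "years"); ignored here.
    return entries

print(parse_cell_methods("time: mean sample: standard_deviation"))
```

Under this reading, a client that already understands "time: mean" would need no new machinery to recognise "sample: standard_deviation", only an agreed meaning for the notional axis.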

    Cheers

    Jonathan

    _______________________________________________
    CF-metadata mailing list
    CF-metadata@cgd.ucar.edu
    http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

