Re: [CF Metadata] #105: Scalar Coordinates

Jonathan Gregory Wed, 03 Jul 2013 05:58:11 -0700

This message came from the CF Trac system.  Do not reply.  Instead, enter your 
comments in the CF Trac system at https://cf-pcmdi.llnl.gov/trac/.


#105: Scalar Coordinates
-----------------------------+----------------------------------------------
  Reporter:  markh           |       Owner:  [email protected]
      Type:  enhancement     |      Status:  new                          
  Priority:  medium          |   Milestone:                               
 Component:  cf-conventions  |     Version:                               
Resolution:                  |    Keywords:                               
-----------------------------+----------------------------------------------
Comment (by jonathan):

 Dear all

 I don't think anyone will be surprised to read that I disagree with this
 proposal insofar as it is inconsistent with the proposal of ticket 104,
 which was made by David and me. Like Jim, I appreciate Steve's clear
 questions.

 Perhaps unlike Jim, I agree with Steve on the answer to his third
 question. I do not think that CF needs to make any semantic distinction
 between a scalar and a vector of size one. The proposed text says, "A
 scalar coordinate variable defines a coordinate which applies to an entire
 data variable equally," but this is also true of a coordinate variable of
 size one, isn't it? For instance, a scalar coordinate variable of
 height=1.5 m means exactly the same as a size-one coordinate variable of
 height=1.5 m. The alternative of a scalar is offered because it is less
 effort to encode it, that's all. But they mean the same: both of them
 indicate that all the values in the data variable apply at a height of 1.5
 m.

 Moreover, I think that drawing a formal distinction introduces an
 unnecessary conceptual complexity for aggregation. If I have one data
 variable with a scalar coordinate variable of height=1.5 m, and another
 for the same geophysical quantity with a scalar coordinate variable of
 height=10 m, and the horizontal grid and all other coordinates are the
 same in the two variables, I should be able to aggregate them into a
 single data variable having a size-two dimension for height. I would
 expect to be able to do exactly the same thing if the two data variables
 each had a size-one coordinate variable of height instead of a scalar, or
 if one was a scalar and the other had size one. I see no need for, or
 value in, having conceptual differences among these cases.
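
 The aggregation argument above can be sketched in a few lines of Python.
 This is only an illustration under stated assumptions: fields are plain
 dicts, and the names `with_height_axis` and `aggregate` and the data
 values are hypothetical, not part of CF or of any library.

```python
def with_height_axis(field):
    """Return (heights, slices) with an explicit height axis.

    A scalar height is promoted to a size-one list; a size-one (or longer)
    height coordinate is passed through unchanged.
    """
    height = field["height"]
    if isinstance(height, list):
        return list(height), list(field["data"])
    return [height], [field["data"]]


def aggregate(field_a, field_b):
    """Join two fields along height, ordering the levels by height."""
    heights_a, slices_a = with_height_axis(field_a)
    heights_b, slices_b = with_height_axis(field_b)
    levels = sorted(zip(heights_a + heights_b, slices_a + slices_b),
                    key=lambda level: level[0])
    return {"height": [h for h, _ in levels],
            "data": [s for _, s in levels]}


# Made-up values on a 2x2 horizontal grid (assumption for illustration).
grid_low = [[280.0, 281.0], [282.0, 283.0]]
grid_high = [[279.0, 280.0], [281.0, 282.0]]

scalar_low = {"height": 1.5, "data": grid_low}          # scalar coordinate
size_one_low = {"height": [1.5], "data": [grid_low]}    # size-one coordinate
scalar_high = {"height": 10.0, "data": grid_high}
size_one_high = {"height": [10.0], "data": [grid_high]}

# All four scalar/size-one combinations are aggregated the same way.
results = [aggregate(low, high)
           for low in (scalar_low, size_one_low)
           for high in (scalar_high, size_one_high)]
```

 In this sketch every combination yields the same result with a size-two
 height dimension, because the scalar and size-one forms are normalised
 to the same thing before joining.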

 Going the other way, if I extract a single height level from a data
 variable with a multivalued height coordinate, I expect to get a size-one
 height coordinate. However, I don't think this has a different meaning
 from a data variable which has a scalar height coordinate of the same
 value. Both of them are on the same single level. There is no semantic
 distinction, in my opinion, just a formal one, which is a matter of
 convenience.
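
 The extraction case can be sketched in the same hedged way. The dict
 layout, the helper names `extract_level` and `drop_size_one_height`, and
 the numbers are assumptions made for illustration only.

```python
def extract_level(field, index):
    """Select one height level, keeping a size-one height dimension."""
    return {"height": [field["height"][index]],
            "data": [field["data"][index]]}


def drop_size_one_height(field):
    """Re-encode a size-one height coordinate as a scalar coordinate."""
    (height,) = field["height"]      # fails unless there is exactly one level
    (level_data,) = field["data"]
    return {"height": height, "data": level_data}


# Made-up data: three height levels, two horizontal points per level.
multilevel = {"height": [1.5, 10.0, 50.0],
              "data": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]}

extracted = extract_level(multilevel, 1)                 # size-one height
written_as_scalar = {"height": 10.0, "data": [3.0, 4.0]}  # scalar height
```

 Converting `extracted` with `drop_size_one_height` gives exactly
 `written_as_scalar`: the two encodings differ in form but carry the same
 level and the same data.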

 Steve's question 1 asks what is the semantic difference between a
 coordinate variable and an auxiliary coordinate variable. According to the
 [https://cf-pcmdi.llnl.gov/trac/ticket/95 draft CF data model], whose
 development has got stuck because we can't agree on the issue being
 discussed in this ticket and ticket 104, a coordinate variable is 1D,
 monotonic and numeric (I presume because if it's not numeric it's hard to
 make it reliably monotonic), while an auxiliary coordinate variable can be
 multidimensional, string-valued or numeric, and doesn't have to be
 monotonic (which wouldn't make sense if it was multidimensional anyway).
 In addition, there can be only one coordinate variable for any given
 dimension, but there can be any number of auxiliary coordinate variables
 with any given dimension.
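
 As a rough sketch of the three conditions just listed (1D, numeric,
 monotonic), the following Python check decides whether a list of values
 could serve as a coordinate variable. The function names are
 hypothetical, values are assumed to be plain (possibly nested) lists, and
 "monotonic" is taken here to mean strictly increasing or strictly
 decreasing.

```python
def is_numeric(value):
    """True for int or float values (bool is deliberately excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_strictly_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def can_be_coordinate_variable(values):
    """Check the 1D, numeric and monotonic conditions on a list of values."""
    is_1d = isinstance(values, list) and not any(
        isinstance(v, list) for v in values)
    return (is_1d
            and all(is_numeric(v) for v in values)
            and is_strictly_monotonic(values))
```

 Anything that fails this check (a 2D array, a list of strings, a
 non-monotonic list) can still be an auxiliary coordinate variable. A
 size-one list passes trivially, since there are no adjacent pairs to
 compare.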

 Steve's question 2 refers to the special case when you have both a
 coordinate variable and an auxiliary coordinate variable with a single
 size-one dimension. With size one, monotonicity is not an issue,
 obviously. If they are both numeric, either of them could be the
 coordinate variable, and the other the auxiliary coordinate variable. This
 freedom of choice is not limited to size one. If you have a 1D coordinate
 variable and several 1D auxiliary coordinate variables of the same
 dimension, and all of them are numeric and monotonic, any one of the
 auxiliary coordinate variables could equally serve as the coordinate
 variable. For instance, I might have a multivalued coordinate variable of
 height, and going along with it an auxiliary coordinate variable
 `model_level_number(height)`, which is also monotonic. It would be equally
 valid to switch them round and have a coordinate variable of model level
 number and an auxiliary coordinate variable `height(model_level_number)`.
 This is a choice I can freely make when encoding the dataset. It depends
 on whether I want to regard height or model level number as an independent
 spatial coordinate of the data. The other quantity is a dependent
 variable, a function of the independent spatial coordinate. The
 distinction is indicated formally in netCDF by making one of them a
 (Unidata) coordinate variable, whose name equals the name of its
 dimension, and the other a (CF) auxiliary coordinate variable, named by the CF
 `coordinates` attribute of the data variable. Part of the answer to
 Steve's question 2 is that this distinction is made syntactically in just
 the same way for size-one variables as for multivalued variables.
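
 The syntactic rule described above can be illustrated with a small Python
 sketch. A dataset is modelled as a dict of variables, each with its
 dimension names and, for the data variable, a `coordinates` string; this
 layout, the function `classify` and the variable name `air_temperature`
 are assumptions for illustration, not a real netCDF API.

```python
def classify(variables, data_name):
    """Label the coordinates of a data variable by how they are encoded.

    A variable whose name equals its single dimension is a coordinate
    variable; a variable named in the data variable's `coordinates`
    attribute is an auxiliary coordinate variable.
    """
    data = variables[data_name]
    roles = {}
    for dim in data["dims"]:
        if dim in variables and variables[dim]["dims"] == (dim,):
            roles[dim] = "coordinate"
    for name in data.get("coordinates", "").split():
        roles[name] = "auxiliary"
    return roles


# Encoding A: height is the independent coordinate.
encoding_a = {
    "height": {"dims": ("height",)},
    "model_level_number": {"dims": ("height",)},
    "air_temperature": {"dims": ("height",),
                        "coordinates": "model_level_number"},
}

# Encoding B: the same information with the roles switched round.
encoding_b = {
    "model_level_number": {"dims": ("model_level_number",)},
    "height": {"dims": ("model_level_number",)},
    "air_temperature": {"dims": ("model_level_number",),
                        "coordinates": "height"},
}

roles_a = classify(encoding_a, "air_temperature")
roles_b = classify(encoding_b, "air_temperature")
```

 Nothing in `classify` depends on the length of the dimension, so the same
 rule applies to a size-one dimension as to a multivalued one.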

 Since a numeric scalar coordinate variable doesn't have a dimension, it's
 not possible to tell formally whether it's semantically the same as a
 coordinate variable or an auxiliary coordinate variable. Therefore in
 ticket 104 we propose to make it clear that it is semantically a
 coordinate variable. (If it's not numeric, it must be semantically an
 auxiliary coordinate variable.) We think that's what was meant when scalar
 coordinate variables were invented. We also think this is the right choice
 because you only really need an auxiliary coordinate variable if you also
 have a coordinate variable (for instance, if you have both model level
 number and height), and in that situation you must have a netCDF dimension
 to show that they are linked. That's the only way CF-netCDF offers to
 indicate the connection between them. Hence you cannot use scalar
 coordinate variables in that situation. If you do have scalars of both
 height and model level number, the most obvious interpretation is that
 they are independent, we think. If that wasn't the intention of the data-
 writer, the file is defective, but not illegal. It needn't do much damage
 in practice, because it should be easy for the user of software to
 indicate that these two coordinates should actually be regarded as
 belonging together, if that is relevant to know (for instance, in
 aggregation).
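
 The rule proposed in ticket 104, as summarised in this comment, is simple
 enough to state as a short Python sketch. The function name and return
 strings are hypothetical; this is not text from the ticket or from CF.

```python
def scalar_coordinate_role(value):
    """Semantic role of a scalar coordinate under the ticket 104 proposal.

    Numeric scalars are treated as coordinate variables; anything else
    (for example a string) must be an auxiliary coordinate variable.
    """
    is_numeric = (isinstance(value, (int, float))
                  and not isinstance(value, bool))
    return "coordinate" if is_numeric else "auxiliary"


height_role = scalar_coordinate_role(1.5)          # numeric scalar
label_role = scalar_coordinate_role("region_a")    # made-up string label
```

 Under this reading, two numeric scalars such as height and model level
 number would each be classified as independent coordinates, which is the
 "most obvious interpretation" described above.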

 The bottom line of this rather lengthy contribution, in response to
 Steve's nice short one, is that
   * Question 1 is already clear in CF.
   * Question 2 is clear for size-one dimensions.
   * Question 2 is not clearly enough answered for scalars, and ticket 104
 proposes to make it clear (i.e. numeric scalars are coordinate variables,
 not auxiliaries).
   * My answer to question 3 is No. I can't see anything in CF currently
 which makes a distinction and I don't think it would be helpful to do so.

 Cheers

 Jonathan

-- 
Ticket URL: <https://cf-pcmdi.llnl.gov/trac/ticket/105#comment:3>
CF Metadata <http://cf-pcmdi.llnl.gov/>

