Re: [CF Metadata] #152: Time mean over area fractions which vary with time

CF Metadata Wed, 19 Oct 2016 10:31:21 -0700

#152: Time mean over area fractions which vary with time
-----------------------------+------------------------------
  Reporter:  martin.juckes   |      Owner:  cf-conventions@…
      Type:  enhancement     |     Status:  new
  Priority:  medium          |  Milestone:
 Component:  cf-conventions  |    Version:
Resolution:                  |   Keywords:
-----------------------------+------------------------------


Comment (by taylor13):

 Dear Martin and Jonathan,

 I think it is confusing (wrong?) to consider the construct “where <type1>
 over <type2>” as applying only to a horizontal dimension (even if
 “area_type” seems to imply this).  In fact a time series taken at a single
 point location will have some sort of “area_type” defined at that point
 (and it won’t be fractional).  If the area_type varies over space or over
 time, then we can define “cell_methods = <standard_name1>:
 [<standard_name2>:] mean where <type1> over <type2>” to mean:  “integrate
 the quantity over the dimension(s) identified by their standard_names but
 conditional on the co-located existence of area-type1 and then divide by
 the integral of the unit scalar (i.e., 1) over the same dimension(s)
 conditional on the co-located existence of area-type2.”  I think that just
 because we restrict the names allowed for type1 and type2 to be those
 names found in the area_type controlled vocabulary doesn’t mean that in
 the cell_methods attribute they *must* represent an area fraction.   I
 think more generally they can be viewed as a binary indicator of the
 existence of the surface type at a particular point and a particular
 instant of time.  (After temporal or spatial averaging, of course they
 would in fact generally represent a fraction.)
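
 The definition above can be sketched numerically as follows.  This is only
 a minimal illustration, assuming discrete samples and binary indicators;
 the function name mean_where_over, its arguments and the sample values are
 hypothetical and are not part of CF.

```python
import numpy as np

def mean_where_over(values, is_type1, is_type2=None, weights=None):
    """Sum `values` where type1 exists, divided by the extent where type2 exists.

    `weights` stands in for the cell measure (e.g. area or time interval);
    equal weights are assumed when it is omitted.
    """
    values = np.asarray(values, dtype=float)
    is_type1 = np.asarray(is_type1, dtype=bool)
    # When "over type2" is omitted, type2 is taken to be the same as type1.
    is_type2 = is_type1 if is_type2 is None else np.asarray(is_type2, dtype=bool)
    if weights is None:
        weights = np.ones_like(values)
    else:
        weights = np.asarray(weights, dtype=float)
    # Integral of the quantity, conditional on the existence of type1.
    numerator = np.sum(np.where(is_type1, values, 0.0) * weights)
    # Integral of the unit scalar, conditional on the existence of type2.
    denominator = np.sum(np.where(is_type2, 1.0, 0.0) * weights)
    if denominator == 0:
        return np.nan
    return numerator / denominator

# Hypothetical time series at a single point: sea ice is absent at the third time.
flux = np.array([10.0, 20.0, np.nan, 30.0])
sea_ice = np.array([True, True, False, True])
sea = np.array([True, True, True, True])

mean_where_sea_ice = mean_where_over(flux, sea_ice)                # 60 / 3
mean_where_sea_ice_over_sea = mean_where_over(flux, sea_ice, sea)  # 60 / 4
```

 The same function applies to a spatial dimension if the weights are taken
 to be areas rather than time intervals.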

 So, I think starting section 7.3.3 by referring to the “horizontal area of
 a cell” should be avoided.  I suggest rewriting the beginning of section
 7.3.3 as follows:

 ----

 “By default, the statistical method indicated by cell_methods is assumed
 to have been evaluated over the entire domain of a cell (where a cell is
 not necessarily restricted to spatial dimensions).  Sometimes, however, it
 is useful to limit consideration to only a subdomain of a cell (e.g. a
 mean over the sea-ice area, or a time mean considering only times when
 snow exists).  The subdomains are restricted to be identified by one of
 the strings permitted for a variable with the standard_name area_type.
 There are two options for indicating when a quantity represents a
 subdomain of a cell.

 “The first option is used for the common case that the quantity of
 interest has been recorded for a single subdomain. In this case, the
 cell_methods attribute may include a string of the form "name: method
 where type".  As an example, if cell_methods is “area: mean where
 sea_ice”, then the data would represent a mean over only the sea ice
 portion of the grid cell. On the other hand for a point location, if the
 cell_methods is “time: mean where sea_ice”, then the data would represent
 the time-mean at that point based on samples obtained when sea ice existed
 there.

 “When this first option (described in the preceding paragraph) is adopted,
 none of the variables appearing in the netCDF file should be given a name
 identical to any string recording the area_type.  This restriction is
 imposed so that it will be clear that the data writer has not elected to
 adopt the second option (described in the next paragraph).

 "The second approach for indicating that a statistic applies to only a
 portion of a cell is more general because a single variable can contain
 statistics for multiple area-types. In this case, the cell_methods entry
 is of the form "name: method where typevar". Here typevar is a string-
 valued auxiliary coordinate variable or string-valued scalar coordinate
 variable (see Section 6.1, "Labels") with a standard_name of area_type.
 The variable typevar contains the name(s) of the selected portion(s) of
 the grid cell to which the method is applied. This convention can
 accommodate cases in which a method is applied to more than one area type
 and the result is stored in a single data variable (with a dimension which
 ranges across the various area types). It provides a convenient way to
 store output from land surface models, for example, since they deal with
 many area types within each surface gridbox (e.g., vegetation,
 bare_ground, snow, etc.)."


 ----
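
 To illustrate the second option, here is a rough sketch of how a single
 data variable might hold an area mean for several area types, with a
 string-valued label array playing the role of typevar.  The sub-grid
 "tiles", their equal areas, and all names and values are assumptions made
 only for this example.

```python
import numpy as np

# Hypothetical label variable (standard_name "area_type"), one label per type.
typevar = np.array(["vegetation", "bare_ground", "snow"])

# Hypothetical sub-grid tiles for two grid cells, assumed to have equal area.
tile_type = np.array([["snow", "snow", "vegetation", "bare_ground"],
                      ["vegetation", "vegetation", "vegetation", "snow"]])
tile_value = np.array([[1.0, 3.0, 5.0, 7.0],
                       [2.0, 4.0, 6.0, 8.0]])

def mean_where_each_type(values, types, labels):
    """Return an array of shape (type, cell): the mean over each type's tiles."""
    result = np.full((len(labels), values.shape[0]), np.nan)
    for k, label in enumerate(labels):
        selected = types == label
        count = selected.sum(axis=1)
        total = np.where(selected, values, 0.0).sum(axis=1)
        has_type = count > 0
        # Cells that contain none of this type keep the missing value (NaN).
        result[k, has_type] = total[has_type] / count[has_type]
    return result

# One data variable with a dimension ranging across the area types.
area_type_mean = mean_where_each_type(tile_value, tile_type, typevar)
```

 Row k of area_type_mean corresponds to the area type named in typevar[k].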

 The discussion of the “where … over” option after example 7.6 should, I
 think, also be rewritten:

 ----

 “If the method is mean, various ways of calculating the mean can be
 distinguished in the cell_methods attribute with a string of the form
 "mean where type1 [over type2]".  Here, type1 can be any of the
 possibilities allowed for typevar or type (as specified in the paragraphs
 preceding the above example). The same options apply to type2, except it
 is not allowed to be the name of an auxiliary coordinate variable with a
 dimension greater than one (ignoring the dimension accommodating the
 maximum string length).

 "A cell_methods attribute with a string of the form "area: mean where
 type1 over type2" indicates the mean is calculated by integrating over the
 type1 portion of the cell and dividing by the area of the type2 portion.
 When “over type2” is omitted, type2 is assumed to be the same as type1.

 "When “area” is not the only “dimension” for which the “where… over”
 construct is used in a cell_methods, the interpretation more generally is
 that a “weighted” mean is being reported.  Specifically, the quantity of
 interest is integrated over the specified dimension(s) with weights
 proportional to the fraction of “type1” area_type that exists, and then
 this is divided by the integral of the fraction of “type2” area_type that
 exists.

 "Note that “all_area_types” is one of the valid strings permitted for a
 variable with the standard_name area_type, so a cell_methods string of the
 form "area: mean where all_area_types over type2" indicates the mean is
 calculated by integrating over all types of area and dividing by the area
 of the type2 portion.

 "The following three examples illustrate cases when one might want to use
 “where” or “where … over” in defining the cell_methods:

 1.      Suppose that in a grid cell the fractional sea ice varies over
 time, but there is interest in the time-mean surface temperature of the
 sea ice.  The time samples, each representing a spatially averaged sea ice
 temperature, can be summed and then divided by the number of samples to
 obtain an unweighted mean where sea ice exists.  This would be indicated
 with:

           cell_methods = “area: mean where sea_ice time: mean”

 2.      Suppose there is interest in recording the mean fractional area
 covered by sea ice and the mean sea ice thickness in such a way that their
 product would equal the time-mean volume of sea ice in each grid cell.  In
 this case the sea ice area would be reported as an unweighted time-mean,
 while the mean sea ice thickness would be calculated with time samples
 weighted by the fractional area of sea ice. Thus, for sea ice thickness:

           cell_methods = “area: time: mean where sea_ice”

 3.      Suppose the time-mean contributions to total heat flux from
 different portions of a grid cell (e.g., ice-free and ice-covered) are of
 interest, and there are reasons to report these in such a way that the
 total heat flux is the sum of the individual contributions.  Then the
 cell_methods attribute would be defined:

           cell_methods = “area: mean where sea_ice over sea time: mean”


 ----
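
 The three examples above can be checked with a small numerical sketch.  It
 assumes a single grid cell that is entirely sea, four equally spaced time
 samples, and made-up values; in example 1 the samples without sea ice are
 assumed to be excluded from the count.  All variable names are
 hypothetical.

```python
import numpy as np

# Fraction of the grid cell covered by sea ice at each of four times.
ice_fraction = np.array([0.5, 0.25, 0.0, 0.25])
ice_exists = ice_fraction > 0

# Example 1: "area: mean where sea_ice time: mean"
# Unweighted time mean of the ice-area-mean temperature, where ice exists.
ice_temperature = np.array([270.0, 268.0, np.nan, 266.0])
example1 = ice_temperature[ice_exists].mean()

# Example 2: "area: time: mean where sea_ice"
# Thickness weighted by ice fraction, so that
# (time-mean ice fraction) * (mean thickness) = time-mean ice volume per unit area.
ice_thickness = np.array([2.0, 1.0, np.nan, 3.0])
ice_volume = ice_fraction * np.where(ice_exists, ice_thickness, 0.0)
mean_ice_fraction = ice_fraction.mean()
example2 = ice_volume.sum() / ice_fraction.sum()

# Example 3: "area: mean where sea_ice over sea time: mean"
# Contribution of the ice-covered part to the flux averaged over the sea area.
sea_fraction = 1.0  # assumption: the whole cell is sea
flux_over_ice = np.array([10.0, 20.0, np.nan, 40.0])
flux_over_open_water = np.array([100.0, 80.0, 60.0, 120.0])
ice_part = ice_fraction * np.where(ice_exists, flux_over_ice, 0.0) / sea_fraction
open_part = (sea_fraction - ice_fraction) * flux_over_open_water / sea_fraction
example3 = ice_part.mean()
ice_free_contribution = open_part.mean()
total_flux = (ice_part + open_part).mean()
```

 With these values the product mean_ice_fraction * example2 equals
 ice_volume.mean(), and example3 + ice_free_contribution equals total_flux,
 which are the properties that examples 2 and 3 are meant to guarantee.
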
 best regards,
 Karl

--
Ticket URL: <http://cf-trac.llnl.gov/trac/ticket/152#comment:9>
CF Metadata <http://cf-convention.github.io/>