#152: Time mean over area fractions which vary with time
-----------------------------+------------------------------
Reporter: martin.juckes | Owner: cf-conventions@…
Type: enhancement | Status: new
Priority: medium | Milestone:
Component: cf-conventions | Version:
Resolution: | Keywords:
-----------------------------+------------------------------
Comment (by taylor13):
Dear Martin and Jonathan,
I think it is confusing (wrong?) to consider the construct “where <type1>
over <type2>” as applying only to a horizontal dimension (even if
“area_type” seems to imply this). In fact a time series taken at a single
point location will have some sort of “area_type” defined at that point
(and it won’t be fractional). If the area_type varies over space or over
time, then we can define “cell_methods = <standard_name1>:
[<standard_name2>:] mean where <type1> over <type2>” to mean: “integrate
quantity over the dimension(s) identified by their standard_names but
conditional on the co-located existence of area-type1 and then divide by
the integral of the unit scalar (i.e., 1) over that same dimension(s)
conditional on the co-located existence of area-type2.” I think that just
because we restrict the names allowed for type1 and type2 to be those
names found in the area_type controlled vocabulary doesn’t mean that in
the cell_methods attribute they *must* represent an area fraction. I
think more generally they can be viewed as a binary indicator of the
existence of the surface type at a particular point and a particular
instant of time. (After temporal or spatial averaging, of course they
would in fact generally represent a fraction.)
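
To make the definition above concrete, here is a minimal sketch in Python of a "mean where type1 over type2" along a single dimension. The function name mean_where_over and the sample numbers are hypothetical, and the sketch assumes equally weighted samples, so that the integrals reduce to sums; it is an illustration of the reading proposed here, not CF text.

```python
import numpy as np

def mean_where_over(values, is_type1, is_type2=None):
    """Integrate `values` where type1 exists, then divide by the extent
    over which type2 exists. Samples are assumed to be equally weighted."""
    values = np.asarray(values, dtype=float)
    is_type1 = np.asarray(is_type1, dtype=bool)
    # When "over type2" is omitted, type2 is taken to be the same as type1.
    is_type2 = is_type1 if is_type2 is None else np.asarray(is_type2, dtype=bool)

    numerator = np.sum(np.where(is_type1, values, 0.0))  # integral conditional on type1
    denominator = np.sum(is_type2)                       # integral of 1 conditional on type2
    if denominator == 0:
        return float("nan")  # type2 never exists, so the mean is undefined
    return float(numerator / denominator)

# Hypothetical time series at a single point: the area type is a binary
# indicator at each instant, not a fraction.
samples = [271.0, 270.0, 272.0, 269.0]
sea_ice = [True, True, False, False]
sea = [True, True, True, True]

time_mean_where_sea_ice = mean_where_over(samples, sea_ice)
time_mean_where_sea_ice_over_sea = mean_where_over(samples, sea_ice, sea)
```

With these numbers the first call averages only the two samples taken when sea ice existed, while the second divides the same sum by all four samples.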
So, I think starting section 7.3.3 by referring to the “horizontal area of
a cell” should be avoided. I suggest rewriting the beginning of section
7.3.3 as follows:
----
“By default, the statistical method indicated by cell_methods is assumed
to have been evaluated over the entire domain of a cell (where a cell is
not necessarily restricted to spatial dimensions). Sometimes, however, it
is useful to limit consideration to only a subdomain of a cell (e.g. a
mean over the sea-ice area, or a time mean considering only times when
snow exists). The subdomains are restricted to be identified by one of
the strings permitted for a variable with the standard_name area_type.
There are two options for indicating when a quantity represents a
subdomain of a cell.
“The first option is used for the common case that the quantity of
interest has been recorded for a single subdomain. In this case, the
cell_methods attribute may include a string of the form "name: method
where type". As an example, if cell_methods is “area: mean where
sea_ice”, then the data would represent a mean over only the sea ice
portion of the grid cell. On the other hand, for a point location, if the
cell_methods is “time: mean where sea_ice”, then the data would represent
the time-mean at that point based on samples obtained when sea ice existed
there.
“When this first option (described in the preceding paragraph) is adopted,
none of the variables appearing in the netCDF file should be given a name
identical to any string recording the area_type. This restriction is
imposed so that it will be clear that the data writer has not elected to
adopt the second option (described in the next paragraph).
"The second approach for indicating that a statistic applies to only a
portion of a cell is more general because a single variable can contain
statistics for multiple area-types. In this case, the cell_methods entry
is of the form "name: method where typevar". Here typevar is a string-
valued auxiliary coordinate variable or string-valued scalar coordinate
variable (see Section 6.1, "Labels") with a standard_name of area_type.
The variable typevar contains the name(s) of the selected portion(s) of
the grid cell to which the method is applied. This convention can
accommodate cases in which a method is applied to more than one area type
and the result is stored in a single data variable (with a dimension which
ranges across the various area types). It provides a convenient way to
store output from land surface models, for example, since they deal with
many area types within each surface gridbox (e.g., vegetation,
bare_ground, snow, etc.)."
----
I think the discussion of the “where … over” option after example 7.6
should also be rewritten:
----
“If the method is mean, various ways of calculating the mean can be
distinguished in the cell_methods attribute with a string of the form
“mean where type1 [over type2]”. Here, type1 can be any of the
possibilities allowed for typevar or type (as specified in the paragraphs
preceding the above example). The same options apply to type2, except it
is not allowed to be the name of an auxiliary coordinate variable with a
dimension greater than one (ignoring the dimension accommodating the
maximum string length).
"A cell_methods attribute with a string of the form "area: mean where
type1 over type2" indicates the mean is calculated by integrating over the
type1 portion of the cell and dividing by the area of the type2 portion.
When “over type2” is omitted, it is assumed to be the same as type1.
"When “area” is not the only “dimension” for which the “where… over”
construct is used in a cell_methods, the interpretation more generally is
that a “weighted” mean is being reported. Specifically, the quantity of
interest is integrated over the specified dimension(s) with weights
proportional to the fraction of “type1” area_type that exists, and then
this is divided by the integral of the fraction of “type2” area_type that
exists.
"Note that “all_area_types” is one of the valid strings permitted for a
variable with the standard_name area_type, so a cell_methods string of the
form "area: mean where all_area_types over type2" indicates the mean is
calculated by integrating over all types of area and dividing by the area
of the type2 portion.
"The following three examples illustrate cases when one might want to use
“where” or “where … over” in defining the cell_methods:
1. Suppose that in a grid cell the fractional sea ice varies over
time, but there is interest in the time-mean surface temperature of the
sea ice. The time-samples, each representing a spatially-averaged sea ice
temperature, can be summed and then divided by the number of samples to
obtain an unweighted mean where sea ice exists. This would be indicated
with:
cell_methods = “area: mean where sea_ice time: mean”
2. Suppose there is interest in recording the mean fractional area
covered by sea ice and the mean sea ice thickness in such a way that their
product would equal the time-mean volume of sea ice in each grid cell. In
this case the sea ice area would be reported as an unweighted time-mean,
while the mean sea ice thickness would be calculated with time samples
weighted by the fractional area of sea ice. Thus, for sea ice thickness:
cell_methods = “area: time: mean where sea_ice”
3. Suppose the time-mean contributions to total heat flux from
different portions of a grid cell (e.g., ice-free and ice-covered) are of
interest, and there are reasons to report these in such a way that the
total heat flux is the sum of the individual contributions. Then the
cell_methods attribute would be defined:
cell_methods = “area: mean where sea_ice over sea time: mean”
----
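
The arithmetic behind the three examples can be sketched as follows. All variable names and numbers are hypothetical, the time samples are assumed to be equally spaced, and example 3 is read as "per-sample area mean over the sea portion, then an unweighted time mean"; this is one illustration of the proposed wording, not a definitive implementation.

```python
import numpy as np

# Hypothetical time samples for one grid cell (fractions are of the cell area).
sea_fraction = np.array([0.5, 0.5, 0.5, 0.5])
ice_fraction = np.array([0.5, 0.25, 0.0, 0.25])
open_fraction = sea_fraction - ice_fraction

# Example 1: "area: mean where sea_ice time: mean".
# Each sample is already an area mean over the ice; the time mean is
# unweighted and uses only samples where sea ice exists.
ice_temperature = np.array([270.0, 268.0, np.nan, 266.0])  # undefined without ice
has_ice = ice_fraction > 0
mean_ice_temperature = ice_temperature[has_ice].mean()

# Example 2: "area: time: mean where sea_ice" for thickness.
# The area fraction is an unweighted time mean; thickness is weighted by the
# ice fraction, so that fraction * thickness gives the time-mean volume.
ice_thickness = np.array([2.0, 1.0, 0.0, 3.0])
mean_ice_fraction = ice_fraction.mean()
mean_ice_thickness = np.sum(ice_fraction * ice_thickness) / np.sum(ice_fraction)
mean_ice_volume = np.mean(ice_fraction * ice_thickness)  # per unit cell area

# Example 3: "area: mean where sea_ice over sea time: mean".
# Each contribution is integrated over its own portion but divided by the
# sea area, so the contributions add up to the total sea flux.
ice_flux = np.array([10.0, 20.0, 0.0, 40.0])     # mean flux over the ice portion
open_flux = np.array([0.0, 100.0, 80.0, 120.0])  # mean flux over the ice-free portion
ice_contribution = np.mean(ice_flux * ice_fraction / sea_fraction)
open_contribution = np.mean(open_flux * open_fraction / sea_fraction)
total_sea_flux = np.mean(
    (ice_flux * ice_fraction + open_flux * open_fraction) / sea_fraction
)
```

In this sketch the product of mean_ice_fraction and mean_ice_thickness reproduces mean_ice_volume, and the two flux contributions sum to total_sea_flux, which are the properties examples 2 and 3 are meant to guarantee.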
best regards,
Karl
--
Ticket URL: <http://cf-trac.llnl.gov/trac/ticket/152#comment:9>
CF Metadata <http://cf-convention.github.io/>