Re: [CF-metadata] Return periods

Hollis, Dan Thu, 18 Sep 2014 07:58:33 -0700

Hi all,

Following the various discussions in this thread I would like to request the 
following new standard name:


precipitation_amount_converted_to_cumulative_probability

With the following definition:

"Amount" means mass per unit area. A variable whose standard name has the form 
X_converted_to_cumulative_probability will contain a value of the cumulative 
distribution function of X, i.e. the probability of observing a value of X less 
than or equal to the value of X defined by the cell bounds and cell methods. 
The variable must have a value between 0.0 and 1.0. The cell methods must 
describe the processing of quantity X prior to the conversion to probability.

The units would be '1'.

I'm not sure how good the above definition is, so I'd welcome suggestions for 
improvement.
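
To make the constraints in the proposed definition concrete, here is a minimal sketch that checks them. It uses a plain dictionary as a hypothetical stand-in for a netCDF variable's attributes; the function name and the example values are illustrative assumptions, not part of CF or of any library.

```python
# Hypothetical stand-in for a netCDF variable's attributes; not a real library API.
PROPOSED_NAME = "precipitation_amount_converted_to_cumulative_probability"


def check_probability_variable(attrs, values):
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    if attrs.get("standard_name") != PROPOSED_NAME:
        problems.append("unexpected standard_name")
    if attrs.get("units") != "1":
        problems.append("units must be '1'")
    # The definition requires cell methods describing the processing of X
    # prior to the conversion to probability.
    if "cell_methods" not in attrs:
        problems.append("cell_methods missing")
    if any(not (0.0 <= v <= 1.0) for v in values):
        problems.append("values outside [0.0, 1.0]")
    return problems


# Illustrative attributes and made-up probability values.
example_attrs = {
    "standard_name": PROPOSED_NAME,
    "units": "1",
    "cell_methods": "time: sum",
}
print(check_probability_variable(example_attrs, [0.12, 0.5, 0.97]))
```

A check like this only covers what the definition states explicitly (name, units, value range, presence of cell methods); it cannot tell whether the cell methods really describe the pre-conversion processing.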


I'd also like to propose the following addition to the Transformations section 
of the Guidelines for Construction of CF Standard Names:

X_converted_to_cumulative_probability

With units of '1' and the following meaning:

Cumulative distribution function of X, i.e. a value between 0.0 and 1.0 giving 
the probability of observing a value of X less than or equal to the value of X 
defined by the cell bounds and cell methods.

Again, I'd be grateful for suggestions on how to improve the wording to make 
the meaning clearer and unambiguous.

Thanks,

Dan


-----Original Message-----
From: CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of 
Hollis, Dan
Sent: 11 September 2014 16:35
To: Gregory, Jonathan; cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Return periods

Hi Jonathan,

Following our brief chat earlier this week I think I have a better 
understanding of the right way to tackle this. For the record here are the key 
points:

As described previously, the probability is a conversion of the precipitation 
amount. Storing both quantities _could_ be seen as redundant. However, the 
conversion process is non-trivial, so storing both is justified.

We _could_ store both quantities in the same file (as suggested below). However, 
this does not, in itself, establish any special link between the variables 
(other than making it easy to see that they share coordinates). As we plan to 
store all our other variables (temperature, wind speed, sunshine etc) in 
separate files it makes sense to do the same for the precipitation probability 
(rather than create an exception for one variable).

Your proposed standard name of 
"precipitation_amount_converted_to_cumulative_probability" might lead the user 
to ask 'which precipitation amount?'. Your recommendation is for the 
precipitation probability variable to have the same time bounds and cell method 
as the precipitation amount variable e.g. bounds = "2014-08-01 09:00, 
2014-09-01 09:00" (for Aug 2014) and cell_methods = "time: sum". The idea is 
that this would be sufficient to define which precipitation amount the 
probability relates to (although the user would have to seek out the 
precipitation amount field itself if they needed to know the actual values). I 
guess it would be important to declare in the definition that the cell method 
is applied *before* the conversion to cumulative probability.

Does this agree with what you had in mind?
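
The ordering described above (cell method first, conversion second) can be sketched as follows. Note that this sketch swaps in a simple empirical distribution built from made-up historical totals, purely to show the order of operations; it is not the statistical model discussed in this thread, and all names and numbers are illustrative assumptions.

```python
def empirical_cumulative_probability(history, value):
    """Fraction of historical values less than or equal to `value`."""
    return sum(1 for h in history if h <= value) / len(history)


# Made-up daily precipitation amounts (kg m-2) within one time cell.
daily_amounts = [0.0, 2.5, 10.0, 0.0, 7.5]

# Step 1: apply the cell method "time: sum" over the cell bounds.
monthly_total = sum(daily_amounts)

# Step 2: convert the summed amount to a cumulative probability.
# Made-up historical totals for the same kind of period.
historical_totals = [10.0, 15.0, 20.0, 30.0, 45.0]
probability = empirical_cumulative_probability(historical_totals, monthly_total)

print(monthly_total, probability)
```

Reversing the two steps (converting each daily amount to a probability and then summing) would give a different and much less meaningful number, which is why the definition needs to state the order.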

Regarding standard names, I shall request the name you suggested unless anyone 
else has other ideas. However, I also have two general questions related to 
this:

Given that cumulative probabilities may be of general interest to other users, 
would it be helpful to add "X_converted_to_cumulative_probability" to the list 
of transformations in the Guidelines for Standard Names?

Given that the meaning of each transformation is defined in the Guidelines, is 
it necessary to request a new standard name if I am simply combining an 
existing transformation with an existing standard name?

The reason for my second question is that I can see many examples of standard 
names that incorporate transformations (e.g. change_over_time_in_X, 
direction_of_X, divergence_of_X etc). Would it not be better practice to define 
only untransformed quantities, and then allow users to combine these with any 
of the defined transformations without needing to add to the standard name 
table?

Regards,

Dan



-----Original Message-----
From: CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of 
Jonathan Gregory
Sent: 09 September 2014 11:28
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Return periods

Dear Dan

Yes, I see what you mean regarding the aux coord, and it's a neat idea, but
it doesn't seem quite right to me. Aux coords are alternative or additional
information. The lat(x,y) and lon(x,y) coordinates provide an alternative way
to locate the point (x,y), in a different coordinate system. The precipitation
probability, however, would determine the precipitation entirely. There isn't
any coordinate information which would give you the precipitation amount. That
is why I don't think the probability can be an aux coord. Does that make sense?

> You are right regarding the calculation - we are using a statistical model of 
> the relationship between monthly rainfall and return period that was 
> developed many years ago by a colleague from an analysis of 60 years of 
> historical data. The model uses values of the coefficients of variation and 
> skewness to describe the distribution of monthly rainfall (assumed to be 
> log-normal). To capture how the shape of the distribution varies with 
> location we have pre-calculated values of these coefficients available at 
> each point on a 5 km grid.

Right. So it is reasonable to describe it as a conversion of precipitation
amount to probability, I think.
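
As a rough illustration of such a conversion, the sketch below evaluates the cumulative distribution function of a two-parameter log-normal distribution specified by its mean and coefficient of variation, and derives a return period for exceedance from it. This is an assumption-laden simplification: the model described in the quoted text also uses a separately gridded skewness, which a two-parameter log-normal cannot represent (its skewness is fixed by the coefficient of variation). The function names and input numbers are hypothetical.

```python
import math


def lognormal_cdf(amount, mean, cv):
    """P(X <= amount) for a two-parameter log-normal X with given mean and CV."""
    if amount <= 0.0:
        return 0.0
    sigma2 = math.log(1.0 + cv * cv)      # variance of ln(X)
    mu = math.log(mean) - 0.5 * sigma2    # mean of ln(X)
    z = (math.log(amount) - mu) / math.sqrt(sigma2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def return_period(prob):
    """Return period for exceeding the amount, in units of the sampling interval."""
    if prob >= 1.0:
        return math.inf
    return 1.0 / (1.0 - prob)


# Made-up values for a single grid point: long-term mean and CV of monthly rainfall.
p = lognormal_cdf(150.0, mean=100.0, cv=0.5)
print(p, return_period(p))
```

For unusually dry months one would normally quote the return period of non-exceedance, 1/p, rather than 1/(1 - p).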

> If a new standard name is required then I'm happy to take your advice on a 
> suitable choice.

It would be useful to know if anyone else reading this has a view on my
suggestion of precipitation_amount_converted_to_cumulative_probability.

> What is still not clear to me is how I maintain a clear link between the two 
> fields without storing some of the information twice. Is it simply a case of 
> storing two variables in the same NetCDF file (so that they share 
> coordinates)?

If they are in the same file, indeed it is obvious if the fields have the same
spatiotemporal coordinates, because they share the coord vars, as you say. If
they are in different files, the data-user has to check whether the coords are
the same. There is no convention which would allow one to be sure about that
without checking. CF does not rely on variable names, for instance. This is a
very common situation, in fact. For instance, in the CMIP archives each
quantity is in a separate file, and the data variables in many files typically
have the same spatiotemporal coordinates, but analysis software cannot be sure
of that without checking.
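
The kind of check a data-user has to make can be sketched as below. The coordinates are held in plain dictionaries as a hypothetical stand-in for coordinate variables read from two separate files; the function name, the tolerance and the values are assumptions for illustration only.

```python
import math


def same_coordinates(coords_a, coords_b, tol=1e-9):
    """True if both variables have the same coordinate names and values."""
    if coords_a.keys() != coords_b.keys():
        return False
    for name in coords_a:
        a, b = coords_a[name], coords_b[name]
        if len(a) != len(b):
            return False
        if not all(math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)):
            return False
    return True


# Made-up coordinates for an amount field and a probability field.
amount_coords = {"time": [15.5], "y": [0.0, 5000.0], "x": [0.0, 5000.0]}
probability_coords = {"time": [15.5], "y": [0.0, 5000.0], "x": [0.0, 5000.0]}
print(same_coordinates(amount_coords, probability_coords))
```

A fuller check would also compare the cell bounds, units and grid mapping, since matching coordinate values alone do not guarantee that two fields describe the same cells.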

Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata