Re: [CF-metadata] standards for probabilities

2011-12-13 Thread Jonathan Gregory
Dear Vegard

On further thought, I wonder whether the confidence_interval could be regarded
as a cell_method, if it relates to a (collapsed) axis of realization?

Best wishes

Jonathan

On Wed, Dec 07, 2011 at 06:55:47PM +, Jonathan Gregory wrote:
 Date: Wed, 7 Dec 2011 18:55:47 +
 From: Jonathan Gregory j.m.greg...@reading.ac.uk
 To: Vegard Bønes vegard.bo...@met.no
 Cc: cf-metadata@cgd.ucar.edu
 Subject: Re: [CF-metadata] standards for probabilities
 
 Dear Vegard
 
 Thanks for your email. Now I understand what you mean by confidence i.e. a
 confidence level for a value which has uncertainty. I agree, this is like other
 uses for standard_name modifiers, in particular the standard_error modifier.
 You need to link it to an extra dimension, and I suggest that the best way
 to do this would be through a standard_name. For instance, define the
 standard_name modifier of confidence_interval (confidence alone seems a bit
 vague - that's why I didn't understand what you meant), and state that if
 a variable has this modifier, it must have a coordinate variable or scalar
 coordinate variable whose standard_name is confidence_level. Both the modifier
 and the standard_name would be additions to CF.
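
The rule suggested above could be checked mechanically. The following is only a minimal sketch: it assumes variables are described as plain Python dicts, and both the confidence_interval modifier and the confidence_level standard_name are the proposed (not yet accepted) additions. The function name has_confidence_level and the variable names are hypothetical.

```python
def has_confidence_level(variable, variables):
    """Check the proposed rule for one variable described as a plain dict."""
    # A standard_name modifier follows the standard name after a blank.
    name_parts = variable.get("standard_name", "").split()
    if "confidence_interval" not in name_parts[1:]:
        return True  # the rule only applies when the modifier is present
    # Candidate coordinates: dimensions plus names in the coordinates attribute.
    candidates = variable.get("dimensions", []) + variable.get("coordinates", "").split()
    return any(
        variables.get(name, {}).get("standard_name") == "confidence_level"
        for name in candidates
    )

# Hypothetical file contents, for illustration only.
variables = {
    "confidence_level": {
        "dimensions": ["confidence_level"],
        "standard_name": "confidence_level",
    },
    "t_ci": {
        "dimensions": ["time", "confidence_level"],
        "standard_name": "air_temperature confidence_interval",
    },
    "t_bad": {
        "dimensions": ["time"],
        "standard_name": "air_temperature confidence_interval",
    },
}

print(has_confidence_level(variables["t_ci"], variables))
print(has_confidence_level(variables["t_bad"], variables))
```

Here t_ci satisfies the rule through its confidence_level dimension, while t_bad does not.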
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] standards for probabilities

2011-12-08 Thread Jonathan Gregory
Dear Lorenzo

Thank you for your email.

 I think the cell-methods mechanism partially overlaps with netCDF-U, in that
 it can account for (some of) the UncertML Summary Statistics concepts.
 However, it does not currently address Distributions and Samples.
 We could think of extending it, but we preferred to introduce a new
 mechanism, based on the standard URI syntax and RDF semantics.
 
 On the other hand, the cell-methods mechanism is arguably more fine-grained
 than netCDF-U, allowing different methods to be expressed on
 multi-dimensional variables, particularly as far as the semantics of
 dimension intervals is concerned.

Yes, I agree with your last point. An important aspect of cell_methods is that
it relates to particular axes. Describing a quantity just as a variance, for
instance, can be rather vague: it may be necessary to know if it's a variance
over space, over time or over ensemble members, for example. Possibly you
could consider including your URIs and some other extra information as comments
in cell_methods. These would be legal but unstandardised as far as CF is
concerned, but you could standardise them in your convention e.g.

  double biotemperature_variance(lat,lon);
    biotemperature_variance:units = "degC"; // shouldn't it be degC^2 for a variance?
    biotemperature_variance:cell_methods = "realization: variance (ref http://www.uncertml.org/distributions/normal#variance)";

The cell_methods here refers to realization as a standard name, which is
allowed even though realization isn't a dimension. If you do have a dimension
for realization, as in one of your examples, the coordinate variable for that
dimension could have a standard_name=realization attribute. If the variance
was over an existing dimension, that could be used e.g.

  double biotemperature_mean(time,lat,lon);
    biotemperature_mean:units = "degC";
    biotemperature_mean:cell_methods = "time: mean (ref http://www.uncertml.org/distributions/normal#mean)";
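
As a small illustration of why the axis named in cell_methods matters, the sketch below computes a variance over realization and a variance over time from the same numbers. The array layout and the values are invented for illustration; only the Python standard library is used.

```python
from statistics import pvariance

# Hypothetical data: temperature[realization][time] in degC (values invented).
temperature = [
    [10.0, 12.0, 14.0],  # realization 0
    [12.0, 14.0, 16.0],  # realization 1
]

# "realization: variance": collapse the ensemble axis, keep time.
variance_over_realization = [pvariance(column) for column in zip(*temperature)]

# "time: variance": collapse the time axis, keep realization.
variance_over_time = [pvariance(row) for row in temperature]

print(variance_over_realization)  # one value per time step, in degC^2
print(variance_over_time)         # one value per realization, in degC^2
```

The two results differ in both shape and magnitude, which is the information that a bare "variance" label would lose.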

Of course, this will only work for those statistical methods which are
allowed by cell_methods. However, you could propose others to include in
Appendix E if they are ways of computing statistics like those.

Looking at your examples, I wonder why you have, for instance
  lon:_CoordinateAxisType = "Lon";
What is the need for this new attribute? CF already offers these two methods
to indicate such an axis:
  lon:axis = "X";
  lon:standard_name = "longitude";
and in addition, the units of degrees_east imply that it is longitude.

Best wishes

Jonathan


Re: [CF-metadata] standards for probabilities

2011-12-05 Thread Lorenzo Bigagli
Dear Vegard, all,

Please find enclosed the draft netCDF-U conventions, as presented to the Earth 
System Science DWG at the OGC TC Meeting in Bruxelles last Wednesday.

We are continuing the discussion on the draft within the relevant OGC Working 
Groups, to address and incorporate comments and improve the document for the 
next Meeting (March 2012, Austin, Texas).
The contribution of the CF-metadata community would be very welcome.

To facilitate communication and avoid cross-posting, I would ask the interested 
readers to contact me, so we can keep them posted (possibly we may set up a 
dedicated mailing list, if need be).

As you will read in the attachment, we have tried to be convention-neutral, in 
particular making sure that netCDF-U fully integrates with the netCDF-CF 
Conventions, using the same constructs when possible (e.g. the 
ancillary_variables attribute). 
Ideally, we think of datasets that would conform to _both_ conventions:
:Conventions = "CF-1.5 UW-1.0" ;
NetCDF-U is based on a generic mechanism for annotating netCDF variables 
according to the UncertML conceptual model.

The rationale for this is that, we argue, a probabilistic description of 
scientific quantities is a cross-cutting aspect that may be modularized.
In particular, we would not clutter the application-level dictionaries with 
probabilistic concepts and jargon.

This said, I think the cell-methods mechanism partially overlaps with netCDF-U, 
in that it can account for (some of) the UncertML Summary Statistics concepts. 
However, it does not currently address Distributions and Samples.
We could think of extending it, but we preferred to introduce a new mechanism, 
based on the standard URI syntax and RDF semantics.

On the other hand, the cell-methods mechanism is arguably more fine-grained 
than netCDF-U, allowing different methods to be expressed on multi-dimensional 
variables, particularly as far as the semantics of dimension intervals is 
concerned.
I am unclear at the moment how much this is application dependent, and hence 
strictly pertaining to the CF conventions.
In other words, it may be that some application would still need the more 
fine-grained semantics of CF cell-methods to make sense of the data.

This is in line with our aim of netCDF-U being complementary to netCDF-CF, and 
not replacing it.
On the other hand, a generic mechanism for associating distinct summary 
statistics semantics to distinct dimension variables may be useful in general.
We look forward to investigating this further, and we welcome your 
expression of interest in being involved in this discussion.

Regards,
  LB



Re: [CF-metadata] standards for probabilities

2011-12-02 Thread Jonathan Gregory
Dear Vegard

 A dimension (and variable) for specifying percentiles:
 float percentile(percentile) ;
   percentile:units = "1" ;
   percentile:standard_name = "cumulative_distribution_function" ;
 float air_temperature_percentiles(time, percentile, latitude, longitude) ;
   air_temperature_percentiles:units = "K" ;
   air_temperature_percentiles:standard_name = "air_temperature" ;

This looks sensible to me. Are you proposing cumulative_distribution_function
as a new standard name?

 ...an alternative for percentile could be 
 cumulative_distribution_function_over_realization.

Yes. That would be more informative, and therefore preferable, I think.

 Then, there is the problem of certainty that a temperature will be within a 
 given range.

Could you do that like this:

float air_temperature(air_temperature);
  air_temperature:bounds = "air_temperature_bounds";
  air_temperature:units = "K";
float air_temperature_bounds(air_temperature,2);
float air_temperature_confidence(time,air_temperature,latitude,longitude);
  air_temperature_confidence:standard_name = "probability";

Then the air_temperature_bounds specify the ranges of air temperature for
which the probability is evaluated. probability would be a new standard name
as well. Again it could be made more informative as something like
probability_over_realization. This is instead of a standard_name modifier
and seems more consistent to me with the treatment of percentile. It is a
kind of transpose.
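
To make the layout concrete, here is a minimal sketch of how the values of air_temperature_confidence might be derived at one time and grid point. It assumes equally weighted ensemble members and half-open ranges [lower, upper); the member values and the function name probability_over_realization are invented for illustration.

```python
# Hypothetical ensemble of air temperatures (K) at one time and grid point.
members = [271.5, 272.0, 273.4, 274.1, 275.0, 275.9, 276.2, 277.8]

# Rows of air_temperature_bounds(air_temperature, 2), in K.
air_temperature_bounds = [(270.0, 273.0), (273.0, 276.0), (276.0, 279.0)]

def probability_over_realization(values, bounds):
    """Fraction of ensemble members falling inside each [lower, upper) range."""
    return [
        sum(lower <= value < upper for value in values) / len(values)
        for lower, upper in bounds
    ]

air_temperature_confidence = probability_over_realization(
    members, air_temperature_bounds
)
print(air_temperature_confidence)
```

Each element corresponds to one value of the air_temperature coordinate, so the probabilities sit along a dimension rather than in separate variables.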

 Also, although I did not mention it in the previous emails, I wanted to 
 express the probability of at least x mm of precipitation.

This can be done in the same way as the probability of air temperature ranges,
with the upper bounds for precipitation ranges set high enough that they are
effectively infinite.
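
A minimal sketch of that idea follows, again assuming equally weighted members. Here math.inf stands in for "an upper bound high enough to be effectively infinite"; in a file one might prefer a very large finite value. The member values, thresholds and function name are invented for illustration.

```python
import math

# Hypothetical ensemble of precipitation amounts (mm) at one time and grid point.
precipitation_members = [0.0, 0.4, 1.2, 2.5, 3.0, 5.1, 7.8, 12.0, 0.0, 1.9]

# Thresholds x for "at least x mm"; each range is [x, effectively infinite).
thresholds = [1.0, 5.0, 10.0]
precipitation_bounds = [(x, math.inf) for x in thresholds]

def probability_of_range(values, bounds):
    """Fraction of ensemble members falling inside each [lower, upper) range."""
    return [
        sum(lower <= value < upper for value in values) / len(values)
        for lower, upper in bounds
    ]

probability_at_least = probability_of_range(
    precipitation_members, precipitation_bounds
)
print(probability_at_least)
```

Unlike the temperature ranges, these ranges overlap, so the probabilities need not sum to one.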

 When working with this, I found that expressing percentiles and 
 probabilities as dimensions, instead of attributes, made the relationship 
 between them more intuitive.

I agree.

Cheers

Jonathan


Re: [CF-metadata] standards for probabilities

2011-11-29 Thread Vegard Bønes
Hi,

Unfortunately, I did not see your email until today, so I have not taken it 
into consideration when choosing how to handle probabilities for myself.

Like Roy, however, I believe that expressing probabilities as dimensions rather 
than attributes is the best approach for this. 

The problem with attributes is that if you have many variables and many 
probabilities or percentiles, you will get a huge number of separate variables 
in your data set, and the relationship between them is not as easy to see as 
when (for example) all percentiles for air temperature are lumped into the same 
variable. If many users are like me, and use ncdump and ncview to look at the 
data, this will be a huge advantage.


VG




[CF-metadata] standards for probabilities

2011-11-25 Thread Lorenzo Bigagli
Dear Vegard, all,

I take the opportunity to inform you that we are drafting a proposal for a 
netCDF convention on uncertainty (NetCDF-U).
This work is partly developed in the framework of the FP7 UncertWeb project.

We are going to present it next Wednesday at the Open Geospatial Consortium TC 
Meeting in Bruxelles, and to circulate it shortly afterwards.

We have tried to be convention-neutral, in particular making sure that netCDF-U 
fully integrates with the netCDF-CF Conventions, even using the same constructs 
when possible (e.g. the ancillary_variables attribute). 
Ideally, we think of datasets that would conform to both conventions:
:Conventions = "CF-1.5 UW-1.0" ;


NetCDF-U is based on a generic mechanism for annotating netCDF variables 
according to the UncertML conceptual model.
The first example in your use-case would read something like (note that CF 
attributes are unchanged):

float precipitation_25(time, x, y) ;
   precipitation_25:standard_name = "precipitation_amount" ;
   precipitation_25:long_name = "precipitation_amount 25th percentile" ;
   precipitation_25:ref = "http://www.uncertml.org/statistics/percentile" ;
   precipitation_25:level = 25 ;


The second, provided we have a variable difference(Lat=100, Lon=100) that 
contains the difference between the observed value and the forecast:

float probability(Lat=100, Lon=100) ;
   probability:ref = "http://www.uncertml.org/statistics/probability" ;
   probability:gt = -2.5 ;
   probability:lt = 2.5 ;


I apologize if this is not clear enough for the moment, and I hope it can be 
of prospective interest.
Any comments would be much appreciated.

Best regards,
  Lorenzo Bigagli


---
Dott. Lorenzo Bigagli
Consiglio Nazionale delle Ricerche
Istituto di Metodologie per l'Analisi Ambientale (CNR-IMAA)

i: Area della Ricerca di Potenza, Contrada Santa Loja
   Zona Industriale, 85050 Tito Scalo (PZ), Italia
t: +39 0971 427221
f: +39 0971 427222
m: lorenzo.biga...@cnr.it



Re: [CF-metadata] standards for probabilities

2011-11-18 Thread Kettleborough, Jamie
Hello Vegard,

Sorry, it looks like my memory is more flawed than I thought. I can't find 
realization_weight in the standard name table, although I thought I remembered 
it being added. I think a possible entry point into the discussion at the time 
around this can be found at

http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2007/019229.html

That said, the whole discussion of probabilities and ensembles is one of those 
things on which it seems hard to keep the momentum going (at least for me), 
given how hard the problem is and how few brain cycles are available.

Jamie

 -Original Message-
 From: Vegard Bønes [mailto:vegard.bo...@met.no] 
 Sent: 16 November 2011 12:19
 To: Kettleborough, Jamie
 Cc: cf-metadata@cgd.ucar.edu; Jonathan Gregory
 Subject: Re: [CF-metadata] standards for probabilities
 
 Hi,
 
 The methods for estimating probabilities are non-trivial, 
 and may change over time. Because of that I would prefer to 
 keep information about the exact process outside of the 
 generated file.
 
 I have not been able to find any references to 
 realization_weight in the standard documents. Could you 
 please refer me to the right place?
 
 VG
 
 
 

Re: [CF-metadata] standards for probabilities

2011-11-18 Thread Jonathan Gregory
Dear Vegard

Sorry for slow response. I've been very busy this week.

 So, a bit more concrete, this is option 1:
 
 float rain_25(time, y, x);
  rain_25:standard_name = "precipitation_amount";
  rain_25:cell_methods = "realization: percentile(25)";

Yes, except that cell_methods only refers to variables and doesn't contain
constants, at the moment. Therefore I was thinking it could be something like

  float rain(time,y,x);
    rain:cell_methods = "realization: percentile pvar";
  float pvar(pvar);

and pvar is a coordinate variable which specifies the percentile(s). If there
is only one percentile, the dimension pvar=1, or pvar could be a scalar. This
syntax is like the second one in section 7.3.3 of the CF standard, for
statistics applying to different area-types, for the same reason: it needs to
refer to a coordinate variable in evaluating a statistic.
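
For readers wondering how software might consume such an entry, here is a rough sketch. The "realization: percentile pvar" form is only the syntax proposed above, not part of CF, and the function parse_cell_method and the dict coordinate_values are hypothetical.

```python
def parse_cell_method(entry):
    """Split 'axis: method [coordinate_variable]' into its three parts."""
    axis, separator, rest = entry.partition(":")
    words = rest.split()
    if not separator or not words:
        raise ValueError(f"not of the form 'axis: method': {entry!r}")
    # The optional third word names the coordinate variable (proposed syntax).
    coordinate = words[1] if len(words) > 1 else None
    return axis.strip(), words[0], coordinate

# Hypothetical file contents: pvar holds the percentile value(s).
coordinate_values = {"pvar": [25.0]}

axis, method, coordinate = parse_cell_method("realization: percentile pvar")
percentiles = coordinate_values[coordinate] if coordinate else []
print(axis, method, percentiles)
```

An entry without a third word, such as "time: mean", parses with the coordinate set to None, so existing cell_methods entries would be unaffected.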

 The only problem I see with this is that in the resulting cdm realization is
 not used anywhere, apart from possibly in cell methods. But maybe this is ok?

Yes, it is OK, because standard_names can be included in cell_methods, and
realization is a standard_name.

Option 2:

 float precipitation_amount(time, percentile, y, x);
  ...
 float percentile(percentile);
  percentile:units = "1";
  percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount";

To make this method as informative as option 1, the standard_name would be
cumulative_distribution_function_of_precipitation_amount_over_realization.
In option 1, over realization is indicated by the cell_methods.

You ask, "But what is the purpose of explicitly referring to
precipitation_amount in the standard name? Would not
cumulative_distribution_function be better? Then the same dimension could be
used for other data, such as air_temperature." I agree that would be an
advantage. I suggested that precipitation_amount should be stated by analogy
with the guidelines for probability_density_function_of_X. For a PDF, the
units depend on what X is, so you must have a standard_name which includes X.
A CDF and a PDF are related concepts. However, this is not a strong argument.
If you had a PDF, it would probably be a data variable, not a coordinate
variable like your CDF is here.

Regarding Roy's comment, I agree with his concern about proliferation of
percentiles, but I think both of these options allow that generality, as in
both cases the percentile value(s) are in variables.

The advantage of option 2 is that it only requires new standard names, whereas
option 1 requires an alteration to the CF convention, and it's a bit simpler.
The advantage of option 1 is that it's more compact, and it is natural to
regard percentiles as a cell_method, I would argue. I'm not sure which is
better.

Cheers

Jonathan


Re: [CF-metadata] standards for probabilities

2011-11-16 Thread Kettleborough, Jamie
Hello Vegard,

How do you generate your cdf from your realisations? Do you simply weight each 
ensemble member equally? I think there are cases where you may weight by some 
measure of how 'good' you think the ensemble member is (some sort of measure of 
its error: you downweight those with high errors). If you are storing the 
output from ensemble members in the file, then I think CF allows for this using 
the 'realization_weight' standard name to store your errors/weights in the 
file.

Furthermore you may want to know the sensitivity of your cdf to your error 
estimates so you could have more than one cdf for the same variable, but based 
on different ways of deriving the errors/weights.
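
The effect of the weighting can be seen in a small sketch. This is only one simple way to build an empirical cdf from weighted members; the member values, the weights and the function name weighted_cdf are invented for illustration and do not describe any particular forecasting system.

```python
def weighted_cdf(values, weights):
    """Return (value, cumulative probability) pairs for a weighted ensemble."""
    total = sum(weights)
    cumulative = 0.0
    cdf = []
    # Walk through the members in order of increasing value.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        cdf.append((value, cumulative / total))
    return cdf

# Hypothetical ensemble members and two alternative sets of weights.
members = [2.0, 0.5, 1.0, 4.0]
equal_weights = [1, 1, 1, 1]
skill_weights = [1, 3, 2, 2]  # larger weight for members judged more reliable

cdf_equal = weighted_cdf(members, equal_weights)
cdf_skill = weighted_cdf(members, skill_weights)
print(cdf_equal)
print(cdf_skill)
```

The same members give two different cdfs, which is the case where a file might need to hold more than one cdf for the same variable.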

Is this something CF needs to worry about, or is it a case of trying to add 
something that's not really needed yet?  Or maybe this is not in scope for CF 
anyway, and it should be left to something more like 'audit/history/provenance' 
meta data?

Jamie 

 -Original Message-
 From: cf-metadata-boun...@cgd.ucar.edu 
 [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Vegard Bønes
 Sent: 15 November 2011 13:15
 To: Jonathan Gregory
 Cc: cf-metadata@cgd.ucar.edu
 Subject: Re: [CF-metadata] standards for probabilities
 
 Thank you, Jonathan! :)
 
 So, a bit more concrete, this is option 1:
 
 float rain_25(time, y, x);
  rain_25:standard_name = "precipitation_amount";
  rain_25:cell_methods = "realization: percentile(25)";
 
 The only problem I see with this is that in the resulting cdm 
 realization is not used anywhere, apart from possibly in cell 
 methods. But maybe this is ok?
 
 
 If I understand the second option correctly, this would lead 
 to something like this:
 
 float precipitation_amount(time, percentile, y, x);
  ...
 float percentile(percentile);
  percentile:units = "1";
  percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount";
 
 But what is the purpose of explicitly referring to 
 precipitation_amount in the standard name? Would not 
 cumulative_distribution_function be better? Then the same 
 dimension could be used for other data, such as air_temperature.
 
 Or, if we want to add something about the nature of the 
 source data for the function, it could be called something 
 like cumulative_distribution_function_due_to_realization?
 
 
 I am still a bit uncertain about what is the best, though.
 
 
 -- Vegard
 
 
 
 


Re: [CF-metadata] standards for probabilities

2011-11-16 Thread Vegard Bønes
Thank you!

I will take a little time to work on the input I have received, and post back 
what I have done, possibly along with a suggestion for some new standard names.

VG


- Original Message -
From: Evan Manning evan.m.mann...@jpl.nasa.gov
To: cf-metadata@cgd.ucar.edu
Sent: 15 November 2011 15:35:10
Subject: Re: [CF-metadata] standards for probabilities

Most of the discussion so far has centered on (1).  But this raises
some other issues:

 2) probability that air temperature will be within 2.5 degrees of the forecast

This is clearly trying to get at something akin to what we do with the
standard_error standard name modifier and the standard_error_multiplier
ancillary variable.  The differences are that the units are flipped and
the assumption of a normal distribution is removed.

Can we use something analogous?  Maybe a new standard name modifier
like "distribution" or "probability" or "confidence" which requires an ancillary
variable?

The best fit for this particular case is "confidence" with an ancillary variable
"confidence_interval" with value 2.5.

But something more like "distribution" is more general and could be stretched to
handle

 1)  25th percentile precipitation amount (based on ensemble data)

Here we might use the "distribution" (or "cumulative_distribution"?) standard name
modifier, with ancillary variable "cumulative_probability" and value 0.25.

  -- Evan


[CF-metadata] standards for probabilities

2011-11-15 Thread Vegard Bønes
Hi!

I am trying to create a document containing various probability values for 
weather forecasts. But I do have some problems finding out how to express what 
I want to say using the cf metadata standard.

I want to express such things as 25th percentile precipitation amount (based 
on ensemble data), and probability that air temperature will be within 2.5 
degrees of the forecast. How should I do this? 

For percentiles, may I do something like this?

float precipitation_25(time, x, y) ;
precipitation_25:standard_name = precipitation_amount ;
precipitation_25:long_name = precipitation_amount 25th percentile ;
...

Also, as far as I can tell, there are no standardized names like 
probability_of_x or probability_of_x_within_y. How can I express this?



-- Vegard


Re: [CF-metadata] standards for probabilities

2011-11-15 Thread Jonathan Gregory
Dear Vegard

 I want to express such things as 25th percentile precipitation amount 
 (based on ensemble data), and probability that air temperature will be within 
 2.5 degrees of the forecast. How should I do this? 

You are right, this case has not yet been dealt with, although the guidelines
for construction of standard names foresee that needs like this might arise!

If the quantity is a precipitation_amount, it's fine to use that standard
name. The question is how to record that it is the 25th percentile. Two possible
ways to do this would be:

* To extend the possible syntax of cell_methods so that it can describe
percentiles. It is already possible to indicate a median in cell_methods, and
that is a particular percentile. The advantage of this way of doing it would
be that you would record whether the distribution of precipitation amounts
being considered was for time-variation, or spatial variation, or some other
kind of variation. Obviously you could have a probability distribution with
percentiles for many different independent variables.

* To use a size-1 or scalar coordinate variable to record the probability,
with a new standard_name, perhaps
cumulative_distribution_function_of_precipitation_amount.
The value of this coordinate would be 0.25 for the 25th percentile. The
advantage of this method would be that you could have several different
percentiles in the same variable, by having a multivalued probability coord.
If you wanted to be specific about what the independent variable was, that
would have to be included in the standard name as well e.g.
cumulative_distribution_function_of_precipitation_amount_over_time.
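
As a rough illustration of the second option, the sketch below derives several percentiles from one set of ensemble values and stores them against a multivalued probability coordinate. It is plain Python; the percentile helper, its linear interpolation rule, and the sample data are assumptions made for illustration, not anything defined by CF.

```python
# Sketch only: names and data below are illustrative assumptions,
# not part of the CF conventions.

def percentile(values, p):
    """Return the value at cumulative probability p (0..1), using linear
    interpolation between the sorted ensemble members."""
    if not values:
        raise ValueError("need at least one ensemble member")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Hypothetical precipitation amounts (kg m-2) from a 5-member ensemble
# at a single grid point and time.
ensemble = [0.0, 1.0, 2.0, 4.0, 8.0]

# The multivalued probability coordinate: one variable, several percentiles.
probability = [0.25, 0.5, 0.75]

# One data value per entry of the probability coordinate.
precipitation_amount = [percentile(ensemble, p) for p in probability]
```

Here probability plays the role of the proposed coordinate variable, and precipitation_amount holds one value per coordinate entry, so adding a further percentile only means adding a further coordinate value.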

What do you think?

Cheers

Jonathan
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] standards for probabilities

2011-11-15 Thread Vegard Bønes
Thank you, Jonathan! :)

So, a bit more concrete, this is option 1:

float rain_25(time, y, x);
    rain_25:standard_name = "precipitation_amount";
    rain_25:cell_methods = "realization: percentile(25)";

The only problem I see with this is that, in the resulting CDM, realization is
not used anywhere, apart from possibly in cell methods. But maybe this is OK?


If I understand the second option correctly, this would lead to something like 
this:

float precipitation_amount(time, percentile, y, x);
    ...
float percentile(percentile);
    percentile:units = "1";
    percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount";

But what is the purpose of explicitly referring to precipitation_amount in the
standard name? Would not cumulative_distribution_function be better? Then the
same dimension could be used for other data, such as air_temperature.

Or, if we want to add something about the nature of the source data for the 
function, it could be called something like 
cumulative_distribution_function_due_to_realization?


I am still a bit uncertain about what is the best, though.


-- Vegard




___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] standards for probabilities

2011-11-15 Thread Vegard Bønes
Dear Roy, 

Can you be a bit more concrete about why you prefer the second alternative?


-- Vegard


- Original Message -
From: Roy K. Lowry r...@bodc.ac.uk
To: Jonathan Gregory j.m.greg...@reading.ac.uk, Vegard Bønes
vegard.bo...@met.no
Cc: cf-metadata@cgd.ucar.edu
Sent: 15 November 2011 11:17:01
Subject: RE: [CF-metadata] standards for probabilities

Dear Jonathan,

I prefer your second alternative.  It's not what I do, but it's what I wish I 
did!!

Cheers, Roy.


___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] standards for probabilities

2011-11-15 Thread Lowry, Roy K.
Hello Vegard,

One of my jobs is running a parameter vocabulary that currently has over 27,000 
entries.  Much of its bulk is due to the assignment of multiple parameter names 
for each step in a numeric sequence - such as radiation wavelengths or sediment 
grain-size expressed as percentiles.

Consider a scenario where you start with a small group of standard percentiles 
- say 5, 25, 50, 75, 95.  You set up a parameter name for each of these in the 
first instance, which is easy. Then along comes another user who wants to 
describe data with percentiles at a resolution of 1 per cent.  So another 95 
parameter names need to be set up.  Then along comes another user who wants a 
resolution of 0.1 per cent.  I start drowning in names and nobody can find 
anything.

However, had I followed Jonathan's second solution all I would need to do as a 
vocabulary manager is set up one concept to describe the percentile axis, which 
covers every user from those who use a handful of percentiles to those whose 
percentile resolution requirements are beyond the bounds of my imagination.

I know Jonathan's first option was based on propagation of cell methods and not
standard names.  However, these still need managing and if they become 
excessively abundant they also become difficult to navigate.
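
The scaling argument can be sketched in a few lines of Python. Everything here is hypothetical: the name pattern, the grain_size quantity, and the choice to exclude the end points 0 and 100 are assumptions for illustration, so the exact counts will differ from those in a real vocabulary.

```python
def percentile_levels(steps_per_percent):
    """Percentile levels strictly between 0 and 100 at the given resolution
    (1 step per percent, 10 steps per percent, and so on)."""
    count = 100 * steps_per_percent
    return [i / steps_per_percent for i in range(1, count)]

def names_per_level(quantity, steps_per_percent):
    """First approach: one (hypothetical) parameter name per percentile."""
    return [f"{quantity}_percentile_{level:g}"
            for level in percentile_levels(steps_per_percent)]

def single_axis(steps_per_percent):
    """Second approach: one name for the axis; the levels are just data."""
    return {"name": "percentile",
            "values": percentile_levels(steps_per_percent)}

# Number of names the vocabulary must hold under the first approach.
print(len(names_per_level("grain_size", 1)))   # resolution of 1 per cent
print(len(names_per_level("grain_size", 10)))  # resolution of 0.1 per cent

# Under the second approach the vocabulary holds a single concept,
# whatever the resolution of the values.
print(single_axis(10)["name"])
```

The number of names grows with the resolution in the first approach, while the second keeps one managed concept and lets the coordinate values carry the rest.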

Cheers, Roy.


___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] standards for probabilities

2011-11-15 Thread John Caron

Hi Vegard:

I see some of these kinds of things from NCEP, encoded in GRIB, and I'm
still trying to understand what they are. So, some questions from a
non-modeler:


On 11/15/2011 2:10 AM, Vegard Bønes wrote:

Hi!

I am trying to create a document containing various probability values for 
weather forecasts. But I do have some problems finding out how to express what 
I want to say using the cf metadata standard.

I want to express such things as 25th percentile precipitation amount (based 
on ensemble data), and probability that air temperature will be within 2.5 degrees of the 
forecast. How should I do this?


these are 2 different things, i guess?

1)  25th percentile precipitation amount (based on ensemble data)

* so here the data values are precip amounts? calculated from the 
cumulative distribution function (cdf) from an ensemble?
* do you typically have other percentile amounts in the same file, eg 50 
and 75?
* presumably this is some distillation of the cdf, used when the 
individual ensemble values are not in the file?
* is there any special handling that a generic tool could do, or is it a
matter of just making this data available to some specialized application
that you write?



2) probability that air temperature will be within 2.5 degrees of the 
forecast


* so here the data values are probabilities between 0 and 1 ?
* do you typically have other probabilities in the same file, eg within 
1 degree, or 5 degrees?

* is there any special handling that a generic tool could do with such info?
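
For concreteness, one way such probability values might be derived from an ensemble is sketched below in Python. This is an assumption about the calculation, not something stated in the thread: the probability is taken as the fraction of members within a tolerance of a single forecast value, and the function name and sample temperatures are made up for illustration.

```python
def probability_within(members, forecast, tolerance):
    """Fraction of ensemble members within +/- tolerance of the forecast."""
    if not members:
        raise ValueError("need at least one ensemble member")
    hits = sum(1 for value in members if abs(value - forecast) <= tolerance)
    return hits / len(members)

# Hypothetical air temperatures (K) from an 8-member ensemble at one point.
members = [271.0, 272.5, 273.0, 274.0, 275.5, 276.0, 277.0, 280.0]
forecast = 274.0  # assumed single forecast value to compare against

# One probability per tolerance, e.g. within 1, 2.5 and 5 degrees.
probabilities = {tol: probability_within(members, forecast, tol)
                 for tol in (1.0, 2.5, 5.0)}
```

Each result lies between 0 and 1, and several tolerances in the same file would naturally form an extra axis of the data variable.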

john
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] standards for probabilities

2011-11-15 Thread Evan Manning
Most of the discussion so far has centered on (1).  But this raises
some other issues:

 2) probability that air temperature will be within 2.5 degrees of the forecast

This is clearly trying to get at something akin to what we do with the
standard_error standard name modifier and the standard_error_multiplier
ancillary variable.  The differences are that the units are flipped and
the assumption of a normal distribution is removed.

Can we use something analogous?  Maybe a new standard name modifier
like distribution or probability or confidence which requires an ancillary
variable?

The best fit for this particular case is confidence with an ancillary variable
confidence_interval with value 2.5.

But something more like distribution is more general and could be stretched to
handle

 1)  25th percentile precipitation amount (based on ensemble data)

Here we might use the distribution (or cumulative_distribution?) standard name
modifier, ancillary variable cumulative_probability, value 0.25.

  -- Evan
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] standards for probabilities

2011-11-15 Thread Vegard Bønes
Hi John,

All the assumptions you state are correct. 

Regarding the usage of the data, there exists a specialized application that 
uses this data. The problem is that others are interested in the same data, and 
I have no control over how they will use it. Because of that, I want the 
generated file to follow any standards as closely as possible.


-- Vegard




___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata