Re: [CF-metadata] standards for probabilities
Dear Vegard

On further thought, I wonder whether the confidence_interval could be regarded as a cell_method, if it relates to a (collapsed) axis of realization?

Best wishes

Jonathan

On Wed, Dec 07, 2011 at 06:55:47PM, Jonathan Gregory <j.m.greg...@reading.ac.uk> wrote:

> Dear Vegard
>
> Thanks for your email. Now I understand what you mean by confidence, i.e. a confidence level for a value which has uncertainty. I agree, this is like other uses for standard_name modifiers, in particular the standard_error modifier. You need to link it to an extra dimension, and I suggest that the best way to do this would be through a standard_name. For instance, define the standard_name modifier confidence_interval ("confidence" alone seems a bit vague; that's why I didn't understand what you meant), and state that if a variable has this modifier, it must have a coordinate variable or scalar coordinate variable whose standard_name is confidence_level. Both the modifier and the standard_name would be additions to CF.

___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
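To make the proposed pairing concrete: a variable carrying the (proposed, not yet standard) confidence_interval modifier would hold interval half-widths, indexed by a confidence_level coordinate. The following is a minimal Python sketch under stated assumptions: the ensemble members are equally weighted, and the interval is centred on the deterministic forecast. The helper name and the centring choice are illustrative assumptions, not anything CF defines.

```python
import math


def confidence_half_widths(members, forecast, levels):
    """Hypothetical helper: for each confidence level p in (0, 1], return the
    smallest half-width h such that at least a fraction p of the equally
    weighted ensemble members lie in [forecast - h, forecast + h]."""
    if not members:
        raise ValueError("need at least one ensemble member")
    deviations = sorted(abs(m - forecast) for m in members)
    n = len(deviations)
    half_widths = []
    for p in levels:
        if not 0.0 < p <= 1.0:
            raise ValueError("confidence level must be in (0, 1]")
        # Number of members that must fall inside the interval; the small
        # tolerance guards against p * n landing just above an integer
        # through floating-point round-off.
        k = max(1, math.ceil(p * n - 1e-9))
        half_widths.append(deviations[k - 1])
    return half_widths
```

For example, with members 10, 11, 12, 13, 14 and a forecast of 12, confidence levels 0.2, 0.6 and 1.0 give half-widths 0, 1 and 2; these would be the data values, and 0.2, 0.6, 1.0 the confidence_level coordinate values.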
Re: [CF-metadata] standards for probabilities
Dear Lorenzo

Thank you for your email.

> I think the cell-methods mechanism partially overlaps with netCDF-U, in that it can account for (some of the) UncertML Summary Statistics concepts. However, it does not currently address Distributions and Samples. We could think of extending it, but we preferred to introduce a new mechanism, based on the standard URI syntax and RDF semantics. On the other hand, the cell-methods mechanism is arguably more fine-grained than netCDF-U, allowing different methods to be expressed on multi-dimensional variables, particularly as far as the semantics of dimension intervals is concerned.

Yes, I agree with your last point. An important aspect of cell_methods is that it relates to particular axes. Describing a quantity just as a variance, for instance, can be rather vague: it may be necessary to know whether it's a variance over space, over time or over ensemble members, for example.

Possibly you could consider including your URIs and some other extra information as comments in cell_methods. These would be legal but unstandardised as far as CF is concerned, but you could standardise them in your convention, e.g.

    double biotemperature_variance(lat, lon) ;
        biotemperature_variance:units = "degC" ;  // shouldn't it be degC^2 for a variance?
        biotemperature_variance:cell_methods = "realization: variance (ref http://www.uncertml.org/distributions/normal#variance)" ;

The cell_methods here refers to realization as a standard name, which is allowed even though realization isn't a dimension. If you do have a dimension for realization, as in one of your examples, the coordinate variable for that dimension could have a standard_name = "realization" attribute. If the variance was over an existing dimension, that could be used, e.g.

    double biotemperature_mean(time, lat, lon) ;
        biotemperature_mean:units = "degC" ;
        biotemperature_mean:cell_methods = "time: mean (ref http://www.uncertml.org/distributions/normal#mean)" ;

Of course, this will only work for those statistical methods which are allowed by cell_methods. However, you could propose others for inclusion in Appendix E if they are ways of computing statistics like those.

Looking at your examples, I wonder why you have, for instance,

    lon:_CoordinateAxisType = "Lon" ;

What is the need for this new attribute? CF already offers two methods to indicate such an axis:

    lon:axis = "X" ;
    lon:standard_name = "longitude" ;

and in addition, units of degrees_east imply that it is longitude.

Best wishes

Jonathan
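Since CF already allows a parenthesised comment after a method in cell_methods, that is where URIs like the ones above would sit. As a hedged illustration, here is a small Python parser for a simplified subset of the syntax: one or more "name:" tokens, a method (any qualifiers such as "where land" are left as plain text), and an optional comment. It does not implement the full cell_methods grammar of the CF conventions, and the function name is hypothetical.

```python
import re

# One cell_methods entry in a simplified grammar (an assumption of this sketch):
# one or more "name:" tokens, a method, and an optional "(comment)".
_ENTRY = re.compile(
    r"\s*(?P<names>(?:\w+:\s*)+)"      # "realization: " or "lat: lon: "
    r"(?P<method>[^():]+?)"             # "variance", or "mean where land"
    r"\s*(?:\((?P<comment>[^)]*)\))?"   # optional "(ref http://...)"
    r"(?=\s+\w+:|\s*$)"                 # next entry starts, or end of string
)


def parse_cell_methods(text):
    """Split a cell_methods string into (names, method, comment) tuples."""
    entries, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = _ENTRY.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse cell_methods at: {text[pos:]!r}")
        names = [n.strip() for n in m.group("names").split(":") if n.strip()]
        entries.append((names, m.group("method").strip(), m.group("comment")))
        pos = m.end()
    return entries
```

Applied to the biotemperature_variance example, this yields a single entry whose names are ["realization"], whose method is "variance", and whose comment carries the UncertML URI. A convention layered on CF could then look for a "ref" prefix inside the comment.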
Re: [CF-metadata] standards for probabilities
Dear Vegard, all,

Please find enclosed the draft netCDF-U conventions, as presented to the Earth System Science DWG at the last OGC TC Meeting in Brussels last Wednesday. We are continuing the discussion on the draft within the relevant OGC Working Groups, to address and incorporate comments and improve the document for the next meeting (March 2012, Austin, Texas). The contribution of the CF-metadata community would be very welcome. To facilitate communication and avoid cross-posting, I would ask interested readers to contact me, so we can keep them posted (we may set up a dedicated mailing list, if need be).

As you will read in the attachment, we have tried to be convention-neutral, in particular making sure that netCDF-U fully integrates with the netCDF-CF Conventions, using the same constructs when possible (e.g. the ancillary_variables attribute). Ideally, we think of datasets that would conform to _both_ conventions:

    :Conventions = "CF-1.5 UW-1.0" ;

NetCDF-U is based on a generic mechanism for annotating netCDF variables according to the UncertML conceptual model. The rationale for this is that we regard a probabilistic description of scientific quantities as a cross-cutting aspect that may be modularized. In particular, we would rather not clutter the application-level dictionaries with probabilistic concepts and jargon.

This said, I think the cell-methods mechanism partially overlaps with netCDF-U, in that it can account for (some of the) UncertML Summary Statistics concepts. However, it does not currently address Distributions and Samples. We could think of extending it, but we preferred to introduce a new mechanism, based on the standard URI syntax and RDF semantics. On the other hand, the cell-methods mechanism is arguably more fine-grained than netCDF-U, allowing different methods to be expressed on multi-dimensional variables, particularly as far as the semantics of dimension intervals is concerned.

I am unclear at the moment how much this is application dependent, and hence strictly pertaining to the CF conventions. In other words, it may be that some applications would still need the more fine-grained semantics of CF cell-methods to make sense of the data. This is in line with our aim of netCDF-U being complementary to netCDF-CF, not replacing it. On the other hand, a generic mechanism for associating distinct summary-statistics semantics with distinct dimension variables may be useful in general. We look forward to investigating this further, and we welcome your expression of interest in being involved in this discussion.

Regards,
LB

On 29 Nov 2011, at 10:17, Vegard Bønes wrote:

> Hi,
>
> Unfortunately, I did not see your email until today, so I have not taken it into consideration when choosing how to handle probabilities myself. Like Roy, however, I believe that expressing probabilities as dimensions rather than attributes is the best approach. The problem with attributes is that if you have many variables and many probabilities or percentiles, you will get a huge number of separate variables in your data set, and the relationship between them is not as easy to see as when (for example) all percentiles for air temperature are lumped into the same variable. If many users are like me and use ncdump and ncview to look at the data, this will be a huge advantage.
>
> VG
Re: [CF-metadata] standards for probabilities
Dear Vegard

> A dimension (and variable) for specifying percentiles:
>
>     float percentile(percentile) ;
>         percentile:units = "1" ;
>         percentile:standard_name = "cumulative_distribution_function" ;
>     float air_temperature_percentiles(time, percentile, latitude, longitude) ;
>         air_temperature_percentiles:units = "K" ;
>         air_temperature_percentiles:standard_name = "air_temperature" ;

This looks sensible to me. Are you proposing cumulative_distribution_function as a new standard name?

> ...an alternative for percentile could be cumulative_distribution_function_over_realization.

Yes. That would be more informative, and therefore preferable, I think.

> Then, there is the problem of certainty that a temperature will be within a given range.

Could you do that like this:

    float air_temperature(air_temperature) ;
        air_temperature:bounds = "air_temperature_bounds" ;
        air_temperature:units = "K" ;
    float air_temperature_bounds(air_temperature, nv) ;  // nv = 2
    float air_temperature_confidence(time, air_temperature, latitude, longitude) ;
        air_temperature_confidence:standard_name = "probability" ;

Then the air_temperature_bounds specify the ranges of air temperature for which the probability is evaluated. probability would be a new standard name as well. Again, it could be made more informative as something like probability_over_realization. This is instead of a standard_name modifier and seems more consistent to me with the treatment of percentile. It is a kind of transpose.

> Also, which I did not mention in the previous emails, I also wanted to express the probability of at least x mm of precipitation.

This can be done in the same way as the probability of air temperature ranges, with the upper bounds for precipitation ranges set high enough that they are effectively infinite.

> When working with this, I found that expressing percentiles and probabilities as dimensions, instead of attributes, made the relationship between them more intuitive.

I agree.

Cheers

Jonathan
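The binned-probability layout above (a probability variable indexed by air_temperature ranges given in air_temperature_bounds) can be sketched in Python as follows. The sketch assumes equally weighted ensemble members and half-open bins [lower, upper); which end of a bound is inclusive is an assumption here, since the bounds alone do not say. Using math.inf as the upper bound gives the "at least x mm of precipitation" case described above. The function name is hypothetical.

```python
import math


def bin_probabilities(members, bounds):
    """Fraction of equally weighted ensemble members falling in each
    half-open bin [lower, upper). `bounds` mirrors a bounds variable of
    shape (n_bins, 2); an upper bound of math.inf makes the bin an
    exceedance probability ("at least lower")."""
    if not members:
        raise ValueError("need at least one ensemble member")
    n = len(members)
    probs = []
    for lower, upper in bounds:
        if lower > upper:
            raise ValueError("each bin needs lower <= upper")
        inside = sum(1 for m in members if lower <= m < upper)
        probs.append(inside / n)
    return probs
```

For precipitation members 0.0, 0.5, 1.2, 3.0 and 7.5 mm and bins [0, 1), [1, 5), [5, inf) and [1, inf), the probabilities are 0.4, 0.4, 0.2 and 0.6. Overlapping bins such as the last one are fine here, since each row of the bounds is evaluated independently.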
Re: [CF-metadata] standards for probabilities
Hi,

Unfortunately, I did not see your email until today, so I have not taken it into consideration when choosing how to handle probabilities myself. Like Roy, however, I believe that expressing probabilities as dimensions rather than attributes is the best approach. The problem with attributes is that if you have many variables and many probabilities or percentiles, you will get a huge number of separate variables in your data set, and the relationship between them is not as easy to see as when (for example) all percentiles for air temperature are lumped into the same variable. If many users are like me and use ncdump and ncview to look at the data, this will be a huge advantage.

VG

----- Original Message -----
From: Lorenzo Bigagli <lorenzo.biga...@pin.unifi.it>
To: cf-metadata@cgd.ucar.edu, vegard bones <vegard.bo...@met.no>
Cc: Nativi Stefano <stefano.nat...@cnr.it>
Sent: 24 November 2011 18:07:07
Subject: [CF-metadata] standards for probabilities

[...]
[CF-metadata] standards for probabilities
Dear Vegard, all,

I would like to take this opportunity to inform you that we are drafting a proposal for a netCDF convention on uncertainty (NetCDF-U). This work is partly developed in the framework of the FP7 UncertWeb project. We are going to present it next Wednesday at the Open Geospatial Consortium TC Meeting in Brussels, and will circulate it shortly after.

We have tried to be convention-neutral, in particular making sure that netCDF-U fully integrates with the netCDF-CF Conventions, even using the same constructs when possible (e.g. the ancillary_variables attribute). Ideally, we think of datasets that would conform to both conventions:

    :Conventions = "CF-1.5 UW-1.0" ;

NetCDF-U is based on a generic mechanism for annotating netCDF variables according to the UncertML conceptual model. The first example in your use case would read something like this (note that the CF attributes are unchanged):

    float precipitation_25(time, x, y) ;
        precipitation_25:standard_name = "precipitation_amount" ;
        precipitation_25:long_name = "precipitation_amount 25th percentile" ;
        precipitation_25:ref = "http://www.uncertml.org/statistics/percentile" ;
        precipitation_25:level = 25 ;

The second, provided we have a variable difference(Lat, Lon), with Lat = 100 and Lon = 100, that contains the difference between the observed value and the forecast:

    float probability(Lat, Lon) ;
        probability:ref = "http://www.uncertml.org/statistics/probability" ;
        probability:gt = -2.5 ;
        probability:lt = 2.5 ;

I apologize if this is not clear enough for the moment, and I hope it can be of prospective interest. Any comment is very much appreciated.

Best regards,
Lorenzo Bigagli

---
Dott. Lorenzo Bigagli
Consiglio Nazionale delle Ricerche
Istituto di Metodologie per l'Analisi Ambientale (CNR-IMAA)
i: Area della Ricerca di Potenza, Contrada Santa Loja Zona Industriale, 85050 Tito Scalo (PZ), Italia
t: +39 0971 427221
f: +39 0971 427222
m: lorenzo.biga...@cnr.it
Re: [CF-metadata] standards for probabilities
Hello Vegard,

Sorry, it looks like my memory is more flawed than I thought. I can't find realization_weight in the standard name table, although I thought I remembered it being added. I think a possible entry point into the discussion at the time can be found at http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2007/019229.html

The whole discussion of probabilities and ensembles is, though, one of those things on which it seems hard to keep the momentum going (at least for me), given how hard the problem is and the available brain cycles.

Jamie

-----Original Message-----
From: Vegard Bønes [mailto:vegard.bo...@met.no]
Sent: 16 November 2011 12:19
To: Kettleborough, Jamie
Cc: cf-metadata@cgd.ucar.edu; Jonathan Gregory
Subject: Re: [CF-metadata] standards for probabilities

Hi,

The methods for estimating probabilities are non-trivial, and may change over time. Because of that I would prefer to keep information about the exact process outside of the generated file.

I have not been able to find any references to realization_weight in the standard documents. Could you please refer me to the right place?

VG

----- Original Message -----
From: Jamie Kettleborough <jamie.kettleboro...@metoffice.gov.uk>
To: Vegard Bønes <vegard.bo...@met.no>, Jonathan Gregory <j.m.greg...@reading.ac.uk>
Cc: cf-metadata@cgd.ucar.edu, Jamie Kettleborough <jamie.kettleboro...@metoffice.gov.uk>
Sent: 16 November 2011 11:53:22
Subject: RE: [CF-metadata] standards for probabilities

[...]
Re: [CF-metadata] standards for probabilities
Dear Vegard

Sorry for the slow response. I've been very busy this week.

> So, to be a bit more concrete, this is option 1:
>
>     float rain_25(time, y, x) ;
>         rain_25:standard_name = "precipitation_amount" ;
>         rain_25:cell_methods = "realization: percentile(25)" ;

Yes, except that cell_methods only refers to variables and doesn't contain constants, at the moment. Therefore I was thinking it could be something like

    float rain(time, y, x) ;
        rain:cell_methods = "realization: percentile pvar" ;
    float pvar(pvar) ;

where pvar is a coordinate variable which specifies the percentile(s). If there is only one percentile, the dimension pvar = 1, or pvar could be a scalar coordinate variable. This syntax is like the second one in section 7.3.3 of the CF standard, for statistics applying to different area types, for the same reason: it needs to refer to a coordinate variable in evaluating a statistic.

> The only problem I see with this is that in the resulting CDM, realization is not used anywhere, apart from possibly in cell_methods. But maybe this is OK?

Yes, it is OK, because standard_names can be included in cell_methods, and realization is a standard_name.

> Option 2:
>
>     float precipitation_amount(time, percentile, y, x) ;
>     ...
>     float percentile(percentile) ;
>         percentile:units = "1" ;
>         percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount" ;

To make this method as informative as option 1, the standard_name would be cumulative_distribution_function_of_precipitation_amount_over_realization. In option 1, "over realization" is indicated by the cell_methods.

You ask:

> But what is the purpose of explicitly referring to precipitation_amount in the standard name? Would not cumulative_distribution_function be better? Then the same dimension could be used for other data, such as air_temperature.

I agree that would be an advantage. I suggested that precipitation_amount should be stated by analogy with the guidelines for probability_density_function_of_X. For a PDF, the units depend on what X is, so you must have a standard_name which includes X. A CDF and a PDF are related concepts. However, this is not a strong argument. If you had a PDF, it would probably be a data variable, not a coordinate variable like your CDF is here.

Regarding Roy's comment, I agree with his concern about the proliferation of percentiles, but I think both of these options allow that generality, as in both cases the percentile value(s) are stored in variables.

The advantage of option 2 is that it only requires new standard names, whereas option 1 requires an alteration to the CF convention, and it's a bit simpler. The advantage of option 1 is that it's more compact, and it is natural to regard percentiles as a cell_method, I would argue. I'm not sure which is better.

Cheers

Jonathan
Re: [CF-metadata] standards for probabilities
Hello Vegard,

How do you generate your CDF from your realisations? Do you simply weight each ensemble member equally? I think there are cases where you may weight by some measure of how 'good' you think the ensemble member is (some sort of measure of its error: you down-weight those with high errors). If you are storing the output from ensemble members in the file, then I think CF allows for this using the 'realization_weight' standard name, to store your errors/weights in the file.

Furthermore, you may want to know the sensitivity of your CDF to your error estimates, so you could have more than one CDF for the same variable, based on different ways of deriving the errors/weights. Is this something CF needs to worry about, or is it a case of trying to add something that's not really needed yet? Or maybe this is not in scope for CF anyway, and it should be left to something more like 'audit/history/provenance' metadata?

Jamie

-----Original Message-----
From: cf-metadata-boun...@cgd.ucar.edu [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Vegard Bønes
Sent: 15 November 2011 13:15
To: Jonathan Gregory
Cc: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] standards for probabilities

[...]
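Jamie's question about weighting members can be illustrated with a weighted percentile: each realization contributes its normalised weight to the empirical CDF, and the percentile is read off where the cumulative weight first reaches p. This is a minimal Python sketch that uses the step-function inverse CDF without interpolation; other percentile definitions exist. Jamie's follow-up in this thread reports that realization_weight is not actually in the standard name table, so here the weights are just a plain list, and the function name is hypothetical.

```python
def weighted_percentile(values, weights, p):
    """Inverse of the weighted empirical CDF: the smallest member value whose
    cumulative normalised weight reaches p (0 < p <= 1). No interpolation."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = float(sum(weights))
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight / total
        if cumulative >= p - 1e-12:  # tolerance for floating-point round-off
            return value
    return max(values)  # reached only if round-off leaves cumulative just below p
```

With members 1, 2, 3, 4 and equal weights, the 25th percentile is 1; if the first member is down-weighted to zero (weights 0, 1, 1, 2), the 25th percentile becomes 2. Computing several CDFs from different weightings, as Jamie suggests, is just a matter of calling this with different weight lists.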
Re: [CF-metadata] standards for probabilities
Thank you! I will take a little time to work on the input I have received, and post back what I have done, possibly along with a suggestion for some new standard names.

VG

----- Original Message -----
From: Evan Manning <evan.m.mann...@jpl.nasa.gov>
To: cf-metadata@cgd.ucar.edu
Sent: 15 November 2011 15:35:10
Subject: Re: [CF-metadata] standards for probabilities

Most of the discussion so far has centered on (1). But this raises some other issues:

> 2) probability that air temperature will be within 2.5 degrees of the forecast

This is clearly trying to get at something akin to what we do with the standard_error standard name modifier and the standard_error_multiplier ancillary variable. The differences are that the units are flipped and the assumption of a normal distribution is removed. Can we use something analogous? Maybe a new standard name modifier like distribution, probability or confidence, which requires an ancillary variable? The best fit for this particular case is confidence, with an ancillary variable confidence_interval with value 2.5. But something more like distribution is more general and could be stretched to handle

> 1) 25th percentile precipitation amount (based on ensemble data)

Here we might use the distribution (or cumulative_distribution?) standard name modifier, with ancillary variable cumulative_probability, value 0.25.

-- Evan
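Evan's point that a confidence-style modifier would drop the normal-distribution assumption behind standard_error can be made concrete. If the forecast error were normal with a given standard error, the probability of lying within a half-width h of the forecast follows from the error function; from an ensemble, it is simply the fraction of members inside the interval. The Python sketch below contrasts the two under those stated assumptions; the function names are hypothetical.

```python
import math


def prob_within_normal(half_width, standard_error):
    """P(|error| <= half_width) assuming error ~ Normal(0, standard_error)."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return math.erf(half_width / (standard_error * math.sqrt(2.0)))


def prob_within_empirical(members, forecast, half_width):
    """Fraction of equally weighted ensemble members within
    forecast +/- half_width; no distributional assumption."""
    if not members:
        raise ValueError("need at least one ensemble member")
    return sum(1 for m in members if abs(m - forecast) <= half_width) / len(members)
```

Under the normal assumption, a half-width of 1.96 standard errors gives about 0.95. For a skewed ensemble such as 10, 11, 13, 15, 20 with a forecast of 12, the empirical probability of lying within 2.5 degrees is 0.6, a value that need not correspond to any normal standard error.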
[CF-metadata] standards for probabilities
Hi!

I am trying to create a document containing various probability values for weather forecasts, but I have some problems working out how to express what I want to say using the CF metadata standard.

I want to express such things as "25th percentile precipitation amount (based on ensemble data)" and "probability that air temperature will be within 2.5 degrees of the forecast". How should I do this?

For percentiles, may I do something like this?

    float precipitation_25(time, x, y) ;
        precipitation_25:standard_name = "precipitation_amount" ;
        precipitation_25:long_name = "precipitation_amount 25th percentile" ;
        ...

Also, as far as I can tell, there are no standard names like probability_of_x or probability_of_x_within_y. How can I express this?

-- Vegard
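The first quantity asked about, a 25th-percentile precipitation field from ensemble data, amounts to collapsing the realization axis at every grid point. Below is a minimal Python sketch using the standard library's statistics.quantiles with the "inclusive" method (linear interpolation between order statistics). That percentile definition is an assumption, as other definitions give slightly different values, and the function name and the list-of-lists layout (member first, flattened grid point second) are illustrative.

```python
import statistics


def percentile_field(ensemble, q):
    """Collapse the realization axis. ensemble[r][i] is member r at grid
    point i; returns the q-th percentile (integer 1..99) at each point."""
    if len(ensemble) < 2:
        raise ValueError("need at least two ensemble members")
    if not 1 <= q <= 99:
        raise ValueError("q must be an integer percentile between 1 and 99")
    n_points = len(ensemble[0])
    field = []
    for i in range(n_points):
        samples = [member[i] for member in ensemble]
        # 99 cut points; cuts[q - 1] is the q-th percentile.
        cuts = statistics.quantiles(samples, n=100, method="inclusive")
        field.append(cuts[q - 1])
    return field
```

For five members whose values at two grid points are 0, 2, 4, 6, 8 and 1, 3, 5, 7, 9, the 25th-percentile field is [2.0, 3.0]. The result would be written to a variable such as precipitation_25, with the percentile itself recorded by whichever CF mechanism is chosen.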
Re: [CF-metadata] standards for probabilities
Dear Vegard

> I want to express such things as 25th percentile precipitation amount (based on ensemble data), and probability that air temperature will be within 2.5 degrees of the forecast. How should I do this?

You are right, this case has not yet been dealt with, although the guidelines for construction of standard names foresee that needs like this might arise! If the quantity is a precipitation_amount, it's fine to use that standard name. The question is how to record that it is the 25th percentile. Two possible ways to do this would be:

* To extend the possible syntax of cell_methods so that it can describe percentiles. It is already possible to indicate a median in cell_methods, and that is a particular percentile. The advantage of this way of doing it would be that you would record whether the distribution of precipitation amounts being considered was for time variation, spatial variation, or some other kind of variation. Obviously you could have a probability distribution with percentiles for many different independent variables.

* To use a size-1 or scalar coordinate variable to record the probability, with a new standard_name, perhaps cumulative_distribution_function_of_precipitation_amount. The value of this coordinate would be 0.25 for the 25th percentile. The advantage of this method would be that you could have several different percentiles in the same variable, by having a multivalued probability coordinate. If you wanted to be specific about what the independent variable was, that would have to be included in the standard name as well, e.g. cumulative_distribution_function_of_precipitation_amount_over_time.

What do you think?

Cheers

Jonathan
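The remark that the distribution might be "for time variation, or spatial variation, or some other kind of variation" matters because the same array yields different percentiles depending on which axis is collapsed. A small Python sketch shows this with a 2-D list indexed [realization][time] and a cumulative probability such as 0.25 (the coordinate value Jonathan suggests). The linear-interpolation percentile definition and the function names are assumptions of this sketch.

```python
import math


def _percentile(samples, p):
    """Linear interpolation between order statistics ("inclusive" definition)."""
    xs = sorted(samples)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def percentile_over_axis(data, p, axis):
    """data[r][t]: realization r at time t; p is a cumulative probability.
    axis="realization" gives one value per time ("over_realization");
    axis="time" gives one value per member ("over_time")."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a cumulative probability in [0, 1]")
    if axis == "realization":
        n_times = len(data[0])
        return [_percentile([member[t] for member in data], p) for t in range(n_times)]
    if axis == "time":
        return [_percentile(series, p) for series in data]
    raise ValueError("axis must be 'realization' or 'time'")
```

For data = [[1, 2, 3], [4, 8, 12]] and p = 0.25, collapsing realizations gives [1.75, 3.5, 5.25] (one value per time), while collapsing time gives [1.5, 6.0] (one value per member). Without cell_methods or a qualified standard name, the two results would be indistinguishable in the file.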
Re: [CF-metadata] standards for probabilities
Thank you, Jonathan! :) So, to be a bit more concrete, this is option 1:

    float rain_25(time, y, x) ;
        rain_25:standard_name = "precipitation_amount" ;
        rain_25:cell_methods = "realization: percentile(25)" ;

The only problem I see with this is that in the resulting CDM, realization is not used anywhere apart from possibly in cell_methods. But maybe this is OK? If I understand the second option correctly, it would lead to something like this:

    float precipitation_amount(time, percentile, y, x) ;
    ...
    float percentile(percentile) ;
        percentile:units = "1" ;
        percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount" ;

But what is the purpose of explicitly referring to precipitation_amount in the standard name? Would not cumulative_distribution_function be better? Then the same dimension could be used for other data, such as air_temperature. Or, if we want to add something about the nature of the source data for the function, it could be called something like cumulative_distribution_function_due_to_realization? I am still a bit uncertain about which is best, though. -- Vegard - Original Message - From: Jonathan Gregory j.m.greg...@reading.ac.uk To: Vegard Bønes vegard.bo...@met.no Cc: cf-metadata@cgd.ucar.edu Sent: 15 November 2011 11:11:52 Subject: Re: [CF-metadata] standards for probabilities
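A CDL sketch of Vegard's variant of the second option, in which one generic percentile axis is shared by more than one quantity. The standard_name cumulative_distribution_function is hypothetical, taken from the suggestion above rather than from the CF standard name table; the dimension sizes, time units and the choice of five levels are placeholders.

```cdl
netcdf ensemble_percentiles_sketch {
dimensions:
    time = 4 ;        // placeholder sizes
    percentile = 5 ;
    y = 10 ;
    x = 10 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "hours since 2011-11-15 00:00:00" ;  // placeholder reference time
    // Coordinate variable of cumulative probabilities (0.25 = 25th percentile).
    // Hypothetical generic standard_name, as suggested above; not in CF.
    float percentile(percentile) ;
        percentile:standard_name = "cumulative_distribution_function" ;
        percentile:units = "1" ;
    // Both data variables share the same percentile axis.
    float precipitation_amount(time, percentile, y, x) ;
        precipitation_amount:standard_name = "precipitation_amount" ;
        precipitation_amount:units = "kg m-2" ;
    float air_temperature(time, percentile, y, x) ;
        air_temperature:standard_name = "air_temperature" ;
        air_temperature:units = "K" ;
data:
    percentile = 0.05, 0.25, 0.5, 0.75, 0.95 ;
}
```

With the generic name, the quantity is still given by each data variable's own standard_name, but nothing in this sketch records which axis (for example realization) the distribution was taken over; that is the information the _over_time or _due_to_realization suffixes discussed in the thread would carry.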
Re: [CF-metadata] standards for probabilities
Dear Roy, Can you be a bit more concrete about why you prefer the second alternative? -- Vegard - Original Message - From: Roy K. Lowry r...@bodc.ac.uk To: Jonathan Gregory j.m.greg...@reading.ac.uk, Vegard Bønes vegard.bo...@met.no Cc: cf-metadata@cgd.ucar.edu Sent: 15 November 2011 11:17:01 Subject: RE: [CF-metadata] standards for probabilities Dear Jonathan, I prefer your second alternative. It's not what I do, but it's what I wish I did!! Cheers, Roy. From: cf-metadata-boun...@cgd.ucar.edu [cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Jonathan Gregory [j.m.greg...@reading.ac.uk] Sent: 15 November 2011 10:11 To: Vegard Bønes Cc: cf-metadata@cgd.ucar.edu Subject: Re: [CF-metadata] standards for probabilities -- This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.
Re: [CF-metadata] standards for probabilities
Hello Vegard, One of my jobs is running a parameter vocabulary that currently has over 27,000 entries. Much of its bulk is due to the assignment of multiple parameter names for each step in a numeric sequence, such as radiation wavelengths or sediment grain-size expressed as percentiles. Consider a scenario where you start with a small group of standard percentiles, say 5, 25, 50, 75, 95. You set up a parameter name for each of these in the first instance, which is easy. Then along comes another user who wants to describe data with percentiles at a resolution of 1 per cent. So another 95 parameter names need to be set up. Then along comes another user who wants a resolution of 0.1 per cent. I start drowning in names and nobody can find anything. However, had I followed Jonathan's second solution, all I would need to do as a vocabulary manager is set up one concept to describe the percentile axis, which covers every user from those who use a handful of percentiles to those whose percentile resolution requirements are beyond the bounds of my imagination. I know Jonathan's first option was based on propagation of cell methods and not standard names. However, these still need managing, and if they become excessively abundant they also become difficult to navigate. Cheers, Roy. From: Vegard Bønes [vegard.bo...@met.no] Sent: 15 November 2011 13:17 To: Lowry, Roy K. Cc: cf-metadata@cgd.ucar.edu; Jonathan Gregory Subject: Re: [CF-metadata] standards for probabilities
Re: [CF-metadata] standards for probabilities
Hi Vegard: I see some of these kinds of things from NCEP, encoded in GRIB, and I'm still trying to understand what they are. So, some questions from a non-modeler: On 11/15/2011 2:10 AM, Vegard Bønes wrote:

> I want to express such things as 25th percentile precipitation amount (based on ensemble data), and probability that air temperature will be within 2.5 degrees of the forecast. How should I do this?

These are two different things, I guess?

1) 25th percentile precipitation amount (based on ensemble data)
* So here the data values are precip amounts, calculated from the cumulative distribution function (CDF) of an ensemble?
* Do you typically have other percentile amounts in the same file, e.g. 50 and 75?
* Presumably this is some distillation of the CDF, used when the individual ensemble values are not in the file?
* Is there any special handling that a generic tool could do, or is it a matter of just making this data available to some specialized application that you write?

2) Probability that air temperature will be within 2.5 degrees of the forecast
* So here the data values are probabilities between 0 and 1?
* Do you typically have other probabilities in the same file, e.g. within 1 degree, or 5 degrees?
* Is there any special handling that a generic tool could do with such info?

john
Re: [CF-metadata] standards for probabilities
Most of the discussion so far has centered on (1). But this raises some other issues:

> 2) probability that air temperature will be within 2.5 degrees of the forecast

This is clearly trying to get at something akin to what we do with the standard_error standard name modifier and the standard_error_multiplier ancillary variable. The differences are that the units are flipped and the assumption of a normal distribution is removed. Can we use something analogous? Maybe a new standard name modifier like distribution or probability or confidence which requires an ancillary variable? The best fit for this particular case is confidence with an ancillary variable confidence_interval with value 2.5. But something more like distribution is more general and could be stretched to handle

> 1) 25th percentile precipitation amount (based on ensemble data)

Here we might use the distribution (or cumulative_distribution?) standard name modifier, ancillary variable cumulative_probability, value 0.25. -- Evan
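A CDL sketch of how Evan's suggestion might look, modelled on the existing CF pattern in which a variable carrying the standard_error modifier points to a standard_error_multiplier variable through the ancillary_variables attribute. The modifiers confidence and distribution and the standard names confidence_interval and cumulative_probability are the proposals from the message above, not current CF; the variable names, dimension sizes, time units and the use of K for the 2.5-degree interval are assumptions.

```cdl
netcdf probability_modifier_sketch {
dimensions:
    time = 4 ;    // placeholder sizes
    y = 10 ;
    x = 10 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "hours since 2011-11-15 00:00:00" ;  // placeholder reference time
    // (2) Probability (0 to 1) that air temperature is within 2.5 K of the forecast.
    // "confidence" is a hypothetical standard_name modifier (proposal only).
    float air_temperature_confidence(time, y, x) ;
        air_temperature_confidence:standard_name = "air_temperature confidence" ;
        air_temperature_confidence:units = "1" ;
        air_temperature_confidence:ancillary_variables = "temperature_interval" ;
    // Hypothetical standard_name; value read as the allowed difference from the forecast.
    float temperature_interval ;
        temperature_interval:standard_name = "confidence_interval" ;
        temperature_interval:units = "K" ;
    // (1) 25th percentile precipitation amount.
    // "distribution" is a hypothetical standard_name modifier (proposal only).
    float precipitation_distribution(time, y, x) ;
        precipitation_distribution:standard_name = "precipitation_amount distribution" ;
        precipitation_distribution:units = "kg m-2" ;
        precipitation_distribution:ancillary_variables = "precipitation_cdf_level" ;
    // Hypothetical standard_name for the cumulative probability level.
    float precipitation_cdf_level ;
        precipitation_cdf_level:standard_name = "cumulative_probability" ;
        precipitation_cdf_level:units = "1" ;
data:
    temperature_interval = 2.5 ;
    precipitation_cdf_level = 0.25 ;
}
```

As sketched, with scalar ancillary variables, each data variable carries a single interval or probability level; holding several levels in one variable would need an extra dimension, which moves this back towards the coordinate-variable option discussed earlier in the thread.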
Re: [CF-metadata] standards for probabilities
Hi John, All the assumptions you state are correct. Regarding the usage of the data, there is a specialized application that uses it. The problem is that others are interested in the same data, and I have no control over how they will use it. Because of that, I want the generated file to follow the relevant standards as closely as possible. -- Vegard - Original Message - From: John Caron ca...@unidata.ucar.edu To: cf-metadata@cgd.ucar.edu Sent: 15 November 2011 14:38:24 Subject: Re: [CF-metadata] standards for probabilities