Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables

2019-07-26 Thread Andrew Barna
Hi Everyone,

I've been rereading all these emails and having some long conversations
with colleagues about this proposal, and I still can't convince
myself that it is a good idea.

The initial request seemed to be motivated by wanting to distinguish
"quality" from "status" based on standard name alone. This distinction can
currently be made using the "flag_meanings" attribute. This name
is hardly unique in needing additional information: many of the radiation
names need (often optional) wavelength coordinates. If you are using
custom calendars or grids, these all need additional attributes or
information to properly interpret the data in the variable.

Having multiple flag variables in a file shouldn't be a problem; WOCE did
it for "originator" vs "expert" QC. If you really don't want more than one
flag variable, the flag_masks attribute allows for combining all these
states together; combining that further with flag_meanings even allows you to
define which combinations are valid. My group is considering having
multiple flag schemes in the file (WOCE and ARGO), so you can just use the
one that you like best.
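
To illustrate the flag_masks point above, here is a minimal Python sketch of
how a reader might decode a bit-field flag value. The function name
decode_flag_masks, the mask values, and the meaning strings are hypothetical
and made up for illustration; only the flag_masks and flag_meanings attribute
names come from CF. It assumes the attributes have already been read from the
file into a list of integers and a space-separated string.

```python
def decode_flag_masks(value, flag_masks, flag_meanings):
    """Return the meanings whose mask bits are set in `value`.

    A bitwise AND of the value with each mask is taken; a non-zero
    result is treated as "condition true".
    """
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_masks):
        raise ValueError("flag_masks and flag_meanings must have the same length")
    return [meaning for mask, meaning in zip(flag_masks, meanings) if value & mask]


# Hypothetical attribute values, not taken from any real data set.
flag_masks = [1, 2, 4]
flag_meanings = "originator_suspect expert_suspect interpolated"

# 5 == 1 + 4, so the first and third conditions are set.
print(decode_flag_masks(5, flag_masks, flag_meanings))
# ['originator_suspect', 'interpolated']
```

Nothing in this sketch depends on the standard name of the variable: the
decoding is driven entirely by the flag_* attributes.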

My colleagues expressed concern that this would cause significant confusion
for new users who are trying to adopt CF as to which "flag" name to use for
their data. There is also the added complication of needing to look for more
than one name when looking for flag information.

I feel that this issue is best resolved with some clarifying updates to the
CF document itself, especially some new examples to show how flags can be
used, and not with a new name for this metadata variable.

-Barna



On Wed, Jul 24, 2019 at 2:20 PM Kehoe, Kenneth E. wrote:

> Barna,
>
> I plan to propose some updates to the CF document once this name is in the
> standard name list. It will be a lot easier to have my proposed changes
> accepted if the standard name is already accepted.
>
> Ken
>
>
>
> On 2019-7-24 13:21, Andrew Barna wrote:
>
> I've never personally liked the name "status_flag" and have always
> interpreted it to be the "CF way" of saying "these values are either an
> associative array or bit field or some combination of both". It is also a
> special case of standard names in that two variables with the standard name
> "status_flag" may not be comparable, a situation which will not change with
> an added "quality_flag", that is, two variables with the standard name of
> "quality_flag" also may not be comparable.
>
> Since the actual meaning of the values contained in a variable with the
> standard name "status_flag" would need to be derived from the various other
> flag_* attributes, I saw this proposal as an added complication. When
> looking at a variable with the standard name "quality_flag", I still
> would not know the meanings of the values until interpreting the various
> other flag_* attributes.
>
> I think adding this new name would also require some updates to the CF
> document itself, section 3.5 and probably Appendix C, to note that there
> would now be multiple names which trigger the interpretation of the
> values as per that section.
>
>
> On Wed, Jul 24, 2019 at 11:40 AM John Graybeal 
> wrote:
>
>> I support the point about defining 'status' and 'quality'. Yes, there are
>> cases when we define terms that are re-used, but I don't think these terms
>> are reused, they appear only in these flags. Just defining the standard
>> name should do.
>>
>> Ken, I did like the qualifying text about status_flag but maybe that's
>> because I always thought status_flag could be used that way, as a status of
>> instruments. Looking at the definition
>> (http://mmisw.org/cfsn/#/search/status) it doesn't say that, does it?
>> It's all about the data. I even searched the
>> archives, I was so sure people talked about it in another way, but I can't
>> find any evidence of that.
>>
>> So I conclude equipment status is not included in the model currently
>> supported by status flag, and we shouldn't try to fix that here. What do
>> you think?
>>
>> John
>>
>> On Jul 24, 2019, at 10:34 AM, Kehoe, Kenneth E. wrote:
>>
>> Daniel,
>>
>> Thanks for the information. At some point we should chat about how our
>> two organizations think about and perform quality analysis.
>>
>> Martin,
>>
>> I'm confused about your suggestion to include definitions of status and
>> quality. I guess we could define those terms better in the general standard
>> name table, but that is not my intention. My concern is that the definition
>> of those terms is larger than the scope of what I wanted to propose. I
>> would prefer to just work on the definitions of the status_flag and
>> quality_flag.
>>
>> Looking at your suggestion to numerically order the values 

Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables

2019-07-26 Thread Kehoe, Kenneth E.
John,

I don't see status_flag excluding someone from providing information about
the status of instruments or equipment. Information about the equipment is,
in effect, information about the data. I think leaving status_flag as a
general catch-all right now is good. My concern with trying to fix that is:
where do we stop? It could require tens to hundreds of new names to cover
every possible case.

Ken



On 2019-7-26 12:03, Andrew Barna wrote:
Hi John,

The examples provided in the CF document are almost all about instrument state:
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#flags
so it seems to support your use case.

-Barna

On Wed, Jul 24, 2019 at 11:40 AM John Graybeal wrote:
I support the point about defining 'status' and 'quality'. Yes, there are cases 
when we define terms that are re-used, but I don't think these terms are 
reused, they appear only in these flags. Just defining the standard name should 
do.

Ken, I did like the qualifying text about status_flag but maybe that's because 
I always thought status_flag could be used that way, as a status of 
instruments. Looking at the definition
(http://mmisw.org/cfsn/#/search/status)
it doesn't say that, does it? It's all about the data. I even searched the
archives, I was so sure people talked about it in another way, but I can't find 
any evidence of that.

So I conclude equipment status is not included in the model currently supported 
by status flag, and we shouldn't try to fix that here. What do you think?

John

On Jul 24, 2019, at 10:34 AM, Kehoe, Kenneth E. wrote:

Daniel,

Thanks for the information. At some point we should chat about how our two 
organizations think about and perform quality analysis.

Martin,

I'm confused about your suggestion to include definitions of status and 
quality. I guess we could define those terms better in the general standard 
name table, but that is not my intention. My concern is that the definition of 
those terms is larger than the scope of what I wanted to propose. I would 
prefer to just work on the definitions of the status_flag and quality_flag.

Your suggestion to numerically order the values suggests, I think, that we
have a different notion of how to use quality_flag. A quality_flag is not
intended to indicate severity or ranking of tests. It is just a state field. My
program had discussions about doing something like that in the past and it did
not end well.

If we want to add terminology along the lines of "The variable with standard 
name quality_flag refers to an assessed quality of the corresponding data." 
that is OK with me. Your expanded definition of status does not help me to 
better understand status. I think it's the statement of "may" that confuses me. 
I see a definition needing to be more definitive.

I don't see the addition of quality_flag as changing status_flag. I see 
quality_flag as a more narrow sub-class of status_flag. I would prefer to not 
change much with status_flag since it has such a long history with CF.

I think we have these definitions:

status_flag: A variable with the standard name of status_flag contains an 
indication of quality or other status of another data variable. The linkage 
between the data variable and the variable with the standard name of 
status_flag is achieved using the ancillary_variables attribute. A variable 
which contains purely quality information may use the standard name of 
quality_flag to provide an assessed quality of the corresponding data.

quality_flag: A variable with the standard name of quality_flag contains an
indication of assessed quality information of another data variable. The 
linkage between the data variable and the variable or variables with the 
standard_name of quality_flag is achieved using the ancillary_variables 
attribute.
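
As a rough illustration of the ancillary_variables linkage in these two
definitions, the following Python sketch models a file's variables as plain
dictionaries of attributes and looks up the flag variables attached to a data
variable. The variable names, the helper find_flag_variables, and the
dictionary layout are hypothetical; quality_flag is the name proposed in this
thread, not assumed to be in the standard name table.

```python
# Standard names that mark a variable as carrying flag information.
# "quality_flag" is the proposed name under discussion here.
FLAG_STANDARD_NAMES = {"status_flag", "quality_flag"}


def find_flag_variables(variables, data_var_name):
    """Return names of flag variables linked to a data variable.

    `variables` maps variable name -> dict of attributes. The link is
    followed through the space-separated ancillary_variables attribute.
    """
    attrs = variables[data_var_name]
    linked = attrs.get("ancillary_variables", "").split()
    return [
        name
        for name in linked
        if variables.get(name, {}).get("standard_name") in FLAG_STANDARD_NAMES
    ]


# Hypothetical file contents, for illustration only.
variables = {
    "temperature": {
        "standard_name": "sea_water_temperature",
        "ancillary_variables": "temperature_qc temperature_status temperature_error",
    },
    "temperature_qc": {"standard_name": "quality_flag"},
    "temperature_status": {"standard_name": "status_flag"},
    "temperature_error": {"standard_name": "sea_water_temperature standard_error"},
}

print(find_flag_variables(variables, "temperature"))
# ['temperature_qc', 'temperature_status']
```

A reader of such a file would have to check for both standard names, and the
meaning of the values in either variable still comes from its flag_*
attributes.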

Thanks,

Ken





On 2019-7-24 03:40, Daniel Neumann wrote:
Dear Ken, Martin, John, Roy and Barna,

I/we thought about submitting a similar proposal to add some extended model 
quality information to netCDF files. The suggested description of 
"quality_flag" and the modified description of "status_flag" fit well into our 
project.

I am just writing this to show that there are more people in the community who 
are interested in this.

Cheers,
Daniel


Am 24.07.2019 um 10:49 schrieb Martin Juckes - UKRI STFC:
Dear John, Roy,


OK, I'm happy to 

Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables

2019-07-26 Thread Andrew Barna
Hi John,

The examples provided in the CF document are almost all about instrument
state:
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#flags
so it seems to support your use case.

-Barna

On Wed, Jul 24, 2019 at 11:40 AM John Graybeal 
wrote:

> I support the point about defining 'status' and 'quality'. Yes, there are
> cases when we define terms that are re-used, but I don't think these terms
> are reused, they appear only in these flags. Just defining the standard
> name should do.
>
> Ken, I did like the qualifying text about status_flag but maybe that's
> because I always thought status_flag could be used that way, as a status of
> instruments. Looking at the definition (
> http://mmisw.org/cfsn/#/search/status) it doesn't say that, does it?
> It's all about the data. I even searched the archives, I was so sure people
> talked about it in another way, but I can't find any evidence of that.
>
> So I conclude equipment status is not included in the model currently
> supported by status flag, and we shouldn't try to fix that here. What do
> you think?
>
> John
>
> On Jul 24, 2019, at 10:34 AM, Kehoe, Kenneth E. wrote:
>
> Daniel,
>
> Thanks for the information. At some point we should chat about how our two
> organizations think about and perform quality analysis.
>
> Martin,
>
> I'm confused about your suggestion to include definitions of status and
> quality. I guess we could define those terms better in the general standard
> name table, but that is not my intention. My concern is that the definition
> of those terms is larger than the scope of what I wanted to propose. I
> would prefer to just work on the definitions of the status_flag and
> quality_flag.
>
> Your suggestion to numerically order the values suggests, I think, that
> we have a different notion of how to use quality_flag. A quality_flag
> is not intended to indicate severity or ranking of tests. It is just a state
> field. My program had discussions about doing something like that in the
> past and it did not end well.
>
> If we want to add terminology along the lines of "The variable with
> standard name quality_flag refers to an assessed quality of the
> corresponding data." that is OK with me. Your expanded definition of status
> does not help me to better understand status. I think it's the statement of
> "may" that confuses me. I see a definition needing to be more definitive.
>
> I don't see the addition of quality_flag as changing status_flag. I see
> quality_flag as a more narrow sub-class of status_flag. I would prefer to
> not change much with status_flag since it has such a long history with CF.
>
> I think we have these definitions:
>
> status_flag: A variable with the standard name of status_flag contains an
> indication of quality or other status of another data variable. The linkage
> between the data variable and the variable with the standard name of
> status_flag is achieved using the ancillary_variables attribute. A variable
> which contains purely quality information may use the standard name of
> quality_flag to provide an assessed quality of the corresponding data.
>
> quality_flag: A variable with the standard name of quality_flag contains
> an indication of assessed quality information of another data variable. The
> linkage between the data variable and the variable or variables with the
> standard_name of quality_flag is achieved using the ancillary_variables
> attribute.
>
> Thanks,
>
> Ken
>
>
>
>
>
> On 2019-7-24 03:40, Daniel Neumann wrote:
>
> Dear Ken, Martin, John, Roy and Barna,
>
> I/we thought about submitting a similar proposal to add some extended
> model quality information to netCDF files. The suggested description of
> "quality_flag" and the modified description of "status_flag" fit well into
> our project.
>
> I am just writing this to show that there are more people in the community
> who are interested in this.
>
> Cheers,
> Daniel
>
>
> Am 24.07.2019 um 10:49 schrieb Martin Juckes - UKRI STFC:
>
> Dear John, Roy,
>
>
> OK, I'm happy to drop the line about ordering of quality flags if it
> doesn't work. This is consistent with Roy's suggested definitions (posted 2
> minutes before John's reply), which also drop this sentence, and add a
> broader description of valid usage of the status flag (I've copied them here
> to get the discussion back in a single thread):
>
>
> Status: The value of a variable with standard name status_flag may refer
> to the status of the instrument or process which generated the
> corresponding data, or it may refer to the data itself. This may include
> information about data quality, particularly in legacy data sets.
> 'quality_flag' should be used if data quality is the only type of
> information contained in the variable.
>
> Quality: The value of a variable with standard name quality_flag refers to
> an assessed quality of the corresponding data.
>
>
> regards,
>
> Martin
>
> 
> From: John