Folks:

Depending on how loosely we define "vector" in this context, this umbrella
variable concept can achieve my objective, where multiple data variables share
the same quality flags.

In the problem domain I am working in, there are multiple related data
variables in the same coordinate space for which the quality flags are the same.  It 
would be a stretch to call these multiple related data variables "vectors" in 
the true mathematical sense.

Given this looser definition of "vector", the notion of having standard names 
associated with the umbrella data variable does not seem to make sense as 
different projects would potentially have project-unique groupings of variables 
where it is desirable to share quality flags, etc.

very respectfully,

randy

++++
[CF-metadata] Proposal for better handling vector quantities in CF
Thomas Lavergne xxxxx
Thu Nov 24 14:53:52 MST 2011 


--------------------------------------------------------------------------------

Dear all,

This email is a proposal to strengthen the storage and exploitation of 
vector/tensor data in CF. Thanks to Jonathan for commenting on an early version 
of this note.

As far as I can tell, vectors are not handled as such by CF, only their 
components (via the standard names defining them, e.g. sea_ice_x_velocity, 
northward_sea_ice_velocity, eastward_wind, etc.). Life and some applications 
(e.g. plotting) would be easier if it were possible to group all components of 
a vector field into a single "vector" object. 

Here is my use case: I have an ice drift product, thus two datasets to define 
my vectors: sea_ice_x_displacement and sea_ice_y_displacement. Note that it 
could be any combination of x/y, north/east, or magnitude/direction. It is 
moreover not limited to ice drift, but applies to any 2D (or 3D) vector 
variable. As far as I know, the current CF does not provide me with a way to 
"group" these two components and reunite them into a vector. Two consequences: 
1) I cannot define a third variable (say status_flag) that would apply to the 
vector object (thus to both its components). And 2) computer programmes (that 
for example want to draw vectors instead of colour contours) have to "guess" 
that my CF file contains a vector. The software has to skim through my 
variables, check whether any pair of standard names defines a vector, and 
propose a "vector plot" option to the user. This might work in simple files, 
but will fail if my CF file contains two sets of vectors, say one from a model 
and the other from satellite: X_model, Y_model, X_sat, Y_sat. Will software be 
smart enough to avoid proposing an (X_model,Y_sat) vector plot when all four 
share the same standard names, sea_ice_(x|y)_displacement? 
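
To illustrate the ambiguity, here is a minimal Python sketch of what a tool 
that pairs components by standard name alone would propose. Plain dictionaries 
stand in for the file's variables, and the candidate_pairs helper is 
hypothetical, not part of any existing library.

```python
from itertools import product

# Hypothetical file contents: variable name -> standard_name.
variables = {
    "X_model": "sea_ice_x_displacement",
    "Y_model": "sea_ice_y_displacement",
    "X_sat": "sea_ice_x_displacement",
    "Y_sat": "sea_ice_y_displacement",
}


def candidate_pairs(variables):
    """Pair every x component with every y component, using standard names only."""
    xs = [name for name, std in variables.items() if std == "sea_ice_x_displacement"]
    ys = [name for name, std in variables.items() if std == "sea_ice_y_displacement"]
    return list(product(xs, ys))


pairs = candidate_pairs(variables)
for pair in pairs:
    print(pair)
```

Four candidate pairs come out, two of which mix model and satellite data; 
nothing in the standard names tells the tool which two are meaningful.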

Here, an approach could be that the X dataset defines its corresponding Y 
dataset as an "auxiliary variable" (and the Y dataset would do the same with 
X). This would probably work, but does not solve my concern number 1 to share a 
3rd variable with both X and Y.

The solution I propose for discussion is to allow an umbrella "dummy" dataset 
(like the proj/mapping ones: no dimension, no data, just attributes). This 
umbrella variable would have a valid standard name 
"sea_ice_displacement_vector" (definition of "vector"). We would then define a 
new standard attribute pattern: components = <space separated list of 
components>, e.g. "dX dY dir". The string values in the list are the names of 
the datasets containing the components of the vector. Note that even for a 2D 
vector, I could choose to have both x/y and speed/dir in the same CF file, 
hence the need to allow more than just 2 "components", even for a 2D vector. We 
must have at least 2.
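
As a rough sketch of how a reader might consume the proposed attribute (the 
"components" attribute is only a proposal here, and resolve_components is a 
hypothetical helper working on plain dictionaries rather than a real netCDF 
API), the list can be split on whitespace, checked for the two-component 
minimum, and resolved against the variables in the file:

```python
def resolve_components(umbrella_attrs, variables):
    """Return {component name: standard_name} for the umbrella's components."""
    names = umbrella_attrs.get("components", "").split()
    if len(names) < 2:
        raise ValueError("a vector needs at least 2 components")
    missing = [n for n in names if n not in variables]
    if missing:
        raise KeyError(f"components not found in file: {missing}")
    return {n: variables[n]["standard_name"] for n in names}


# Hypothetical in-memory stand-in for the file's variables and attributes.
variables = {
    "dX": {"standard_name": "sea_ice_x_displacement"},
    "dY": {"standard_name": "sea_ice_y_displacement"},
    "dir": {"standard_name": "direction_of_sea_ice_displacement"},
}
umbrella = {"standard_name": "sea_ice_displacement", "components": "dX dY dir"}

resolved = resolve_components(umbrella, variables)
print(resolved)
```

The tool still reads each component's standard_name to learn what the 
component means; the umbrella only says which variables belong together.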

So in my case:

The two X and Y datasets and the direction:

float dX(time, yc, xc) ;
 dX:long_name = "component of the displacement along the x axis of the grid" ;
 dX:standard_name = "sea_ice_x_displacement" ;
 dX:units = "km" ;
 dX:_FillValue = -1.e+10f ;
 dX:coordinates = "lat lon" ;
 dX:grid_mapping = "Polar_Stereographic_Grid" ;

float dY(time, yc, xc) ;
 dY:long_name = "component of the displacement along the y axis of the grid" ;
 dY:standard_name = "sea_ice_y_displacement" ;
 dY:units = "km" ;
 dY:_FillValue = -1.e+10f ;
 dY:coordinates = "lat lon" ;
 dY:grid_mapping = "Polar_Stereographic_Grid" ;

float dir(time, yc, xc) ;
 dir:long_name = "direction of the displacement" ;
 dir:standard_name = "direction_of_sea_ice_displacement" ;
 dir:units = "degrees" ;
 dir:_FillValue = -1.e+10f ;
 dir:coordinates = "lat lon" ;
 dir:grid_mapping = "Polar_Stereographic_Grid" ;


The new dummy umbrella:

int drift_vector ;
 drift_vector:standard_name = "sea_ice_displacement" ;
 drift_vector:long_name = "sea ice drift vector" ;
 drift_vector:components = "dX dY dir" ;

A status flag for the vector:

byte status_flag(time, yc, xc) ;
 status_flag:standard_name = "sea_ice_displacement status_flag" ;
 status_flag:long_name = "rejection and quality level flag" ;
 status_flag:valid_min = 0b ;
 status_flag:valid_max = 30b ;
 status_flag:grid_mapping = "Polar_Stereographic_Grid" ;
 status_flag:coordinates = "lat lon" ;
 status_flag:flag_values = 0b, 1b,..., 22b, 30b ;
 status_flag:flag_meanings = "missing_input_data over_land ... interpolated 
nominal_quality" ;

When browsing through the file, software would immediately see that there are 
vectors available (e.g. for display) and which datasets hold the components. It 
still has to read the component datasets to know how to use them (by reading 
the standard_name). We could even imagine that tools are able to automatically 
compute (for display or comparison) the vector length although only the x/y 
components are available. 
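
A minimal sketch of that last idea, assuming the two components are on the 
same grid and in the same units. Flat Python lists stand in for the gridded 
arrays, vector_length is a hypothetical helper, and the fill value is the one 
used in the example above:

```python
import math

FILL_VALUE = -1.0e10  # same value as the _FillValue in the example variables


def vector_length(dx, dy, fill=FILL_VALUE):
    """Element-wise vector length; fill wherever either component is missing."""
    return [
        fill if x == fill or y == fill else math.hypot(x, y)
        for x, y in zip(dx, dy)
    ]


# Made-up displacement values in km, with one missing x component.
lengths = vector_length([3.0, FILL_VALUE, 6.0], [4.0, 1.0, 8.0])
print(lengths)
```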

Some new standard names would be needed, of course, as well as the "components" 
attribute. A revisit/cleanup of many definitions of existing standard names of 
component variables could also be envisaged.

This proposal needs to be more thoroughly discussed. Those CF users handling 
vector quantities could check against their files whether this approach breaks 
anything. Even if it does not break anything: what is the added value that this 
change would bring? Do you see other applications that would benefit from this? 
There is little point in implementing it if it does not help others...

Regards,
Thomas


 

 
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
