Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Chris Barker
Let the bike shedding continue!

On Wed, Jan 15, 2014 at 1:14 PM, John Graybeal wrote:

> I don't think multiple use cases from different individuals and
> communities should be categorized as "no reason other than maybe taste".
>  Just sayin'...
>

multiple use-cases are examples, not reasons -- "I'd like to do that" or
"I've been doing that" doesn't give a why... though you do below, thanks.


>> (and it certainly shouldn't be removed completely -- variable names with
>> arbitrary bytes in them would really be a mess). Is it ascii-only now? it
>> probably should stay that way.
>
> This prompts me to observe that somehow, in this brave new age of computer
> programming, people are developing netCDF software that supports Unicode
> characters -- Unicode!! -- in variable (attribute etc) names.
>

I'm a fan of unicode, actually, but despite it being around a long time,
now, it's still a pain in the *&%&^ in C, C++, and, I'm guessing, Fortran.
Not so bad in more modern languages, though apparently some use UTF-16 and
don't always handle the larger code points correctly. So still a pain.
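
A small Python sketch of the UTF-16 pitfall mentioned above: a code point
outside the Basic Multilingual Plane needs two 16-bit code units, so software
that counts or slices by code unit can miscount or split the character. The
helper name and the sample character are illustrative only.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


name = "\U0001F600"  # a single code point outside the BMP

print(len(name))               # code points as Python 3 counts them
print(utf16_code_units(name))  # code units a UTF-16 based language would see
print(utf16_code_units("a"))   # an ASCII letter fits in one code unit
```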

And as you can tell, I'm a fan of restricting names to particular classes
of characters, and unicode includes a lot of concepts that are pretty hard
to define: e.g. "alphanumeric". I can see how it would be really nice for
non-English speakers or math and science geeks to use all sorts of great
variable names, but I'm afraid opening up fully might be more of a nightmare
than it is worth.

My pet programming language, Python, currently allows unicode variable
names, with restrictions, but this is a heck of a list to keep track of!

http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html


> There will be netCDF files in the wild, used by scientists and normal
> people (especially normal people from non-English-speaking countries) that
> use all sorts of wild and crazy characters in their variable names.
> (Perhaps CF thinks these are "alphanumeric", in which case I've found a
> solution! The standard certainly is not explicitly ASCII-only.)  By the
> way, I was amazed to learn that using Unicode in programming languages is
> starting to take hold.
>

but still only starting

> At some point, we in the CF-supporting community are going to have to
> support the standard practices in this aspect that are going on everywhere
> else in the software world, or decide we want a permanent back-water for
> the 'scientists who are not interested in or capable of supporting these
> practices' (not my claim).
>

I think unicode is a red herring for this issue -- not that it isn't
interesting, and for sure full unicode options would allow nice expressive
variable names, but I'd still rather have variable names that don't look
like math expressions and that are legal names in programming languages.

The current CF document says
"Variable, dimension and attribute names should begin with a letter and be
composed of letters, digits, and underscores."

but "letters" is not very well defined when you get outside of ascii -- it
seems we have work to do.
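
To make the ambiguity concrete, here is a minimal Python sketch comparing two
possible readings of the CF sentence. Both function names are made up for
illustration and neither is an official CF check: the first assumes "letter"
means ASCII A-Z/a-z, the second assumes it means whatever Python's
str.isalpha() accepts.

```python
import re

# Reading 1 (assumption): "letters" are ASCII only.
ASCII_CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def is_cf_name_ascii(name: str) -> bool:
    return ASCII_CF_NAME.match(name) is not None


def is_cf_name_unicode(name: str) -> bool:
    """Reading 2 (assumption): any Unicode letter or digit counts."""
    if not name or not name[0].isalpha():
        return False
    # Note: str.isdigit() also accepts non-ASCII digits.
    return all(ch.isalpha() or ch.isdigit() or ch == "_" for ch in name)


for candidate in ["sea_surface_temp", "temp\u00e9rature", "fluffy-bunny", "_private"]:
    print(candidate, is_cf_name_ascii(candidate), is_cf_name_unicode(candidate))
```

The two readings agree on plain ASCII names and on names with a hyphen or a
leading underscore; they disagree only once non-ASCII letters appear, which is
exactly where the wording is underspecified.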


>
>
> Perhaps there are some reasons to want less-restrictive variable names --
> I'm not always that imaginative, but if so, then present them.
>
>
> Let's just make the list so far, to get everyone up to speed with the
> discussion:
> * easier visual parsing (taste, yes, but practical also if you work with
> lots of data sets from different communities)
> * embedding semantic meaning (taste)
> * clearly isolating the context (namespace, hierarchy)
>

I'm having trouble seeing how adding math symbols, etc will help these --
they can be done pretty well with underscores...


> * matching attribute names that come from the source data
> * consistency with netCDF usage/files -> easier onboarding of those files
>

mixed bag here -- CF is intended to be more restricted than netcdf


> * Unicode/internationalization support

orthogonal question, I think. unless there's a language that uses "+" as a
letter

I think we've only heard from me and Steve saying we didn't like this
proposal -- don't take our word for it!


-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Russ Rew
John Graybeal wrote:
> This prompts me to observe that somehow, in this brave new age of computer
> programming, people are developing netCDF software that supports Unicode
> characters -- Unicode!! -- in variable (attribute etc) names. There will be
> netCDF files in the wild, used by scientists and normal people (especially
> normal people from non-English-speaking countries) that use all sorts of wild
> and crazy characters in their variable names. (Perhaps CF thinks these are
> "alphanumeric", in which case I've found a solution! The standard certainly is
> not explicitly ASCII-only.)  By the way, I was amazed to learn that using
> Unicode in programming languages is starting to take hold.

Yes, since June 2008 we have supported use of Unicode characters in
names in both netCDF-3 and netCDF-4 software.  The intent was to make
netCDF more suitable for international use, rather than to encode
mathematical operations in variable names.  But we were also responding
to needs of some user communities, for example atmospheric chemists who
wanted to be able to use standard notations for chemical species in
variable names.

Here's a small nonsensical example of ncdump output for a file
containing Unicode names:

  http://www.unidata.ucar.edu/netcdf/workshops/most-recent/utilities/Unicode.html

The precise rules for netCDF names are in the format documentation, but
the short version is:

  ... The first character of a name must be alphanumeric, a multi-byte
  UTF-8 character, or '_' (reserved for special names with meaning to
  implementations, such as the “_FillValue” attribute). Subsequent
  characters may also include printing special characters, except for
  '/' which is not allowed in names. Names that have trailing space
  characters are also not permitted.

That document also warns:

  Note that by using special characters in names, you may make your data
  not compliant with conventions that have more stringent requirements
  on valid names for netCDF components, for example the CF Conventions.
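
The short version quoted above can be approximated in a few lines of Python.
This is only a sketch of the quoted paragraph, not the full rule set from the
format documentation; the function name is hypothetical, and "multi-byte UTF-8
character" is assumed here to mean any non-ASCII character.

```python
def looks_like_netcdf_name(name: str) -> bool:
    """Approximate check of the quoted 'short version' of the naming rules."""
    if not name:
        return False
    first = name[0]
    # First character: ASCII alphanumeric, '_', or any non-ASCII character.
    first_ok = (first.isascii() and first.isalnum()) or first == "_" or not first.isascii()
    if not first_ok:
        return False
    # '/' is never allowed, and names may not end in a space.
    if "/" in name or name.endswith(" "):
        return False
    # Later ASCII characters must be printable; non-ASCII is passed through.
    return all((not ch.isascii()) or ch.isprintable() for ch in name[1:])


for candidate in ["NO3-", "_FillValue", "this+that", "a/b", "temp ", "-x"]:
    print(repr(candidate), looks_like_netcdf_name(candidate))
```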

> At some point, we in the CF-supporting community are going to have to support
> the standard practices in this aspect that are going on everywhere else in the
> software world, or decide we want a permanent back-water for the 'scientists
> who are not interested in or capable of supporting these practices' (not my
> claim).
>
>> Perhaps there are some reasons to want less-restrictive variable names --
>> I'm not always that imaginative, but if so, then present them.
>
> Let's just make the list so far, to get everyone up to speed with the
> discussion:
> * easier visual parsing (taste, yes, but practical also if you work with lots
>   of data sets from different communities)
> * embedding semantic meaning (taste)
> * clearly isolating the context (namespace, hierarchy)
> * matching attribute names that come from the source data
> * consistency with netCDF usage/files -> easier onboarding of those files
> * Unicode/internationalization support

--Russ


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread John Graybeal
And I wasn't going to say anything else, but this crystallized an issue or two 
from past mails. I promise to (try to) let it go after this. 

On Jan 15, 2014, at 12:37, Chris Barker  wrote:

> There is an existing rule about what characters can be used for variable
> names, that's it -- and we've given a couple not-all-that compelling reasons
> why that rule is good, and no reason other than maybe taste, why that rule
> would be extended.

I don't think multiple use cases from different individuals and communities 
should be categorized as "no reason other than maybe taste".  Just sayin'...

> (and it certainly shouldn't be removed completely -- variable names with 
> arbitrary bytes in them would really be a mess). Is it ascii-only now? it 
> probably should stay that way.

This prompts me to observe that somehow, in this brave new age of computer 
programming, people are developing netCDF software that supports Unicode 
characters -- Unicode!! -- in variable (attribute etc) names. There will be 
netCDF files in the wild, used by scientists and normal people (especially 
normal people from non-English-speaking countries) that use all sorts of wild 
and crazy characters in their variable names. (Perhaps CF thinks these are 
"alphanumeric", in which case I've found a solution! The standard certainly is 
not explicitly ASCII-only.)  By the way, I was amazed to learn that using 
Unicode in programming languages is starting to take hold.

At some point, we in the CF-supporting community are going to have to support 
the standard practices in this aspect that are going on everywhere else in the 
software world, or decide we want a permanent back-water for the 'scientists 
who are not interested in or capable of supporting these practices' (not my 
claim).

> Perhaps there are some reasons to want less-restrictive variable names -- I'm 
> not always that imaginative, but if so, then present them.

Let's just make the list so far, to get everyone up to speed with the 
discussion:
* easier visual parsing (taste, yes, but practical also if you work with lots 
of data sets from different communities)
* embedding semantic meaning (taste)
* clearly isolating the context (namespace, hierarchy)
* matching attribute names that come from the source data
* consistency with netCDF usage/files -> easier onboarding of those files
* Unicode/internationalization support

John




Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Chris Barker
On Wed, Jan 15, 2014 at 9:24 AM, Jim Biard  wrote:

> The point is, the Conventions themselves state that there is *no standard*.
>  People are all the time trying to add meaning to variable names, but the
> standard actually states that the meaning is to reside in the attributes.
>

but we aren't talking about assigning meaning, or telling anyone what names
they can use.

There is an existing rule about what characters can be used for variable
names, that's it -- and we've given a couple not-all-that compelling
reasons why that rule is good, and no reason other than maybe taste, why
that rule would be extended.

(and it certainly shouldn't be removed completely -- variable names with
arbitrary bytes in them would really be a mess). Is it ascii-only now? it
probably should stay that way.

Perhaps there are some reasons to want less-restrictive variable names --
I'm not always that imaginative, but if so, then present them.

> The variable names are just keys for differentiating the variables.  (I
> could name all my variables “vNN”, where N is a digit, and I would
> be completely valid according to the standard.)
>

yup, but you couldn't name them: "vNNN-NNN" -- and why do you need to?

Given your point about the real meaning being encoded in the attributes,
then a prime reason to choose a given variable name is that it matches a
name you are using elsewhere in your process -- which is why I like them
being restricted to names that are valid variable names in programming
languages. But it also may be a reason to be more flexible -- if you call
something "this+that" elsewhere in your process, you may want to use it in
your netcdf files, too.
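
As a rough illustration of that trade-off in Python: a name such as
"this+that" can still be stored and looked up as a string key, but it can no
longer be written directly as an identifier. The helper name below is
hypothetical, and SimpleNamespace merely stands in for whatever object a
netCDF library would hand back.

```python
import keyword
from types import SimpleNamespace


def usable_as_attribute(name: str) -> bool:
    """True if `name` can be written as obj.name in Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)


data = SimpleNamespace()
setattr(data, "this_that", 1.0)
setattr(data, "this+that", 2.0)  # allowed, but only reachable via getattr()

for name in ["this_that", "this+that", "v123-456"]:
    print(name, usable_as_attribute(name))

print(data.this_that)
print(getattr(data, "this+that"))
```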

-Chris





Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Steve Hankin


On 1/15/2014 10:28 AM, John Graybeal wrote:

>> Perhaps a reference to the IETF RFC 2119 [1] (which defines these and a few
>> other related terms) should be added to the CF spec
>
> Yes please, since discussion on this thread has already varied in its
> understanding/application of those terms.
>
> The ambiguity in the sentence "No variable or dimension names are
> standardized by this convention." is also relevant. It could mean "This
> convention defines no requirements about variable or dimension names." or
> "This convention does not specify any particular variable or dimension
> names." The former meaning obviously reinforces the interpretation that
> 'should' is not a requirement.


It feels like we are veering towards hair-splitting, no?  CF contains a
clear (if overly polite) statement about the proper way to create a
variable name: "/Variable, dimension and attribute names should begin
with a letter and be composed of letters, digits, and underscores/".
Clarification of the word "should" would be useful, yes, but the
discussion would be highly unlikely to end up changing foundation
compliance guidelines that have been in CF since the COARDS days.


Since it follows the preceding sentence,  "/This convention does not 
standardize any variable or dimension names./" also seems quite clear.   
The loophole that is implied here -- that CF does not standardize 
variable and dimension names, but other groups may do so -- has been 
usefully exploited by groups like OceanSites, who have chosen to 
standardize their own names and naming patterns sitting atop CF as a 
normative standard.


> While the arguments pushing for the restrictive naming convention (_ as the
> only special character) are perhaps not strong, for my own use I don't have a
> compelling use case on the need for more characters either. Mostly this is a
> matter of personal taste -- I like being able to use . and - to help with
> visual parsing and + and @ for semantic reasons, and they help reduce the
> number of likely prefix collisions (which a single separator doesn't help with
> at all).

Agree.  There are factors sitting in the balance pans on both the pro
and con side.  Special syntax names allow one to create very concise 
names with (we hope) self-evident meanings.  When you are the person 
engaged in the act of defining a new file, this is especially 
attractive.  But over the lifecycle of the data -- considering data 
discovery and data usage in a wide range of contexts -- the special 
syntax characters come back to bite you time and again.


Mike's example of an embedded "dot" is an interesting one because it 
cuts both ways.  Yes, there are times when creating CF files where it 
seems convenient to embed "." into a name in order to preserve a 
hierarchy from the software of origin.  But there will then be 
downstream situations that we make a muddle of when those applications 
want to use the same approach to designate a different hierarchy.  For 
example, downstream applications that want to refer to 
varname.attributename are forced into ugly hacks like 
"var.name.with.dots".attributename.  (Admittedly, this Pandora's box has 
already been opened.  We are already forced to contend with this today.)
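
The ambiguity can be sketched in a few lines of Python. The function names and
the sample variable names are invented for illustration; no particular
downstream application is being described.

```python
def split_at_first_dot(ref: str) -> tuple:
    """Naive reading of 'varname.attributename': split at the first dot."""
    variable, _, attribute = ref.partition(".")
    return variable, attribute


def candidate_splits(ref: str, known_variables: list) -> list:
    """Every (variable, attribute) reading consistent with the known names."""
    return [
        (var, ref[len(var) + 1:])
        for var in sorted(known_variables)
        if ref.startswith(var + ".")
    ]


print(split_at_first_dot("temp.units"))
print(split_at_first_dot("var.name.with.dots.units"))
# With both "var" and "var.name" present in a file, one reference has two readings:
print(candidate_splits("var.name.units", ["var", "var.name"]))
```

Once dots are legal inside names, the reference alone no longer says where the
variable name ends, which is why quoting hacks become necessary.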


A point I feel we ought to remind ourselves of, is that in an issue like 
the naming of variables we should try to put ourselves into the head 
space of the users of the data -- scientists.  Funky looking camel-case 
strings are bread and butter to software developers, but not so much to 
the sensibilities of scientists (particularly older ones).


> There is also a social benefit from relaxing the CF almost-standard:
> on-boarding. We want to encourage netCDF users to transition to CF. Minimizing
> the number of inconsistencies seems practical and forward-thinking. Forcing a
> netCDF user (which may include lots of HDF users too, these days) to abandon
> established attribute names is a significant cost for the affected users, now
> and going forward.

I agree that this is a valid consideration.  There is gray surrounding
this issue.


- Steve



Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Mike McCann
I have two use cases:

* MBARI has data from an underwater vehicle that contains hundreds of
  engineering variables that are automatically logged using the onboard
  software's names for the variables.  Those variables include the '.'
  character. We tried to use our existing NetCDF TDS/Hyrax infrastructure to
  handle these data but ran into several frustrating inconsistencies in how
  various packages handled the '.'.  Unfortunately, we are not using the
  infrastructure for these data.
* The ESIP Federation documentation group discussed creating a flattened
  object serialization convention for hierarchical metadata and wanted to use
  '.' as a delimiter but needed to abandon that consideration to stay CF
  compliant.
-Mike 
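
One way to picture the second use case is the following Python sketch. It is
not the ESIP group's convention, which is not described here; the flatten()
helper and the sample metadata are hypothetical. It only shows why a separator
that cannot occur inside ordinary names is attractive when flattening a
hierarchy.

```python
def flatten(metadata: dict, sep: str, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in metadata.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


nested = {"sensor": {"id": 1}, "sensor_id": 2}

print(flatten(nested, "."))  # both entries survive under distinct names
print(flatten(nested, "_"))  # the two entries collide on "sensor_id"
```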

--
Mike McCann
Software Engineer
Monterey Bay Aquarium Research Institute
7700 Sandholdt Road
Moss Landing, CA 95039-9644
Voice: 831.775.1769  Fax: 831.775.1736 http://www.mbari.org


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread John Graybeal
> Perhaps a reference to the IETF RFC 2119 [1] (which defines these and a few 
> other related terms) should be added to the CF spec

Yes please, since discussion on this thread has already varied in its 
understanding/application of those terms.

The ambiguity in the sentence "No variable or dimension names are standardized 
by this convention." is also relevant. It could mean "This convention defines 
no requirements about variable or dimension names." or "This convention does 
not specify any particular variable or dimension names." The former meaning 
obviously reinforces the interpretation that 'should' is not a requirement.

While the arguments pushing for the restrictive naming convention (_ as the 
only special character) are perhaps not strong, for my own use I don't have a 
compelling use case on the need for more characters either. Mostly this is a 
matter of personal taste -- I like being able to use . and - to help with 
visual parsing and + and @ for semantic reasons, and they help reduce the 
number of likely prefix collisions (which a single separator doesn't help with 
at all). 

There is also a social benefit from relaxing the CF almost-standard: 
on-boarding. We want to encourage netCDF users to transition to CF. Minimizing 
the number of inconsistencies seems practical and forward-thinking. Forcing a 
netCDF user (which may include lots of HDF users too, these days) to abandon 
established attribute names is a significant cost for the affected users, now 
and going forward.

John



Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Ethan Davis
Hi all,

The use of "should" may, by many, be interpreted as a recommendation
rather than as a requirement.

Though the terms "must", "should", and "may" are used throughout the CF
spec, I am not finding any text that defines those terms.

Perhaps a reference to the IETF RFC 2119 [1] (which defines these and a
few other related terms) should be added to the CF spec. Though it seems
that might require a fairly full review of the uses in CF of the terms
defined in RFC 2119.

Ethan

[1] http://www.ietf.org/rfc/rfc2119.txt

On 1/15/2014 10:46 AM, Karl Taylor wrote:
> All,
> 
> Yes, that statement seems quite definitive and unambiguous, and for the
> reasons stated in other emails, I support retaining it.
> 
> regards,
> Karl
> 
> On 1/15/14 9:37 AM, Steve Hankin wrote:
>>
>> On 1/15/2014 9:24 AM, Jim Biard wrote:
>>> Chris,
>>>
>>> The point is, the Conventions themselves state that there is *no
>>> standard*.  People are all the time trying to add meaning to variable
>>> names, but the standard actually states that the meaning is to reside
>>> in the attributes.  The variable names are just keys for
>>> differentiating the variables.  (I could name all my variables
>>> “vNN”, where N is a digit, and I would be completely valid
>>> according to the standard.)  The long_name and standard_name
>>> attributes are the places where descriptors of the variable content
>>> are to be found.
>>>
>>> So I’m raising a question. _ Is there actually anything other than
>>> sentiment (i.e., an actual rule) that anyone can point to that
>>> prevents someone from using “new” characters in their variable names?_
>>
>> How about the lines from the CF document that you cut-pasted (thank you):
>>
>> /Variable, dimension and attribute names should begin with a
>> letter and be composed of letters, digits, and underscores. Note
>> that this is in conformance with the COARDS conventions, but is
>> more restrictive than the netCDF interface which allows use of the
>> hyphen character. The netCDF interface also allows leading
>> underscores in names, but the NUG states that this is reserved for
>> system use./
>>
>> - Steve
>>>
>>> Grace and peace,
>>>
>>> Jim
>>>
>>> *Jim Biard*
>>> *Research Scholar*
>>> Cooperative Institute for Climate and Satellites NC 
>>> North Carolina State University 
>>> NOAA's National Climatic Data Center 
>>> 151 Patton Ave, Asheville, NC 28801
>>> e: jbi...@cicsnc.org 
>>> o: +1 828 271 4900
>>>
>>>
>>>
>>>
>>>
>>> On Jan 15, 2014, at 12:00 PM, Chris Barker wrote:
>>>
>>>> On Wed, Jan 15, 2014 at 7:39 AM, jbiard wrote:
>>>>
>>>>> I don't think we should use ease of mapping variable names to a
>>>>> programming language as a reason for allowing (or not allowing)
>>>>> any particular character in variable names.
>>>>
>>>> Why not? maybe not a compelling reason, but I can't imagine a
>>>> compelling reason to have more flexible naming conventions, either.
>>>>
>>>>> CF has, as I understood it, considered variable names as
>>>>> completely up to the producer, relying on attributes to provide
>>>>> meaning.  So, I can name a temperature variable "fluffy_bunny"
>>>>> if I want to, and it is completely valid.
>>>>
>>>> valid yes, a good idea? probably not.
>>>>
>>>>> Section 1.3 of the Conventions states, "No variable or dimension
>>>>> names are standardized by this convention."
>>>>
>>>> so there are no standard variable names -- that's not the same as
>>>> standards for variable names
>>>>
>>>> Personally, I wish there were standards for variable names, it would
>>>> make it easier to code against -- but that cat's out of the bag. But
>>>> this cat isn't: the restrictions have been there for a long time,
>>>> so the question now is:
>>>>
>>>> what are the reasons for easing those restrictions?
>>>>
>>>> and
>>>>
>>>> what are the reasons for keeping those restrictions?
>>>>
>>>> we've given a few reasons for keeping them (maybe not all that
>>>> compelling to you, but reasons none the less) -- what are the reasons
>>>> for relaxing them, other than "I like this naming convention that is
>>>> currently not allowed" ?
>>>>
>>>> I'm not convinced that "fluffy-bunny" is any more readable or
>>>> anything else than "fluffy_bunny"
>>>>
>>>> -Chris



Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Karl Taylor

All,

Yes, that statement seems quite definitive and unambiguous, and for the 
reasons stated in other emails, I support retaining it.


regards,
Karl

On 1/15/14 9:37 AM, Steve Hankin wrote:


On 1/15/2014 9:24 AM, Jim Biard wrote:

Chris,

The point is, the Conventions themselves state that there is *no 
standard*.  People are all the time trying to add meaning to variable 
names, but the standard actually states that the meaning is to reside 
in the attributes.  The variable names are just keys for 
differentiating the variables.  (I could name all my variables 
"vNN", where N is a digit, and I would be completely valid 
according to the standard.)  The long_name and standard_name 
attributes are the places where descriptors of the variable content 
are to be found.


So I'm raising a question.  _Is there actually anything other than 
sentiment (i.e., an actual rule) that anyone can point to that 
prevents someone from using "new" characters in their variable names?_


How about the lines from the CF document that you cut-pasted (thank you):

/Variable, dimension and attribute names should begin with a
letter and be composed of letters, digits, and underscores. Note
that this is in conformance with the COARDS conventions, but is
more restrictive than the netCDF interface which allows use of the
hyphen character. The netCDF interface also allows leading
underscores in names, but the NUG states that this is reserved for
system use./

- Steve


Grace and peace,

Jim

*Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC 
North Carolina State University 
NOAA's National Climatic Data Center 
151 Patton Ave, Asheville, NC 28801
e: jbi...@cicsnc.org 
o: +1 828 271 4900





On Jan 15, 2014, at 12:00 PM, Chris Barker wrote:


On Wed, Jan 15, 2014 at 7:39 AM, jbiard wrote:


I don't think we should use ease of mapping variable names to a
programming language as a reason for allowing (or not allowing)
any particular character in variable names.

Why not? maybe not a compelling reason, but I can't imagine a 
compelling reason to have more flexible naming conventions, either.


CF has, as I understood it, considered variable names as
completely up to the producer, relying on attributes to provide
meaning.  So, I can name a temperature variable "fluffy_bunny"
if I want to, and it is completely valid.

valid yes, a good idea? probably not.

Section 1.3 of the Conventions states, "No variable or dimension
names are standardized by this convention."

so there are no standard variable names -- that's not the same as 
standards for variable names


Personally, I wish there were standards for variable names, it would 
make it easier to code against -- but that cat's out of the bag. But 
this cat isn't: the restrictions have been there for a long time, 
so the question now is:


what are the reasons for easing those restrictions?

and

what are the reasons for keeping those restrictions?

we've given a few reasons for keeping them (maybe not all that 
compelling to you, but reasons nonetheless) -- what are the reasons 
for relaxing them, other than "I like this naming convention that is 
currently not allowed" ?


I'm not convinced that "fluffy-bunny" is any more readable or 
anything else than "fluffy_bunny"


-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov 
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu 
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata






Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Steve Hankin


On 1/15/2014 9:24 AM, Jim Biard wrote:

Chris,

The point is, the Conventions themselves state that there is *no 
standard*.  People are all the time trying to add meaning to variable 
names, but the standard actually states that the meaning is to reside 
in the attributes.  The variable names are just keys for 
differentiating the variables.  (I could name all my variables 
"vNN", where N is a digit, and I would be completely valid 
according to the standard.)  The long_name and standard_name 
attributes are the places where descriptors of the variable content 
are to be found.


So I'm raising a question.  _Is there actually anything other than 
sentiment (i.e., an actual rule) that anyone can point to that 
prevents someone from using "new" characters in their variable names?_


How about the lines from the CF document that you cut-pasted (thank you):

   /Variable, dimension and attribute names should begin with a letter
   and be composed of letters, digits, and underscores. Note that this
   is in conformance with the COARDS conventions, but is more
   restrictive than the netCDF interface which allows use of the hyphen
   character. The netCDF interface also allows leading underscores in
   names, but the NUG states that this is reserved for system use./

- Steve


Grace and peace,

Jim

*Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC 
North Carolina State University 
NOAA's National Climatic Data Center 
151 Patton Ave, Asheville, NC 28801
e: jbi...@cicsnc.org 
o: +1 828 271 4900





On Jan 15, 2014, at 12:00 PM, Chris Barker wrote:


On Wed, Jan 15, 2014 at 7:39 AM, jbiard wrote:


I don't think we should use ease of mapping variable names to a
programming language as a reason for allowing (or not allowing)
any particular character in variable names.

Why not? maybe not a compelling reason, but I can't imagine a 
compelling reason to have more flexible naming conventions, either.


CF has, as I understood it, considered variable names as
completely up to the producer, relying on attributes to provide
meaning.  So, I can name a temperature variable "fluffy_bunny" if
I want to, and it is completely valid.

valid yes, a good idea? probably not.

Section 1.3 of the Conventions states, "No variable or dimension
names are standardized by this convention."

so there are no standard variable names -- that's not the same as 
standards for variable names


Personally, I wish there were standards for variable names, it would 
make it easier to code against -- but that cat's out of the bag. But 
this cat isn't: the restrictions have been there for a long time, so 
the question now is:


what are the reasons for easing those restrictions?

and

what are the reasons for keeping those restrictions?

we've given a few reasons for keeping them (maybe not all that 
compelling to you, but reasons nonetheless) -- what are the reasons 
for relaxing them, other than "I like this naming convention that is 
currently not allowed" ?


I'm not convinced that "fluffy-bunny" is any more readable or 
anything else than "fluffy_bunny"


-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov 
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu 
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata






Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Jim Biard
Chris,

The point is, the Conventions themselves state that there is no standard.  
People are all the time trying to add meaning to variable names, but the 
standard actually states that the meaning is to reside in the attributes.  The 
variable names are just keys for differentiating the variables.  (I could name 
all my variables “vNN”, where N is a digit, and I would be completely 
valid according to the standard.)  The long_name and standard_name attributes 
are the places where descriptors of the variable content are to be found.

So I’m raising a question.  Is there actually anything other than sentiment 
(i.e., an actual rule) that anyone can point to that prevents someone from 
using “new” characters in their variable names?

Grace and peace,

Jim

Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites NC
North Carolina State University
NOAA's National Climatic Data Center
151 Patton Ave, Asheville, NC 28801
e: jbi...@cicsnc.org
o: +1 828 271 4900




On Jan 15, 2014, at 12:00 PM, Chris Barker  wrote:

> On Wed, Jan 15, 2014 at 7:39 AM, jbiard  wrote:
> I don't think we should use ease of mapping variable names to a programming 
> language as a reason for allowing (or not allowing) any particular character 
> in variable names. 
> 
> Why not? maybe not a compelling reason, but I can't imagine a compelling 
> reason to have more flexible naming conventions, either.
> CF has, as I understood it, considered variable names as completely up to the 
> producer, relying on attributes to provide meaning.  So, I can name a 
> temperature variable "fluffy_bunny" if I want to, and it is completely valid.
> 
> valid yes, a good idea? probably not.
> 
> Section 1.3 of the Conventions states, "No variable or dimension names are 
> standardized by this convention." 
> 
> so there are no standard variable names -- that's not the same as standards 
> for variable names
> 
> Personally, I wish there were standards for variable names, it would make it 
> easier to code against -- but that cat's out of the bag. But this cat isn't: 
> the restrictions have been there for a long time, so the question now is:
> 
> what are the reasons for easing those restrictions?
> 
> and
> 
> what are the reasons for keeping those restrictions?
> 
> we've given a few reasons for keeping them (maybe not all that compelling 
> to you, but reasons nonetheless) -- what are the reasons for relaxing them, 
> other than "I like this naming convention that is currently not allowed" ?
> 
> I'm not convinced that "fluffy-bunny" is any more readable or anything else 
> than "fluffy_bunny"
> 
> -Chris
> 
> 
> -- 
> 
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
> 
> chris.bar...@noaa.gov
> ___
> CF-metadata mailing list
> CF-metadata@cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata



Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Chris Barker
On Wed, Jan 15, 2014 at 7:39 AM, jbiard  wrote:

>  I don't think we should use ease of mapping variable names to a
> programming language as a reason for allowing (or not allowing) any
> particular character in variable names.
>
Why not? maybe not a compelling reason, but I can't imagine a compelling
reason to have more flexible naming conventions, either.

> CF has, as I understood it, considered variable names as completely up to
> the producer, relying on attributes to provide meaning.  So, I can name a
> temperature variable "fluffy_bunny" if I want to, and it is completely
> valid.
>
valid yes, a good idea? probably not.

Section 1.3 of the Conventions states, "No variable or dimension names are
> standardized by this convention."
>
so there are no standard variable names -- that's not the same as standards
for variable names

Personally, I wish there were standards for variable names, it would make
it easier to code against -- but that cat's out of the bag. But this cat
isn't: the restrictions have been there for a long time, so the question
now is:

what are the reasons for easing those restrictions?

and

what are the reasons for keeping those restrictions?

we've given a few reasons for keeping them (maybe not all that compelling
to you, but reasons nonetheless) -- what are the reasons for relaxing
them, other than "I like this naming convention that is currently not
allowed" ?

I'm not convinced that "fluffy-bunny" is any more readable or anything else
than "fluffy_bunny"

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread jbiard
 

Hi. 

I don't think we should use ease of mapping variable names to a
programming language as a reason for allowing (or not allowing) any
particular character in variable names. CF has, as I understood it,
considered variable names as completely up to the producer, relying on
attributes to provide meaning. So, I can name a temperature variable
"fluffy_bunny" if I want to, and it is completely valid.

Section 1.3 of the Conventions states, "No variable or dimension names
are standardized by this convention."

Section 2.3 states:

    Variable, dimension and attribute names should begin with a letter
    and be composed of letters, digits, and underscores. Note that this
    is in conformance with the COARDS conventions, but is more
    restrictive than the netCDF interface which allows use of the hyphen
    character. The netCDF interface also allows leading underscores in
    names, but the NUG states that this is reserved for system use.

    Case is significant in netCDF names, but it is recommended that
    names should not be distinguished purely by case, i.e., if case is
    disregarded, no two names should be the same. It is also recommended
    that names should be obviously meaningful, if possible, as this
    renders the file more effectively self-describing.

    This convention does not standardize any variable or dimension
    names.

While the Conventions makes recommendations about variable names, NO
STANDARDS are set by the Conventions.

So, why were non-alphanumeric characters other than '_' excluded by
practice back in the day? Are these reasons still valid? In fact, given
the statements in the Conventions, is there actually anything other than
opinion constraining people from using any characters they like in
variable (and dimension) names (as long as they are OK with netCDF and
maybe NUG)?
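
For reference, the two Section 2.3 recommendations quoted above can be
checked mechanically. What follows is only a minimal sketch in Python,
not anything defined by CF: the helper names follows_cf_naming and
case_collisions are made up for illustration, and it assumes "letters"
and "digits" mean ASCII, which the quoted text does not actually say.

```python
import re

# Assumption: "letters" and "digits" are ASCII only; Section 2.3 does not
# define them, so this pattern is one possible reading of the text.
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def follows_cf_naming(name):
    """True if name starts with a letter and uses only letters, digits, '_'."""
    return CF_NAME.match(name) is not None


def case_collisions(names):
    """Return groups of names that differ only by case."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    for name in ["fluffy_bunny", "fluffy-bunny", "_hidden", "v01"]:
        print(name, follows_cf_naming(name))
    print(case_collisions(["Temp", "temp", "salinity"]))
```

Under this reading "fluffy_bunny" and "v01" pass, while "fluffy-bunny"
and a name with a leading underscore do not.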

Grace and peace, 

Jim 

On 2014-01-14 12:08, Chris Barker wrote:

> There is another reason:
>
> mapping CF variable names directly to programming language variable
> names is pretty handy -- so it's nice if those are legal.
>
> I'm sure not all programming languages have the same restrictions on
> names, but there is surely a subset that's pretty common (i.e. none of
> the usual math characters).
>
> -Chris
>
> On Mon, Jan 13, 2014 at 12:57 PM, Steve Hankin wrote:
>
>> Hi John,
>>
>> Philosophically I am aligned with Bryan: the purpose of the CF
>> standard is to constrain (simplify and make predictable) the use of a
>> highly general file creation toolkit like netCDF. The question of
>> limitations placed on name strings should be evaluated on this yard
>> stick.
>>
>> There is a class of problems that are created by embedding special
>> syntax characters willy-nilly into name strings. Namely, that the use
>> of such characters can render mathematical expressions ambiguous.
>> Here's a simple example. Suppose a file contains 3 surface marine
>> variables -- let's say atmospheric CO2, ocean CO2 and an artfully
>> computed delta across the surface. Further say that the file creator
>> chooses to name the delta variable using a "-", as in
>>
>>     atmosCO2
>>     waterCO2
>> and
>>     atmosCO2-waterCO2
>>
>> Then the meaning of the mathematical expression "atmosCO2-waterCO2"
>> has been rendered ambiguous. Is it a single variable name, or the
>> difference of two? One is forced to use arbitrary tricks that are
>> alien to the scientific users we are trying to serve -- say
>> disambiguating the expression by insisting on surrounding quotes,
>> "atmosCO2"-"waterCO2", or white space, "atmosCO2 - waterCO2". (Would
>> any scientist read "atmosCO2 - waterCO2" and "atmosCO2-waterCO2" to
>> have distinct meanings?)
>>
>> As you say we have already headed down this (slippery) slope.
>> Characters like "+", "-", "." and case-sensitivity have leaked through
>> into fairly common practice. For better or worse. :-( (Should the
>> publishers of science textbooks start using case-sensitive variable
>> names?) So the question that you've posed is in a sense, _now that the
>> horse is out of the barn, is there any merit to keeping the other
>> animals penned?_ Like Bryan, I would argue that the way to answer this
>> is to insist that at least there be significant gains from letting
>> them out.
>>
>> Another unintended negative consequence: the impact on free text
>> searches when our variable names include special syntax characters.
>> Are our metadata procedures on an arc so promising that we have no
>> need to rely on general Google-style tools for discovery?
>>
>> - Steve
>>
>> =
>>
>> On 1/13/2014 12:12 PM, John Graybeal wrote:
>>
>>> Not sure I am following you -- constraints are presumably there for
>>> a reason, I wasn't sure what the reason was for these particular
>>> constraints, but thought they might have simply echoed earlier
>>> netCDF constraints.
>>>
>>> To your 'use case' question, we were thinking about alternatives to
>>> mx_ as prefix for our own attributes, to minimize the chance of
>>> collisions (e.g., with some maintenance variables someone might name
>>> mx_).
>>>
>>> john
>>>
>>> On Jan 13, 2014, at 11:

Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-14 Thread Chris Barker
There is another reason:

mapping CF variable names directly to programming language variable names
is pretty handy -- so it's nice if those are legal.

I'm sure not all programming languages have the same restrictions on names,
but there is surely a subset that's pretty common (i.e. none of the usual
math characters).
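
To make the "handy" part concrete, here is a rough sketch in Python of
what that mapping can look like. The function names
usable_as_python_name and as_namespace and the sample data are
hypothetical, chosen only for illustration. Note that Python's own rule
is wider than the CF one (it accepts leading underscores and many
non-ASCII letters), so this shows the convenience, not CF conformance.

```python
import keyword
from types import SimpleNamespace


def usable_as_python_name(name):
    """True if name can be used as a Python attribute or variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


def as_namespace(variables):
    """Expose variables as attributes where the name allows it.

    Returns (namespace, leftover); leftover holds the entries whose names
    are not legal Python identifiers and so need dictionary-style access.
    """
    legal = {k: v for k, v in variables.items() if usable_as_python_name(k)}
    leftover = {k: v for k, v in variables.items() if k not in legal}
    return SimpleNamespace(**legal), leftover


if __name__ == "__main__":
    # Stand-in for variables read from a file; the values are invented.
    data = {"sea_temp": [1.0], "sea-temp": [2.0], "lambda": [3.0]}
    ns, leftover = as_namespace(data)
    print(ns.sea_temp)
    print(sorted(leftover))
```

With names restricted to letters, digits and underscores, nearly
everything lands in the namespace; a hyphenated name (or one that
happens to be a keyword) falls back to leftover["sea-temp"] style
access.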

-Chris



On Mon, Jan 13, 2014 at 12:57 PM, Steve Hankin wrote:

>  Hi John,
>
> Philosophically I am aligned with Bryan:  the purpose of the CF standard
> is to constrain (simplify and make predictable) the use of a highly general
> file creation toolkit like netCDF.   The question of limitations placed on
> name strings should be evaluated on this yard stick.
>
> There is a class of problems that are created by embedding special syntax
> characters willy-nilly into name strings.  Namely, that the use of such
> characters can render mathematical expressions ambiguous.  Here's a simple
> example.  Suppose a file contains 3 surface marine variables -- let's say
> atmospheric CO2, ocean CO2 and an artfully computed delta across the
> surface.  Further say that the file creator chooses to name the delta
> variable using a "-", as in
> atmosCO2
> waterCO2
> and
> atmosCO2-waterCO2
>
> Then the meaning of the mathematical expression  "atmosCO2-waterCO2" has
> been rendered ambiguous.  Is it a single variable name, or the difference
> of two?   One is forced to use arbitrary tricks that are alien to the
> scientific users we are trying to serve -- say disambiguating  the
> expression by insisting on surrounding quotes, "atmosCO2"-"waterCO2",
> or white space, "atmosCO2 - waterCO2".  (Would any scientist read "atmosCO2
> - waterCO2" and "atmosCO2-waterCO2" to have distinct meanings?)
>
> As you say we have already headed down this (slippery) slope.  Characters
> like "+", "-", "." and case-sensitivity have leaked through into fairly
> common practice.  For better or worse.  :-(   (Should the publishers of
> science textbooks start using case-sensitive variable names?)  So the
> question that you've posed is in a sense, *now that the horse is out of
> the barn, is there any merit to keeping the other animals penned?*   Like
> Bryan, I would argue that the way to answer this is to insist that at least
> there be significant gains from letting them out.
>
> Another unintended negative consequence:  the impact on free text searches
> when our variable names include special syntax characters.   Are our
> metadata procedures on an arc so promising that we have no need to rely on
> general Google-style tools for discovery?
>
>   - Steve
>
> =
>
>
> On 1/13/2014 12:12 PM, John Graybeal wrote:
>
> Not sure I am following you -- constraints are presumably there for a
> reason, I wasn't sure what the reason was for these particular constraints,
> but thought they might have simply echoed earlier netCDF constraints.
>
>  To your 'use case' question, we were thinking about alternatives to mx_
> as prefix for our own attributes, to minimize the chance of collisions
> (e.g., with some maintenance variables someone might name mx_).
>
>  john
>
>  On Jan 13, 2014, at 11:27, Bryan Lawrence 
> wrote:
>
>  Hi John
>
>  In the spirit of CF being *constrained* netCDF, it seems that we
> wouldn't, unless we had a specific use case ... do you?
>
> Cheers
> Bryan
>
>
> On 13 January 2014 18:54,  wrote:
>
>> As netCDF is growing to allow @, +, hyphen, and period in
>> variable/dimension/attribute names, is there any likelihood CF will grow to
>> allow some or all of those characters?
>>
>> I seem to recall some tools have conflicts with some of those characters
>> (aside from them being non-conformant). But consistency and flexibility
>> would be nice.
>>
>> john
>> 
>> John Graybeal
>> Sr. Data Manager, Metadata & Semantics
>>
>> M +1 408 675-5445
>> skype: graybealski
>> Marinexplore
>> 920 Stewart Drive
>> Sunnyvale 94085
>> California, USA
>> www.marinexplore.com
>>
>>
>> --
>> Scanned by iCritical.
>>
>>
>
>
>  --
>
>  Bryan Lawrence
> University of Reading: Professor of Weather and Climate Computing.
> National Centre for Atmospheric Science: Director of Models and Data.
> STFC: Director of the Centre for Environmental Data Archival.
> Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence
>
>
>
> *John Graybeal*
> Sr. Data Manager, Metadata & Semantics
>
> M +1 408 675-5445
> skype: graybealski
> Marinexplore
> 920 Stewart Drive
> Sunnyvale 94085
> California, USA
> www.marinexplore.com 
>
>
>
> ___
> CF-metadata mailing list
> CF-metadata@cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
>

Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-13 Thread Steve Hankin

Hi John,

Philosophically I am aligned with Bryan:  the purpose of the CF standard 
is to constrain (simplify and make predictable) the use of a highly 
general file creation toolkit like netCDF.   The question of limitations 
placed on name strings should be evaluated on this yard stick.


There is a class of problems that are created by embedding special 
syntax characters willy-nilly into name strings.  Namely, that the use 
of such characters can render mathematical expressions ambiguous.  
Here's a simple example.  Suppose a file contains 3 surface marine 
variables -- let's say atmospheric CO2, ocean CO2 and an artfully 
computed delta across the surface.  Further say that the file creator 
chooses to name the delta variable using a "-", as in

    atmosCO2
    waterCO2
and
    atmosCO2-waterCO2

Then the meaning of the mathematical expression  "atmosCO2-waterCO2" has 
been rendered ambiguous.  Is it a single variable name, or the 
difference of two?   One is forced to use arbitrary tricks that are 
alien to the scientific users we are trying to serve -- say 
disambiguating the expression by insisting on surrounding quotes, 
"atmosCO2"-"waterCO2", or white space, "atmosCO2 - waterCO2".  (Would any 
scientist read "atmosCO2 - waterCO2" and "atmosCO2-waterCO2" to have 
distinct meanings?)
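
The ambiguity can be shown in a few lines. This is a toy sketch in
Python, not how any particular analysis tool parses expressions: the
function name readings is hypothetical, and it only considers a single
"-" as a possible subtraction.

```python
def readings(expression, known_names):
    """List every way to read expression given the names in a file."""
    found = []
    # Reading 1: the whole string is itself a variable name.
    if expression in known_names:
        found.append(("variable", expression))
    # Reading 2: some "-" in the string is a minus sign between two names.
    for i, ch in enumerate(expression):
        if ch == "-":
            left = expression[:i].strip()
            right = expression[i + 1:].strip()
            if left in known_names and right in known_names:
                found.append(("difference", left, right))
    return found


if __name__ == "__main__":
    with_hyphen = {"atmosCO2", "waterCO2", "atmosCO2-waterCO2"}
    with_underscore = {"atmosCO2", "waterCO2", "atmosCO2_waterCO2"}
    print(readings("atmosCO2-waterCO2", with_hyphen))
    print(readings("atmosCO2-waterCO2", with_underscore))
```

With the hyphenated name in the file the same string has two readings;
if the delta variable is named atmosCO2_waterCO2 instead, only the
subtraction remains.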


As you say we have already headed down this (slippery) slope. Characters 
like "+", "-", "." and case-sensitivity have leaked through into fairly 
common practice.  For better or worse. :-(   (Should the publishers of 
science textbooks start using case-sensitive variable names?)  So the 
question that you've posed is in a sense, /now that the horse is out of 
the barn, is there any merit to keeping the other animals penned?/   
Like Bryan, I would argue that the way to answer this is to insist that 
at least there be significant gains from letting them out.


Another unintended negative consequence:  the impact on free text 
searches when our variable names include special syntax characters.   
Are our metadata procedures on an arc so promising that we have no need 
to rely on general Google-style tools for discovery?
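
One way to picture the search problem, as a sketch only: many
general-purpose text indexers break text into "word" tokens, and the
regular expression below is an assumed stand-in for that step, not the
behaviour of any specific search engine. The function name
search_tokens is made up for illustration.

```python
import re


def search_tokens(text):
    """Split text the way a simple word-based indexer might (assumption)."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    print(search_tokens("atmosCO2_waterCO2"))
    print(search_tokens("atmosCO2-waterCO2"))
    print(search_tokens("sea.surface+temp"))
```

Under that assumption a name built from letters, digits and underscores
stays one searchable token, while a name containing "-", "." or "+" is
split into pieces that also match unrelated variables.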


  - Steve

=

On 1/13/2014 12:12 PM, John Graybeal wrote:
Not sure I am following you -- constraints are presumably there for a 
reason, I wasn't sure what the reason was for these particular 
constraints, but thought they might have simply echoed earlier netCDF 
constraints.


To your 'use case' question, we were thinking about alternatives to 
mx_ as prefix for our own attributes, to minimize the chance of 
collisions (e.g., with some maintenance variables someone might name mx_).


john

On Jan 13, 2014, at 11:27, Bryan Lawrence wrote:



Hi John

In the spirit of CF being *constrained* netCDF, it seems that we 
wouldn't, unless we had a specific use case ... do you?


Cheers
Bryan


On 13 January 2014 18:54, > wrote:


As netCDF is growing to allow @, +, hyphen, and period in
variable/dimension/attribute names, is there any likelihood CF
will grow to allow some or all of those characters?

I seem to recall some tools have conflicts with some of those
characters (aside from them being non-conformant). But
consistency and flexibility would be nice.

john

John Graybeal
Sr. Data Manager, Metadata & Semantics

M +1 408 675-5445 
skype: graybealski
Marinexplore
920 Stewart Drive
Sunnyvale 94085
California, USA
www.marinexplore.com


--
Scanned by iCritical.




--

Bryan Lawrence
University of Reading: Professor of Weather and Climate Computing.
National Centre for Atmospheric Science: Director of Models and Data.
STFC: Director of the Centre for Environmental Data Archival.
Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence 




*John Graybeal*
Sr. Data Manager, Metadata & Semantics

M +1 408 675-5445
skype: graybealski
Marinexplore
920 Stewart Drive
Sunnyvale 94085
California, USA
www.marinexplore.com 



___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata




Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-13 Thread John Graybeal
Not sure I am following you -- constraints are presumably there for a reason, I 
wasn't sure what the reason was for these particular constraints, but thought 
they might have simply echoed earlier netCDF constraints.

To your 'use case' question, we were thinking about alternatives to mx_ as 
prefix for our own attributes, to minimize the chance of collisions (e.g., with 
some maintenance variables someone might name mx_).

john

On Jan 13, 2014, at 11:27, Bryan Lawrence  wrote:

> Hi John
> 
> In the spirit of CF being *constrained* netCDF, it seems that we wouldn't, 
> unless we had a specific use case ... do you?
> 
> Cheers
> Bryan
> 
> 
> On 13 January 2014 18:54,  wrote:
> As netCDF is growing to allow @, +, hyphen, and period in 
> variable/dimension/attribute names, is there any likelihood CF will grow to 
> allow some or all of those characters?
> 
> I seem to recall some tools have conflicts with some of those characters 
> (aside from them being non-conformant). But consistency and flexibility would 
> be nice.
> 
> john
> 
> John Graybeal
> Sr. Data Manager, Metadata & Semantics
> 
> M +1 408 675-5445
> skype: graybealski
> Marinexplore
> 920 Stewart Drive
> Sunnyvale 94085
> California, USA
> www.marinexplore.com
> 
> 
> --
> Scanned by iCritical.
> 
> 
> 
> 
> -- 
> 
> Bryan Lawrence
> University of Reading: Professor of Weather and Climate Computing.
> National Centre for Atmospheric Science: Director of Models and Data.
> STFC: Director of the Centre for Environmental Data Archival.
> Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence 


John Graybeal
Sr. Data Manager, Metadata & Semantics

M +1 408 675-5445
skype: graybealski
Marinexplore
920 Stewart Drive
Sunnyvale 94085
California, USA
www.marinexplore.com



Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-13 Thread Bryan Lawrence
Hi John

In the spirit of CF being *constrained* netCDF, it seems that we wouldn't,
unless we had a specific use case ... do you?

Cheers
Bryan


On 13 January 2014 18:54,  wrote:

> As netCDF is growing to allow @, +, hyphen, and period in
> variable/dimension/attribute names, is there any likelihood CF will grow to
> allow some or all of those characters?
>
> I seem to recall some tools have conflicts with some of those characters
> (aside from them being non-conformant). But consistency and flexibility
> would be nice.
>
> john
> 
> John Graybeal
> Sr. Data Manager, Metadata & Semantics
>
> M +1 408 675-5445
> skype: graybealski
> Marinexplore
> 920 Stewart Drive
> Sunnyvale 94085
> California, USA
> www.marinexplore.com
>
>
> --
> Scanned by iCritical.
>
>


-- 

Bryan Lawrence
University of Reading: Professor of Weather and Climate Computing.
National Centre for Atmospheric Science: Director of Models and Data.
STFC: Director of the Centre for Environmental Data Archival.
Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence