Dear all,

after reading through the unusually lively discussion on this issue, I feel I have to make a statement as well. My background is the experience with not-so-CF-compliant "CF-netCDF" datasets submitted to us by several different groups, and the resulting development of a CFchecker to improve this situation in general.
I think Chris' response gives me a good starting point that I can agree with.

1. I also doubt that anyone actually reads the binary netCDF files (or hexdumps of those files) directly. Tools like ncdump are probably what people most often have in mind when they talk about "looking at the raw file/data". ncdump _is already_ a "client library"! So we _already_ rely on a computer program (ncdump) to decode the binary data properly for us. It has also been pointed out that ncdump can convert (proper) netCDF time encoding to ISO format (-t parameter). So I really don't see what the argument is here. I absolutely disagree with the position against fixing this in libraries. This is exactly what libraries are for. Maybe this kind of time en-/decoding has to be implemented in a lower-level library (like the netCDF reader/writer itself), but in a library nonetheless.

2. Talking about those imaginary data archeologists: the source code to read and write netCDF data files is open source as far as I know (and so are some compilers that can compile it). Would those archeologists not look into the source code first to figure out the encoding people used at the time, instead of trying to reverse engineer ISO timestamps from hex dumps or binary files all by themselves?

3. What exactly would the use case of the additional encoding be? Sure, you can ncdump a file to get an idea of its contents. But why do you do that in the first place? As Chris already pointed out, this is hardly the "real world" use case, because in the end you _will_ have to process your data with some sort of program that can read (CF-)netCDF. The real reason I see for having to look at ncdump output regularly is that the (CF-)metadata in the file is so bad that you need to figure out manually what the file contains at all.
I do not want to put all the blame on the file creator (CF is unfortunately still pretty complicated to understand), but incomplete or wrong metadata is a problem the user has a fair share in. This in itself is a problem that won't be solved by adding yet more options for the user to do something wrong, which leads to the next problem: consistency.

4. Jonathan already pointed out the consistency problem. Adding redundant information to a dataset is something I consider a really bad idea. It _will_ lead to inconsistencies in metadata/files. And what do you do then? Which data do you trust? The netCDF-encoded version in the time variable or the extra ISO time? In my opinion this just creates a new problem instead of solving one. The entire goal of CF (in my opinion) is to make data machine readable (otherwise you could just stick with ASCII if you want to do everything by hand anyway).

5. Then there are still the encoding problems pointed out by several people. The big ones I see here are calendars and time zones. Again, there is a lot a user can (and will) do wrong. In the end there is not much use in ISO time labels if you can't trust that they are properly defined anyway.

While I have been writing up this lengthy statement of mine (sorry!), Martin has also expressed his views on this issue, so I want to say that I agree with the second point he makes. It might be useful to preserve some original labels from the data source. Those could also include ISO timestamps that the instruments themselves generated. I have absolutely no problem with adding any such data within the scope of what CF allows (or can be modified to allow). However, such information should probably not be used or trusted for automated computations in interoperable systems. What you do with your own data is of course your own choice, but you don't need CF if you don't want to share data anyway.

Cheers,
Michael

On 20.03.2013 00:26, Chris Barker - NOAA Federal wrote:
> Richard,
>
> Very well put!
>
>
>> However you choose to peer into your netCDF files you are seeing them
>> through the lens of a "client library".
>
> This is a very good point -- indeed, even if we use a text
> representation of dates, that's really still binary on disk, though
> with a well-known encoding (ascii/unicode, or ??/)
>
> So there is, by definition, no human-readable encoding available!
>
>> But if the library can't do machine-to-human then it probably
>> can't do human-to-machine. In which case there's very little you can
>> actually _do_ with the date/time values (e.g. determine ordering or compute
>> intervals).
>
> Bingo!
>
> Indeed, for "real" use cases, human readability really is worthless --
> most data sets are far too big to do anything useful with the data by
> hand anyway.
>
> John Caron wrote:
>
>> An insidious mistake is to think that problems should be fixed in software
>> libraries.
>
> Fair enough, but I'm not sure we have a "problem" here at all. And
> indeed, it way be just as insidious to think that problems can be
> solved with some nifty new addition to an existing data standard.
>
> 1) I agree with Steve that we aren't in a position at this point to
> decide what the best encoding of datetime is -- rather, we are
> deciding if adding another encoding is a good idea. Unless someone is
> suggesting deprecating the old one.
>
> 2) I'm also not at all sure that string representations are a better
> way to go -- netcdf is primarily for consumption by computer programs
> -- and the existing encoding is a pretty natural fit for that.
>
> 3) I don't see how datetime strings "solve" the calendar problem --
> sure, it's clear what calendar data the provider intended, but if you
> want to know how much time passed between two dates, you're back to
> the same problem (I just noticed that you pointed that out) -- I
> actually think time deltas are often more important that absolute
> times anyway.
>
>> Finish your beer and ill order another round.
>
> I'll get the round after that -- it take a few!
>
> John Graybeal wrote:
>
>> (Note that among those users are people who look at binary dumps of files,
>> of which I am one but I'm sure there are many others.)
>
> binary dumps? in hex? and ISO strings are somehow readable there? huh?
>
> (ncdump, which I use often is not a binary dump, it's ascii dump (does
> it support any other text encoding?), and already handles the
> conversion to iso strings (in recent versions, anyway).
>
> By the way, no objection to the standard name -- none at all.
>
> -Chris
>
>

-- 
Michael Decker
Forschungszentrum Jülich
Institut für Energie- und Klimaforschung - Troposphäre (IEK-8)
E-Mail: m.dec...@fz-juelich.de
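
P.S. To make points 1 and 4 above a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how ncdump or any existing CF tool is implemented: the function names decode_cf_time and find_inconsistent_labels are made up for this example, only a few units are handled, and the standard calendar and UTC are assumed throughout. The first function does the kind of decoding a library should do for us; the second shows what a checker would have to do as soon as redundant ISO labels are stored next to the time variable.

```python
from datetime import datetime, timedelta, timezone

# Seconds per supported unit; only a few CF time units are handled in this sketch.
_SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_cf_time(values, units):
    """Decode offsets such as 'hours since 2013-03-20 00:00:00' to UTC datetimes.

    Assumption: standard calendar, reference time given in UTC without an offset.
    """
    unit, since, reference = units.split(maxsplit=2)
    if since != "since" or unit not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported units string: {units!r}")
    epoch = datetime.fromisoformat(reference).replace(tzinfo=timezone.utc)
    return [epoch + timedelta(seconds=v * _SECONDS_PER_UNIT[unit]) for v in values]


def find_inconsistent_labels(values, units, iso_labels):
    """Return the indices where a redundant ISO label disagrees with the decoded time."""
    mismatches = []
    decoded = decode_cf_time(values, units)
    for i, (t, label) in enumerate(zip(decoded, iso_labels)):
        parsed = datetime.fromisoformat(label)
        if parsed.tzinfo is None:
            # Assumption: labels without a zone are meant as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        if parsed != t:
            mismatches.append(i)
    return mismatches


# Made-up example data: the third label carries a +01:00 offset by mistake.
times = [0, 6, 12]
units = "hours since 2013-03-20 00:00:00"
labels = ["2013-03-20T00:00:00", "2013-03-20T06:00:00", "2013-03-20T12:00:00+01:00"]

print([t.isoformat() for t in decode_cf_time(times, units)])
print(find_inconsistent_labels(times, units, labels))
```

Even in this tiny example the checker has to guess what a label without a time zone means, which is exactly the kind of ambiguity I mean in point 5.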
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata