All,

I am probably missing something in this discussion. Since Pedro asked me to chime in and answer his question, I’ll try. [I am referring to Pedro’s initial question: “Regarding the handling of both HDF5 and netCDF, it seems there is a potential issue, which is, how to tell if any HDF5 file was saved by the HDF5 API or by the netCDF API?”]

A netCDF-4 file is an HDF5 file. netCDF-4 is not a file format but a convention for storing data described by the netCDF-4 data model in HDF5.

I don’t think there is a way to determine which API wrote the file. One can write a pure C program that calls neither the HDF5 nor the netCDF-4 library but writes a file according to the HDF5 file format and the netCDF-4 convention, making it a netCDF-4 file.

One should probably have a checker function that traverses an HDF5 file and tells whether the file is compliant with the netCDF-4 convention.

Adding attributes, etc., really will not help. I can add an attribute to a “non-netCDF-4” HDF5 file and fool the netCDF-4 library. I can also write a netCDF-4 file using just the HDF5 library by following the convention of the netCDF-4 library.

I think the tool should follow the Common Data Model and shield the user from data formats.

What am I missing?

Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Apr 24, 2016, at 6:08 PM, Pedro Vicente <[email protected]> wrote:

> All
>
> I posted some code on github that solves the issue for older netCDF files,
> see below
>
> In reply to previous comments
>
> @ John Caron
>
>>> Here are the blogs:
>>>
>>> http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions
>
> I had seen some of your blogs but not the one above.
> By looking at the netCDF code I came up with the code below, which detects
> one of the "hidden" attributes described in that blog, and another one that
> is not described
>
>
> @ David Brown
>
>>> But this is not ideal, because we only
>> want to open files that are explicitly written using NetCDF4 as
>> NetCDF
>
> Hi David, yes, that's the issue.
>
> I think this piece of code I posted on github is the best possible solution
> for this:
>
> https://github.com/pedro-vicente/netcdf-detect
>
> @ Ed Hartnett
>
> I wrote that code by reading the comments you wrote in the files nc4file.c
> and nc4hdf.c
>
> here
>
> https://github.com/Unidata/netcdf-c/tree/master/libsrc4
>
> do you agree with the solution?
>
>
> anyone feel free to use that code
>
> the C function is called is_netcdf()
>
> the netCDF API writes, if variables and dimensions are present in the file:
>
> 1) an attribute named "_Netcdf4Dimid" (in some cases)
> 2) an attribute named "NAME", (always), saved by the HDF5 Dimension Scales
> API, that contains the string "This is a netCDF dimension but not a netCDF
> variable."
>
> This utility tries to detect both attributes by traversing the HDF5 file; if
> either one is found, it returns a value of 1
>
> the program includes 3 test cases: 2 cases that generate a file with 1) and
> 2) above (they are mutually exclusive, it seems), and a third one that
> simply does
>
> nc_create
> nc_close
>
> in this case, the above attributes are not written, so the test will fail,
> like someone posted here before.
> I would say if someone writes this kind of file, it is irrelevant whether
> HDF5 or netCDF was used, the files are virtually identical
>
> * another * case that would give a false positive is the case where someone
> tries to be a spoiler and uses the HDF5 API
> to write these 2 attributes
> "_Netcdf4Dimid"
> "This is a netCDF dimension but not a netCDF variable."
>
> The only real "spoiler proof" 100% solution is the SOLUTION 1 I posted before:
> to have HDF5 save a byte in the file that explicitly tells what kind of
> derived API it is.
> This would be done by a private HDF5 function called by the derived API, say
> called on
> nc_create()
> So, it does not deal with attributes written by public APIs at all
>
> @ Elena Pourmal
>
> Hi Elena, how are you?
>
> Any chance of discussing this solution?
> > by the way some of my email on this thread sent to the hdf-forum last Friday > is waiting for approval > > "Your mail to 'Hdf-forum' with the subject... > Is being held until the list moderator can review it for approval. > " > > The hdf-forum now requires approval by a moderator? > that does not work very well on weekends for example > > > ---------------------- > Pedro Vicente > [email protected] > https://twitter.com/_pedro__vicente > http://www.space-research.org/ > > > > ----- Original Message ----- From: "David Brown" <[email protected]> > To: <[email protected]> > Sent: Saturday, April 23, 2016 3:06 PM > Subject: Re: [netcdfgroup] netcdfgroup Digest, Vol 1126, Issue 2 > > >> Since Pedro asked earlier about how NCL distinguishes between NetCDF4 >> and HDF5, I'm going to add my 2 cents to what now appears to be the >> longest thread ever on this mailing list. >> >> First a bit of background. Traditionally NCL has distinguished among >> file formats based solely on file extensions. If a file name ends with >> ".nc" then it is considered to be a NetCDF file and will be opened >> using the NetCDF library calls. Additionally there is an idiosyncratic >> feature where you can add an "virtual" extension to a file name to >> specify the format you want to use. For example, if the file is name >> "test", you can open it as "test.h5" to open it using HDF5 calls. >> Given this name NCL will look first for a file called "test.h5" and if >> that is not found then it will look for "test". You can even add >> extensions to files that already have them to open a file using >> another format: e.g. test.hdf.nc. >> >> But recent versions of NCL attempt to figure out the format of files >> that do not have recognized extensions. And that means we have >> definitely run into the issue that Pedro originally brought up. We >> want our HDF5 module to handle HDF5 files on their own terms, >> including, e.g., recognizing reference types. 
For now, we first try to >> see if the file can be opened using the NetCDF library, and if not, we >> try various versions of HDF. But this is not ideal, because we only >> want to open files that are explicitly written using NetCDF4 as >> NetCDF. So it is indeed welcome news that there will be global >> attributes added to explicitly identify the file as NetCDF4. However, >> it also would be nice if nc_inq_format or nc_inq_format_extended could >> be adjusted to give a definitive answer as to whether the file was >> created as NetCDF4. I have to admit I was quite surprised to discover >> that nc_inq_format_extended would not answer this seemingly obvious >> (to me at least) question. >> -Dave Brown >> NCL technical architect >> >> >> On Sat, Apr 23, 2016 at 10:21 AM, <[email protected]> >> wrote: >>> Send netcdfgroup mailing list submissions to >>> [email protected] >>> >>> To subscribe or unsubscribe via the World Wide Web, visit >>> http://mailman.unidata.ucar.edu/mailman/listinfo/netcdfgroup >>> or, via email, send a message with subject or body 'help' to >>> [email protected] >>> >>> You can reach the person managing the list at >>> [email protected] >>> >>> When replying, please edit your Subject line so it is more specific >>> than "Re: Contents of netcdfgroup digest..." >>> >>> >>> Today's Topics: >>> >>> 1. 
Re: [CF-metadata] [Hdf-forum] Detecting netCDF versus HDF5 -- >>> PROPOSED SOLUTIONS --REQUEST FOR COMMENTS (John Caron) >>> >>> >>> ---------------------------------------------------------------------- >>> >>> Message: 1 >>> Date: Fri, 22 Apr 2016 21:57:51 -0600 >>> From: John Caron <[email protected]> >>> To: Pedro Vicente <[email protected]> >>> Cc: [email protected], NetCDF-Java community >>> <[email protected]>, [email protected] >>> Subject: Re: [netcdfgroup] [CF-metadata] [Hdf-forum] Detecting netCDF >>> versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS >>> Message-ID: >>> >>> <can1vdkp3iyvabcevoc8irp83avkt85mq+h75pwu_l-dexjw...@mail.gmail.com> >>> Content-Type: text/plain; charset="utf-8" >>> >>> Here are the blogs: >>> >>> http://www.unidata.ucar.edu/blogs/developer/en/entry/dimensions_scales >>> http://www.unidata.ucar.edu/blogs/developer/en/entry/dimension_scale2 >>> http://www.unidata.ucar.edu/blogs/developer/en/entry/dimension_scales_part_3 >>> http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions >>> http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_use_of_dimension_scales >>> >>> On Fri, Apr 22, 2016 at 7:57 AM, Pedro Vicente < >>> [email protected]> wrote: >>> >>>> John >>>> >>>> >>>i have written various blogs on the unidata site about why netcdf4 != >>>> hdf5, and what the unique signature for shared dimensions looks like, in >>>> >>>case you want details. 
>>>> >>>> yes, I am interested, I had the impression by looking at the code some >>>> years ago that netCDF writes some unique name attributes somewhere >>>> >>>> ---------------------- >>>> Pedro Vicente >>>> [email protected] >>>> https://twitter.com/_pedro__vicente >>>> http://www.space-research.org/ >>>> >>>> >>>> >>>> >>>> ----- Original Message ----- >>>> *From:* John Caron <[email protected]> >>>> *To:* Pedro Vicente <[email protected]> >>>> *Cc:* [email protected] ; Discussion forum for the NeXus data >>>> format <[email protected]> ; [email protected] ; Dennis >>>> Heimbigner <[email protected]> ; NetCDF-Java community >>>> <[email protected]> >>>> *Sent:* Thursday, April 21, 2016 11:11 PM >>>> *Subject:* Re: [CF-metadata] [netcdfgroup] [Hdf-forum] Detecting netCDF >>>> versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS >>>> >>>> 1) I completely agree with the idea of adding system metadata that >>>> indicates the library version(s) that wrote the file. >>>> >>>> 2) the way shared dimensions are implemented by netcdf4 is a unique >>>> signature that would likely identify (100 - epsilon) % of real data >>>> files >>>> in the wild. One could add such detection to the netcdf4 and/or hdf5 >>>> libraries, and/or write a utility program to detect. >>>> >>>> there are 2 variants: >>>> >>>> 2.1) one could write a netcdf4 file without shared dimensions, though im >>>> pretty sure no one does. but you could argue then that its fine to just >>>> treat it as an hdf5 file and read through hdf5 library >>>> >>>> 2.2) one could write a netcdf4 file with hdf5 library, if you knew what >>>> you are doing. i have heard of this happening. but then you could argue >>>> that its really a netcdf4 file and you should use netcdf library to read >>>> . >>>> >>>> i have written various blogs on the unidata site about why netcdf4 != >>>> hdf5, and what the unique signature for shared dimensions looks like, in >>>> case you want details. 
>>>> >>>> On Thu, Apr 21, 2016 at 4:18 PM, Pedro Vicente < >>>> [email protected]> wrote: >>>> >>>>> If you have hdf5 files that should be readable, then I will undertake >>>>> to >>>>>> look at them and see what the problem is. >>>>>> >>>>> >>>>> >>>>> ok, thank you >>>>> >>>>> WRT to old files: We could produce a utility that would redef the file >>>>>> and insert the >>>>>> _NCProperties attribute. This would allow someone to wholesale >>>>>> mark old files. >>>>>> >>>>> >>>>> >>>>> Excellent idea , Dennis >>>>> >>>>> ---------------------- >>>>> Pedro Vicente >>>>> [email protected] >>>>> https://twitter.com/_pedro__vicente >>>>> http://www.space-research.org/ >>>>> >>>>> >>>>> ----- Original Message ----- From: <[email protected]> >>>>> To: "Pedro Vicente" <[email protected]>; < >>>>> [email protected]>; "Discussion forum for the NeXus data format" >>>>> < >>>>> [email protected]>; <[email protected]> >>>>> Sent: Thursday, April 21, 2016 5:02 PM >>>>> Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 -- >>>>> PROPOSED SOLUTIONS --REQUEST FOR COMMENTS >>>>> >>>>> >>>>> If you have hdf5 files that should be readable, then I will undertake >>>>> to >>>>>> look at them and see what the problem is. >>>>>> WRT to old files: We could produce a utility that would redef the >>>>>> file >>>>>> and insert the >>>>>> _NCProperties attribute. This would allow someone to wholesale >>>>>> mark old files. >>>>>> =Dennis Heimbigner >>>>>> Unidata >>>>>> >>>>>> >>>>>> On 4/21/2016 2:17 PM, Pedro Vicente wrote: >>>>>> >>>>>>> Dennis >>>>>>> >>>>>>> I am in the process of adding a global attribute in the root group >>>>>>>>>>> >>>>>>>>>> that captures both the netcdf library version and the hdf5 library >>>>>>>> version >>>>>>>> whenever a netcdf file is created. The current form is >>>>>>>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..." >>>>>>>> >>>>>>> >>>>>>> >>>>>>> ok, good to know, thank you >>>>>>> >>>>>>> >>>>>>> > 1. 
I am open to suggestions about changing the format or adding >>>>>>>>>> info > to it. >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> I personally don't care, anything that uniquely identifies a netCDF >>>>>>> file (HDF5 based) as such will work >>>>>>> >>>>>>> >>>>>>> 2. Of course this attribute will not exist in files written using >>>>>>> older >>>>>>>>>> >>>>>>>>> versions of the netcdf library, but at least the process will have >>>>>>>> begun. >>>>>>>> >>>>>>> >>>>>>> yes >>>>>>> >>>>>>> >>>>>>> 3. This technically does not address the original issue because there >>>>>>>> exist >>>>>>>> hdf5 files not written by netcdf that are still compatible >>>>>>>> with >>>>>>>> and can be >>>>>>>> read by netcdf. Not sure this case is important or not. >>>>>>>> >>>>>>> >>>>>>> there will always be HDF5 files not written by netcdf that netCDF >>>>>>> will >>>>>>> read as we are now. >>>>>>> >>>>>>> this is not really the issue, but you just made a further issue :-) >>>>>>> >>>>>>> the issue is that I would like an application that reads a netCDF >>>>>>> (HDF5 >>>>>>> based) file to decide to use the netCDF or HDF5 API. >>>>>>> your attribute writing will do , for future files. >>>>>>> for older nertCDF files there may be a way to detect the current >>>>>>> attributes and data structures to see if we can make it "identify >>>>>>> itself" >>>>>>> as netCDF. A bit of debugging will confirm that, since Dimension >>>>>>> Scales >>>>>>> are used, that would be an (imperfect maybe) way to do it >>>>>>> >>>>>>> regarding the "further issue " above >>>>>>> >>>>>>> you could go one step further and for any HDF5 files not written by >>>>>>> netcdf , you could make netCDF reject the file reading, >>>>>>> because it's not "netCDF compliant". >>>>>>> Since having netCDF read pure HDF5 files is not a problem (at least >>>>>>> for >>>>>>> me), I don't know if you would want to do this, just an idea. 
>>>>>>> In my mind taking complexity and ambiguities of problems is always a >>>>>>> good thing >>>>>>> >>>>>>> >>>>>>> ah, I forgot one thing, related to this >>>>>>> >>>>>>> >>>>>>> In the past I have found several pure HDF5 files that netCDF failed >>>>>>> in >>>>>>> reading. >>>>>>> Since netCDF is HDF5 binary compatible, one would expect that all >>>>>>> HDF5 >>>>>>> files will be read by netCDF. >>>>>>> Except if you specifically wrote something in the code that makes it >>>>>>> to >>>>>>> fail if some condition is not met, >>>>>>> This was a while ago, I'll try to find those cases and I'll send a >>>>>>> bug >>>>>>> report to the bug report email >>>>>>> >>>>>>> ---------------------- >>>>>>> Pedro Vicente >>>>>>> [email protected] >>>>>>> https://twitter.com/_pedro__vicente >>>>>>> http://www.space-research.org/ >>>>>>> >>>>>>> ----- Original Message ----- From: <[email protected]> >>>>>>> To: "Pedro Vicente" <[email protected]>; "HDF Users >>>>>>> Discussion List" <[email protected]>; < >>>>>>> [email protected]>; "Discussion forum for the NeXus data >>>>>>> format" <[email protected]>; <[email protected]> >>>>>>> Cc: "John Shalf" <[email protected]>; <[email protected]>; >>>>>>> "Marinelli, Daniel J. (GSFC-5810)" <[email protected]>; >>>>>>> "Miller, Mark C." <[email protected]> >>>>>>> Sent: Thursday, April 21, 2016 2:30 PM >>>>>>> Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus >>>>>>> HDF5 -- >>>>>>> PROPOSED SOLUTIONS --REQUEST FOR COMMENTS >>>>>>> >>>>>>> >>>>>>> I am in the process of adding a global attribute in the root group >>>>>>>> that captures both the netcdf library version and the hdf5 library >>>>>>>> version >>>>>>>> whenever a netcdf file is created. The current form is >>>>>>>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..." >>>>>>>> Where version is the version of the _NCProperties attribute and the >>>>>>>> others >>>>>>>> are e.g. 1.8.18 or 4.4.1-rc1. >>>>>>>> Issues: >>>>>>>> 1. 
I am open to suggestions about changing the format or adding info >>>>>>>> to it. >>>>>>>> 2. Of course this attribute will not exist in files written using >>>>>>>> older versions >>>>>>>> of the netcdf library, but at least the process will have begun. >>>>>>>> 3. This technically does not address the original issue because >>>>>>>> there >>>>>>>> exist >>>>>>>> hdf5 files not written by netcdf that are still compatible >>>>>>>> with >>>>>>>> and can be >>>>>>>> read by netcdf. Not sure this case is important or not. >>>>>>>> =Dennis Heimbigner >>>>>>>> Unidata >>>>>>>> >>>>>>>> >>>>>>>> On 4/21/2016 9:33 AM, Pedro Vicente wrote: >>>>>>>> >>>>>>>>> DETECTING HDF5 VERSUS NETCDF GENERATED FILES >>>>>>>>> REQUEST FOR COMMENTS >>>>>>>>> AUTHOR: Pedro Vicente >>>>>>>>> >>>>>>>>> AUDIENCE: >>>>>>>>> 1) HDF, netcdf developers, >>>>>>>>> Ed Hartnett >>>>>>>>> Kent Yang >>>>>>>>> 2) HDF, netcdf users, that replied to this thread >>>>>>>>> Miller, Mark C. >>>>>>>>> John Shalf >>>>>>>>> 3 ) netcdf tools developers >>>>>>>>> Mary Haley , NCL >>>>>>>>> 4) HDF, netcdf managers and sponsors >>>>>>>>> David Pearah , CEO HDF Group >>>>>>>>> Ward Fisher, UCAR >>>>>>>>> Marinelli, Daniel J. , Richard Ullmman, Christopher Lynnes, NASA >>>>>>>>> 5) >>>>>>>>> [CF-metadata] list >>>>>>>>> After this thread started 2 months ago, there was an annoucement on >>>>>>>>> the [CF-metadata] mail list >>>>>>>>> about >>>>>>>>> "a meeting to discuss current and future netCDF-CF efforts and >>>>>>>>> directions. >>>>>>>>> The meeting will be held on 24-26 May 2016 in Boulder, CO, USA at >>>>>>>>> the >>>>>>>>> UCAR Center Green facility." >>>>>>>>> This would be a good topic to put on the agenda, maybe? >>>>>>>>> THE PROBLEM: >>>>>>>>> Currently it is impossible to detect if an HDF5 file was generated >>>>>>>>> by >>>>>>>>> the HDF5 API or by the netCDF API. >>>>>>>>> See previous email about the reasons why. 
>>>>>>>>> WHY THIS MATTERS: >>>>>>>>> Software applications that need to handle both netCDF and HDF5 >>>>>>>>> files >>>>>>>>> cannot decide which API to use. >>>>>>>>> This includes popular visualization tools like IDL, Matlab, NCL, >>>>>>>>> HDF >>>>>>>>> Explorer. >>>>>>>>> SOLUTIONS PROPOSED: 2 >>>>>>>>> SOLUTION 1: Add a flag to HDF5 source >>>>>>>>> The hdf5 format specification, listed here >>>>>>>>> https://www.hdfgroup.org/HDF5/doc/H5.format.html >>>>>>>>> describes a sequence of bytes in the file layout that have special >>>>>>>>> meaning for the HDF5 API. It is common practice, when designing a >>>>>>>>> data >>>>>>>>> format, >>>>>>>>> so leave some fields "reserved for future use". >>>>>>>>> This solution makes use of one of these empty "reserved for future >>>>>>>>> use" spaces to save a byte (for example) that describes an >>>>>>>>> enumerator >>>>>>>>> of "HDF5 compatible formats". >>>>>>>>> An "HDF5 compatible format" is a data format that uses the HDF5 API >>>>>>>>> at a lower level (usually hidden from the user of the upper API), >>>>>>>>> and providing its own API. >>>>>>>>> This category can still be divide in 2 formats: >>>>>>>>> 1) A "pure HDF5 compatible format". Example, NeXus >>>>>>>>> http://www.nexusformat.org/ >>>>>>>>> NeXus just writes some metadata (attributes) on top of the HDF5 >>>>>>>>> API, >>>>>>>>> that has some special meaning for the NeXus community >>>>>>>>> 2) A "non pure HDF5 compatible format". Example, netCDF >>>>>>>>> Here, the format adds some extra feature besides HDF5. In the case >>>>>>>>> of >>>>>>>>> netCDF, these are shared dimensions between variables. >>>>>>>>> This sub-division between 1) and 2) is irrelevant for the problem >>>>>>>>> and >>>>>>>>> solution in question >>>>>>>>> The solution consists of writing a different enumerator value on >>>>>>>>> the >>>>>>>>> "reserved for future use" space. 
For example >>>>>>>>> Value decimal 0 (current value): This file was generated by the >>>>>>>>> HDF5 >>>>>>>>> API (meaning the HDF5 only API) >>>>>>>>> Value decimal 1: This file was generated by the netCDF API (using >>>>>>>>> HDF5) >>>>>>>>> Value decimal 2: This file was generated by <put here another HDF5 >>>>>>>>> based format> >>>>>>>>> and so on >>>>>>>>> The advantage of this solution is that this process involves 2 >>>>>>>>> parties: the HDF Group and the other format's organization. >>>>>>>>> This allows the HDF Group to "keep track" of new HDF5 based formats >>>>>>>>> . >>>>>>>>> It allows to make the other format "HDF5 certified" . >>>>>>>>> SOLUTION 2: Add some metadata to the other API on top of HDF5 >>>>>>>>> This is what Nexus uses. >>>>>>>>> A Nexus file on creation writes several attributes on the root >>>>>>>>> group, >>>>>>>>> like "NeXus_version" and other numeric data. >>>>>>>>> This is done using the public HDF5 API calls. >>>>>>>>> The solution for netCDF consists of the same approach, just write >>>>>>>>> some specific attributes, and a special netCDF API to write/read >>>>>>>>> them. >>>>>>>>> This solutions just requires the work of one party (the netCDF >>>>>>>>> group) >>>>>>>>> END OF RFC >>>>>>>>> In reply to people that commented in the thread >>>>>>>>> @John Shalf >>>>>>>>> >>Perhaps NetCDF (and other higher-level APIs that are built on top >>>>>>>>> >>of >>>>>>>>> HDF5) should include an attribute attached >>>>>>>>> >>to the root group that identifies the name and version of the API >>>>>>>>> that created the file? (adopt this as a convention) >>>>>>>>> yes, that's one way to do it, Solution 2 above >>>>>>>>> @Mark Miller >>>>>>>>> >>>Hmmm. Is there any big reason NOT to try to read a netCDF >>>>>>>>> >>>produced >>>>>>>>> HDF5 file with the native HDF5 library if someone so chooses? >>>>>>>>> It's possible to read a netCDF file using HDF5, yes. 
>>>>>>>>> There are 2 things that you will miss doing this: >>>>>>>>> 1) the ability to inquire about shared netCDF dimensions. >>>>>>>>> 2) the ability to read remotely with openDAP. >>>>>>>>> Reading with HDF5 also exposes metadata that is supposed to be >>>>>>>>> private to netCDF. See below >>>>>>>>> >>>> And, attempting to read an HDF5 file produced by Silo using >>>>>>>>> >>>> just >>>>>>>>> the HDF5 library (e.g. w/o Silo) is a major pain. >>>>>>>>> This I don't understand. Why not read the Silo file with the Silo >>>>>>>>> API? >>>>>>>>> That's the all purpose of this issue, each higher level API on top >>>>>>>>> of >>>>>>>>> HDF5 should be able to detect "itself". >>>>>>>>> I am not familiar with Silo, but if Silo cannot do this, then you >>>>>>>>> have the same design flaw that netCDF has. >>>>>>>>> >>>>>>>>> >>> In a cursory look over the libsrc4 sources in netCDF distro, I >>>>>>>>> >>> see >>>>>>>>> a few things that might give a hint a file was created with netCDF. >>>>>>>>> . >>>>>>>>> . >>>>>>>>> >>>> First, in NC_CLASSIC_MODEL, an attribute gets attached to the >>>>>>>>> root group named "_nc3_strict". So, the existence of an attribute >>>>>>>>> on >>>>>>>>> the root group by that name would suggest the HDF5 file was >>>>>>>>> generated by >>>>>>>>> netCDF. >>>>>>>>> I think this is done only by the "old" netCDF3 format. >>>>>>>>> >>>>> Also, I tested a simple case of nc_open, nc_def_dim, etc. >>>>>>>>> nc_close to see what it produced. >>>>>>>>> >>>> It appears to produce datasets for each 'dimension' defined >>>>>>>>> >>>> with >>>>>>>>> two attributes named "CLASS" and "NAME". >>>>>>>>> This is because netCDF uses the HDF5 Dimension Scales API >>>>>>>>> internally >>>>>>>>> to keep track of shared dimensions. These are internal attributes >>>>>>>>> of Dimension Scales. This approach would not work because an HDF5 >>>>>>>>> only file with Dimension Scales would have the same attributes. >>>>>>>>> >>>>>>>>> >>>> I like John's suggestion here. 
>>>>>>>>> >>>>>But, any code you add to any applications now will work *only* >>>>>>>>> for files that were produced post-adoption of this convention. >>>>>>>>> yes. there are 2 actions to take here. >>>>>>>>> 1) fix the issue for the future >>>>>>>>> 2) try to retroactively have some workaround that makes possible >>>>>>>>> now >>>>>>>>> to differentiate a HDF5/netCDF files made before the adopted >>>>>>>>> convention >>>>>>>>> see below >>>>>>>>> >>>>>>>>> >>>> In VisIt, we support >140 format readers. Over 20 of those are >>>>>>>>> different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, >>>>>>>>> Samrai, >>>>>>>>> netCDF, Flash, Enzo, Chombo, etc., etc.) >>>>>>>>> >>>>When opening a file, how does VisIt figure out which plugin to >>>>>>>>> use? In particular, how do we avoid one poorly written reader >>>>>>>>> plugin >>>>>>>>> (which may be the wrong one for a given file) from preventing the >>>>>>>>> correct >>>>>>>>> one from being found. Its kinda a hard problem. >>>>>>>>> >>>>>>>>> Yes, that's the problem we are trying to solve. I have to say, that >>>>>>>>> is quick a list of HDF5 based formats there. >>>>>>>>> >>>> Some of our discussion is captured here. . . >>>>>>>>> http://www.visitusers.org/index.php?title=Database_Format_Detection >>>>>>>>> I"ll check it out, thank you for the suggestions >>>>>>>>> @Ed Hartnett >>>>>>>>> >>>I must admit that when putting netCDF-4 together I never >>>>>>>>> >>>considered >>>>>>>>> that someone might want to tell the difference between a "native" >>>>>>>>> HDF5 file and a netCDF-4/HDF5 file. >>>>>>>>> >>>>>Well, you can't think of everything. >>>>>>>>> This is a major design flaw. >>>>>>>>> If you are in the business of designing data file formats, one of >>>>>>>>> the >>>>>>>>> things you have to do is how to make it possible to identify it >>>>>>>>> from the >>>>>>>>> other formats. >>>>>>>>> >>>>>>>>> >>> I agree that it is not possible to canonically tell the >>>>>>>>> difference. 
The netCDF-4 API does use some special attributes to >>>>>>>>> track named dimensions, >>>>>>>>> >>>>and to tell whether classic mode should be enforced. But it can >>>>>>>>> easily produce files without any named dimensions, etc. >>>>>>>>> >>>So I don't think there is any easy way to tell. >>>>>>>>> I remember you wrote that code together with Kent Yang from the HDF >>>>>>>>> Group. >>>>>>>>> At the time I was with the HDF Group but unfortunately I did follow >>>>>>>>> closely what you were doing. >>>>>>>>> I don't remember any design document being circulated that explains >>>>>>>>> the internals of the "how to" make the netCDF (classic) model of >>>>>>>>> shared >>>>>>>>> dimensions >>>>>>>>> use the hierarchical group model of HDF5. >>>>>>>>> I know this was done using the HDF5 Dimension Scales (that I >>>>>>>>> wrote), >>>>>>>>> but is there any design document that explains it? >>>>>>>>> Maybe just some internal email exchange between you and Kent Yang? >>>>>>>>> Kent, how are you? >>>>>>>>> Do you remember having any design document that explains this? >>>>>>>>> Maybe something like a unique private attribute that is written >>>>>>>>> somewhere in the netCDF file? >>>>>>>>> >>>>>>>>> @Mary Haley, NCL >>>>>>>>> NCL is a widely used tool that handles both netCDF and HDF5 >>>>>>>>> Mary, how are you? >>>>>>>>> How does NCL deal with the case of reading both pure HDF5 files and >>>>>>>>> netCDF files that use HDF5? >>>>>>>>> Would you be interested in joining a community based effort to deal >>>>>>>>> with this, in case this is an issue for you? >>>>>>>>> >>>>>>>>> @David Pearah , CEO HDF Group >>>>>>>>> I volunteer to participate in the effort of this RFC together with >>>>>>>>> the HDF Group (and netCDF Group). >>>>>>>>> Maybe we could make a "task force" between HDF Group, netCDF Group >>>>>>>>> and any volunteer (such as tools developers that happen to be in >>>>>>>>> these mail >>>>>>>>> lists)? 
>>>>>>>>> The "task force" would have 2 tasks: >>>>>>>>> 1) make a HDF5 based convention for the future and >>>>>>>>> 2) try to retroactively salvage the current design issue of netCDF >>>>>>>>> My phone is 217-898-9356, you are welcome to call in anytime. >>>>>>>>> ---------------------- >>>>>>>>> Pedro Vicente >>>>>>>>> [email protected] <mailto: >>>>>>>>> [email protected]> >>>>>>>>> https://twitter.com/_pedro__vicente >>>>>>>>> http://www.space-research.org/ >>>>>>>>> >>>>>>>>> ----- Original Message ----- >>>>>>>>> *From:* Miller, Mark C. <mailto:[email protected]> >>>>>>>>> *To:* HDF Users Discussion List <mailto: >>>>>>>>> [email protected]> >>>>>>>>> *Cc:* [email protected] >>>>>>>>> <mailto:[email protected]> ; Ward Fisher >>>>>>>>> <mailto:[email protected]> >>>>>>>>> *Sent:* Wednesday, March 02, 2016 7:07 PM >>>>>>>>> *Subject:* Re: [Hdf-forum] Detecting netCDF versus HDF5 >>>>>>>>> >>>>>>>>> I like John's suggestion here. >>>>>>>>> >>>>>>>>> But, any code you add to any applications now will work *only* >>>>>>>>> for >>>>>>>>> files that were produced post-adoption of this convention. >>>>>>>>> >>>>>>>>> There are probably a bazillion files out there at this point >>>>>>>>> that >>>>>>>>> don't follow that convention and you probably still want your >>>>>>>>> applications to be able to read them. >>>>>>>>> >>>>>>>>> In VisIt, we support >140 format readers. Over 20 of those are >>>>>>>>> different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, >>>>>>>>> Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.) When opening a >>>>>>>>> file, how does VisIt figure out which plugin to use? In >>>>>>>>> particular, how do we avoid one poorly written reader plugin >>>>>>>>> (which may be the wrong one for a given file) from preventing >>>>>>>>> the >>>>>>>>> correct one from being found. Its kinda a hard problem. >>>>>>>>> >>>>>>>>> Some of our discussion is captured here. . . 
>>>>>>>>> >>>>>>>>> http://www.visitusers.org/index.php?title=Database_Format_Detection >>>>>>>>> >>>>>>>>> Mark >>>>>>>>> >>>>>>>>> >>>>>>>>> From: Hdf-forum <[email protected] >>>>>>>>> <mailto:[email protected]>> on behalf of >>>>>>>>> John >>>>>>>>> Shalf <[email protected] <mailto:[email protected]>> >>>>>>>>> Reply-To: HDF Users Discussion List >>>>>>>>> <[email protected] >>>>>>>>> <mailto:[email protected]>> >>>>>>>>> Date: Wednesday, March 2, 2016 1:02 PM >>>>>>>>> To: HDF Users Discussion List <[email protected] >>>>>>>>> <mailto:[email protected]>> >>>>>>>>> Cc: "[email protected] >>>>>>>>> <mailto:[email protected]>" >>>>>>>>> <[email protected] >>>>>>>>> <mailto:[email protected]>>, Ward Fisher >>>>>>>>> <[email protected] <mailto:[email protected]>> >>>>>>>>> Subject: Re: [Hdf-forum] Detecting netCDF versus HDF5 >>>>>>>>> >>>>>>>>> Perhaps NetCDF (and other higher-level APIs that are built >>>>>>>>> on >>>>>>>>> top of HDF5) should include an attribute attached to the >>>>>>>>> root >>>>>>>>> group that identifies the name and version of the API that >>>>>>>>> created the file? (adopt this as a convention) >>>>>>>>> >>>>>>>>> -john >>>>>>>>> >>>>>>>>> On Mar 2, 2016, at 12:55 PM, Pedro Vicente >>>>>>>>> <[email protected] >>>>>>>>> <mailto:[email protected]>> wrote: >>>>>>>>> Hi Ward >>>>>>>>> As you know, Data Explorer is going to be a general >>>>>>>>> purpose data reader for many formats, including HDF5 >>>>>>>>> and >>>>>>>>> netCDF. >>>>>>>>> Here >>>>>>>>> http://www.space-research.org/ >>>>>>>>> Regarding the handling of both HDF5 and netCDF, it >>>>>>>>> seems >>>>>>>>> there is a potential issue, which is, how to tell if >>>>>>>>> any >>>>>>>>> HDF5 file was saved by the HDF5 API or by the netCDF >>>>>>>>> API? >>>>>>>>> It seems to me that this is not possible. Is this >>>>>>>>> correct? 
>>>>>>>>> netCDF uses an internal function NC_check_file_type to
>>>>>>>>> examine the first few bytes of a file, and for example for
>>>>>>>>> any HDF5 file the test is
>>>>>>>>>
>>>>>>>>>     /* Look at the magic number */
>>>>>>>>>     /* Ignore the first byte for HDF */
>>>>>>>>>     if(magic[1] == 'H' && magic[2] == 'D' && magic[3] == 'F') {
>>>>>>>>>         *filetype = FT_HDF;
>>>>>>>>>         *version = 5;
>>>>>>>>>
>>>>>>>>> The problem is that this test works for any HDF5 file and
>>>>>>>>> for any netCDF file, which makes it impossible to tell
>>>>>>>>> which is which.
>>>>>>>>> That in turn makes it impossible for any general purpose data
>>>>>>>>> reader to decide whether to use the netCDF API or the HDF5 API.
>>>>>>>>> I have a possible solution for this, but before going any
>>>>>>>>> further, I would just like to
>>>>>>>>> 1) Confirm that this is indeed not possible
>>>>>>>>> 2) See if you have a solid workaround for this,
>>>>>>>>> excluding the dumb ones, for example deciding on an
>>>>>>>>> extension .nc or .h5, or traversing the HDF5 file to see
>>>>>>>>> if it's a non-netCDF-conforming one. Yes, to further
>>>>>>>>> complicate things, it is possible that the above test says
>>>>>>>>> OK for an HDF5 file, but then the read by the netCDF API
>>>>>>>>> fails because the file is an HDF5 file that is not netCDF conformant.
>>>>>>>>> Thanks
>>>>>>>>> ----------------------
>>>>>>>>> Pedro Vicente
>>>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>>>> http://www.space-research.org/
>>>>>>>>> _______________________________________________
>>>>>>>>> Hdf-forum is for HDF software users discussion.
>>>>>>>>> [email protected]
>>>>>>>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>>>>>>>> Twitter: https://twitter.com/hdf5
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> netcdfgroup mailing list
>>>>>>>>> [email protected]
>>>>>>>>> For list information or to unsubscribe, visit:
>>>>>>>>> http://www.unidata.ucar.edu/mailing_lists/
>>>>> _______________________________________________
>>>>> CF-metadata mailing list
>>>>> [email protected]
>>>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>> End of netcdfgroup Digest, Vol 1126, Issue 2
>>> ********************************************
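
For reference, the magic-number test discussed in Pedro's original message can be sketched as a small standalone C routine. This is only an illustration, not netCDF's NC_check_file_type: the names file_kind, classify_magic and classify_file are hypothetical, and the sketch assumes the HDF5 signature sits at byte offset 0 (a file with a user block can carry it at a later offset, which is not handled here). It shows the point made in the thread: the first bytes separate classic netCDF from HDF5, but a netCDF-4 file and a plain HDF5 file land in the same bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not part of netCDF or HDF5. */
typedef enum {
    KIND_UNKNOWN = 0,
    KIND_NETCDF_CLASSIC,  /* 'C' 'D' 'F' followed by a version byte */
    KIND_HDF5_CONTAINER   /* HDF5 signature: plain HDF5 *or* netCDF-4 */
} file_kind;

/* Classify a buffer holding the first `len` bytes of a file. */
file_kind classify_magic(const unsigned char *magic, size_t len)
{
    /* 8-byte HDF5 signature: \211 H D F \r \n \032 \n */
    static const unsigned char hdf5_sig[8] =
        { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

    assert(magic != NULL);

    if (len >= 8 && memcmp(magic, hdf5_sig, sizeof hdf5_sig) == 0)
        return KIND_HDF5_CONTAINER;

    /* Classic netCDF variants use version bytes 1, 2 and 5. */
    if (len >= 4 && magic[0] == 'C' && magic[1] == 'D' && magic[2] == 'F' &&
        (magic[3] == 1 || magic[3] == 2 || magic[3] == 5))
        return KIND_NETCDF_CLASSIC;

    return KIND_UNKNOWN;
}

/* Read the first 8 bytes of `path` and classify them.
   Returns KIND_UNKNOWN if the file cannot be opened. */
file_kind classify_file(const char *path)
{
    unsigned char magic[8];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return KIND_UNKNOWN;
    n = fread(magic, 1, sizeof magic, fp);
    fclose(fp);
    return classify_magic(magic, n);
}
```

A reader that gets KIND_HDF5_CONTAINER back still has to decide between the two APIs by some other means, for example by traversing the file and checking it against the netCDF-4 convention, as suggested elsewhere in the thread.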
