Script 'mail_helper' called by obssrc Hello community, here is the log from the commit of package python-h5netcdf for openSUSE:Factory checked in at 2023-01-07 17:19:57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/python-h5netcdf (Old) and /work/SRC/openSUSE:Factory/.python-h5netcdf.new.1563 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-h5netcdf" Sat Jan 7 17:19:57 2023 rev:7 rq:1056763 version:1.1.0 Changes: -------- --- /work/SRC/openSUSE:Factory/python-h5netcdf/python-h5netcdf.changes 2022-08-09 15:43:50.436272948 +0200 +++ /work/SRC/openSUSE:Factory/.python-h5netcdf.new.1563/python-h5netcdf.changes 2023-01-07 17:23:19.047445675 +0100 @@ -1,0 +2,12 @@ +Sat Jan 7 12:08:07 UTC 2023 - Dirk Müller <dmuel...@suse.com> + +- update to 1.1.0: + * Rework adding _FillValue-attribute, add tests. + * Add special add_phony method for creating phony dimensions, add test. + * Rewrite _unlabeled_dimension_mix (labeled/unlabeled), add tests. + * Add default netcdf fillvalues, pad only if necessary, adapt tests. + * Fix regression in padding algorithm, add test. + * Set ``track_order=True`` by default in created files if h5py 3.7.0 or + greater is detected to help compatibility with netCDF4-c programs. + +------------------------------------------------------------------- Old: ---- h5netcdf-1.0.2.tar.gz New: ---- h5netcdf-1.1.0.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-h5netcdf.spec ++++++ --- /var/tmp/diff_new_pack.G7VZgM/_old 2023-01-07 17:23:19.647449254 +0100 +++ /var/tmp/diff_new_pack.G7VZgM/_new 2023-01-07 17:23:19.651449278 +0100 @@ -1,7 +1,7 @@ # # spec file for package python-h5netcdf # -# Copyright (c) 2022 SUSE LLC +# Copyright (c) 2023 SUSE LLC # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed @@ -20,7 +20,7 @@ %define skip_python2 1 %define skip_python36 1 Name: python-h5netcdf -Version: 1.0.2 +Version: 1.1.0 Release: 0 Summary: A Python library to use netCDF4 files via h5py License: BSD-3-Clause ++++++ h5netcdf-1.0.2.tar.gz -> h5netcdf-1.1.0.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/CHANGELOG.rst 
new/h5netcdf-1.1.0/CHANGELOG.rst --- old/h5netcdf-1.0.2/CHANGELOG.rst 2022-08-02 11:33:39.000000000 +0200 +++ new/h5netcdf-1.1.0/CHANGELOG.rst 2022-11-23 07:40:05.000000000 +0100 @@ -1,5 +1,21 @@ Change Log ---------- +Version 1.1.0 (November 23rd, 2022): + +- Rework adding _FillValue-attribute, add tests. + By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_. +- Add special add_phony method for creating phony dimensions, add test. + By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_. +- Rewrite _unlabeled_dimension_mix (labeled/unlabeled), add tests. + By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_. +- Add default netcdf fillvalues, pad only if necessary, adapt tests. + By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_. +- Fix regression in padding algorithm, add test. + By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_. +- Set ``track_order=True`` by default in created files if h5py 3.7.0 or + greater is detected to help compatibility with netCDF4-c programs. + By `Mark Harfouche <https://github.com/hmaarrfk>`_. + Version 1.0.2 (August 2nd, 2022): - Adapt boolean indexing as h5py 3.7.0 started supporting it. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/PKG-INFO new/h5netcdf-1.1.0/PKG-INFO --- old/h5netcdf-1.0.2/PKG-INFO 2022-08-02 11:34:00.711885000 +0200 +++ new/h5netcdf-1.1.0/PKG-INFO 2022-11-23 07:40:28.608608200 +0100 @@ -1,6 +1,6 @@ Metadata-Version: 2.1 Name: h5netcdf -Version: 1.0.2 +Version: 1.1.0 Summary: netCDF4 via h5py Home-page: https://h5netcdf.org Author: h5netcdf developers @@ -265,15 +265,30 @@ Track Order ~~~~~~~~~~~ -In h5netcdf version 0.12.0 and earlier, `order tracking`_ was disabled in -HDF5 file. As this is a requirement for the current netCDF4 standard, -it has been enabled without deprecation as of version 0.13.0 `[*]`_. 
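The "h5py 3.7.0 or greater" detection described in the track-order text can be sketched as a standalone helper (the name `default_track_order` is illustrative; the actual check lives in h5netcdf's `File` constructor, which compares versions the same way via `packaging.version`):

```python
from packaging import version


def default_track_order(h5py_version: str) -> bool:
    """Sketch: enable HDF5 creation-order tracking only when the
    detected h5py is 3.7.0 or newer, the release that fixed the
    attribute-count bug referenced in the changelog."""
    return version.parse(h5py_version) >= version.parse("3.7.0")
```

Using a real version parser (rather than string comparison) matters here: `"3.10.0"` compares greater than `"3.7.0"` as a version but smaller as a string.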
+As of h5netcdf 1.1.0, if h5py 3.7.0 or greater is detected, the ``track_order`` +parameter is set to ``True``, enabling `order tracking`_ for newly created +netCDF4 files. This helps ensure that files created with the h5netcdf library +can be modified by the netCDF4-c and netCDF4-python implementations used in +other software stacks. Since this change should be transparent to most users, +it was made without deprecation. + +Since track_order is set at creation time, any dataset that was created with +``track_order=False`` (h5netcdf version 1.0.2 and older, except for 0.13.0) will +continue to be opened with order tracking disabled. + +The following describes the behavior of h5netcdf with respect to order tracking +for a few key versions: + +- In version 0.12.0 and earlier, the ``track_order`` parameter was missing + and thus order tracking was implicitly set to ``False``. +- Version 0.13.0 enabled order tracking by setting the parameter + ``track_order`` to ``True`` by default without deprecation. +- Versions 0.13.1 to 1.0.2 set ``track_order`` to ``False`` due to a bug in + h5py, a core dependency of h5netcdf (`upstream bug`_), which was resolved in + h5py 3.7.0 with the help of the h5netcdf team. +- In version 1.1.0, if h5py 3.7.0 or above is detected, the ``track_order`` + parameter is set to ``True`` by default. -However in version 0.13.1 this has been reverted due to a bug in a core -dependency of h5netcdf, h5py `upstream bug`_. - -Datasets created with h5netcdf version 0.12.0 that are opened with -newer versions of h5netcdf will continue to disable order tracker. .. _order tracking: https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html#creation_order ..
_upstream bug: https://github.com/h5netcdf/h5netcdf/issues/136 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/README.rst new/h5netcdf-1.1.0/README.rst --- old/h5netcdf-1.0.2/README.rst 2022-08-02 11:33:39.000000000 +0200 +++ new/h5netcdf-1.1.0/README.rst 2022-11-23 07:40:05.000000000 +0100 @@ -241,15 +241,30 @@ Track Order ~~~~~~~~~~~ -In h5netcdf version 0.12.0 and earlier, `order tracking`_ was disabled in -HDF5 file. As this is a requirement for the current netCDF4 standard, -it has been enabled without deprecation as of version 0.13.0 `[*]`_. +As of h5netcdf 1.1.0, if h5py 3.7.0 or greater is detected, the ``track_order`` +parameter is set to ``True``, enabling `order tracking`_ for newly created +netCDF4 files. This helps ensure that files created with the h5netcdf library +can be modified by the netCDF4-c and netCDF4-python implementations used in +other software stacks. Since this change should be transparent to most users, +it was made without deprecation. -However in version 0.13.1 this has been reverted due to a bug in a core -dependency of h5netcdf, h5py `upstream bug`_. +Since track_order is set at creation time, any dataset that was created with +``track_order=False`` (h5netcdf version 1.0.2 and older, except for 0.13.0) will +continue to be opened with order tracking disabled. + +The following describes the behavior of h5netcdf with respect to order tracking +for a few key versions: + +- In version 0.12.0 and earlier, the ``track_order`` parameter was missing + and thus order tracking was implicitly set to ``False``. +- Version 0.13.0 enabled order tracking by setting the parameter + ``track_order`` to ``True`` by default without deprecation. +- Versions 0.13.1 to 1.0.2 set ``track_order`` to ``False`` due to a bug in + h5py, a core dependency of h5netcdf (`upstream bug`_), which was resolved in + h5py 3.7.0 with the help of the h5netcdf team.
+- In version 1.1.0, if h5py 3.7.0 or above is detected, the ``track_order`` + parameter is set to ``True`` by default. -Datasets created with h5netcdf version 0.12.0 that are opened with -newer versions of h5netcdf will continue to disable order tracker. .. _order tracking: https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html#creation_order .. _upstream bug: https://github.com/h5netcdf/h5netcdf/issues/136 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/h5netcdf/_version.py new/h5netcdf-1.1.0/h5netcdf/_version.py --- old/h5netcdf-1.0.2/h5netcdf/_version.py 2022-08-02 11:34:00.000000000 +0200 +++ new/h5netcdf-1.1.0/h5netcdf/_version.py 2022-11-23 07:40:28.000000000 +0100 @@ -1,5 +1,5 @@ # coding: utf-8 # file generated by setuptools_scm # don't change, don't track in version control -__version__ = version = '1.0.2' -__version_tuple__ = version_tuple = (1, 0, 2) +__version__ = version = '1.1.0' +__version_tuple__ = version_tuple = (1, 1, 0) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/h5netcdf/core.py new/h5netcdf-1.1.0/h5netcdf/core.py --- old/h5netcdf-1.0.2/h5netcdf/core.py 2022-08-02 11:33:39.000000000 +0200 +++ new/h5netcdf-1.1.0/h5netcdf/core.py 2022-11-23 07:40:05.000000000 +0100 @@ -51,12 +51,17 @@ def _transform_1d_boolean_indexers(key): """Find and transform 1D boolean indexers to int""" - key = [ - np.asanyarray(k).nonzero()[0] - if isinstance(k, (np.ndarray, list)) and type(k[0]) in (bool, np.bool_) - else k - for k in key - ] + # return key, if not iterable + try: + key = [ + np.asanyarray(k).nonzero()[0] + if isinstance(k, (np.ndarray, list)) and type(k[0]) in (bool, np.bool_) + else k + for k in key + ] + except TypeError: + return key + return tuple(key) @@ -145,10 +150,14 @@ # normal variable carrying DIMENSION_LIST # extract hdf5 file references and get objects name if "DIMENSION_LIST" in attrs: - return 
tuple( - self._root._h5file[ref[0]].name.split("/")[-1] - for ref in list(self._h5ds.attrs.get("DIMENSION_LIST", [])) - ) + # check if malformed variable and raise + if _unlabeled_dimension_mix(self._h5ds) == "labeled": + # If a dimension has attached more than one scale for some reason, then + # take the last one. This is in line with netcdf-c and netcdf4-python. + return tuple( + self._root._h5file[ref[-1]].name.split("/")[-1] + for ref in list(self._h5ds.attrs.get("DIMENSION_LIST", [])) + ) # need to use the h5ds name here to distinguish from collision dimensions child_name = self._h5ds.name.split("/")[-1] @@ -271,6 +280,39 @@ """Return NumPy dtype object giving the variable's type.""" return self._h5ds.dtype + def _get_padding(self, key): + """Return padding if needed, defaults to False.""" + padding = False + if self.dtype != str and self.dtype.kind in ["f", "i", "u"]: + key0 = _expanded_indexer(key, self.ndim) + key0 = _transform_1d_boolean_indexers(key0) + # extract max shape of key vs hdf5-shape + h5ds_shape = self._h5ds.shape + shape = self.shape + + # check for ndarray and list + # see https://github.com/pydata/xarray/issues/7154 + # first get maximum index + max_index = [ + max(k) + 1 if isinstance(k, (np.ndarray, list)) else k.stop + for k in key0 + ] + # second convert to max shape + max_shape = tuple( + [ + shape[i] if k is None else max(h5ds_shape[i], k) + for i, k in enumerate(max_index) + ] + ) + + # check if hdf5 dataset dimensions are smaller than + # their respective netcdf dimensions + sdiff = [d0 - d1 for d0, d1 in zip(max_shape, h5ds_shape)] + # create padding only if hdf5 dataset is smaller than netcdf dimension + if sum(sdiff): + padding = [(0, s) for s in sdiff] + return padding + def __array__(self, *args, **kwargs): return self._h5ds.__array__(*args, **kwargs) @@ -279,7 +321,6 @@ if isinstance(self._parent._root, Dataset): # this is only for legacyapi - key = _expanded_indexer(key, self.ndim) # fix boolean indexing for affected versions
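The `_get_padding` logic above boils down to comparing the on-disk HDF5 shape against the variable's full netCDF shape and padding the difference with the fill value. A minimal standalone sketch under that assumption (`compute_padding` is an illustrative name, not the library's API):

```python
import numpy as np


def compute_padding(h5_shape, nc_shape):
    """Per-axis pad widths when the on-disk dataset is smaller than its
    netCDF dimensions (e.g. an unlimited dimension only partially
    written); returns False when no padding is needed, mirroring the
    _get_padding convention."""
    sdiff = [d0 - d1 for d0, d1 in zip(nc_shape, h5_shape)]
    if not sum(sdiff):
        return False
    return [(0, s) for s in sdiff]


# a (2, 3) on-disk array whose netCDF shape is (3, 3): pad the last row
data = np.arange(6.0).reshape(2, 3)
padded = np.pad(
    data,
    pad_width=compute_padding(data.shape, (3, 3)),
    mode="constant",
    constant_values=-1.0,  # stand-in for the variable's fill value
)
```

The real method additionally expands the indexing key first, so that fancy indexers past the written extent (the xarray issue referenced above) also trigger padding.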
# https://github.com/h5py/h5py/pull/2079 # https://github.com/h5netcdf/h5netcdf/pull/125/ @@ -292,18 +333,17 @@ if string_info and string_info.length is None: return self._h5ds.asstr()[key] - # return array padded with fillvalue (both api) - if self.dtype != str and self.dtype.kind in ["f", "i", "u"]: - sdiff = [d0 - d1 for d0, d1 in zip(self.shape, self._h5ds.shape)] - if sum(sdiff): - fv = self.dtype.type(self._h5ds.fillvalue) - padding = [(0, s) for s in sdiff] - return np.pad( - self._h5ds, - pad_width=padding, - mode="constant", - constant_values=fv, - )[key] + # get padding + padding = self._get_padding(key) + # apply padding with fillvalue (both api) + if padding: + fv = self.dtype.type(self._h5ds.fillvalue) + return np.pad( + self._h5ds, + pad_width=padding, + mode="constant", + constant_values=fv, + )[key] return self._h5ds[key] @@ -406,15 +446,26 @@ def _unlabeled_dimension_mix(h5py_dataset): - dims = sum([len(j) for j in h5py_dataset.dims]) - if dims: - if dims != h5py_dataset.ndim: + # check if dataset has dims and get it + dimlist = getattr(h5py_dataset, "dims", []) + if not dimlist: + status = "nodim" + else: + dimset = set([len(j) for j in dimlist]) + # either all dimensions have exactly one scale + # or all dimensions have no scale + if dimset ^ {0} == set(): + status = "unlabeled" + elif dimset & {0}: name = h5py_dataset.name.split("/")[-1] raise ValueError( "malformed variable {0} has mixing of labeled and " "unlabeled dimensions.".format(name) ) - return dims + else: + status = "labeled" + + return status class Group(Mapping): @@ -462,9 +513,8 @@ self._dimensions.add(k) else: if self._root._phony_dims_mode is not None: - - # check if malformed variable - if not _unlabeled_dimension_mix(v): + # check if malformed variable and raise + if _unlabeled_dimension_mix(v) == "unlabeled": # if unscaled variable, get phony dimensions phony_dims |= Counter(v.shape) @@ -486,7 +536,7 @@ if self._root._phony_dims_mode == "sort": name += self._root._max_dim_id + 
1 name = "phony_dim_{}".format(name) - self._dimensions[name] = size + self._dimensions.add_phony(name, size) self._initialized = True @@ -675,6 +725,14 @@ if self._root._h5py.__name__ == "h5py": kwargs.update(dict(track_order=self._parent._track_order)) + # handling default fillvalues for legacyapi + # see https://github.com/h5netcdf/h5netcdf/issues/182 + from .legacyapi import Dataset, _get_default_fillvalue + + fillval = fillvalue + if fillvalue is None and isinstance(self._parent._root, Dataset): + fillval = _get_default_fillvalue(dtype) + # create hdf5 variable self._h5group.create_dataset( h5name, @@ -682,7 +740,7 @@ dtype=dtype, data=data, chunks=chunks, - fillvalue=fillvalue, + fillvalue=fillval, **kwargs, ) @@ -712,7 +770,20 @@ variable._ensure_dim_id() if fillvalue is not None: - value = variable.dtype.type(fillvalue) + # trying to create correct type of fillvalue + if variable.dtype is str: + value = fillvalue + else: + string_info = self._root._h5py.check_string_dtype(variable.dtype) + if ( + string_info + and string_info.length is not None + and string_info.length > 1 + ): + value = fillvalue + else: + value = variable.dtype.type(fillvalue) + variable.attrs._h5attrs["_FillValue"] = value return variable @@ -773,6 +844,12 @@ """ # if root-variable if name.startswith("/"): + # handling default fillvalues for legacyapi + # see https://github.com/h5netcdf/h5netcdf/issues/182 + from .legacyapi import Dataset, _get_default_fillvalue + + if fillvalue is None and isinstance(self._parent._root, Dataset): + fillvalue = _get_default_fillvalue(dtype) return self._root.create_variable( name[1:], dimensions, @@ -911,6 +988,16 @@ phony_dims: 'sort', 'access' See :ref:`phony dims` for more details. + track_order: bool + Corresponds to the h5py.File `track_order` parameter. Unless + specified, the library will choose a default that enhances + compatibility with netCDF4-c. If h5py version 3.7.0 or greater is + installed, this parameter will be set to True by default. 
+ track_order is required to be true for netCDF4-c libraries to append to a file. If an older version of h5py is detected, this parameter will be set to False by default to work around a bug in h5py limiting the number of attributes for a given variable. + **kwargs: Additional keyword arguments to be passed to the ``h5py.File`` constructor. @@ -930,22 +1017,14 @@ # standard # https://github.com/Unidata/netcdf-c/issues/2054 # https://github.com/h5netcdf/h5netcdf/issues/128 - # 2022/01/20: hmaarrfk - # However, it was found that this causes issues with attrs and h5py - # https://github.com/h5netcdf/h5netcdf/issues/136 - # https://github.com/h5py/h5py/issues/1385 - track_order = kwargs.pop("track_order", False) - - # When the issues with track_order in h5py are resolved, we - # can consider uncommenting the code below - # if not track_order: - # self._closed = True - # raise ValueError( - # f"track_order, if specified must be set to to True (got {track_order})" - # "to conform to the netCDF4 file format. Please see " - # "https://github.com/h5netcdf/h5netcdf/issues/130 " - # "for more details." - # ) + # h5py versions less than 3.7.0 had a bug that limited the number of + # attributes when track_order was set to true by default. + # However, setting track_order to True helps with compatibility + # with netcdf4-c and generally, keeping track of how things were added + # to the dataset.
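The status strings returned by the rewritten `_unlabeled_dimension_mix` (earlier in this diff) can be illustrated with a standalone sketch that classifies a variable from the number of dimension scales attached to each of its dimensions (the function name and list-based input are hypothetical; the real helper inspects an h5py dataset's `.dims`):

```python
def classify_dimension_scales(scales_per_dim):
    """Sketch of the rewritten _unlabeled_dimension_mix classification:
    'labeled' if every dimension has at least one scale attached,
    'unlabeled' if none do, 'nodim' for scalar variables, and an error
    for a malformed mix of the two."""
    if not scales_per_dim:
        return "nodim"
    counts = set(scales_per_dim)
    if counts == {0}:
        return "unlabeled"
    if 0 in counts:
        raise ValueError(
            "malformed variable has mixing of labeled and unlabeled dimensions."
        )
    return "labeled"
```

Only the "unlabeled" case feeds the phony-dimension machinery; "labeled" variables read their dimension names from the attached scales (taking the last scale when more than one is attached, per the comment in the diff).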
+ # https://github.com/h5netcdf/h5netcdf/issues/136#issuecomment-1017457067 + track_order_default = version.parse(h5py.__version__) >= version.parse("3.7.0") + track_order = kwargs.pop("track_order", track_order_default) if version.parse(h5py.__version__) >= version.parse("3.0.0"): self.decode_vlen_strings = kwargs.pop("decode_vlen_strings", None) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/h5netcdf/dimensions.py new/h5netcdf-1.1.0/h5netcdf/dimensions.py --- old/h5netcdf-1.0.2/h5netcdf/dimensions.py 2022-08-02 11:33:39.000000000 +0200 +++ new/h5netcdf-1.1.0/h5netcdf/dimensions.py 2022-11-23 07:40:05.000000000 +0100 @@ -21,14 +21,18 @@ def __setitem__(self, name, size): # creating new dimensions - phony = "phony_dim" in name - if not self._group._root._writable and not phony: + if not self._group._root._writable: raise RuntimeError("H5NetCDF: Write to read only") if name in self._objects: raise ValueError("dimension %r already exists" % name) self._objects[name] = Dimension(self._group, name, size, create_h5ds=True) + def add_phony(self, name, size): + self._objects[name] = Dimension( + self._group, name, size, create_h5ds=False, phony=True + ) + def add(self, name): # adding dimensions which are already created in the file self._objects[name] = Dimension(self._group, name) @@ -56,7 +60,7 @@ class Dimension(object): - def __init__(self, parent, name, size=None, create_h5ds=False): + def __init__(self, parent, name, size=None, create_h5ds=False, phony=False): """NetCDF4 Dimension constructor. Parameters @@ -69,9 +73,11 @@ Size of the Netcdf4 Dimension. Defaults to None (unlimited). create_h5ds : bool For internal use only. + phony : bool + For internal use only. 
""" self._parent_ref = weakref.ref(parent) - self._phony = "phony_dim" in name + self._phony = phony self._root_ref = weakref.ref(parent._root) self._h5path = _join_h5paths(parent.name, name) self._name = name diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/h5netcdf/legacyapi.py new/h5netcdf-1.1.0/h5netcdf/legacyapi.py --- old/h5netcdf-1.0.2/h5netcdf/legacyapi.py 2022-08-02 11:33:39.000000000 +0200 +++ new/h5netcdf-1.1.0/h5netcdf/legacyapi.py 2022-11-23 07:40:05.000000000 +0100 @@ -5,6 +5,30 @@ from . import core +#: default netcdf fillvalues +default_fillvals = { + "S1": "\x00", + "i1": -127, + "u1": 255, + "i2": -32767, + "u2": 65535, + "i4": -2147483647, + "u4": 4294967295, + "i8": -9223372036854775806, + "u8": 18446744073709551614, + "f4": 9.969209968386869e36, + "f8": 9.969209968386869e36, +} + + +def _get_default_fillvalue(dtype): + kind = np.dtype(dtype).kind + fillvalue = None + if kind in ["u", "i", "f"]: + size = np.dtype(dtype).itemsize + fillvalue = default_fillvals[f"{kind}{size}"] + return fillvalue + def _check_return_dtype_endianess(endian="native"): little_endian = sys.byteorder == "little" @@ -204,7 +228,7 @@ fletcher32=fletcher32, chunks=chunksizes, fillvalue=fill_value, - **kwds + **kwds, ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/h5netcdf/tests/test_h5netcdf.py new/h5netcdf-1.1.0/h5netcdf/tests/test_h5netcdf.py --- old/h5netcdf-1.0.2/h5netcdf/tests/test_h5netcdf.py 2022-08-02 11:33:39.000000000 +0200 +++ new/h5netcdf-1.1.0/h5netcdf/tests/test_h5netcdf.py 2022-11-23 07:40:05.000000000 +0100 @@ -677,6 +677,13 @@ pass +def test_fake_phony_dims(tmp_local_or_remote_netcdf): + # tests writing of dimension with phony naming scheme + # see https://github.com/h5netcdf/h5netcdf/issues/178 + with h5netcdf.File(tmp_local_or_remote_netcdf, mode="w") as ds: + ds.dimensions["phony_dim_0"] = 3 + + def check_invalid_netcdf4_mixed(var, 
i): pdim = "phony_dim_{}".format(i) assert var["foo1"].dimensions[0] == "y1" @@ -761,8 +768,14 @@ f["foo1"].dims[0].attach_scale(f["x"]) with raises(ValueError): + with h5netcdf.File(tmp_local_or_remote_netcdf, "r") as ds: + assert ds + print(ds) + + with raises(ValueError): with h5netcdf.File(tmp_local_or_remote_netcdf, "r", phony_dims="sort") as ds: assert ds + print(ds) def test_hierarchical_access_auto_create(tmp_local_or_remote_netcdf): @@ -1142,6 +1155,7 @@ assert f["dummy2"].shape == (3, 2, 2) f.groups["test"]["dummy3"].shape == (3, 3) f.groups["test"]["dummy4"].shape == (0, 0) + assert f["dummy5"].shape == (2, 3) def test_reading_unused_unlimited_dimension(tmp_local_or_remote_netcdf): @@ -1163,10 +1177,12 @@ def test_nc4_non_coord(tmp_local_netcdf): - # Track order True is the new default for versions after 0.12.0 - # 0.12.0 defaults to `track_order=False` - # Ensure that the tests order the variables in their creation order - # not alphabetical order + # Here we generate a few variables and coordinates + # The default should be to track the order of creation + # Thus, on reopening the file, the order in which + # the variables are listed should be maintained + # y -- refers to the coordinate y + # _nc4_non_coord_y -- refers to the data y with h5netcdf.File(tmp_local_netcdf, "w") as f: f.dimensions = {"x": None, "y": 2} f.create_variable("test", dimensions=("x",), dtype=np.int64) @@ -1177,8 +1193,23 @@ assert f.dimensions["x"].size == 0 assert f.dimensions["x"].isunlimited() assert f.dimensions["y"].size == 2 - assert list(f.variables) == ["y", "test"] - assert list(f._h5group.keys()) == ["_nc4_non_coord_y", "test", "x", "y"] + if version.parse(h5py.__version__) >= version.parse("3.7.0"): + assert list(f.variables) == ["test", "y"] + assert list(f._h5group.keys()) == ["x", "y", "test", "_nc4_non_coord_y"] + + with h5netcdf.File(tmp_local_netcdf, "w") as f: + f.dimensions = {"x": None, "y": 2} + f.create_variable("y", dimensions=("x",), dtype=np.int64) + 
f.create_variable("test", dimensions=("x",), dtype=np.int64) + + with h5netcdf.File(tmp_local_netcdf, "r") as f: + assert list(f.dimensions) == ["x", "y"] + assert f.dimensions["x"].size == 0 + assert f.dimensions["x"].isunlimited() + assert f.dimensions["y"].size == 2 + if version.parse(h5py.__version__) >= version.parse("3.7.0"): + assert list(f.variables) == ["y", "test"] + assert list(f._h5group.keys()) == ["x", "y", "_nc4_non_coord_y", "test"] def test_overwrite_existing_file(tmp_local_netcdf): @@ -1472,6 +1503,9 @@ def test_expanded_variables_netcdf4(tmp_local_netcdf, netcdf_write_module): + # partially reimplemented due to performance reason in edge cases + # https://github.com/h5netcdf/h5netcdf/issues/182 + with netcdf_write_module.Dataset(tmp_local_netcdf, "w") as ds: f = ds.createGroup("test") f.createDimension("x", None) @@ -1483,8 +1517,8 @@ dummy4 = f.createVariable("dummy4", float, ("x", "y")) dummy1[:] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] - dummy2[:] = [[1, 2, 3]] - dummy3[:] = [[1, 2, 3], [4, 5, 6]] + dummy2[1, :] = [4, 5, 6] + dummy3[0:2, :] = [[1, 2, 3], [4, 5, 6]] # don't mask, since h5netcdf doesn't do masking if netcdf_write_module == netCDF4: @@ -1503,10 +1537,16 @@ f = ds["test"] np.testing.assert_allclose(f.variables["dummy1"][:], res1) + np.testing.assert_allclose(f.variables["dummy1"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy1"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy1"].shape == (3, 3) np.testing.assert_allclose(f.variables["dummy2"][:], res2) + np.testing.assert_allclose(f.variables["dummy2"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy2"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy2"].shape == (3, 3) np.testing.assert_allclose(f.variables["dummy3"][:], res3) + np.testing.assert_allclose(f.variables["dummy3"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy3"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy3"].shape == (3, 3) 
np.testing.assert_allclose(f.variables["dummy4"][:], res4) assert f.variables["dummy4"].shape == (3, 3) @@ -1514,12 +1554,22 @@ with legacyapi.Dataset(tmp_local_netcdf, "r") as ds: f = ds["test"] np.testing.assert_allclose(f.variables["dummy1"][:], res1) + np.testing.assert_allclose(f.variables["dummy1"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy1"][1:2, :], [[4.0, 5.0, 6.0]]) + np.testing.assert_allclose(f.variables["dummy1"]._h5ds[1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose( + f.variables["dummy1"]._h5ds[1:2, :], [[4.0, 5.0, 6.0]] + ) assert f.variables["dummy1"].shape == (3, 3) assert f.variables["dummy1"]._h5ds.shape == (3, 3) np.testing.assert_allclose(f.variables["dummy2"][:], res2) + np.testing.assert_allclose(f.variables["dummy2"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy2"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy2"].shape == (3, 3) - assert f.variables["dummy2"]._h5ds.shape == (1, 3) + assert f.variables["dummy2"]._h5ds.shape == (2, 3) np.testing.assert_allclose(f.variables["dummy3"][:], res3) + np.testing.assert_allclose(f.variables["dummy3"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy3"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy3"].shape == (3, 3) assert f.variables["dummy3"]._h5ds.shape == (2, 3) np.testing.assert_allclose(f.variables["dummy4"][:], res4) @@ -1529,12 +1579,19 @@ with h5netcdf.File(tmp_local_netcdf, "r") as ds: f = ds["test"] np.testing.assert_allclose(f.variables["dummy1"][:], res1) + np.testing.assert_allclose(f.variables["dummy1"][:, :], res1) + np.testing.assert_allclose(f.variables["dummy1"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy1"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy1"].shape == (3, 3) assert f.variables["dummy1"]._h5ds.shape == (3, 3) np.testing.assert_allclose(f.variables["dummy2"][:], res2) + np.testing.assert_allclose(f.variables["dummy2"][1, :], [4.0, 5.0, 
6.0]) + np.testing.assert_allclose(f.variables["dummy2"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy2"].shape == (3, 3) - assert f.variables["dummy2"]._h5ds.shape == (1, 3) + assert f.variables["dummy2"]._h5ds.shape == (2, 3) np.testing.assert_allclose(f.variables["dummy3"][:], res3) + np.testing.assert_allclose(f.variables["dummy3"][1, :], [4.0, 5.0, 6.0]) + np.testing.assert_allclose(f.variables["dummy3"][1:2, :], [[4.0, 5.0, 6.0]]) assert f.variables["dummy3"].shape == (3, 3) assert f.variables["dummy3"]._h5ds.shape == (2, 3) np.testing.assert_allclose(f.variables["dummy4"][:], res4) @@ -1573,13 +1630,14 @@ np.testing.assert_array_equal(variable[...].data, 10) -# https://github.com/h5netcdf/h5netcdf/issues/136 -@pytest.mark.skip(reason="h5py bug with track_order") -def test_track_order_false(tmp_local_netcdf): - # track_order must be specified as True or not specified at all - # https://github.com/h5netcdf/h5netcdf/issues/130 - with pytest.raises(ValueError): - h5netcdf.File(tmp_local_netcdf, "w", track_order=False) +def test_track_order_specification(tmp_local_netcdf): + # While netcdf4-c has historically only allowed track_order to be True + # There doesn't seem to be a good reason for this + # https://github.com/Unidata/netcdf-c/issues/2054 historically, h5netcdf + # has not specified this parameter (leaving it implicitly as False) + # We want to make sure we allow both here + with h5netcdf.File(tmp_local_netcdf, "w", track_order=False): + pass with h5netcdf.File(tmp_local_netcdf, "w", track_order=True): pass @@ -1607,8 +1665,8 @@ # We don't expect any errors.
This is effectively a void context manager
expected_errors = memoryview(b"")
-    with expected_errors:
-        with h5netcdf.File(tmp_local_netcdf, "w", track_order=track_order) as h5file:
+    with h5netcdf.File(tmp_local_netcdf, "w", track_order=track_order) as h5file:
+        with expected_errors:
             for i in range(100):
                 h5file.attrs[f"key{i}"] = i
                 h5file.attrs[f"key{i}"] = 0
@@ -1710,6 +1768,25 @@
         ds["hello"][bool_slice, :]
 
 
+def test_fancy_indexing(tmp_local_netcdf):
+    # regression test for https://github.com/pydata/xarray/issues/7154
+    with h5netcdf.legacyapi.Dataset(tmp_local_netcdf, "w") as ds:
+        ds.createDimension("x", None)
+        ds.createDimension("y", None)
+        ds.createVariable("hello", int, ("x", "y"), fill_value=0)
+        ds["hello"][:5, :10] = np.arange(5 * 10, dtype="int").reshape((5, 10))
+        ds.createVariable("hello2", int, ("x", "y"))
+        ds["hello2"][:10, :20] = np.arange(10 * 20, dtype="int").reshape((10, 20))
+
+    with legacyapi.Dataset(tmp_local_netcdf, "a") as ds:
+        np.testing.assert_array_equal(ds["hello"][1, [7, 8, 9]], [17, 18, 19])
+        np.testing.assert_array_equal(ds["hello"][1, [9, 10, 11]], [19, 0, 0])
+        np.testing.assert_array_equal(ds["hello"][1, slice(9, 12)], [19, 0, 0])
+        np.testing.assert_array_equal(ds["hello"][[2, 3, 4], 1], [21, 31, 41])
+        np.testing.assert_array_equal(ds["hello"][[4, 5, 6], 1], [41, 0, 0])
+        np.testing.assert_array_equal(ds["hello"][slice(4, 7), 1], [41, 0, 0])
+
+
 def test_h5py_chunking(tmp_local_netcdf):
     with h5netcdf.File(tmp_local_netcdf, "w") as ds:
         ds.dimensions = {"x": 10, "y": 10, "z": 10, "t": None}
@@ -1789,7 +1866,7 @@
 
 def test_create_invalid_netcdf_catch_error(tmp_local_netcdf):
     # see https://github.com/h5netcdf/h5netcdf/issues/138
-    with h5netcdf.File("test.nc", "w") as f:
+    with h5netcdf.File(tmp_local_netcdf, "w") as f:
         try:
             f.create_variable("test", ("x", "y"), data=np.ones((10, 10), dtype="bool"))
         except CompatibilityError:
@@ -1797,8 +1874,8 @@
         assert repr(f.dimensions) == "<h5netcdf.Dimensions: >"
 
 
-def test_dimensions_in_parent_groups():
-    with netCDF4.Dataset("test_netcdf.nc", mode="w") as ds:
+def test_dimensions_in_parent_groups(tmpdir):
+    with netCDF4.Dataset(tmpdir.join("test_netcdf.nc"), mode="w") as ds:
         ds0 = ds
         for i in range(10):
             ds = ds.createGroup(f"group{i:02d}")
@@ -1808,7 +1885,7 @@
         var = ds0["group00"].createVariable("x", float, ("x", "y"))
         var[:] = np.ones((10, 20))
 
-    with legacyapi.Dataset("test_legacy.nc", mode="w") as ds:
+    with legacyapi.Dataset(tmpdir.join("test_legacy.nc"), mode="w") as ds:
         ds0 = ds
         for i in range(10):
             ds = ds.createGroup(f"group{i:02d}")
@@ -1818,8 +1895,8 @@
         var = ds0["group00"].createVariable("x", float, ("x", "y"))
         var[:] = np.ones((10, 20))
 
-    with h5netcdf.File("test_netcdf.nc", mode="r") as ds0:
-        with h5netcdf.File("test_legacy.nc", mode="r") as ds1:
+    with h5netcdf.File(tmpdir.join("test_netcdf.nc"), mode="r") as ds0:
+        with h5netcdf.File(tmpdir.join("test_legacy.nc"), mode="r") as ds1:
             assert repr(ds0.dimensions["x"]) == repr(ds1.dimensions["x"])
             assert repr(ds0.dimensions["y"]) == repr(ds1.dimensions["y"])
             assert repr(ds0["group00"]) == repr(ds1["group00"])
@@ -2025,3 +2102,78 @@
     np.testing.assert_equal(ds.int_array, np.arange(10))
     np.testing.assert_equal(ds.empty_list, np.array([]))
     np.testing.assert_equal(ds.empty_array, np.array([]))
+
+
+@pytest.mark.skipif(
+    version.parse(h5py.__version__) < version.parse("3.7.0"),
+    reason="does not work with h5py < 3.7.0",
+)
+def test_vlen_string_dataset_fillvalue(tmp_local_netcdf, decode_vlen_strings):
+    # check _FillValue for VLEN string datasets
+    # only works for h5py >= 3.7.0
+
+    # first with new API
+    with h5netcdf.File(tmp_local_netcdf, "w") as ds:
+        ds.dimensions = {"string": 10}
+        dt0 = h5py.string_dtype()
+        fill_value0 = "bár"
+        ds.create_variable("x0", ("string",), dtype=dt0, fillvalue=fill_value0)
+        dt1 = h5py.string_dtype("ascii")
+        fill_value1 = "bar"
+        ds.create_variable("x1", ("string",), dtype=dt1, fillvalue=fill_value1)
+
+    # check if new API can read them
+    with h5netcdf.File(tmp_local_netcdf, "r", **decode_vlen_strings) as ds:
+        decode_vlen = decode_vlen_strings["decode_vlen_strings"]
+        fvalue0 = fill_value0 if decode_vlen else fill_value0.encode("utf-8")
+        fvalue1 = fill_value1 if decode_vlen else fill_value1.encode("utf-8")
+        assert ds["x0"][0] == fvalue0
+        assert ds["x0"].attrs["_FillValue"] == fill_value0
+        assert ds["x1"][0] == fvalue1
+        assert ds["x1"].attrs["_FillValue"] == fill_value1
+
+    # check if legacyapi can read them
+    with legacyapi.Dataset(tmp_local_netcdf, "r") as ds:
+        assert ds["x0"][0] == fill_value0
+        assert ds["x0"]._FillValue == fill_value0
+        assert ds["x1"][0] == fill_value1
+        assert ds["x1"]._FillValue == fill_value1
+
+    # check if netCDF4-python can read them
+    with netCDF4.Dataset(tmp_local_netcdf, "r") as ds:
+        assert ds["x0"][0] == fill_value0
+        assert ds["x0"]._FillValue == fill_value0
+        assert ds["x1"][0] == fill_value1
+        assert ds["x1"]._FillValue == fill_value1
+
+    # second with legacyapi
+    with legacyapi.Dataset(tmp_local_netcdf, "w") as ds:
+        ds.createDimension("string", 10)
+        fill_value0 = "bár"
+        ds.createVariable("x0", str, ("string",), fill_value=fill_value0)
+        fill_value1 = "bar"
+        ds.createVariable("x1", str, ("string",), fill_value=fill_value1)
+
+    # check if new API can read them
+    with h5netcdf.File(tmp_local_netcdf, "r", **decode_vlen_strings) as ds:
+        decode_vlen = decode_vlen_strings["decode_vlen_strings"]
+        fvalue0 = fill_value0 if decode_vlen else fill_value0.encode("utf-8")
+        fvalue1 = fill_value1 if decode_vlen else fill_value1.encode("utf-8")
+        assert ds["x0"][0] == fvalue0
+        assert ds["x0"].attrs["_FillValue"] == fill_value0
+        assert ds["x1"][0] == fvalue1
+        assert ds["x1"].attrs["_FillValue"] == fill_value1
+
+    # check if legacyapi can read them
+    with legacyapi.Dataset(tmp_local_netcdf, "r") as ds:
+        assert ds["x0"][0] == fill_value0
+        assert ds["x0"]._FillValue == fill_value0
+        assert ds["x1"][0] == fill_value1
+        assert ds["x1"]._FillValue == fill_value1
+
+    # check if netCDF4-python can read them
+    with netCDF4.Dataset(tmp_local_netcdf, "r") as ds:
+        assert ds["x0"][0] == fill_value0
+        assert ds["x0"]._FillValue == fill_value0
+        assert ds["x1"][0] == fill_value1
+        assert ds["x1"]._FillValue == fill_value1
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/h5netcdf-1.0.2/h5netcdf.egg-info/PKG-INFO new/h5netcdf-1.1.0/h5netcdf.egg-info/PKG-INFO
--- old/h5netcdf-1.0.2/h5netcdf.egg-info/PKG-INFO	2022-08-02 11:34:00.000000000 +0200
+++ new/h5netcdf-1.1.0/h5netcdf.egg-info/PKG-INFO	2022-11-23 07:40:28.000000000 +0100
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: h5netcdf
-Version: 1.0.2
+Version: 1.1.0
 Summary: netCDF4 via h5py
 Home-page: https://h5netcdf.org
 Author: h5netcdf developers
@@ -265,15 +265,30 @@
 Track Order
 ~~~~~~~~~~~
 
-In h5netcdf version 0.12.0 and earlier, `order tracking`_ was disabled in
-HDF5 file. As this is a requirement for the current netCDF4 standard,
-it has been enabled without deprecation as of version 0.13.0 `[*]`_.
+As of h5netcdf 1.1.0, if h5py 3.7.0 or greater is detected, the ``track_order``
+parameter is set to ``True``, enabling `order tracking`_ for newly created
+netCDF4 files. This helps ensure that files created with the h5netcdf library
+can be modified by the netCDF4-c and netCDF4-python implementations used in
+other software stacks. Since this change should be transparent to most users,
+it was made without deprecation.
+
+Since ``track_order`` is set at creation time, any dataset that was created with
+``track_order=False`` (h5netcdf version 1.0.2 and older, except for 0.13.0) will
+continue to be opened with order tracking disabled.
+
+The following describes the behavior of h5netcdf with respect to order tracking
+for a few key versions:
+
+- Version 0.12.0 and earlier: the ``track_order`` parameter was missing
+  and thus order tracking was implicitly set to ``False``.
+- Version 0.13.0 enabled order tracking by setting the parameter
+  ``track_order`` to ``True`` by default without deprecation.
+- Versions 0.13.1 to 1.0.2 set ``track_order`` to ``False`` due to a bug in a
+  core dependency of h5netcdf, h5py (`upstream bug`_), which was resolved in
+  h5py 3.7.0 with the help of the h5netcdf team.
+- In version 1.1.0, if h5py 3.7.0 or above is detected, the ``track_order``
+  parameter is set to ``True`` by default.
-However in version 0.13.1 this has been reverted due to a bug in a core
-dependency of h5netcdf, h5py `upstream bug`_.
-
-Datasets created with h5netcdf version 0.12.0 that are opened with
-newer versions of h5netcdf will continue to disable order tracker.
 
 .. _order tracking: https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html#creation_order
 .. _upstream bug: https://github.com/h5netcdf/h5netcdf/issues/136
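The `test_fancy_indexing` hunk in the diff above exercises a netCDF semantic worth spelling out: reading indices beyond the written extent of an unlimited dimension returns the variable's fill value rather than raising. A minimal numpy-only sketch of that padding behavior (the `read_with_fill` helper is hypothetical, for illustration only; the real logic lives inside h5netcdf's variable indexing):

```python
import numpy as np

def read_with_fill(data: np.ndarray, row: int, cols, fill=0):
    """Return data[row, c] for each c, padding out-of-range columns with fill."""
    return np.array([data[row, c] if c < data.shape[1] else fill for c in cols])

# Mirrors the 5x10 block written by the test: hello[1, 7..9] == 17, 18, 19.
hello = np.arange(5 * 10).reshape(5, 10)
in_range = read_with_fill(hello, 1, [7, 8, 9])    # all columns written
padded = read_with_fill(hello, 1, [9, 10, 11])    # columns 10, 11 -> fill value
```

This matches the expectations asserted in the test, e.g. `[19, 0, 0]` for columns 9 through 11 with a fill value of 0.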
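The version-gated ``track_order`` default described in the README hunk above can be sketched as a small predicate. This helper is hypothetical (h5netcdf's actual check uses ``version.parse`` from the ``packaging`` library, as seen in the test's ``skipif`` marker); the naive tuple parsing here assumes plain ``X.Y.Z`` version strings and would need ``packaging.version`` for pre-release tags:

```python
def default_track_order(h5py_version: str) -> bool:
    # Enable order tracking only when h5py >= 3.7.0 is detected,
    # mirroring the default introduced in h5netcdf 1.1.0.
    parts = tuple(int(p) for p in h5py_version.split(".")[:3])
    return parts >= (3, 7, 0)

default_track_order("3.6.0")   # False: pre-3.7.0 h5py keeps tracking off
default_track_order("3.7.0")   # True: order tracking enabled by default
```

Tuple comparison handles multi-digit components correctly (e.g. ``"3.10.2"`` parses to ``(3, 10, 2)``, which compares greater than ``(3, 7, 0)``), which a plain string comparison would get wrong.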