Thanks Joe, will check this out.

On Wed, Jul 24, 2024 at 12:30 PM Joe Lee <hyok...@hdfgroup.org> wrote:
> Hi, Michael!
>
> It's an interesting idea since Kerchunk can't handle HDF4 yet [1].
> OPeNDAP DMR++ can now handle HDF4,
> so I think Kerchunk can do so, too.
>
> For GDAL, is there a C++ binding for Kerchunk?
> I think that will be the main blocker for GDAL driver development.
>
> [1] https://github.com/hyoklee/kerchunk/wiki
>
> ---
> Reality is hierarchical. Store scientific reality in HDF for Spatial
> Computing.
>
> ________________________________________
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> on behalf of Michael
> Sumner via gdal-dev <gdal-dev@lists.osgeo.org>
> Sent: Tuesday, July 23, 2024 17:37
> To: gdal-dev
> Subject: [gdal-dev] kerchunk
>
> Hi, is there any effort or thought into something like Python's kerchunk
> in GDAL? (my summary of kerchunk is below)
>
> https://github.com/fsspec/kerchunk
>
> I'll be exploring the Python outputs in detail and looking for hooks into
> where we might bring some of this tighter into GDAL. This would work
> nicely inside the GTI driver, for example. But a *kerchunk-driver*? That
> would be in the family of raw/ drivers. My skillset won't have much to
> offer, but I'm going to explore with some simpler examples. It could even
> bring old HDF4 files into the fold, I think.
>
> It's a bit weird from a GDAL perspective to map the chunks in a format for
> which we have a driver, but there are definitely performance advantages and
> convenience for virtualizing huge disparate collections (even the simplest
> time-series-of-files in netcdf is nicely abstracted here for xarray, a
> super-charged VRT for xarray).
>
> Interested in any thoughts, feedback, pointers to related efforts ...
> thanks!
>
> (my take on) A description of kerchunk:
>
> kerchunk replaces the actual binary blobs on file in a Zarr with JSON
> references to a file/uri/object and the byte start and end values. In this
> way kerchunk brings formats like hdf/netcdf/grib into the fold of "cloud
> readiness" by having a complete separation of metadata from the actual
> storage. The information about those chunks (compression, type, orientation,
> etc.) is also stored in JSON.
>
> (A Zarr is a multidimensional version of a single-zoom-level image
> tiling: imagine every image tile as a potentially n-dimensional child block
> of a larger array. The blobs are stored like one zoom of a z/y/x tile
> server, in a [[[v/]w/]y/]x way, with a position for each dimension of the
> array (1, 2, 3, 4, or n), where z is not special, and with more general
> encoding possibilities than tif/png/jpeg provide.) This scheme is extremely
> general, literally a virtualized array-like abstraction on any storage,
> and with kerchunk you can transcend many legacy issues with actual formats.
>
> Cheers, Mike
>
> --
> Michael Sumner
> Research Software Engineer
> Australian Antarctic Division
> Hobart, Australia
> e-mail: mdsum...@gmail.com

--
Michael Sumner
Research Software Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com
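
P.S. To make the description of kerchunk quoted above concrete, here is a minimal sketch in plain Python (no kerchunk or zarr import) of what a reference set might look like and how an array index turns into a byte range in a remote file. It assumes the version-1 JSON layout as I understand it: metadata values are JSON strings, and each chunk entry is [url, offset, length], so the byte end is offset + length. The array name "sst", the bucket, the file names and all offsets are made up for illustration, and locate() is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical reference set; bucket, file names, offsets and sizes are made up.
CHUNKS = [1, 2, 3]            # chunk shape: 1 time step x 2 rows x 3 columns
CHUNK_BYTES = 1 * 2 * 3 * 4   # one uncompressed float32 chunk = 24 bytes

refs = {
    "version": 1,
    "refs": {
        # Zarr metadata is kept as JSON strings, separate from the data.
        ".zgroup": json.dumps({"zarr_format": 2}),
        "sst/.zarray": json.dumps({
            "shape": [2, 4, 6], "chunks": CHUNKS, "dtype": "<f4",
            "compressor": None, "filters": None, "fill_value": None,
            "order": "C", "zarr_format": 2,
        }),
    },
}

# One file per time step; each chunk key points into that file instead of
# holding the binary blob itself.
for t in range(2):
    for y in range(2):
        for x in range(2):
            offset = 4096 + CHUNK_BYTES * (y * 2 + x)
            refs["refs"][f"sst/{t}.{y}.{x}"] = [
                f"s3://example-bucket/day{t + 1}.nc", offset, CHUNK_BYTES,
            ]


def chunk_key(array_name, index, chunks, sep="."):
    """Key of the chunk holding element `index` (one position per dimension)."""
    parts = [str(i // c) for i, c in zip(index, chunks)]
    return f"{array_name}/{sep.join(parts)}"


def locate(refs, array_name, index):
    """Return (key, url, byte_start, byte_end) for the chunk holding `index`."""
    meta = json.loads(refs["refs"][f"{array_name}/.zarray"])
    key = chunk_key(array_name, index, meta["chunks"])
    url, offset, length = refs["refs"][key]
    return key, url, offset, offset + length


if __name__ == "__main__":
    print(locate(refs, "sst", (1, 3, 4)))
    # The same chunk addressed tile-server style, with "/" as separator.
    print(chunk_key("sst", (1, 3, 4), CHUNKS, sep="/"))
```

A reader (or a hypothetical GDAL driver) would then issue a range request for those bytes and decode them using the dtype and compressor recorded in .zarray. The chunk_key() function is the n-dimensional version of the z/y/x tile path: one integer per dimension, none of them special.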
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev