Hi Riley,

Thank you for working on the extensive changes to add support for Zarr. The ability to register existing Zarr data for analysis within SDAP is a much-desired feature. Since the changes do not impact the existing nexusproto implementation, I think it's safe to merge and release this change.

Thanks,
Nga

On Tue, Oct 17, 2023 at 4:12 PM Riley Kuttruff <[email protected]> wrote:
>
> Hi everyone,
>
> I currently have PRs open for Nexus (#265) and Ingester (#86) to overhaul the
> data-access component of SDAP to support analysis-ready datasets in other
> formats without the need to ingest NetCDF files into our own format (data
> duplication). This support would be useful for users who wish to use SDAP
> with their existing data (DAACs come to mind here) without needing to have
> (or pay for) added storage for SDAP tiles.
>
> My initial implementation focuses on Zarr as that is what I have more
> experience in. Cloud-optimized GeoTIFF support is in work and nearing
> readiness (though I'd rather these changes be accepted/merged instead of
> making the PRs even larger). I also plan to investigate Parquet. The PRs
> support Zarr data stored locally or on AWS S3.
>
> The changes boil down to separating the various backends (nexusproto & Zarr)
> into their own modules that implement many of the existing nexustiles.py
> methods. nexustiles.py acts as an interface, routing the existing method
> calls to the appropriate backend's method. It also maintains a mapping
> between current datasets and their associated backend, which it builds from
> Solr's nexusdatasets collection. This collection stores all the needed info
> for Zarr datasets. Datasets can be added by listing them in the collections
> config, or dynamically through a set of dataset management endpoints (add,
> update and delete), which are useful for on the fly onboarding, updating
> (such as rotating AWS keys) and removal of Zarr datasets.
>
> I've done a decent degree of testing with these changes, both locally and by
> deploying them to JPL SDAP instances (such as
> https://ideas-digitaltwin.jpl.nasa.gov/nexus/ or see the OCO-3 section of
> https://github.com/EarthDigitalTwin/FireAlarm-notebooks/blob/dc5c64d9e0311e45f9a3f93908ea6394f5304130/AirQuality_Demo.ipynb)
> and there's been no indication of breaking existing endpoints using the old
> nexusproto implementation and the Zarr backend seems reliable.
>
> There's more documentation for these changes on the Nexus PR, I'm available
> to address any questions & concerns regarding these changes as well. If need
> be, I can also write a more thorough gist documenting these changes.
>
> Let me know what you think!
>
> Thanks,
> Riley
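
For readers following the thread, the routing design Riley describes (one interface that keeps a dataset-to-backend mapping and forwards calls to the matching backend) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the PRs: the class names (`TileRouter`, `NexusprotoBackend`, `ZarrBackend`), the `find_tiles` method, and the record fields (`name`, `store_type`, `path`) are all hypothetical, and the list of plain dicts stands in for whatever Solr's nexusdatasets collection actually returns.

```python
from typing import Dict, Iterable, List, Mapping, Protocol


class Backend(Protocol):
    """Hypothetical minimal interface every backend is assumed to implement."""

    def find_tiles(self, dataset: str) -> List[str]: ...


class NexusprotoBackend:
    """Stand-in for the existing tile store; returns placeholder tile IDs."""

    def find_tiles(self, dataset: str) -> List[str]:
        return [f"nexusproto-tile:{dataset}"]


class ZarrBackend:
    """Stand-in for a Zarr dataset stored at a local path or an S3 URL."""

    def __init__(self, path: str) -> None:
        self.path = path

    def find_tiles(self, dataset: str) -> List[str]:
        return [f"zarr-chunk:{self.path}"]


class TileRouter:
    """Keeps a dataset -> backend mapping and forwards calls to that backend."""

    def __init__(self, records: Iterable[Mapping[str, str]] = ()) -> None:
        self._backends: Dict[str, Backend] = {}
        # 'records' is an assumed stand-in for the dataset listing in Solr.
        for record in records:
            self.add_dataset(record)

    def add_dataset(self, record: Mapping[str, str]) -> None:
        """Register a dataset, or replace its backend (e.g. after a key rotation)."""
        if record.get("store_type") == "zarr":
            backend: Backend = ZarrBackend(record["path"])
        else:
            backend = NexusprotoBackend()
        self._backends[record["name"]] = backend

    def delete_dataset(self, name: str) -> None:
        """Forget a dataset; unknown names are ignored."""
        self._backends.pop(name, None)

    def find_tiles(self, dataset: str) -> List[str]:
        """Route the call to whichever backend owns the dataset."""
        try:
            backend = self._backends[dataset]
        except KeyError:
            raise ValueError(f"Unknown dataset: {dataset}") from None
        return backend.find_tiles(dataset)
```

In this sketch, callers only ever talk to `TileRouter`, so an existing nexusproto dataset and a newly registered Zarr dataset are queried the same way, and re-registering a name simply swaps the backend behind it.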
