Riley Kuttruff created SDAP-472:
-----------------------------------

             Summary: General Zarr support for gridded datasets
                 Key: SDAP-472
                 URL: https://issues.apache.org/jira/browse/SDAP-472
             Project: Apache Science Data Analytics Platform
          Issue Type: New Feature
          Components: analysis, collection-ingester
            Reporter: Riley Kuttruff
            Assignee: Riley Kuttruff

The end goal is for SDAP to be able to onboard existing Zarr datasets with minimal to no interaction with the data (i.e., no scanning the data to generate metadata). Gridded formats allow for this; we only need to record some (additional) dataset-level metadata. Swath data will require a different and much more labor-intensive approach, so we should focus on gridded data for now, as it will likely be more commonly used by our users.

It should be possible to specify collections in the collection config YAML. For now, we should support Zarr stores in an S3 bucket and on the local filesystem; however, we should leave the door open for other storage options (explicitly set in the collection config or determined by URL), essentially Zarr plugins we can add in the future:
{code:java}
collections:
  - id: zarr_example_ds_s3  # Zarr array in S3; need to give creds
    store-type: zarr
    path: s3://sdap-zarr-bucket/zarr_example_ds
    priority: 5
    forward-processing-priority: 5
    projection: Grid
    dimensionNames:
      latitude: lat
      longitude: lon
      time: time
      variable: analysed_sst
    slices:
      lat: 100
      lon: 100
      time: 1
    aws:
      accessKeyID: <id>
      secretAccessKey: <id>
      public: false
  - id: zarr_example_ds_local  # Zarr array in local fs
    store-type: zarr
    path: file:///data/zarr_example_ds_local
    priority: 5
    forward-processing-priority: 5
    projection: Grid
    dimensionNames:
      latitude: lat
      longitude: lon
      time: time
      variable: analysed_sst
    slices:
      lat: 100
      lon: 100
      time: 1
  - id: AVHRR_OI_L4_GHRSST_NCEI  # Standard ingest to tiles in Cassandra
    store-type: nexusproto
    path: /data/granules/*.nc
    priority: 10
    forward-processing-priority: 10
    projection: Grid
    dimensionNames:
      latitude: lat
      longitude: lon
      time: time
      variable: analysed_sst
    slices:
      lat: 100
      lon: 100
      time: 1
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
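
To illustrate the "explicitly set in the collection config or determined by URL" idea from the description, here is a minimal sketch of how a storage backend might be chosen for a Zarr collection entry. It is not SDAP code. The function name `resolve_zarr_backend`, the registry `ZARR_BACKENDS`, and the `storage` override key are all hypothetical; only the `store-type`, `path`, and `id` keys and the `s3://` and `file://` URLs come from the config above.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URL schemes to backend names.
# Only the "s3" and "file" schemes appear in the ticket; a future
# "plugin" would simply add another entry here.
ZARR_BACKENDS = {
    "s3": "s3",
    "file": "local",
    "": "local",  # bare paths such as /data/zarr_example_ds_local
}


def resolve_zarr_backend(collection: dict) -> str:
    """Return the storage backend name for a collection with store-type 'zarr'."""
    if collection.get("store-type") != "zarr":
        raise ValueError(f"{collection.get('id')}: not a zarr collection")

    # An explicit setting wins. 'storage' is an assumed key name used
    # only for this sketch; it is not part of the proposed config.
    explicit = collection.get("storage")
    if explicit is not None:
        return explicit

    # Otherwise fall back to the URL scheme of the path.
    scheme = urlparse(collection["path"]).scheme.lower()
    if scheme not in ZARR_BACKENDS:
        raise ValueError(
            f"{collection.get('id')}: no Zarr backend for scheme '{scheme}'"
        )
    return ZARR_BACKENDS[scheme]
```

With the two Zarr entries above, this would select `s3` for `zarr_example_ds_s3` and `local` for `zarr_example_ds_local`. An unknown scheme fails early instead of silently falling back to a default.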