Riley Kuttruff created SDAP-472:
-----------------------------------
Summary: General Zarr support for gridded datasets
Key: SDAP-472
URL: https://issues.apache.org/jira/browse/SDAP-472
Project: Apache Science Data Analytics Platform
Issue Type: New Feature
Components: analysis, collection-ingester
Reporter: Riley Kuttruff
Assignee: Riley Kuttruff
The end goal is for SDAP to be able to onboard existing Zarr datasets with
minimal to no interaction with the data (i.e., no scanning the data to generate
metadata). Gridded formats allow for this; we only need to record some
additional dataset-level metadata. Swath data would require a different and
much more labor-intensive approach, so we should focus on gridded data for now,
as it will likely be the more common case for our users.
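As a rough illustration of why gridded data needs so little up-front work, the sketch below derives dataset-level bounds from the 1-D coordinate axes alone. The `GridExtent` type and `grid_extent` function are hypothetical names for illustration, not existing SDAP code, and the sketch assumes the axes have already been read as plain sequences with time given as epoch seconds.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GridExtent:
    """Dataset-level bounds derived from coordinate axes only (hypothetical type)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    start_time: int
    end_time: int


def grid_extent(lat: Sequence[float], lon: Sequence[float], time: Sequence[int]) -> GridExtent:
    """Compute spatial and temporal bounds of a gridded dataset.

    Only the small 1-D coordinate axes are inspected; the data variable
    (e.g. analysed_sst) is never read.
    """
    if len(lat) == 0 or len(lon) == 0 or len(time) == 0:
        raise ValueError("coordinate axes must be non-empty")
    return GridExtent(min(lat), max(lat), min(lon), max(lon), min(time), max(time))
```

Swath data has no such shortcut, since its latitude and longitude vary per pixel rather than per axis.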
Collections should be specifiable in the collection config YAML.
For now, we should implement support for Zarr stores in an S3 bucket and on the
local filesystem; however, we should leave the door open for other storage
options (explicitly set in the collection config or determined by URL),
essentially Zarr plugins we can add in the future:
{code:java}
collections:
  - id: zarr_example_ds_s3 # Zarr array in S3; need to give creds
    store-type: zarr
    path: s3://sdap-zarr-bucket/zarr_example_ds
    priority: 5
    forward-processing-priority: 5
    projection: Grid
    dimensionNames:
      latitude: lat
      longitude: lon
      time: time
      variable: analysed_sst
    slices:
      lat: 100
      lon: 100
      time: 1
    aws:
      accessKeyID: <id>
      secretAccessKey: <id>
      public: false
  - id: zarr_example_ds_local # Zarr array in local fs
    store-type: zarr
    path: file:///data/zarr_example_ds_local
    priority: 5
    forward-processing-priority: 5
    projection: Grid
    dimensionNames:
      latitude: lat
      longitude: lon
      time: time
      variable: analysed_sst
    slices:
      lat: 100
      lon: 100
      time: 1
  - id: AVHRR_OI_L4_GHRSST_NCEI # Standard ingest to tiles in Cassandra
    store-type: nexusproto
    path: /data/granules/*.nc
    priority: 10
    forward-processing-priority: 10
    projection: Grid
    dimensionNames:
      latitude: lat
      longitude: lon
      time: time
      variable: analysed_sst
    slices:
      lat: 100
      lon: 100
      time: 1
{code}
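One possible way to pick the storage backend from the URL in `path` is sketched below. It is only an illustration of the "determined by URL" option: the `STORE_KINDS` registry and `resolve_store` function are hypothetical names, not existing SDAP code, and the sketch assumes each collection entry has already been parsed from the YAML into a dict with the keys shown above.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URL scheme -> store kind.
# A future storage "plugin" would add its scheme here.
STORE_KINDS = {"s3": "s3", "file": "local", "": "local"}


def resolve_store(collection: dict) -> dict:
    """Return the store kind and location for one Zarr collection entry."""
    if collection.get("store-type") != "zarr":
        raise ValueError(f"collection {collection.get('id')!r} is not a zarr collection")

    url = urlparse(collection["path"])
    if url.scheme not in STORE_KINDS:
        raise ValueError(f"unsupported storage scheme {url.scheme!r}")

    if STORE_KINDS[url.scheme] == "s3":
        aws = collection.get("aws", {})
        public = bool(aws.get("public", False))
        # Private buckets need both credential fields from the config.
        if not public and not (aws.get("accessKeyID") and aws.get("secretAccessKey")):
            raise ValueError("private S3 bucket requires accessKeyID and secretAccessKey")
        return {
            "kind": "s3",
            "bucket": url.netloc,
            "key": url.path.lstrip("/"),
            "anonymous": public,
        }

    # Local filesystem: either a file:// URL or a bare path.
    return {"kind": "local", "path": url.path}
```

Keeping the scheme lookup in a single table means a new backend only has to register its scheme and add a branch, without touching the config format.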
--
This message was sent by Atlassian Jira
(v8.20.10#820010)