[ https://issues.apache.org/jira/browse/SDAP-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Riley Kuttruff updated SDAP-472:
--------------------------------
Status: In Progress (was: To Do)
> General Zarr support for gridded datasets
> -----------------------------------------
>
> Key: SDAP-472
> URL: https://issues.apache.org/jira/browse/SDAP-472
> Project: Apache Science Data Analytics Platform
> Issue Type: New Feature
> Components: analysis, collection-ingester
> Reporter: Riley Kuttruff
> Assignee: Riley Kuttruff
> Priority: Major
>
> The end goal is for SDAP to onboard existing Zarr datasets with minimal to
> no interaction with the data (i.e., no scanning the data to generate
> metadata). Gridded formats allow for this; only some additional
> dataset-level metadata needs to be recorded. Swath data will require a
> different and much more labor-intensive approach, so we should focus on
> gridded data, as it will likely be more commonly used by our users.
>
> Collections should be specifiable in the collection config YAML.
> Initially we should support Zarr stores in an S3 bucket and on the local
> filesystem; however, we should leave the door open for other storage options
> (explicitly set in the collection config or determined by URL), essentially
> Zarr plugins we can add in the future:
>
> {code:yaml}
> collections:
>   - id: zarr_example_ds_s3 # Zarr array in S3; need to give creds
>     store-type: zarr
>     path: s3://sdap-zarr-bucket/zarr_example_ds
>     priority: 5
>     forward-processing-priority: 5
>     projection: Grid
>     dimensionNames:
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:
>       lat: 100
>       lon: 100
>       time: 1
>     aws:
>       accessKeyID: <id>
>       secretAccessKey: <id>
>       public: false
>   - id: zarr_example_ds_local # Zarr array in local fs
>     store-type: zarr
>     path: file:///data/zarr_example_ds_local
>     priority: 5
>     forward-processing-priority: 5
>     projection: Grid
>     dimensionNames:
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:
>       lat: 100
>       lon: 100
>       time: 1
>   - id: AVHRR_OI_L4_GHRSST_NCEI # Standard ingest to tiles in Cassandra
>     store-type: nexusproto
>     path: /data/granules/*.nc
>     priority: 10
>     forward-processing-priority: 10
>     projection: Grid
>     dimensionNames:
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:
>       lat: 100
>       lon: 100
>       time: 1
> {code}
>
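
The description leaves open how a storage option would be "explicitly set in
CC or determined by URL". One possible reading is sketched below, purely as an
illustration and not as SDAP's actual implementation. The function name
resolve_zarr_backend, the SCHEME_BACKENDS table, the backend labels, and the
optional "storage" key are all hypothetical; only "store-type", "path", and
the two storage locations (S3 and local filesystem) come from the issue.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to a storage backend label.
# Only S3 and the local filesystem are named in the issue.
SCHEME_BACKENDS = {
    "s3": "s3",
    "file": "local",
    "": "local",  # bare paths with no scheme, e.g. /data/zarr_example_ds_local
}


def resolve_zarr_backend(collection: dict) -> str:
    """Pick a storage backend label for one Zarr collection entry.

    An explicit "storage" key (hypothetical, not defined in the issue) wins;
    otherwise the backend is inferred from the URL scheme of "path".
    """
    if collection.get("store-type") != "zarr":
        raise ValueError(f"{collection.get('id')}: not a Zarr collection")

    # Explicitly configured backend takes precedence over URL inference.
    explicit = collection.get("storage")
    if explicit is not None:
        return explicit

    scheme = urlparse(collection["path"]).scheme.lower()
    try:
        return SCHEME_BACKENDS[scheme]
    except KeyError:
        raise ValueError(
            f"{collection['id']}: unsupported URL scheme {scheme!r}"
        ) from None


if __name__ == "__main__":
    entry = {
        "id": "zarr_example_ds_s3",
        "store-type": "zarr",
        "path": "s3://sdap-zarr-bucket/zarr_example_ds",
    }
    print(resolve_zarr_backend(entry))
```

Under this sketch, supporting a further storage option would mean adding one
entry to the table (or setting the explicit key), which is roughly the
"plugin" shape the description asks to leave room for.
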
--
This message was sent by Atlassian Jira
(v8.20.10#820010)