[
https://issues.apache.org/jira/browse/SPARK-52224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sandy Ryza updated SPARK-52224:
-------------------------------
Description:
The pipeline spec file described in the [Declarative Pipelines
SPIP|https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0]
expects data in YAML format. YAML is a better fit than the alternatives for a few
reasons:
* Unlike the flat files that are used for [spark-submit
confs|https://spark.apache.org/docs/latest/submitting-applications.html#loading-configuration-from-a-file],
it supports the hierarchical data required by the pipeline spec.
* It's much more user-friendly to author than JSON.
* It's consistent with the config files used for similar tools, like dbt.
The Declarative Pipelines CLI will be a Python script and will thus require a Python
library for loading YAML. The pyyaml library is an extremely stable dependency.
The `safe_load` function that we'll use to load YAML files was introduced at
least a decade ago.
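As a rough sketch of what loading a spec could look like (this is not the actual CLI code): the snippet below treats pyyaml as an optional dependency by importing it lazily, then parses a small hierarchical document with `yaml.safe_load`. The field names in `EXAMPLE_SPEC`, the `yaml_available` and `load_spec` helpers, and the error messages are all hypothetical and only for illustration; the real spec schema is whatever the SPIP defines.

```python
import importlib.util

# Hypothetical spec contents. The field names are illustrative only and are
# not taken from the real pipeline spec schema.
EXAMPLE_SPEC = """\
name: my_pipeline
definitions:
  - glob:
      include: "transformations/**/*.py"
configuration:
  spark.sql.shuffle.partitions: "8"
"""


def yaml_available() -> bool:
    """Return True if the optional pyyaml package can be imported."""
    return importlib.util.find_spec("yaml") is not None


def load_spec(text: str) -> dict:
    """Parse YAML text into a dict, failing clearly if pyyaml is missing."""
    try:
        # Imported lazily so that the rest of the package works without pyyaml.
        import yaml
    except ImportError as exc:
        raise RuntimeError(
            "pyyaml is required to load pipeline spec files; "
            "install it with `pip install pyyaml`"
        ) from exc

    # safe_load builds only plain Python objects (dict, list, str, int, ...).
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ValueError("expected a YAML mapping at the top level of the spec")
    return data


if __name__ == "__main__":
    if yaml_available():
        print(load_spec(EXAMPLE_SPEC))
    else:
        print("pyyaml is not installed")
```

Because `safe_load` constructs only plain Python types rather than arbitrary objects, it is the appropriate entry point for reading user-authored config files.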
> Add pyyaml as an optional dependency
> ------------------------------------
>
> Key: SPARK-52224
> URL: https://issues.apache.org/jira/browse/SPARK-52224
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.1.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Priority: Major