[ 
https://issues.apache.org/jira/browse/SPARK-52224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated SPARK-52224:
-------------------------------
    Description: 
The pipeline spec file described in the [Declarative Pipelines 
SPIP|https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0]
 expects data in YAML format. YAML is a better fit than the alternatives for a few 
reasons: 
 * Unlike the flat files that are used for [spark-submit 
confs|https://spark.apache.org/docs/latest/submitting-applications.html#loading-configuration-from-a-file],
 it supports the hierarchical data required by the pipeline spec.
 * It's much more user-friendly to author than JSON.
 * It's consistent with the config files used for similar tools, like dbt.

The Declarative Pipelines CLI will be a Python script and thus requires a Python 
library for loading YAML. The pyyaml library is an extremely stable dependency: 
the `safe_load` function that we'll use to load YAML files was introduced at 
least a decade ago.
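
As a rough illustration of what loading a spec with an optional pyyaml 
dependency could look like, here is a minimal sketch. The spec contents, the 
field names, and the `load_spec` helper are all hypothetical and are not taken 
from the SPIP or from the actual CLI; only `yaml.safe_load` is a real pyyaml 
call.

```python
import textwrap

try:
    import yaml  # provided by the pyyaml package, treated here as optional
except ImportError:
    yaml = None

# Hypothetical spec contents; the field names are illustrative only.
EXAMPLE_SPEC = textwrap.dedent(
    """
    name: example_pipeline
    configuration:
      spark.sql.shuffle.partitions: "200"
    definitions:
      - glob:
          include: transformations/**/*.py
    """
)


def load_spec(text):
    """Parse a YAML pipeline spec (hypothetical helper, not the real CLI)."""
    if yaml is None:
        # Fail only when the YAML-dependent feature is actually used.
        raise RuntimeError(
            "pyyaml is required to load pipeline spec files; "
            "install it with `pip install pyyaml`"
        )
    # safe_load builds only plain Python objects (dict, list, str, ...).
    spec = yaml.safe_load(text)
    if not isinstance(spec, dict):
        raise ValueError("pipeline spec must be a YAML mapping at the top level")
    return spec
```

Calling `load_spec(EXAMPLE_SPEC)` returns nested dicts and lists, which is the 
hierarchical structure a flat key-value conf file cannot express. Guarding the 
import keeps the rest of the package usable when pyyaml is not installed.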

> Add pyyaml as an optional dependency
> ------------------------------------
>
>                 Key: SPARK-52224
>                 URL: https://issues.apache.org/jira/browse/SPARK-52224
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
