[ 
https://issues.apache.org/jira/browse/SPARK-52224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-52224:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add pyyaml as an optional dependency
> ------------------------------------
>
>                 Key: SPARK-52224
>                 URL: https://issues.apache.org/jira/browse/SPARK-52224
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Major
>              Labels: pull-request-available
>
> The pipeline spec file described in the [Declarative Pipelines 
> SPIP|https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0]
>  expects data in YAML format. YAML is a better fit than the alternatives for 
> a few reasons: 
>  * Unlike the flat files that are used for [spark-submit 
> confs|https://spark.apache.org/docs/latest/submitting-applications.html#loading-configuration-from-a-file],
>  it supports the hierarchical data required by the pipeline spec.
>  * It's much more user-friendly to author than JSON.
>  * It's consistent with the config files used for similar tools, like dbt.
>
> The Declarative Pipelines CLI will be a Python script and will thus require a 
> Python library for loading YAML. The pyyaml library is an extremely stable 
> dependency. The `safe_load` function that we'll use to load YAML files was 
> introduced at least a decade ago.
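
As a rough illustration of the description above, here is a minimal sketch of how a CLI might treat pyyaml as an optional dependency: the import is deferred until a spec file is actually loaded, and a missing library produces an actionable error. The function name `load_pipeline_spec`, the error wording, and the field names in `EXAMPLE_SPEC` are hypothetical and not taken from the SPIP or the pull request; only `yaml.safe_load` is the real pyyaml call mentioned in the issue.

```python
import io


def load_pipeline_spec(stream):
    """Parse a pipeline spec from a text stream, importing pyyaml lazily.

    Hypothetical helper: the name and error message are assumptions.
    """
    try:
        # Deferred import, so pyyaml is only needed when a spec is loaded.
        import yaml
    except ImportError as exc:
        raise RuntimeError(
            "pyyaml is required to load pipeline spec files; "
            "install it with `pip install pyyaml`"
        ) from exc
    # safe_load only constructs plain Python types (dict, list, str, int, ...).
    return yaml.safe_load(stream)


# Illustrative spec contents; these field names are made up for the example.
EXAMPLE_SPEC = """\
name: example_pipeline
settings:
  retries: 3
sources:
  - orders
  - customers
"""

if __name__ == "__main__":
    spec = load_pipeline_spec(io.StringIO(EXAMPLE_SPEC))
    print(spec["settings"]["retries"])
```

The nested `settings` mapping and the `sources` list are the kind of hierarchical data that a flat key-value conf file cannot express directly.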



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

