[ https://issues.apache.org/jira/browse/SPARK-52224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandy Ryza resolved SPARK-52224.
--------------------------------
    Resolution: Fixed

> Add pyyaml as an optional dependency
> ------------------------------------
>
>                 Key: SPARK-52224
>                 URL: https://issues.apache.org/jira/browse/SPARK-52224
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Major
>              Labels: pull-request-available
>
> The pipeline spec file described in the [Declarative Pipelines SPIP|https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0]
> expects data in a YAML format. YAML is superior to alternatives, for a few
> reasons:
> * Unlike the flat files that are used for [spark-submit confs|https://spark.apache.org/docs/latest/submitting-applications.html#loading-configuration-from-a-file],
>   it supports the hierarchical data required by the pipeline spec.
> * It's much more user-friendly to author than JSON.
> * It's consistent with the config files used for similar tools, like dbt.
>
> The Declarative Pipelines CLI will be a script and thus require a Python
> library for loading YAML. The pyyaml library is an extremely stable
> dependency. The `safe_load` function that we'll use to load YAML files was
> introduced at least a decade ago.
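
For illustration only, here is a minimal sketch of how a script might load a hierarchical spec with `yaml.safe_load` while treating pyyaml as an optional dependency. The spec keys (`name`, `configuration`, `sources`) and the `load_spec` helper are hypothetical and are not taken from the SPIP or from the Spark codebase; only `yaml.safe_load` is the real PyYAML call.

```python
import textwrap

try:
    import yaml  # PyYAML, treated here as an optional dependency
except ImportError:
    yaml = None

# Hypothetical spec contents: these keys are assumptions made for this
# example and do not reflect the actual pipeline spec schema.
SPEC_TEXT = textwrap.dedent("""\
    name: example_pipeline
    configuration:
      batch_size: 100
    sources:
      - path: transformations/a.py
      - path: transformations/b.sql
""")


def load_spec(text):
    """Parse YAML text into a dict, failing clearly if PyYAML is absent."""
    if yaml is None:
        raise RuntimeError(
            "PyYAML is required to read the spec; try `pip install pyyaml`."
        )
    # safe_load builds only plain Python types (dict, list, str, int, ...)
    # and does not construct arbitrary Python objects from YAML tags.
    spec = yaml.safe_load(text)
    if not isinstance(spec, dict):
        raise ValueError("expected a YAML mapping at the top level")
    return spec


if __name__ == "__main__":
    spec = load_spec(SPEC_TEXT)
    # Nested mappings and lists survive the round trip, which a flat
    # key=value properties file cannot express directly.
    print(spec["configuration"]["batch_size"])
    print([source["path"] for source in spec["sources"]])
```

Deferring the failure to `load_spec` rather than to import time means code paths that never read a spec are unaffected when pyyaml is not installed.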