This is an automated email from the ASF dual-hosted git repository.
sandy pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new 0b8a641a2e06 [SPARK-55258][DOCS] Document CLI parameters in declarative pipelines programming guide
0b8a641a2e06 is described below
commit 0b8a641a2e067db2d0025fea24678e336ceea727
Author: Sandy Ryza <[email protected]>
AuthorDate: Tue Feb 3 08:05:44 2026 -0800
[SPARK-55258][DOCS] Document CLI parameters in declarative pipelines programming guide
### What changes were proposed in this pull request?
Documents parameters for the `spark-pipelines` CLI in the declarative pipelines programming guide
### Why are the changes needed?
Complete documentation
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
Closes #54035 from sryza/refresh-selection-docs.
Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>
(cherry picked from commit 9788c52426df29fe4d145255f7a7f945bee96d3a)
Signed-off-by: Sandy Ryza <[email protected]>
---
docs/declarative-pipelines-programming-guide.md | 47 +++++++++++++++++++++++--
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/docs/declarative-pipelines-programming-guide.md b/docs/declarative-pipelines-programming-guide.md
index 5b3a06fe26c0..c5d18a7cb71b 100644
--- a/docs/declarative-pipelines-programming-guide.md
+++ b/docs/declarative-pipelines-programming-guide.md
@@ -117,10 +117,47 @@ The `spark-pipelines` command line interface (CLI) is the primary way to manage
`spark-pipelines run` launches an execution of a pipeline and monitors its progress until it completes.
-The `--spec` parameter allows selecting the pipeline spec file. If not provided, the CLI will look in the current directory and parent directories for one of the files:
+Since `spark-pipelines` is built on top of `spark-submit`, it supports all `spark-submit` arguments except for `--class`. For the complete list of available parameters, see the [Spark Submit documentation](https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit).
-* `spark-pipeline.yml`
-* `spark-pipeline.yaml`
+It also supports several pipeline-specific parameters:
+
+* `--spec PATH` - Path to the pipeline specification file. If not provided, the CLI will look in the current directory and parent directories for one of the files:
+ * `spark-pipeline.yml`
+ * `spark-pipeline.yaml`
+
+* `--full-refresh DATASETS` - List of datasets to reset and recompute (comma-separated). This clears all existing data and checkpoints for the specified datasets and recomputes them from scratch.
+
+* `--full-refresh-all` - Perform a full graph reset and recompute. This is equivalent to `--full-refresh` for all datasets in the pipeline.
+
+* `--refresh DATASETS` - List of datasets to update (comma-separated). This triggers an update for the specified datasets without clearing existing data.
+
+#### Refresh Selection Behavior
+
+If no refresh options are specified, a default incremental update is performed. The refresh parameters follow these combination rules:
+- `--full-refresh-all` cannot be combined with `--full-refresh` or `--refresh`
+- `--full-refresh` and `--refresh` can be used together to specify different behaviors for different datasets (see the combined example below)
+
+#### Examples
+
+```bash
+# Basic run with default incremental update
+spark-pipelines run
+
+# Run with specific spec file
+spark-pipelines run --spec /path/to/my-pipeline.yaml
+
+# Full refresh of specific datasets
+spark-pipelines run --full-refresh orders,customers
+
+# Full refresh of entire pipeline
+spark-pipelines run --full-refresh-all
+
+# Run with custom Spark configuration
+spark-pipelines run --conf spark.sql.shuffle.partitions=200 --driver-memory 4g
+
+# Run on remote Spark Connect server
+spark-pipelines run --remote sc://my-cluster:15002
+```
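+
+`--full-refresh` and `--refresh` can also be combined in a single run. A minimal sketch of a combined invocation (the dataset names `orders` and `customers` are illustrative):
+
+```bash
+# Fully recompute orders while incrementally updating customers
+# (illustrative dataset names; substitute datasets defined in your pipeline)
+spark-pipelines run --full-refresh orders --refresh customers
+```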
### `spark-pipelines dry-run`
@@ -129,6 +166,10 @@ The `--spec` parameter allows selecting the pipeline spec file. If not provided,
- Analysis errors – e.g. selecting from a table or a column that doesn't exist
- Graph validation errors - e.g. cyclic dependencies
+Since `spark-pipelines` is built on top of `spark-submit`, it supports all `spark-submit` arguments except for `--class`. For the complete list of available parameters, see the [Spark Submit documentation](https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit).
+
+It also supports the pipeline-specific `--spec` parameter (see description above in the `run` section).
+
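+A minimal dry-run invocation, as a sketch (the spec path is illustrative):
+
+```bash
+# Validate dataset definitions and graph structure without running any updates
+spark-pipelines dry-run --spec /path/to/my-pipeline.yaml
+```
+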
## Programming with SDP in Python
SDP Python definitions are defined in the `pyspark.pipelines` module.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]