This is an automated email from the ASF dual-hosted git repository.
sandy pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new 0b8a641a2e06 [SPARK-55258][DOCS] Document CLI parameters in declarative pipelines programming guide
0b8a641a2e06 is described below
commit 0b8a641a2e067db2d0025fea24678e336ceea727
Author: Sandy Ryza <[email protected]>
AuthorDate: Tue Feb 3 08:05:44 2026 -0800
[SPARK-55258][DOCS] Document CLI parameters in declarative pipelines programming guide
### What changes were proposed in this pull request?
Documents parameters for the `spark-pipelines` CLI in the declarative pipelines programming guide
### Why are the changes needed?
Complete documentation
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
Closes #54035 from sryza/refresh-selection-docs.
Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>
(cherry picked from commit 9788c52426df29fe4d145255f7a7f945bee96d3a)
Signed-off-by: Sandy Ryza <[email protected]>
---
docs/declarative-pipelines-programming-guide.md | 47 +++++++++++++++++++++++--
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/docs/declarative-pipelines-programming-guide.md b/docs/declarative-pipelines-programming-guide.md
index 5b3a06fe26c0..c5d18a7cb71b 100644
--- a/docs/declarative-pipelines-programming-guide.md
+++ b/docs/declarative-pipelines-programming-guide.md
@@ -117,10 +117,47 @@ The `spark-pipelines` command line interface (CLI) is the primary way to manage
`spark-pipelines run` launches an execution of a pipeline and monitors its progress until it completes.
-The `--spec` parameter allows selecting the pipeline spec file. If not provided, the CLI will look in the current directory and parent directories for one of the files:
+Since `spark-pipelines` is built on top of `spark-submit`, it supports all `spark-submit` arguments except for `--class`. For the complete list of available parameters, see the [Spark Submit documentation](https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit).
-* `spark-pipeline.yml`
-* `spark-pipeline.yaml`
+It also supports several pipeline-specific parameters:
+
+* `--spec PATH` - Path to the pipeline specification file. If not provided, the CLI will look in the current directory and parent directories for one of the files:
+ * `spark-pipeline.yml`
+ * `spark-pipeline.yaml`
+
+* `--full-refresh DATASETS` - List of datasets to reset and recompute (comma-separated). This clears all existing data and checkpoints for the specified datasets and recomputes them from scratch.
+
+* `--full-refresh-all` - Perform a full graph reset and recompute. This is equivalent to `--full-refresh` for all datasets in the pipeline.
+
+* `--refresh DATASETS` - List of datasets to update (comma-separated). This triggers an update for the specified datasets without clearing existing data.
+
+#### Refresh Selection Behavior
+
+If no refresh options are specified, a default incremental update is performed. The refresh parameters follow these combination rules:
+- `--full-refresh-all` cannot be combined with `--full-refresh` or `--refresh`
+- `--full-refresh` and `--refresh` can be used together to specify different behaviors for different datasets (see the combined example below)
+
+#### Examples
+
+```bash
+# Basic run with default incremental update
+spark-pipelines run
+
+# Run with specific spec file
+spark-pipelines run --spec /path/to/my-pipeline.yaml
+
+# Full refresh of specific datasets
+spark-pipelines run --full-refresh orders,customers
+
+# Full refresh of entire pipeline
+spark-pipelines run --full-refresh-all
+
+# Run with custom Spark configuration
+spark-pipelines run --conf spark.sql.shuffle.partitions=200 --driver-memory 4g
+
+# Run on remote Spark Connect server
+spark-pipelines run --remote sc://my-cluster:15002
+```
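+
+`--full-refresh` and `--refresh` can also be combined in a single run. A minimal sketch of a combined invocation (the dataset names `orders` and `customers` are illustrative):
+
+```bash
+# Fully recompute orders while incrementally updating customers
+# (illustrative dataset names; substitute datasets defined in your pipeline)
+spark-pipelines run --full-refresh orders --refresh customers
+```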
### `spark-pipelines dry-run`
@@ -129,6 +166,10 @@ The `--spec` parameter allows selecting the pipeline spec file. If not provided,
- Analysis errors – e.g. selecting from a table or a column that doesn't exist
- Graph validation errors - e.g. cyclic dependencies
+Since `spark-pipelines` is built on top of `spark-submit`, it supports all `spark-submit` arguments except for `--class`. For the complete list of available parameters, see the [Spark Submit documentation](https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit).
+
+It also supports the pipeline-specific `--spec` parameter (see description above in the `run` section).
+
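+A minimal dry-run invocation, as a sketch (the spec path is illustrative):
+
+```bash
+# Validate dataset definitions and graph structure without running any updates
+spark-pipelines dry-run --spec /path/to/my-pipeline.yaml
+```
+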
## Programming with SDP in Python
SDP Python definitions are defined in the `pyspark.pipelines` module.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]