zhengruifeng opened a new pull request, #55945:
URL: https://github.com/apache/spark/pull/55945

   ### What changes were proposed in this pull request?
   
   Backport of #55872 to branch-4.2.
   
   This PR consolidates the `python-ps-minimum` Docker image and its CI workflow into the existing `python-minimum` image, eliminating a near-duplicate.
   
   Specifically:
   - Deletes `dev/spark-test-image/python-ps-minimum/Dockerfile`.
   - Deletes `.github/workflows/build_python_ps_minimum.yml`.
   - Adds `"pyspark-pandas": "true"` to `.github/workflows/build_python_minimum.yml` so that Pandas API on Spark coverage against minimum dependencies is preserved.
   - Drops the `python-ps-minimum` entries (the `paths` trigger and the build/push step) from `.github/workflows/build_infra_images_cache.yml`.
   - Removes the `build_python_ps_minimum.yml` badge from `README.md`.
   
   ### Why are the changes needed?
   
   To save CI resources. The two Dockerfiles were nearly identical; their only functional differences were in `BASIC_PIP_PKGS`:
   
   | Package | python-minimum | python-ps-minimum |
   |---|---|---|
   | `numpy` | pinned `==1.22.4` | unpinned |
   | `scikit-learn` | included | omitted |
   
   Everything else (base image, apt packages, Python version, venv setup, `CONNECT_PIP_PKGS`) was identical. Maintaining both images doubles the image build and cache cost and runs a duplicate scheduled workflow without commensurate test value. Reusing `python-minimum`, which has the stricter `numpy` pin and a superset of the packages, for the Pandas API on Spark minimum-dependency job preserves coverage while halving the number of images to build and cache, along with the associated CI runtime.
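   The "superset" argument can be checked mechanically. Below is a minimal Python sketch, not part of this PR, that compares two space-separated pip package strings by name and reports version-specifier differences. The helpers `parse_pkgs` and `compare_pkg_lists` are hypothetical, the space-separated format is an assumption about how `BASIC_PIP_PKGS` is written, and the sample inputs contain only the two packages from the table above, not the full lists from the Dockerfiles.

```python
import re

# Hypothetical helper (not from the PR): split a space-separated pip package
# string, the shape assumed for BASIC_PIP_PKGS, into {name: version specifier}.
_REQ = re.compile(r"^([A-Za-z0-9_.\-]+)(.*)$")


def parse_pkgs(spec: str) -> dict[str, str]:
    pkgs: dict[str, str] = {}
    for token in spec.split():
        match = _REQ.match(token)
        if match is None:
            raise ValueError(f"cannot parse requirement: {token!r}")
        # Loose name normalization: lowercase, and treat runs of '-', '_', '.' alike.
        name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
        pkgs[name] = match.group(2)  # '' means the package is unpinned
    return pkgs


def compare_pkg_lists(candidate: str, replaced: str):
    """Check whether `candidate` covers every package in `replaced`.

    Returns (missing, extra, pin_diffs):
      missing   - packages in `replaced` that `candidate` lacks
      extra     - packages that appear only in `candidate`
      pin_diffs - {name: (candidate_spec, replaced_spec)} where specifiers differ
    """
    cand, repl = parse_pkgs(candidate), parse_pkgs(replaced)
    missing = sorted(repl.keys() - cand.keys())
    extra = sorted(cand.keys() - repl.keys())
    pin_diffs = {
        name: (cand[name], repl[name])
        for name in sorted(cand.keys() & repl.keys())
        if cand[name] != repl[name]
    }
    return missing, extra, pin_diffs


if __name__ == "__main__":
    # Only the differing entries from the table; the real lists are longer.
    python_minimum = "numpy==1.22.4 scikit-learn"
    python_ps_minimum = "numpy"

    missing, extra, pin_diffs = compare_pkg_lists(python_minimum, python_ps_minimum)
    print(missing)    # []
    print(extra)      # ['scikit-learn']
    print(pin_diffs)  # {'numpy': ('==1.22.4', '')}
```

   An empty `missing` list is what makes `python-minimum` a drop-in replacement; the `numpy` entry shows that the Pandas API on Spark job now runs against the pinned `1.22.4` rather than an unpinned `numpy`.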
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI-only change.
   
   ### How was this patch tested?
   
   Existing CI. The consolidated `build_python_minimum.yml` now runs both the `pyspark` and `pyspark-pandas` jobs against the `python-minimum` image.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (model: claude-opus-4-7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]