[GitHub] [spark] sarutak commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

GitBox Sun, 31 Oct 2021 00:14:38 -0700


sarutak commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739767873




##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"

Review comment:
       @HyukjinKwon Let me explain again how the problem happens, just in case. 
Imagine the following situation.
   
   1. Spark `X.Y.Z-rcN`, which refers to the commit hash `abcde`, is in its 
voting period.
   2. Someone accesses Binder and builds the container image with the commit 
hash `abcde` or an equivalent tag (e.g. `rcN`). The image contains `pyspark`, 
but its version is not `X.Y.Z` because `pyspark-X.Y.Z` is not published yet.
   3. `rcN` passes the vote and `pyspark-X.Y.Z` is published to PyPI. But the 
container image in Binder won't be rebuilt because the commit hash has not 
changed.
   
   As a result, the live notebook environment that is linked from the 
documentation for `X.Y.Z` doesn't contain `pyspark-X.Y.Z`, even though it 
contains the notebooks for `X.Y.Z`.
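
To make the scenario concrete, here is a rough sketch of how a `<=$VERSION` pin behaves before and after the release is published. `resolve_le` is a hypothetical helper that only imitates the "newest version not above the pin" behaviour; it is not pip's real resolver. The version numbers are made up for illustration, and `sort -V` is assumed to be available (GNU coreutils or BusyBox).

```shell
# Hypothetical helper: pick the highest published version that is <= the pin.
# Usage: resolve_le PIN PUBLISHED_VERSION...
resolve_le() {
  target="$1"; shift
  printf '%s\n' "$@" | while read -r v; do
    # v is <= target when it sorts first (or ties) in version order.
    lowest=$(printf '%s\n%s\n' "$v" "$target" | sort -V | head -n 1)
    if [ "$lowest" = "$v" ]; then
      printf '%s\n' "$v"
    fi
  done | sort -V | tail -n 1
}

# During the vote on 3.2.1-rcN, 3.2.1 is not published yet (illustrative versions).
before_release=$(resolve_le 3.2.1 3.1.2 3.2.0)

# After the vote passes, 3.2.1 is published, but only a rebuilt image would see it.
after_release=$(resolve_le 3.2.1 3.1.2 3.2.0 3.2.1)

echo "image built during the vote installs pyspark $before_release"
echo "image rebuilt after the release installs pyspark $after_release"
```

Since Binder reuses the image built for the same commit hash, the first result is what the live notebooks would keep using.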
   
   Can we prevent this issue with `git describe --tags --exact-match`?
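
For reference, a small sketch of what `git describe --tags --exact-match` reports, run in a throwaway repository. The `exact_tag` helper, the repository, and the tag name `v0.0.0-rc1` are hypothetical and exist only for this illustration. The command tells us whether the checked-out commit is tagged; by itself it says nothing about whether the matching `pyspark` release is already on PyPI.

```shell
# Print the tag that points exactly at HEAD of the repository in $1,
# or nothing when HEAD is not tagged.
exact_tag() {
  git -C "$1" describe --tags --exact-match 2>/dev/null || true
}

# Throwaway repository with a single empty commit (illustration only).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

untagged=$(exact_tag "$repo")      # empty: no tag points at HEAD yet
git -C "$repo" tag v0.0.0-rc1      # hypothetical release-candidate tag
tagged=$(exact_tag "$repo")        # now the rc tag name

echo "before tagging: '${untagged}'"
echo "after tagging:  '${tagged}'"
rm -rf "$repo"
```

An image built from the `rcN` tag would take the "tagged" branch here, at a time when `pyspark-X.Y.Z` may still be unpublished.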




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


