sarutak commented on a change in pull request #34449: URL: https://github.com/apache/spark/pull/34449#discussion_r739767873
##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"

Review comment:
@HyukjinKwon Let me explain again how the problem happens, just in case. Imagine the following situation.

1. Spark `X.Y.Z-rcN`, which refers to the commit hash `abcde`, is in its voting period.
2. Someone accesses Binder and builds the container image with the commit hash `abcde` or an equivalent tag (e.g. `rcN`). The image contains `pyspark`, but its version is not `X.Y.Z` because `pyspark-X.Y.Z` is not published yet.
3. `rcN` passes the vote and `pyspark-X.Y.Z` is published to PyPI. But the container image in Binder won't be rebuilt because the commit hash has not changed.

As a result, the live notebook environment that we can access from the documentation for `X.Y.Z` doesn't contain `pyspark-X.Y.Z`, even though it contains the notebooks of `X.Y.Z`.

Can we prevent this issue with `git describe --tags --exact-match`?
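
To make step 2 concrete, here is a minimal sketch of why the `<=$VERSION` pin ends up with an older release when the image is built during the vote. It is a simplified model and not pip's actual resolver: it only handles final `X.Y.Z` versions, and the helper names `parse` and `resolve_upper_bound` as well as the version numbers are hypothetical, chosen only for illustration.

```python
def parse(version):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_upper_bound(published, upper):
    """Return the newest published version that is <= upper, or None.

    Simplified stand-in for what a `pkg<=upper` requirement selects.
    """
    candidates = [v for v in published if parse(v) <= parse(upper)]
    return max(candidates, key=parse) if candidates else None


# Hypothetical versions: 1.2.0 plays the role of X.Y.Z.
target = "1.2.0"
during_vote = ["1.0.0", "1.1.0"]        # X.Y.Z not on the package index yet
after_release = during_vote + [target]  # X.Y.Z has been published

# Image built while the vote is running: falls back to the previous release.
print(resolve_upper_bound(during_vote, target))    # 1.1.0
# Image built after the release: gets the intended version.
print(resolve_upper_bound(after_release, target))  # 1.2.0
```

Because the image is cached per commit hash, the first result is the one that stays in the cached image until something triggers a rebuild.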