Yikun commented on a change in pull request #35977: URL: https://github.com/apache/spark/pull/35977#discussion_r836004084
########## File path: python/pyspark/ml/image.py ##########
@@ -28,7 +28,7 @@
 from typing import Any, Dict, List, NoReturn, Optional, cast
 import numpy as np
-from distutils.version import LooseVersion
+from packaging.version import Version

Review comment:

> Is this 3rd party library?

https://pypi.org/project/packaging/ , yes, it is. And the standard library will no longer have a way to handle this once `distutils` is removed in Python 3.12.

> Adding a new dep is problematic

Looks like we have the following ways to solve this:
- Maintain the `distutils`/`packaging` version-handling code in PySpark itself, just like [cloudpickle](https://github.com/apache/spark/tree/master/python/pyspark/cloudpickle), or at least a simple version-comparison implementation.
- Introduce `packaging` as a 3rd party lib, and add the dependency in setup and the [docs](https://spark.apache.org/docs/latest/api/python/getting_started/install.html#dependencies).

BTW, I don't think it is ideal that we always maintain all 3rd party deps inside PySpark. Have we considered making the extra third-party installations a required/optional step in the installation of PySpark (especially via downloading the distribution)? For example, require users to install them before PySpark startup, or install the deps in [`bin/pyspark`](https://github.com/apache/spark/blob/master/bin/pyspark) automatically (this may require additional network access).
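For reference, a minimal sketch of the two options discussed above (illustrative only, not PySpark's actual code in `image.py`): the third-party `packaging` library handles PEP 440 versions, while a simplified vendored comparator could cover plain `X.Y.Z` strings.

```python
# Option 2: depend on the third-party `packaging` library, which compares
# version components numerically and understands PEP 440 pre-releases.
from packaging.version import Version

assert Version("1.10.0") > Version("1.9.2")    # numeric, not lexicographic
assert Version("2.0.0rc1") < Version("2.0.0")  # pre-releases sort before finals

# Option 1 (simplified): a minimal comparator vendored into PySpark for plain
# dotted versions. This toy version ignores pre-release/local segments that
# `packaging` handles correctly.
def simple_version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

assert simple_version_tuple("1.10.0") > simple_version_tuple("1.9.2")
```

The tuple-based comparator is enough for checking versions of libraries like numpy that use plain numeric releases, but would need more work to match `packaging`'s handling of suffixes like `rc1` or `.dev0`.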
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org