[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports
HyukjinKwon commented on a change in pull request #29121: URL: https://github.com/apache/spark/pull/29121#discussion_r461360724 ## File path: python/pyspark/ml/tests/test_stat.py ## @@ -40,7 +40,7 @@ def test_chisquaretest(self): if __name__ == "__main__": -from pyspark.ml.tests.test_stat import * +from pyspark.ml.tests.test_stat import * # noqa: F401 Review comment: Ah, these imports actually makes the output file paths correct. For example, if you run the unittests, the console outputs are like: ``` test_blahblah (pyspark.sql.blahblah.BlahBlahTests) ... ok -- Ran 3 tests in 0.001s OK ``` if you remove the imports, the output becomes: ``` test_blahblah (__main__.BlahBlahTests) ... ok -- Ran 3 tests in 0.001s OK ``` It's related to who we run the tests via our custom script `python/run-tests.py`. Let's leave them as are for now. ## File path: python/pyspark/ml/tests/test_wrapper.py ## @@ -116,9 +116,8 @@ def test_new_java_array(self): java_array = JavaWrapper._new_java_array(str_list, java_class) self.assertEqual(_java2py(self.sc, java_array), expected_str_list) -if __name__ == "__main__": -from pyspark.ml.tests.test_wrapper import * Review comment: So .. let's don't remove this. ## File path: python/pyspark/ml/tests/test_stat.py ## @@ -40,7 +40,7 @@ def test_chisquaretest(self): if __name__ == "__main__": -from pyspark.ml.tests.test_stat import * +from pyspark.ml.tests.test_stat import * # noqa: F401 Review comment: Ah, these imports actually make the output file paths correct. For example, if you run the unittests, the console outputs are like: ``` test_blahblah (pyspark.sql.blahblah.BlahBlahTests) ... ok -- Ran 3 tests in 0.001s OK ``` if you remove the imports, the output becomes: ``` test_blahblah (__main__.BlahBlahTests) ... ok -- Ran 3 tests in 0.001s OK ``` It's related to how we run the tests via our custom script `python/run-tests.py`. Let's leave them as are for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports
HyukjinKwon commented on a change in pull request #29121: URL: https://github.com/apache/spark/pull/29121#discussion_r460362434 ## File path: python/pyspark/ml/tests/test_stat.py ## @@ -40,7 +40,7 @@ def test_chisquaretest(self): if __name__ == "__main__": -from pyspark.ml.tests.test_stat import * +from pyspark.ml.tests.test_stat import * # noqa: F401 Review comment: No, no. don't remove this. It will keep the source code path in console output. It is also used in coverage report as well. XML reporter is used in Jenkins. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports
HyukjinKwon commented on a change in pull request #29121: URL: https://github.com/apache/spark/pull/29121#discussion_r460362184 ## File path: python/pyspark/heapq3.py ## @@ -865,7 +865,7 @@ def nlargest(n, iterable, key=None): # If available, use C implementation try: -from _heapq import * +from _heapq import * # noqa: F401 Review comment: This too. It was ported back from CPython 3, and is excluded in other linter checks. Can we exclude this file as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports
HyukjinKwon commented on a change in pull request #29121: URL: https://github.com/apache/spark/pull/29121#discussion_r460361924 ## File path: python/pyspark/cloudpickle/cloudpickle.py ## @@ -57,7 +57,6 @@ from .compat import pickle from typing import Generic, Union, Tuple, Callable from pickle import _getattribute -from importlib._bootstrap import _find_spec Review comment: @Fokko, can we exclude this file in the linter checking? We're already skipping other lint checks (see `dev/tox.ini`) and this is usually exactly matched to the release of cloudpickle. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports
HyukjinKwon commented on a change in pull request #29121: URL: https://github.com/apache/spark/pull/29121#discussion_r455781637 ## File path: python/pyspark/mllib/classification.py ## @@ -102,6 +102,7 @@ class LogisticRegressionModel(LinearClassificationModel): in Multinomial Logistic Regression. By default, it is binary logistic regression so numClasses will be set to 2. +>>> from pyspark.mllib.linalg import SparseVector Review comment: I don't think PySpark examples import relevant PySpark classes within doctests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports
HyukjinKwon commented on a change in pull request #29121: URL: https://github.com/apache/spark/pull/29121#discussion_r454870938 ## File path: python/pyspark/sql/functions.py ## @@ -29,9 +29,9 @@ from pyspark.sql.dataframe import DataFrame from pyspark.sql.types import StringType, DataType # Keep UserDefinedFunction import for backwards compatible import; moved in SPARK-22409 -from pyspark.sql.udf import UserDefinedFunction, _create_udf Review comment: `UserDefinedFunction` should be kept as described above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org