zero323 commented on a change in pull request #29180:
URL: https://github.com/apache/spark/pull/29180#discussion_r495487007



##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi

Review comment:
       I think we should make this optional for now, i.e.
   
   ```bash
       if ! hash "$MYPY_BUILD" 2> /dev/null; then
           echo "The $MYPY_BUILD command was not found. Skipping MyPy test 
check build for now."
           echo
           return
       fi
   ```

##########
File path: python/pyspark/serializers.pyi
##########
@@ -0,0 +1,128 @@
+#

Review comment:
       This has already been ported.

##########
File path: dev/lint-python
##########
@@ -122,6 +123,31 @@ function pycodestyle_test {
     fi
 }
 
+function mypy_test {
+    local MYPY_REPORT=
+    local MYPY_STATUS=
+
+    if ! hash "$MYPY_BUILD" 2> /dev/null; then
+        echo "The mypy command was not found."
+        echo "mypy checks failed."
+        exit 1
+    fi
+
+    echo "starting $MYPY_BUILD test..."
+    MYPY_REPORT=$( ($MYPY_BUILD --ignore-missing-imports --config-file python/mypy.ini python/) 2>&1)

Review comment:
       I wouldn't use `--ignore-missing-imports` here; it is an all-or-nothing option that silences every missing stub across the whole run.
   
    Instead I would use either specific excludes (as we have right now in `mypy.ini`) or targeted `type: ignore` comments (as in some test modules and such).
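
    For illustration, exempting a single untyped dependency looks like the sketch below (hypothetical lines, not from the PR; `py4j` is just an example of a dependency that ships without stubs):
   
    ```python
    # A targeted ignore on one import; every other missing stub in the
    # module still surfaces as an error, unlike --ignore-missing-imports.
    from py4j.java_gateway import JavaGateway  # type: ignore[import]

    gateway_cls = JavaGateway  # treated as Any by mypy from here on
    ```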

##########
File path: python/pyspark/accumulators.pyi
##########
@@ -0,0 +1,76 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/pyspark/broadcast.pyi
##########
@@ -0,0 +1,51 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/mypy.ini
##########
@@ -0,0 +1,20 @@
+[mypy]
+python_version = 3.6
+
+[mypy-pyspark.cloudpickle.*]
+ignore_errors = True
+
+[mypy-pyspark.streaming.tests.*]
+ignore_errors = True
+
+[mypy-pyspark.sql.tests.*]
+ignore_errors = True
+
+[mypy-pyspark.testing.*]
+ignore_errors = True
+
+[mypy-pyspark.tests.*]
+ignore_errors = True

Review comment:
       I am strongly against excluding tests and testing here.
   
    These already type check with only a handful of `ignore`s, and checking them gives us additional coverage.
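
    As a rough sketch of that targeted style (hypothetical function, not from the PR), a deliberately ill-typed call in a test needs only a line-level ignore:
   
    ```python
    from typing import List

    def head(xs: List[int]) -> int:
        return xs[0]

    # The bad argument is intentional in this negative test case; only this
    # line is exempted, so the rest of the module still type checks.
    head(["a", "b"])  # type: ignore[list-item]
    ```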

##########
File path: dev/lint-python
##########
@@ -246,6 +272,7 @@ PYTHON_SOURCE="$(find . -name "*.py")"
 compile_python_test "$PYTHON_SOURCE"
 pycodestyle_test "$PYTHON_SOURCE"
 flake8_test
+mypy_test

Review comment:
       Personally, I'd consider having the type checks in a separate script, as we might want to extend things over time (I plan to add checks on the examples, same as in [`pyspark-stubs`](https://github.com/zero323/pyspark-stubs/blob/d317c27083549b6727542d825e642ad91f275f9d/.travis.yml#L18), and I'm thinking about migrating the original data-driven tests).
   
   But it is something we can discuss later.

##########
File path: python/pyspark/mllib/linalg/__init__.py
##########
@@ -469,7 +469,7 @@ def __getattr__(self, item):
     def __neg__(self):
         return DenseVector(-self.array)
 
-    def _delegate(op):
+    def _delegate(op: str):  # type: ignore

Review comment:
       I believe we can skip this one.

##########
File path: python/pyspark/ml/base.py
##########
@@ -309,7 +309,8 @@ def setPredictionCol(self, value):
         """
         return self._set(predictionCol=value)
 
-    @abstractproperty
+    @property  # type: ignore

Review comment:
       These ignores are not strictly required now, but I think it is OK to keep them, as there is no real harm here and they're likely to prove necessary in the future.
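
    For context, a minimal sketch of why the ignore shows up (illustrative class and a stand-in for pyspark's `@since`, not the PR's exact code): `abstractproperty` is deprecated in favor of stacking `@property` over `@abstractmethod`, and mypy rejects properties whose getter carries extra decorators ("Decorated property not supported"), which is what the ignore suppresses:
   
    ```python
    from abc import ABCMeta, abstractmethod
    from typing import Any, Callable

    def since(version: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Stand-in for pyspark's @since decorator, for illustration only."""
        def deco(f: Callable[..., Any]) -> Callable[..., Any]:
            return f
        return deco

    class Model(metaclass=ABCMeta):
        # The extra decorator under @property triggers mypy's
        # "Decorated property not supported" error, hence the ignore.
        @property  # type: ignore
        @abstractmethod
        @since("2.1.0")
        def numFeatures(self) -> int:
            ...
    ```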

##########
File path: python/pyspark/sql/session.py
##########
@@ -76,7 +76,7 @@ class Builder(object):
         """
 
         _lock = RLock()
-        _options = {}
+        _options = {}  # type: ignore

Review comment:
       I believe we can skip this one.
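
    If an annotation were wanted in the `.py` file at all, a plain variable annotation would read better than an ignore; a sketch, not what the PR does:
   
    ```python
    from threading import RLock
    from typing import Any, Dict

    class Builder:
        _lock = RLock()
        # Annotating the empty dict gives mypy a concrete type instead of
        # suppressing the "Need type annotation" error.
        _options: Dict[str, Any] = {}
    ```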

##########
File path: python/pyspark/sql/types.pyi
##########
@@ -0,0 +1,238 @@
+#

Review comment:
       This has already been ported.

##########
File path: python/setup.py
##########
@@ -28,7 +28,8 @@
     print("Failed to load PySpark version file for packaging. You must be in 
Spark's python dir.",
           file=sys.stderr)
     sys.exit(-1)
-VERSION = __version__  # noqa
+
+VERSION = __version__  # type: ignore  # noqa

Review comment:
       I don't think it is required.

##########
File path: python/pyspark/util.pyi
##########
@@ -0,0 +1,33 @@
+#

Review comment:
       This has already been ported, though I am not sure if we're going to consider it a part of the API.

##########
File path: python/pyspark/storagelevel.py
##########
@@ -19,6 +19,14 @@
 
 
 class StorageLevel(object):
+    DISK_ONLY: 'StorageLevel'

Review comment:
       This is already covered by the stubs, and I think it is better to avoid annotations in `.py` files unless they're strictly needed.
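
    For reference, the stub already declares these attributes along the lines of the abbreviated sketch below (attribute names follow the real `StorageLevel` API; the stub body here is illustrative, not a verbatim copy):
   
    ```python
    class StorageLevel:
        DISK_ONLY: "StorageLevel"
        MEMORY_ONLY: "StorageLevel"
        MEMORY_AND_DISK: "StorageLevel"
        def __init__(
            self,
            useDisk: bool,
            useMemory: bool,
            useOffHeap: bool,
            deserialized: bool,
            replication: int = ...,
        ) -> None: ...
    ```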



