ueshin commented on a change in pull request #32886:
URL: https://github.com/apache/spark/pull/32886#discussion_r650257962



##########
File path: python/pyspark/pandas/window.py
##########
@@ -621,10 +633,16 @@ def var(self) -> Union["Series", "DataFrame"]:
 
 
 class RollingGroupby(Rolling):
-    def __init__(self, groupby, window, min_periods=None):
+    def __init__(
+        self,
+        groupby: Union["SeriesGroupBy", "DataFrameGroupBy"],
+        window: int,
+        min_periods: int = None,
+    ):
         from pyspark.pandas.groupby import SeriesGroupBy
         from pyspark.pandas.groupby import DataFrameGroupBy
 
+        psdf_or_psser: Union[DataFrame, Series]

Review comment:
       Shall we avoid variable type annotations for now?
   
   I guess this works.
   
   ```py
   if isinstance(groupby, SeriesGroupBy):
       psdf_or_psser = groupby._psser  # type: Union[DataFrame, Series]
   ...
   ```
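   
   For context, a fuller sketch of that branch using a type comment (the `DataFrameGroupBy` branch and its `_psdf` attribute are assumptions here, mirroring `_psser` on `SeriesGroupBy`):
   
   ```py
   if isinstance(groupby, SeriesGroupBy):
       # A type comment on the first assignment declares the variable's
       # type for every branch below, so later branches need no annotation.
       psdf_or_psser = groupby._psser  # type: Union[DataFrame, Series]
   elif isinstance(groupby, DataFrameGroupBy):
       psdf_or_psser = groupby._psdf  # assumed attribute, by analogy with _psser
   ```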

##########
File path: python/pyspark/pandas/window.py
##########
@@ -47,7 +54,7 @@ def __init__(self, psdf_or_psser, window, min_periods):
         )
         self._min_periods = min_periods
 
-    def _apply_as_series_or_frame(self, func):
+    def _apply_as_series_or_frame(self, func: Callable[[spark.Column], spark.Column]):

Review comment:
      Shall we use only one of `spark.Column` or `Column`?
   I don't have a strong preference, but we should pick one and use it consistently.
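   
   For example, a minimal sketch if we standardize on the bare `Column`, which this diff already imports from `pyspark.sql.column` (the enclosing class name below is a hypothetical stand-in):
   
   ```py
   from typing import Callable
   
   from pyspark.sql.column import Column
   
   class RollingLike:  # hypothetical stand-in for the real base class
       def _apply_as_series_or_frame(self, func: Callable[[Column], Column]):
           ...
   ```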

##########
File path: python/pyspark/pandas/window.py
##########
@@ -1401,10 +1419,11 @@ def var(self) -> Union["Series", "DataFrame"]:
 
 
 class ExpandingGroupby(Expanding):
-    def __init__(self, groupby, min_periods=1):
+    def __init__(self, groupby: Union["SeriesGroupBy", "DataFrameGroupBy"], min_periods: int = 1):
         from pyspark.pandas.groupby import SeriesGroupBy
         from pyspark.pandas.groupby import DataFrameGroupBy
 
+        psdf_or_psser: Union[DataFrame, Series]

Review comment:
       ditto.

##########
File path: python/pyspark/pandas/window.py
##########
@@ -723,7 +741,7 @@ def _apply_as_series_or_frame(self, func):
             data_fields=[c._internal.data_fields[0] for c in applied],
         )
 
-        ret = DataFrame(internal)
+        ret: DataFrame = DataFrame(internal)

Review comment:
       ditto.
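   
   Following the same type-comment pattern as above, if the annotation is still needed for mypy (a sketch):
   
   ```py
   ret = DataFrame(internal)  # type: DataFrame
   ```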

##########
File path: python/pyspark/pandas/window.py
##########
@@ -31,14 +32,20 @@
 
 from pyspark.pandas.internal import NATURAL_ORDER_COLUMN_NAME, SPARK_INDEX_NAME_FORMAT
 from pyspark.pandas.utils import scol_for
+from pyspark.sql.column import Column
+from pyspark.sql.window import WindowSpec
 
 if TYPE_CHECKING:
     from pyspark.pandas.frame import DataFrame  # noqa: F401 (SPARK-34943)
     from pyspark.pandas.series import Series  # noqa: F401 (SPARK-34943)
+    from pyspark.pandas.groupby import SeriesGroupBy
+    from pyspark.pandas.groupby import DataFrameGroupBy

Review comment:
       We need to mark these as `# noqa: F401 (SPARK-34943)`.
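   
   That is, matching the neighboring imports:
   
   ```py
   if TYPE_CHECKING:
       from pyspark.pandas.groupby import SeriesGroupBy  # noqa: F401 (SPARK-34943)
       from pyspark.pandas.groupby import DataFrameGroupBy  # noqa: F401 (SPARK-34943)
   ```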



