HyukjinKwon commented on code in PR #36127: URL: https://github.com/apache/spark/pull/36127#discussion_r846923142
########## python/pyspark/pandas/generic.py: ##########
@@ -3181,6 +3181,83 @@ def ffill(
     pad = ffill

+    # TODO: add 'axis', 'inplace', 'limit_direction', 'limit_area', 'downcast'
+    def interpolate(
+        self: FrameLike,
+        method: Optional[str] = None,
+        limit: Optional[int] = None,
+    ) -> FrameLike:
+        """
+        Fill NaN values using an interpolation method.
+
+        Parameters
+        ----------
+        method : str, default 'linear'
+            Interpolation technique to use. One of:
+
+            * 'linear': Ignore the index and treat the values as equally
+              spaced.
+
+        limit : int, optional
+            Maximum number of consecutive NaNs to fill. Must be greater than
+            0.
+
+        Returns
+        -------
+        Series or DataFrame or None
+            Returns the same object type as the caller, interpolated at
+            some or all NA values.
+
+        See Also
+        --------
+        fillna : Fill missing values using different methods.
+
+        Examples
+        --------
+        Filling in NA via linear interpolation.
+
+        >>> s = ps.Series([0, 1, np.nan, 3])
+        >>> s
+        0    0.0
+        1    1.0
+        2    NaN
+        3    3.0
+        dtype: float64
+        >>> s.interpolate()
+        0    0.0
+        1    1.0
+        2    2.0
+        3    3.0
+        dtype: float64
+
+        Fill the DataFrame forward (that is, going down) along each column
+        using linear interpolation.
+
+        Note how the last entry in column 'a' is interpolated differently,
+        because there is no entry after it to use for interpolation.
+        Note how the first entry in column 'b' remains NA, because there
+        is no entry before it to use for interpolation.
+
+        >>> df = ps.DataFrame([(0.0, np.nan, -1.0, 1.0),
+        ...                    (np.nan, 2.0, np.nan, np.nan),
+        ...                    (2.0, 3.0, np.nan, 9.0),
+        ...                    (np.nan, 4.0, -4.0, 16.0)],
+        ...                   columns=list('abcd'))
+        >>> df
+             a    b    c     d
+        0  0.0  NaN -1.0   1.0
+        1  NaN  2.0  NaN   NaN
+        2  2.0  3.0  NaN   9.0
+        3  NaN  4.0 -4.0  16.0
+        >>> df.interpolate(method='linear')
+             a    b    c     d
+        0  0.0  NaN -1.0   1.0
+        1  1.0  2.0 -2.0   5.0
+        2  2.0  3.0 -3.0   9.0
+        3  2.0  4.0 -4.0  16.0

Review Comment:
   I think we should probably add a `Notes` section describing that this API is expensive, because the Window functions will be executed within one executor.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
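Since pyspark.pandas aims to match pandas semantics, the docstring examples above can be sanity-checked locally with plain pandas (a sketch, assuming the proposed API mirrors pandas' default `limit_direction='forward'` behavior):

```python
# Reproduce the docstring examples with plain pandas, which the
# pyspark.pandas implementation is expected to match.
import numpy as np
import pandas as pd

# Series case: the interior NaN is linearly interpolated to 2.0.
s = pd.Series([0, 1, np.nan, 3])
si = s.interpolate()

# DataFrame case: interior NaNs are interpolated; a trailing NaN
# (column 'a') is filled forward, while a leading NaN (column 'b')
# stays NaN because there is no earlier value to interpolate from.
df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
                   (np.nan, 2.0, np.nan, np.nan),
                   (2.0, 3.0, np.nan, 9.0),
                   (np.nan, 4.0, -4.0, 16.0)],
                  columns=list('abcd'))
out = df.interpolate(method='linear')
print(si)
print(out)
```

This also illustrates why the reviewer's suggested `Notes` section matters: in pandas this is a cheap local operation, whereas the distributed version relies on Window functions evaluated within a single executor.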