This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3148511a923 [SPARK-43123][PS] Raise `TypeError` for `DataFrame.interpolate` when all columns are object-dtype
3148511a923 is described below

commit 3148511a923bf59ea37d8f44e7427cde66f9f167
Author: Haejoon Lee <haejoon....@databricks.com>
AuthorDate: Tue Sep 12 14:36:42 2023 +0800

    [SPARK-43123][PS] Raise `TypeError` for `DataFrame.interpolate` when all columns are object-dtype

    ### What changes were proposed in this pull request?

    This PR proposes to raise `TypeError` for `DataFrame.interpolate` when all columns are object-dtype.

    ### Why are the changes needed?

    To match the behavior of pandas:

    ```python
    >>> pd.DataFrame({"A": ['a', 'b', 'c'], "B": ['a', 'b', 'c']}).interpolate()
    ...
    TypeError: Cannot interpolate with all object-dtype columns in the DataFrame. Try setting at least one column to a numeric dtype.
    ```

    The pandas API on Spark currently returns an empty DataFrame instead of raising `TypeError`:

    ```python
    >>> ps.DataFrame({"A": ['a', 'b', 'c'], "B": ['a', 'b', 'c']}).interpolate()
    Empty DataFrame
    Columns: []
    Index: [0, 1, 2]
    ```

    ### Does this PR introduce _any_ user-facing change?

    Calling `DataFrame.interpolate` on a DataFrame whose columns are all object-dtype now raises `TypeError` instead of returning an empty DataFrame.

    ### How was this patch tested?

    Added a unit test.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #42878 from itholic/SPARK-45123.

    Authored-by: Haejoon Lee <haejoon....@databricks.com>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/pandas/frame.py                        | 5 +++++
 python/pyspark/pandas/tests/test_frame_interpolate.py | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index adbef607256..3aebbd65427 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -6097,6 +6097,11 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
             if isinstance(psser.spark.data_type, (NumericType, BooleanType)):
                 numeric_col_names.append(psser.name)
 
+        if len(numeric_col_names) == 0:
+            raise TypeError(
+                "Cannot interpolate with all object-dtype columns in the DataFrame. "
+                "Try setting at least one column to a numeric dtype."
+            )
         psdf = self[numeric_col_names]
         return psdf._apply_series_op(
             lambda psser: psser._interpolate(
diff --git a/python/pyspark/pandas/tests/test_frame_interpolate.py b/python/pyspark/pandas/tests/test_frame_interpolate.py
index 5b5856f7ab8..17c73781f8e 100644
--- a/python/pyspark/pandas/tests/test_frame_interpolate.py
+++ b/python/pyspark/pandas/tests/test_frame_interpolate.py
@@ -53,6 +53,11 @@ class FrameInterpolateTestsMixin:
         with self.assertRaisesRegex(ValueError, "invalid limit_area"):
             psdf.id.interpolate(limit_area="jump")
 
+        with self.assertRaisesRegex(
+            TypeError, "Cannot interpolate with all object-dtype columns in the DataFrame."
+        ):
+            ps.DataFrame({"A": ["a", "b", "c"], "B": ["a", "b", "c"]}).interpolate()
+
     def _test_interpolate(self, pobj):
         psobj = ps.from_pandas(pobj)
         self.assert_eq(psobj.interpolate(), pobj.interpolate())
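
For readers without a Spark session at hand, the guard added in `frame.py` can be illustrated with a minimal pure-Python sketch. It is not the pandas-on-Spark implementation: the helper names `numeric_column_names` and `check_interpolatable` are hypothetical, columns are modeled as a plain dict of lists, and "numeric" is approximated with `numbers.Number` rather than Spark's `NumericType`/`BooleanType`. Only the error message is taken from the patch.

```python
from numbers import Number

# Error message copied from the patch above.
MESSAGE = (
    "Cannot interpolate with all object-dtype columns in the DataFrame. "
    "Try setting at least one column to a numeric dtype."
)


def numeric_column_names(columns):
    """Hypothetical helper: names of columns whose non-null values are all numeric.

    bool counts as numeric here because it subclasses int, which loosely
    mirrors the patch accepting BooleanType alongside NumericType.
    """
    names = []
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        if non_null and all(isinstance(v, Number) for v in non_null):
            names.append(name)
    return names


def check_interpolatable(columns):
    """Hypothetical helper: same shape as the new guard.

    Collect the numeric columns first, then fail loudly if none were found
    instead of silently continuing with an empty selection.
    """
    names = numeric_column_names(columns)
    if len(names) == 0:
        raise TypeError(MESSAGE)
    return names


# A mixed frame keeps only its numeric column.
mixed = {"A": [1, None, 3], "B": ["a", "b", "c"]}
selected = check_interpolatable(mixed)  # selects only "A"

# An all-string frame raises, matching the behavior described in the commit.
try:
    check_interpolatable({"A": ["a", "b", "c"], "B": ["a", "b", "c"]})
except TypeError as exc:
    error_text = str(exc)
```

Placing the check before the column selection is what stops an empty selection from flowing into the per-series interpolation, which is how the empty DataFrame was previously produced.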