tvalentyn commented on code in PR #28422:
URL: https://github.com/apache/beam/pull/28422#discussion_r1327694184
##########
sdks/python/apache_beam/dataframe/frames_test.py:
##########
@@ -1900,14 +1916,35 @@ def test_dataframe_groupby_series(self, agg_type):
self.skipTest(
"https://github.com/apache/beam/issues/20967: proxy generation of "
"DataFrameGroupBy.describe fails in pandas < 1.2")
- self._run_test(
- lambda df: df[df.foo > 40].groupby(df.group).agg(agg_type),
- GROUPBY_DF,
- check_proxy=False)
- self._run_test(
- lambda df: df[df.foo > 40].groupby(df.foo % 3).agg(agg_type),
- GROUPBY_DF,
- check_proxy=False)
+
+ def agg(df, group_by, **kwargs):
Review Comment:
nit: i'd try to simplify this. i would try to mimick the anticipated usage.
If Pandas1:
test all aggregations from new list without extra parameters
otherwise:
set test `numeric_only`=True .
I don't think there is much value to check that Pandas 2 will throw error
for incorrect usage.
##########
sdks/python/apache_beam/dataframe/frames_test.py:
##########
@@ -1634,6 +1634,11 @@ def test_pivot_no_index_provided_on_multiindex(self):
# https://github.com/pandas-dev/pandas/issues/40139
ALL_GROUPING_AGGREGATIONS = sorted(
set(frames.ALL_AGGREGATIONS) - set(('kurt', 'kurtosis')))
+# In Pandas 2 all these change to having default False for numeric_only.
+# The other methods already started that way or don't have that argument.
+NUMERIC_ONLY_DEFAULT_TRUE_FOR_PANDAS_LT_2_AGGREGATIONS = set(
Review Comment:
```suggestion
AGGREGATIONS_WHERE_NUMERIC_ONLY_DEFAULTS_TO_TRUE_IN_PANDAS_1 = set(
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]