[ https://issues.apache.org/jira/browse/BEAM-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363989#comment-17363989 ]
Brian Hulette commented on BEAM-12495:
--------------------------------------

Sent https://github.com/apache/beam/pull/15019 which should detect this situation and raise a helpful error (rather than returning the wrong result). Unfortunately to actually fix this we will need to address the upstream bug: https://github.com/pandas-dev/pandas/issues/36470

> DataFrame API: groupby(dropna=False) still drops NAs when grouping on
> multiple columns or indexes
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-12495
>                 URL: https://issues.apache.org/jira/browse/BEAM-12495
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Priority: P2
>              Labels: dataframe-api
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> df.groupby(['foo', 'bar'], dropna=False).sum()
> {code}
> This will still drop NAs in the output.
> This is due to pandas bug
> [36470|https://github.com/pandas-dev/pandas/issues/36470] "BUG: groupby(...,
> dropna=False) excludes NA values when grouping on MultiIndex levels".
> We implement groupby by moving all grouped data into the index and requiring
> Index() partitioning, so we will always run into this issue, even when the
> user is grouping on columns, not indexes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
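
For readers who want to see the failure mode locally, here is a minimal sketch in plain pandas. It is not the code from the pull request above. The helper names `multiindex_groupby_keeps_nas` and `guarded_groupby_sum` are hypothetical, and the sketch assumes pandas 1.1 or later (where `groupby` accepts `dropna`). It moves the grouping keys into a MultiIndex, as the issue describes, probes whether the installed pandas keeps the NA groups, and raises an error instead of returning a possibly wrong result.

```python
import numpy as np
import pandas as pd


def multiindex_groupby_keeps_nas() -> bool:
    """Probe the installed pandas for the behaviour described in pandas issue 36470.

    Hypothetical helper: returns True if grouping on MultiIndex levels with
    dropna=False keeps the groups whose keys contain NA values.
    """
    df = pd.DataFrame({
        "foo": ["a", "a", None],
        "bar": [1.0, np.nan, 2.0],
        "value": [10, 20, 30],
    })
    # Move the grouping keys into the index, then group on the index levels.
    indexed = df.set_index(["foo", "bar"])
    result = indexed.groupby(level=["foo", "bar"], dropna=False).sum()
    # There are three distinct key pairs; two of them contain an NA.
    return len(result) == 3


def guarded_groupby_sum(df: pd.DataFrame, by, dropna: bool = True) -> pd.DataFrame:
    """Hypothetical guard: fail loudly rather than silently drop NA groups."""
    keys = [by] if isinstance(by, str) else list(by)
    if not dropna and len(keys) > 1 and not multiindex_groupby_keeps_nas():
        raise NotImplementedError(
            "groupby(dropna=False) on multiple keys may drop NA groups with "
            "this pandas version; see pandas issue 36470."
        )
    return df.groupby(keys, dropna=dropna).sum()
```

Whether the probe returns True or False depends on the installed pandas version, so no expected output is given here.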