[ https://issues.apache.org/jira/browse/BEAM-11393?focusedWorklogId=519832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519832 ]
ASF GitHub Bot logged work on BEAM-11393: ----------------------------------------- Author: ASF GitHub Bot Created on: 03/Dec/20 19:25 Start Date: 03/Dec/20 19:25 Worklog Time Spent: 10m Work Description: TheNeuralBit opened a new pull request #13476: URL: https://github.com/apache/beam/pull/13476 The PR also includes some other minor changes: - Links BEAM-11393 in the error raised when grouping by Series - PartitioningSession cleanup - moves random state restoration to avoid test_partition_fn consuming random values - now always expects Singleton partitioning when input is Singleton (regardless of preserves spec), since it should always be trivially preserved. Post-Commit Tests Status (on master branch) ------------------------------------------------------------------------------------------------ Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | --- XLang | [](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | --- Pre-Commit Tests Status (on master branch) ------------------------------------------------------------------------------------------------ --- |Java | Python | Go | Website | Whitespace | Typescript --- | --- | --- | --- | --- | --- | --- Non-portable | [](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/) Portable | --- | [](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | --- See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs. GitHub Actions Tests Status (on master branch) ------------------------------------------------------------------------------------------------ [](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 519832) Remaining Estimate: 0h Time Spent: 10m > Support grouping by a Series > ---------------------------- > > Key: BEAM-11393 > URL: https://issues.apache.org/jira/browse/BEAM-11393 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core > Reporter: Brian Hulette > Assignee: Brian Hulette > Priority: P2 > Time Spent: 10m > Remaining Estimate: 0h > > grouping by a Series (e.g. \{{df.groupby(df.column)}}, > \{{series.groupby(other_series)}}) does not work. The previous implementation > relied on aligning the index between the two deferred frames, but it's > possible that one or both frames will have duplicate values in their index. > Leading to the following error at execution time: > {code} > Traceback (most recent call last): > > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/doctests.py", > line 237, in fix > > computed = self.compute(to_compute) > > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/doctests.py", > line 195, in compute_using_session > return { > > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/doctests.py", > line 196, in <dictcomp> > name: frame._expr.evaluate_at(session) > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", > line 329, in evaluate_at > return self._func(*(session.evaluate(arg) for arg in self._args)) > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", > line 329, in <genexpr> > return self._func(*(session.evaluate(arg) for arg in self._args)) > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", > line 144, in evaluate > result = evaluate_with(input_partitioning) > > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", > line 114, in evaluate_with > results.append(session.evaluate(expr)) > > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", > line 42, in evaluate > self._bindings[expr] = expr.evaluate_at(self) > > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", > line 329, in evaluate_at > return self._func(*(session.evaluate(arg) for arg in self._args)) > > > File > "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/frames.py", > line 149, in set_index > df, by = df.align(by, axis=0, join='inner') > > > File > "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/frame.py", > line 3962, in align > return super().align( > > File > "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/generic.py", > line 8559, in align > return self._align_series( > File > "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/generic.py", > line 8681, in _align_series > > fdata = fdata.reindex_indexer(join_index, lidx, axis=1) > > > File > "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/internals/managers.py", > line 1276, in reindex_indexer > self.axes[axis]._can_reindex(indexer) > File > "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/indexes/base.py", > line 3289, in _can_reindex > raise ValueError("cannot reindex from > a duplicate axis") > ValueError: cannot reindex from a duplicate axis > {code} > Discovered in https://github.com/apache/beam/pull/13401, GHA run: > https://github.com/apache/beam/runs/1445605501 -- This message was sent by Atlassian Jira (v8.3.4#803005)