[ https://issues.apache.org/jira/browse/BEAM-11393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-11393:
---------------------------------
    Labels:   (was: stale-P2)

> Support grouping by a Series
> ----------------------------
>
>                 Key: BEAM-11393
>                 URL: https://issues.apache.org/jira/browse/BEAM-11393
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Priority: P3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Grouping by a Series (e.g. {{df.groupby(df.column)}},
> {{series.groupby(other_series)}}) does not work. The previous implementation
> relied on aligning the index between the two deferred frames, but one or both
> frames may have duplicate values in their index, which leads to the following
> error at execution time:
> {code}
> Traceback (most recent call last):
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/doctests.py", line 237, in fix
>     computed = self.compute(to_compute)
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/doctests.py", line 195, in compute_using_session
>     return {
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/doctests.py", line 196, in <dictcomp>
>     name: frame._expr.evaluate_at(session)
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", line 329, in evaluate_at
>     return self._func(*(session.evaluate(arg) for arg in self._args))
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", line 329, in <genexpr>
>     return self._func(*(session.evaluate(arg) for arg in self._args))
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", line 144, in evaluate
>     result = evaluate_with(input_partitioning)
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", line 114, in evaluate_with
>     results.append(session.evaluate(expr))
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", line 42, in evaluate
>     self._bindings[expr] = expr.evaluate_at(self)
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/expressions.py", line 329, in evaluate_at
>     return self._func(*(session.evaluate(arg) for arg in self._args))
>   File "/usr/local/google/home/bhulette/working_dir/beam/sdks/python/apache_beam/dataframe/frames.py", line 149, in set_index
>     df, by = df.align(by, axis=0, join='inner')
>   File "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/frame.py", line 3962, in align
>     return super().align(
>   File "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/generic.py", line 8559, in align
>     return self._align_series(
>   File "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/generic.py", line 8681, in _align_series
>     fdata = fdata.reindex_indexer(join_index, lidx, axis=1)
>   File "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1276, in reindex_indexer
>     self.axes[axis]._can_reindex(indexer)
>   File "/usr/local/google/home/bhulette/.pyenv/versions/beam/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3289, in _can_reindex
>     raise ValueError("cannot reindex from a duplicate axis")
> ValueError: cannot reindex from a duplicate axis
> {code}
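The last frame of the traceback is pandas refusing to reindex along an axis whose labels repeat. The sketch below reproduces that class of error in plain pandas, without Beam; it illustrates the pandas behavior only and is not the Beam code path. The helper `reindex_raises` and the sample Series are hypothetical, and the exact error message differs between pandas versions (older releases say "cannot reindex from a duplicate axis", newer ones use different wording).

```python
import pandas as pd


def reindex_raises(series: pd.Series, new_index) -> bool:
    """Hypothetical helper: return True if pandas refuses to reindex `series`."""
    try:
        series.reindex(new_index)
    except ValueError as exc:
        # The wording of this message varies across pandas versions.
        print(f"ValueError: {exc}")
        return True
    return False


dup = pd.Series([10, 20, 30], index=[0, 0, 1])   # label 0 appears twice
uniq = pd.Series([10, 20, 30], index=[0, 1, 2])  # all labels distinct

print(dup.index.is_unique)                 # False
print(reindex_raises(dup, [0, 1, 2]))      # prints the error, then True
print(reindex_raises(uniq, [0, 1, 2, 3]))  # False; label 3 is filled with NaN
```

Checking `Index.is_unique` up front is one way to detect inputs for which label-based reindexing cannot work.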
> Discovered in https://github.com/apache/beam/pull/13401, GHA run: 
> https://github.com/apache/beam/runs/1445605501
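To illustrate why index alignment does not work for this operation, the plain-pandas sketch below (no Beam involved; the frame and column names are made up) contrasts pandas' own `groupby` with a label-based join. When the grouping Series has exactly the same index as the frame, pandas in effect pairs rows by position, so a repeated label is harmless. Pairing rows by index label instead (here an inner `pd.merge` on the index, standing in for label-based alignment) matches every row carrying a given label with every other row carrying that label.

```python
import pandas as pd

# Hypothetical data: label 0 appears twice in the index.
df = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]}, index=[0, 0, 1])

# pandas groups by a Series whose index equals the frame's index without
# looking up index labels, so the duplicate label does not matter.
sums = df.groupby(df["key"])["value"].sum()
print(sums.to_dict())  # {'a': 4, 'b': 2}

# An inner join on the index matches each left row labelled 0 with each right
# row labelled 0: 2 * 2 + 1 * 1 = 5 rows, not the 3 row-for-row pairs needed.
joined = pd.merge(df[["value"]], df[["key"]], left_index=True, right_index=True)
print(len(joined))  # 5
```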



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
