TheNeuralBit commented on a change in pull request #12812:
URL: https://github.com/apache/beam/pull/12812#discussion_r488296111



##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -551,9 +559,13 @@ def merge(
       return merged.reset_index(drop=True)
 
   @frame_base.args_to_kwargs(pd.DataFrame)
-  def nlargest(self, **kwargs):
-    if 'keep' in kwargs and kwargs['keep'] != 'all':
+  @frame_base.populate_defaults(pd.DataFrame)
+  def nlargest(self, keep, **kwargs):
+    if keep == 'any':
+      keep = 'first'

Review comment:
       `any` is not part of the pandas API. I assume this is an option you're 
adding to provide a way to get exactly `n` without promising to get the first 
or last since that's order-sensitive?
   
   That makes sense - but we should probably document the places where we 
diverge from pandas like this (could just be a TODO here and/or a jira for now).

##########
File path: sdks/python/apache_beam/dataframe/expressions.py
##########
@@ -45,6 +45,45 @@ def lookup(self, expr):  #  type: (Expression) -> Any
     return self._bindings[expr]
 
 
+class PartitioningSession(Session):
+  def evaluate(self, expr):

Review comment:
       ```suggestion
   class PartitioningSession(Session):
     """An extension of Session that ensures input dataframes are split into at 
least `len(df)` separate partitions and input in a random order. For testing 
only.
     """
     def evaluate(self, expr):
   ```
   
   Or something like that :)

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -551,9 +559,13 @@ def merge(
       return merged.reset_index(drop=True)
 
   @frame_base.args_to_kwargs(pd.DataFrame)
-  def nlargest(self, **kwargs):
-    if 'keep' in kwargs and kwargs['keep'] != 'all':
+  @frame_base.populate_defaults(pd.DataFrame)
+  def nlargest(self, keep, **kwargs):
+    if keep == 'any':
+      keep = 'first'

Review comment:
       Also: I take it PartitioningSession revealed this problem with 
first/last? That's good to know, I was wondering about these options earlier 
but figured I was misunderstanding something.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to