[GitHub] [beam] TheNeuralBit commented on a change in pull request #14517: [BEAM-12029] Make WontImplementErrors more helpful for order-sensitive operations

GitBox Wed, 14 Apr 2021 15:16:22 -0700


TheNeuralBit commented on a change in pull request #14517:
URL: https://github.com/apache/beam/pull/14517#discussion_r613621517




##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1393,9 +1430,16 @@ def fill_dataframe(*args):
 
 
 
-  cummax = cummin = cumsum = cumprod = frame_base.wont_implement_method(
-      'order-sensitive')
-  diff = frame_base.wont_implement_method('order-sensitive')
+  cummax = frame_base.order_sensitive_method(pd.DataFrame, 'cummax')
+  cummin = frame_base.order_sensitive_method(pd.DataFrame, 'cummin')
+  cumprod = frame_base.order_sensitive_method(pd.DataFrame, 'cumprod')
+  cumsum = frame_base.order_sensitive_method(pd.DataFrame, 'cumsum')
+  diff = frame_base.order_sensitive_method(pd.DataFrame, 'diff')

Review comment:
       The problem isn't that the individual operations are difficult to make 
efficient; it's that the underlying PCollections have no concept of ordering. 
So at execution time we don't know which row is supposed to come before a 
given row. In all likelihood, that row is on another worker due to our 
hash-based partitioning.
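   To make the failure mode concrete, here is a toy, single-process illustration. It assumes a hypothetical `hash_partition` helper that buckets rows by `hash(index_label) % num_partitions`; this is a stand-in for hash-based partitioning, not Beam's actual partitioner. Each "worker" computes `cumsum` over only the rows it holds, and the stitched-together result disagrees with the single-node pandas answer because each partition is missing the rows that precede it on other partitions.

```python
import pandas as pd


def hash_partition(frame, num_partitions):
    """Toy stand-in for hash partitioning: bucket rows by hash of index label."""
    buckets = [hash(label) % num_partitions for label in frame.index]
    return [frame[[b == i for b in buckets]] for i in range(num_partitions)]


def local_cumsum(frame, num_partitions):
    """Each 'worker' runs cumsum over only its own partition's rows."""
    parts = hash_partition(frame, num_partitions)
    return pd.concat([part.cumsum() for part in parts]).sort_index()


df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
expected = df.cumsum()["x"].tolist()
partitioned = local_cumsum(df, 2)["x"].tolist()
print(expected)     # [1, 3, 6, 10, 15, 21]
print(partitioned)  # [1, 2, 4, 6, 9, 12]
```

   Small Python ints hash to themselves, so this toy partitioner deterministically sends even labels to one partition and odd labels to the other, which keeps the output reproducible.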
   
   @robertwb and I have talked about implementing order-sensitive methods in 
the future (I recently filed BEAM-12129 to track this). I think the way it 
would work is that we would allow users to perform an operation like 
`sort_values` that _imposes_ an ordering, and then it would be possible to 
perform order-sensitive operations on the output.
   
   Under the hood we would partition "sorted" dataframes based on their 
ordering, rather than effectively at random with hash-based partitioning.
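   One way partitioning by ordering could make something like `cumsum` tractable is sketched below. This is a toy, single-process model under assumptions: `range_partition` and `range_partitioned_cumsum` are hypothetical helpers, not Beam APIs, and the two-pass "per-partition totals, then offsets" scheme is a standard prefix-sum technique over range partitions, not necessarily the design that BEAM-12129 will adopt. Because each partition holds a contiguous slice of the ordering, it only needs one number from its predecessors: the sum of everything before it.

```python
import pandas as pd


def range_partition(frame, by, num_partitions):
    """Toy range partitioning: contiguous slices of the frame's sort order."""
    ordered = frame.sort_values(by)
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    return [ordered.iloc[i:i + size] for i in range(0, len(ordered), size)]


def range_partitioned_cumsum(frame, by, column, num_partitions):
    """Cumulative sum of `column` in `by` order, computed partition by partition."""
    parts = range_partition(frame, by, num_partitions)
    # Pass 1: each partition reports only its total (one value per partition).
    totals = [part[column].sum() for part in parts]
    # Pass 2: each partition offsets its local cumsum by the totals of the
    # partitions that precede it in the ordering.
    results, offset = [], 0
    for part, total in zip(parts, totals):
        results.append(part[column].cumsum() + offset)
        offset += total
    return pd.concat(results) if results else frame[column].iloc[:0]


df = pd.DataFrame({"t": [5, 2, 6, 1, 4, 3], "x": [50, 20, 60, 10, 40, 30]})
distributed = range_partitioned_cumsum(df, "t", "x", num_partitions=3)
single_node = df.sort_values("t")["x"].cumsum()
print(distributed.tolist())  # [10, 30, 60, 100, 150, 210]
```

   Only the per-partition totals cross partition boundaries, which is why this works with range partitioning; under hash partitioning, no partition holds a contiguous slice, so there is no small summary it could receive from "preceding" partitions.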



