[GitHub] [beam] TheNeuralBit commented on a change in pull request #13401: Add additional verification in PartitioningSession

GitBox Mon, 23 Nov 2020 10:51:15 -0800


TheNeuralBit commented on a change in pull request #13401:
URL: https://github.com/apache/beam/pull/13401#discussion_r528923757




##########
File path: sdks/python/apache_beam/dataframe/expressions.py
##########
@@ -48,10 +48,15 @@ def lookup(self, expr):  #  type: (Expression) -> Any
 class PartitioningSession(Session):
   """An extension of Session that enforces actual partitioning of inputs.
 
-  When evaluating an expression, inputs are partitioned according to its
-  `requires_partition_by` specifications, the expression is evaluated on each
-  partition separately, and the final result concatinated, as if this were
-  actually executed in a parallel manner.
+  Each expression is evaluated multiple times for various supported

Review comment:
   Yeah, I think the performance impact is worth it. We could probably reduce it by being a little more discerning about what gets re-executed. For example, I don't think there's any value in doing this for expressions that have preserves=Nothing().
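A minimal toy sketch of the multi-run verification idea, assuming a pure-Python stand-in for Beam expressions. `ToyExpression`, `split`, `evaluate_partitioned` and `verified_evaluate` are hypothetical names, not Beam's API, and the `preserves_nothing` flag only mimics the preserves=Nothing() case mentioned above. The expression is evaluated under several partition counts and the results are compared, with the extra runs skipped when the flag is set.

```python
from typing import Callable, List, Sequence, Tuple


class ToyExpression:
    """Hypothetical stand-in for a Beam dataframe expression."""

    def __init__(self, func: Callable[[List[int]], List[int]],
                 preserves_nothing: bool = False):
        self.func = func  # applied to each partition independently
        self.preserves_nothing = preserves_nothing  # mimics preserves=Nothing()


def split(data: Sequence[int], num_partitions: int) -> List[List[int]]:
    """Round-robin split of data into num_partitions (possibly empty) lists."""
    parts: List[List[int]] = [[] for _ in range(num_partitions)]
    for i, x in enumerate(data):
        parts[i % num_partitions].append(x)
    return parts


def evaluate_partitioned(expr: ToyExpression, data: Sequence[int],
                         num_partitions: int) -> List[int]:
    """Evaluate expr on each partition separately and concatenate the results."""
    result: List[int] = []
    for part in split(data, num_partitions):
        result.extend(expr.func(part))
    # Sort so results are compared as multisets; row order is not meaningful here.
    return sorted(result)


def verified_evaluate(expr: ToyExpression, data: Sequence[int],
                      partition_counts: Tuple[int, ...] = (1, 2, 3)) -> List[int]:
    """Evaluate once, then re-evaluate under other partitionings and compare."""
    baseline = evaluate_partitioned(expr, data, partition_counts[0])
    if expr.preserves_nothing:
        # Skip the extra runs, as the review comment suggests for these expressions.
        return baseline
    for n in partition_counts[1:]:
        if evaluate_partitioned(expr, data, n) != baseline:
            raise AssertionError(f"result changed with {n} partitions")
    return baseline
```

For example, an element-wise doubling gives the same result for any partition count, while a per-partition max over [3, 1, 2] yields [3] with one partition but [1, 3] with two, so the re-execution exposes that it depends on how the input was partitioned.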
   
   Really good point about the random seed. I could bracket the runs with calls to random.getstate() and random.setstate() to save and restore the random state around them.
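A minimal sketch of that bracketing with Python's standard `random` module; `run_with_same_random_state` is a hypothetical helper, not part of PartitioningSession. It snapshots the global state once and restores it before every run, so each re-execution sees the same random draws. It only covers the `random` module: NumPy keeps its own global generator, which would need `numpy.random.get_state()` / `numpy.random.set_state()` handled the same way.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_same_random_state(func: Callable[[], T], num_runs: int) -> List[T]:
    """Call func num_runs times, rewinding the global random state before each call."""
    state = random.getstate()  # snapshot taken once, before any run
    results: List[T] = []
    for _ in range(num_runs):
        random.setstate(state)  # every run starts from the same snapshot
        results.append(func())
    # If func consumes the same random draws on every run, the module is now in
    # the same state it would be in after a single run.
    return results
```

Restoring before each run, rather than only once at the end, is what makes the runs comparable: without it, the second run would draw different numbers and could produce a spurious mismatch.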



