jorgecarleitao commented on pull request #288:
URL: https://github.com/apache/arrow-datafusion/pull/288#issuecomment-839705580


   AFAIK `current_*` are all derived from `now`; imo the differentiator aspect 
here is that there is some state `X` that is being shared.
   
   It seems to me that the use-case here is that we want to preserve state 
across nodes, so that their execution depends on said state. `NOW` is an 
example, but in reality, `random` is also an example; we "cheated" a bit by not 
allowing users to select a seed. If they want that, we hit the same problem as 
NOW.
   
   IMO a natural construct here is something like `struct StatefulFunction<T: 
Send + Sync>`, where `T` is the state, and `Arc<T>` is inside of it, and that 
implements `PhysicalExpr`. During planning, the initial state is passed to it 
from the planner, and we are ready to fly.
   
   For Ballista, we would probably need `T` to also be "proto-able", but we can 
handle that on a future PR when we add support for this node to Ballista.
   
   IMO this would enable us to implement `now`, but also a lot of other 
functions that share state (e.g. pass `T: Mutex<something>` and we have a 
function that can change state during evaluation, which AFAIK is what we need 
for window functions, whereby the state is updated as batches pass through it).
   
   The `ScalarFunction` construct was meant to be stateless because it makes it 
very easy to develop, and it also makes it obvious that is stateless. Trying to 
couple execution state to them is imo going beyond its scope.
   
   IMO a `PhysicalExpr` is not really that difficult to implement; it is just a 
`struct` with `evaluate`, `schema`, and a new match on the physical planner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to