alamb opened a new issue #340:
URL: https://github.com/apache/arrow-datafusion/issues/340


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   On a PR that added what postgres would term a `stable` function (something 
that is not the same from transaction to transaction, but that is not purely a 
function of its inputs either), namely `now()`, @jorgecarleitao suggested 
adding a concept of a `StatefulFunction` for functions that need state, 
unlike `ScalarFunction`, which is designed to be stateless. 
   
   There is a lot of discussion at 
https://github.com/apache/arrow-datafusion/pull/288#issuecomment-839705580 
which I will try to summarize here:
   
   @jorgecarleitao :
   
   > AFAIK current_* are all derived from now; imo the differentiator aspect 
here is that there is some state X that is being shared.
   >
   > It seems to me that the use-case here is that we want to preserve state 
across nodes, so that their execution depends on said state. NOW is an example, 
but in reality, random is also an example; we "cheated" a bit by not allowing 
users to select a seed. If they want that, we hit the same problem as NOW.
   >
   > IMO a natural construct here is something like `struct StatefulFunction<T: 
Send + Sync>`, where `T` is the state, and `Arc<T>` is inside of it, and that 
implements `PhysicalExpr`. During planning, the initial state is passed to it 
from the planner, and we are ready to fly.
   >
   > The `ScalarFunction` construct was meant to be stateless because it makes 
it very easy to develop, and it also makes it obvious that it is stateless. 
Trying to couple execution state to it is imo going beyond its scope.
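
   To make the suggestion above concrete, here is a minimal sketch of what 
such a construct might look like. It is not DataFusion code: the `Expr` trait 
is a hypothetical stand-in for `PhysicalExpr`, and `QueryState`, `now_impl` and 
`plan_now` are names invented for illustration. The only idea taken from the 
discussion is that the state `T` lives in an `Arc<T>` handed over at planning 
time.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for DataFusion's `PhysicalExpr`.
/// It returns one `i64` per input row instead of an Arrow array.
trait Expr {
    fn evaluate(&self, num_rows: usize) -> Vec<i64>;
}

/// Hypothetical state captured once per query during planning.
struct QueryState {
    query_start_nanos: i64,
}

/// A function carrying shared, read-only state of type `T`.
struct StatefulFunction<T: Send + Sync> {
    state: Arc<T>,
    fun: fn(&T, usize) -> Vec<i64>,
}

impl<T: Send + Sync> StatefulFunction<T> {
    fn new(state: Arc<T>, fun: fn(&T, usize) -> Vec<i64>) -> Self {
        Self { state, fun }
    }
}

impl<T: Send + Sync> Expr for StatefulFunction<T> {
    fn evaluate(&self, num_rows: usize) -> Vec<i64> {
        // The function only ever sees the state it was planned with.
        (self.fun)(&*self.state, num_rows)
    }
}

/// `now()` reads the timestamp from the state rather than the system clock,
/// so every call within the query observes the same value.
fn now_impl(state: &QueryState, num_rows: usize) -> Vec<i64> {
    vec![state.query_start_nanos; num_rows]
}

/// What a planner might do: share one state across every `now()` node.
fn plan_now(state: Arc<QueryState>) -> StatefulFunction<QueryState> {
    StatefulFunction::new(state, now_impl)
}
```

   Because each node holds a clone of the same `Arc`, two `now()` expressions 
planned for one query return identical values regardless of when each is 
evaluated.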
   
   @returnString 
   > In Postgres, this sort of corresponds to the function volatility 
categories (https://www.postgresql.org/docs/13/xfunc-volatility.html) which 
might be a useful basis for any future definition of different function types.
   >
   > - `immutable`: pure function, can only use arguments and internal 
constants (example: basic math ops). Optimiser can do lots here
   > - `stable`: can refer to shared state but must return the same value for 
the same arguments within a given statement (example: `now`). Optimiser is 
allowed to unify all references into one call per unique set of arguments 
   > - `volatile`: no rules, no optimiser potential! Must always be evaluated 
exactly as initially planned (example: `random`)
   >
   > ...
   > Off the top of my head I think it'll open up some potential for 
generalised optimisation passes over function usage in queries according to 
function class, i.e. the optimiser rule used for the initial implementation of 
this PR but applicable to arbitrary functions provided they indicate themselves 
to be "stable".
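
   As a rough illustration of how the volatility categories above could drive 
a generic pass, the sketch below evaluates a list of calls for one statement 
and unifies repeated `immutable` and `stable` calls into one evaluation per 
unique set of arguments, while `volatile` calls are always evaluated. All names 
here (`Volatility`, `Call`, `volatility_of`, `evaluate_statement`) are 
hypothetical, and treating unknown functions as volatile is an assumption 
chosen because it is the conservative default.

```rust
use std::collections::HashMap;

/// Function classes modelled on the Postgres volatility categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

/// Hypothetical, simplified function call: a name plus integer arguments.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Call {
    name: String,
    args: Vec<i64>,
}

/// Hypothetical registry lookup. Unknown functions are assumed volatile,
/// since that is the only class that is always safe to assume.
fn volatility_of(name: &str) -> Volatility {
    match name {
        "abs" => Volatility::Immutable,
        "now" => Volatility::Stable,
        _ => Volatility::Volatile,
    }
}

/// Evaluates every call of a single statement. Immutable and stable calls
/// are evaluated once per unique (name, args) pair; volatile calls are
/// evaluated every time they appear.
fn evaluate_statement(calls: &[Call], mut eval: impl FnMut(&Call) -> i64) -> Vec<i64> {
    let mut cache: HashMap<Call, i64> = HashMap::new();
    let mut results = Vec::with_capacity(calls.len());
    for call in calls {
        let value = match volatility_of(&call.name) {
            Volatility::Volatile => eval(call),
            Volatility::Immutable | Volatility::Stable => {
                if let Some(cached) = cache.get(call) {
                    *cached
                } else {
                    let fresh = eval(call);
                    cache.insert(call.clone(), fresh);
                    fresh
                }
            }
        };
        results.push(value);
    }
    results
}
```

   The cache lives only for the duration of one `evaluate_statement` call, 
which matches the "within a given statement" guarantee for stable functions; 
a real optimiser rule would rewrite the plan rather than cache at run time.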
   
   cc @returnString @jorgecarleitao @msathis @Dandandan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

