findepi commented on issue #8051:
URL: https://github.com/apache/datafusion/issues/8051#issuecomment-2348333254

   I am not currently using Ballista. I realized the plan serialization concern 
is easy to address if we split simplify into two phases: first, the 
Expr-constraint simplify (e.g. pruning args known to be null), which would run 
during the plan optimization phase; second, a local execution simplify, which 
allows a function to "compile itself" into its most optimal form, with no need 
for serialization afterwards. Ballista would serialize and distribute the 
plans between these two phases.
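
   To make the two phases concrete, here is a minimal sketch using only the 
standard library. Everything in it (`Expr`, `simplify_expr`, `compile_local`, 
`Compiled`) is a hypothetical stand-in I made up for illustration, not a 
DataFusion or Ballista API. Phase 1 maps an `Expr` to an `Expr`, so its output 
is still plain data that could be serialized; phase 2 produces a closure, 
which cannot be serialized and therefore has to run on the executing node.

```rust
// Hypothetical, simplified expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Null,
    Literal(String),
    Column(String),
    /// coalesce(args...)
    Coalesce(Vec<Expr>),
    /// starts_with(value, prefix)
    StartsWith(Box<Expr>, Box<Expr>),
}

/// Phase 1 (plan optimization): Expr in, Expr out.
/// Here it only prunes `coalesce` args that are known to be null.
fn simplify_expr(expr: Expr) -> Expr {
    match expr {
        Expr::Coalesce(args) => {
            let mut kept: Vec<Expr> = args
                .into_iter()
                .map(simplify_expr)
                .filter(|arg| *arg != Expr::Null)
                .collect();
            match kept.len() {
                0 => Expr::Null,
                1 => kept.remove(0),
                _ => Expr::Coalesce(kept),
            }
        }
        Expr::StartsWith(value, prefix) => Expr::StartsWith(
            Box::new(simplify_expr(*value)),
            Box::new(simplify_expr(*prefix)),
        ),
        other => other,
    }
}

/// The "compiled" form: an opaque closure, usable only in-process.
type Compiled = Box<dyn Fn(&str) -> bool>;

/// Phase 2 (local execution): specialize a function call whose prefix
/// is a literal. Returns `None` when no specialization applies.
fn compile_local(expr: &Expr) -> Option<Compiled> {
    match expr {
        Expr::StartsWith(value, prefix) => match (value.as_ref(), prefix.as_ref()) {
            (Expr::Column(_), Expr::Literal(prefix)) => {
                let prefix = prefix.clone();
                Some(Box::new(move |value: &str| value.starts_with(prefix.as_str())))
            }
            _ => None,
        },
        _ => None,
    }
}
```

   In this sketch, `starts_with(s, coalesce(NULL, 'ab'))` cannot be compiled 
as-is, but after `simplify_expr` it becomes `starts_with(s, 'ab')`, which 
`compile_local` can specialize. A distributed engine would ship the output of 
phase 1 and call phase 2 on each executor.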
   
   BTW, so far we have focused on compiling regular expressions, but we 
haven't thought about the memory needed to execute them. 
   Internally, `regex::Regex::is_match` uses a synchronized pool of "caches" 
(regex execution scratch space). I don't know if this is a perf problem 
(probably not!), but let me use it as an example. It would probably be good 
if, at runtime, a scalar function could have its own thread-local "scratch 
space" / "local buffer", but without having to use actual thread locals, which 
aren't great if DF is embedded and doesn't control thread creation.
   
   Why am I mentioning this? I thought that if we had "scratch space" / 
"local buffer" support, we might not need to "compile functions" during 
planning at all.
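
   A rough sketch of what I mean by "scratch space", using only the standard 
library. The trait `ScalarWithScratch`, the function `run_partition`, and the 
example function are all hypothetical; DataFusion has no such hook today as 
far as this comment is concerned. The idea is that the operator running a 
partition owns the scratch value and passes it in by `&mut`, so there is no 
synchronization and no thread local. The "compile" step (here just 
lower-casing the needle) happens when the scratch is created at execution 
time, so the function itself stays plain, serializable data.

```rust
/// Hypothetical hook: a scalar function that gets per-task mutable state.
trait ScalarWithScratch {
    type Scratch;
    /// Called once per task/partition at execution time.
    fn new_scratch(&self) -> Self::Scratch;
    /// Called once per batch; may freely mutate the scratch.
    fn invoke(&self, scratch: &mut Self::Scratch, batch: &[&str]) -> Vec<bool>;
}

/// The function as planned: plain data only.
struct ContainsIgnoreAsciiCase {
    needle: String,
}

/// Execution-time state: derived ("compiled") data plus a reusable buffer.
struct ContainsScratch {
    needle_lower: String,
    buf: String,
}

impl ScalarWithScratch for ContainsIgnoreAsciiCase {
    type Scratch = ContainsScratch;

    fn new_scratch(&self) -> ContainsScratch {
        ContainsScratch {
            needle_lower: self.needle.to_ascii_lowercase(),
            buf: String::new(),
        }
    }

    fn invoke(&self, scratch: &mut ContainsScratch, batch: &[&str]) -> Vec<bool> {
        batch
            .iter()
            .map(|value| {
                // Reuse the buffer's allocation instead of allocating per row.
                scratch.buf.clear();
                scratch.buf.push_str(value);
                scratch.buf.make_ascii_lowercase();
                scratch.buf.contains(scratch.needle_lower.as_str())
            })
            .collect()
    }
}

/// Stand-in for an operator: one scratch per partition, reused across batches.
fn run_partition<F: ScalarWithScratch>(func: &F, batches: &[Vec<&str>]) -> Vec<Vec<bool>> {
    let mut scratch = func.new_scratch();
    batches
        .iter()
        .map(|batch| func.invoke(&mut scratch, batch))
        .collect()
}
```

   With a regex-based function, the scratch could in principle hold the 
compiled pattern and its execution cache in the same way, but that part is 
speculation on my side and not shown here.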
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

