Hello Charles,

Although the idea seems good in general, an actual implementation may cause more problems than it solves. With a distributed cache, every cache miss on the current drillbit would trigger a network request to the other drillbits, passing the function arguments along with the IDs of the executable query fragments.

A local cache is more suitable, of course, but caching really pays off only when arguments repeat, and especially when identical arguments arrive one after another (for example, when rows are presorted by the arguments before the function executes). When the arguments are mostly distinct, caching causes heavy memory overhead.

One case where caching may perform well is when the rows are known to be sorted by the function arguments, so that only a single function result needs to be cached locally for each run of repeated rows. For example, suppose that we are executing the query

*select x, y, slow_function(x,y) from (select x, y from dfs.`large_table` order by x, y)*

x  y  slow_function(x,y)
1  1  (calculate and cache)
1  1  (get cached)
1  1  (get cached)
1  1  (get cached)
1  2  (calculate and cache)
1  2  (get cached)
1  2  (get cached)

In such a case, the heavy logic in *slow_function(x,y)* will be executed only twice for these rows. But the ordering by x, y will most probably kill all the benefits provided by caching.

Thanks,
Igor

On Thu, Aug 8, 2019 at 7:46 PM Charles Givre <charles.gi...@gtkcyber.com> wrote:

> Hello Drill Devs,
> I have a question about UDFs. Let's say you have a non-trivial UDF called
> foo(x,y) which returns some value. Assuming that if the arguments are the
> same, the function foo() will return the same result, does Drill have any
> optimizations to prevent running the non-trivial function?
>
> I was thinking that it might make sense to cache the arguments and results
> in memory and before the function is executed, check the cache to see if
> they're there. If they are, return the cached results, and if not, execute
> the function. I was thinking that for some functions, like date/time
> functions, we might want to include something in the code to ensure that
> the results do not get cached.
>
> Thoughts?
>
> Charles S. Givre CISSP
> Data Scientist,
> Co-Founder GTK Cyber LLC
>
> charles.gi...@gtkcyber.com
> *Mobile*: (443) 762-3286
>
> <https://www.linkedin.com/in/cgivre/>
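
For reference, the single-result cache described in the reply above could be sketched roughly as follows. This is a language-neutral illustration in Python, not Drill's UDF API: `LastCallCache` and the body of `slow_function` are hypothetical stand-ins introduced here, and the function is assumed to be deterministic.

```python
class LastCallCache:
    """Wrap a deterministic function and remember only its most recent call.

    Hypothetical helper for illustration; memory use stays constant because
    only one (arguments, result) pair is ever held.
    """

    def __init__(self, func):
        self._func = func
        self._has_value = False
        self._last_args = None
        self._last_result = None
        self.calls = 0  # how many times the wrapped function really ran

    def __call__(self, *args):
        # Reuse the stored result only if the arguments match the previous call.
        if self._has_value and args == self._last_args:
            return self._last_result
        self.calls += 1
        result = self._func(*args)
        self._last_args = args
        self._last_result = result
        self._has_value = True
        return result


def slow_function(x, y):
    # Stand-in for an expensive, deterministic UDF body (assumption).
    return x * 10 + y


# Rows already sorted by (x, y), as in the example query.
rows = [(1, 1), (1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 2)]

cached = LastCallCache(slow_function)
results = [cached(x, y) for x, y in rows]
```

With sorted input, the wrapped function runs once per run of identical arguments. With unsorted input the same wrapper still returns correct results, but it recomputes whenever the arguments change, which is why the approach depends on the ordering.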