zentol commented on PR #20003:
URL: https://github.com/apache/flink/pull/20003#issuecomment-1162865164

   > But we don't and won't support removing URLs/JARs.
   
   Do you think this will likely stay the case?
   As I understand it, once a jar is added, it is distributed with every subsequent query, regardless of whether that query actually requires it.
   Are these jars small enough that users won't worry about that?
   
   > Because this is an interactive process, the jars are added dynamically, and we don't know what jars will be added at what time point.
   
   I assume the components that make use of the user CL are all created when the session is started?
   Is there any state kept across these statements that has a hard requirement for access to the user CL?
   For example, when I submit `CREATE TEMPORARY FUNCTION ...`, is the function eagerly loaded and put into some data structure (a catalog, I guess)? Or do we just store some description and load it later when required?
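   To make the second option concrete, here is a rough sketch of "store a description, load it later". `FunctionDescription` and `LazyFunctionCatalog` are hypothetical names for illustration, not Flink's actual catalog classes. Registering a function only records its class name; the class is resolved against whatever user CL exists when a query needs it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: what a session could keep for CREATE TEMPORARY FUNCTION instead of a loaded Class. */
final class FunctionDescription {
    final String name;
    final String className;

    FunctionDescription(String name, String className) {
        this.name = name;
        this.className = className;
    }

    /** Resolves the class only when needed, against the user class loader built for that query. */
    Class<?> resolve(ClassLoader userClassLoader) throws ClassNotFoundException {
        // initialize = false: load the class but do not run its static initializers yet
        return Class.forName(className, false, userClassLoader);
    }
}

/** Hypothetical catalog holding descriptions only; no user code is touched on registration. */
final class LazyFunctionCatalog {
    private final Map<String, FunctionDescription> functions = new HashMap<>();

    void register(String name, String className) {
        functions.put(name, new FunctionDescription(name, className));
    }

    FunctionDescription lookup(String name) {
        return functions.get(name);
    }
}
```

   With this shape, `register("lower", "org.apache.flink.udf.Lower")` succeeds even if the jar containing that class has not been added yet; a missing class only surfaces as a `ClassNotFoundException` when a query resolves the function.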
   
   > are you fine with the MutableURLClassLoader approach
   
   yes-ish.
   
   I think it would still be nice if we created the user CL ahead of time, before executing a query (or, more generally, before creating any component that requires the user jars). Conceptually that's certainly possible; whether it's possible (or reasonable) to implement right now with the current architecture depends a lot on Table API internals, how the parsing works, and how the components are structured. I would defer that decision to you.
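   For comparison, the general idea behind a mutable loader can be sketched as a `URLClassLoader` subclass that widens the protected `addURL` to public, so one session-wide loader grows as `ADD JAR` statements arrive. `AppendOnlyUrlClassLoader` is a hypothetical name; this is not Flink's `MutableURLClassLoader`, only an illustration of the technique.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical sketch of the "mutable" approach: a single user class loader for the
 * whole session, to which JARs are appended as they are added. Not Flink's actual class.
 */
final class AppendOnlyUrlClassLoader extends URLClassLoader {

    AppendOnlyUrlClassLoader(ClassLoader parent) {
        // Starts empty; JARs are appended later.
        super(new URL[0], parent);
    }

    /** Widens URLClassLoader#addURL (protected) so the session can append JARs. There is no remove. */
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }
}
```

   The trade-off is visible in the API: the loader only ever grows, so every jar added earlier in the session stays visible to (and, if shipped with the loader's URLs, distributed with) every later query.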
   
   For a _very oversimplified_ view, let's consider these statements:
   
   ```
   1) Flink SQL> ADD JAR 's3:///path/to/aaa.jar';
   2) Flink SQL> CREATE TEMPORARY FUNCTION lower AS 'org.apache.flink.udf.Lower';
   3) Flink SQL> SELECT id, lower(name) FROM T;
   ```
   
   I assume that it is possible to _parse_ these statements without loading any user code.
   
   1) tells us we need a jar, so we add that to some list for later.
   2) is just some function definition that we need later, so we store the parsed result _somewhere_.
   3) is a query (without side effects) that requires execution -> build a new CL with the jar from 1), construct all the components required for execution (planner or something), and pass statements 2+3 to them.
   
   In a way this is a sort of pre-processing step that picks out certain statements (or parts of statements) and maintains some state outside of the components required for execution.
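   A minimal sketch of that pre-processing state, under heavy assumptions: `SessionState` and `PreparedExecution` are hypothetical names, statements arrive already classified (no real parsing), and the jar locations are given as `URL`s. The point is only that jars and function definitions are recorded without loading anything, and a fresh user CL is built from a snapshot of the jars for each query.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical session state kept outside of the execution components. */
final class SessionState {
    private final List<URL> jars = new ArrayList<>();                // from 1) ADD JAR
    private final List<String> functionDefinitions = new ArrayList<>(); // from 2) CREATE TEMPORARY FUNCTION

    /** 1) Only remember the jar location; nothing is loaded yet. */
    void addJar(URL jar) {
        jars.add(jar);
    }

    /** 2) Keep the (already parsed) definition for later. */
    void addFunctionDefinition(String definition) {
        functionDefinitions.add(definition);
    }

    /** 3) Build a new user CL from the jars recorded so far, plus the statements to hand over. */
    PreparedExecution prepare(String query, ClassLoader parent) {
        URLClassLoader userClassLoader = new URLClassLoader(jars.toArray(new URL[0]), parent);
        List<String> statements = new ArrayList<>(functionDefinitions);
        statements.add(query);
        return new PreparedExecution(userClassLoader, statements);
    }
}

/** Hypothetical: what the execution components (planner or something) would be constructed with. */
final class PreparedExecution {
    final URLClassLoader userClassLoader; // caller should close() it once the query is done
    final List<String> statements;

    PreparedExecution(URLClassLoader userClassLoader, List<String> statements) {
        this.userClassLoader = userClassLoader;
        this.statements = statements;
    }
}
```

   Because each query gets its own loader built from a snapshot, the execution components never see a loader that changes underneath them; the cost is that they must be (re)created per query.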
   
   
   
   
   

