[ 
https://issues.apache.org/jira/browse/ARROW-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613332#comment-17613332
 ] 

Yaron Gvili commented on ARROW-16211:
-------------------------------------

Weston is correct that the main use case I designed nested registries for is 
embedded UDFs, or at least UDFs that are used in the context of a particular 
scope, such as a single plan's execution. However, see also [the case I noted 
here|https://issues.apache.org/jira/browse/ARROW-16211?focusedCommentId=17539044&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17539044].

> But that seems not very practical that you need to remember for each function 
> in which registry it lives to ensure you pass the correct one when calling it.

I believe you are referring to a burden on the user, but I don't think the user 
would actually need to do this. The typical case is one piece of code that 
receives a registry and wants to pass a registry along to a second piece of 
code. The first piece of code then has two basic options:
 # Pass the registry unmodified; this keeps working in the same scope.
 # Pass a nested registry wrapping it with some modifications; this creates a 
nested scope.

In these cases, the registry in use at each piece of code is managed on the 
stack, so when the second piece of code returns, the first piece of code 
continues with its original registry, and the user need not manage registries 
manually. It is straightforward to extend this to other cases, such as passing 
a registry from a main thread to worker threads.
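
To illustrate the two options above, here is a minimal sketch in plain Python. 
It is a toy model, not the Arrow API: the names Registry, run_plan and caller 
are hypothetical and exist only for this example. It assumes that a nested 
registry looks a name up locally first and then falls back to its parent.

```python
from typing import Callable, Dict, Optional, Tuple


class Registry:
    """Toy function registry (hypothetical); lookups fall back to a parent."""

    def __init__(self, parent: Optional["Registry"] = None) -> None:
        self._parent = parent
        self._functions: Dict[str, Callable[[int], int]] = {}

    def add(self, name: str, func: Callable[[int], int]) -> None:
        self._functions[name] = func

    def get(self, name: str) -> Callable[[int], int]:
        # Local entries shadow same-named entries in the parent.
        if name in self._functions:
            return self._functions[name]
        if self._parent is not None:
            return self._parent.get(name)
        raise KeyError(name)


def run_plan(registry: Registry, value: int) -> int:
    # The "second piece of code": it uses whatever registry it is handed.
    return registry.get("transform")(value)


def caller(registry: Registry, value: int) -> Tuple[int, int, int]:
    # Option 1: pass the registry unmodified (same scope).
    same_scope = run_plan(registry, value)

    # Option 2: pass a nested registry carrying a scoped override.
    nested = Registry(parent=registry)
    nested.add("transform", lambda x: x * 10)
    nested_scope = run_plan(nested, value)

    # Once run_plan returns, the caller still holds its original registry.
    after = run_plan(registry, value)
    return same_scope, nested_scope, after


base = Registry()
base.add("transform", lambda x: x + 1)
results = caller(base, 2)
```

The nested registry is a local variable of caller, so its scope ends with the 
call and nothing needs to be undone on the parent registry.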

> > There is another case, where some set of UDFs are predefined and then 
> > referenced (e.g. by name) in incoming plans. In that scenario I think a 
> > nested registry is considerably less useful and the ability to unregister 
> > or override would be helpful.
> Yes, and that is the use case that I was talking about (since that is what 
> the pyarrow register_scalar_function enabled you to do)

Even in a case where one must remove functions, I'd recommend creating a new 
registry instance with the desired functions removed (perhaps by initializing 
a builder from an existing registry instance) rather than editing an existing 
registry instance that may be in use elsewhere. This makes it much harder to 
(inadvertently) create side effects or race conditions. Still, note that it is 
very easy for a nested registry to effectively support function removal: 
register a null function under a name, which overrides the same-named function 
of the parent registry. As usual, the nested registry can stay fixed once set 
up.
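
Both approaches to removal can be sketched in plain Python. Again this is a toy 
model under stated assumptions, not the Arrow API: Registry, hide, visible and 
copy_without are hypothetical names, and a sentinel object stands in for the 
"null function" mentioned above.

```python
from typing import Callable, Dict, Optional

_REMOVED = object()  # sentinel playing the role of the "null function"


class Registry:
    """Toy function registry (hypothetical) with parent fallback."""

    def __init__(self, parent: Optional["Registry"] = None) -> None:
        self._parent = parent
        self._functions: Dict[str, object] = {}

    def add(self, name: str, func: Callable) -> None:
        self._functions[name] = func

    def hide(self, name: str) -> None:
        # Shadow the parent's entry; the parent itself is not modified.
        self._functions[name] = _REMOVED

    def get(self, name: str) -> Callable:
        if name in self._functions:
            entry = self._functions[name]
            if entry is _REMOVED:
                raise KeyError(name)
            return entry
        if self._parent is not None:
            return self._parent.get(name)
        raise KeyError(name)

    def visible(self) -> Dict[str, Callable]:
        # Merge parent entries with local ones, honouring hidden names.
        merged = self._parent.visible() if self._parent is not None else {}
        for name, entry in self._functions.items():
            if entry is _REMOVED:
                merged.pop(name, None)
            else:
                merged[name] = entry
        return merged

    def copy_without(self, *names: str) -> "Registry":
        # Build a fresh, independent registry instead of editing this one.
        fresh = Registry()
        for name, func in self.visible().items():
            if name not in names:
                fresh.add(name, func)
        return fresh


base = Registry()
base.add("add_one", lambda x: x + 1)
base.add("double", lambda x: x * 2)

# Approach 1: a new registry instance with "double" left out.
trimmed = base.copy_without("double")

# Approach 2: a nested registry that hides "double" from its parent.
scoped = Registry(parent=base)
scoped.hide("double")
```

In both cases base is never mutated, so code elsewhere that still holds base 
keeps seeing "double".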

> [C++][Python] Unregister compute functions
> ------------------------------------------
>
>                 Key: ARROW-16211
>                 URL: https://issues.apache.org/jira/browse/ARROW-16211
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++, Python
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> In general, when using UDFs, the user defines a function expecting a 
> particular outcome. When building the program, there needs to be a way to 
> update existing function kernels if it expands beyond what is planned before. 
> In such situations, there should be a way to remove the existing definition 
> and add a new definition. To enable this, the unregister functionality has to 
> be included. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
