[ 
https://issues.apache.org/jira/browse/ARROW-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530945#comment-17530945
 ] 

Weston Pace commented on ARROW-15582:
-------------------------------------

I'd have a slight preference for #2 (Arrow has two ternary functions at the 
moment, if_else and replace_with_mask, and it shouldn't be too bad to add 
more).

I think #1 is something that will happen a lot at some point but I feel like it 
lives in the realm of the query planner/optimizer.  So I'd almost want to say 
"Arrow doesn't support that function" before we get into the realm of 
"equivalent but not identical plans".

Having something like #3 in Substrait would possibly enable something like #1 
to happen in a query planner.  One could then imagine the following 
conversation between planner and consumer:
 * Planner: Do you support clip?
 * Consumer: No
 * Planner: Do you support clip_lower and clip_upper?
 * Consumer: Yes
 * Planner produces plan with clip_lower and clip_upper.
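
As a rough illustration of that conversation, here is a minimal sketch in plain Python.  Everything in it is hypothetical: the {{Consumer}} class, the {{REWRITES}} table, and {{plan_function}} are names made up for this example and are not part of Arrow or Substrait.  Only the clip / clip_lower / clip_upper decomposition comes from the exchange above.

```python
from dataclasses import dataclass, field

# Hypothetical rewrite table: a function name mapped to an equivalent
# sequence of simpler functions. Only the clip entry comes from the
# conversation above.
REWRITES = {"clip": ["clip_lower", "clip_upper"]}


@dataclass
class Consumer:
    """Hypothetical consumer that advertises the functions it supports."""
    supported: set = field(default_factory=set)

    def supports(self, name: str) -> bool:
        return name in self.supported


def plan_function(name: str, consumer: Consumer) -> list:
    """Return the function names the planner would emit for `name`."""
    # "Do you support clip?"
    if consumer.supports(name):
        return [name]
    # "Do you support clip_lower and clip_upper?"
    parts = REWRITES.get(name)
    if parts is not None and all(consumer.supports(p) for p in parts):
        return list(parts)
    # No direct support and no usable rewrite: reject the function.
    raise NotImplementedError(f"consumer does not support {name!r}")


consumer = Consumer(supported={"clip_lower", "clip_upper"})
plan = plan_function("clip", consumer)
print(plan)
```

The point of the sketch is that the rewrite decision sits entirely on the planner side; the consumer only answers yes or no for each function name.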

I'm happy for any compromise / alternatives for the short term.  There is also 
a related JIRA ( ARROW-15535 ) which covers automatic generation of YAML.

The way I think about it is that the "standard Substrait namespace" will 
require a high degree of manual mapping.  However, the "Arrow namespace" should 
be automatically generated.  Having the Arrow namespace will likely be good 
enough for initial development tasks and quickly expose the entirety of Arrow's 
compute functionality while we wait for the longer standards-approved 
methodologies to roll in.

For example, the automatic generation should be able to easily map 
{{min_element_wise}} and {{max_element_wise}} so that we can use those while 
testing other features and prototyping.  Then any prototypes can switch over to 
using "clip" once they need to support multiple consumers, etc.
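
To make the relationship concrete, here is a small sketch of how a clip can be composed from an element-wise max followed by an element-wise min.  The two helpers below are plain-Python stand-ins that only borrow the names of the Arrow functions; they work on lists with a scalar bound and are not the Arrow kernels themselves, and the {{clip}} signature is an assumption made for this example.

```python
def max_element_wise(values, bound):
    """Stand-in: element-wise max of each value against a scalar bound."""
    return [max(v, bound) for v in values]


def min_element_wise(values, bound):
    """Stand-in: element-wise min of each value against a scalar bound."""
    return [min(v, bound) for v in values]


def clip(values, lower, upper):
    """Clamp every value into [lower, upper] using only the two helpers."""
    # Raise anything below `lower`, then cap anything above `upper`.
    return min_element_wise(max_element_wise(values, lower), upper)


clipped = clip([-5, 0, 3, 12], 0, 10)
print(clipped)  # [0, 0, 3, 10]
```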

I think we could also have a short term solution for the standard Substrait 
namespace too if someone wants to put together a PR.

> [C++] Add support for registering tricky functions with the Substrait 
> consumer (or add a bunch of substrait meta functions)
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-15582
>                 URL: https://issues.apache.org/jira/browse/ARROW-15582
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Sanjiban Sengupta
>            Priority: Major
>              Labels: substrait
>
> Sometimes one Substrait function will map to multiple Arrow functions.  For 
> example, the Substrait {{add}} function might be referring to Arrow's {{add}} 
> or {{add_checked}}.  We need to figure out how to register this correctly 
> (e.g. one possible approach would be a {{substrait_add}} meta function).
> Other times a substrait function will encode something Arrow considers an 
> "option" as a function argument.  For example, the is_in Arrow function is 
> unary with an option for the lookup set.  The substrait function is binary 
> but the second argument must be constant and be the lookup set.  Neither of 
> which is to be confused with a truly binary is_in function which takes in a 
> different set at every row.
> It's possible there is no work to do here other than adding a bunch of 
> substrait_ meta functions in Arrow.  In that case all the work will be done 
> in other JIRAs.  Or, it is possible that there is some kind of extension we 
> can make to the function registry that bypasses the need for the meta 
> functions.  I'm leaving this JIRA open so future contributors can consider 
> this second option.



