[ https://issues.apache.org/jira/browse/ARROW-16796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552569#comment-17552569 ]
Yaron Gvili commented on ARROW-16796: ------------------------------------- If coding safety is a major concern here (IMHO it is), I'd suggest that in the longer-term Arrow code should distinguish between simplification of expressions with and without functions/execution, where only the former requires an ExecContext whereas only the latter will fail if a function exists in the expression. Perhaps the simplest, though likely not ideal, code-change for this is by defaulting ExecContext to an implementation that fails. The purpose of the PR is just to fix in the short-term. Follow-up issues can be created for what remains. > [C++] Fix bad defaulting of ExecContext argument > ------------------------------------------------ > > Key: ARROW-16796 > URL: https://issues.apache.org/jira/browse/ARROW-16796 > Project: Apache Arrow > Issue Type: Bug > Components: C++ > Reporter: Yaron Gvili > Assignee: Yaron Gvili > Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In several places in Arrow code, invocations of Expression::Bind() default > the ExecContext argument. This leads to the default function registry being > used in expression manipulations, and this becomes a problem when the user > wishes to use a non-default function registry, e.g., when passing one to the > ExecContext of an ExecPlan, which is how I discovered this issue. The > problematic places I found for such Expression::Bind() invocation are: > * cpp/src/arrow/dataset/file_parquet.cc > * cpp/src/arrow/dataset/scanner.cc > * cpp/src/arrow/compute/exec/project_node.cc > * cpp/src/arrow/compute/exec/hash_join_node.cc > * cpp/src/arrow/compute/exec/filter_node.cc > There are also other places in test and benchmark code (grep for 'Bind()'). > Another case of bad defaulting of an ExecContext argument is in > Inequality::simplifies_to in cpp/src/compute/exec/expression.cc where a fresh > ExecContext is created, instead of being received from the caller, and passed > to BindNonRecursive. > I'd argue that an ExecContext variable should not be allowed to default, > except perhaps in the highest-level/user-facing APIs. -- This message was sent by Atlassian Jira (v8.20.7#820007)