westonpace commented on PR #34834: URL: https://github.com/apache/arrow/pull/34834#issuecomment-1624731983
> And substrait doesn't have an "is_in" like function? (or are there plans for that?)
> (this conversion seems unfortunate, as "is_in" can be more efficient than the equivalent or-list)

It's an interesting point. We have things like this outside of expressions too. For example, the "join" node doesn't distinguish between an equality join (which can be done efficiently with a hashmap) and a non-equality join (which cannot). In that case we actually have both representations. The one people typically use is the "JoinRel", which is a logical operator and thus allowed to be more generic without concern for efficiency. The other one is the "HashJoinRel", which is more specific / physical but typically not created by producers (instead, planners or optimizers convert from one to the other).

I think this is interesting because "is_in" vs. "singular-or-list" is basically a logical vs. physical distinction for expressions, which I don't think I've really considered before, but I agree with you that it's valid.

In any case, it will be easy enough in Acero's converter to recognize the cases that can collapse to `is_in` and use it where appropriate. I've created https://github.com/apache/arrow/issues/36535 to track this.
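To make the idea concrete, here is a rough sketch of what "recognize the cases that can collapse to `is_in`" could look like. This is not Acero's converter or any Arrow/Substrait API: the `Field`, `Literal`, `Equal`, `Or`, and `IsIn` classes and the `collapse_or_list` function are all hypothetical stand-ins for an expression tree, and the sketch deliberately ignores null semantics, which a real implementation would need to handle.

```python
from dataclasses import dataclass


# Hypothetical, minimal expression nodes (not Arrow or Substrait types).
@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Equal:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class IsIn:
    target: Field
    options: tuple


def _flatten_or(expr):
    """Yield the leaves of a (possibly nested) OR tree, left to right."""
    if isinstance(expr, Or):
        yield from _flatten_or(expr.left)
        yield from _flatten_or(expr.right)
    else:
        yield expr


def collapse_or_list(expr):
    """Rewrite `f == a OR f == b OR ...` as IsIn(f, (a, b, ...)).

    Returns `expr` unchanged unless every leaf is `Field == Literal`
    and all leaves compare the same field.
    """
    if not isinstance(expr, Or):
        return expr
    target = None
    options = []
    for leaf in _flatten_or(expr):
        is_simple_equality = (
            isinstance(leaf, Equal)
            and isinstance(leaf.left, Field)
            and isinstance(leaf.right, Literal)
        )
        if not is_simple_equality:
            return expr  # some leaf is not a plain equality; keep the or-list
        if target is None:
            target = leaf.left
        elif leaf.left != target:
            return expr  # leaves compare different fields; cannot collapse
        options.append(leaf.right.value)
    return IsIn(target, tuple(options))
```

The point of the sketch is only that the "physical" form can be derived from the "logical" one with a simple pattern match, which is the same direction a planner takes when turning a JoinRel into a HashJoinRel.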
