pitrou commented on PR #47377:
URL: https://github.com/apache/arrow/pull/47377#issuecomment-4304952483
First, a problem I forgot to mention earlier: the PR currently uses 32-bit
indices, but we really want 64-bit indices, right? (perhaps UInt64 to match what
sort_indices outputs, though that's unnecessary).
> I agree the execution layer is already overly complicated.
Separately from this PR, can we think about ways to make it simpler? Perhaps
there are internal execution "options" that aren't really useful.
> On performance: the selection path is only intended to be exercised from
the upcoming special-form work (#47374), as a narrow and semantically explicit
entry point.
Yes, but we would like it to be more generally useful for execution engines,
right?
> in summary, sparse execution wins strongly at low selectivity, while the
worst regressions (up to ~4x) are from the generic dense fallback when a kernel
doesn’t provide `selective_exec` (extra gather/scatter calls)
The regression might be much worse on chunked inputs?
> If you have a preferred API shape (hybrid runs+indices, a generalized
“selection” object, or a different exec signature), I’d really appreciate
guidance - I’d rather adjust before we cement the API.
I'm not sure what it should look like, and we can probably add some
complexity piecewise if we agree the API remains experimental.
Ideally I'd like something that can be used internally for take/filter as
well.
A conceptual sketch could look like:
```c++
#include <cstdint>
#include <variant>

struct ContiguousSpan {
  int64_t start_offset;
  int64_t length;
};

struct FilteredSpan {
  int64_t start_offset;
  int64_t length;
  /* followed by a filter bitmap with `length` bits */
};

struct DiscreteSpan {
  int64_t length;
  /* followed by `length` 64-bit indices */
};

using SelectionSpan =
    std::variant<ContiguousSpan, FilteredSpan, DiscreteSpan>;
```
(In practice, SelectionSpan would be encoded using some bit-twiddling, and a
selection vector would be a Buffer containing a sequence of SelectionSpans.)
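To make the "bit-twiddling" idea a bit more concrete, here is one possible encoding, purely as an illustration: a single 64-bit header word per span, with the span kind in the top 2 bits and the length in the remaining 62. The names (`SpanKind`, `EncodeHeader`, `DecodeKind`, `DecodeLength`) and the bit layout are hypothetical and are not part of the PR or of any existing Arrow API.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical tag identifying which of the three span variants follows.
enum class SpanKind : uint64_t { kContiguous = 0, kFiltered = 1, kDiscrete = 2 };

// Assumed layout: top 2 bits hold the kind, low 62 bits hold the length.
constexpr int kKindShift = 62;
constexpr uint64_t kLengthMask = (uint64_t{1} << kKindShift) - 1;

// Pack a span kind and a non-negative length into one 64-bit header word.
inline uint64_t EncodeHeader(SpanKind kind, int64_t length) {
  assert(length >= 0 && static_cast<uint64_t>(length) <= kLengthMask);
  return (static_cast<uint64_t>(kind) << kKindShift) |
         static_cast<uint64_t>(length);
}

// Recover the span kind from a header word.
inline SpanKind DecodeKind(uint64_t header) {
  return static_cast<SpanKind>(header >> kKindShift);
}

// Recover the length from a header word.
inline int64_t DecodeLength(uint64_t header) {
  return static_cast<int64_t>(header & kLengthMask);
}
```

With a layout along these lines, a selection vector could be scanned by reading a header word, dispatching on its kind, and skipping over the payload (start offset, bitmap or indices) that follows it in the Buffer.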
That's of course quite a bit of work and DiscreteSpan might be the only
implemented variant at the start.
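As a rough illustration of how take/filter-like code could consume such a selection, here is a minimal sketch that expands a list of spans into plain 64-bit indices. It is an assumption-laden simplification: the trailing payloads ("followed by ...") are modelled as owned `std::vector` members rather than bytes in a Buffer, and `AppendIndices` / `ExpandSelection` are hypothetical helper names, not existing Arrow functions.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <variant>
#include <vector>

struct ContiguousSpan {
  int64_t start_offset;
  int64_t length;
};

struct FilteredSpan {
  int64_t start_offset;
  int64_t length;
  std::vector<bool> filter;  // stand-in for the trailing bitmap (`length` bits)
};

struct DiscreteSpan {
  std::vector<int64_t> indices;  // stand-in for the trailing 64-bit indices
};

using SelectionSpan =
    std::variant<ContiguousSpan, FilteredSpan, DiscreteSpan>;

// Append the row indices selected by one span to `out`.
inline void AppendIndices(const SelectionSpan& span, std::vector<int64_t>* out) {
  std::visit(
      [out](const auto& s) {
        using T = std::decay_t<decltype(s)>;
        if constexpr (std::is_same_v<T, ContiguousSpan>) {
          // Every row in [start_offset, start_offset + length) is selected.
          for (int64_t i = 0; i < s.length; ++i) {
            out->push_back(s.start_offset + i);
          }
        } else if constexpr (std::is_same_v<T, FilteredSpan>) {
          // Only rows whose filter bit is set are selected.
          assert(static_cast<int64_t>(s.filter.size()) == s.length);
          for (int64_t i = 0; i < s.length; ++i) {
            if (s.filter[static_cast<std::size_t>(i)]) {
              out->push_back(s.start_offset + i);
            }
          }
        } else {
          // DiscreteSpan: the indices are already explicit.
          out->insert(out->end(), s.indices.begin(), s.indices.end());
        }
      },
      span);
}

// Expand a whole selection (a sequence of spans) into explicit indices.
inline std::vector<int64_t> ExpandSelection(
    const std::vector<SelectionSpan>& selection) {
  std::vector<int64_t> out;
  for (const auto& span : selection) {
    AppendIndices(span, &out);
  }
  return out;
}
```

A kernel without a dedicated selective implementation could fall back on something like this expansion and then gather, while kernels that understand contiguous or filtered spans could skip materializing indices for those cases.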