Skyhook [1] enables efficient predicate and projection pushdown from
Arrow Dataset to a Ceph storage cluster. This is very cool
functionality, but it's tightly coupled to the Arrow C++ Dataset
implementation in a way that blocks refactoring. In the Arrow C++
codebase today, Acero is designed specifically to handle projection
and filtering in a more modular fashion, and to accept configuration
from standardized plan/expression formats like Substrait. Because some
improvements to Dataset are not possible while Skyhook is maintained
in its current form, we need volunteers to update Skyhook.
Please reply to let us know if you are actively using Skyhook or if
you are interested in helping to refactor Skyhook.

Sincerely,
Ben Kietzman

[1]
https://arrow.apache.org/blog/2022/01/31/skyhook-bringing-computation-to-storage-with-apache-arrow/
