I'll put recommendations for the design on the issue. Thanks!

On Fri, Mar 15, 2024 at 2:03 PM Aldrin <octalene....@pm.me.invalid> wrote:

> I created a new issue [1] to track the refactoring. Could you clarify the
> request (here or in the issue)?
>
> My understanding is that the Skyhook file format code [2] should be
> refactored to use a higher-level interface rather than using
> dataset::FileFormat and dataset::FragmentScanOptions directly [3].
>
> I am assuming the reference to Acero and Substrait to be only for context
> and not necessarily a preferred direction. If that is the preferred
> direction, there is something much more general in progress that we can
> perhaps specialize as a replacement for the Skyhook file format, but I'm
> not sure that's what's actually being requested.
>
> Thank you!
>
>
> [1]: https://github.com/apache/arrow/issues/40583
> [2]: https://github.com/apache/arrow/tree/main/cpp/src/skyhook
> [3]:
> https://github.com/apache/arrow/blob/main/cpp/src/skyhook/cls/cls_skyhook.cc#L153-L156
>
>
>
> # ------------------------------
>
> # Aldrin
>
>
> https://github.com/drin/
>
> https://gitlab.com/octalene
>
> https://keybase.io/octalene
>
>
> On Thursday, March 14th, 2024 at 09:10, Jayjeet Chakraborty <
> jayjeetchakrabort...@gmail.com> wrote:
>
> > Hi Ben, I am willing to help out with the refactor too!
> >
>
> > On Wed, Mar 13, 2024 at 9:25 PM Aldrin octalene....@pm.me.invalid wrote:
> >
>
> > > I am interested in helping to refactor!
> > >
>
> > > -Aldrin
> > >
>
> > > > On Wed, Mar 13, 2024 at 08:54, Benjamin Kietzman <bengil...@gmail.com>
> > > > wrote:
> > >
>
> > > Skyhook [1] enables efficient predicate and projection pushdown from
> > > Arrow Dataset to a Ceph storage cluster. This is very cool
> > > functionality, but it's tightly coupled to the Arrow C++ Dataset
> > > implementation in a way which blocks refactoring. In the Arrow C++
> > > codebase today, Acero is designed specifically to handle projection
> > > and filtration in a more modular fashion, and to accept configuration
> > > from standardized plan/expression formats like Substrait. In light of
> > > improvements to Dataset which are not possible while maintaining
> > > Skyhook in its current form, we need volunteers to update Skyhook.
> > > Please reply to let us know if you are actively using Skyhook or if
> > > you are interested in helping to refactor Skyhook.
> > >
>
> > > Sincerely,
> > > Ben Kietzman
> > >
>
> > > [1]
> > >
>
> > >
> https://arrow.apache.org/blog/2022/01/31/skyhook-bringing-computation-to-storage-with-apache-arrow/
> >
>
> >
>
> > --
> > Jayjeet Chakraborty
> > CS PhD student
> > UC Santa Cruz
> > California, USA
