The bigger reason for a separate plug-in world is the community-building it enables.
I would recommend looking at the Julia community for examples of effective ways to drive plug-in structure. At the core, for any pure Julia package, you can add a package simply by referring to the GitHub repository where the package is stored. For packages that are "registered" (i.e. a path and a checksum are recorded in a well-known data store), you can add a package by simply naming it, without knowing the path (a minimal sketch of both forms appears after the quoted message below). All such plugins are tested by their authors, and each project records all of its dependencies with version constraints, so cascading additions are easy. The community leaders have made tooling available so that you can test your package against a range of Julia versions with pretty simple (to use) GitHub Actions. The result has been an absolute explosion in the number of pure Julia packages.

For packages that include C or Fortran (or whatever) code, there is some amazing tooling available that lets you record a build process for any of the supported platforms (Linux on x86 or ARM, 32- or 64-bit, Windows, BSD, macOS and so on). When you register such a package, it is automagically built on all the platforms you indicate and the binary results are checked into a central repository known as Yggdrasil. All of these registration events for different packages are recorded in a central registry, as I mentioned. That registry lives in GitHub as well, which makes it easy to propagate changes.

On Thu, Jan 13, 2022 at 8:45 PM James Turton <dz...@apache.org> wrote:

> Hello dev community
>
> Discussions about reorganising the Drill source code to better
> position the project to support plug-ins for the "long tail" of weird
> and wonderful systems and data formats have been coming up here and
> there for a few months, e.g. in
> https://github.com/apache/drill/pull/2359.
>
> A view which I personally share is that adding too large a number and
> variety of plug-ins to the main tree would create a lethal
> maintenance burden for developers working there and lead down a road
> of accumulating technical debt. The Maven tricks we must employ to
> harmonise the growing set of dependencies of the main tree to keep it
> buildable are already enough, as is the size of our distributable and
> the count of open bug reports.
>
> Thus, the idea of splitting out "/contrib" into a new
> apache/drill-contrib repo after selecting a subset of plugins to
> remain in apache/drill. I'll now volunteer a set of criteria to
> decide whether a plug-in should live in this notional
> apache/drill-contrib.
>
> 1. The plug-in queries an unstructured data format (even if it only
>    reads metadata fields) e.g. Image format plug-in.
> 2. The plug-in queries a data format that was designed for human
>    consumption e.g. Excel format plug-in.
> 3. The plug-in cannot be expected to run with speed and reliability
>    comparable to querying structured data on the local network e.g.
>    Dropbox storage plugin.
> 4. The plug-in queries an obscure system or format e.g. we receive a
>    plug-in for some data format used only on old Cray supercomputers.
> 5. The plug-in can for some reason not be well supported by the Drill
>    devs e.g. it has a JNI dependency on some difficult native libs.
>
> Any one of those suggests that an apache/drill-contrib is the better
> home to me, but what is your view? Would we apply significantly more
> relaxed standards when reviewing PRs to apache/drill-contrib?
> Would we tag, build and test apache/drill-contrib with every release
> of apache/drill, or would it run on its own schedule, perhaps with
> users downloading builds made continuously from snapshots of HEAD?
>
> Regards
> James
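
To make the two ways of adding a package concrete, here is a minimal sketch using Julia's built-in Pkg API; SomePackage and its GitHub URL are hypothetical names used only for illustration:

    using Pkg

    # Unregistered package: add it straight from the repository that hosts it.
    # (SomeOrg/SomePackage.jl is a hypothetical example, not a real package.)
    Pkg.add(url="https://github.com/SomeOrg/SomePackage.jl")

    # Registered package: a registry already records its path and checksum,
    # so naming it is enough.
    Pkg.add("SomePackage")

    # Run the package's own test suite against the resolved dependency versions.
    Pkg.test("SomePackage")

The same Pkg machinery resolves the recorded version constraints, which is what makes the cascading additions mentioned above cheap.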