Hello dev community,
For a few months now, discussions about reorganising the Drill source code,
so that the project is better positioned to support plug-ins for the "long
tail" of weird and wonderful systems and data formats, have been coming up
here and there, e.g. in https://github.com/apache/drill/pull/2359.
A view which I personally share is that adding too many, and too varied,
plug-ins to the main tree would create a lethal maintenance burden for the
developers working there and lead us down a road of accumulating technical
debt. The Maven tricks we must already employ to harmonise the main tree's
growing set of dependencies, and so keep it buildable, are burden enough,
as are the size of our distributable and the count of open bug reports.
This is what motivates the idea of splitting "/contrib" out into a new
apache/drill-contrib repo, after first selecting a subset of plug-ins to
remain in apache/drill. I'll now volunteer a set of criteria for deciding
whether a plug-in should live in this notional apache/drill-contrib.
1. The plug-in queries an unstructured data format (even if it only
reads metadata fields), e.g. the Image format plug-in.
2. The plug-in queries a data format that was designed for human
consumption, e.g. the Excel format plug-in.
3. The plug-in cannot be expected to run with speed and reliability
comparable to querying structured data on the local network, e.g. the
Dropbox storage plug-in.
4. The plug-in queries an obscure system or format, e.g. we receive a
plug-in for some data format used only on old Cray supercomputers.
5. For some reason the plug-in cannot be well supported by the Drill
devs, e.g. it has a JNI dependency on some difficult native libraries.
To me, a plug-in meeting any one of those criteria would be better homed
in apache/drill-contrib, but what is your view? Would we apply
significantly more relaxed standards when reviewing PRs to
apache/drill-contrib? Would we tag, build and test apache/drill-contrib
with every release of apache/drill, or would it run on its own schedule,
perhaps with users downloading builds made continuously from snapshots of
HEAD?
Regards,
James