Hi Ted,

Thanks for the explanation, makes sense.

Ideally, the client side would be somewhat agnostic about the repo it pulls
from. In a corporate setting, it should pull from the "JFrog Repository"
that everyone seems to use (but about which I know basically nothing). Oh,
lord, a plugin architecture for the repo for the plugin architecture?

- Paul

On Mon, Jan 17, 2022 at 1:46 PM Ted Dunning <ted.dunn...@gmail.com> wrote:

> Paul,
>
> I understood your suggestion. My point is that publishing to Maven
> central is a bit of a pain while publishing by posting to Github is nearly
> painless. In particular, because Github inherently produces a relatively
> difficult-to-fake hash for each commit, referring to a dependency using
> that hash is relatively safe, which saves a lot of agony regarding keys and
> trust.
>
> Further, Github or any comparable service provides the same "already
> exists" benefit as does Maven.
>
> On Mon, Jan 17, 2022 at 1:30 PM Paul Rogers <par0...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> Well said. Just to be clear, I wasn't suggesting that we use
>> Maven-the-build-tool to distribute plugins. Rather, I was simply observing
>> that building a global repo is a bit of a project and asked, "what could we
>> use that already exists?" The Python repo? No. The Ubuntu/RedHat/whatever
>> Linux repos? Maybe. Maven's repo? Why not?
>>
>> The idea would be that Drill might have a tool that says, "install the
>> FooBlaster plugin." It downloads from a repo (Maven central, say) and puts
>> the plugin in the proper plugins directory. In a cluster, either it does
>> that on every node, or the work is done as part of preparing a Docker
>> container which is then pushed to every node.
>>
>> The key thought is just to make the problem simpler by avoiding the need
>> to create and maintain a Drill-specific repo when we barely have enough
>> resources to keep Drill itself afloat.
>>
>> None of this can happen, however, unless we clean up the plugin APIs and
>> ensure plugins can be built outside of the Drill repo.
>> (That means, say, that Drill needs an API library that resides in Maven.)
>>
>> There are probably many ways this has been done. Anyone know of any good
>> examples we can learn from?
>>
>> Thanks,
>>
>> - Paul
>>
>> On Mon, Jan 17, 2022 at 9:40 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> I don't think that Maven is a forced move just because Drill is in Java.
>>> It may be a good move, but it isn't a foregone conclusion. For one thing,
>>> the conventions that Maven uses are pretty hard-wired and it may be
>>> difficult to have a reliable deny-list of known problematic plugins.
>>> Publishing to Maven is more of a pain than simply pushing to github.
>>>
>>> The usability here is paramount both for the ultimate Drill user and
>>> for the writer of plugins.
>>>
>>> On Mon, Jan 17, 2022 at 5:06 AM James Turton <dz...@apache.org> wrote:
>>>
>>>> Thank you Ted and Paul for the feedback. Since Java is compiled, Maven
>>>> is probably a better fit than GitHub for distribution? If Drillbits can
>>>> write to their jars/3rdparty directory then I can imagine Drill gaining
>>>> the ability to fetch and install plugins itself without too much
>>>> trouble, at least for Drill clusters with Internet access.
>>>> "Sideloading" by downloading from Maven and copying manually would
>>>> always remain possible.
>>>>
>>>> @Paul I'll try to get a little time with you to get some ideas about
>>>> designing a plugin API.
>>>>
>>>> On 2022/01/14 23:20, Paul Rogers wrote:
>>>> > Hi All,
>>>> >
>>>> > James raises an important issue. I've noticed that it used to be easy
>>>> > to build and test Drill; now it is a struggle, because of the many odd
>>>> > external dependencies we have introduced. That acts as a big damper on
>>>> > contributions: none of us get paid enough to spend more time fighting
>>>> > builds than developing the code...
>>>> >
>>>> > Ted is right that we need a good way to install plugins. There are two
>>>> > parts.
>>>> > Ted is talking about the high-level part: make it easy to point to
>>>> > some repo and use the plugin. Since Drill is Java, the Maven repo could
>>>> > be a good mechanism. In-house stuff is often in an internal repo that
>>>> > does whatever Maven needs.
>>>> >
>>>> > The reason that plugins are in the Drill project now is that Drill's
>>>> > "API" is all of Drill. Plugins can (and some do) access all of Drill
>>>> > through the fragment context. The APIs to Calcite and other parts of
>>>> > Drill are wide, and tend to be tightly coupled with Drill internals. By
>>>> > contrast, other tools, such as Presto/Trino, have defined very clean
>>>> > APIs that extensions use. In Druid, everything is integrated via Google
>>>> > Guice and an extension can replace any part of Druid (though I'm not
>>>> > convinced that's actually a good idea). I'm sure there are others we
>>>> > can learn from.
>>>> >
>>>> > So, we need to define a plugin API for Drill. I started down that route
>>>> > a while back: the first step was to refactor the plugin registry so it
>>>> > is ready for extensions. The idea was to use the same mechanism for all
>>>> > kinds of extensions (security, UDFs, metastore, etc.). The next step
>>>> > was to build something that roughly followed Presto, but that kind of
>>>> > stalled out.
>>>> >
>>>> > In terms of ordering, we'd first need to define the plugin API. Then,
>>>> > we can shift plugins to use that. Once that is done, we can move
>>>> > plugins to separate projects. (The metastore implementation can also
>>>> > move, if we want.) Finally, figure out a solution for Ted's suggestion
>>>> > to make it easy to grab new extensions. Drill is distributed, so adding
>>>> > a new plugin has to happen on all nodes, which is a bit more complex
>>>> > than the typical Julia/Python/R kind of extension.
>>>> >
>>>> > The reason we're where we're at is that it is the path of least
>>>> > resistance. Creating a good extension mechanism is hard, but valuable,
>>>> > as Ted noted.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > - Paul
>>>> >
>>>> > On Thu, Jan 13, 2022 at 10:18 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>> >
>>>> >> The bigger reason for a separate plug-in world is the enhancement of
>>>> >> community.
>>>> >>
>>>> >> I would recommend looking at the Julia community for examples of
>>>> >> effective ways to drive plug-in structure.
>>>> >>
>>>> >> At the core, for any pure Julia package, you can simply add a package
>>>> >> by referring to the github repository where the package is stored. For
>>>> >> packages that are "registered" (i.e. a path and a checksum are recorded
>>>> >> in a well-known data store), you can add a package by simply naming it
>>>> >> without knowing the path. All such plugins are tested by the authors
>>>> >> and the project records all dependencies with version constraints so
>>>> >> that cascading additions are easy. The community leaders have made
>>>> >> tooling available so that you can test your package against a range of
>>>> >> versions of Julia by pretty simple (to use) Github actions.
>>>> >>
>>>> >> The result has been an absolute explosion in the number of pure Julia
>>>> >> packages.
>>>> >>
>>>> >> For packages that include C or Fortran (or whatever) code, there is
>>>> >> some amazing tooling available that lets you record a build process on
>>>> >> any of the supported platforms (Linux, LinuxArm, 32 or 64 bit, Windows,
>>>> >> BSD, OSX and so on). When you register such a package, it is
>>>> >> automagically built on all the platforms you indicate and the binary
>>>> >> results are checked into a central repository known as Yggdrasil.
>>>> >>
>>>> >> All of these registration events for different packages are recorded
>>>> >> in a central registry as I mentioned. That registry is recorded in
>>>> >> Github as well, which makes it easy to propagate changes.
>>>> >>
>>>> >> On Thu, Jan 13, 2022 at 8:45 PM James Turton <dz...@apache.org> wrote:
>>>> >>
>>>> >>> Hello dev community
>>>> >>>
>>>> >>> Discussions about reorganising the Drill source code to better
>>>> >>> position the project to support plug-ins for the "long tail" of weird
>>>> >>> and wonderful systems and data formats have been coming up here and
>>>> >>> there for a few months, e.g. in
>>>> >>> https://github.com/apache/drill/pull/2359.
>>>> >>>
>>>> >>> A view which I personally share is that adding too large a number and
>>>> >>> variety of plug-ins to the main tree would create a lethal
>>>> >>> maintenance burden for developers working there and lead down a road
>>>> >>> of accumulating technical debt. The Maven tricks we must employ to
>>>> >>> harmonise the growing set of dependencies of the main tree to keep it
>>>> >>> buildable are already enough, as is the size of our distributable and
>>>> >>> the count of open bug reports.
>>>> >>>
>>>> >>> Thus, the idea of splitting out "/contrib" into a new
>>>> >>> apache/drill-contrib repo after selecting a subset of plugins to
>>>> >>> remain in apache/drill. I'll now volunteer a set of criteria to decide
>>>> >>> whether a plug-in should live in this notional apache/drill-contrib.
>>>> >>>
>>>> >>>   1. The plug-in queries an unstructured data format (even if it only
>>>> >>>      reads metadata fields) e.g. Image format plug-in.
>>>> >>>   2. The plug-in queries a data format that was designed for human
>>>> >>>      consumption e.g. Excel format plug-in.
>>>> >>>   3. The plug-in cannot be expected to run with speed and reliability
>>>> >>>      comparable to querying structured data on the local network e.g.
>>>> >>>      Dropbox storage plugin.
>>>> >>>   4. The plug-in queries an obscure system or format e.g. we receive a
>>>> >>>      plug-in for some data format used only on old Cray
>>>> >>>      supercomputers.
>>>> >>>   5. The plug-in can for some reason not be well supported by the
>>>> >>>      Drill devs e.g. it has a JNI dependency on some difficult native
>>>> >>>      libs.
>>>> >>>
>>>> >>> Any one of those suggests to me that apache/drill-contrib is the
>>>> >>> better home, but what is your view? Would we apply significantly more
>>>> >>> relaxed standards when reviewing PRs to apache/drill-contrib? Would we
>>>> >>> tag, build and test apache/drill-contrib with every release of
>>>> >>> apache/drill, or would it run on its own schedule, perhaps with users
>>>> >>> downloading builds made continuously from snapshots of HEAD?
>>>> >>>
>>>> >>> Regards
>>>> >>> James
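
To make the repo-agnostic install idea from the top of this thread a bit
more concrete, here is a minimal sketch in Python (chosen only for brevity;
Drill itself is Java). Everything named in it is hypothetical: PluginSource,
InMemorySource, install_plugin and the plugin name "foo-blaster" are
illustrations, not Drill APIs. It pins the artifact by a SHA-256 of its
bytes, which is a stand-in for, and not the same thing as, the Git commit
hash Ted describes.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Mapping, Protocol


class PluginSource(Protocol):
    """Hypothetical interface: anything that can return a plugin jar's bytes."""

    def fetch(self, name: str) -> bytes: ...


class InMemorySource:
    """Stand-in for Maven Central, GitHub, or an in-house repository."""

    def __init__(self, artifacts: Mapping[str, bytes]) -> None:
        self._artifacts = artifacts

    def fetch(self, name: str) -> bytes:
        return self._artifacts[name]


def install_plugin(source: PluginSource, name: str,
                   expected_sha256: str, plugins_dir: Path) -> Path:
    """Fetch a plugin, check it against a pinned SHA-256, then write the jar."""
    data = source.fetch(name)
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        # Refuse to install anything that does not match the pinned hash.
        raise ValueError(f"hash mismatch for {name}: got {actual}")
    plugins_dir.mkdir(parents=True, exist_ok=True)
    target = plugins_dir / f"{name}.jar"
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    jar = b"pretend jar contents"
    source = InMemorySource({"foo-blaster": jar})
    pinned = hashlib.sha256(jar).hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        print(install_plugin(source, "foo-blaster", pinned, Path(tmp)).name)
```

Swapping InMemorySource for a class that reads from Maven Central, GitHub
or an in-house repository would leave install_plugin unchanged, which is
the point of keeping the client agnostic about where it pulls from. On a
cluster, the same step would still have to run on every node or be baked
into a container image, as discussed above.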