Hi All,
Yes, we can work around Protobuf: rather than using a simple int to identify an
operator, we'd need an extensible solution, maybe a string key or some such.
The larger challenge is that plugin instances need to be clearly separated from
the rest of Drill: all instances must be created via an implementation of an
interface (which is what the ServiceLoader enables.) So, some restructuring
needed and this would be a bit of a longer-term solution rather than a quick
fix. I might revisit the "Base" storage plugin PR to see if I can add such an
interface.
That said, having a proper plugin architecture would generally be a Good Thing.
Thanks,
- Paul
On Wednesday, January 8, 2020, 05:28:03 AM PST, Arina Yelchiyeva
<[email protected]> wrote:
If I am not mistaken, protobuf is not a big deal, well, Web UI won’t display
operator name but other than that code should work.
Kind regards,
Arina
> On Jan 8, 2020, at 2:13 AM, Charles Givre <[email protected]> wrote:
>
> Hi Paul,
> In principle, I like the idea. Currently, the main sticking point with
> format and storage plugins is the protobuf. But for that, they would be
> completely “pluggable".
>
>
>> On Jan 7, 2020, at 5:54 PM, Paul Rogers <[email protected]> wrote:
>>
>> Hi All,
>>
>> Wanted to chime in on this topic. We've long talked about the idea of
>> building plugins separately from Drill itself; but have never had the
>> resources to achieve this goal. Turns out Presto has a nice, simple way to
>> build plugins separately from Presto itself. [1]
>>
>> If Drill were to adopt something similar, then we could divide plugins into
>> two tiers: core plugins supported by the Drill project (with the kind of
>> testing Arena suggests), and true contributed plugins that are maintained by
>> others, using whatever form of testing works for them.
>>
>> Over time, if we find that a plugin becomes "stale" (lack of support or
>> usage), we can perhaps "demote" it to contributed status as a way of
>> managing the fact that we can't/won't do full testing on stale plugins.
>>
>> Presto uses the Java ServiceLoader [2] class to load plugins in a class
>> loader separate from that for Presto itself. A side benefit of such an
>> approach is that, with the right class loader, we get isolation of plugin
>> and Drill dependencies: different plugins can use different Guava versions
>> without the conflicts that we ran into in the run-up to the recent release.
>>
>> At some point we can make a simple trade-off: it might cost less to follow
>> Presto's lead than to continue to deal with the conflicts and ambiguities in
>> our current "everything in the Drill core" approach.
>>
>> Thanks,
>> - Paul
>>
>> [1] https://prestodb.io/docs/current/develop/spi-overview.html
>>
>> [2] https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html
>>
>>
>>
>>
>>
>>
>> On Friday, January 3, 2020, 11:04:53 AM PST, Charles Givre
>><[email protected]> wrote:
>>
>> Arina,
>> Thanks for the response. See responses inline:
>>
>>> On Jan 3, 2020, at 4:40 AM, Arina Yelchiyeva <[email protected]>
>>> wrote:
>>>
>>> Hi Charles,
>>>
>>> I would expect contributor to provide efficient way to test storage
>>> plugins. Depending on the source, some have embedded versions, some don’t.
>>> So contributor should provide instructions how plugin can be tested.
>>> Ideally, tests should be able to run without dependency on external system.
>>> Kudu is old plugin and does not comply with Drill coding standards, so the
>>> fact it does not have unit tests does not mean that other won’t have.
>>
>> I wasn't suggesting that new plugins shouldn't have good testing ;-). Far
>> from it. What I am wondering is whether it is necessary to have tests for
>> every class, or can we go for good coverage by providing a lot of query unit
>> tests that test a lot of different cases?
>>
>> Let's say I'm writing a plugin for Druid and Druid doesn't have an embedded
>> mode. (I have no idea whether Druid does or doesn't, just using it as an
>> example.) Would it be acceptable for a developer to provide a Docker
>> container loaded with Druid and a small dataset to serve as a testbed for
>> the unit tests? Just FYI, for the API plugin, I wrote 16 or so unit tests
>> using a mock webserver, but that one was relatively easy because there are
>> no shortage of libraries for testing HTTP responses.
>>
>>
>>>
>>> I would suggest we won’t accept storage plugins that does not have good
>>> unit tests coverage, documentations and code quality. Since plugins are
>>> easily pluggable, if plugin does not qualify Drill coding standards, it can
>>> be build manually and added as jar to the Drill classpath.
>>
>> I completely agree. One of my biggest frustrations when writing the Drill
>> book was the amount of undocumented functionality in Drill. One thing
>> however, is that I don't think plugins are truly "pluggable" yet because
>> most if not all require updates to the protobufs.
>>
>>> From my perspective, Drill users (or any users) expect if code is available
>>> from official build, it means it fully tested and works. Otherwise, users
>>> would complain that somethings does not work and say the Drill product is
>>> of bad quality.
>>
>>>
>>> Regarding code standards, this should apply to all PRs. We don’t have many
>>> code reviewers for the project and submitting PRs that have poor code
>>> quality would mean that these people (who voluntarily spend time reviewing
>>> and understanding someone code) would have to spend lots of time reviewing
>>> and correcting simple things. Many Apache projects have high code quality
>>> standards, much higher than Drill, so I think it would be fair if
>>> contributor would spend more time making code better than expect reviewer
>>> to point to every trivial issue.
>>>
>>> Kind regards,
>>> Arina
>>>
>>>> On Jan 2, 2020, at 8:50 PM, Charles Givre <[email protected]> wrote:
>>>>
>>>> Hello Drill Devs,
>>>> I wanted to ask a question about testing storage plugins. Currently there
>>>> are PRs for storage plugins for Apache Druid
>>>> (https://github.com/apache/drill/pull/1888
>>>> <https://github.com/apache/drill/pull/1888>), and a generic REST API
>>>> plugin (https://github.com/apache/drill/pull/1892
>>>> <https://github.com/apache/drill/pull/1892>).
>>>>
>>>> I wanted to ask about what is necessary with respect to testing to get a
>>>> storage plugin accepted? I looked at some of the others, and some, like
>>>> the HBase plugin, have many unit tests and others, like the Kudu plugin
>>>> barely have any. Also, since obviously, these unit tests require an
>>>> external system (or at least a docker container of one) what should a
>>>> contributor provide with a storage plugin? Should they provide a docker
>>>> container with a data set pre-loaded, or should the unit tests be marked
>>>> "Ignore"?
>>>>
>>>> When we do releases, are we running the tests against external systems?
>>>> Thanks,
>>>> -- C
>>>>
>>>>
>>>>
>>>>
>>>
>