Hi Charles,

Excellent question. The short answer is "no", but the longer answer is "we 
should fix this so the answer is yes."

The reason the answer is "no" today is that Drill uses some odd magic to match 
up scan batch creators with plugins. In particular, the class of the second 
argument to the scan batch creator constructor is used to match the creator to 
the serialized SubScan object. This avoids the need for any configuration: 
Drill just looks at the code to figure it out. The drawback is that there must 
be a separate scan batch creator for each SubScan class.
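
To make the matching concrete, here is a toy sketch of the idea in plain Java. 
It is not Drill's actual code: every class name below (Context, 
ExampleSubScanA, CreatorRegistry, and so on) is a hypothetical stand-in, and 
the real lookup may differ in detail. It only shows how reflection on the 
second constructor argument can stand in for configuration.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are Drill's real classes.
interface SubScan {}
class ExampleSubScanA implements SubScan {}
class ExampleSubScanB implements SubScan {}
class Context {}

// One creator per sub scan class: the second constructor argument
// is the only thing that ties a creator to its sub scan.
class ExampleCreatorA {
  ExampleCreatorA(Context context, ExampleSubScanA subScan) {}
}

class ExampleCreatorB {
  ExampleCreatorB(Context context, ExampleSubScanB subScan) {}
}

class CreatorRegistry {
  private final Map<Class<?>, Class<?>> creatorsBySubScan = new HashMap<>();

  // Inspect each two-argument constructor; if the second parameter is a
  // SubScan type, remember that this creator handles that type.
  void register(Class<?> creatorClass) {
    for (Constructor<?> ctor : creatorClass.getDeclaredConstructors()) {
      Class<?>[] params = ctor.getParameterTypes();
      if (params.length == 2 && SubScan.class.isAssignableFrom(params[1])) {
        creatorsBySubScan.put(params[1], creatorClass);
      }
    }
  }

  // Exact-class lookup: returns null when no creator was registered.
  Class<?> creatorFor(SubScan subScan) {
    return creatorsBySubScan.get(subScan.getClass());
  }
}
```

Because the lookup key is the exact sub scan class, two plugins cannot share 
a creator in this scheme unless they also share a sub scan class.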

One solution is to do what the Easy framework does: have a single SubScan for 
all formats. That works OK for Easy, but not so well for storage plugins.

A better solution is to make the association explicit through some form of API, 
configuration, etc. Presto, for example, has a set of interfaces that create 
the objects required for a connector. No magic; just implement a method.
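
As a rough illustration of what "explicit" could look like, here is a minimal 
sketch in which each plugin registers a factory for its sub scan class. This 
is neither Drill's nor Presto's API: ScanFactoryRegistry, ScanOperator, and 
the example sub scan classes are all invented for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout; not an existing Drill or Presto interface.
interface SubScan {}
final class ExampleSubScanA implements SubScan {}
final class ExampleSubScanB implements SubScan {}

interface ScanOperator {
  String name();
}

final class ScanFactoryRegistry {
  private final Map<Class<? extends SubScan>, Function<SubScan, ScanOperator>> factories =
      new HashMap<>();

  // The association is stated in code by the plugin: no reflection involved.
  <S extends SubScan> void register(Class<S> subScanClass,
                                    Function<? super S, ScanOperator> factory) {
    factories.put(subScanClass, s -> factory.apply(subScanClass.cast(s)));
  }

  ScanOperator createScan(SubScan subScan) {
    Function<SubScan, ScanOperator> factory = factories.get(subScan.getClass());
    if (factory == null) {
      throw new IllegalArgumentException(
          "No scan factory registered for " + subScan.getClass().getName());
    }
    return factory.apply(subScan);
  }
}
```

With an explicit registration, one shared factory can serve several sub scan 
classes, which the constructor-matching approach does not allow:

```java
Function<SubScan, ScanOperator> shared =
    s -> () -> "scan of " + s.getClass().getSimpleName();
ScanFactoryRegistry registry = new ScanFactoryRegistry();
registry.register(ExampleSubScanA.class, shared);
registry.register(ExampleSubScanB.class, shared);
```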

At your prompting, I'll go back and look at the "Base" framework to see if we 
can apply some of these ideas to Drill. For example, we could replace the scan 
batch creator with a method call that says, "Here is your SubScan. Give me 
back a scan operator." Since most of the scan setup is generic, we could 
standardize this further with another call that says, "Here is your SubScan. 
Give me back an iterator over readers."
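
A minimal sketch of that second call, assuming the generic scan code owns the 
loop and the plugin only supplies readers. All names here (ReaderFactory, 
GenericScan, the one-method RecordReader) are hypothetical and much simpler 
than Drill's real reader interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified types; not Drill's actual interfaces.
interface SubScan {}

interface RecordReader {
  List<String> readAll();
}

// The only plugin-specific piece: turn a sub scan into readers.
interface ReaderFactory<S extends SubScan> {
  Iterator<RecordReader> readers(S subScan);
}

// Generic scan logic, written once and shared by every plugin.
final class GenericScan {
  static <S extends SubScan> List<String> run(S subScan, ReaderFactory<S> factory) {
    List<String> rows = new ArrayList<>();
    Iterator<RecordReader> readers = factory.readers(subScan);
    while (readers.hasNext()) {
      rows.addAll(readers.next().readAll());
    }
    return rows;
  }
}

// Example plugin: one reader per table named in the sub scan.
final class ExampleSubScan implements SubScan {
  final List<String> tables;

  ExampleSubScan(String... tables) {
    this.tables = Arrays.asList(tables);
  }
}

final class ExampleReaderFactory implements ReaderFactory<ExampleSubScan> {
  @Override
  public Iterator<RecordReader> readers(ExampleSubScan subScan) {
    List<RecordReader> readers = new ArrayList<>();
    for (String table : subScan.tables) {
      // The reader arguments are the only part that varies per plugin.
      readers.add(() -> Arrays.asList(table + ":row1", table + ":row2"));
    }
    return readers.iterator();
  }
}
```

The plugin author writes only the factory; the part that is virtually 
identical across plugins lives in the shared scan code.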

Thanks,
- Paul

 

    On Friday, January 17, 2020, 12:07:20 PM PST, Charles Givre 
<cgi...@gmail.com> wrote:  
 
 Hey Paul, 
In looking through the storage plugins, it seems as if the scan batch creator 
is virtually identical EXCEPT for arguments passed to the RecordReader class.  
I'm wondering if that could be abstracted in the Base Storage PR as well. 
-- C  
