Re: Dynamic UDFs support

Keys Botzum Fri, 22 Jul 2016 09:16:24 -0700

No disagreement on deferral, but I raised my initial concern precisely because
I'm concerned about the practicality of the "restart the cluster" option. I
cited my concerns about laptops and development clusters. I was wondering if
there might be some small things Drill could do to help. If there is nothing
that can be done to make this easier, so be it, but I think that's going to be
a big impediment.


Keys
_______________________________
Keys Botzum 
Senior Principal Technologist
[email protected] <mailto:[email protected]>
443-718-0098
MapR Technologies 
http://www.mapr.com <http://www.mapr.com/>
> On Jul 22, 2016, at 1:37 AM, Neeraja Rentachintala 
> <[email protected]> wrote:
> 
> It seems like we are reaching a conclusion here in terms of starting with a
> simpler implementation, i.e., being able to deploy UDFs dynamically without
> Drillbit restarts based on jars in a DFS location. Dropping functions
> dynamically is out of scope for version 1 of this feature (we assume
> development of UDFs is happening on a user laptop or a dev cluster where it's
> OK to have a restart).
> 
> -Neeraja
> 
> On Thu, Jul 21, 2016 at 11:56 AM, Keys Botzum <[email protected]> wrote:
> 
>> Recognize the difficulty. Not suggesting this be addressed in first
>> version. Just suggesting some thought about how a real user will
>> work around it. Maybe some doc and/or small changes can make this easier.
>> 
>> Keys
>> On Jul 21, 2016 1:45 PM, "Paul Rogers" <[email protected]> wrote:
>> 
>>> Hi All,
>>> 
>>> Adding a dynamic DROP would, of course, be a great addition! The reason
>>> for suggesting we skip that was to control project scope.
>>> 
>>> Dynamic DROP requires a synchronization step. Here’s the scenario:
>>> 
>>> * Foreman A starts a query using UDF U.
>>> * Foreman B receives a request to drop UDF U, followed by a request to
>>> add a new version of U, U’.
>>> 
>>> How do we drop a function that may be in use? There are some tricky bits
>>> to work out, which seemed too overwhelming to consider all in one go.
>>> 
>>> Clearly just dropping U and adding a new version of U with the same name
>>> leads to issues if not synchronized. If a Drillbit D is running a query
>>> with U when it receives notice to drop U, should D complete the query or
>>> fail it? If the query completes, then how does D deal with the request to
>>> register U’, which has the same name?
>>> 
>>> Do we globally synchronize function deletion? (The foreman B that
>>> receives the drop request waits for all queries using U to finish.) But,
>>> how do we know which queries use U?
>>> 
>>> An eventually consistent approach is to track the age of the oldest
>>> running query. Suppose B drops U at time T. Any query received after T
>>> that uses U will fail in planning. A new U’ can’t be registered until all
>>> queries that started before T complete.
>>> 
>>> The primary challenge we face in both the CREATE and DROP cases is that
>>> Drill is distributed with little central coordination. That’s great for
>>> scale, but makes it hard to design features that require coordination.
>>> Some other tools solve this problem with a data dictionary (or
>>> “metastore”). Alas, Drill does not have such a concept. So a seemingly
>>> simple feature like dynamic UDF becomes a major design challenge to get
>>> right.
>>> 
>>> Thanks,
>>> 
>>> - Paul
>>> 
>>>> On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala <
>>> [email protected]> wrote:
>>>> 
>>>> The whole point of this feature is to avoid Drill cluster restarts, as
>>>> the name 'Dynamic' UDFs indicates.
>>>> So any design that requires restarts would, I think, defeat the purpose.
>>>> 
>>>> I also think this is an example of a feature where we start with a
>>>> simple design to serve the purpose, take feedback on how it is being
>>>> deployed/used in real user situations, and improve it in subsequent
>>>> releases.
>>>> 
>>>> -thanks
>>>> Neeraja
>>>> 
>>>> On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <[email protected]>
>>> wrote:
>>>> 
>>>>> I think there are a lot of great ideas here. My one concern is the
>>>>> lack of unload and thus presumably replace functionality. I'm just
>>>>> thinking about typical actual usage.
>>>>> 
>>>>> In a typical development cycle someone writes something, tries it,
>>>>> learns, changes it, and tries again. Assuming I understand the design,
>>>>> that change step requires a full Drill cluster restart. That is going to
>>>>> be very disruptive and will make UDF work nearly impossible without a
>>>>> dedicated "private" cluster for Drill. I realize that people should have
>>>>> access to the data they need and Drill in a development cluster, but
>>>>> even then restarts can be hard since development clusters are often
>>>>> shared - and that's assuming such a cluster exists. I realize of course
>>>>> Drill can be run as a standalone Drillbit but I'm not convinced that
>>>>> desktops will have adequate access to the needed data.
>>>>> 
>>>>> Having dealt with Java classloading over the years, I'm not claiming
>>>>> class replacement is an easy thing so I'll defer to others on the
>>>>> priority of that, but I'm wondering if there isn't some way to make UDF
>>>>> experimentation a bit easier/practical.
>>>>> 
>>>>> Given the above, let me toss out some possibly naive ideas that maybe
>>>>> are workable:
>>>>> * can I easily run a standalone Drillbit on a Hadoop cluster node that
>>>>> is already running Drill servers? I'm sure this can be done, but is it
>>>>> easy? Could we perhaps make this clearer as an explicit kind of thing?
>>>>> * is there a way that when I deploy a UDF I can constrain the # of
>>>>> bits it is loaded into and perhaps even specify the bits?
>>>>> * Obvious corollary is I'd want my query to run on those bits and a
>>>>> not too disruptive way to restart just those bits
>>>>> The above may be obvious to Drill experts. If it is then perhaps the
>>>>> UDF docs could just point out how to easily develop UDFs in an
>>>>> iterative fashion.
>>>>> 
>>>>> Keys
>>> 
>>
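
For reference, the eventually consistent drop scheme Paul outlines in the quoted message (drop U at time T, fail later queries that use U in planning, and hold back U’ until every query that started before T has completed) might be sketched roughly as below. This is only an illustration under strong assumptions: it runs in a single process with a logical counter standing in for time, and the class and method names (UdfDropTracker, register, drop, startQuery, finishQuery) are hypothetical and not part of Drill. It does not address the hard part raised in the thread, which is getting all Drillbits to agree on this state without central coordination.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical single-process sketch; none of these names exist in Drill. */
final class UdfDropTracker {
    // Logical clock standing in for wall-clock time (an assumption of this sketch).
    private long clock = 0;
    // Names of functions that queries may currently use.
    private final Set<String> registered = new HashSet<>();
    // Time T at which each dropped function was dropped.
    private final Map<String, Long> droppedAt = new HashMap<>();
    // Start time of every query that is still running.
    private final Map<String, Long> runningQueries = new HashMap<>();

    /** Registers a function; returns false while an earlier drop is still draining. */
    public synchronized boolean register(String udf) {
        Long dropTime = droppedAt.get(udf);
        if (dropTime != null && oldestRunningStart() < dropTime) {
            // A query that started before T may still be using the old version.
            return false;
        }
        droppedAt.remove(udf);
        registered.add(udf);
        return true;
    }

    /** Drops a function at time T = now; running queries are left alone. */
    public synchronized void drop(String udf) {
        if (registered.remove(udf)) {
            droppedAt.put(udf, ++clock);
        }
    }

    /** "Plans" a query: fails if it uses a function that is not registered. */
    public synchronized void startQuery(String queryId, String... udfs) {
        for (String udf : udfs) {
            if (!registered.contains(udf)) {
                throw new IllegalStateException("Planning failed, unknown function: " + udf);
            }
        }
        runningQueries.put(queryId, ++clock);
    }

    /** Marks a query as complete so it no longer blocks re-registration. */
    public synchronized void finishQuery(String queryId) {
        runningQueries.remove(queryId);
    }

    // Start time of the oldest running query, or Long.MAX_VALUE if none are running.
    private long oldestRunningStart() {
        long oldest = Long.MAX_VALUE;
        for (long start : runningQueries.values()) {
            oldest = Math.min(oldest, start);
        }
        return oldest;
    }
}
```

In this sketch a query that started before the drop keeps running, a query planned after the drop fails, and register returns false for the replacement until the older query calls finishQuery. Tracking only the oldest start time avoids needing to know which queries use U, at the cost of sometimes waiting on queries that never touched it.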
