No disagreement on deferral, but I raised my initial concern precisely because I doubt the practicality of the "restart the cluster" option. I cited my concerns about laptops and development clusters. I was wondering whether there might be some small things Drill could do to help. If nothing can be done to make this easier, so be it, but I think that's going to be a big impediment.
Keys
_______________________________
Keys Botzum
Senior Principal Technologist
[email protected]
443-718-0098
MapR Technologies
http://www.mapr.com

> On Jul 22, 2016, at 1:37 AM, Neeraja Rentachintala <[email protected]> wrote:
>
> It seems like we are reaching a conclusion here: start with a simpler implementation, i.e. being able to deploy UDFs dynamically, without Drillbit restarts, from jars in a DFS location. Dropping functions dynamically is out of scope for version 1 of this feature (we assume UDF development happens on a user's laptop or a dev cluster where it's OK to restart).
>
> -Neeraja
>
> On Thu, Jul 21, 2016 at 11:56 AM, Keys Botzum <[email protected]> wrote:
>
>> Recognize the difficulty. Not suggesting this be addressed in the first version. Just suggesting some thought about how a real user will work around it. Maybe some doc and/or small changes can make this easier.
>>
>> Keys
>>
>> On Jul 21, 2016 1:45 PM, "Paul Rogers" <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> Adding a dynamic DROP would, of course, be a great addition! The reason for suggesting we skip it was to control project scope.
>>>
>>> Dynamic DROP requires a synchronization step. Here's the scenario:
>>>
>>> * Foreman A starts a query using UDF U.
>>> * Foreman B receives a request to drop UDF U, followed by a request to add a new version of U, U'.
>>>
>>> How do we drop a function that may be in use? There are some tricky bits to work out, which seemed too much to take on all in one go.
>>>
>>> Clearly, just dropping U and adding a new version of U with the same name leads to issues if not synchronized. If a Drillbit D is running a query with U when it receives notice to drop U, should D complete the query or fail it? If the query completes, then how does D deal with the request to register U', which has the same name?
>>>
>>> Do we globally synchronize function deletion? (The foreman B that receives the drop request waits for all queries using U to finish.) But how do we know which queries use U?
>>>
>>> An eventually consistent approach is to track the age of the oldest running query. Suppose B drops U at time T. Any query received after T that uses U will fail in planning. A new U' can't be registered until all queries that started before T complete.
>>>
>>> The primary challenge we face in both the CREATE and DROP cases is that Drill is distributed with little central coordination. That's great for scale, but it makes it hard to design features that require coordination. Some other tools solve this problem with a data dictionary (or "metastore"). Alas, Drill does not have such a concept. So a seemingly simple feature like dynamic UDFs becomes a major design challenge to get right.
>>>
>>> Thanks,
>>>
>>> - Paul
>>>
>>>> On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala <[email protected]> wrote:
>>>>
>>>> The whole point of this feature is to avoid Drill cluster restarts, as the name 'Dynamic' UDFs indicates. So I would think any design that requires restarts would defeat the purpose.
>>>>
>>>> I also think this is an example of a feature where we start with a simple design that serves the purpose, take feedback on how it is being deployed and used in real user situations, and improve it in subsequent releases.
>>>>
>>>> -thanks
>>>> Neeraja
>>>>
>>>> On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <[email protected]> wrote:
>>>>
>>>>> I think there are a lot of great ideas here. My one concern is the lack of unload, and thus presumably replace, functionality. I'm just thinking about typical actual usage.
>>>>>
>>>>> In a typical development cycle someone writes something, tries it, learns, changes it, and tries again. Assuming I understand the design, that change step requires a full Drill cluster restart. That is going to be very disruptive and will make UDF work nearly impossible without a dedicated "private" cluster for Drill. I realize that people should have access to the data they need and to Drill in a development cluster, but even then restarts can be hard, since development clusters are often shared, and that's assuming such a cluster exists. I realize, of course, that Drill can be run as a standalone Drillbit, but I'm not convinced that desktops will have adequate access to the needed data.
>>>>>
>>>>> Having dealt with Java classloading over the years, I'm not claiming class replacement is an easy thing, so I'll defer to others on its priority, but I'm wondering if there isn't some way to make UDF experimentation a bit easier and more practical.
>>>>>
>>>>> Given the above, let me toss out some possibly naive ideas that may be workable:
>>>>> * Can I easily run a standalone Drillbit on a Hadoop cluster node that is already running Drill servers? I'm sure this can be done, but is it easy? Could we perhaps make this clearer as an explicitly supported setup?
>>>>> * Is there a way, when I deploy a UDF, to constrain the number of bits it is loaded into, and perhaps even specify the bits?
>>>>> * The obvious corollary is that I'd want my query to run on those bits, plus a not-too-disruptive way to restart just those bits.
>>>>>
>>>>> The above may be obvious to Drill experts. If it is, then perhaps the UDF docs could just point out how to easily develop UDFs in an iterative fashion.
>>>>>
>>>>> Keys
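To make Paul's eventually consistent DROP idea concrete, here is a minimal single-process sketch in Python. It is not Drill code: FunctionRegistry, begin_query, end_query, drop and register are hypothetical names, the state lives in in-memory dicts rather than anything shared across Drillbits, and times are plain integers standing in for wall-clock time. It also assumes, for illustration, that registering a name that is still registered is simply refused, since version 1 has no replace.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionRegistry:
    """Hypothetical registry illustrating a time-based, eventually consistent DROP."""
    functions: dict = field(default_factory=dict)   # UDF name -> version label
    running: dict = field(default_factory=dict)     # query id -> start time
    dropped_at: dict = field(default_factory=dict)  # UDF name -> drop time T

    def begin_query(self, query_id, start_time, uses):
        # Planning fails if a referenced UDF is not registered, which covers
        # every query received after U was dropped at time T.
        for name in uses:
            if name not in self.functions:
                raise LookupError(f"function {name} not found")
        self.running[query_id] = start_time

    def end_query(self, query_id):
        self.running.pop(query_id, None)

    def oldest_running(self):
        # Start time of the oldest running query, or None if nothing is running.
        return min(self.running.values(), default=None)

    def drop(self, name, now):
        # Queries already planned with U keep running; only new planning fails.
        self.functions.pop(name)
        self.dropped_at[name] = now

    def register(self, name, version):
        if name in self.functions:
            return False  # assumption: no in-place replace in this sketch
        t = self.dropped_at.get(name)
        oldest = self.oldest_running()
        # U' is blocked while any query that started before T is still running,
        # whether or not that query actually uses U.
        if t is not None and oldest is not None and oldest < t:
            return False
        self.functions[name] = version
        self.dropped_at.pop(name, None)
        return True


if __name__ == "__main__":
    reg = FunctionRegistry()
    reg.register("U", "v1")
    reg.begin_query("q1", start_time=10, uses=["U"])  # planned before the drop
    reg.drop("U", now=20)                             # T = 20
    print(reg.register("U", "v2"))  # False: q1 started before T and is still running
    reg.end_query("q1")
    print(reg.register("U", "v2"))  # True: no query older than T remains
```

The point of this design is that register() only consults the oldest running query, so it never has to answer "which queries use U?"; the trade-off is that an unrelated long-running query can delay re-registering U'.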
