Maybe renaming this module to "builder" would make more sense :-)

2015-01-15 8:09 GMT+08:00 Henry Saputra <[email protected]>:
> I believe the job engine here is a cube builder, which is the component
> that manages submissions to the different distributed platforms (MR,
> Flink, Spark) that actually execute the jobs on different machines.
> Its primary function is to manage the "job" submission and act as a
> reverse proxy for status, scheduling, and metadata access for those
> operations.
>
> I had worked on something similar in my previous role =)
>
> - Henry
>
> On Wed, Jan 14, 2015 at 9:40 AM, Julian Hyde <[email protected]> wrote:
> > Still worth considering an existing tool. The simplest code is the code
> > you don’t maintain. :)
> >
> > On Jan 14, 2015, at 2:57 AM, Li Yang <[email protected]> wrote:
> >
> >> Sorry I'm late, just a recap.
> >>
> >> The "Job Engine" here only manages the lifecycle and dependencies of
> >> long-running tasks. It oversees task sequences (e.g. a cube build is
> >> made up of several MapReduce jobs) and allows the user to
> >> start/stop/pause/resume them.
> >>
> >> It does not do scheduling or fancy workflows, which is why many
> >> existing products like Quartz or Oozie are overkill. We want to keep
> >> Kylin's overall architecture simple and easy to deploy and debug.
> >>
> >> The purpose of this refactoring is to separate the manager role from
> >> the worker role, which the previous implementation mixed up. Once
> >> that is done, replacing a worker should become easy, and we will be
> >> free to explore other cube-building workers, like the Flink and Spark
> >> ones mentioned.
> >>
> >> Cheers
> >> Yang
> >>
> >> On Wed, Jan 14, 2015 at 10:08 AM, Zhou, Qianhao <[email protected]> wrote:
> >>
> >>> Thanks Ted for the advice.
> >>> I think the right way is to take more options into consideration and
> >>> then make a decision.
> >>> Whichever solution is used, we are going to learn something that will
> >>> benefit us sooner or later.
> >>>
> >>> Best Regards
> >>> Zhou QianHao
> >>>
> >>> On 1/14/15, 12:37 AM, "Ted Dunning" <[email protected]> wrote:
> >>>
> >>>> OK.
> >>>>
> >>>> On Tue, Jan 13, 2015 at 10:30 AM, 周千昊 <[email protected]> wrote:
> >>>>
> >>>>> As I mentioned, we don't want an extra dependency because that will
> >>>>> make the deployment more complex.
> >>>>> As for Aurora, users would have an extra installation step, whereas
> >>>>> so far Kylin only needs a war package and a Hadoop cluster.
> >>>>> On Tue Jan 13 2015 at 10:26:50 PM Ted Dunning <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> I understand you want to write your own job engine. But why not
> >>>>>> use one that already exists?
> >>>>>>
> >>>>>> Given that you mention Quartz, it sounds like Aurora might be a
> >>>>>> good fit. Why not use it?
> >>>>>>
> >>>>>> On Tue, Jan 13, 2015 at 3:34 AM, Zhou, Qianhao <[email protected]> wrote:
> >>>>>>
> >>>>>>> What we want is:
> >>>>>>>
> >>>>>>> 1. A lightweight job engine with which it is easy to start, stop
> >>>>>>>    and check jobs. Most of the heavyweight work is MapReduce,
> >>>>>>>    which already runs on the cluster, so the job engine itself
> >>>>>>>    does not need to run on a cluster.
> >>>>>>>
> >>>>>>> 2. Kylin already has a job engine based on Quartz; however, only
> >>>>>>>    a very small part of its functionality is used, so we can
> >>>>>>>    easily replace it with the standard Java API. That removes an
> >>>>>>>    extra dependency, which means easier deployment.
> >>>>>>>
> >>>>>>> Currently a very simple job engine implementation will meet
> >>>>>>> Kylin's needs, so I think at this point keeping it simple is the
> >>>>>>> better choice.
> >>>>>>>
> >>>>>>> Best Regards
> >>>>>>> Zhou QianHao
> >>>>>>>
> >>>>>>> On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> So why are the following systems unsuitable?
> >>>>>>>>
> >>>>>>>> - mesos + (aurora or chronos)
> >>>>>>>> - spark
> >>>>>>>> - yarn
> >>>>>>>> - drill's drillbits
> >>>>>>>>
> >>>>>>>> These options do different things; I know that. I am not
> >>>>>>>> entirely clear on what you want, however, so I present these
> >>>>>>>> different options so that you can tell me better what you want.
> >>>>>>>>
> >>>>>>>> Mesos provides very flexible job scheduling. With Aurora, it
> >>>>>>>> has support for handling long-running and periodic jobs. With
> >>>>>>>> Chronos, it has the equivalent of a cluster-level cron.
> >>>>>>>>
> >>>>>>>> Spark provides the ability for a program to spawn lots of
> >>>>>>>> parallel execution. This is different from what most people
> >>>>>>>> mean by job scheduling, but in conjunction with a queuing
> >>>>>>>> system combined with Spark Streaming, you can get remarkably
> >>>>>>>> close to a job scheduler.
> >>>>>>>>
> >>>>>>>> Yarn can run jobs, but has no capability to schedule recurring
> >>>>>>>> jobs. It can adjudicate the allocation of cluster resources,
> >>>>>>>> which is different from what either Spark or Mesos does.
> >>>>>>>>
> >>>>>>>> Drill's drillbits schedule queries across a parallel execution
> >>>>>>>> environment. Drill currently has no user impersonation, but it
> >>>>>>>> does do an interesting job of scheduling the parts of parallel
> >>>>>>>> queries.
> >>>>>>>>
> >>>>>>>> Each of these could be considered a kind of job scheduler, but
> >>>>>>>> only a very few are likely to be what you are talking about.
> >>>>>>>>
> >>>>>>>> Which is it?
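[Editor's note: to make the manager/worker split discussed above concrete, here is a minimal sketch of a chained-job engine built only on the standard Java API, as Zhou Qianhao proposes. The names (JobState, Executable, ChainedJob) are hypothetical illustrations, not Kylin's actual classes: the engine only tracks a task sequence's lifecycle (run each step, honor a stop request, abort on failure), while each step is a pluggable worker that would, in practice, submit an MR/Flink/Spark job.]

```java
import java.util.ArrayList;
import java.util.List;

// One step of a job chain; in real life this would submit and monitor
// a MapReduce (or Flink/Spark) job. Hypothetical sketch, not Kylin code.
interface Executable {
    JobState execute() throws Exception;
}

enum JobState { READY, RUNNING, SUCCEEDED, ERROR, STOPPED }

// The "manager" side: runs steps in order, supports stop, aborts on error.
class ChainedJob implements Executable {
    private final List<Executable> steps = new ArrayList<>();
    private volatile boolean stopRequested = false;
    private volatile JobState state = JobState.READY;

    void addStep(Executable step) { steps.add(step); }
    void stop() { stopRequested = true; }   // honored between steps
    JobState getState() { return state; }

    @Override
    public JobState execute() {
        state = JobState.RUNNING;
        for (Executable step : steps) {
            if (stopRequested) {
                return state = JobState.STOPPED;
            }
            try {
                if (step.execute() != JobState.SUCCEEDED) {
                    return state = JobState.ERROR;  // abort chain on failure
                }
            } catch (Exception e) {
                return state = JobState.ERROR;
            }
        }
        return state = JobState.SUCCEEDED;
    }
}

public class JobEngineSketch {
    public static void main(String[] args) {
        // A "cube build" modeled as a sequence of two worker steps.
        ChainedJob cubeBuild = new ChainedJob();
        cubeBuild.addStep(() -> {
            System.out.println("step 1: build dictionary");
            return JobState.SUCCEEDED;
        });
        cubeBuild.addStep(() -> {
            System.out.println("step 2: build cube");
            return JobState.SUCCEEDED;
        });
        System.out.println("final state: " + cubeBuild.execute());
    }
}
```

In place of Quartz triggers, a plain `java.util.concurrent.ScheduledExecutorService` could periodically poll the metadata store for READY chains and submit them, which keeps the deployment to just the war package and the Hadoop cluster.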
