What we want is:
1. A lightweight job engine that is easy to use to start, stop, and check jobs. Most of the heavyweight work is map-reduce, which already runs on the cluster, so the job engine itself does not need to run on a cluster.
2. Kylin already has a job engine based on Quartz; however, only a very small part of its functionality is used, so we can easily replace it with the standard Java API. That removes an extra dependency, which means easier deployment.

Currently a very simple job engine implementation will meet Kylin's needs, so I think keeping it simple is the better choice at this point.

Best Regards
Zhou QianHao

On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:

>So why are the following systems unsuitable?
>
>- mesos + (aurora or chronos)
>- spark
>- yarn
>- drill's drillbits
>
>These options do different things. I know that. I am not entirely clear
>on what you want, however, so I present these different options so that
>you can tell me better what you want.
>
>Mesos provides very flexible job scheduling. With Aurora, it has support
>for handling long-running and periodic jobs. With Chronos, it has the
>equivalent of a cluster-level cron.
>
>Spark provides the ability for a program to spawn lots of parallel
>execution. This is different from what most people mean by job scheduling,
>but in conjunction with a queuing system combined with Spark Streaming,
>you can get remarkably close to a job scheduler.
>
>Yarn can run jobs, but has no capabilities to schedule recurring jobs. It
>can adjudicate the allocation of cluster resources. This is different
>from what either Spark or Mesos does.
>
>Drill's drillbits do scheduling of queries across a parallel execution
>environment. Drill currently has no user impersonation, but does do an
>interesting job of scheduling parts of parallel queries.
>
>Each of these could be considered something like a job scheduler. Only a
>very few are likely to be what you are talking about.
>
>Which is it?
>
>
>On Tue, Jan 13, 2015 at 1:53 AM, Zhou, Qianhao <[email protected]> wrote:
>
>> The goal of this job engine is to provide a unified interface for all
>> job execution and query.
>> Here a job can be, for example, a Kylin query, building a cube, GC, etc.
>> As the old job engine can hardly support jobs other than building cubes,
>> I think this is mandatory before we introduce new realizations of the
>> data model, such as the inverted index.
>>
>> Best Regards
>> Zhou QianHao
>>
>> On 1/13/15, 3:42 PM, "Ted Dunning" <[email protected]> wrote:
>>
>> >What is the goal of this job engine?
>> >
>> >To just run Kylin queries?
>> >
>> >
>> >On Tue, Jan 13, 2015 at 12:31 AM, Henry Saputra <[email protected]> wrote:
>> >
>> >> I believe we do not care about Spark client APIs for the distributed
>> >> execution engine, so I would recommend also taking a look at Apache
>> >> Flink [1].
>> >>
>> >> Similar to Spark, it has an execution engine that can run standalone
>> >> or on YARN as a DAG.
>> >> But since we want to focus mostly on the backend, it has some special
>> >> features like a built-in iteration operator, heap memory management,
>> >> and a cost optimizer for the execution plan.
>> >>
>> >> - Henry
>> >>
>> >> [1] http://flink.apache.org/
>> >>
>> >> On Mon, Jan 12, 2015 at 10:17 PM, Li Yang <[email protected]> wrote:
>> >> > Agree. We shall proceed to refactor the job engine. It needs to be
>> >> > more extensible and friendly for adding new jobs and steps. This is
>> >> > a prerequisite for Kylin to explore other opportunities for faster
>> >> > cube build, like Spark and
>> >> >
>> >> > Please update with finer designs.
>> >> >
>> >> > On Fri, Jan 9, 2015 at 10:07 AM, 周千昊 <[email protected]> wrote:
>> >> >
>> >> >> Currently Kylin has its own Job Engine to schedule the cubing
>> >> >> process. However, there are some demerits:
>> >> >> 1. It is too tightly coupled with the cubing process, and thus
>> >> >> cannot easily support other kinds of jobs.
>> >> >> 2. It is hard to extend or to integrate with other techniques (for
>> >> >> example Spark).
>> >> >> Thus I have proposed a refactoring of the current job engine.
>> >> >> Below is the wiki page on Github:
>> >> >> https://github.com/KylinOLAP/Kylin/wiki/%5BProposal%5D-New-Job-Engine
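[Editor's note] The "replace Quartz with the standard Java API" idea at the top of the thread can be sketched with nothing but java.util.concurrent. The sketch below is a minimal illustration, not Kylin's actual design: the class and method names (SimpleJobEngine, submit, cancel, statusOf) are hypothetical. Periodic jobs (the Chronos-style cron case Ted mentions) would use a ScheduledExecutorService the same way.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical lightweight job engine: start, stop, and check jobs
// using only the standard library (no Quartz dependency).
class SimpleJobEngine {
    public enum Status { RUNNING, SUCCEEDED, FAILED, CANCELLED }

    private final ExecutorService pool;
    private final Map<String, Future<?>> futures = new ConcurrentHashMap<>();
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    public SimpleJobEngine(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Start a job; returns an id used to check or stop it later.
    public String submit(Runnable job) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.RUNNING);
        futures.put(id, pool.submit(() -> {
            try {
                job.run();
                statuses.put(id, Status.SUCCEEDED);
            } catch (Throwable t) {
                statuses.put(id, Status.FAILED);
            }
        }));
        return id;
    }

    // Stop a job; best effort, interrupts the worker if already running.
    public void cancel(String id) {
        Future<?> f = futures.get(id);
        if (f != null && f.cancel(true)) {
            statuses.put(id, Status.CANCELLED);
        }
    }

    // Check a job's current status (null for unknown ids).
    public Status statusOf(String id) {
        return statuses.get(id);
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

Because jobs here are plain Runnables, a "job" can wrap anything: kicking off a map-reduce cube build on the cluster, a GC task, or a query, which matches the thread's point that the engine itself can stay off-cluster and lightweight.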
