I understand you want to write your own job engine. But why not use one that already exists?
Given that you mention quartz, it sounds like Aurora might be a good fit.
Why not use it?

On Tue, Jan 13, 2015 at 3:34 AM, Zhou, Qianhao <[email protected]> wrote:

> What we want is that:
>
> 1. A lightweight job engine, easy to start, stop and check jobs
> Because most of the heavyweight job is map-reduce which is already
> running on the cluster, so we don’t need the job engine to run on a
> cluster.
>
> 2. Kylin already has a job engine based on Quartz, however, only a very
> small part of functionalities are used, so we can easily replace it with
> standard java api.
> Thus there will be no extra dependency which means easier to deploy.
>
> Currently a very simple job engine implementation will meet the kylin’s
> needs.
> So I think at this timing just keep it simple would be the better choice.
>
> Best Regard
> Zhou QianHao
>
> On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:
>
> >So why are the following systems unsuitable?
> >
> >- mesos + (aurora or chronos)
> >- spark
> >- yarn
> >- drill's drillbits
> >
> >These options do different things. I know that. I am not entirely clear
> >on what you want, however, so I present these different options so that
> >you can tell me better what you want.
> >
> >Mesos provides very flexible job scheduling. With Aurora, it has support
> >for handling long-running and periodic jobs. With Chronos, it has the
> >equivalent of a cluster level cron.
> >
> >Spark provides the ability for a program to spawn lots of parallel
> >execution. This is different than what most people mean by job
> >scheduling, but in conjunction with a queuing system combined with spark
> >streaming, you can get remarkably close to a job scheduler.
> >
> >Yarn can run jobs, but has no capabilities to schedule recurring jobs. It
> >can adjudicate the allocation of cluster resources. This is different
> >from what either spark or mesos does.
> >
> >Drill's drillbits do scheduling of queries across a parallel execution
> >environment. It currently has no user impersonation, but does do an
> >interesting job of scheduling parts of parallel queries.
> >
> >Each of these could be considered like a job scheduler. Only a very few
> >are likely to be what you are talking about.
> >
> >Which is it?
