What we want is:
1. A lightweight job engine that is easy to use to start, stop, and check jobs. Most of the heavyweight work is map-reduce, which already runs on the cluster, so the job engine itself does not need to run on a cluster.
2. Kylin already has a job engine based on Quartz; however, only a very small part of its functionality is used, so we can easily replace it with the standard Java API. That removes an extra dependency, which means easier deployment.

Currently a very simple job engine implementation will meet Kylin's needs, so I think keeping it simple is the better choice at this point.

Best Regards
Zhou QianHao

On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:

>So why are the following systems unsuitable?
>
>- mesos + (aurora or chronos)
>- spark
>- yarn
>- drill's drillbits
>
>These options do different things. I know that. I am not entirely clear
>on what you want, however, so I present these different options so that
>you can tell me better what you want.
>
>Mesos provides very flexible job scheduling. With Aurora, it has support
>for handling long-running and periodic jobs. With Chronos, it has the
>equivalent of a cluster-level cron.
>
>Spark provides the ability for a program to spawn lots of parallel
>execution. This is different from what most people mean by job scheduling,
>but in conjunction with a queuing system combined with Spark Streaming,
>you can get remarkably close to a job scheduler.
>
>Yarn can run jobs, but has no capabilities to schedule recurring jobs. It
>can adjudicate the allocation of cluster resources. This is different
>from what either Spark or Mesos does.
>
>Drill's drillbits do scheduling of queries across a parallel execution
>environment. Drill currently has no user impersonation, but does do an
>interesting job of scheduling parts of parallel queries.
>
>Each of these could be considered something like a job scheduler. Only a
>very few are likely to be what you are talking about.
>
>Which is it?
>
>
>On Tue, Jan 13, 2015 at 1:53 AM, Zhou, Qianhao <[email protected]> wrote:
>
>> The goal of this job engine is to provide a unified interface for all
>> job execution and query.
>> Here a job can be, for example, a Kylin query, building a cube, GC, etc.
>> As the old job engine can hardly support jobs other than building cubes,
>> I think this is mandatory before we introduce new realizations of the
>> data model, such as the inverted index.
>>
>> Best Regards
>> Zhou QianHao
>>
>> On 1/13/15, 3:42 PM, "Ted Dunning" <[email protected]> wrote:
>>
>> >What is the goal of this job engine?
>> >
>> >To just run Kylin queries?
>> >
>> >
>> >On Tue, Jan 13, 2015 at 12:31 AM, Henry Saputra <[email protected]> wrote:
>> >
>> >> I believe we do not care about Spark client APIs for the distributed
>> >> execution engine, so I would recommend also taking a look at Apache
>> >> Flink [1].
>> >>
>> >> Similar to Spark, it has an execution engine that can run standalone
>> >> or on YARN as a DAG.
>> >> But since we want to focus mostly on the backend, it has some special
>> >> features like a built-in iteration operator, heap memory management,
>> >> and a cost optimizer for the execution plan.
>> >>
>> >> - Henry
>> >>
>> >> [1] http://flink.apache.org/
>> >>
>> >> On Mon, Jan 12, 2015 at 10:17 PM, Li Yang <[email protected]> wrote:
>> >> > Agree. We shall proceed to refactor the job engine. It needs to be
>> >> > more extensible and friendly for adding new jobs and steps. This is
>> >> > a prerequisite for Kylin to explore other opportunities for faster
>> >> > cube build, like Spark and
>> >> >
>> >> > Please update with finer designs.
>> >> >
>> >> > On Fri, Jan 9, 2015 at 10:07 AM, 周千昊 <[email protected]> wrote:
>> >> >
>> >> >> Currently Kylin has its own Job Engine to schedule the cubing
>> >> >> process. However, there are some demerits:
>> >> >> 1. It is too tightly coupled with the cubing process, and thus
>> >> >> cannot easily support other kinds of jobs.
>> >> >> 2. It is hard to extend or to integrate with other techniques (for
>> >> >> example Spark).
>> >> >> Thus I have proposed a refactoring of the current job engine.
>> >> >> Below is the wiki page on Github:
>> >> >> https://github.com/KylinOLAP/Kylin/wiki/%5BProposal%5D-New-Job-Engine
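[Editor's note] The "replace Quartz with the standard Java API" idea at the top of the thread can be sketched with nothing but java.util.concurrent. The sketch below is a minimal illustration, not Kylin's actual design: the class and method names (SimpleJobEngine, submit, cancel, statusOf) are hypothetical. Periodic jobs (the Chronos-style cron case Ted mentions) would use a ScheduledExecutorService the same way.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical lightweight job engine: start, stop, and check jobs
// using only the standard library (no Quartz dependency).
class SimpleJobEngine {
    public enum Status { RUNNING, SUCCEEDED, FAILED, CANCELLED }

    private final ExecutorService pool;
    private final Map<String, Future<?>> futures = new ConcurrentHashMap<>();
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    public SimpleJobEngine(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Start a job; returns an id used to check or stop it later.
    public String submit(Runnable job) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.RUNNING);
        futures.put(id, pool.submit(() -> {
            try {
                job.run();
                statuses.put(id, Status.SUCCEEDED);
            } catch (Throwable t) {
                statuses.put(id, Status.FAILED);
            }
        }));
        return id;
    }

    // Stop a job; best effort, interrupts the worker if already running.
    public void cancel(String id) {
        Future<?> f = futures.get(id);
        if (f != null && f.cancel(true)) {
            statuses.put(id, Status.CANCELLED);
        }
    }

    // Check a job's current status (null for unknown ids).
    public Status statusOf(String id) {
        return statuses.get(id);
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

Because jobs here are plain Runnables, a "job" can wrap anything: kicking off a map-reduce cube build on the cluster, a GC task, or a query, which matches the thread's point that the engine itself can stay off-cluster and lightweight.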
