Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Jeff Zhang
1. ZEPPELIN-3563 forces FAIR scheduling and only allows specifying the pool.
2. The scheduler cannot figure out the dependencies between paragraphs.
That's why SparkInterpreter uses FIFOScheduler.
If you use per-user scoped mode, the SparkContext is shared between users but
the SparkInterpreter is not. That means there are multiple
SparkInterpreter instances that share the same SparkContext, but they
don't share the same FIFOScheduler; each SparkInterpreter uses its own
FIFOScheduler.
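That per-user isolation can be sketched with plain executors. This is a minimal, illustrative model only (the class and method names are not Zeppelin's): one single-threaded FIFO queue per user, so one user's long-running paragraph does not delay another user's queue, while a shared backend (standing in for the shared SparkContext) serves them all.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch only, not Zeppelin's actual scheduler classes: each user gets
// their own single-threaded FIFO queue, mimicking one FIFOScheduler per
// SparkInterpreter instance in per-user scoped mode.
public class PerUserFifoSketch {
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    // One FIFO queue (single worker thread) per user.
    public Future<Integer> submit(String user, Callable<Integer> paragraph) {
        ExecutorService q = queues.computeIfAbsent(
                user, u -> Executors.newSingleThreadExecutor());
        return q.submit(paragraph);
    }

    public void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) throws Exception {
        PerUserFifoSketch sched = new PerUserFifoSketch();
        // User A's long-running paragraph does not block user B's queue,
        // because each user has an independent FIFO queue.
        Future<Integer> slow = sched.submit("userA", () -> { Thread.sleep(500); return 1; });
        Future<Integer> fast = sched.submit("userB", () -> 2);
        System.out.println(fast.get()); // userB finishes without waiting for userA
        System.out.println(slow.get());
        sched.shutdown();
    }
}
```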

Ankit Jain wrote on Wed, Jul 25, 2018 at 12:58 PM:

> Thanks for the quick feedback Jeff.
>
> Re:1 - I did see ZEPPELIN-3563, but we are not on 0.8 yet, and we may also
> want to force FAIR execution instead of letting the user control it.
>
> Re:2 - Is there an architectural issue here, or do we just need better thread
> safety? Ideally the scheduler should be able to figure out the dependencies
> and run whatever can be parallel.
>
> Re:Interpreter mode, I may not have been clear, but we are running per-user
> scoped mode - so the Spark context is shared among all users.
>
> Doesn't that mean all jobs from different users go to one FIFOScheduler,
> forcing all small jobs to block on a big one? That is specifically what we
> are trying to avoid.
>
> Thanks
> Ankit
>
> On Tue, Jul 24, 2018 at 5:40 PM, Jeff Zhang  wrote:
>
>> Regarding 1.  ZEPPELIN-3563 should be helpful. See
>> https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently
>> for more details.
>> https://issues.apache.org/jira/browse/ZEPPELIN-3563
>>
>> Regarding 2. If you use ParallelScheduler for SparkInterpreter, you may
>> hit weird issues if your paragraphs have dependencies on each other, e.g.
>> paragraph p1 uses a variable v1 which is defined in paragraph p2. Then the
>> order of paragraph execution matters, and ParallelScheduler cannot
>> guarantee the order of execution.
>> That's why we use FIFOScheduler for SparkInterpreter.
>>
>> In your scenario where multiple users share the same SparkContext, I
>> would suggest you use scoped per-user mode. Then each user will share
>> the same SparkContext, which means you can save resources, and each user
>> is in their own FIFOScheduler, isolated from the others.
>>
>> Ankit Jain wrote on Wed, Jul 25, 2018 at 8:14 AM:
>>
>>> Forgot to mention this is for shared scoped mode, so same Spark
>>> application and context for all users on a single Zeppelin instance.
>>>
>>> Thanks
>>> Ankit
>>>
>>> On Jul 24, 2018, at 4:12 PM, Ankit Jain  wrote:
>>>
>>> Hi,
>>> I am playing around with the execution policy of Spark jobs (and all
>>> Zeppelin paragraphs, actually).
>>>
>>> Looks like there are a couple of control points:
>>> 1) Spark scheduling - FIFO vs Fair as documented in
>>> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools
>>> .
>>>
>>> Since we are still on .7 version and don't have
>>> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcefully
>>> doing sc.setLocalProperty("spark.scheduler.pool", "fair");
>>> in both SparkInterpreter.java and SparkSqlInterpreter.java.
>>>
>>> Also, because we are exposing Zeppelin to multiple users, we may not
>>> want users to hog the cluster, and may always use FAIR.
>>>
>>> This may complicate our merge to .8 though.
>>>
>>> 2. On top of Spark scheduling, each Zeppelin interpreter itself seems to
>>> have a scheduler queue. Each task is submitted to a FIFOScheduler, except
>>> SparkSqlInterpreter, which creates a ParallelScheduler if the
>>> concurrentsql flag is turned on.
>>>
>>> I am changing SparkInterpreter.java to use ParallelScheduler too and
>>> that seems to do the trick.
>>>
>>> Now multiple notebooks are able to run in parallel.
>>>
>>> My question is: have other people tested SparkInterpreter with
>>> ParallelScheduler?
>>> Also, ideally this should be configurable. The user should be able to
>>> specify FIFO or parallel.
>>>
>>> Executing all paragraphs does add more complication, and maybe
>>> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep
>>> the execution order sane.
>>>
>>>
>>> Thoughts?
>>>
>>> --
>>> Thanks & Regards,
>>> Ankit.
>>>
>>>
>
>
> --
> Thanks & Regards,
> Ankit.
>


Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Thanks for the quick feedback Jeff.

Re:1 - I did see ZEPPELIN-3563, but we are not on 0.8 yet, and we may also
want to force FAIR execution instead of letting the user control it.

Re:2 - Is there an architectural issue here, or do we just need better thread
safety? Ideally the scheduler should be able to figure out the dependencies
and run whatever can be parallel.

Re:Interpreter mode, I may not have been clear, but we are running per-user
scoped mode - so the Spark context is shared among all users.

Doesn't that mean all jobs from different users go to one FIFOScheduler,
forcing all small jobs to block on a big one? That is specifically what we
are trying to avoid.

Thanks
Ankit

On Tue, Jul 24, 2018 at 5:40 PM, Jeff Zhang  wrote:

> Regarding 1.  ZEPPELIN-3563 should be helpful. See
> https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently
> for more details.
> https://issues.apache.org/jira/browse/ZEPPELIN-3563
>
> Regarding 2. If you use ParallelScheduler for SparkInterpreter, you may
> hit weird issues if your paragraphs have dependencies on each other, e.g.
> paragraph p1 uses a variable v1 which is defined in paragraph p2. Then the
> order of paragraph execution matters, and ParallelScheduler cannot
> guarantee the order of execution.
> That's why we use FIFOScheduler for SparkInterpreter.
>
> In your scenario where multiple users share the same SparkContext, I would
> suggest you use scoped per-user mode. Then each user will share the same
> SparkContext, which means you can save resources, and each user is in
> their own FIFOScheduler, isolated from the others.
>
> Ankit Jain wrote on Wed, Jul 25, 2018 at 8:14 AM:
>
>> Forgot to mention this is for shared scoped mode, so same Spark
>> application and context for all users on a single Zeppelin instance.
>>
>> Thanks
>> Ankit
>>
>> On Jul 24, 2018, at 4:12 PM, Ankit Jain  wrote:
>>
>> Hi,
>> I am playing around with execution policy of Spark jobs(and all Zeppelin
>> paragraphs actually).
>>
>> Looks like there are a couple of control points:
>> 1) Spark scheduling - FIFO vs Fair, as documented in
>> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.
>>
>> Since we are still on .7 version and don't have
>> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcefully doing
>> sc.setLocalProperty("spark.scheduler.pool", "fair");
>> in both SparkInterpreter.java and SparkSqlInterpreter.java.
>>
>> Also, because we are exposing Zeppelin to multiple users, we may not
>> want users to hog the cluster, and may always use FAIR.
>>
>> This may complicate our merge to .8 though.
>>
>> 2. On top of Spark scheduling, each Zeppelin interpreter itself seems to
>> have a scheduler queue. Each task is submitted to a FIFOScheduler, except
>> SparkSqlInterpreter, which creates a ParallelScheduler if the
>> concurrentsql flag is turned on.
>>
>> I am changing SparkInterpreter.java to use ParallelScheduler too and
>> that seems to do the trick.
>>
>> Now multiple notebooks are able to run in parallel.
>>
>> My question is: have other people tested SparkInterpreter with
>> ParallelScheduler?
>> Also, ideally this should be configurable. The user should be able to
>> specify FIFO or parallel.
>>
>> Executing all paragraphs does add more complication, and maybe
>> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep
>> the execution order sane.
>>
>>
>> Thoughts?
>>
>> --
>> Thanks & Regards,
>> Ankit.
>>
>>


-- 
Thanks & Regards,
Ankit.


Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Jeff Zhang
Regarding 1.  ZEPPELIN-3563 should be helpful. See
https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently
for more details.
https://issues.apache.org/jira/browse/ZEPPELIN-3563

Regarding 2. If you use ParallelScheduler for SparkInterpreter, you may hit
weird issues if your paragraphs have dependencies on each other, e.g.
paragraph p1 uses a variable v1 which is defined in paragraph p2. Then the
order of paragraph execution matters, and ParallelScheduler cannot
guarantee the order of execution.
That's why we use FIFOScheduler for SparkInterpreter.

In your scenario where multiple users share the same SparkContext, I would
suggest you use scoped per-user mode. Then each user will share the same
SparkContext, which means you can save resources, and each user is in their
own FIFOScheduler, isolated from the others.
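The ordering guarantee that FIFOScheduler provides can be sketched with a single-threaded executor. This is an illustrative model, not Zeppelin code (the class and method names here are made up): a single worker thread runs tasks in exactly their submission order, which is what makes dependent paragraphs (p1 reads a variable that p2 defines) safe, whereas a multi-threaded pool offers no such guarantee.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch with assumed names, not Zeppelin code: paragraphs append their id
// to a shared log. A single-threaded executor (FIFO) always preserves
// submission order; a multi-threaded pool would not.
public class OrderSketch {
    public static List<String> runFifo(List<String> paragraphs) throws Exception {
        ExecutorService fifo = Executors.newSingleThreadExecutor();
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        for (String p : paragraphs) {
            fifo.submit(() -> log.add(p)); // tasks run one at a time, in order
        }
        fifo.shutdown();
        fifo.awaitTermination(5, TimeUnit.SECONDS);
        return log; // always equal to the submission order
    }

    public static void main(String[] args) throws Exception {
        // p2 must run before p1 (p1 reads a variable p2 defines);
        // submitting them in that order to a FIFO queue is sufficient.
        System.out.println(runFifo(Arrays.asList("p2", "p1")));
    }
}
```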

Ankit Jain wrote on Wed, Jul 25, 2018 at 8:14 AM:

> Forgot to mention this is for shared scoped mode, so same Spark
> application and context for all users on a single Zeppelin instance.
>
> Thanks
> Ankit
>
> On Jul 24, 2018, at 4:12 PM, Ankit Jain  wrote:
>
> Hi,
> I am playing around with the execution policy of Spark jobs (and all
> Zeppelin paragraphs, actually).
>
> Looks like there are a couple of control points:
> 1) Spark scheduling - FIFO vs Fair, as documented in
> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.
>
> Since we are still on .7 version and don't have
> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcefully
> doing sc.setLocalProperty("spark.scheduler.pool", "fair");
> in both SparkInterpreter.java and SparkSqlInterpreter.java.
>
> Also, because we are exposing Zeppelin to multiple users, we may not
> want users to hog the cluster, and may always use FAIR.
>
> This may complicate our merge to .8 though.
>
> 2. On top of Spark scheduling, each Zeppelin interpreter itself seems to
> have a scheduler queue. Each task is submitted to a FIFOScheduler, except
> SparkSqlInterpreter, which creates a ParallelScheduler if the
> concurrentsql flag is turned on.
>
> I am changing SparkInterpreter.java to use ParallelScheduler too and that
> seems to do the trick.
>
> Now multiple notebooks are able to run in parallel.
>
> My question is: have other people tested SparkInterpreter with
> ParallelScheduler?
> Also, ideally this should be configurable. The user should be able to
> specify FIFO or parallel.
>
> Executing all paragraphs does add more complication, and maybe
> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep the
> execution order sane.
>
>
> Thoughts?
>
> --
> Thanks & Regards,
> Ankit.
>
>


Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Forgot to mention this is for shared scoped mode, so same Spark application and 
context for all users on a single Zeppelin instance.

Thanks
Ankit

> On Jul 24, 2018, at 4:12 PM, Ankit Jain  wrote:
> 
> Hi,
> I am playing around with the execution policy of Spark jobs (and all
> Zeppelin paragraphs, actually).
> 
> Looks like there are a couple of control points:
> 1) Spark scheduling - FIFO vs Fair as documented in 
> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.
> 
> Since we are still on .7 version and don't have 
> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcefully doing 
> sc.setLocalProperty("spark.scheduler.pool", "fair");
> in both SparkInterpreter.java and SparkSqlInterpreter.java.
> 
> Also, because we are exposing Zeppelin to multiple users, we may not
> want users to hog the cluster, and may always use FAIR.
> 
> This may complicate our merge to .8 though.
> 
> 2. On top of Spark scheduling, each Zeppelin interpreter itself seems to
> have a scheduler queue. Each task is submitted to a FIFOScheduler, except
> SparkSqlInterpreter, which creates a ParallelScheduler if the
> concurrentsql flag
> 
> I am changing SparkInterpreter.java to use ParallelScheduler too and that 
> seems to do the trick.
> 
> Now multiple notebooks are able to run in parallel.
> 
> My question is: have other people tested SparkInterpreter with
> ParallelScheduler? Also, ideally this should be configurable. The user
> should be able to specify FIFO or parallel.
> 
> Executing all paragraphs does add more complication, and maybe
> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep the
> execution order sane.
> 
> Thoughts?
> 
> -- 
> Thanks & Regards,
> Ankit.


Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Hi,
I am playing around with the execution policy of Spark jobs (and all
Zeppelin paragraphs, actually).

Looks like there are a couple of control points:
1) Spark scheduling - FIFO vs Fair, as documented in
https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.

Since we are still on .7 version and don't have
https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcefully doing
sc.setLocalProperty("spark.scheduler.pool", "fair");
in both SparkInterpreter.java and SparkSqlInterpreter.java.
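For the pool named above to actually be scheduled fairly, Spark (per the 2.1 job-scheduling docs linked above) needs spark.scheduler.mode set to FAIR, and pools can optionally be declared in an allocation file referenced by spark.scheduler.allocation.file; an undeclared pool just gets default settings. A minimal sketch of such a file, where the file path and the minShare/weight values are assumptions for illustration:

```xml
<!-- conf/fairscheduler.xml: declares the pool named in
     sc.setLocalProperty("spark.scheduler.pool", "fair").
     Enable with spark.scheduler.mode=FAIR and point
     spark.scheduler.allocation.file at this file. -->
<allocations>
  <pool name="fair">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```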

Also, because we are exposing Zeppelin to multiple users, we may not want
users to hog the cluster, and may always use FAIR.

This may complicate our merge to .8 though.

2. On top of Spark scheduling, each Zeppelin interpreter itself seems to
have a scheduler queue. Each task is submitted to a FIFOScheduler, except
SparkSqlInterpreter, which creates a ParallelScheduler if the concurrentsql
flag is turned on.

I am changing SparkInterpreter.java to use ParallelScheduler too and that
seems to do the trick.

Now multiple notebooks are able to run in parallel.

My question is: have other people tested SparkInterpreter with
ParallelScheduler?
Also, ideally this should be configurable. The user should be able to
specify FIFO or parallel.
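Making that choice configurable could be sketched roughly as below. This is a toy factory with made-up names (it is not Zeppelin's real SchedulerFactory, and the property name in the comment is hypothetical): "parallel" maps to a bounded thread pool, anything else falls back to a single-threaded FIFO queue.

```java
import java.util.concurrent.*;

// Toy sketch: pick the interpreter's scheduler from a configuration flag
// instead of hard-coding it. Not Zeppelin's actual SchedulerFactory.
public class ToySchedulerFactory {
    public static ExecutorService forMode(String mode, int maxConcurrency) {
        if ("parallel".equalsIgnoreCase(mode)) {
            return Executors.newFixedThreadPool(maxConcurrency);
        }
        return Executors.newSingleThreadExecutor(); // FIFO default
    }

    public static void main(String[] args) throws Exception {
        // The mode could come from an (assumed, hypothetical) interpreter
        // property such as zeppelin.spark.scheduler=parallel.
        ExecutorService sched = forMode("parallel", 4);
        Future<String> f = sched.submit(() -> "paragraph ran");
        System.out.println(f.get());
        sched.shutdown();
    }
}
```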

Executing all paragraphs does add more complication, and maybe
https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep the
execution order sane.


Thoughts?

-- 
Thanks & Regards,
Ankit.


Re: Zeppelin distributed architecture design

2018-07-24 Thread Jongyoul Lee
Thank you.

I fully agree with you that we need a framework to support the distributed
version. IMHO, we cannot afford to develop our own. I'll dig into atomix as
well.



On Tue, Jul 24, 2018 at 1:57 PM, liuxun  wrote:

> @Jongyoul Lee:
> Thank you for your attention.
>
> Indeed, as you said, the `Copycat` project has been closed and has been
> migrated to `https://github.com/atomix/atomix`.
>
> I also considered this issue during development.
> The main reason was that `Copycat` was enough to implement Raft at the
> time, and the long term was not considered.
>
> Today, I took a look at the documentation of atomix,
> https://atomix.io/docs/latest/user-manual/ ,
> which has a lot of features, such as broadcasting messages in the cluster,
> detecting cluster events, and so on.
> From the perspective of zeppelin's long-term development, it is better to
> use atomix.
> So, I will switch the Raft protocol algorithm library to atomix, which is
> not difficult to modify.
>
> Struggle for zeppelin!!! :-)
>
>
> On Jul 24, 2018, at 9:35 AM, Jongyoul Lee wrote:
>
> First of all, thank you for your effort and contribution.
>
> I read it carefully today, and personally, it's a very nice feature and
> idea.
>
> Let's discuss it and improve more concretely. I also left comments on the
> doc.
>
> And I have a simple question.
>
> `Copycat`, which you used to implement it, is deprecated by its owner [1]
> and moved under https://github.com/atomix/atomix/. I'm concerned about
> that. Do you have any reason to use this library? It's even a SNAPSHOT
> version.
>
> Regards,
> JL
>
> [1]: https://github.com/atomix/copycat
>
> On Sat, Jul 21, 2018 at 2:07 AM, liuxun  wrote:
>
> Hi:
>
> In order to more intuitively show the actual use of distributed
> Zeppelin clusters, I updated this design document, starting with page 16,
> adding 2 GIF animations showing the recorded screens of the Zeppelin
> cluster we are using now:
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#
>
> Distributed clustered zeppelin is already in use at our company, and the
> recorded screens are all real.
> The first recorded-screen GIF shows the following:
> Create a cluster of three zeppelin servers
> Add 234, 235, 236 to the zeppelin.cluster.addr attribute in
> zeppelin-site.xml to create a cluster
> Start these 3 servers at the same time
> Open the web pages of these 3 servers and prepare for the notebook
> operation.
>
>
> The second recorded-screen GIF shows the following:
> Create an interpreter process in the cluster
> Create a notebook on host234 and execute it. This action will create an
> interpreter process on a server with free resources in the cluster.
> You can then continue editing this notebook on host235 and execute it; you
> can get results immediately, without waiting for an interpreter process to
> be created.
> Again, you can continue to edit this notebook on host236 and execute it;
> you can get results immediately, without waiting for an interpreter
> process to be created.
> The same notebook will reuse the first created interpreter process, so you
> can get the execution result immediately on any server.
> By looking at the background server processes, you will find that host234,
> host235, and host236 use the same interpreter process for the same
> notebook.
>
> Originally, I wanted to record the case where the interpreter process
> fails and the cluster re-creates it on an idle server, but I am too tired
> now.
> I will record it when there is time.
>
>
> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov wrote:
>
> Thank you luxun,
>
> I left a couple of comments in that google document.
>
> --
> Ruslan Dautkhanov
>
>
> On Tue, Jul 17, 2018 at 11:30 PM liuxun wrote:
>
> hi, Ruslan Dautkhanov
>
> Thank you very much for your question. According to your advice, I added
> 3 schematics to illustrate:
>
> 1. Distributed Zeppelin Deployment architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
>
> The email attachment exceeded the size limit, so I reorganized the
> document and updated it with Google Docs:
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>
>
>
> On Jul 18, 2018, at 1:03 PM, liuxun wrote:
>
>
> hi, Ruslan Dautkhanov
>
> Thank you very much for your question. According to your advice, I
> added 3 schematics to illustrate:
>
> 1. Zeppelin Cluster architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
> Later, I will merge the schematic