Hi,

I have recently started experimenting with Zeppelin, running it on TCP port
21999 (configurable in zeppelin-env.sh). The daemon is largely stable.
However, I have noticed that it goes stale from time to time, and killing a
job from the UI does not always stop it properly. Sometimes it is also
necessary to kill the connections at OS level, as "zeppelin-daemon.sh stop"
does not stop them. The logs are pretty informative. A new SparkContext is
created whenever you start a new job from the UI, as below:

------ Create new SparkContext local[*] -------
[Stage 5:==>                                                 (688 + 12) / 12544]
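For reference, this is roughly the setup and the manual cleanup I use.
ZEPPELIN_PORT is the standard variable in conf/zeppelin-env.sh; the kill
commands are only a fallback for when "zeppelin-daemon.sh stop" hangs, and
the port number is of course my own choice:

```
# conf/zeppelin-env.sh -- run the Zeppelin web UI on a non-default port
export ZEPPELIN_PORT=21999

# Fallback when "zeppelin-daemon.sh stop" leaves the daemon behind:
# find whatever still holds the port and terminate it by hand.
lsof -t -i tcp:21999 | xargs -r kill       # SIGTERM first
sleep 5
lsof -t -i tcp:21999 | xargs -r kill -9    # escalate only if it survives
```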

The interpreter screen is pretty useful for changing the configuration
parameters. To be honest, I am not sure how many concurrent Spark contexts
one can run. In my case only one job runs at a time, and the rest remain
pending in the queue until the running one completes.
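In case it is relevant: the one-at-a-time behaviour matches Spark's default
FIFO job scheduler. Spark does offer a fair scheduler that lets jobs share
executors, though I have not verified how well this interacts with
Zeppelin's interpreter, so treat this as a pointer rather than a fix:

```
# spark-defaults.conf (or the Spark interpreter settings in Zeppelin):
# switch the in-application job scheduler from FIFO to FAIR so that
# jobs submitted from different paragraphs can run concurrently.
spark.scheduler.mode    FAIR
```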

I would be interested to know how many concurrent jobs you can run through
the UI.

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 8 March 2016 at 20:41, Zhong Wang <wangzhong....@gmail.com> wrote:

> +spark-users
>
> We are using Zeppelin (http://zeppelin.incubator.apache.org) as our UI to
> run Spark jobs. Zeppelin maintains a long-running SparkContext, and we run
> into a couple of issues:
> --
> 1. Dynamic resource allocation keeps removing and registering executors,
> even though no jobs are running
> 2. EventLogging doesn't work due to HDFS lease issue. Similar to this:
> https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3ccae6kwsp_c00gksmnx0obu5aouxphdjs-syqywt-jfi3psvc...@mail.gmail.com%3E
> 3. SparkUI is getting slower due to large number of history jobs
> 4. Cached data is gone mysteriously (shown in the Storage page, but not in
> the Executor page)
>
> The aim of this thread is not to resolve specific issues (though any ideas
> on the listed issues are welcome), but to hear suggestions about best
> practices for maintaining a long-running SparkContext, from both the Spark
> and Zeppelin communities.
>
> Thanks,
> Zhong
>
> On Tue, Mar 8, 2016 at 11:13 AM, Zhong Wang <wangzhong....@gmail.com>
> wrote:
>
>> Thanks for your insights, Deenar. I think this is really helpful to users
>> who want to run Zeppelin as a service.
>>
>> The caching issue we experienced seems to be a Spark bug, because I see
>> some inconsistent states through the SparkUI, but thanks for pointing out
>> the potential reasons.
>>
>> I am still interested in hearing from those who run Zeppelin as a service:
>> have you experienced bugs or memory leaks, and how did you deal with
>> them?
>>
>> Thanks!
>>
>> Zhong
>>
>> On Tue, Mar 8, 2016 at 8:17 AM, Deenar Toraskar <
>> deenar.toras...@gmail.com> wrote:
>>
>>> 1) You should turn dynamic allocation on (see
>>> http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation)
>>> to maximise utilisation of your cluster resources. This might also be a
>>> reason you are seeing cached data disappear.
>>> 2) If other processes cache data and the amount of data cached is larger
>>> than your cluster memory, Spark will evict some cached data from memory.
>>> 3) If you are using Kerberos authentication, you need a process that
>>> renews tickets.
>>>
>>> Deenar
>>>
>>> On 8 March 2016 at 01:35, Zhong Wang <wangzhong....@gmail.com> wrote:
>>>
>>>> Hi zeppelin-users,
>>>>
>>>> Because Zeppelin relies on a long-running SparkContext, it is quite
>>>> important to keep it stable in order to maintain availability. From my
>>>> experience, I run into a couple of issues if I run a SparkContext for
>>>> several days, including:
>>>> --
>>>> 1. EventLogging doesn't work due to an HDFS lease issue. Similar to this:
>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3ccae6kwsp_c00gksmnx0obu5aouxphdjs-syqywt-jfi3psvc...@mail.gmail.com%3E
>>>> 2. SparkUI is getting slower due to large number of history jobs
>>>> 3. Cached data is gone mysteriously
>>>>
>>>> They may not be Zeppelin issues, but I would like to hear about the
>>>> problems you have run into, and your experience of maintaining a
>>>> long-running SparkContext.
>>>>
>>>> I know that we can do some cleanup periodically by restarting the
>>>> Spark interpreter, but I am wondering whether there are better ways.
>>>>
>>>> Thanks!
>>>>
>>>> Zhong
>>>>
>>>
>>>
>>
>
