Re: [RESULT][Vote][LIVY-718] Support multi-active high availability in Livy

2019-12-29 Thread Meisam Fathi
I also have a few questions/comments, which I posted on the JIRA.

Thanks,
Meisam

On Sun, Dec 29, 2019 at 1:58 AM Bikas Saha  wrote:

> Sorry for coming late to this thread.
>
> I have put my comments on the jira ticket here -
> https://issues.apache.org/jira/browse/LIVY-718?focusedCommentId=17004728&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17004728
>
> It would be nice if we could consider the issues raised in the comments
> and see if it makes sense to accommodate for those in the design proposal.
>
> Thanks
> Bikas
>
> 
> From: Marco Gaido 
> Sent: Wednesday, December 18, 2019 10:52 PM
> To: dev@livy.incubator.apache.org 
> Subject: Re: [RESULT][Vote][LIVY-718] Support multi-active high
> availability in Livy
>
> Thank you for this proposal and your work. Looking forward to it.
> Thanks,
> Marco
>
> On Thu, 19 Dec 2019, 07:27 Yiheng Wang,  wrote:
>
> > Hi All
> >
> > Thanks for participating in the vote. Here's the result:
> > +1 (binding)
> > ajbozarth
> > zjffdu
> > mgaido91
> > jerryshao
> >
> > +1 (non-binding)
> > 5
> >
> > The vote passes. I will create subtasks for it.
> >
> > Thanks
> > Yiheng
> >
> > On Thu, Dec 12, 2019 at 11:30 AM Yiheng Wang  wrote:
> >
> > > Dear Community
> > >
> > > I'd like to call a vote on LIVY-718
> > > 
> "Support
> > > multi-active high availability in Livy".
> > >
> > > Currently, Livy only supports single node recovery. This is not
> > sufficient
> > > in some production environments. In our scenario, the Livy server
> serves
> > > many notebook and JDBC services. We want to make Livy service more
> > > fault-tolerant and scalable.
> > >
> > > There're already some proposals in the community for high availability.
> > > But they're either incomplete or only cover active-standby high
> availability.
> > > So we propose a multi-active high availability design to achieve the
> > > following goals:
> > >
> > >- One or more servers will serve the client requests at the same
> time.
> > >- Sessions are allocated among different servers.
> > >- When one node crashes, the affected sessions will be moved to
> other
> > >active servers.
> > >
> > > Please find the design doc here:
> > > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
> > > which has been reviewed for several weeks.
> > >
> > > This vote is open until next Wednesday (Dec. 18).
> > >
> > > [] +1: Accept the proposal
> > > [] +0
> > > [] -1: I don't think this is a good idea because ...
> > >
> > > Thank you
> > >
> > > Yiheng
> > >
> >
>


Re: Accessing Detailed Livy Session Information (session name?)

2019-04-15 Thread Meisam Fathi
Hi Peter,

Are you using ZooKeeper for recovery store?
If yes, in conf/livy.conf, is livy.server.recovery.zk-state-store.key-prefix
set to different values in different Livy instances? If not, all Livy
instances will read/write the recovery data from/to the same path, which
defaults to /livy/v1.
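For illustration, a hedged sketch of what distinct prefixes could look like in each instance's conf/livy.conf (the key-prefix name comes from the note above; the other recovery keys are the standard Livy recovery settings, and the hostnames and prefix values are made-up examples):

```
# conf/livy.conf on Livy instance A
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = zookeeper
livy.server.recovery.state-store.url = zk1.example.com:2181
livy.server.recovery.zk-state-store.key-prefix = /livy-a

# conf/livy.conf on Livy instance B -- same ZooKeeper ensemble, but a
# different prefix, so the two instances no longer read/write each
# other's recovery data
livy.server.recovery.zk-state-store.key-prefix = /livy-b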

@dev mailing list:
This behavior is not documented in livy.conf nor on the website. It might
be a good idea to document it somewhere.

Thanks,
Meisam

On Fri, Apr 12, 2019 at 3:20 PM Meisam Fathi  wrote:

> Hi Peter,
>
> Livy 0.6 has a new feature to give each session a name:
> https://github.com/apache/incubator-livy/pull/48
>
> Would this feature be useful in your use case?
>
> Thanks,
> Meisam
>
> On Fri, Apr 12, 2019, 8:51 AM Peter Wicks (pwicks) 
> wrote:
>
>> Greetings,
>>
>>
>>
>> I have a custom service that connects to Livy, v0.4 soon to be v0.5 once
>> we go to HDP3. If sessions already exist it logs the session ID’s and
>> starts using them, if sessions don’t exist it creates new ones. The problem
>> is the account used to launch the Livy sessions is not unique to this
>> service, nor is the kind of session. So sometimes it grabs other people’s
>> sessions and absconds off with them. Also, there are multiple instances of
>> the service, running under the same account, and they are not supposed to
>> use each other’s sessions… that’s not working out so well.
>>
>>
>>
>> The service names the sessions, but I can’t find any way to retrieve
>> detailed session data so that I can update the service to check if the Livy
>> Session belongs to the service or not.
>>
>>
>>
>> I found some older comments from 2016/2017 about retrieving Livy sessions by
>> name. I don’t really need that, I just want to be able to read the name
>> through the regular sessions REST call.
>>
>>
>>
>> Any REST calls I missed, or undocumented calls… that can help?
>>
>>
>>
>> Thanks,
>>
>>   Peter
>>
>>
>>
>> Ref:
>> https://github.com/meisam/livy/wiki/Design-doc-for-Livy-41:-Accessing-sessions-by-name,
>> https://issues.cloudera.org/browse/LIVY-41
>>
>>
>>
>>
>>
>


Re: the relation between session and sparkContext

2019-02-21 Thread Meisam Fathi
Use the session ID to submit statements to a shared Livy session.

POST /sessions/{sessionId}/statements
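As a hedged illustration of that call (the endpoint path is the Livy REST API's; the host, port, and session id are made-up examples), two clients share one SparkContext simply by POSTing statements against the same session id:

```python
import json

def build_statement_request(host, session_id, code):
    """Build the URL and JSON body for POST /sessions/{sessionId}/statements."""
    url = "http://%s/sessions/%d/statements" % (host, session_id)
    body = json.dumps({"code": code})
    return url, body

# Two clients sharing session 42 just reuse the same session id.
url, body = build_statement_request("livy.example.com:8998", 42, "1 + 1")
```

Any HTTP client can then send `body` to `url`; the statement runs inside the session's existing remote SparkContext.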


On Mon, Feb 18, 2019, 10:24 PM wangfei  wrote:

> Hi guys:
>
> I recently looked at the source code of livy about interactiveSession.
> I found that each interactiveSession need start a remoteSparkContext.
> So i have a question, how to share a remoteSparkContext for several
> interactiveSessions of the same user.
> Thanks!
> hzfeiwang
> hzfeiw...@163.com
>
>


Re: Making travis logs less verbose

2019-02-11 Thread Meisam Fathi
I noticed a couple other things in .travis.yml, but I am not sure why they
are needed.

   - The base image is ubuntu:trusty. Can we update it to a later LTS
   version of ubuntu like Xenial?
   - Travis builds Livy twice: Is there a reason why two mvn builds are
   needed?

install:
  - mvn $MVN_FLAG install -Dskip -DskipTests -DskipITs -Dmaven.javadoc.skip=true -B -V
script:
  - mvn $MVN_FLAG verify -e


On Mon, Feb 11, 2019 at 5:52 PM Saisai Shao  wrote:

> Problem is that there's no better way to get detailed logs on travis without
> printing them out on screen. I was struggling with it when debugging on travis.
>
> Thanks
> Saisai
>
> Meisam Fathi wrote on Sat, Feb 9, 2019 at 12:19 PM:
>
> > This may do the trick for maven
> >
> > mvn -Dorg.slf4j.simpleLogger.defaultLogLevel=warn ...
> >
> > Thanks,
> > Meisam
> >
> > On Fri, Feb 8, 2019 at 2:11 PM Marcelo Vanzin
>  > >
> > wrote:
> >
> > > If you know how to silence messages from the setup phase (apt / pip /
> > > git), go for it. Those seem kinda hidden by Travis, but maybe there's
> > > a setting I'm not familiar with.
> > >
> > > Maven also has a -B option that makes things a little less verbose in
> > > non-interactive terminals. I think -quiet might be a little overkill.
> > >
> > > On Thu, Feb 7, 2019 at 2:40 PM Meisam Fathi 
> > > wrote:
> > > >
> > > > Each build on travis generates 10K+ lines of log. Should we make
> build
> > > > commands less verbose by passing --quiet to them?
> > > >
> > > > As an example, apt-get installs and pip installs generate 3K+ lines
> on
> > > > their own. Maven generates another 6K+ lines of log, but I am not
> sure
> > if
> > > > silencing maven is a good idea. Passing --quiet to Maven silences
> > scalac
> > > > warnings.
> > > >
> > > > Having said all of that, should we make travis logs less verbose? If
> > > yes, I
> > > > can send a PR.
> > > >
> > > > Thanks,
> > > > Meisam
> > >
> > >
> > >
> > > --
> > > Marcelo
> > >
> >
>


Re: Making travis logs less verbose

2019-02-08 Thread Meisam Fathi
This may do the trick for maven

mvn -Dorg.slf4j.simpleLogger.defaultLogLevel=warn ...

Thanks,
Meisam

On Fri, Feb 8, 2019 at 2:11 PM Marcelo Vanzin 
wrote:

> If you know how to silence messages from the setup phase (apt / pip /
> git), go for it. Those seem kinda hidden by Travis, but maybe there's
> a setting I'm not familiar with.
>
> Maven also has a -B option that makes things a little less verbose in
> non-interactive terminals. I think -quiet might be a little overkill.
>
> On Thu, Feb 7, 2019 at 2:40 PM Meisam Fathi 
> wrote:
> >
> > Each build on travis generates 10K+ lines of log. Should we make build
> > commands less verbose by passing --quiet to them?
> >
> > As an example, apt-get installs and pip installs generate 3K+ lines on
> > their own. Maven generates another 6K+ lines of log, but I am not sure if
> > silencing maven is a good idea. Passing --quiet to Maven silences scalac
> > warnings.
> >
> > Having said all of that, should we make travis logs less verbose? If
> yes, I
> > can send a PR.
> >
> > Thanks,
> > Meisam
>
>
>
> --
> Marcelo
>


Making travis logs less verbose

2019-02-07 Thread Meisam Fathi
Each build on travis generates 10K+ lines of log. Should we make build
commands less verbose by passing --quiet to them?

As an example, apt-get installs and pip installs generate 3K+ lines on
their own. Maven generates another 6K+ lines of log, but I am not sure if
silencing maven is a good idea. Passing --quiet to Maven silences scalac
warnings.

Having said all of that, should we make travis logs less verbose? If yes, I
can send a PR.

Thanks,
Meisam


Re: A new link request to my project and one question

2018-06-25 Thread Meisam Fathi
What is the use case for passing the proxy user to LivyClientBuilder?

On Fri, Jun 15, 2018 at 9:02 AM Marcelo Vanzin 
wrote:

> re: proxy user, you have to be extremely careful with that.
>
> Livy currently supports proxy user, but for the server only. It allows
> the server to impersonate anyone, so that sessions can run as the
> requesting user.
>
> If you let the user decide who the session will be run as, you'll need
> to add configuration, just as those available in HDFS, YARN, etc, to
> tell Livy which users can impersonate which other users. Otherwise
> you're basically making authentication meaningless.
>
>
> On Thu, Jun 14, 2018 at 7:36 PM, Saisai Shao 
> wrote:
> > Sure, I will merge the website code, thanks!
> >
> > For proxyUser thing, I think there's no particular reason not adding it,
> > maybe we just forgot to add the proxyUser support.
> >
> > It would be better if you could create a JIRA to track this issue. If
> > you're familiar with Livy code, you can also submit a PR about it.
> >
> > Thanks
> > Jerry
> >
> > Takeshi Yamamuro wrote on Fri, Jun 15, 2018 at 7:33 AM:
> >
> >> Hi, Livy dev,
> >>
> >> I opened a new pr in incubator-livy-website to add a new link in
> >> third-party-projects.md. It'd be great if you could check this;
> >> https://github.com/apache/incubator-livy-website/pull/23
> >>
> >> Btw, I have one question; currently, we cannot pass proxyUser
> >> in LivyClientBuilder. Any reason not to add code for that?
> >> I know we can handle this on the application side by adding a bit of code
> like
> >>
> >>
> https://github.com/maropu/spark-sql-server/blob/master/sql/sql-server/src/main/java/org/apache/livyclient/common/CreateClientRequestWithProxyUser.java
> >> But, If Livy itself supported this, it'd be nice to me.
> >>
> >> Best,
> >> takeshi
> >>
> >> --
> >> ---
> >> Takeshi Yamamuro
> >>
>
>
>
> --
> Marcelo
>


Re: Query on creating multiple livy sessions in parallel

2018-03-20 Thread Meisam Fathi
How many sessions are you creating? Have you tried to throttle down session
creation? What is the value for `livy.server.session.max-creation` in your
setup? Also check that you are not running out of resources (particularly
memory) on the Livy server node. Each session creation starts a new JVM
process which can easily take a lot of memory.
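The throttling suggestion above can be sketched client-side: instead of firing all session creations at once, wait for each session to leave the "starting" state before creating the next. This is a hypothetical helper, not a Livy API; `create_fn` and `get_state_fn` stand in for whatever POST /sessions and GET /sessions/{id}/state calls the client already makes:

```python
import time

def create_sessions_throttled(create_fn, get_state_fn, n, poll_secs=1.0, max_polls=300):
    """Create n sessions one at a time, polling each until it leaves
    the 'starting' state before creating the next one."""
    ids = []
    for _ in range(n):
        sid = create_fn()                        # e.g. POST /sessions
        for _ in range(max_polls):
            if get_state_fn(sid) != "starting":  # e.g. GET /sessions/{id}/state
                break
            time.sleep(poll_secs)
        ids.append(sid)
    return ids
```

This mirrors the "create session 1 after session 0 is Idle" workaround described below, just automated.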

Thanks,
Meisam


On Mon, Mar 19, 2018 at 8:15 PM Saisai Shao  wrote:

> This might be a BUG. If possible can you please create a JIRA to track this
> issue. Thanks!
>
> Best,
> Jerry
>
> 2018-03-19 20:18 GMT+08:00 Rao, Abhishek (Nokia - IN/Bangalore) <
> abhishek@nokia.com>:
>
> > Hi,
> >
> > We're trying to create multiple livy sessions in parallel and then using
> > them. But when we try to create the sessions continuously, we're seeing
> > that few sessions are entering to dead state. We see the below exception
> in
> > the logs.
> >
> > 18/02/27 10:30:20 WARN RSCClient: Client RPC channel closed unexpectedly.
> > 18/02/27 10:30:20 WARN RSCClient: Error stopping RPC.
> > io.netty.util.concurrent.BlockingOperationException:
> > DefaultChannelPromise@7a828ea3(uncancellable)
> >at io.netty.util.concurrent.DefaultPromise.checkDeadLock(
> > DefaultPromise.java:390)
> >at io.netty.channel.DefaultChannelPromise.checkDeadLock(
> > DefaultChannelPromise.java:157)
> >at io.netty.util.concurrent.DefaultPromise.await(
> > DefaultPromise.java:251)
> >at io.netty.channel.DefaultChannelPromise.await(
> > DefaultChannelPromise.java:129)
> >at io.netty.channel.DefaultChannelPromise.await(
> > DefaultChannelPromise.java:28)
> >at io.netty.util.concurrent.DefaultPromise.sync(
> > DefaultPromise.java:218)
> >at io.netty.channel.DefaultChannelPromise.sync(
> > DefaultChannelPromise.java:117)
> >at io.netty.channel.DefaultChannelPromise.sync(
> > DefaultChannelPromise.java:28)
> >at com.cloudera.livy.rsc.rpc.Rpc.close(Rpc.java:307)
> >at
> com.cloudera.livy.rsc.RSCClient.stop(RSCClient.java:225)
> >at com.cloudera.livy.rsc.RSCClient$2$1.onSuccess(
> > RSCClient.java:122)
> >at com.cloudera.livy.rsc.RSCClient$2$1.onSuccess(
> > RSCClient.java:116)
> >at com.cloudera.livy.rsc.Utils$2.
> > operationComplete(Utils.java:108)
> >at
> io.netty.util.concurrent.DefaultPromise.notifyListener0(
> > DefaultPromise.java:680)
> >at
> io.netty.util.concurrent.DefaultPromise.notifyListeners(
> > DefaultPromise.java:567)
> >at io.netty.util.concurrent.DefaultPromise.trySuccess(
> > DefaultPromise.java:406)
> >at io.netty.channel.DefaultChannelPromise.trySuccess(
> > DefaultChannelPromise.java:82)
> >at io.netty.channel.AbstractChannel$CloseFuture.
> > setClosed(AbstractChannel.java:956)
> >at io.netty.channel.AbstractChannel$
> > AbstractUnsafe.doClose0(AbstractChannel.java:608)
> >at io.netty.channel.AbstractChannel$AbstractUnsafe.close(
> > AbstractChannel.java:586)
> >at io.netty.channel.nio.AbstractNioByteChannel$
> > NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
> >at io.netty.channel.nio.AbstractNioByteChannel$
> > NioByteUnsafe.read(AbstractNioByteChannel.java:158)
> >at io.netty.channel.nio.NioEventLoop.processSelectedKey(
> > NioEventLoop.java:511)
> >at io.netty.channel.nio.NioEventLoop.
> > processSelectedKeysOptimized(NioEventLoop.java:468)
> >at io.netty.channel.nio.NioEventLoop.processSelectedKeys(
> > NioEventLoop.java:382)
> >at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.
> > java:354)
> >at io.netty.util.concurrent.SingleThreadEventExecutor$2.
> > run(SingleThreadEventExecutor.java:111)
> >at java.lang.Thread.run(Thread.java:748)
> > 18/02/27 10:30:20 DEBUG RSCClient: Disconnected from context
> > dad7c668-3c09-4ad2-9810-28f684c5ec49, shutdown = false.
> >
> > However, when we create the sessions one after the other (Create session
> 1
> > after session 0 is in Idle state), it works fine.
> > We wanted to know if there is any known restriction in livy for creating
> > multiple sessions in parallel.
> >
> > Thanks & Regards,
> > Abhishek
> >
> >
>


Re: Development in Intellij IDEA

2018-03-14 Thread Meisam Fathi
I managed to make intellij work with a few manual tweaks after importing
the project. I think I shared the steps somewhere (most likely Livy's
mailing list prior to Apache incubation). Let me see if I can find them.

Thanks,
Meisam

On Wed, Mar 14, 2018, 10:00 AM Alex Bozarth  wrote:

> Hey Alexy,
>
> I use Intellij IDEA for my Livy development and there are a few limitations
> that I have just had to get used to. You can't use it to build (I do all my
> building on the command line), and due to Livy's multiple Scala version
> support you can't follow class links into any module that has a 2.10/2.11
> split. It's pretty frustrating, but the only solutions are to stop supporting
> multiple Scala versions in Livy or to switch to sbt instead of maven, neither
> of which is a change we can make. I got help from Marcelo on this back when
> I first joined the project, so you're not the first to hit these issues.
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* 
> *GitHub: **github.com/ajbozarth* 
>
>
> 505 Howard Street
> 
> San Francisco, CA 94105
> 
> United States
> 
>
>
>
>
> From: Alexey Romanenko 
> To: dev@livy.incubator.apache.org
> Date: 03/14/2018 06:38 AM
> Subject: Development in Intellij IDEA
> --
>
>
>
>
> Hello all,
>
> I’m quite new with Livy and I have a question regarding Livy development
> using Intellij IDEA.
>
> I imported the maven project (as usual) but I can’t build and, of course,
> run it directly in IDEA since it can’t find some Scala classes and
> interfaces, like Logging and Utils, that actually exist in another module
> (core). So, I have a compile error.
>
> At the same time, when I run "mvn package" from the console it works
> well. It seems like IDEA can’t resolve the question of which Scala version to
> use (since Livy supports two of them: 2.10 and 2.11).
>
> So, my question is - how people, who use Intellij IDEA for development,
> overcame this issue? Is it a well known problem?
> I’d appreciate any hints about that.
>
> Thank you,
> Alexey
>
>
>


Re: spark-submit command

2018-03-09 Thread Meisam Fathi
Supporting multiple version of Spark (e.g. 1.6 and 2.1) for batch jobs is
easy. But supporting multiple Spark versions for *interactive sessions*
needs major changes in Livy (and possibly in Spark). The main reason is
that, for batch jobs, only user application code runs on the Spark/YARN
cluster. But for interactive sessions, parts of the Livy code run on the
Spark/YARN cluster. If Livy is compiled against a particular major version
of Spark (say 2.1.0), it cannot run interactive sessions on a different
Spark version (say 1.6). I am interested to know how we can get around this
restriction.

Thanks,
Meisam

On Fri, Mar 9, 2018 at 9:50 AM Marcelo Vanzin  wrote:

> On Fri, Mar 9, 2018 at 1:36 AM, Matteo Durighetto
>  wrote:
> >   I think it's correct that the Livy Admin manages the multiple
> > versions of spark, but the user needs to choose what version to use
> > to submit the job.
> ...
> > So a Livy Admin could manage the configuration and a Datascience or a Dev
> > could submit the job calling the "alias" (i.e. spark_1.6 or spark_2.1 or
> > spark_2.2) to a different spark / java
> > and test different env for their applications or machine learning project.
>
> That sounds closer to what I had in mind originally. User asks for a
> specific version of Spark using a name defined by the admin, instead
> of providing an explicit SPARK_HOME env variable or something like
> that.
>
>
> --
> Marcelo
>
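The admin-defined alias scheme discussed in this thread could look something like this in configuration. This is an entirely hypothetical sketch — Livy had no such setting at the time, and every key name and path below is invented for illustration:

```
# Hypothetical livy.conf: the admin defines named Spark environments...
livy.server.spark-env.spark_1.6.home = /opt/spark-1.6.3
livy.server.spark-env.spark_2.1.home = /opt/spark-2.1.0

# ...and a user's submission request would name one of the aliases,
# e.g. a JSON field like {"sparkEnv": "spark_2.1", ...}, instead of
# passing an explicit SPARK_HOME.
```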


Re: user defined sessionId / URI for Livy sessions

2017-09-11 Thread Meisam Fathi
> If we're using session name, how do we guarantee the uniqueness of this
> name?
>

If the requested session name already exists, Livy returns an error and does
not create the session.

Thanks,
Meisam


Re: user defined sessionId / URI for Livy sessions

2017-09-11 Thread Meisam Fathi
+ dev
Is there any interest in adding this feature to Livy? I can send a PR

Ideally, it would be helpful if we could mint a session ID with a PUT
> request, something like PUT /sessions/foobar, where "foobar" is the newly
> created sessionId.
>
I suggest we make session names unique and nonnumeric (to guarantee a
session name does not clash with another session name or session ID).
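The uniqueness/nonnumeric rule can be sketched as a small validator. This is a hypothetical helper for illustration, not the actual Livy implementation:

```python
def validate_session_name(name, existing_names):
    """Reject names that are purely numeric (they would clash with numeric
    session IDs) or already taken (Livy would return an error rather than
    create a second session with the same name)."""
    if name.isdigit():
        return False, "session name must not be numeric"
    if name in existing_names:
        return False, "session name already exists"
    return True, None
```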

Design doc:
https://github.com/meisam/incubator-livy/wiki/Design-doc-for-Livy-41:-Accessing-sessions-by-name
JIRA ticket: https://issues.apache.org/jira/browse/LIVY-41


Thanks,
Meisam


Re: Help to verify Apache Livy 0.4.0-incubating release

2017-08-22 Thread Meisam Fathi
The version of org.apache.httpcomponents:httpclient is different in
/pom.xml from the version in /client-http/pom.xml

/pom.xml --->  ${httpclient.version} ---> 4.5.2
/client-http/pom.xml --->  4.5.1

Is this intended?

Thanks,
Meisam

On Thu, Aug 17, 2017 at 12:34 AM, Saisai Shao 

> Hi all,
>
>
> We're under progress to make a first Apache release of Livy
>
> (0.4.0-incubating), we really hope you could verify the RC2[1] release
>
> (binary and source) locally and return us the feedbacks.
>
[1]https://dist.apache.org/repos/dist/dev/incubator/livy/0.
>


Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
Hi Marcelo,


> I'm not really familiar with how multi-node HA was implemented (I
> stopped at session recovery), but why isn't a single server doing the
> update and storing the results in ZK? Unless it's actually doing
> load-balancing, it seems like that would avoid multiple servers having
> to hit YARN.
>

We considered having one server update ZooKeeper, but the extra benefits
that we would get from polling YARN fewer times are not worth the extra
complexity needed to implement it. For example, we would have to make
servers aware of each other, and aware of each other's failures. We would've
needed a voting mechanism to select a new leader to update ZooKeeper each
time the current leader had a failure. Also, rolling out updates would be
trickier with servers that are aware of each other.


Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
> Just an FYI, apache mailing lists cant share attachments. If you could
> please upload the files to another file sharing site and include links
> instead.
>
Thanks for the information. I added the files to the JIRA ticket and put
the contents of the previous email as a comment. Here are the links to the
ticket and to the files:

JIRA ticket: https://issues.apache.org/jira/browse/LIVY-336
Time to complete REST calls to YARN:
https://issues.apache.org/jira/secure/attachment/12882985/transfer_time_bar_plot.png
Trends in time to complete REST calls to YARN:
https://issues.apache.org/jira/secure/attachment/12882984/transfer_time_line_plot.png
Size of response from REST calls to YARN:
https://issues.apache.org/jira/secure/attachment/12882983/size_downloaded_line_plot.png

Also, should we move the discussion to JIRA now that it is up and running?

Thanks,
Meisam


Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
I forgot to attach the first chart. Sorry about that.

[image: transfer_time_bar_plot.png]

Thanks,
Meisam

On Mon, Aug 21, 2017 at 12:21 PM Meisam Fathi <meisam.fa...@gmail.com>
wrote:

> Bottom line up front:
> 1. The cost of making 10,000 individual REST calls is about two orders of
> magnitude higher than making a single batch REST call (10,000 × 0.05
> seconds vs. 1.4 seconds)
> 2. Time to complete a batch REST call plateaus at about 10,000 application
> reports per call.
>
> Full story:
> I experimented and measured how long it takes to fetch Application Reports
> from YARN with the REST API. My objective was to compare doing a batch REST
> call to get all ApplicationReports vs. doing individual REST calls for each
> Application Report.
>
> I did the tests on 4 different clusters: 1) a test cluster, 2) a moderately
> used dev cluster, 3) a lightly used production cluster, and 4) a heavily
> used production cluster. For each cluster I made 7 REST calls to get 1, 10,
> 100, 1,000, 10,000, 100,000, and 1,000,000 application reports respectively. I
> repeated each call 200 times to account for variations and I reported the
> median time.
> To measure the time, I used the following curl command:
>
> $ curl -o /dev/null -s -w "@curl-output-fromat.json" \
>   "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"
>
> The attached charts show the results. In all the charts, the x axis shows
> the number of results that were requested in the call.
> The bar chart shows the time it takes to complete a REST call on each
> cluster.
> The first line plot also shows the same results as the bar chart on a log
> scale (it is easier to see that the time to complete the REST call plateaus
> at 10,000).
> The last chart shows the size of data that is being downloaded on each
> REST call, which explains why the time plateaus at 10,000.
>
>
> [image: transfer_time_bar_plot.png][image: transfer_time_line_plot.png][image:
> size_downloaded_line_plot.png]
>
>>
>>
> Thanks,
> Meisam
>


Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
Bottom line up front:
1. The cost of making 10,000 individual REST calls is about two orders of
magnitude higher than making a single batch REST call (10,000 × 0.05
seconds vs. 1.4 seconds)
2. Time to complete a batch REST call plateaus at about 10,000 application
reports per call.

Full story:
I experimented and measured how long it takes to fetch Application Reports
from YARN with the REST API. My objective was to compare doing a batch REST
call to get all ApplicationReports vs. doing individual REST calls for each
Application Report.

I did the tests on 4 different clusters: 1) a test cluster, 2) a moderately
used dev cluster, 3) a lightly used production cluster, and 4) a heavily
used production cluster. For each cluster I made 7 REST calls to get 1, 10,
100, 1,000, 10,000, 100,000, and 1,000,000 application reports respectively. I
repeated each call 200 times to account for variations and I reported the
median time.
To measure the time, I used the following curl command:

$ curl -o /dev/null -s -w "@curl-output-fromat.json" \
  "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"
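The same measurement can be scripted with only the Python standard library. This is a sketch: the RM host, port, and application type are made-up examples, and the query parameters mirror the curl command above (the YARN RM `cluster/apps` endpoint with `applicationTypes` and `limit`):

```python
import time
import urllib.parse
import urllib.request

def batch_report_url(rm_host, rm_port, app_type, limit):
    """Build the YARN RM REST URL that fetches up to `limit` app reports."""
    query = urllib.parse.urlencode({"applicationTypes": app_type, "limit": limit})
    return "http://%s:%s/ws/v1/cluster/apps?%s" % (rm_host, rm_port, query)

def time_call(url):
    """Return the wall-clock seconds one GET takes (network call, sketch only)."""
    start = time.monotonic()
    urllib.request.urlopen(url).read()
    return time.monotonic() - start

url = batch_report_url("rm.example.com", 8088, "SPARK", 10000)
```

Repeating `time_call(url)` a couple hundred times and taking the median reproduces the methodology described here.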

The attached charts show the results. In all the charts, the x axis shows
the number of results that were requested in the call.
The bar chart shows the time it takes to complete a REST call on each
cluster.
The first line plot also shows the same results as the bar chart on a log
scale (it is easier to see that the time to complete the REST call plateaus
at 10,000).
The last chart shows the size of data that is being downloaded on each REST
call, which explains why the time plateaus at 10,000.


[image: transfer_time_bar_plot.png][image: transfer_time_line_plot.png][image:
size_downloaded_line_plot.png]

>
>
Thanks,
Meisam


Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
Hi Nan,

In the highlighted line
>
> https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75
>
> I assume that it will get the reports of all applications in YARN, even
> they are finished?


That's right. That line will return reports for all Spark Applications,
even applications that completed a long time ago. For us YARN retains
reports for a few thousand completed applications (not a big concern).

Livy needs to get the reports for applications that finished recently, but
I didn't find an API in YARN 2.7 to get only those reports.
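Lacking such an API, one client-side workaround is to fetch all reports and filter on the finish time. A sketch (the `finishedTime` field name follows the YARN application-report REST JSON; the one-hour window is an example):

```python
import time

def recently_finished(reports, window_secs=3600, now_ms=None):
    """Keep only reports whose finishedTime (epoch millis) falls inside the
    last window_secs seconds; finishedTime == 0 means the app is still running."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - window_secs * 1000
    return [r for r in reports
            if r.get("finishedTime", 0) > 0 and r["finishedTime"] >= cutoff]
```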

Thanks,
Meisam

>
>

> On Wed, Aug 16, 2017 at 12:25 PM, Meisam Fathi <meisam.fa...@gmail.com>
> wrote:
>
> > Hi Nan,
> >
> >
> > >
> > > my question related to the undergoing discussion is simply "have you
> seen
> > > any performance issue in
> > >
> > > https://github.com/apache/incubator-livy/pull/36/files#diff-
> > a3f879755cfe10a678cc08ddbe60a4d3R75
> > > ?
> > > <https://github.com/apache/incubator-livy/pull/36/files#diff-
> > a3f879755cfe10a678cc08ddbe60a4d3R75?>
> > > "
> > >
> > > The short answer is yes. This PR fixes one part of the scalability
> > problem, which is, it prevents Livy from creating many
> > yarnAppMonitorThreads. But the two other parts are still there:
> >
> > 1. one call to spark-submit for each application
> > 2. one thread that waits for the exit code of spark-submit.
> >
> > Out of these two problems, calling one spark-submit per application is
> the
> > biggest problem, but it can be solved by adding more Livy servers. We
> > modified Livy so if an application status changes on one Livy instance,
> all
> > other Livy instances get the updated information about the application.
> > From users' perspective, this is transparent because users just see the
> > load balancer.
> >
> > So, refactoring the yarn poll mechanism + a load balancer and a grid of
> > Livy servers fixed the scalability issue.
> >
> > On the performance of the code itself, we have not had an issue. The time
> > consuming parts in the code are calls to YARN and not filtering and
> > updating the data structures. On memory usage, this all needs less than
> 1GB
> > at peak time.
> >
> > I hope this answers your question.
> >
> > Thanks,
> > Meisam
> >
> >
> > > We have several scenarios that a large volume of applications are
> > submitted
> > > to YARN every day and it easily accumulates a lot to be fetched with
> this
> > > call
> > >
> > > Best,
> > >
> > > Nan
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Aug 16, 2017 at 11:16 AM, Meisam Fathi <meisam.fa...@gmail.com
> >
> > > wrote:
> > >
> > > > Here are my two pennies on both designs (actor-based design vs.
> > > > single-thread polling design)
> > > >
> > > > *Single-thread polling design*
> > > > We implemented a single-thread polling mechanism for Yarn here at
> > PayPal.
> > > > Our solution is more involved because we added many new features to
> > Livy
> > > > that we had to consider when we refactored Livy's YARN interface. But
> > we
> > > > are willing to hammer our changes so it suits the need of the Livy
> > > > community best :-)
> > > >
> > > > *Actor-based design*
> > > > It seems to me that the proposed actor based design (
> > > > https://docs.google.com/document/d/1yDl5_3wPuzyGyFmSOzxRp6P-
> > > > nbTQTdDFXl2XQhXDiwA/edit)
> > > > needs a few more messages and actors. Here is why.
> > > > Livy makes three (blocking) calls to YARN
> > > > 1. `yarnClient.getApplications`, which gives Livy `ApplicatioId`s
> > > > 2. `yarnClient.getApplicationAttemptReport(ApplicationId)`, which
> > gives
> > > > Livy `getAMContainerId`
> > > > 3. `yarnClient.getContainerReport`, which gives Livy tracking URLs
> > > >
> > > > The result of the previous call is needed to make the next call. The
> > > > proposed actor system needs to be designed to handle all these
> > blocking
> > > > calls.
> > > >
> > > > I do agree that actor based design is cleaner and more maintainable.
> > But
> > > we
> > > > had to discard it because it adds more dependencies to Livy. We faced
> > too
> > > > many dependency-version-mismatch issues.

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
That is true, but I was under the impression that this will be implemented
with Akka (maybe because it is mentioned in the design doc).

On Wed, Aug 16, 2017 at 11:21 AM Marcelo Vanzin <van...@cloudera.com> wrote:

> On Wed, Aug 16, 2017 at 11:16 AM, Meisam Fathi <meisam.fa...@gmail.com>
> wrote:
> > I do agree that actor based design is cleaner and more maintainable. But
> we
> > had to discard it because it adds more dependencies to Livy.
>
> I've been reading "actor system" as a design pattern, not as
> introducing a new dependency to Livy.
>
> If the document is actually proposing using Akka (instead of just
> using Akka as an example of an actor system implementation), then I'm
> a -1 on that.
>
> --
> Marcelo
>