Re: livy and security concerns

2019-10-31 Thread Meisam Fathi
No, it does not. The code will run in a Spark session. Sandboxing would
have to be supported, at least partially, by Spark itself, but as far as
I know, Spark does not have such a feature.

Thanks,
Meisam

On Wed, Oct 30, 2019, 5:02 PM mhd wrk  wrote:

> Considering that Livy supports interactive Scala or Python, does it
> provide any sandboxing feature to protect the back end against submitted
> code?
>


Re: What if Livy Server dies?

2019-06-01 Thread Meisam Fathi
In the comments below, by batch I mean a Livy job that is not interactive.
This is not to be confused with batch vs streaming jobs in Spark. A Livy
batch job could be a Spark streaming or batch job. A Livy interactive job
could be a Spark batch or streaming job as well:

On Wed, May 29, 2019 at 5:23 PM Ravindra Chandrakar <
ravindra.chandra...@gmail.com> wrote:

> Hello,
>
> I would like to understand following.
>
> 1. What if livy server dies? What will happen to existing jobs? Will they
> still be in running state in spark cluster? If yes, how to track their
> status?
>

If recovery is enabled and the jobs are running on YARN, Livy recovers the
jobs on restart.

> 2. Is there High Availability mode deployment available for Apache Livy
> server like secondary Livy server or something?
>
There is no primary/secondary HA, but Livy can recover jobs after a crash
or restart.
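
For reference, a minimal recovery setup in conf/livy.conf might look like the
sketch below; the ZooKeeper address is a placeholder:

livy.server.recovery.mode = recovery
livy.server.recovery.state-store = zookeeper
livy.server.recovery.state-store.url = zk1.example.com:2181

With these settings Livy persists session metadata to the state store and
reconnects to the still-running YARN applications when it comes back up.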


> 3. Can we submit more than one job using batches api? If yes, is there any
> limit on upper number? How to submit more than one job using single batches
> api call?
>
Yes. Each job gets its own unique ID. The practical upper limit is about 2
billion jobs (when the Integer.MaxValue ID counter overflows), provided the
jobs are not submitted in a burst; for bursts, Livy has a configurable rate
limiter.
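
As a rough sketch, submitting two jobs is just two separate POST /batches
calls; one call submits one job. The host, file paths, and class name below
are placeholders:

curl -X POST -H "Content-Type: application/json" \
  -d '{"file": "hdfs:///jobs/etl.jar", "className": "com.example.EtlJob"}' \
  http://livy-host:8998/batches

curl -X POST -H "Content-Type: application/json" \
  -d '{"file": "hdfs:///jobs/report.py"}' \
  http://livy-host:8998/batches

Each call returns the new batch id, which you can use to poll that job later.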


> 4. If multiple job submission in single batches api call is allowed then
>
1. Are those jobs going to run in parallel or sequential?
> 2. Can i define dependency between these jobs?
>
Each Livy batch job is independent of other batch jobs. Livy is not aware
of dependencies between batch (or interactive) jobs. Dependencies should be
handled by the user outside Livy.

Each batch job runs as its own Spark application, so batch jobs run in
parallel as long as the cluster has resources for them.

> 5. How can i debug a job that I've submitted using batches API?
>
Start with the Spark UI or the YARN Resource Manager.


>
> Thanks,
> Ravindra Chandrakar
>


Re: Livy with scala spark examples and documents

2019-05-26 Thread Meisam Fathi
Hi Amit,

I am not sure I understand your use case, but Kafka and Livy do completely
different things, and one cannot be replaced by the other.
Can you give more details on how you are using Kafka and Spark?

For documentation, please check the Livy website:
http://livy.incubator.apache.org/docs/latest/programmatic-api.html

Thanks,
Meisam

On Sun, May 26, 2019 at 9:48 AM Amit Sharma  wrote:

> I have a Spark standalone cluster. We are currently using Kafka to interact
> with the Spark job. I want to replace Kafka and use Livy instead. Currently I
> do not see any Scala example of how to submit a Spark job; also, in my case
> the Spark job reads a Kafka stream, which is nothing but an input for the
> Spark job. Please provide some documents, books, or examples for Scala with Livy.
>
>
> Thanks
> Amit
>


Re: Using external libraries in request

2019-05-04 Thread Meisam Fathi
Hi Argenis,

What exactly do you mean by "every request"? Do you mean every
interactive session?

Thanks,
Meisam

On Sat, May 4, 2019 at 11:46 AM Argenis Leon  wrote:

> Hi,
> I want to use a library  https://github.com/ironmussa/optimus in every
> request, but I don't want to instantiate it every time.
>
> Is there a way to make Optimus persistent in every request?
>
> --
>
> Ing. Argenis Leon, Dr.
> CDMX, Mexico
> Movil: 525541356726
>


Re: Code stops working after a few executions

2019-04-23 Thread Meisam Fathi
What is the status of the job in Livy or in YARN?
Also, can you share your code, please?

Thanks,
Meisam

On Tue, Apr 23, 2019 at 9:08 PM Peter Wicks (pwicks) 
wrote:

> I am working in Livy v0.4 with Python.  I’m using sessions.  If I run the
> same Python code over and over again it will work around four times, then
> on the next time I get the error:
>
>
>
> Unexpected character ('#' (code 35)): expected a valid value (number,
> String, array, object, 'true', 'false' or 'null')
>
> at [Source: #; line: 1, column: 2]
>
>
>
> There aren’t any #’s in my code, and the code is identical between runs
> anyways… I made it identical following user reports of runs failing with
> this error message to try and reproduce the error.
>
>
>
> Once this error starts appearing the session is useless, running code on
> it again does not succeed, though the status still shows as idle.
>
>
>
> Any ideas on what might be going on here?
>
>
>
> Thanks!
>
>   Peter
>


Re: Livy with Standalone Spark Master

2019-04-20 Thread Meisam Fathi
Yes. Livy works with standalone Spark, except for the recovery features.
The recovery features need YARN.
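
For reference, pointing Livy at a standalone master is a one-line change in
conf/livy.conf; the host and port below are placeholders:

livy.spark.master = spark://spark-master.example.com:7077

The recovery-related settings should stay off in this setup, since recovery
requires YARN.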

Thanks,
Meisam

On Sat, Apr 20, 2019, 2:48 PM Pat Ferrel  wrote:

> Does Livy work with a Standalone Spark Master?
>
>


Re: Accessing Detailed Livy Session Information (session name?)

2019-04-15 Thread Meisam Fathi
Hi Peter,

Are you using ZooKeeper for recovery store?
If yes, in conf/livy.conf, is livy.server.recovery.zk-state-store.key-prefix
set to different values in different Livy instances? If not, all Livy
instances will read/write the recovery data from/to the same path, which
defaults to /livy/v1.
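
A sketch of how this might be configured, assuming two Livy instances sharing
one ZooKeeper ensemble (the prefix values are placeholders):

# conf/livy.conf on the first Livy instance
livy.server.recovery.zk-state-store.key-prefix = livy-instance-1

# conf/livy.conf on the second Livy instance
livy.server.recovery.zk-state-store.key-prefix = livy-instance-2

This keeps each instance's recovery data under its own ZooKeeper path instead
of the shared default.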

@dev mailing list:
This behavior is not documented in livy.conf or on the website. It might
be a good idea to document it somewhere.

Thanks,
Meisam

On Fri, Apr 12, 2019 at 3:20 PM Meisam Fathi  wrote:

> Hi Peter,
>
> Livy 0.6 has a new feature to give each session a name:
> https://github.com/apache/incubator-livy/pull/48
>
> Would this feature be useful in your use case?
>
> Thanks,
> Meisam
>
> On Fri, Apr 12, 2019, 8:51 AM Peter Wicks (pwicks) 
> wrote:
>
>> Greetings,
>>
>>
>>
>> I have a custom service that connects to Livy, v0.4 soon to be v0.5 once
>> we go to HDP3. If sessions already exist it logs the session ID’s and
>> starts using them, if sessions don’t exist it creates new ones. The problem
>> is the account used to launch the Livy sessions is not unique to this
>> service, nor is the kind of session. So sometimes it grabs other people’s
>> sessions and absconds off with them. Also, there are multiple instances of
>> the service, running under the same account, and they are not supposed to
>> use each other’s sessions… that’s not working out so well.
>>
>>
>>
>> The service names the sessions, but I can’t find any way to retrieve
>> detailed session data so that I can update the service to check if the Livy
>> Session belongs to the service or not.
>>
>>
>>
>> I found some older comments 2016/2017 about retrieving Livy sessions by
>> name. I don’t really need that, I just want to be able to read the name
>> through the regular sessions REST call.
>>
>>
>>
>> Any REST calls I missed, or undocumented calls… that can help?
>>
>>
>>
>> Thanks,
>>
>>   Peter
>>
>>
>>
>> Ref:
>> https://github.com/meisam/livy/wiki/Design-doc-for-Livy-41:-Accessing-sessions-by-name,
>> https://issues.cloudera.org/browse/LIVY-41
>>
>>
>>
>>
>>
>


Re: Accessing Detailed Livy Session Information (session name?)

2019-04-12 Thread Meisam Fathi
Hi Peter,

Livy 0.6 has a new feature to give each session a name:
https://github.com/apache/incubator-livy/pull/48

Would this feature be useful in your use case?

Thanks,
Meisam

On Fri, Apr 12, 2019, 8:51 AM Peter Wicks (pwicks) 
wrote:

> Greetings,
>
>
>
> I have a custom service that connects to Livy, v0.4 soon to be v0.5 once
> we go to HDP3. If sessions already exist it logs the session ID’s and
> starts using them, if sessions don’t exist it creates new ones. The problem
> is the account used to launch the Livy sessions is not unique to this
> service, nor is the kind of session. So sometimes it grabs other people’s
> sessions and absconds off with them. Also, there are multiple instances of
> the service, running under the same account, and they are not supposed to
> use each other’s sessions… that’s not working out so well.
>
>
>
> The service names the sessions, but I can’t find any way to retrieve
> detailed session data so that I can update the service to check if the Livy
> Session belongs to the service or not.
>
>
>
> I found some older comments 2016/2017 about retrieving Livy sessions by
> name. I don’t really need that, I just want to be able to read the name
> through the regular sessions REST call.
>
>
>
> Any REST calls I missed, or undocumented calls… that can help?
>
>
>
> Thanks,
>
>   Peter
>
>
>
> Ref:
> https://github.com/meisam/livy/wiki/Design-doc-for-Livy-41:-Accessing-sessions-by-name,
> https://issues.cloudera.org/browse/LIVY-41
>
>
>
>
>


Re: Run spark application in different clusters

2019-02-21 Thread Meisam Fathi
Currently this feature is not available in Livy.
The values of HADOOP_CONF_DIR and YARN_CONF_DIR are read and set when Livy
starts and never change.
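
In other words, the cluster Livy talks to is fixed by the environment it was
started with, for example via conf/livy-env.sh (the paths below are
placeholders):

# conf/livy-env.sh
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf

Changing these requires restarting the Livy server, and a single server can
only point at one cluster at a time.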

Thanks,
Meisam

On Thu, Feb 21, 2019, 12:47 AM Praveen Muthusamy 
wrote:

> Hi,
>
> Currently HADOOP_CONF_DIR and YARN_CONF_DIR are used to find out the yarn
> on which the application will be run.
> Can a single instance of livy server interface multiple yarn/hadoop
> clusters? Is it possible to do spark submit to two different hadoop
> clusters from a single livy server?
>
> Regards,
> Praveen M
>


Re: Retain livy batch session info longer

2018-12-13 Thread Meisam Fathi
I propose the following configurations to support this feature:

# How long to retain an inactive interactive session before cleaning it up
livy.server.session.timeout

# How long to retain an interactive session that ran successfully
livy.server.session.success.retention

# How long to retain an interactive session that did NOT run successfully
livy.server.session.failure.retention

# How long to retain a batch session that ran successfully
livy.server.batch.success.retention

# How long to retain a batch session that did NOT run successfully
livy.server.batch.failure.retention

We added a variation of these configs to Livy here at PayPal, which helps
us manage our Spark jobs with more control. Particularly, these configs
help us clean up interactive sessions created on Notebooks more
efficiently. They also help us clean up failed jobs quicker.

If there is an interest, I can send a PR.

On Tue, Dec 11, 2018 at 4:11 AM Praveen Muthusamy 
wrote:

> Hi,
>
> Is there any setting in livy server that we can keep, so that the batch
> session info will be retained longer ? When a batch application is
> submitted and gets successful, that session is cleared, there is no way for
> a external program to get the status of this batch session post that.
>
> Regards,
> Praveen M
>


Re: Use existing SparkSession in POST/batches request

2018-11-16 Thread Meisam Fathi
Hi Shubham!

POST/sessions creates a session and lets you submit statements to it. Once the
session is up, it can accept as many statements as needed, and users can
interactively submit new statements to it.
POST/batches submits one Spark application. Anything the job needs to run is
submitted with the POST request (jar files, a Python script, etc.).
Once the job is submitted, users cannot interactively send new statements
to the application.

The only interesting case is when you submit a streaming job with
POST/batches which runs forever (or until it fails/dies).
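
As a rough illustration of the difference (the host, session id, code, jar
path, and class name are placeholders):

# Interactive: create a session, then keep sending statements to it
curl -X POST -H "Content-Type: application/json" \
  -d '{"kind": "spark"}' http://livy-host:8998/sessions
curl -X POST -H "Content-Type: application/json" \
  -d '{"code": "sc.parallelize(1 to 10).count()"}' \
  http://livy-host:8998/sessions/0/statements

# Batch: one self-contained application per request, no further statements
curl -X POST -H "Content-Type: application/json" \
  -d '{"file": "hdfs:///jobs/app.jar", "className": "com.example.App"}' \
  http://livy-host:8998/batches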

Thanks,
Meisam

On Fri, Nov 16, 2018 at 4:43 AM Shubham Gupta 
wrote:

> Hi committers and contributors
>
> I have this query since long, which I think I know the answer to, but am
> still looking for confirmation.
>
> Can I make use of the SparkSession that I created using POST/sessions request
> for submitting my Spark job using POST/batches request?
>
> Here's the link  to my complete
> (elaborated) question on StackOverflow.
> *Luqman Ghani'*s response
> 
> already hints that this isn't possible, but
>
>- Is it on the roadmap?
>- Any workarounds?
>
>
> Thanks
>
> *Shubham Gupta*
> Software Engineer
>  zomato
>


Re: Some questions about RscDriver

2018-07-16 Thread Meisam Fathi
>
>
> Within an interactive session, Livy communicate with Spark through RPC.
> According to some architecture diagrams, LivyServer has a RscClient, and
> Spark has a RscDriver. My understanding is that RscDriver is one of
> components belonging to Spark, and RscDriver has existed before the Livy
> project was created. Is that right?
>
No. RscDriver and RscClient are both part of Livy. RscDriver starts the
Spark driver: it starts a Spark context, gets the statements from
RscClient, runs them on the SparkContext, and returns the results to
RscClient.


Thanks,
Meisam

>


Re: Can livy execute code when session is in busy state?

2018-06-25 Thread Meisam Fathi
If you create a new session, the session will get a new Spark Context and
will run completely in isolation from the first session.

Thanks,
Meisam

On Fri, Jun 22, 2018 at 5:24 PM JF Chen  wrote:

> so I need to create a new session if resource is enough?
>
> Regard,
> Junfeng Chen
>
>
> On Fri, Jun 22, 2018 at 12:35 PM Saisai Shao 
> wrote:
>
>> No, busy means currently there's job running in Spark, so the follow-up
>> code will wait until the previous job is done.
>>
>> JF Chen  于2018年6月22日周五 上午11:53写道:
>>
>>> Can livy execute code when session  is in busy state?
>>>
>>> Regard,
>>> Junfeng Chen
>>>
>>


Re: job execution logs

2018-06-25 Thread Meisam Fathi
If you are using YARN, the driver logs are always available from the YARN
Resource Manager.
You can also see the logs from /sessions/{sessionid}/log
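
For example (the host, session id, and YARN application id are placeholders;
batch jobs use the /batches prefix instead of /sessions):

curl "http://livy-host:8998/sessions/42/log?from=0&size=100"

# Driver and executor logs aggregated by YARN
yarn logs -applicationId application_1520000000000_0042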

Thanks,
Meisam

On Sun, Jun 24, 2018 at 1:37 PM Abbass  wrote:

> Guys any idea about where Livy keeps job execution logs (Spark driver
> logs) ?
>
> I saw a couple of lines here :
>
> https://github.com/apache/incubator-livy/blob/master/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala#L90
> to redirect stdout & stderr somewhere.
>
> Is there anyway to persist these logs on the filesystem ?
>
> Thanks,
>
>
>
>


Re: What happens if Livy server crashes ? All the spark jobs are gone?

2018-03-20 Thread Meisam Fathi
If you are running in cluster mode, the application should keep running on
YARN.

On Tue, Mar 20, 2018 at 3:34 PM kant kodali  wrote:

> @Meisam Fathi I am running with yarn and zookeeper as a state store. I
> spawned a job via livy that reads from kafka and writes to Kafka
> but the moment I kill the livy server the job also is getting killed. not
> sure why? I believe once the livy server crashes the spark context also
> get's killed so do I need to need to set the livy.spark.deploy.mode ? if
> so, what value should I set it to?
>
>
> On Mon, Mar 12, 2018 at 12:30 PM, Meisam Fathi 
> wrote:
>
>> On YARN, your application keeps running even if the launcher fails. So
>> after recovery, Livy reconnects to the application. On Spark standalone, I
>> am not sure what happens to the application if the launcher fails.
>>
>> Thanks,
>> Meisam
>>
>> On Mon, Mar 12, 2018 at 10:34 AM kant kodali  wrote:
>>
>>> can someone please explain how YARN helps here? And why not spark master?
>>>
>>> On Mon, Mar 12, 2018 at 3:41 AM, Matteo Durighetto <
>>> m.durighe...@miriade.it> wrote:
>>>
>>>>
>>>>
>>>> 2018-03-12 9:58 GMT+01:00 kant kodali :
>>>>
>>>>> Sorry I see there is a recovery mode and also I can set state store to
>>>>> zookeeper but looks like I need YARN? because I get the error message 
>>>>> below
>>>>>
>>>>> "requirement failed: Session recovery requires YARN"
>>>>>
>>>>>
>>>>> I am using spark standalone and I don't use YARN anywhere in my
>>>>> cluster. is there any other option for recovery in this case?
>>>>>
>>>>>
>>>>> On Sun, Mar 11, 2018 at 11:57 AM, kant kodali 
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> When my live server crashes it looks like all my spark jobs are gone.
>>>>>> I am trying to see how I can make it more resilient? other words, I would
>>>>>> like spark jobs that were spawned by Livy to be running even if my Livy
>>>>>> server crashes because in theory Livy server can crash anytime and Spark
>>>>>> Jobs should run for weeks or months in my case. How can I achieve this?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>> Hello,
>>>>  to enable recovery in Livy you need Spark on YARN
>>>>
>>>> ( https://spark.apache.org/docs/latest/running-on-yarn.html )
>>>>
>>>>
>>>>
>>>> Kind Regards
>>>>
>>>
>>>
>


Re: What happens if Livy server crashes ? All the spark jobs are gone?

2018-03-12 Thread Meisam Fathi
On YARN, your application keeps running even if the launcher fails. So
after recovery, Livy reconnects to the application. On Spark standalone, I
am not sure what happens to the application if the launcher fails.

Thanks,
Meisam

On Mon, Mar 12, 2018 at 10:34 AM kant kodali  wrote:

> can someone please explain how YARN helps here? And why not spark master?
>
> On Mon, Mar 12, 2018 at 3:41 AM, Matteo Durighetto <
> m.durighe...@miriade.it> wrote:
>
>>
>>
>> 2018-03-12 9:58 GMT+01:00 kant kodali :
>>
>>> Sorry I see there is a recovery mode and also I can set state store to
>>> zookeeper but looks like I need YARN? because I get the error message below
>>>
>>> "requirement failed: Session recovery requires YARN"
>>>
>>>
>>> I am using spark standalone and I don't use YARN anywhere in my cluster.
>>> is there any other option for recovery in this case?
>>>
>>>
>>> On Sun, Mar 11, 2018 at 11:57 AM, kant kodali 
>>> wrote:
>>>
 Hi All,

 When my live server crashes it looks like all my spark jobs are gone. I
 am trying to see how I can make it more resilient? other words, I would
 like spark jobs that were spawned by Livy to be running even if my Livy
 server crashes because in theory Livy server can crash anytime and Spark
 Jobs should run for weeks or months in my case. How can I achieve this?

 Thanks!


>>> Hello,
>>  to enable recovery in Livy you need Spark on YARN
>>
>> ( https://spark.apache.org/docs/latest/running-on-yarn.html )
>>
>>
>>
>> Kind Regards
>>
>
>


Re: When I submit a livy job am I running in a client mode or cluster mode ?

2018-03-08 Thread Meisam Fathi
Client mode and cluster mode are for YARN. If you are using Spark
standalone, your application will run in "client" mode.

Thanks,
Meisam

On Wed, Mar 7, 2018 at 9:41 PM kant kodali  wrote:

> livy.spark.master yarn ?? Do I need to have yarn ? can I use spark
> standalone master?
>
> On Mon, Feb 12, 2018 at 1:19 AM, Jeff Zhang  wrote:
>
>> livy.spark.master yarn
>> livy.spark.deploy-mode  cluster
>>
>>
>> kant kodali 于2018年2月12日周一 下午4:36写道:
>>
>>> I checked my livy.conf and livy.spark.deploy-mode is not set at all. so
>>> I wonder which mode it runs by default?
>>>
>>> On Sun, Feb 11, 2018 at 11:50 PM, Jeff Zhang  wrote:
>>>

 Via livy.spark.master & livy.spark.deploy-mode in livy.conf



 kant kodali 于2018年2月12日周一 下午3:49写道:

> Hi All,
>
> When I submit a livy job to livy server am I running in a client mode
> or cluster mode ? How can I switch from one mode to another?
>
> Thanks!
>
>
>>>
>


Re: Livy Installation

2018-02-13 Thread Meisam Fathi
On Mon, Feb 12, 2018 at 2:01 AM Vinod Kancharana 
wrote:

> Is there a recommended cluster server (edge node or master node or cluster
> node) for Livy Installation? Right now, I installed Livy on the master node
> running Yarn Resource Manager.
>
I am not the best person to answer this question, but here are my two
cents. Livy is a lightweight process. Even with thousands of sessions
running on it, it needs less than 1 GB of memory and not much CPU. The only
time Livy becomes a bottleneck is when many sessions/jobs are submitted to
it in a short burst (because each new session/job launches a new JVM to
submit the job to YARN).


> Also, can we enable Load Balancing with two Livy instances running on the
> same cluster?
>
No, you cannot, because the two Livy instances won't coordinate. For
example, if a session is created on one Livy instance, it won't be
available on the second Livy instance.

PS: I have configured Livy to use ZooKeeper for state-store.
>
> Thanks !!
>


Re: How do I set spark conf parameters?

2018-01-31 Thread Meisam Fathi
On a more general note, with or without Livy, Spark configurations should
be set before creating the Spark context (sc). Setting configs after sc is
created has no effect.

Thanks,
Meisam

On Wed, Jan 31, 2018 at 3:14 AM Stefan Miklosovic 
wrote:

> livyClient = new LivyClientBuilder()
>   .setAll(withSparkProperties())
>   .setURI(new URI(LIVY_URI))
>   .build()
>
>
>
> On Wed, Jan 31, 2018 at 12:13 PM, Stefan Miklosovic 
> wrote:
>
>> Yes, I am using setAll(Properties).
>>
>> On Wed, Jan 31, 2018 at 11:45 AM, kant kodali  wrote:
>>
>>> Sorry. I found an answer online.It should be something like this
>>>
>>> new LivyClientBuilder().setConf("spark.es.index.auto.create", 
>>> "true").setConf("spark.cassandra.connection.host", "127.0.0.01").build();
>>>
>>>
>>> On Wed, Jan 31, 2018 at 1:16 AM, kant kodali  wrote:
>>>
 Hi All,


 How do I set Spark Conf Parameters ? The below doesnt seem to get
 picked up? If so, how can I change my program such that it can pick it up.
 I am not seeing a way if sparkcontext is already created?

 public String call(JobContext ctx) throws Exception {
 ctx.sc().setLogLevel("INFO");
 ctx.sc().getConf()
 .set("spark.cassandra.connection.host", 
 config.getString("cassandra.host"))
 .set("spark.cassandra.auth.username", 
 config.getString("cassandra.user"))
 .set("spark.cassandra.auth.password", 
 config.getString("cassandra.pass"))
 .set("es.index.auto.create", "true");

 }


 Thanks!


>>>
>>
>>
>> --
>> Stefan Miklosovic
>>
>
>
>
> --
> Stefan Miklosovic
>


Re: How to scale livy servers?

2017-12-17 Thread Meisam Fathi
> 1) I have a couple of livy servers that are submitting jobs and say one of
> them crashes the session id's again start from 0 which can coincide with
> the non-faulty running livy servers. I think it would be nice to have
> session id's as UUID. isn't it?
>

If you enable recovery, the session IDs won't restart from 0 after recovery.


> 2) Is there a way to get job progress periodically or get notified if it
> dies and so on ?
>

Not through the REST API, as far as I know. You have to poll the session/job
status.
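
A rough sketch of that polling against the REST API (the host and batch id
are placeholders):

# State only: returns e.g. {"id": 7, "state": "running"}
curl http://livy-host:8998/batches/7/state

# Full record, including appId and recent log lines
curl http://livy-host:8998/batches/7

Repeat on an interval until the state becomes success, dead, or killed.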

Thanks,
Meisam


Re: How to submit a streaming query that runs forever using Livy?

2017-12-06 Thread Meisam Fathi
Please find some of the answers inlline.

> SparkSession sparkSession = ctx.sparkSession();
>
> ctx.sc().getConf().setAppName("MyJob"); // *This app name is not getting 
> set when I go http://localhost:8998/sessions *
>
By this time the Spark session is already created. You should set the
configs before starting the SparkContext.


>
> *   // I can see my query but Appid is 
> always set to null*
>
AppId should be given to Livy by YARN (if you are running on YARN). It may
take a while to get a response if YARN is busy. If you are not getting an
appId at all, then your application was not submitted correctly. You may
want to check your cluster manager UI for more information.


>
> *System.out.println("READING STREAM");*
>
This will be executed on the driver node. If you are running Spark in
standalone mode or in client mode, the driver runs on the node that
submitted the job. If you are running Spark in cluster mode, the driver is
a node on the cluster assigned by YARN.


>
>
> df.printSchema(); // *Where does these print statements go ?*
>
>
>
Same as above.


> awaitTermination(); // *This thing blocks forever and I don't want to set a 
> timeout. *
>
>
> *  // so what should I do to fire and 
> forget a streaming job ? *
>
> I believe you can call .start()
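
A minimal sketch of that fire-and-forget idea with the Job API, assuming a
Kafka source and the Kafka connector on the session classpath; the broker,
topic, and checkpoint path are placeholders:

import org.apache.livy.Job;
import org.apache.livy.JobContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StartStreamingJob implements Job<String> {
  @Override
  public String call(JobContext ctx) throws Exception {
    SparkSession spark = ctx.sparkSession();

    // Read from Kafka (placeholder broker and topic).
    Dataset<Row> df = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load();

    // Start the query and return immediately; no awaitTermination() here,
    // so the job completes while the query keeps running in the session.
    StreamingQuery query = df.writeStream()
        .format("console")
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .start();

    return query.id().toString();
  }
}

Because the job returns right after start(), livy.submit(...).get() comes
back quickly while the streaming query keeps running inside the session's
SparkContext.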

> livy.submit(new StreamingJob()).get(); // *This will block forever*
>
Livy.submit(...).get() returns a value only if the job succeeds. You may want
to use onJobFailed(JobHandle<T> job, Throwable cause) as well to handle
errors and get a better idea of why the job is not returning.



> System.out.println("SUBMITTED JAR!");  // *The control will never get 
> here so I can't submit another job.*\
>
> See above.

Thanks,
Meisam


Re: How to submit the job via REST ?

2017-12-01 Thread Meisam Fathi
You should compile and package the PiJob jar (piJar) before running this code
snippet. It does not need to be a separate app/project; you can put the PiJob
code right next to the code snippet that runs it. Maven/sbt/Gradle can create
the jar for you, and I assume there is a way to call them programmatically,
but that is not needed. You can use the path to the jar file as piJar.

I hope this answers your question.

Thanks,
Meisam

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.function.*;

import org.apache.livy.*;

public class PiJob implements Job<Double>, Function<Integer, Integer>,
    Function2<Integer, Integer, Integer> {

  private final int samples;

  public PiJob(int samples) {
    this.samples = samples;
  }

  // Runs on the driver: builds the sample list and estimates pi.
  @Override
  public Double call(JobContext ctx) throws Exception {
    List<Integer> sampleList = new ArrayList<>();
    for (int i = 0; i < samples; i++) {
      sampleList.add(i + 1);
    }

    return 4.0d *
        ctx.sc().parallelize(sampleList).map(this).reduce(this) / samples;
  }

  // map: 1 if a random point lands inside the unit circle, 0 otherwise.
  @Override
  public Integer call(Integer v1) {
    double x = Math.random();
    double y = Math.random();
    return (x * x + y * y < 1) ? 1 : 0;
  }

  // reduce: sum the hits.
  @Override
  public Integer call(Integer v1, Integer v2) {
    return v1 + v2;
  }
}


On Fri, Dec 1, 2017 at 1:09 AM kant kodali  wrote:

> Hi All,
>
> I am looking at the following snippet of code and I wonder where and how
> do I create piJar ? can I create programmatically if so how? is there a
> complete hello world example somewhere where I can follow steps and see how
> this works?
>
> Concerning line
>
> client.uploadJar(new File(piJar)).get();
>
>
>
> Code snippet
>
> LivyClient client = new LivyClientBuilder()
>   .setURI(new URI(livyUrl))
>   .build();
> try {
>   System.err.printf("Uploading %s to the Spark context...\n", piJar);
>   client.uploadJar(new File(piJar)).get();
>
>   System.err.printf("Running PiJob with %d samples...\n", samples);
>   double pi = client.submit(new PiJob(samples)).get();
>
>   System.out.println("Pi is roughly: " + pi);
> } finally {
>   client.stop(true);
> }
>
>


Re: Does Apache Livy support Spark Structured Streaming 2.2.0?

2017-11-29 Thread Meisam Fathi
That's right.

> Sorry, I am new to this. By Job API I assume you meant the programmatic API.
> What is interactive query?
>


Re: Livy POST Sessions can not work with conf

2017-11-29 Thread Meisam Fathi
I am curious to know if this is resolved yet. I see that in the original
email you used "conf" : {"spark.dynamicAllocation.enabled":false,
"spark.shuffle.service.enabled":false}, but in the second email you used
"conf" : {"spark.dynamicAllocation.enabled":"false",
"spark.shuffle.service.enabled":"false"}. Was that the issue?

Thanks,
Meisam

On Fri, Nov 3, 2017 at 3:18 AM 王峰  wrote:

> I have try to use livy 0.4 with POST requst
> POST :livyurl/sessions
> Content-Type: application/json; charset=utf-8
> {
>   "kind" : "spark",
>   "proxyUser" : "root",
>   "executorMemory" : "4G",
>   "executorCores": 4,
>   "numExecutors" : 4,
>"conf" :
> {"spark.dynamicAllocation.enabled":"false","spark.shuffle.service.enabled":"false"}
>
> }
>
> But it doesn`t come into force...maybe I should try another way...
>
> 2017-11-03 17:54 GMT+08:00 Saisai Shao :
>
>> I think it should be worked, can you please test with 0.4 version of
>> Livy. Also "conf" should be a map of string key to string value.
>>
>> "conf" : {"spark.dynamicAllocation.enabled":"false","spark.shuffle.
>> service.enabled":"false"}
>>
>> Besides, please be aware in the current Livy we only tested on local and
>> yarn mode, we don't guarantee the correct behavior using Mesos cluster
>> manager.
>>
>> On Fri, Nov 3, 2017 at 5:36 PM, 王峰  wrote:
>>
>>> Hello everyone , I have meet a problem about Livy-0.3 when I run `POST
>>> /sessions` to create a new interactive spark session
>>>
>>>
>>> here is the post body
>>> ```Content-Type: application/json; charset=utf-8
>>> {
>>>   "kind" : "spark",
>>>   "proxyUser" : "root",
>>>   "executorMemory" : "4G",
>>>   "executorCores": 4,
>>>   "numExecutors" : 4,
>>>"conf" :
>>> {"spark.dynamicAllocation.enabled":false,"spark.shuffle.service.enabled":false}
>>> }
>>> ```
>>>
>>> However, I found that this Livy session  allocated all of resources in
>>> Mesos UI as Pic shows
>>>
>>> [image: Mesos UI screenshot showing the session's resource allocation]
>>>
>>> It seem like that `conf` did not worked but numExecutors , executorCores
>>> and executorMemory worked well..
>>>
>>> please help me  thanks...
>>>
>>
>>


Re: Spark job via livy issue from a docker container

2017-10-20 Thread Meisam Fathi
Hi Sarjeet!

Is livy.rsc.rpc.server.address set in conf/livy-client.conf? Or in
conf/livy.conf if you are using an older version of Livy?
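
If it is not set, a sketch of the setting, assuming the Livy host has an
address that the cluster nodes can reach (the hostname is a placeholder):

# conf/livy-client.conf
livy.rsc.rpc.server.address = livy-host.example.com

This is the address the remote driver calls back to, so it must not resolve
to a loopback address inside the container.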

Thanks,
Meisam

2017-10-19 12:44:59,688 WARN [Driver] rsc.RSCConf: Your hostname, node1.lab
> (*valid hostname*), resolves to a loopback address, but we couldn't find
> any external IP address!
> 2017-10-19 12:44:59,688 WARN [Driver] rsc.RSCConf: Set
> livy.rsc.rpc.server.address if you need to bind to another address.
> 2017-10-19 12:44:59,704 ERROR [Driver] yarn.ApplicationMaster:
>



> Hence, the issue is only isolated to livy or any configuration I may be
> missing here? Let me know if there is any other information helpful in
> debugging the issue.
>
> - Sarjeet Singh
>


Re: user defined sessionId / URI for Livy sessions

2017-09-13 Thread Meisam Fathi
>
>
> E.g. something like this may be useful in an active-active livy
> configuration because its not clear how sequential numeric id's would work
> in that context. Perhaps UUID's would also suffice for the active-active
> setup.
>

For active-active, sequential numeric IDs could be generated by
`StateStore`/`SessionStore` to guarantee uniqueness. I see how having UUIDs
can be a different way of solving the problem.

Thanks,
Meisam


Re: user defined sessionId / URI for Livy sessions

2017-09-11 Thread Meisam Fathi
> If we're using session name, how do we guarantee the uniqueness of this
> name?
>

If the requested session name already exists, Livy returns an error and does
not create the session.
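
A rough sketch of the intended flow, assuming a name field on the
create-session request as proposed (the exact field name may differ in the
final implementation):

curl -X POST -H "Content-Type: application/json" \
  -d '{"kind": "spark", "name": "nightly-report"}' \
  http://livy-host:8998/sessions

A second POST with the same "name" would then be rejected instead of silently
creating a duplicate session.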

Thanks,
Meisam


Re: user defined sessionId / URI for Livy sessions

2017-09-11 Thread Meisam Fathi
+ dev
Is there any interest in adding this feature to Livy? I can send a PR

Ideally, it would be helpful if we could mint a session ID with a PUT
> request, something like PUT /sessions/foobar, where "foobar" is the newly
> created sessionId.
>
I suggest we make session names unique, nonnumeric values (to guarantee that
a session name does not clash with another session name or a session ID).

Design doc:
https://github.com/meisam/incubator-livy/wiki/Design-doc-for-Livy-41:-Accessing-sessions-by-name
JIRA ticket: https://issues.apache.org/jira/browse/LIVY-41


Thanks,
Meisam


Re: Livy jobs keeps runing forever

2017-07-26 Thread Meisam Fathi
Livy garbage collects jobs. These settings in conf/livy.conf are available to
control how Livy garbage collects sessions:
livy.server.session.timeout-check
livy.server.session.timeout
livy.server.session.state-retain.sec
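
A sketch with example values (the durations are illustrative, not
recommendations):

livy.server.session.timeout-check = true
livy.server.session.timeout = 1h
livy.server.session.state-retain.sec = 600s

timeout controls when an idle interactive session is stopped, and
state-retain.sec controls how long finished session state stays visible
through the REST API.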

In our clusters, we have seen sometimes batch jobs fail to start and remain
in the starting state forever. With the current logic for garbage
collection in Livy, such jobs are never garbage collected. We modified the
garbage collection logic so that all jobs are eventually garbage collected
unless they are in running state.

Thanks,
Meisam

On Tue, Jul 25, 2017 at 11:00 AM Marcelo Vanzin  wrote:

> On Tue, Jul 25, 2017 at 8:35 AM, Joaquín Silva 
> wrote:
> > 17/07/25 14:21:46 ERROR yarn.ApplicationMaster: User class threw
> exception:
> > java.lang.OutOfMemoryError: PermGen space
> >
> > So in order to solve this issue I increased the executor and driver
> memory:
> > "driverMemory":"15g","executorMemory":"15g". But I still seen this error.
>
> That error won't be fixed by adding more memory; you need to set
> "XX:MaxPermSize=blah" to fix it, or use Java 8.
>
> Still it shouldn't cause the app to just hang, it should eventually
> fail. So perhaps there's a bug in Livy's error handling path
> somewhere.
>
> --
> Marcelo
>


Re: Multiple Livy instances and load balancing

2017-07-26 Thread Meisam Fathi
Hi Vivek,

We are running multiple instances of Livy on our clusters. Our users can
create and access jobs on any of the Livy instances.

To answer your particular question

*1. Is this feature available in the 0.3 release?* No. The feature is not
available out of the box; we modified Livy to add it.
*2. How would I name/number the multiple instances I bring up?* The load
balancer knows all the instances. Users only interact with the load
balancer.
*3. How does one load balance and send requests across the multiple
instances?* Livy instances do not know that requests come from a load
balancer. Each instance processes the requests it receives and updates a
shared *"session store"*, which we implemented on top of ZooKeeper.
*4. Does Livy have a heartbeat mechanism to understand which or how many
instances are up?* No. There is a heartbeat mechanism in Livy, but we do
not use it to detect live/dead Livy instances. Each instance only reacts
to updates to the ZooKeeper *"session store"*, which is shared by all
instances of Livy.

Thanks,
Meisam

On Wed, Jul 26, 2017 at 1:36 AM Vivek  wrote:

> Ok.so if I start multiple instances how will I know which instance to send
> the request to?
> And multiple instances would then be only controlled by the port id?
>
>
> Sent from my iPhone
>
> On 26 Jul 2017, at 4:04 PM, Saisai Shao  wrote:
>
> Current Livy doesn't support the things you mentioned here. You can start
> multiple Livy in the cluster, but each LivyServer is a standalone service
> doesn't aware the existence of others.
>
> On Wed, Jul 26, 2017 at 10:27 AM, Vivek  wrote:
>
>> Hi,
>>
>> We are now considering moving into a uat environment using Livy at my
>> company.
>>
>> Has anyone implemented multiple Livy instances on a single cluster with
>> load balancing?
>>
>> A few questions.
>> 1. Is this feature available in the 0.3 release?
>> 2. How would I name/number the multiple instances I bring up?
>> 3. How does one load balance and send requests across the multiple
>> instances?
>> 4. Does Livy have a heartbeat mechanism to understand which or how many
>> instances are up?
>>
>> Any answers would be appreciated.
>>
>> Regards
>> Vivek
>>
>>
>> Sent from my iPhone
>>
>
>