RE: How to design a long-lived Spark application

2015-02-06 Thread Shuai Zheng
Thanks. I thought about it, and yes, the DAG engine should have no problem building 
the right graph from different threads (at least in theory it is not an issue).

 

So now I have another question: if I have a context initialized but there are no 
operations on it for a very long time, will it time out? How does Spark 
control/maintain/detect the liveness of the client SparkContext?

Do I need to set up anything special?
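As far as I know there is no built-in idle timeout on the driver-side SparkContext 
itself: it stays valid until stop() is called or the driver JVM exits. What can go 
away underneath it are executors (for example, with dynamic allocation enabled, after 
spark.dynamicAllocation.executorIdleTimeout) or the whole application if the cluster 
manager loses the driver, so those are the settings worth checking for your Spark 
version. A minimal sketch of the intended lifecycle (class name, app name and local 
master are illustrative; Java 8 syntax):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public final class SharedContext {
    // One context for the lifetime of the driver process; it stays usable
    // no matter how long it sits idle between jobs.
    private static final JavaSparkContext SC = new JavaSparkContext(
            new SparkConf().setAppName("long-lived-driver").setMaster("local[*]"));

    static {
        // Release cluster resources cleanly when the driver JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(SC::stop));
    }

    private SharedContext() {}

    public static JavaSparkContext get() {
        return SC;
    }
}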

 

Regards,

 

Shuai

 

From: Eugen Cepoi [mailto:cepoi.eu...@gmail.com] 
Sent: Thursday, February 05, 2015 5:39 PM
To: Shuai Zheng
Cc: Corey Nolet; Charles Feduke; user@spark.apache.org
Subject: Re: How to design a long-lived Spark application

 

Yes, you can submit multiple actions from different threads to the same 
SparkContext. It is safe.

Indeed, what you want to achieve is quite common: exposing some operations over a 
SparkContext through HTTP.

I have used spray for this and it just worked fine.

At bootstrap of your web app, start a SparkContext, maybe preprocess some data 
and cache it, then start accepting requests against this sc. Depending on where 
you place the initialization code, you can block the server from initializing 
until your context is ready. This is nice if you don't want to accept requests 
while the context is being prepared.
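A minimal sketch of that bootstrap pattern, using the JDK's built-in HttpServer 
rather than spray so it stays self-contained (app name, port, endpoint and seed 
data are illustrative; Java 8 syntax):

import com.sun.net.httpserver.HttpServer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SparkHttpServer {

    public static void main(String[] args) throws Exception {
        // 1. Start the context first; nothing is served until it is ready.
        SparkConf conf = new SparkConf()
                .setAppName("long-lived-spark-app")
                .setMaster("local[*]");              // or your cluster's master URL
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 2. Optionally preprocess and cache some data up front.
        List<Integer> seed = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            seed.add(i);
        }
        JavaRDD<Integer> cached = sc.parallelize(seed, 10).cache();
        cached.count();                              // materialize the cache

        // 3. Only now start accepting requests against the shared context.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/sum", exchange -> {
            long sum = cached.reduce(Integer::sum);  // runs a Spark job per request
            byte[] body = Long.toString(sum).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}

The important part is the ordering: the context is created and the data is cached 
before the server starts listening, so no request ever sees a half-initialized context.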

 

 

Eugen

 

 

2015-02-05 23:22 GMT+01:00 Shuai Zheng szheng.c...@gmail.com:

This example helps a lot :)

 

But I am thinking about the case below:

 

Assume I have a SparkContext as a global variable. 

Then if I use multiple threads to access/use it, will things get messed up?

 

For example:

 

My code:

 

public static List<Tuple2<Integer, Double>> run(JavaSparkContext sparkContext,
    Map<Integer, List<ExposureInfo>> cache, Properties prop, List<EghInfo> el)
    throws IOException, InterruptedException {

  JavaRDD<EghInfo> lines = sparkContext.parallelize(el, 100);
  lines.map(…)
  …
  lines.count()
}

 

Suppose two threads call this method at the same time and pass in the same 
SparkContext.

 

Will the SparkContext be thread-safe? I am a bit worried here; in traditional Java 
it should be, but with Spark I am not 100% sure.

 

Basically the SparkContext needs to be smart enough to differentiate the contexts of 
the different method calls (RDDs added to it from different methods), so that it 
creates a separate DAG for each method.
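Concretely, the scenario would look something like the sketch below (toy data, local 
master, illustrative class name, Java 8 syntax): two threads each run an independent 
action against the same shared context, and the expectation is that each call produces 
its own job with its own lineage.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentJobs {

    public static void main(String[] args) throws Exception {
        final JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("concurrent-jobs").setMaster("local[*]"));

        // Two independent actions defined against the same shared context.
        Callable<Integer> sumOfSquares = () ->
                sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 5)
                  .map(x -> x * x)
                  .reduce(Integer::sum);

        Callable<Long> evenCount = () ->
                sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6), 5)
                  .filter(x -> x % 2 == 0)
                  .count();

        // Submit them from different threads; each action is an independent
        // job with its own lineage, scheduled by the one shared context.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Integer> first = pool.submit(sumOfSquares);
        Future<Long> second = pool.submit(evenCount);

        System.out.println("sum of squares = " + first.get());
        System.out.println("even count = " + second.get());

        pool.shutdown();
        sc.stop();
    }
}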

 

Can anyone confirm this? This is not something I can easily test with code. 
Thanks!

 

Regards,

 

Shuai

 

From: Corey Nolet [mailto:cjno...@gmail.com] 
Sent: Thursday, February 05, 2015 11:55 AM
To: Charles Feduke
Cc: Shuai Zheng; user@spark.apache.org
Subject: Re: How to design a long-lived Spark application

 

Here's another lightweight example of running a SparkContext in a common Java 
servlet container: https://github.com/calrissian/spark-jetty-server
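Not the spark-jetty-server code itself, just a sketch of the general pattern such a 
setup relies on: create one SparkContext when the web app starts and stop it when the 
app is undeployed (javax.servlet API assumed on the classpath; names are illustrative):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkContextLifecycle implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent event) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("servlet-hosted-spark").setMaster("local[*]"));
        // Expose the shared context to the servlets in this web app.
        event.getServletContext().setAttribute("sparkContext", sc);
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
        JavaSparkContext sc =
                (JavaSparkContext) event.getServletContext().getAttribute("sparkContext");
        if (sc != null) {
            sc.stop();
        }
    }
}

Servlets in the same web app can then pull the context out of the ServletContext 
attribute and run jobs against it per request.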

 

On Thu, Feb 5, 2015 at 11:46 AM, Charles Feduke charles.fed...@gmail.com 
wrote:

If you want to design something like the Spark shell, have a look at:

 

http://zeppelin-project.org/

 

It's open source and may already do what you need. If not, its source code will be 
helpful in answering your questions about how to integrate with long-running jobs.

 

On Thu Feb 05 2015 at 11:42:56 AM Boromir Widas vcsub...@gmail.com wrote:

You can check out https://github.com/spark-jobserver/spark-jobserver - this 
allows several users to upload their jars and run jobs with a REST interface.

 

However, if all users need the same functionality, you can write a simple spray 
server that acts as the driver and hosts the Spark context + RDDs, launched in 
client mode.

 

On Thu, Feb 5, 2015 at 10:25 AM, Shuai Zheng szheng.c...@gmail.com wrote:

Hi All,

 

I want to develop a server-side application:

 

User submits a request -> the server runs a Spark job and returns the result (this 
might take a few seconds).

 

So I want the server to keep a long-lived context; I don't know whether this is 
reasonable or not.

 

Basically I try to have a global JavaSparkContext instance, keep it there, and 
initialize some RDDs. Then my Java application will use it to submit jobs.

 

So now I have some questions:

 

1. If I don't close it, is there any timeout I need to configure on the Spark 
server?

2. In theory I want to design something similar to the Spark shell (which also 
hosts a default sc), just not shell-based.

 

Any suggestions? I think my requirement is very common for application development; 
surely someone has done this before?

 

Regards,

 

Shawn

 

 

 



Re: How to design a long-lived Spark application

2015-02-05 Thread Chip Senkbeil
Hi,

You can also check out the Spark Kernel project:
https://github.com/ibm-et/spark-kernel

It can plug into the upcoming IPython 3.0 notebook (providing a Scala/Spark
language interface) and provides an API to submit code snippets (like the
Spark Shell) and get results directly back, rather than having to write out
your results elsewhere. A client library (
https://github.com/ibm-et/spark-kernel/wiki/Guide-for-the-Spark-Kernel-Client)
is available in Scala so you can create applications that can interactively
communicate with Apache Spark.

You can find a getting started section here:
https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

If you have any more questions about the project, feel free to email me!

Signed,
Chip Senkbeil
