Hi Andrew,
Thanks a lot for your response. I am aware of the '--master' flag in the
spark-submit command. However, I would like to create the SparkContext
inside my code.
Maybe I should elaborate a little further: I would like to reuse the
result of a Spark computation, e.g. a count, inside my code.
Here is the SparkPi example:
String[] jars = new String[1];
jars[0] = System.getProperty("user.dir") + "/target/SparkPi-1.0-SNAPSHOT.jar";

SparkConf conf = new SparkConf()
    .setAppName("JavaSparkPi")
    .setMaster("spark://SPARK_HOST:7077")
    .setJars(jars);
JavaSparkContext sc = new JavaSparkContext(conf);

int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
int n = 100 * slices;
List<Integer> l = new ArrayList<>(n);
for (int i = 0; i < n; i++) {
    l.add(i);
}

JavaRDD<Integer> dataSet = sc.parallelize(l, slices);

int count = dataSet.map(new Function<Integer, Integer>() {
    @Override
    public Integer call(Integer integer) {
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
    }
}).reduce(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer integer, Integer integer2) {
        return integer + integer2;
    }
});
System.out.println("Pi is roughly " + 4.0 * count / n);

sc.stop();
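(For context, stripped of the Spark API the snippet is just a plain Monte Carlo estimate of Pi; without a cluster it would look roughly like this, with `PiEstimate` being a name I made up:)

```java
import java.util.concurrent.ThreadLocalRandom;

public class PiEstimate {
    // Monte Carlo estimate of Pi: sample points uniformly in the square
    // [-1, 1] x [-1, 1] and count the fraction landing inside the unit
    // circle; that fraction approximates Pi / 4.
    static double estimate(int samples) {
        int count = 0;
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < samples; i++) {
            double x = rnd.nextDouble() * 2 - 1;
            double y = rnd.nextDouble() * 2 - 1;
            if (x * x + y * y < 1) {
                count++;
            }
        }
        return 4.0 * count / samples;
    }

    public static void main(String[] args) {
        System.out.println("Pi is roughly " + estimate(1_000_000));
    }
}
```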
As you can see, I can reuse the result (count) in my code directly.
So my goal would be to reuse this kind of implementation in YARN mode
(client/cluster mode). However, I haven't found a way to do that, since I
always have to submit my Spark code via spark-submit.
What if I want to run this code as part of a web application which renders
the result as a web page?
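The closest thing I have found so far is the SparkLauncher API introduced in Spark 1.4 (org.apache.spark.launcher.SparkLauncher), which lets a JVM such as a web application start a spark-submit-style launch programmatically, though it still spawns a child process rather than giving me the result in-process. A rough sketch (the jar path and class name are placeholders, and SPARK_HOME/HADOOP_CONF_DIR would have to be set in the environment):

```java
import org.apache.spark.launcher.SparkLauncher;

public class LaunchOnYarn {
    public static void main(String[] args) throws Exception {
        // Programmatically launch the packaged Spark application against
        // YARN, equivalent to invoking spark-submit from the shell.
        Process spark = new SparkLauncher()
                .setAppResource("/path/to/SparkPi-1.0-SNAPSHOT.jar") // placeholder jar
                .setMainClass("JavaSparkPi")                         // placeholder class
                .setMaster("yarn-cluster")
                .setConf(SparkLauncher.DRIVER_MEMORY, "1g")
                .launch();
        // Block until the submitted application finishes.
        int exitCode = spark.waitFor();
        System.out.println("spark-submit finished with exit code " + exitCode);
    }
}
```

That still does not hand the computed value back to the caller, so the result would have to be written somewhere (HDFS, a database) and read back by the web application.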
-- Andreas
On Tue, Aug 18, 2015 at 10:50 PM, Andrew Or wrote:
> Hi Andreas,
>
> I believe the distinction is not between standalone and YARN mode, but
> between client and cluster mode.
>
> In client mode, your Spark submit JVM runs your driver code. In cluster
> mode, one of the workers (or NodeManagers if you're using YARN) in the
> cluster runs your driver code. In the latter case, it doesn't really make
> sense to call `setMaster` in your driver because Spark needs to know which
> cluster you're submitting the application to.
>
> Instead, the recommended way is to set the master through the `--master`
> flag in the command line, e.g.
>
> $ bin/spark-submit \
>     --master spark://1.2.3.4:7077 \
>     --class some.user.Clazz \
>     --name "My app name" \
>     --jars lib1.jar,lib2.jar \
>     --deploy-mode cluster \
>     app.jar
>
> Both YARN and standalone modes support client and cluster modes, and the
> spark-submit script is the common interface through which you can launch
> your application. In other words, you shouldn't have to do anything more
> than providing a different value to `--master` to use YARN.
>
> -Andrew
>
> 2015-08-17 0:34 GMT-07:00 Andreas Fritzler :
>
>> Hi all,
>>
>> when running the Spark cluster in standalone mode I am able to create the
>> Spark context from Java via the following code snippet:
>>
>> SparkConf conf = new SparkConf()
>>     .setAppName("MySparkApp")
>>     .setMaster("spark://SPARK_MASTER:7077")
>>     .setJars(jars);
>> JavaSparkContext sc = new JavaSparkContext(conf);
>>
>> As soon as I'm done with my processing, I can just close it via
>>
>> sc.stop();
>>
>> Now my question: Is the same also possible when running Spark on YARN? I
>> currently don't see how this should be possible without submitting your
>> application as a packaged jar file. Is there a way to get this kind of
>> interactivity from within your Scala/Java code?
>>
>> Regards,
>> Andreas
>>
>
>