Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Andreas Fritzler
Hi Andrew,

thanks a lot for the clarification!

Regards,
Andreas

On Tue, Oct 6, 2015 at 2:23 AM, Andrew Or  wrote:

> Hi all,
>
> Both the history server and the shuffle service are backward compatible,
> but not forward compatible. This means that as long as you have the latest
> version of the history server / shuffle service running in your cluster,
> you're fine (you don't need multiple of them).
>
> That said, an old shuffle service (e.g. 1.2) also happens to work with, say,
> Spark 1.4 because the shuffle file formats haven't changed. However, there
> are no guarantees that this will remain the case.
>
> -Andrew
>
> 2015-10-05 16:37 GMT-07:00 Alex Rovner :
>
>> We are running CDH 5.4 with Spark 1.3 as our main version, and that
>> version is configured to use the external shuffle service. We have also
>> installed Spark 1.5 and configured it not to use the external shuffle
>> service, and that works well for us so far. I would myself be interested
>> in how to configure multiple versions to use the same shuffle service.
>>
>> Alex Rovner
>> Director, Data Engineering
>> o: 646.759.0052
>>
>> <http://www.magnetic.com/>
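A minimal sketch of what such a side-by-side setup could look like, one
spark-defaults.conf per installation. The install paths and exact values are
illustrative assumptions rather than Alex's actual configuration;
spark.shuffle.service.enabled is the switch that matters, and the external
service is normally only needed when dynamic allocation is enabled:

    # /opt/spark-1.3/conf/spark-defaults.conf (illustrative path) -- uses the external service
    spark.shuffle.service.enabled    true
    spark.dynamicAllocation.enabled  true

    # /opt/spark-1.5/conf/spark-defaults.conf (illustrative path) -- handles its own shuffle files
    spark.shuffle.service.enabled    false
    spark.dynamicAllocation.enabled  false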
>>
>> On Mon, Oct 5, 2015 at 11:06 AM, Andreas Fritzler <
>> andreas.fritz...@gmail.com> wrote:
>>
>>> Hi Steve, Alex,
>>>
>>> how do you handle the distribution and configuration of
>>> the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2
>>> different Spark versions?
>>>
>>> Regards,
>>> Andreas
>>>
>>> On Mon, Oct 5, 2015 at 4:54 PM, Steve Loughran 
>>> wrote:
>>>
>>>>
>>>> > On 5 Oct 2015, at 16:48, Alex Rovner 
>>>> wrote:
>>>> >
>>>> > Hey Steve,
>>>> >
>>>> > Are you referring to the 1.5 version of the history server?
>>>> >
>>>>
>>>>
>>>> Yes. I should warn, however, that there's no guarantee that a history
>>>> server running the 1.4 code will handle the histories of a 1.5+ job. In
>>>> fact, I'm fairly confident it won't, as the events to get replayed are
>>>> different.
>>>>
>>>
>>>
>>
>


Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andreas Fritzler
Hi Steve, Alex,

how do you handle the distribution and configuration of
the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2
different Spark versions?

Regards,
Andreas
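
On the distribution part of the question, what the Spark-on-YARN documentation
describes is copying the yarn-shuffle jar that ships with whichever Spark version
should back the service (as Andrew Or notes elsewhere in this thread, typically the
newest one) onto every NodeManager's classpath and restarting the NodeManagers. A
rough sketch, where the jar name and Hadoop paths are illustrative and depend on the
distribution:

    # on every NodeManager host; versions and paths are examples only
    cp /opt/spark-1.5.1/lib/spark-1.5.1-yarn-shuffle.jar /usr/lib/hadoop-yarn/lib/
    # restart the NodeManager so the spark_shuffle aux-service loads the new jar

A given aux-service name maps to a single class in yarn-site.xml, so this ends up
being one shared service rather than one per Spark version.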

On Mon, Oct 5, 2015 at 4:54 PM, Steve Loughran 
wrote:

>
> > On 5 Oct 2015, at 16:48, Alex Rovner  wrote:
> >
> > Hey Steve,
> >
> > Are you referring to the 1.5 version of the history server?
> >
>
>
> Yes. I should warn, however, that there's no guarantee that a history
> server running the 1.4 code will handle the histories of a 1.5+ job. In
> fact, I'm fairly confident it won't, as the events to get replayed are
> different.
>


[Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andreas Fritzler
Hi,

I was just wondering if it is possible to register multiple versions of
the aux-service with YARN, as described in the documentation [1]:

   1. In the yarn-site.xml on each node, add spark_shuffle to
   yarn.nodemanager.aux-services, then set
   yarn.nodemanager.aux-services.spark_shuffle.class to
   org.apache.spark.network.yarn.YarnShuffleService. Additionally, set all
   relevant spark.shuffle.service.* configurations.
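
A minimal yarn-site.xml sketch of that registration. The two property names and the
class are exactly the ones quoted above; the mapreduce_shuffle entry is only included
as the usual pre-existing value and is an assumption about the cluster:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>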

The reason for the question is that I am trying to run multiple versions of
Spark in parallel. Does anybody have experience with how such a dual-version
setup holds up in terms of backward compatibility?

Maybe sticking to the latest version of the aux-service will do the trick?

Regards,
Andreas

[1]
http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation


Re: Programmatically create SparkContext on YARN

2015-08-19 Thread Andreas Fritzler
Hi Andrew,

Thanks a lot for your response. I am aware of the '--master' flag in the
spark-submit command. However, I would like to create the SparkContext
directly in my code.

Maybe I should elaborate a little further: I would like to reuse, for example,
the result of a Spark computation inside my code.

Here is the SparkPi example:

> String[] jars = new String[1];
> jars[0] = System.getProperty("user.dir") + "/target/SparkPi-1.0-SNAPSHOT.jar";
>
> SparkConf conf = new SparkConf()
>     .setAppName("JavaSparkPi")
>     .setMaster("spark://SPARK_HOST:7077")
>     .setJars(jars);
> JavaSparkContext sc = new JavaSparkContext(conf);
>
> int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
> int n = 100 * slices;
> List<Integer> l = new ArrayList<Integer>(n);
> for (int i = 0; i < n; i++) {
>   l.add(i);
> }
>
> JavaRDD<Integer> dataSet = sc.parallelize(l, slices);
>
> int count = dataSet.map(new Function<Integer, Integer>() {
>   @Override
>   public Integer call(Integer integer) {
>     double x = Math.random() * 2 - 1;
>     double y = Math.random() * 2 - 1;
>     return (x * x + y * y < 1) ? 1 : 0;
>   }
> }).reduce(new Function2<Integer, Integer, Integer>() {
>   @Override
>   public Integer call(Integer integer, Integer integer2) {
>     return integer + integer2;
>   }
> });
>
> System.out.println("Pi is roughly " + 4.0 * count / n);
>
> sc.stop();
>
As you can see, I can reuse the result (count) directly in my code.

So my goal would be to reuse this kind of implementation in YARN mode
(client/cluster mode). However, I didn't find a way to do that, since I always
have to submit my Spark code via spark-submit.

What if I want to run this code as part of a web application which renders
the result as a web page?

-- Andreas
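
For completeness, in the Spark 1.x line this kind of in-process usage was possible in
YARN client mode by setting the master to "yarn-client" programmatically, provided the
JVM can see the cluster's Hadoop/YARN configuration (HADOOP_CONF_DIR or YARN_CONF_DIR)
and the Spark YARN classes are on the classpath. A minimal, unverified sketch; the app
name and jar path are placeholders, not anything from this thread:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class InProcessYarnClient {
      public static void main(String[] args) {
        // Assumes HADOOP_CONF_DIR / YARN_CONF_DIR points at the cluster configuration,
        // so Spark can locate the ResourceManager and HDFS.
        SparkConf conf = new SparkConf()
            .setAppName("MyWebAppSparkJobs")                   // placeholder app name
            .setMaster("yarn-client")                          // Spark 1.x master value for YARN client mode
            .setJars(new String[] { "/path/to/my-app.jar" });  // placeholder jar the executors need
        JavaSparkContext sc = new JavaSparkContext(conf);

        // ... run jobs here and use their results in-process, e.g. from a web handler ...

        sc.stop();
      }
    }

Cluster mode, by contrast, still needs the application to be packaged and submitted,
since the driver itself runs on the cluster.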

On Tue, Aug 18, 2015 at 10:50 PM, Andrew Or  wrote:

> Hi Andreas,
>
> I believe the distinction is not between standalone and YARN mode, but
> between client and cluster mode.
>
> In client mode, your spark-submit JVM runs your driver code. In cluster
> mode, one of the workers (or NodeManagers if you're using YARN) in the
> cluster runs your driver code. In the latter case, it doesn't really make
> sense to call `setMaster` in your driver, because Spark needs to know which
> cluster you're submitting the application to before your driver even starts.
>
> Instead, the recommended way is to set the master through the `--master`
> flag in the command line, e.g.
>
> $ bin/spark-submit \
>     --master spark://1.2.3.4:7077 \
>     --class some.user.Clazz \
>     --name "My app name" \
>     --jars lib1.jar,lib2.jar \
>     --deploy-mode cluster \
>     app.jar
>
> Both YARN and standalone modes support client and cluster modes, and the
> spark-submit script is the common interface through which you can launch
> your application. In other words, you shouldn't have to do anything more
> than providing a different value to `--master` to use YARN.
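
As a concrete illustration of that last point, the same submission against YARN would
replace the standalone master URL with yarn-cluster (the Spark 1.x spelling of YARN
cluster mode, which also makes --deploy-mode cluster redundant). The class, name, and
jar arguments are the same placeholders as above, and HADOOP_CONF_DIR must point at the
cluster configuration:

    $ bin/spark-submit \
        --master yarn-cluster \
        --class some.user.Clazz \
        --name "My app name" \
        --jars lib1.jar,lib2.jar \
        app.jar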
>
> -Andrew
>
> 2015-08-17 0:34 GMT-07:00 Andreas Fritzler :
>
>> Hi all,
>>
>> when running the Spark cluster in standalone mode, I am able to create the
>> SparkContext from Java via the following code snippet:
>>
>>> SparkConf conf = new SparkConf()
>>>     .setAppName("MySparkApp")
>>>     .setMaster("spark://SPARK_MASTER:7077")
>>>     .setJars(jars);
>>> JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>
>> As soon as I'm done with my processing, I can just close it via
>>
>>> sc.stop();
>>>
>> Now my question: Is the same also possible when running Spark on YARN? I
>> currently don't see how this should be possible without submitting your
>> application as a packaged jar file. Is there a way to get this kind of
>> interactivity from within your Scala/Java code?
>>
>> Regards,
>> Andreas
>>
>
>


Programmatically create SparkContext on YARN

2015-08-17 Thread Andreas Fritzler
Hi all,

when running the Spark cluster in standalone mode, I am able to create the
SparkContext from Java via the following code snippet:

> SparkConf conf = new SparkConf()
>     .setAppName("MySparkApp")
>     .setMaster("spark://SPARK_MASTER:7077")
>     .setJars(jars);
> JavaSparkContext sc = new JavaSparkContext(conf);


As soon as I'm done with my processing, I can just close it via

> sc.stop();
>
Now my question: Is the same also possible when running Spark on YARN? I
currently don't see how this should be possible without submitting your
application as a packaged jar file. Is there a way to get this kind of
interactivity from within your Scala/Java code?

Regards,
Andreas