Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-06 Thread Andreas Fritzler
Hi Andrew,

thanks a lot for the clarification!

Regards,
Andreas

On Tue, Oct 6, 2015 at 2:23 AM, Andrew Or <and...@databricks.com> wrote:

> Hi all,
>
> Both the history server and the shuffle service are backward compatible,
> but not forward compatible. This means that as long as you have the latest
> version of the history server / shuffle service running in your cluster,
> you're fine (you don't need multiple of them).
>
> That said, an old shuffle service (e.g. 1.2) also happens to work with,
> say, Spark 1.4 because the shuffle file formats haven't changed. However,
> there are no guarantees that this will remain the case.
>
> -Andrew
>
> 2015-10-05 16:37 GMT-07:00 Alex Rovner <alex.rov...@magnetic.com>:
>
>> We are running CDH 5.4 with Spark 1.3 as our main version, and that
>> version is configured to use the external shuffle service. We have also
>> installed Spark 1.5 and have configured it not to use the external
>> shuffle service, and that works well for us so far. I would myself be
>> interested in how to configure multiple versions to use the same shuffle
>> service.
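>>
>> For what it's worth, here is a minimal sketch of the per-installation
>> setting (assuming it lives in the Spark 1.5 installation's
>> conf/spark-defaults.conf; dynamic allocation also stays off, since it
>> requires the shuffle service):
>>
>>   spark.shuffle.service.enabled    false
>>   spark.dynamicAllocation.enabled  false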
>>
>> Alex Rovner
>> Director, Data Engineering
>> o: 646.759.0052
>>
>> http://www.magnetic.com/
>>
>> On Mon, Oct 5, 2015 at 11:06 AM, Andreas Fritzler <
>> andreas.fritz...@gmail.com> wrote:
>>
>>> Hi Steve, Alex,
>>>
>>> how do you handle the distribution and configuration of
>>> the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2
>>> different Spark versions?
>>>
>>> Regards,
>>> Andreas
>>>
>>> On Mon, Oct 5, 2015 at 4:54 PM, Steve Loughran <ste...@hortonworks.com>
>>> wrote:
>>>
>>>>
>>>> > On 5 Oct 2015, at 16:48, Alex Rovner <alex.rov...@magnetic.com>
>>>> wrote:
>>>> >
>>>> > Hey Steve,
>>>> >
>>>> > Are you referring to the 1.5 version of the history server?
>>>> >
>>>>
>>>>
>>>> Yes. I should warn, however, that there's no guarantee that a history
>>>> server running the 1.4 code will handle the histories of a 1.5+ job. In
>>>> fact, I'm fairly confident it won't, as the events to get replayed are
>>>> different.
>>>>
>>>
>>>
>>
>


[Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andreas Fritzler
Hi,

I was just wondering if it is possible to register multiple versions of
the aux-services with YARN, as described in the documentation [1]:

   1. In the yarn-site.xml on each node, add spark_shuffle to
   yarn.nodemanager.aux-services, then set
   yarn.nodemanager.aux-services.spark_shuffle.class to
   org.apache.spark.network.yarn.YarnShuffleService. Additionally, set
   all relevant spark.shuffle.service.* configurations (see the sketch
   below).
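Here is a rough sketch of what that NodeManager-side step might look like;
the exact value list depends on which aux-services your cluster already
runs (mapreduce_shuffle is just an example):

   <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle,spark_shuffle</value>
   </property>
   <property>
     <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
     <value>org.apache.spark.network.yarn.YarnShuffleService</value>
   </property>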

The reason for the question is: I am trying to run multiple versions of
Spark in parallel. Does anybody have experience with how such a dual-version
setup holds up in terms of backward compatibility?

Maybe sticking to the latest version of the aux-service will do the trick?

Regards,
Andreas

[1]
http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation


Re: [Spark on YARN] Multiple Auxiliary Shuffle Service Versions

2015-10-05 Thread Andreas Fritzler
Hi Steve, Alex,

how do you handle the distribution and configuration of
the spark-*-yarn-shuffle.jar on your NodeManagers if you want to use 2
different Spark versions?

Regards,
Andreas

On Mon, Oct 5, 2015 at 4:54 PM, Steve Loughran wrote:

>
> > On 5 Oct 2015, at 16:48, Alex Rovner  wrote:
> >
> > Hey Steve,
> >
> > Are you referring to the 1.5 version of the history server?
> >
>
>
> Yes. I should warn, however, that there's no guarantee that a history
> server running the 1.4 code will handle the histories of a 1.5+ job. In
> fact, I'm fairly confident it won't, as the events to get replayed are
> different.
>


Re: Programmatically create SparkContext on YARN

2015-08-19 Thread Andreas Fritzler
Hi Andrew,

Thanks a lot for your response. I am aware of the '--master' flag in the
spark-submit command. However, I would like to create the SparkContext
inside my code.

Maybe I should elaborate a little bit further: I would like to reuse, for
example, the result of a Spark computation directly inside my code.
Here is the SparkPi example:

// Needs the usual imports: org.apache.spark.SparkConf,
// org.apache.spark.api.java.JavaSparkContext, org.apache.spark.api.java.JavaRDD,
// org.apache.spark.api.java.function.Function / Function2, java.util.List / ArrayList.
String[] jars = new String[1];
jars[0] = System.getProperty("user.dir") + "/target/SparkPi-1.0-SNAPSHOT.jar";

SparkConf conf = new SparkConf()
    .setAppName("JavaSparkPi")
    .setMaster("spark://SPARK_HOST:7077")
    .setJars(jars);

JavaSparkContext sc = new JavaSparkContext(conf);

int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
int n = 100 * slices;
List<Integer> l = new ArrayList<Integer>(n);
for (int i = 0; i < n; i++) {
    l.add(i);
}

JavaRDD<Integer> dataSet = sc.parallelize(l, slices);

// Count how many random points land inside the unit circle.
int count = dataSet.map(new Function<Integer, Integer>() {
    @Override
    public Integer call(Integer integer) {
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
    }
}).reduce(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer integer, Integer integer2) {
        return integer + integer2;
    }
});

System.out.println("Pi is roughly " + 4.0 * count / n);

sc.stop();

As you can see, I can reuse the result (count) directly in my code.

So my goal would be to reuse this kind of implementation in YARN mode
(client/cluster mode). However, I haven't really found a solution for how
to do that, since I always have to submit my Spark code via spark-submit.

What if I want to run this code as part of a web application which renders
the result as a web page?

-- Andreas

On Tue, Aug 18, 2015 at 10:50 PM, Andrew Or and...@databricks.com wrote:

 Hi Andreas,

 I believe the distinction is not between standalone and YARN mode, but
 between client and cluster mode.

 In client mode, your Spark submit JVM runs your driver code. In cluster
 mode, one of the workers (or NodeManagers if you're using YARN) in the
 cluster runs your driver code. In the latter case, it doesn't really make
 sense to call `setMaster` in your driver because Spark needs to know which
 cluster you're submitting the application to.

 Instead, the recommended way is to set the master through the `--master`
 flag in the command line, e.g.

 $ bin/spark-submit \
     --master spark://1.2.3.4:7077 \
     --class some.user.Clazz \
     --name "My app name" \
     --jars lib1.jar,lib2.jar \
     --deploy-mode cluster \
     app.jar

 Both YARN and standalone modes support client and cluster modes, and the
 spark-submit script is the common interface through which you can launch
 your application. In other words, you shouldn't have to do anything more
 than providing a different value to `--master` to use YARN.
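
 For example, a YARN cluster-mode submission of the same app might look
 like this (a sketch only; it assumes HADOOP_CONF_DIR points at your
 cluster configuration, and the class/jar names are placeholders):

 $ bin/spark-submit \
     --master yarn \
     --deploy-mode cluster \
     --class some.user.Clazz \
     --name "My app name" \
     --jars lib1.jar,lib2.jar \
     app.jar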

 -Andrew

 2015-08-17 0:34 GMT-07:00 Andreas Fritzler andreas.fritz...@gmail.com:

 Hi all,

 When running the Spark cluster in standalone mode, I am able to create the
 Spark context from Java via the following code snippet:

 SparkConf conf = new SparkConf()
     .setAppName("MySparkApp")
     .setMaster("spark://SPARK_MASTER:7077")
     .setJars(jars);
 JavaSparkContext sc = new JavaSparkContext(conf);


 As soon as I'm done with my processing, I can just close it via

 sc.stop();

 Now my question: Is the same also possible when running Spark on YARN? I
 currently don't see how this should be possible without submitting your
 application as a packaged jar file. Is there a way to get this kind of
 interactivity from within your Scala/Java code?

 Regards,
 Andreas





Programmatically create SparkContext on YARN

2015-08-17 Thread Andreas Fritzler
Hi all,

When running the Spark cluster in standalone mode, I am able to create the
Spark context from Java via the following code snippet:

SparkConf conf = new SparkConf()
    .setAppName("MySparkApp")
    .setMaster("spark://SPARK_MASTER:7077")
    .setJars(jars);
JavaSparkContext sc = new JavaSparkContext(conf);


As soon as I'm done with my processing, I can just close it via

 sc.stop();

Now my question: Is the same also possible when running Spark on YARN? I
currently don't see how this should be possible without submitting your
application as a packaged jar file. Is there a way to get this kind of
interactivity from within your Scala/Java code?

Regards,
Andreas