How to run a Spark Streaming application on YARN?

2015-06-04 Thread Saiph Kappa
Hi,

I've been running my Spark Streaming application in standalone mode without
any problems. Now I'm trying to run it on YARN (Hadoop 2.7.0), but I'm
running into some trouble.

Here are the config parameters of my application:
«
val sparkConf = new SparkConf()

// YARN client mode, with a 2 GB ApplicationMaster and two executors
sparkConf.setMaster("yarn-client")
sparkConf.set("spark.yarn.am.memory", "2g")
sparkConf.set("spark.executor.instances", "2")

sparkConf.setAppName("Benchmark")
sparkConf.setJars(Array("target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar"))
sparkConf.set("spark.executor.memory", "4g")
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.executor.extraJavaOptions",
  "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
  "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300")

// Fall back to local mode when no master is set. (As written, this branch
// is unreachable, because the master is always set above.)
if (sparkConf.getOption("spark.master") == None) {
  sparkConf.setMaster("local[*]")
}
»

The jar I'm including there contains only the application classes.


Here is the log of the application: http://pastebin.com/7RSktezA

Here is the user log on Hadoop/YARN:
«
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:596)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 14 more
»

I tried adding the Spark core jar to ${HADOOP_HOME}/lib, but the error
persists. Am I doing something wrong?

Thanks.


Re: How to run a Spark Streaming application on YARN?

2015-06-04 Thread Sandy Ryza
Hi Saiph,

Are you launching using spark-submit?

-Sandy


Re: How to run a Spark Streaming application on YARN?

2015-06-04 Thread Saiph Kappa
No, I am not. I run it with sbt: «sbt "run-main Benchmark"». I thought it
would amount to the same thing, since I am passing all the configuration
through the application code. Is that the problem?


Re: How to run a Spark Streaming application on YARN?

2015-06-04 Thread Sandy Ryza
spark-submit is the recommended way of launching Spark applications on
YARN, because it takes care of submitting the right jars as well as setting
up the classpath and environment variables appropriately.
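
For example, something along these lines, using the values from your
snippet (the main class name "Benchmark" is an assumption based on your app
name; substitute your own):
«
# Hedged sketch: the SparkConf settings from the application code move onto
# the command line, and spark-submit ships the Spark jars for you.
spark-submit \
  --master yarn-client \
  --class Benchmark \
  --num-executors 2 \
  --executor-memory 4g \
  --conf spark.yarn.am.memory=2g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
»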

-Sandy

On Thu, Jun 4, 2015 at 10:30 AM, Saiph Kappa  wrote:

> No, I am not. I run it with sbt «sbt "run-main Branchmark"». I thought it
> was the same thing since I am passing all the configurations through the
> application code. Is that the problem?
>
> On Thu, Jun 4, 2015 at 6:26 PM, Sandy Ryza 
> wrote:
>
>> Hi Saiph,
>>
>> Are you launching using spark-submit?
>>
>> -Sandy
>>
>> On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa 
>> wrote:
>>
>>> Hi,
>>>
>>> I've been running my spark streaming application in standalone mode
>>> without any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0)
>>> but I am having some problems.
>>>
>>> Here are the config parameters of my application:
>>> «
>>> val sparkConf = new SparkConf()
>>>
>>> sparkConf.setMaster("yarn-client")
>>> sparkConf.set("spark.yarn.am.memory", "2g")
>>> sparkConf.set("spark.executor.instances", "2")
>>>
>>> sparkConf.setAppName("Benchmark")
>>>
>>> sparkConf.setJars(Array("target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar"))
>>> sparkConf.set("spark.executor.memory", "4g")
>>> sparkConf.set("spark.serializer",
>>> "org.apache.spark.serializer.KryoSerializer")
>>> sparkConf.set("spark.executor.extraJavaOptions", "
>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
>>>   "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300
>>> ")
>>> if (sparkConf.getOption("spark.master") == None) {
>>>   sparkConf.setMaster("local[*]")
>>> }
>>> »
>>>
>>> The jar I'm including there only contains the application classes.
>>>
>>>
>>> Here is the log of the application: http://pastebin.com/7RSktezA
>>>
>>> Here is the userlog on hadoop/YARN:
>>> «
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/spark/Logging
>>> at java.lang.ClassLoader.defineClass1(Native Method)
>>> at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>>> at
>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at
>>> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:596)
>>> at
>>> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> ... 14 more
>>> »
>>>
>>> I tried to add the spark core jar to ${HADOOP_HOME}/lib but the error
>>> persists. Am I doing something wrong?
>>>
>>> Thanks.
>>>
>>
>>
>


Re: How to run a Spark Streaming application on YARN?

2015-06-04 Thread Saiph Kappa
Thanks! It is working fine now with spark-submit. Just out of curiosity,
how would you use org.apache.spark.deploy.yarn.Client? By adding the
spark-yarn jar to the configuration inside the application?

On Thu, Jun 4, 2015 at 6:37 PM, Vova Shelgunov wrote:

> You should run it with spark-submit or using
> org.apache.spark.deploy.yarn.Client.


Re: How to run a Spark Streaming application on YARN?

2015-06-04 Thread Sandy Ryza
That might work, but there might also be other steps that are required.
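
For what it's worth, a rough sketch of what direct use might look like
against Spark 1.3. The yarn.Client API is private[spark] and unstable, so
the constructor shapes below are assumptions based on the 1.3 sources, and
the calling code would have to live under an org.apache.spark package to
compile at all:
«
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.{Client, ClientArguments}

// Assumed signatures from the Spark 1.3 sources. This is roughly what
// spark-submit drives for you, which is why it is the preferred path.
val sparkConf = new SparkConf()
val clientArgs = new ClientArguments(Array(
  "--jar", "target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar",
  "--class", "Benchmark"), sparkConf)
new Client(clientArgs, new Configuration(), sparkConf).run()
»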

-Sandy


Re: How to run a Spark Streaming application on YARN?

2015-06-04 Thread Saiph Kappa
Additionally, I think this document
(https://spark.apache.org/docs/latest/building-spark.html) should mention
that protobuf.version might need to be changed to match the version used by
the chosen Hadoop release. For instance, with Hadoop 2.7.0 I had to change
protobuf.version to 2.5.0 to be able to run my application.
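
A build invocation along these lines should apply that override (the
profile and property names assume the Spark 1.x Maven build; adjust them to
your tree):
«
# Build Spark for YARN against Hadoop 2.7.0, pinning protobuf to the
# version Hadoop 2.x ships with.
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Dprotobuf.version=2.5.0 \
    -DskipTests clean package
»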


Re: How to run a Spark Streaming application on YARN?

2015-06-05 Thread Saiph Kappa
I was able to run my application on a Hadoop/YARN cluster with a single
machine. Today I tried to extend the cluster with one more machine, but I
ran into some problems on the YARN NodeManager of the newly added machine:

Node Manager Log:
«
2015-06-06 01:41:33,379 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user myuser
2015-06-06 01:41:33,382 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-myuser/nm-local-dir/nmPrivate/container_1433549642381_0004_01_03.tokens to /tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1433549642381_0004/container_1433549642381_0004_01_03.tokens
2015-06-06 01:41:33,382 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1433549642381_0004 = file:/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1433549642381_0004
2015-06-06 01:41:33,405 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: { file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar, 1433441011000, FILE, null } failed: Resource file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar changed on src filesystem (expected 1433441011000, was 1433531913000
java.io.IOException: Resource file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar changed on src filesystem (expected 1433441011000, was 1433531913000
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:255)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

2015-06-06 01:41:33,405 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar(->/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/filecache/15/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar) transitioned from DOWNLOADING to FAILED
2015-06-06 01:41:33,406 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1433549642381_0004_01_03 transitioned from LOCALIZING to LOCALIZATION_FAILED
2015-06-06 01:41:33,406 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1433549642381_0004_01_03 sent RELEASE event on a resource request { file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar, 1433441011000, FILE, null } not present in cache.
2015-06-06 01:41:33,406 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server
»

I have this jar on both machines:
/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar

However, I simply copied the my-spark folder from machine1 to machine2 so
that YARN could find the jar.

Any ideas about what might be wrong? Isn't this the correct way to share
the Spark jars across a YARN cluster?
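
One commonly suggested way around this timestamp mismatch, assuming the
spark.yarn.jar setting available in Spark 1.x, is to stage the assembly
once on HDFS so that every NodeManager localizes the same copy:
«
# Hypothetical HDFS paths; adjust to your cluster layout.
hdfs dfs -mkdir -p /user/myuser/spark
hdfs dfs -put \
    /home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar \
    /user/myuser/spark/

# Then point Spark at that copy, e.g. in conf/spark-defaults.conf:
# spark.yarn.jar hdfs:///user/myuser/spark/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar
»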

Thanks.
