Re: Spark Streaming failing on YARN Cluster

2015-08-25 Thread Ramkumar V
Yes. When I checked the YARN logs for that particular failed app_id, I found
the following error:

ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting
for 100000 ms. Please check earlier log output for errors. Failing the
application.

To fix this error, I had to change the SparkContext and set the master to
yarn-cluster (setMaster("yarn-cluster")). It's working fine in cluster mode
now. Thanks, everyone.
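
For reference, a minimal sketch of that change, assuming a PySpark streaming
job like the one in this thread (the app name and batch interval below are
illustrative):

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    # Point the context at YARN cluster mode explicitly; equivalently,
    # omit setMaster() and pass --master yarn-cluster to spark-submit.
    conf = (SparkConf()
            .setAppName("StreamingJob")   # illustrative name
            .setMaster("yarn-cluster"))
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 60)        # illustrative 1-minute batches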

*Thanks*,
https://in.linkedin.com/in/ramkumarcs31


On Fri, Aug 21, 2015 at 6:41 AM, Jeff Zhang zjf...@gmail.com wrote:

 The AM fails to launch; could you check the YARN app logs? You can use
 the command yarn logs -applicationId <your_app_id> to fetch them.
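
 For example, with the failed application id from the diagnostics quoted
 later in this thread:

   yarn logs -applicationId application_1437639737006_3808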



 On Thu, Aug 20, 2015 at 1:15 AM, Ramkumar V ramkumar.c...@gmail.com
 wrote:

 I'm getting a Spark exception. Please look at this log trace:
 http://pastebin.com/xL9jaRUa

 *Thanks*,
 https://in.linkedin.com/in/ramkumarcs31


 On Wed, Aug 19, 2015 at 10:20 PM, Hari Shreedharan 
 hshreedha...@cloudera.com wrote:

 It looks like you are having issues with the files getting distributed
 to the cluster. What is the exception you are getting now?


 On Wednesday, August 19, 2015, Ramkumar V ramkumar.c...@gmail.com
 wrote:

 Thanks a lot for your suggestion. I modified HADOOP_CONF_DIR in
 spark-env.sh so that core-site.xml is under HADOOP_CONF_DIR, and I can
 now see logs like the ones you showed above. The job now runs for 3
 minutes and stores results every minute, but after some time there is an
 exception. How do I fix this exception? Can you please explain where it's
 going wrong?

 Log link: http://pastebin.com/xL9jaRUa


 *Thanks*,
 https://in.linkedin.com/in/ramkumarcs31


 On Wed, Aug 19, 2015 at 1:54 PM, Jeff Zhang zjf...@gmail.com wrote:

 HADOOP_CONF_DIR is the environment variable pointing to the Hadoop conf
 directory. I'm not sure how CDH organizes it; make sure core-site.xml is
 under HADOOP_CONF_DIR.
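
 For example, in spark-env.sh (the path below is the usual CDH default;
 adjust it to your install):

   export HADOOP_CONF_DIR=/etc/hadoop/conf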

 On Wed, Aug 19, 2015 at 4:06 PM, Ramkumar V ramkumar.c...@gmail.com
 wrote:

 We are using Cloudera 5.3.1. Since it is one of the earlier versions of
 CDH, it doesn't support the latest version of Spark, so I installed
 Spark 1.4.1 separately on my machine. I wasn't able to do spark-submit in
 cluster mode. How do I put core-site.xml under the classpath? It would be
 very helpful if you could explain in detail how to solve this issue.

 *Thanks*,
 https://in.linkedin.com/in/ramkumarcs31


 On Fri, Aug 14, 2015 at 8:25 AM, Jeff Zhang zjf...@gmail.com wrote:


 15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.5.0-cdh5.3.5.jar
 15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1.jar
 15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip
 15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/py4j-0.8.2.1-src.zip
 15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py

 diagnostics: Application application_1437639737006_3808 failed 2 times due to AM Container for appattempt_1437639737006_3808_02 exited with exitCode: -1000 due to: File file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip does not exist
 .Failing this attempt.. Failing the application.



 The machine you run Spark on is the client machine, while the YARN AM
 runs on another machine, and the YARN AM complains that the files are not
 found, as your logs show.
 From the logs, it seems these files were not copied to HDFS as local
 resources. I suspect core-site.xml is not on your classpath, so Spark
 cannot detect your remote file system and won't copy the files to HDFS as
 local resources. Usually in yarn-cluster mode you should see logs like the
 following:

 15/08/14 10:48:49 INFO yarn.Client: Preparing resources for our AM container
 15/08/14 10:48:49 INFO yarn.Client: Uploading resource file:/Users/abc/github/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar -> hdfs://0.0.0.0:9000/user/abc/.sparkStaging/application_1439432662178_0019/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
 15/08/14 10:48:50 INFO yarn.Client: Uploading resource file:/Users/abc/github/spark/spark.py -> hdfs://0.0.0.0:9000/user/abc/.sparkStaging/application_1439432662178_0019/spark.py
 15/08/14 10:48:50 INFO yarn.Client: Uploading resource file:/Users/abc/github/spark/python/lib/pyspark.zip -> hdfs://0.0.0.0:9000/user/abc/.sparkStaging/application_1439432662178_0019/pyspark.zip
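
 With core-site.xml visible through HADOOP_CONF_DIR, a cluster-mode submit
 along these lines should then stage the files to HDFS (the memory values
 below are illustrative; the script path is the one from your logs):

   export HADOOP_CONF_DIR=/etc/hadoop/conf
   spark-submit \
     --deploy-mode cluster \
     --master yarn-cluster \
     --driver-memory 2g \
     --executor-memory 2g \
     /home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py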

Re: Spark Streaming failing on YARN Cluster

2015-08-19 Thread Ramkumar V
I'm getting a Spark exception. Please look at this log trace:
http://pastebin.com/xL9jaRUa

*Thanks*,
https://in.linkedin.com/in/ramkumarcs31


Re: Spark Streaming failing on YARN Cluster

2015-08-19 Thread Ramkumar V
Thanks a lot for your suggestion. I modified HADOOP_CONF_DIR in
spark-env.sh so that core-site.xml is under HADOOP_CONF_DIR, and I can now
see logs like the ones you showed above. The job now runs for 3 minutes and
stores results every minute, but after some time there is an exception. How
do I fix this exception? Can you please explain where it's going wrong?

Log link: http://pastebin.com/xL9jaRUa


*Thanks*,
https://in.linkedin.com/in/ramkumarcs31


Re: Spark Streaming failing on YARN Cluster

2015-08-19 Thread Ramkumar V
We are using Cloudera 5.3.1. Since it is one of the earlier versions of
CDH, it doesn't support the latest version of Spark, so I installed
Spark 1.4.1 separately on my machine. I wasn't able to do spark-submit in
cluster mode. How do I put core-site.xml under the classpath? It would be
very helpful if you could explain in detail how to solve this issue.

*Thanks*,
https://in.linkedin.com/in/ramkumarcs31


Spark Streaming failing on YARN Cluster

2015-08-13 Thread Ramkumar V
Hi,

I have a cluster of 1 master and 2 slaves. I'm running a Spark Streaming
job on the master and I want to utilize all the nodes in my cluster. I
specified some parameters, like driver memory and executor memory, in my
code. When I pass --deploy-mode cluster --master yarn-cluster to
spark-submit, it gives the following error.

Log link: http://pastebin.com/kfyVWDGR

How do I fix this issue? Please tell me if I'm doing something wrong.


*Thanks*,
Ramkumar V


Re: Spark Streaming failing on YARN Cluster

2015-08-13 Thread Ramkumar V
Yes, this file is available at that path on the same machine where I'm
running Spark. Later I copied the spark-1.4.1 folder to all the other
machines in my cluster, but I'm still facing the same issue.
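
As discussed earlier in this thread, this symptom usually means the client
is resolving file: paths instead of HDFS because core-site.xml is not on
the classpath. A minimal core-site.xml sketch (the NameNode host and port
below are illustrative) would be:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode-host:8020</value>
      </property>
    </configuration>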


*Thanks*,
https://in.linkedin.com/in/ramkumarcs31


On Thu, Aug 13, 2015 at 1:17 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Just make sure this file is available:

 appattempt_1437639737006_3808_02 exited with  exitCode: -1000 due to:
 File *file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip* does not exist

 Thanks
 Best Regards
