Re: Getting started : Spark on YARN issue
Hi Andrew,

Thanks for your suggestion. I updated hdfs-site.xml on the server side and also on the client side to use hostnames instead of IPs, as described here: http://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-running-in-ec2-using-public-ip-addresses/. Now I can see that the client is able to talk to the datanode. I will also consider submitting the application from within EC2 itself, so that the private IPs are resolvable.

Thanks,
Praveen

On Fri, Jun 20, 2014 at 2:35 AM, Andrew Or <and...@databricks.com> wrote:
> (Also, an easier workaround is to simply submit the application from within your cluster, thus saving you all the manual labor of reconfiguring everything to use public hostnames. ...)
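For readers who want the concrete change: the fix described above usually comes down to two standard HDFS properties (a minimal hdfs-site.xml sketch, assuming Hadoop 2.x; verify the property names against your Hadoop version before relying on them):

    <!-- hdfs-site.xml: hand out datanode hostnames instead of raw IPs -->
    <property>
      <!-- Client side: connect to datanodes by the hostname the
           namenode reports, not by the datanode's (private) IP -->
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>
    <property>
      <!-- Server side: datanodes likewise address each other by hostname -->
      <name>dfs.datanode.use.datanode.hostname</name>
      <value>true</value>
    </property>

This works because an EC2 public DNS name resolves to the private address from inside EC2 and to the public address from outside, so the same hostname is usable on both sides of the cluster boundary.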
Re: Getting started : Spark on YARN issue
Hi Praveen,

Yes, the fact that it is trying to use a private IP from outside of the cluster is suspicious. My guess is that your HDFS is configured to use internal IPs rather than external IPs. This means that even though the Hadoop confs on your local machine use only external IPs, the org.apache.spark.deploy.yarn.Client running on your local machine will try to use whatever address your HDFS namenode tells it to use, which in this case is private. A potential fix is to update your hdfs-site.xml (and other related configs) within your cluster to use public hostnames. Let me know if that does the job.

Andrew

2014-06-19 6:04 GMT-07:00 Praveen Seluka <psel...@qubole.com>:

I am trying to run Spark on YARN. I have a Hadoop 2.2 cluster (YARN + HDFS) in EC2. I compiled Spark using Maven with the Hadoop 2.2 profile, and I am now trying to run the example Spark job in yarn-cluster mode from my local machine. I have set the HADOOP_CONF_DIR environment variable correctly.

➜ spark git:(master) ✗ /bin/bash -c ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/target/scala-2.10/spark-examples_*.jar 10

14/06/19 14:59:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/19 14:59:39 INFO client.RMProxy: Connecting to ResourceManager at ec2-54-242-244-250.compute-1.amazonaws.com/54.242.244.250:8050
14/06/19 14:59:41 INFO yarn.Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 1
14/06/19 14:59:41 INFO yarn.Client: Queue info ... queueName: default, queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/06/19 14:59:41 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 12288
14/06/19 14:59:41 INFO yarn.Client: Preparing Local resources
14/06/19 14:59:42 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/06/19 14:59:43 INFO yarn.Client: Uploading file:/home/rgupta/awesome/spark/examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar to hdfs://ec2-54-242-244-250.compute-1.amazonaws.com:8020/user/rgupta/.sparkStaging/application_1403176373037_0009/spark-examples_2.10-1.0.0-SNAPSHOT.jar
14/06/19 15:00:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.180.150.66:50010]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
14/06/19 15:00:45 INFO hdfs.DFSClient: Abandoning BP-1714253233-10.180.215.105-1403176367942:blk_1073741833_1009
14/06/19 15:00:46 INFO hdfs.DFSClient: Excluding datanode 10.180.150.66:50010
14/06/19 15:00:46 WARN hdfs.DFSClient: DataStreamer Exception

It is able to talk to the ResourceManager. It then uploads the examples jar to HDFS, and that is where it fails: the client cannot write to the datanode. I verified that port 50010 is accessible from my local machine. Any idea what the issue is here? One thing that is suspicious is /10.180.150.66:50010 - it looks like the client is trying to connect using a private IP. If so, how can I resolve this so that it uses the public IP?

Thanks,
Praveen
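A quick way to check what the namenode is actually handing out is to query it with the same client configuration that spark-submit uses (a debugging sketch; the conf path is hypothetical, and dfsadmin may require HDFS superuser rights):

    # Use the same Hadoop conf that spark-submit picks up
    export HADOOP_CONF_DIR=/path/to/hadoop/conf   # hypothetical path

    # List the datanodes as the namenode reports them; the "Name:"
    # field is the address:port that clients are told to connect to
    hdfs dfsadmin -report

If the report lists 10.x.x.x addresses, opening port 50010 in the security group does not help from outside EC2: the client is being handed an address it cannot route to at all.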
Re: Getting started : Spark on YARN issue
(Also, an easier workaround is to simply submit the application from within your cluster, thus saving you all the manual labor of reconfiguring everything to use public hostnames. This may or may not be applicable to your use case.)

2014-06-19 14:04 GMT-07:00 Andrew Or <and...@databricks.com>:
> Hi Praveen,
> Yes, the fact that it is trying to use a private IP from outside of the cluster is suspicious. My guess is that your HDFS is configured to use internal IPs rather than external IPs. ...
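In practice that workaround can be as simple as the sketch below (the SSH user, key path, and Spark install location are hypothetical; the hostname is the ResourceManager host from the logs above):

    # Copy the application jar to a machine inside the cluster
    scp -i ~/mykey.pem examples/target/scala-2.10/spark-examples_*.jar \
        ec2-user@ec2-54-242-244-250.compute-1.amazonaws.com:~/

    # Submit from inside the cluster, where the datanodes' private
    # 10.x.x.x addresses are both resolvable and routable
    ssh -i ~/mykey.pem ec2-user@ec2-54-242-244-250.compute-1.amazonaws.com \
        '~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 ~/spark-examples_*.jar 10'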