Re: Streaming of WordCount example
I am running in yarn-client mode, which probably means that the program that submitted the job is also where the listening is occurring? I thought that YARN is only used to negotiate resources in yarn-client master mode.

On Mon, Aug 10, 2015 at 11:34 AM, Tathagata Das t...@databricks.com wrote:

If you are running on a cluster, the listening is occurring on one of the executors, not in the driver.

On Mon, Aug 10, 2015 at 10:29 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

I am trying to run this program as a yarn-client. The job seems to submit successfully, however I don't see any process listening on this host on the port.
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaRecoverableNetworkWordCount.java

Active Jobs (2)

  Job Id  Description                                             Submitted            Duration  Stages: Succeeded/Total  Tasks (all stages): Succeeded/Total
  1       foreachRDD at JavaRecoverableNetworkWordCount.java:112  2015/08/10 13:27:36  51 s      0/2                      0/2
  0       start at JavaRecoverableNetworkWordCount.java:152       2015/08/10 13:27:35  51 s      0/2                      0/70

  Job 1: http://ec2-52-25-118-171.us-west-2.compute.amazonaws.com:8088/proxy/application_1438820875993_0007/jobs/job?id=1
  Job 0: http://ec2-52-25-118-171.us-west-2.compute.amazonaws.com:8088/proxy/application_1438820875993_0007/jobs/job?id=0
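One way to check the "nothing is listening" observation directly is to probe the suspect host and port from plain Java instead of netstat. This is an illustrative sketch only, not part of the Spark example; the host and port in main are placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if something accepts TCP connections at host:port within timeoutMs.
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing reachable is listening there.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and port; substitute the executor host and receiver port.
        System.out.println(isListening("localhost", 9999, 500));
    }
}
```

Running this against each executor host narrows down where, if anywhere, the port is actually open.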
Re: Streaming of WordCount example
If you are running on a cluster, the listening is occurring on one of the executors, not in the driver.

On Mon, Aug 10, 2015 at 10:29 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: Streaming of WordCount example
In yarn-client mode, the driver is on the machine where you ran spark-submit. The executors run on the YARN cluster nodes, and the socket receiver listening on the port is running in one of the executors.

On Mon, Aug 10, 2015 at 11:43 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: Streaming of WordCount example
Is it receiving any data? If so, then it must be listening. Alternatively, to test these theories, you can run a Spark standalone cluster locally (a one-node standalone cluster on your local machine), submit your app in client mode on that, and see whether the process is listening on the port or not.

On Mon, Aug 10, 2015 at 12:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

I've verified all the executors and I don't see a process listening on the port. However, the application does show as running in the YARN UI.

On Mon, Aug 10, 2015 at 11:56 AM, Tathagata Das t...@databricks.com wrote: [...]
Re: Streaming of WordCount example
I am using the exact same code:
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaRecoverableNetworkWordCount.java

Submitting like this:

yarn:
/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/bin/spark-submit --class org.sony.spark.stream.test.JavaRecoverableNetworkWordCount --master yarn-client --total-executor-cores 3 ./spark-streaming-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar localhost /user/ec2-user/checkpoint/ /user/ec2-user/out

local:
/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/bin/spark-submit --class org.sony.spark.stream.test.JavaRecoverableNetworkWordCount --master spark://localhost:9966 --total-executor-cores 3 ./spark-streaming-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar localhost /user/ec2-user/checkpoint/ /user/ec2-user/out

Even though I am running as local, I see it being scheduled and managed by YARN.

On Mon, Aug 10, 2015 at 12:56 PM, Tathagata Das t...@databricks.com wrote: [...]
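Since the confusion here is which cluster manager actually runs the app, a small helper that classifies the --master string can make submission scripts self-checking. This is a hypothetical helper, not a Spark API; it covers only the master-URL forms that appear in this thread.

```java
public class MasterUrl {
    // Classify a Spark master URL string into the scheduler it selects.
    // Only the forms used in this thread are handled: local[n], spark://host:port, yarn-client.
    public static String schedulerFor(String master) {
        if (master == null || master.isEmpty()) {
            throw new IllegalArgumentException("master must be set");
        }
        if (master.startsWith("local")) {
            return "local";      // e.g. local[2]: everything in one JVM, no YARN involved
        }
        if (master.startsWith("spark://")) {
            return "standalone"; // Spark Standalone master, e.g. spark://localhost:9966
        }
        if (master.startsWith("yarn")) {
            return "yarn";       // yarn-client / yarn-cluster
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(schedulerFor("yarn-client"));            // yarn
        System.out.println(schedulerFor("spark://localhost:9966")); // standalone
        System.out.println(schedulerFor("local[2]"));               // local
    }
}
```

If a "local" run still ends up on YARN, the master string the driver actually received was not a local[n] form.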
Re: Streaming of WordCount example
I do see this message:

15/08/10 19:19:12 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

On Mon, Aug 10, 2015 at 4:15 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: Streaming of WordCount example
1. When you are running locally, make sure the master in the SparkConf reflects that and is not somehow set to yarn-client.
2. You may not be getting any resources from YARN at all, so no executors, and therefore no receiver running. That is why I asked the most basic question: is it receiving data? Confirming that it is indeed receiving data will eliminate a lot of uncertainties.

TD

On Mon, Aug 10, 2015 at 4:21 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: WordCount example
There are no workers registered with the Spark Standalone master! That is the crux of the problem. :) Follow the instructions properly:
https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
Especially make sure the conf/slaves file has the intended workers listed.

TD

On Mon, Apr 6, 2015 at 9:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

Interesting, I see 0 cores in the UI? Cores: 0 Total, 0 Used

On Fri, Apr 3, 2015 at 2:55 PM, Tathagata Das t...@databricks.com wrote:

What does the Spark Standalone UI at port 8080 say about the number of cores?

On Fri, Apr 3, 2015 at 2:53 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

[ec2-user@ip-10-241-251-232 s_lib]$ cat /proc/cpuinfo | grep process
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7

On Fri, Apr 3, 2015 at 2:33 PM, Tathagata Das t...@databricks.com wrote:

How many cores are present in the workers allocated to the standalone cluster spark://ip-10-241-251-232:7077?

On Fri, Apr 3, 2015 at 2:18 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

If I use local[2] instead of URL: spark://ip-10-241-251-232:7077 this seems to work. I don't understand why though, because when I give spark://ip-10-241-251-232:7077 the application seems to bootstrap successfully, it just doesn't create a socket on the port?

On Fri, Mar 27, 2015 at 10:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
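The fix above hinges on conf/slaves listing one worker hostname per line, with blank lines and #-comment lines ignored. As a rough sketch of that file format (a hypothetical parser, not Spark's actual loader):

```java
import java.util.ArrayList;
import java.util.List;

public class SlavesFile {
    // Parse the contents of a conf/slaves-style file: one worker hostname per line,
    // ignoring blank lines and lines starting with '#'.
    public static List<String> workers(String contents) {
        List<String> hosts = new ArrayList<>();
        for (String line : contents.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                hosts.add(trimmed);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        String example = "# A Spark Worker will be started on each of these hosts\n"
                       + "ip-10-241-251-232\n"
                       + "\n";
        System.out.println(workers(example)); // [ip-10-241-251-232]
    }
}
```

An empty result here corresponds to the symptom in the thread: no workers launched, so 0 total cores at the master.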
Re: WordCount example
Interesting, I see 0 cores in the UI? Cores: 0 Total, 0 Used

On Fri, Apr 3, 2015 at 2:55 PM, Tathagata Das t...@databricks.com wrote: [...]
Re: WordCount example
If I use local[2] instead of URL: spark://ip-10-241-251-232:7077 this seems to work. I don't understand why though, because when I give spark://ip-10-241-251-232:7077 the application seems to bootstrap successfully, it just doesn't create a socket on the port?

On Fri, Mar 27, 2015 at 10:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

I checked the ports using netstat and don't see any connections established on that port. Logs show only this:

15/03/27 13:50:48 INFO Master: Registering app NetworkWordCount
15/03/27 13:50:48 INFO Master: Registered app NetworkWordCount with ID app-20150327135048-0002

Spark UI shows:

Running Applications
  ID: app-20150327135048-0002 (http://54.69.225.94:8080/app?appId=app-20150327135048-0002)
  Name: NetworkWordCount (http://ip-10-241-251-232.us-west-2.compute.internal:4040/)
  Cores: 0  Memory per Node: 512.0 MB  Submitted Time: 2015/03/27 13:50:48  User: ec2-user  State: WAITING  Duration: 33 s

The code looks like it is being executed: java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077

    public static void doWork(String masterUrl) {
        SparkConf conf = new SparkConf().setMaster(masterUrl).setAppName("NetworkWordCount");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", );
        System.out.println("Successfully created connection");
        mapAndReduce(lines);
        jssc.start();            // Start the computation
        jssc.awaitTermination(); // Wait for the computation to terminate
    }

    public static void main(String... args) {
        doWork(args[0]);
    }

And the output of the java program after submitting the task: java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/27 13:50:46 INFO SecurityManager: Changing view acls to: ec2-user
15/03/27 13:50:46 INFO SecurityManager: Changing modify acls to: ec2-user
15/03/27 13:50:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ec2-user); users with modify permissions: Set(ec2-user)
15/03/27 13:50:46 INFO Slf4jLogger: Slf4jLogger started
15/03/27 13:50:46 INFO Remoting: Starting remoting
15/03/27 13:50:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkdri...@ip-10-241-251-232.us-west-2.compute.internal:60184]
15/03/27 13:50:47 INFO Utils: Successfully started service 'sparkDriver' on port 60184.
15/03/27 13:50:47 INFO SparkEnv: Registering MapOutputTracker
15/03/27 13:50:47 INFO SparkEnv: Registering BlockManagerMaster
15/03/27 13:50:47 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150327135047-5399
15/03/27 13:50:47 INFO MemoryStore: MemoryStore started with capacity 3.5 GB
15/03/27 13:50:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/27 13:50:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7e26df49-1520-4c77-b411-c837da59fa5b
15/03/27 13:50:47 INFO HttpServer: Starting HTTP Server
15/03/27 13:50:47 INFO Utils: Successfully started service 'HTTP file server' on port 57955.
15/03/27 13:50:47 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/27 13:50:47 INFO SparkUI: Started SparkUI at http://ip-10-241-251-232.us-west-2.compute.internal:4040
15/03/27 13:50:47 INFO AppClient$ClientActor: Connecting to master spark://ip-10-241-251-232:7077...
15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150327135048-0002
15/03/27 13:50:48 INFO NettyBlockTransferService: Server created on 58358
15/03/27 13:50:48 INFO BlockManagerMaster: Trying to register BlockManager
15/03/27 13:50:48 INFO BlockManagerMasterActor: Registering block manager ip-10-241-251-232.us-west-2.compute.internal:58358 with 3.5 GB RAM, BlockManagerId(driver, ip-10-241-251-232.us-west-2.compute.internal, 58358)
15/03/27 13:50:48 INFO BlockManagerMaster: Registered BlockManager
15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/27 13:50:48 INFO ReceiverTracker: ReceiverTracker started
15/03/27 13:50:48 INFO ForEachDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO ShuffledDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO MappedDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO FlatMappedDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO SocketInputDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO SocketInputDStream: Slide time = 1000 ms
15/03/27 13:50:48 INFO SocketInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/03/27 13:50:48 INFO SocketInputDStream: Checkpoint interval = null
15/03/27 13:50:48 INFO SocketInputDStream: Remember duration = 1000 ms
15/03/27 13:50:48
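The NetworkWordCount-style examples expect a data server already running at the given host and port (the usual suggestion is `nc -lk <port>`). For local testing, a minimal plain-Java stand-in for that server could look like this; it is a hypothetical helper, not part of the Spark example, and the port in main is illustrative.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class LineServer {
    // Accept one client on the given server socket and send it the given lines,
    // one per line, roughly mimicking `nc -lk <port>` fed from a file.
    public static void serveOnce(ServerSocket server, String[] lines) throws IOException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9999)) { // port is illustrative
            serveOnce(server, new String[] {"hello world", "hello spark"});
        }
    }
}
```

Pointing the word-count app at this server on localhost answers TD's basic question directly: if words start showing up, the receiver is connected and consuming.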
Re: WordCount example
What does the Spark Standalone UI at port 8080 say about number of cores? On Fri, Apr 3, 2015 at 2:53 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [ec2-user@ip-10-241-251-232 s_lib]$ cat /proc/cpuinfo |grep process processor : 0 processor : 1 processor : 2 processor : 3 processor : 4 processor : 5 processor : 6 processor : 7 On Fri, Apr 3, 2015 at 2:33 PM, Tathagata Das t...@databricks.com wrote: How many cores are present in the works allocated to the standalone cluster spark://ip-10-241-251-232:7077 ? On Fri, Apr 3, 2015 at 2:18 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use local[2] instead of *URL:* spark://ip-10-241-251-232:7077 this seems to work. I don't understand why though because when I give spark://ip-10-241-251-232:7077 application seem to bootstrap successfully, just doesn't create a socket on port ? On Fri, Mar 27, 2015 at 10:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I checked the ports using netstat and don't see any connections established on that port. 
Logs show only this:

15/03/27 13:50:48 INFO Master: Registering app NetworkWordCount
15/03/27 13:50:48 INFO Master: Registered app NetworkWordCount with ID app-20150327135048-0002

Spark UI shows:

Running Applications
ID: app-20150327135048-0002 (http://54.69.225.94:8080/app?appId=app-20150327135048-0002)
Name: NetworkWordCount (http://ip-10-241-251-232.us-west-2.compute.internal:4040/)
Cores: 0
Memory per Node: 512.0 MB
Submitted Time: 2015/03/27 13:50:48
User: ec2-user
State: WAITING
Duration: 33 s

Code looks like it is being executed:

java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077

public static void doWork(String masterUrl) {
    SparkConf conf = new SparkConf().setMaster(masterUrl).setAppName("NetworkWordCount");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
    JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", );
    System.out.println("Successfully created connection");
    mapAndReduce(lines);
    jssc.start();            // Start the computation
    jssc.awaitTermination(); // Wait for the computation to terminate
}

public static void main(String... args) {
    doWork(args[0]);
}

And output of the java program after submitting the task:

java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/27 13:50:46 INFO SecurityManager: Changing view acls to: ec2-user
15/03/27 13:50:46 INFO SecurityManager: Changing modify acls to: ec2-user
15/03/27 13:50:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ec2-user); users with modify permissions: Set(ec2-user)
15/03/27 13:50:46 INFO Slf4jLogger: Slf4jLogger started
15/03/27 13:50:46 INFO Remoting: Starting remoting
15/03/27 13:50:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkdri...@ip-10-241-251-232.us-west-2.compute.internal:60184]
15/03/27 13:50:47 INFO Utils: Successfully started service 'sparkDriver' on port 60184.
15/03/27 13:50:47 INFO SparkEnv: Registering MapOutputTracker
15/03/27 13:50:47 INFO SparkEnv: Registering BlockManagerMaster
15/03/27 13:50:47 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150327135047-5399
15/03/27 13:50:47 INFO MemoryStore: MemoryStore started with capacity 3.5 GB
15/03/27 13:50:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/27 13:50:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7e26df49-1520-4c77-b411-c837da59fa5b
15/03/27 13:50:47 INFO HttpServer: Starting HTTP Server
15/03/27 13:50:47 INFO Utils: Successfully started service 'HTTP file server' on port 57955.
15/03/27 13:50:47 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/27 13:50:47 INFO SparkUI: Started SparkUI at http://ip-10-241-251-232.us-west-2.compute.internal:4040
15/03/27 13:50:47 INFO AppClient$ClientActor: Connecting to master spark://ip-10-241-251-232:7077...
15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150327135048-0002
15/03/27 13:50:48 INFO NettyBlockTransferService: Server created on 58358
15/03/27 13:50:48 INFO BlockManagerMaster: Trying to register BlockManager
15/03/27 13:50:48 INFO BlockManagerMasterActor: Registering block manager ip-10-241-251-232.us-west-2.compute.internal:58358 with 3.5 GB RAM, BlockManagerId(driver, ip-10-241-251-232.us-west-2.compute.internal, 58358)
15/03/27 13:50:48 INFO BlockManagerMaster: Registered BlockManager
15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/27 13:50:48 INFO ReceiverTracker: ReceiverTracker started
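The mapAndReduce(lines) helper referenced in the code above is never shown in the thread. As a rough sketch of what such a step presumably computes per batch (plain Java, no Spark; the class and method names here are ours, purely for illustration), it splits each line into words and counts occurrences per word:

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical plain-Java analogue of the per-batch word count that a
// mapAndReduce(lines) step would perform: flatMap lines to words, then
// reduce to per-word counts.
public class BatchWordCount {
    public static Map<String, Long> count(List<String> batch) {
        return batch.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))  // flatMap: line -> words
                .filter(w -> !w.isEmpty())                        // drop empty tokens
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello world", "hello spark")));
    }
}
```

In the streaming job the same map/reduce runs once per 1-second batch interval; here it runs over one in-memory list.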
Re: WordCount example
[ec2-user@ip-10-241-251-232 s_lib]$ cat /proc/cpuinfo |grep process
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7

On Fri, Apr 3, 2015 at 2:33 PM, Tathagata Das t...@databricks.com wrote: How many cores are present in the workers allocated to the standalone cluster spark://ip-10-241-251-232:7077?

On Fri, Apr 3, 2015 at 2:18 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use local[2] instead of URL: spark://ip-10-241-251-232:7077 this seems to work. I don't understand why though because when I give spark://ip-10-241-251-232:7077 the application seems to bootstrap successfully, just doesn't create a socket on port ?

On Fri, Mar 27, 2015 at 10:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I checked the ports using netstat and don't see any connections established on that port.
Re: WordCount example
How many cores are present in the workers allocated to the standalone cluster spark://ip-10-241-251-232:7077?

On Fri, Apr 3, 2015 at 2:18 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use local[2] instead of URL: spark://ip-10-241-251-232:7077 this seems to work. I don't understand why though because when I give spark://ip-10-241-251-232:7077 the application seems to bootstrap successfully, just doesn't create a socket on port ?

On Fri, Mar 27, 2015 at 10:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I checked the ports using netstat and don't see any connections established on that port.
Re: WordCount example
I tried to file a bug in the git repo, however I don't see a link to open issues.

On Fri, Mar 27, 2015 at 10:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I checked the ports using netstat and don't see any connections established on that port.
Re: WordCount example
I checked the ports using netstat and don't see any connections established on that port. Logs show only this:

15/03/27 13:50:48 INFO Master: Registering app NetworkWordCount
15/03/27 13:50:48 INFO Master: Registered app NetworkWordCount with ID app-20150327135048-0002
Re: WordCount example
What's the best way to troubleshoot inside Spark to see why it is not connecting to nc on port ? I don't see any errors either.

On Thu, Mar 26, 2015 at 2:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to run the word count example but for some reason it's not working as expected. I start the nc server on port and then submit the spark job to the cluster. The Spark job gets submitted successfully but I never see any connection from Spark getting established. I also tried to type words on the console where nc is listening and waiting at the prompt, however I don't see any output. I also don't see any errors.
WordCount example
I am trying to run the word count example but for some reason it's not working as expected. I start the nc server on port and then submit the spark job to the cluster. The Spark job gets submitted successfully, but I never see any connection from Spark getting established. I also tried typing words on the console where nc is listening and waiting at the prompt, however I don't see any output. I also don't see any errors. Here is the conf:

SparkConf conf = new SparkConf().setMaster(masterUrl).setAppName("NetworkWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", );
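One way to isolate this kind of problem is to reproduce the same socket handshake with plain java.net, taking Spark out of the picture entirely. Below is a minimal sketch under our own naming (the class, method, and the ephemeral port are ours, not from the thread, whose port number is elided): a tiny line server plays the role of nc, and a client thread connects and reads one line the way the socket receiver would.

```java
import java.io.*;
import java.net.*;

// Hypothetical sanity check for the nc <-> receiver socket path, no Spark
// involved: start a line server (the "nc -lk" side), connect a reader to it
// (the receiver side), send one line across, and return what was read.
public class SocketSanityCheck {

    public static String exchange(String line) {
        try (ServerSocket server = new ServerSocket(0)) {   // 0 = ephemeral port
            int port = server.getLocalPort();
            final String[] received = new String[1];

            // "Receiver" thread: connect and read one line.
            Thread receiver = new Thread(() -> {
                try (Socket s = new Socket("localhost", port);
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()))) {
                    received[0] = in.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            receiver.start();

            // "nc" side: accept the connection and write one line.
            try (Socket client = server.accept();
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                out.println(line);
            }
            receiver.join();
            return received[0];
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("received: " + exchange("hello world"));
    }
}
```

If this round-trip works on the machine in question but the Spark receiver still never connects, the problem is on the Spark side (e.g. no executor core free to run the receiver) rather than in the network path.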
Re: WordCount example
Hi,

Did you run the word count example in Spark local mode or another mode? In local mode you have to set local[n], where n >= 2. For other modes, make sure the number of available cores is larger than 1, because the receiver inside Spark Streaming runs as a long-running task, which will occupy at least one core. Besides that, use lsof -p <pid> or netstat to make sure the Spark executor backend is connected to the nc process. Also grep the executor's log for lines like "Connecting to host port" and "Connected to host port", which show that the receiver is correctly connected to the nc process.

Thanks
Jerry

2015-03-27 8:45 GMT+08:00 Mohit Anchlia mohitanch...@gmail.com: What's the best way to troubleshoot inside spark to see why Spark is not connecting to nc on port ? I don't see any errors either.

On Thu, Mar 26, 2015 at 2:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to run the word count example but for some reason it's not working as expected. I start nc server on port and then submit the spark job to the cluster. Spark job gets submitted successfully but I never see any connection from spark getting established. I also tried to type words on the console where nc is listening and waiting on the prompt, however I don't see any output. I also don't see any errors.
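Jerry's netstat/lsof check can also be done programmatically. The sketch below (class and method names are ours, not a Spark API) simply attempts a TCP connect with a timeout, which is enough to tell whether anything is listening on the host/port the receiver will dial:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.*;

// Hypothetical port probe mirroring the netstat/lsof suggestion: returns
// true if a process is accepting connections on host:port.
public class PortProbe {
    static ServerSocket demoListener;   // kept open so the demo port stays live

    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;    // connection accepted: something is listening
        } catch (IOException e) {
            return false;   // refused or timed out: nothing listening
        }
    }

    // Opens an ephemeral listener for demonstration and returns its port.
    public static int startDemoListener() {
        try {
            demoListener = new ServerSocket(0);
            return demoListener.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = startDemoListener();
        System.out.println("demo port listening: " + isListening("localhost", port, 500));
    }
}
```

Run this (or plain `nc -vz host port` / `telnet host port`) from the executor machine, not the driver, since that is where the receiver actually runs.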
Re: network wordcount example
Not sure what data you are sending in. You could try calling lines.print() instead, which should just output everything that comes in on the stream, to test that your socket is receiving what you think you are sending.

On Mon, Mar 31, 2014 at 12:18 PM, eric perler ericper...@hotmail.com wrote:

Hello

I just started working with spark today... and I am trying to run the wordcount network example. I created a socket server and client, and I am sending data to the server in an infinite loop. When I run the spark class, I see this output in the console:

---
Time: 1396281891000 ms
---
14/03/31 11:04:51 INFO SparkContext: Job finished: take at DStream.scala:586, took 0.056794606 s
14/03/31 11:04:51 INFO JobScheduler: Finished job streaming job 1396281891000 ms.0 from job set of time 1396281891000 ms
14/03/31 11:04:51 INFO JobScheduler: Total delay: 0.101 s for time 1396281891000 ms (execution: 0.058 s)
14/03/31 11:04:51 INFO TaskSchedulerImpl: Remove TaskSet 3.0 from pool

But I don't see any output from the wordcount operation when I make this call: wordCounts.print();

Any help is greatly appreciated. Thanks in advance.
Re: network wordcount example
@eric - I saw this exact issue recently while working on the KinesisWordCount. Are you passing local[2] to your example as the MASTER arg, versus just local or local[1]? You need at least 2. It's documented as "n > 1" in the scala source docs, which is easy to mistake for n = 1. I just ran the NetworkWordCount sample and confirmed that local[1] does not work, but local[2] does work. Give that a whirl.

-chris

On Mon, Mar 31, 2014 at 10:41 AM, Diana Carroll dcarr...@cloudera.com wrote: Not sure what data you are sending in. You could try calling lines.print() instead, which should just output everything that comes in on the stream, to test that your socket is receiving what you think you are sending.
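The "at least 2" requirement can be illustrated without Spark at all. In the sketch below (plain java.util.concurrent, all names ours), the long-running "receiver" task permanently occupies one worker thread, so with a single-thread pool (the local[1] situation) the "batch" task is queued forever, while a two-thread pool (local[2]) lets both run:

```java
import java.util.concurrent.*;

// Hypothetical illustration of why local[1] starves a streaming job: the
// receiver is one never-ending task, so a lone worker thread never gets
// around to the batch computation.
public class CoreStarvationDemo {

    // Returns true if the "batch" task ran within the timeout when scheduled
    // alongside a long-running "receiver" task on a pool of `threads` workers.
    public static boolean batchRuns(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(1);
        pool.submit(() -> {                        // the "receiver": never finishes
            try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
        });
        pool.submit(done::countDown);              // the "batch" computation
        try {
            return done.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();                    // interrupt the sleeping "receiver"
        }
    }

    public static void main(String[] args) {
        System.out.println("local[1]-like pool: batch runs? " + batchRuns(1));
        System.out.println("local[2]-like pool: batch runs? " + batchRuns(2));
    }
}
```

This is only an analogy: Spark schedules tasks on cores rather than raw threads, but the starvation mechanism is the same.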