Re: Streaming of WordCount example

2015-08-10 Thread Mohit Anchlia
I am running in yarn-client mode, which probably means that the program that
submitted the job is also where the listening occurs? I thought that YARN is
only used to negotiate resources in yarn-client mode.

On Mon, Aug 10, 2015 at 11:34 AM, Tathagata Das t...@databricks.com wrote:

 If you are running on a cluster, the listening is occurring on one of the
 executors, not in the driver.

 On Mon, Aug 10, 2015 at 10:29 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

 I am trying to run this program as a yarn-client. The job seems to submit
 successfully; however, I don't see any process listening on this host on the
 port.


 https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaRecoverableNetworkWordCount.java
 Active Jobs (2)
 Job Id  Description                                             Submitted            Duration  Stages: Succeeded/Total  Tasks (for all stages): Succeeded/Total
 1       foreachRDD at JavaRecoverableNetworkWordCount.java:112  2015/08/10 13:27:36  51 s      0/2                      0/2
 0       start at JavaRecoverableNetworkWordCount.java:152       2015/08/10 13:27:35  51 s      0/2                      0/70

 Job links:
 http://ec2-52-25-118-171.us-west-2.compute.amazonaws.com:8088/proxy/application_1438820875993_0007/jobs/job?id=1
 http://ec2-52-25-118-171.us-west-2.compute.amazonaws.com:8088/proxy/application_1438820875993_0007/jobs/job?id=0





Re: Streaming of WordCount example

2015-08-10 Thread Tathagata Das
If you are running on a cluster, the listening is occurring on one of the
executors, not in the driver.




Re: Streaming of WordCount example

2015-08-10 Thread Tathagata Das
In yarn-client mode, the driver runs on the machine where you ran
spark-submit. The executors run on the YARN cluster nodes, and the socket
receiver listening on the port is running in one of the executors.




Re: Streaming of WordCount example

2015-08-10 Thread Tathagata Das
Is it receiving any data? If so, then the receiver must be listening.
Alternatively, to test these theories, you can run a Spark standalone cluster
locally (a one-node standalone cluster on your local machine), submit your
app in client mode on that, and see whether the process is listening on the
port or not.
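A quick way to check, from the driver machine or any executor host, whether anything is accepting connections on the receiver port is a plain TCP probe. This is a minimal sketch, not part of the original thread; the host names and port in the usage comment are placeholders to fill in:

```python
import socket

def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical usage: probe every executor host for the receiver port.
# for h in ["executor-host-1", "executor-host-2"]:
#     print(h, is_listening(h, port))
```

If the probe fails on every executor host, no receiver socket exists anywhere, which points at the receiver never having been scheduled rather than at a networking problem.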

On Mon, Aug 10, 2015 at 12:43 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:

 I've verified all the executors and I don't see a process listening on the
 port. However, the application seems to show as running in the YARN UI.





Re: Streaming of WordCount example

2015-08-10 Thread Mohit Anchlia
I am using the same exact code:

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaRecoverableNetworkWordCount.java

Submitting like this:

yarn:

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/bin/spark-submit --class
org.sony.spark.stream.test.JavaRecoverableNetworkWordCount  --master
yarn-client --total-executor-cores 3
./spark-streaming-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar  localhost
 /user/ec2-user/checkpoint/ /user/ec2-user/out

local:

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/bin/spark-submit --class
org.sony.spark.stream.test.JavaRecoverableNetworkWordCount  --master
spark://localhost:9966 --total-executor-cores 3
./spark-streaming-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar  localhost
 /user/ec2-user/checkpoint/ /user/ec2-user/out

Even though I am running it as local, I see it being scheduled and managed by
YARN.







Re: Streaming of WordCount example

2015-08-10 Thread Mohit Anchlia
I do see this message:

15/08/10 19:19:12 WARN YarnScheduler: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient resources




Re: Streaming of WordCount example

2015-08-10 Thread Tathagata Das
1. When you are running locally, make sure the master in the SparkConf
reflects that and is not somehow set to yarn-client.

2. You may not be getting any resources from YARN at all, so there are no
executors and therefore no receiver running. That is why I asked the most
basic question: is it receiving data? If it is indeed receiving data, that
will eliminate a lot of uncertainties.

TD
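To check point 2 concretely, the YARN ResourceManager exposes cluster metrics over its REST API on its web port (8088, the same host/port as the proxy URLs earlier in this thread). A minimal sketch, with the host name as a placeholder and the resource-threshold logic as my own assumption rather than anything from the thread:

```python
import json
from urllib.request import urlopen

def cluster_has_resources(metrics):
    """Given the 'clusterMetrics' object from the ResourceManager REST API
    (/ws/v1/cluster/metrics), report whether YARN has live NodeManagers
    and spare memory/cores to grant to an application."""
    return (metrics.get("activeNodes", 0) > 0
            and metrics.get("availableMB", 0) > 0
            and metrics.get("availableVirtualCores", 0) > 0)

def fetch_metrics(rm_host):
    # rm_host is a placeholder, e.g. the ResourceManager's hostname.
    with urlopen("http://%s:8088/ws/v1/cluster/metrics" % rm_host) as resp:
        return json.load(resp)["clusterMetrics"]
```

If `cluster_has_resources(fetch_metrics(...))` comes back False, the "Initial job has not accepted any resources" warning is explained: no executors can start, so no receiver and no listening socket.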




Re: WordCount example

2015-04-06 Thread Tathagata Das
There are no workers registered with the Spark Standalone master! That is
the crux of the problem. :)
Follow the instructions properly -
https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
In particular, make sure the conf/slaves file lists the intended workers.

TD
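One way to confirm whether any workers registered is to query the standalone master's status JSON endpoint (the web UI host at port 8080, path /json) and count the workers it reports. A hedged sketch; the endpoint shape is assumed from the standalone master web UI, and the master host is a placeholder:

```python
import json
from urllib.request import urlopen

def registered_workers(master_host):
    """Fetch the standalone master's status JSON and return its worker list.
    An empty list matches the '0 cores, no workers' symptom in this thread."""
    with urlopen("http://%s:8080/json" % master_host) as resp:
        status = json.load(resp)
    return status.get("workers", [])

def total_cores(workers):
    # Each worker entry is assumed to carry a 'cores' field.
    return sum(w.get("cores", 0) for w in workers)
```

A total of zero cores means the app will sit in WAITING forever: the receiver task can never be scheduled, so nothing ever listens on the socket port.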

 

Re: WordCount example

2015-04-06 Thread Mohit Anchlia
Interesting, I see 0 cores in the UI?


   - Cores: 0 Total, 0 Used



Re: WordCount example

2015-04-03 Thread Mohit Anchlia
If I use local[2] instead of URL: spark://ip-10-241-251-232:7077, this seems
to work. I don't understand why, though, because when I give
spark://ip-10-241-251-232:7077 the application seems to bootstrap
successfully; it just doesn't create a socket on the port?


On Fri, Mar 27, 2015 at 10:55 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:

 I checked the ports using netstat and don't see any connections
 established on that port. Logs show only this:

 15/03/27 13:50:48 INFO Master: Registering app NetworkWordCount
 15/03/27 13:50:48 INFO Master: Registered app NetworkWordCount with ID
 app-20150327135048-0002

 Spark UI shows:

 Running Applications
 ID                       Name              Cores  Memory per Node  Submitted Time       User      State    Duration
 app-20150327135048-0002  NetworkWordCount  0      512.0 MB         2015/03/27 13:50:48  ec2-user  WAITING  33 s

 App page: http://54.69.225.94:8080/app?appId=app-20150327135048-0002
 Application UI: http://ip-10-241-251-232.us-west-2.compute.internal:4040
 The code that is being executed:

 java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077

 public static void doWork(String masterUrl) {

     SparkConf conf = new SparkConf().setMaster(masterUrl).setAppName("NetworkWordCount");

     JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

     JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", );

     System.out.println("Successfully created connection");

     mapAndReduce(lines);

     jssc.start();            // Start the computation

     jssc.awaitTermination(); // Wait for the computation to terminate

 }

 public static void main(String... args) {

     doWork(args[0]);

 }
 And output of the java program after submitting the task:

 java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077
 Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties
 15/03/27 13:50:46 INFO SecurityManager: Changing view acls to: ec2-user
 15/03/27 13:50:46 INFO SecurityManager: Changing modify acls to: ec2-user
 15/03/27 13:50:46 INFO SecurityManager: SecurityManager: authentication
 disabled; ui acls disabled; users with view permissions: Set(ec2-user);
 users with modify permissions: Set(ec2-user)
 15/03/27 13:50:46 INFO Slf4jLogger: Slf4jLogger started
 15/03/27 13:50:46 INFO Remoting: Starting remoting
 15/03/27 13:50:47 INFO Remoting: Remoting started; listening on addresses
 :[akka.tcp://sparkdri...@ip-10-241-251-232.us-west-2.compute.internal
 :60184]
 15/03/27 13:50:47 INFO Utils: Successfully started service 'sparkDriver'
 on port 60184.
 15/03/27 13:50:47 INFO SparkEnv: Registering MapOutputTracker
 15/03/27 13:50:47 INFO SparkEnv: Registering BlockManagerMaster
 15/03/27 13:50:47 INFO DiskBlockManager: Created local directory at
 /tmp/spark-local-20150327135047-5399
 15/03/27 13:50:47 INFO MemoryStore: MemoryStore started with capacity 3.5
 GB
 15/03/27 13:50:47 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 15/03/27 13:50:47 INFO HttpFileServer: HTTP File server directory is
 /tmp/spark-7e26df49-1520-4c77-b411-c837da59fa5b
 15/03/27 13:50:47 INFO HttpServer: Starting HTTP Server
 15/03/27 13:50:47 INFO Utils: Successfully started service 'HTTP file
 server' on port 57955.
 15/03/27 13:50:47 INFO Utils: Successfully started service 'SparkUI' on
 port 4040.
 15/03/27 13:50:47 INFO SparkUI: Started SparkUI at
 http://ip-10-241-251-232.us-west-2.compute.internal:4040
 15/03/27 13:50:47 INFO AppClient$ClientActor: Connecting to master
 spark://ip-10-241-251-232:7077...
 15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: Connected to Spark
 cluster with app ID app-20150327135048-0002
 15/03/27 13:50:48 INFO NettyBlockTransferService: Server created on 58358
 15/03/27 13:50:48 INFO BlockManagerMaster: Trying to register BlockManager
 15/03/27 13:50:48 INFO BlockManagerMasterActor: Registering block manager
 ip-10-241-251-232.us-west-2.compute.internal:58358 with 3.5 GB RAM,
 BlockManagerId(driver, ip-10-241-251-232.us-west-2.compute.internal,
 58358)
 15/03/27 13:50:48 INFO BlockManagerMaster: Registered BlockManager
 15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is
 ready for scheduling beginning after reached minRegisteredResourcesRatio:
 0.0
 15/03/27 13:50:48 INFO ReceiverTracker: ReceiverTracker started
 15/03/27 13:50:48 INFO ForEachDStream: metadataCleanupDelay = -1
 15/03/27 13:50:48 INFO ShuffledDStream: metadataCleanupDelay = -1
 15/03/27 13:50:48 INFO MappedDStream: metadataCleanupDelay = -1
 15/03/27 13:50:48 INFO FlatMappedDStream: metadataCleanupDelay = -1
 15/03/27 13:50:48 INFO SocketInputDStream: metadataCleanupDelay = -1
 15/03/27 13:50:48 INFO SocketInputDStream: Slide time = 1000 ms
 15/03/27 13:50:48 INFO SocketInputDStream: Storage level =
 StorageLevel(false, false, false, false, 1)
 15/03/27 13:50:48 INFO SocketInputDStream: Checkpoint interval = null
 15/03/27 13:50:48 INFO SocketInputDStream: Remember duration = 1000 ms
 15/03/27 13:50:48 

Re: WordCount example

2015-04-03 Thread Tathagata Das
What does the Spark Standalone UI at port 8080 say about number of cores?

On Fri, Apr 3, 2015 at 2:53 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:

 [ec2-user@ip-10-241-251-232 s_lib]$ cat /proc/cpuinfo |grep process
 processor   : 0
 processor   : 1
 processor   : 2
 processor   : 3
 processor   : 4
 processor   : 5
 processor   : 6
 processor   : 7

 On Fri, Apr 3, 2015 at 2:33 PM, Tathagata Das t...@databricks.com wrote:

 How many cores are present in the workers allocated to the standalone
 cluster spark://ip-10-241-251-232:7077 ?


 On Fri, Apr 3, 2015 at 2:18 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

 If I use local[2] instead of the URL spark://ip-10-241-251-232:7077, this
 seems to work. I don't understand why, though, because when I
 give spark://ip-10-241-251-232:7077 the application seems to bootstrap
 successfully; it just doesn't create a socket on the port?



Re: WordCount example

2015-04-03 Thread Mohit Anchlia
[ec2-user@ip-10-241-251-232 s_lib]$ cat /proc/cpuinfo |grep process
processor   : 0
processor   : 1
processor   : 2
processor   : 3
processor   : 4
processor   : 5
processor   : 6
processor   : 7

On Fri, Apr 3, 2015 at 2:33 PM, Tathagata Das t...@databricks.com wrote:

 How many cores are present in the workers allocated to the standalone
 cluster spark://ip-10-241-251-232:7077 ?


 On Fri, Apr 3, 2015 at 2:18 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

 If I use local[2] instead of the URL spark://ip-10-241-251-232:7077, this
 seems to work. I don't understand why, though, because when I
 give spark://ip-10-241-251-232:7077 the application seems to bootstrap
 successfully; it just doesn't create a socket on the port?



Re: WordCount example

2015-04-03 Thread Tathagata Das
How many cores are present in the workers allocated to the standalone cluster
spark://ip-10-241-251-232:7077 ?


On Fri, Apr 3, 2015 at 2:18 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:

 If I use local[2] instead of the URL spark://ip-10-241-251-232:7077, this
 seems to work. I don't understand why, though, because when I
 give spark://ip-10-241-251-232:7077 the application seems to bootstrap
 successfully; it just doesn't create a socket on the port?



Re: WordCount example

2015-03-30 Thread Mohit Anchlia
I tried to file a bug in the git repo; however, I don't see a link for opening
issues.


Re: WordCount example

2015-03-27 Thread Mohit Anchlia
I checked the ports using netstat and don't see any connections established
on that port. Logs show only this:

15/03/27 13:50:48 INFO Master: Registering app NetworkWordCount
15/03/27 13:50:48 INFO Master: Registered app NetworkWordCount with ID
app-20150327135048-0002

The Spark UI shows:

Running Applications
  ID:              app-20150327135048-0002
                   (http://54.69.225.94:8080/app?appId=app-20150327135048-0002)
  Name:            NetworkWordCount
                   (http://ip-10-241-251-232.us-west-2.compute.internal:4040/)
  Cores:           0
  Memory per Node: 512.0 MB
  Submitted Time:  2015/03/27 13:50:48
  User:            ec2-user
  State:           WAITING
  Duration:        33 s
The code that is being executed looks like this:
java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077

public static void doWork(String masterUrl) {

    SparkConf conf = new SparkConf().setMaster(masterUrl).setAppName("NetworkWordCount");

    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

    JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", );

    System.out.println("Successfully created connection");

    mapAndReduce(lines);

    jssc.start();            // Start the computation

    jssc.awaitTermination(); // Wait for the computation to terminate

}

public static void main(String... args) {

    doWork(args[0]);

}
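The mapAndReduce helper is not shown anywhere in the thread. In the stock NetworkWordCount example the equivalent step is flatMap (split on spaces), then mapToPair(word -> (word, 1)), then reduceByKey(sum); assuming that is what it does, the per-batch logic can be sketched with plain JDK streams (no Spark dependency; the class and method names here are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountLogic {
    // What a single micro-batch of a network word count computes:
    // split each line into words, then count occurrences of each word.
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))   // the flatMap step
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(                    // the reduceByKey step
                        word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("to be or", "not to be")));
    }
}
```

Whatever mapAndReduce actually did, the issue in this thread is upstream of it: the receiver never connects, so no batch ever carries data into this logic.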
And the output of the Java program after submitting the task:

java -cp .:* org.spark.test.WordCount spark://ip-10-241-251-232:7077
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/03/27 13:50:46 INFO SecurityManager: Changing view acls to: ec2-user
15/03/27 13:50:46 INFO SecurityManager: Changing modify acls to: ec2-user
15/03/27 13:50:46 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(ec2-user);
users with modify permissions: Set(ec2-user)
15/03/27 13:50:46 INFO Slf4jLogger: Slf4jLogger started
15/03/27 13:50:46 INFO Remoting: Starting remoting
15/03/27 13:50:47 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkdri...@ip-10-241-251-232.us-west-2.compute.internal:60184]
15/03/27 13:50:47 INFO Utils: Successfully started service 'sparkDriver' on
port 60184.
15/03/27 13:50:47 INFO SparkEnv: Registering MapOutputTracker
15/03/27 13:50:47 INFO SparkEnv: Registering BlockManagerMaster
15/03/27 13:50:47 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20150327135047-5399
15/03/27 13:50:47 INFO MemoryStore: MemoryStore started with capacity 3.5 GB
15/03/27 13:50:47 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/03/27 13:50:47 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-7e26df49-1520-4c77-b411-c837da59fa5b
15/03/27 13:50:47 INFO HttpServer: Starting HTTP Server
15/03/27 13:50:47 INFO Utils: Successfully started service 'HTTP file
server' on port 57955.
15/03/27 13:50:47 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
15/03/27 13:50:47 INFO SparkUI: Started SparkUI at
http://ip-10-241-251-232.us-west-2.compute.internal:4040
15/03/27 13:50:47 INFO AppClient$ClientActor: Connecting to master
spark://ip-10-241-251-232:7077...
15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: Connected to Spark
cluster with app ID app-20150327135048-0002
15/03/27 13:50:48 INFO NettyBlockTransferService: Server created on 58358
15/03/27 13:50:48 INFO BlockManagerMaster: Trying to register BlockManager
15/03/27 13:50:48 INFO BlockManagerMasterActor: Registering block manager
ip-10-241-251-232.us-west-2.compute.internal:58358 with 3.5 GB RAM,
BlockManagerId(driver, ip-10-241-251-232.us-west-2.compute.internal,
58358)
15/03/27 13:50:48 INFO BlockManagerMaster: Registered BlockManager
15/03/27 13:50:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is
ready for scheduling beginning after reached minRegisteredResourcesRatio:
0.0
15/03/27 13:50:48 INFO ReceiverTracker: ReceiverTracker started
15/03/27 13:50:48 INFO ForEachDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO ShuffledDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO MappedDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO FlatMappedDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO SocketInputDStream: metadataCleanupDelay = -1
15/03/27 13:50:48 INFO SocketInputDStream: Slide time = 1000 ms
15/03/27 13:50:48 INFO SocketInputDStream: Storage level =
StorageLevel(false, false, false, false, 1)
15/03/27 13:50:48 INFO SocketInputDStream: Checkpoint interval = null
15/03/27 13:50:48 INFO SocketInputDStream: Remember duration = 1000 ms
15/03/27 13:50:48 INFO SocketInputDStream: Initialized and validated
org.apache.spark.streaming.dstream.SocketInputDStream@75efa13d
15/03/27 13:50:48 INFO FlatMappedDStream: Slide time = 1000 ms
15/03/27 13:50:48 INFO FlatMappedDStream: Storage level =
StorageLevel(false, false, false, false, 1)
15/03/27 13:50:48 INFO FlatMappedDStream: Checkpoint interval = null
15/03/27 13:50:48 INFO FlatMappedDStream: Remember duration = 1000 ms
15/03/27 

Re: WordCount example

2015-03-26 Thread Mohit Anchlia
What's the best way to troubleshoot inside Spark to see why Spark is not
connecting to nc on port ? I don't see any errors either.


WordCount example

2015-03-26 Thread Mohit Anchlia
I am trying to run the word count example, but for some reason it's not
working as expected. I start an nc server on port  and then submit the
Spark job to the cluster. The Spark job gets submitted successfully, but I
never see any connection from Spark being established. I also tried typing
words on the console where nc is listening and waiting at the prompt;
however, I don't see any output. I also don't see any errors.

Here is the conf:

SparkConf conf = new SparkConf().setMaster(masterUrl).setAppName("NetworkWordCount");

JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", );


Re: WordCount example

2015-03-26 Thread Saisai Shao
Hi,

Did you run the word count example in Spark local mode or another mode? In
local mode you have to set local[n], where n >= 2. For other modes, make sure
the number of available cores is larger than 1, because the receiver inside
Spark Streaming runs as a long-running task, which will occupy at least one
core.

Besides that, use lsof -p <pid> or netstat to make sure the Spark executor
backend is connected to the nc process. Also grep the executor's log to see
if there are lines like "Connecting to host port" and "Connected to host
port", which show that the receiver is correctly connected to the nc process.

Thanks
Jerry
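The receiver-pins-a-core behaviour Jerry describes can be reproduced without Spark at all. The sketch below (plain JDK, no Spark; the class and method names are made up for illustration) models the scheduler as a fixed-size thread pool: a never-ending "receiver" task occupies one slot, and three "micro-batch" tasks only complete if a second slot exists, which is the local[1] vs local[2] difference in miniature.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReceiverStarvation {
    // Returns how many of 3 "micro-batch" tasks completed while one slot
    // was permanently held by a long-running "receiver" task.
    static int runBatches(int slots) {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        CountDownLatch done = new CountDownLatch(3);
        // The "receiver": runs until interrupted, holding one slot forever.
        pool.submit(() -> {
            try {
                while (true) Thread.sleep(10);
            } catch (InterruptedException e) {
                // interrupted by shutdownNow(): stop
            }
        });
        // Three "micro-batch" jobs that need a free slot to run.
        for (int i = 0; i < 3; i++) {
            pool.submit(done::countDown);
        }
        try {
            done.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();
        return (int) (3 - done.getCount());
    }

    public static void main(String[] args) {
        System.out.println("1 slot  (local[1] analogue): " + runBatches(1) + " batches ran");
        System.out.println("2 slots (local[2] analogue): " + runBatches(2) + " batches ran");
    }
}
```

In the thread's terms: running the job with master local[2], or on a standalone cluster whose workers actually expose more than one core, gives the batches their second slot.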






Re: network wordcount example

2014-03-31 Thread Diana Carroll
Not sure what data you are sending in. You could try calling
lines.print() instead, which should just output everything that comes in
on the stream, just to test that your socket is receiving what you think
you are sending.
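That socket sanity check can also be done with the JDK alone, before Spark enters the picture. The sketch below (illustrative names, not from the thread) spins up a one-shot line server in the role of nc, connects a client the way socketTextStream would, and returns what actually arrived over the wire:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SocketSanityCheck {
    // Sends the given lines through a real TCP socket and returns what the
    // reading side received, one element per newline-terminated line.
    public static List<String> roundTrip(List<String> lines) {
        try (ServerSocket server = new ServerSocket(0)) {   // 0 = any free port
            Thread sender = new Thread(() -> {
                // Plays the role of "nc -lk <port>": write lines, then close.
                try (Socket s = server.accept();
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    lines.forEach(out::println);
                } catch (IOException ignored) {
                }
            });
            sender.start();
            List<String> received = new ArrayList<>();
            // Plays the role of socketTextStream(...): read line by line to EOF.
            try (Socket s = new Socket("localhost", server.getLocalPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    received.add(line);
                }
            }
            return received;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("hello", "world")));
    }
}
```

If the lines come back intact, the plumbing is fine and the problem is on the Spark side, typically the available-cores issue raised in the other replies.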


On Mon, Mar 31, 2014 at 12:18 PM, eric perler ericper...@hotmail.com wrote:

 Hello

 i just started working with spark today... and i am trying to run the
 wordcount network example

 i created a socket server and client.. and i am sending data to the server
 in an infinite loop

 when i run the spark class.. i see this output in the console...

 ---
 Time: 1396281891000 ms
 ---

 14/03/31 11:04:51 INFO SparkContext: Job finished: take at
 DStream.scala:586, took 0.056794606 s
 14/03/31 11:04:51 INFO JobScheduler: Finished job streaming job
 1396281891000 ms.0 from job set of time 1396281891000 ms
 14/03/31 11:04:51 INFO JobScheduler: Total delay: 0.101 s for time
 1396281891000 ms (execution: 0.058 s)
 14/03/31 11:04:51 INFO TaskSchedulerImpl: Remove TaskSet 3.0 from pool

 but i dont see any output from the wordcount operation when i make this
 call...

 wordCounts.print();

 any help is greatly appreciated

 thanks in advance



Re: network wordcount example

2014-03-31 Thread Chris Fregly
@eric-

i saw this exact issue recently while working on the KinesisWordCount.

are you passing local[2] to your example as the MASTER arg versus just
local or local[1]?

you need at least 2.  it's documented as n > 1 in the scala source docs,
which is easy to mistake for n = 1.

i just ran the NetworkWordCount sample and confirmed that local[1] does not
work, but  local[2] does work.

give that a whirl.

-chris



