Re: spark 1.6.0 on ec2 doesn't work

2016-01-19 Thread Calvin Jia
Hi Oleg,

The Tachyon-related issue should be fixed.

Hope this helps,
Calvin

On Mon, Jan 18, 2016 at 2:51 AM, Oleg Ruchovets 
wrote:

> Hi,
> I am trying to follow the Spark 1.6.0 documentation to install Spark on EC2.
>
> It doesn't work properly: I got exceptions, and in the end only a standalone
> Spark cluster was installed.
> Here is the log:
>
> Any suggestions?
>
> Thanks
> Oleg.
>
> oleg@robinhood:~/install/spark-1.6.0-bin-hadoop2.6/ec2$ ./spark-ec2 \
>   --key-pair=CC-ES-Demo \
>   --identity-file=/home/oleg/work/entity_extraction_framework/ec2_pem_key/CC-ES-Demo.pem \
>   --region=us-east-1 --zone=us-east-1a --spot-price=0.05 -s 5 \
>   --spark-version=1.6.0 launch entity-extraction-spark-cluster
> Setting up security groups...
> Searching for existing cluster entity-extraction-spark-cluster in region
> us-east-1...
> Spark AMI: ami-5bb18832
> Launching instances...
> Requesting 5 slaves as spot instances with price $0.050
> Waiting for spot instances to be granted...
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> 0 of 5 slaves granted, waiting longer
> All 5 slaves granted
> Launched master in us-east-1a, regid = r-9384033f
> Waiting for AWS to propagate instance metadata...
> Waiting for cluster to enter 'ssh-ready' state..
>
> Warning: SSH connection error. (This could be temporary.)
> Host: ec2-52-90-186-83.compute-1.amazonaws.com
> SSH return code: 255
> SSH output: ssh: connect to host ec2-52-90-186-83.compute-1.amazonaws.com
> port 22: Connection refused
>
> .
>
> Warning: SSH connection error. (This could be temporary.)
> Host: ec2-52-90-186-83.compute-1.amazonaws.com
> SSH return code: 255
> SSH output: ssh: connect to host ec2-52-90-186-83.compute-1.amazonaws.com
> port 22: Connection refused
>
> .
>
> Warning: SSH connection error. (This could be temporary.)
> Host: ec2-52-90-186-83.compute-1.amazonaws.com
> SSH return code: 255
> SSH output: ssh: connect to host ec2-52-90-186-83.compute-1.amazonaws.com
> port 22: Connection refused
>
> .
> Cluster is now in 'ssh-ready' state. Waited 442 seconds.
> Generating cluster's SSH key on master...
> Warning: Permanently added 
> 'ec2-52-90-186-83.compute-1.amazonaws.com,52.90.186.83'
> (ECDSA) to the list of known hosts.
> Connection to ec2-52-90-186-83.compute-1.amazonaws.com closed.
> Warning: Permanently added 
> 'ec2-52-90-186-83.compute-1.amazonaws.com,52.90.186.83'
> (ECDSA) to the list of known hosts.
> Transferring cluster's SSH key to slaves...
> ec2-54-165-243-74.compute-1.amazonaws.com
> Warning: Permanently added 
> 'ec2-54-165-243-74.compute-1.amazonaws.com,54.165.243.74'
> (ECDSA) to the list of known hosts.
> ec2-54-88-245-107.compute-1.amazonaws.com
> Warning: Permanently added 
> 'ec2-54-88-245-107.compute-1.amazonaws.com,54.88.245.107'
> (ECDSA) to the list of known hosts.
> ec2-54-172-29-47.compute-1.amazonaws.com
> Warning: Permanently added 
> 'ec2-54-172-29-47.compute-1.amazonaws.com,54.172.29.47'
> (ECDSA) to the list of known hosts.
> ec2-54-165-131-210.compute-1.amazonaws.com
> Warning: Permanently added 
> 'ec2-54-165-131-210.compute-1.amazonaws.com,54.165.131.210'
> (ECDSA) to the list of known hosts.
> ec2-54-172-46-184.compute-1.amazonaws.com
> Warning: Permanently added 
> 'ec2-54-172-46-184.compute-1.amazonaws.com,54.172.46.184'
> (ECDSA) to the list of known hosts.
> Cloning spark-ec2 scripts from
> https://github.com/amplab/spark-ec2/tree/branch-1.5 on master...
> Warning: Permanently added 
> 'ec2-52-90-186-83.compute-1.amazonaws.com,52.90.186.83'
> (ECDSA) to the list of known hosts.
> Cloning into 'spark-ec2'...
> remote: Counting objects: 2068, done.
> remote: Total 2068 (delta 0), reused 0 (delta 0), pack-reused 2068
> Receiving objects: 100% (2068/2068), 349.76 KiB, done.
> Resolving deltas: 100% (796/796), done.
> Connection to ec2-52-90-186-83.compute-1.amazonaws.com closed.
> Deploying files to master...
> Warning: Permanently added 
> 'ec2-52-90-186-83.compute-1.amazonaws.com,52.90.186.83'
> (ECDSA) to the list of known hosts.
> sending incremental file list
> root/spark-ec2/ec2-variables.sh
>
> sent 1,835 bytes  received 40 bytes  416.67 bytes/sec
> total size is 1,684  speedup is 0.90
> Running setup on master...
> Warning: Permanently added 
> 'ec2-52-90-186-83.compute-1.amazonaws.com,52.90.186.83'
> (ECDSA) to the list of known hosts.
> Connection to ec2-52-90-186-83.compute-1.amazonaws.com closed.
> Warning: Permanently added 
> 'ec2-52-90-186-83.compute-1.amazonaws.com,52.90.186.83'
> (ECDSA) to the list of known hosts.
> Setting up Spark on ip-172-31-24-124.ec2.internal...
> Setting executable permissions on scripts...
> RSYNC'ing /root/spark-ec2 to other cluster nodes...
> 

Re: spark 1.6.0 on ec2 doesn't work

2016-01-18 Thread Daniel Darabos
Hi,

How do you know it doesn't work? The log looks roughly normal to me. Is
Spark not running at the printed address? Can you not start jobs?

On Mon, Jan 18, 2016 at 11:51 AM, Oleg Ruchovets 
wrote:

> Hi,
> I am trying to follow the Spark 1.6.0 documentation to install Spark on EC2.
>
> It doesn't work properly: I got exceptions, and in the end only a standalone
> Spark cluster was installed.
>

The purpose of the script is to install a standalone Spark cluster. So
that's not an error :).


> Here is the log:
>
> Any suggestions?
>
> Thanks
> Oleg.
>

Re: spark 1.6.0 on ec2 doesn't work

2016-01-18 Thread Daniel Darabos
On Mon, Jan 18, 2016 at 5:24 PM, Oleg Ruchovets 
wrote:

> I thought the script also tries to install Hadoop/HDFS, and it looks like
> that failed. The installation is only standalone Spark without Hadoop. Is
> this the correct behaviour?
>

Yes, it also sets up two HDFS clusters. Are they not working? Try to see if
Spark is working by running some simple jobs on it. (See
http://spark.apache.org/docs/latest/ec2-scripts.html.)

There is no program called Hadoop. If you mean YARN, then indeed the script
does not set up YARN. It sets up standalone Spark.


> There are also errors in the log:
>ERROR: Unknown Tachyon version
>Error: Could not find or load main class crayondata.com.log
>

As long as Spark is working fine, you can ignore all output from the EC2
script :).


Re: spark 1.6.0 on ec2 doesn't work

2016-01-18 Thread Oleg Ruchovets
I thought the script also tries to install Hadoop/HDFS, and it looks like
that failed. The installation is only standalone Spark without Hadoop. Is
this the correct behaviour?
There are also errors in the log:
   ERROR: Unknown Tachyon version
   Error: Could not find or load main class crayondata.com.log

Thanks
Oleg.


Re: spark 1.6.0 on ec2 doesn't work

2016-01-18 Thread Oleg Ruchovets
It looks like Spark is not working properly:

I followed this link (http://spark.apache.org/docs/latest/ec2-scripts.html)
and I can see the spot instances on EC2.

From the Spark shell I count the lines of a file and get a connection exception:
scala> val lines = sc.textFile("README.md")
scala> lines.count()



scala> val lines = sc.textFile("README.md")

16/01/19 03:17:35 INFO storage.MemoryStore: Block broadcast_0 stored as
values in memory (estimated size 26.5 KB, free 26.5 KB)
16/01/19 03:17:35 INFO storage.MemoryStore: Block broadcast_0_piece0 stored
as bytes in memory (estimated size 5.6 KB, free 32.1 KB)
16/01/19 03:17:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0
in memory on 172.31.28.196:44028 (size: 5.6 KB, free: 511.5 MB)
16/01/19 03:17:35 INFO spark.SparkContext: Created broadcast 0 from
textFile at <console>:21
lines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile
at <console>:21

scala> lines.count()

16/01/19 03:17:55 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:56 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:57 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
2 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:58 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
3 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:59 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
4 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:00 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
5 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:01 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:02 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
7 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:03 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
8 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:04 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
9 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
java.lang.RuntimeException: java.net.ConnectException: Call to
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000 failed on
connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:567)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:291)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
at 

Re: spark 1.6.0 on ec2 doesn't work

2016-01-18 Thread Peter Zhang
Could you run spark-shell from the $SPARK_HOME directory?

You can try running the command from $SPARK_HOME, or point to README.md
with its full path.
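To illustrate why a bare "README.md" can turn into a connection attempt on port 9000, here is a simplified model (not Hadoop's actual code) of how a path with no scheme gets qualified against a cluster's default filesystem. The `hdfs://master:9000` default and the `/user/root` working directory are assumptions for illustration; the log only shows that the client tried port 9000 on the master. `PathQualification` and `qualify` are hypothetical names.

```scala
import java.net.URI

// Simplified model of path qualification: a path without a scheme is resolved
// against the default filesystem URI; a path with a scheme is left alone.
object PathQualification {
  def qualify(path: String, defaultFs: URI, workingDir: String): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri                 // e.g. file:///... or hdfs://...
    else if (path.startsWith("/")) defaultFs.resolve(path) // absolute path, no scheme
    else defaultFs.resolve(workingDir.stripSuffix("/") + "/" + path) // relative path
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical cluster settings (assumptions, not read from any real config).
    val defaultFs = new URI("hdfs://master:9000")
    val workingDir = "/user/root"
    println(qualify("README.md", defaultFs, workingDir))
    println(qualify("file:///root/spark/README.md", defaultFs, workingDir))
  }
}
```

Under this model, `README.md` becomes `hdfs://master:9000/user/root/README.md`, so reading it requires whatever serves port 9000, while a `file://` path bypasses HDFS entirely. In the shell that would mean something like `sc.textFile("file:///root/spark/README.md")`, assuming Spark is installed under /root/spark and the file exists at that path on every node (both are assumptions here).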


Peter Zhang
-- 
Google

On January 19, 2016 at 11:26:14, Oleg Ruchovets (oruchov...@gmail.com) wrote:


Re: spark 1.6.0 on ec2 doesn't work

2016-01-18 Thread Oleg Ruchovets
I am running from $SPARK_HOME.
It looks like a connection problem to port 9000 on the master machine.
Which process is Spark trying to connect to?
Should I start any frameworks or processes before running Spark?

Thanks
Oleg.


16/01/19 03:17:56 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:57 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
2 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:58 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
3 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:59 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
4 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:00 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
5 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:01 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:02 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
7 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:03 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
8 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:04 INFO ipc.Client: Retrying connect to server:
ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried
9 time(s); retry
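The retry lines follow the policy the client names in the log, RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS): the timestamps run from 03:17:55 to 03:18:04, one attempt per second, before the ConnectException. Below is a minimal Scala sketch of that kind of policy. It is not Hadoop's implementation, and `FixedSleepRetry` and `retry` are hypothetical names; the `sleep` parameter is injectable so the behaviour can be checked without actually waiting.

```scala
import scala.util.{Failure, Success, Try}

// Sketch of "retry up to N times, sleeping a fixed interval between tries".
// Modelled on the RetryUpToMaximumCountWithFixedSleep line in the log; NOT Hadoop's code.
object FixedSleepRetry {
  /** Runs `op`; on a non-fatal failure, sleeps `sleepMillis` and tries again,
    * for at most `maxRetries` extra attempts. The last failure is rethrown. */
  def retry[T](maxRetries: Int, sleepMillis: Long,
               sleep: Long => Unit = (ms: Long) => Thread.sleep(ms))(op: => T): T = {
    @annotation.tailrec
    def loop(attempt: Int): T = Try(op) match {
      case Success(value) => value
      case Failure(_) if attempt < maxRetries =>
        // Mirrors the log wording: "Already tried <attempt> time(s)".
        println(s"Retrying connect: already tried $attempt time(s)")
        sleep(sleepMillis)
        loop(attempt + 1)
      case Failure(error) => throw error
    }
    loop(0)
  }
}
```

With `maxRetries = 10` and `sleepMillis = 1000`, an operation that always fails runs 11 times (the first try plus ten retries) and prints "already tried 0" through "already tried 9" about a second apart before the exception propagates, which matches the shape of the log. Retrying cannot help if, as "Connection refused" suggests, nothing is listening on the port.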

On Tue, Jan 19, 2016 at 1:13 PM, Peter Zhang  wrote:


RE: spark 1.6.0 on ec2 doesn't work

2016-01-18 Thread vivek.meghanathan
Have you verified that the Spark master and slaves started correctly? Please
check the listening ports with the netstat command. Are they listening, and
which address do they bind to?
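As a rough client-side complement to the netstat check, the sketch below tries to open a TCP connection to host:port with a timeout and reports whether anything accepted it. It only shows whether the port is reachable from the machine it runs on, not which process owns it or which address it binds to, so it does not replace running netstat on the server. `PortProbe` and `isListening` are hypothetical names, port 9000 comes from Oleg's log, and the host in `main` is a placeholder.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Client-side reachability check: does anything accept a TCP connection on host:port?
object PortProbe {
  def isListening(host: String, port: Int, timeoutMillis: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis)
      true
    } catch {
      // Refused, timed out, or unresolvable host all count as "not reachable".
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Placeholder host; pass the master's hostname as the first argument.
    val host = if (args.nonEmpty) args(0) else "localhost"
    println(s"$host:9000 reachable? ${isListening(host, 9000)}")
  }
}
```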

From: Oleg Ruchovets [mailto:oruchov...@gmail.com]
Sent: 19 January 2016 11:24
To: Peter Zhang <zhangju...@gmail.com>
Cc: Daniel Darabos <daniel.dara...@lynxanalytics.com>; user <user@spark.apache.org>
Subject: Re: spark 1.6.0 on ec2 doesn't work
