Looks like this is caused by issue SPARK-5008:
https://issues.apache.org/jira/browse/SPARK-5008
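
Until that's fixed in the spark-ec2 scripts, a workaround along these
lines may do it. This is an untested sketch: it assumes the usual
spark-ec2 layout (everything under /root on the master, the copy-dir
helper in /root/spark-ec2) and a fresh cluster, since reformatting the
namenode destroys any existing HDFS metadata. Your hadoop.tmp.dir
change below is probably right as far as it goes; the two steps it's
missing are pushing the edited conf to the slaves and formatting the
namenode at its new location:

$ persistent-hdfs/bin/stop-all.sh

# Edit hadoop.tmp.dir in persistent-hdfs/conf/core-site.xml to
# /vol0/persistent-hdfs (as you did), then sync the conf directory to
# every slave -- editing it only on the master leaves the datanodes
# writing to /vol:
$ /root/spark-ec2/copy-dir /root/persistent-hdfs/conf

# Create namenode metadata under the new path (wipes any existing
# persistent-hdfs data, which is fine on a fresh cluster):
$ persistent-hdfs/bin/hadoop namenode -format

$ persistent-hdfs/bin/start-all.sh

The "Retrying connect" loop in your second dfsadmin run below is what
you'd expect if the namenode died on startup because there was no
formatted name directory under the new path; the namenode log in
persistent-hdfs/logs/ on the master should say for sure.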

On 13 February 2015 at 19:04, Joe Wass <jw...@crossref.org> wrote:

> I've updated to Spark 1.2.0, and the EC2 persistent-hdfs behaviour appears
> to have changed.
>
> My launch command is:
>
> spark-1.2.0-bin-hadoop2.4/ec2/spark-ec2 --instance-type=m3.xlarge -s 5
> --ebs-vol-size=1000 launch myproject
>
> When I ssh into master I get:
>
> $ df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/xvda1            7.9G  2.9G  5.0G  37% /
> tmpfs                 7.3G     0  7.3G   0% /dev/shm
> /dev/xvdb              37G  1.3G   34G   4% /mnt
> /dev/xvdc              37G  177M   35G   1% /mnt2
> /dev/xvds            1000G   33M 1000G   1% /vol0
>
> That /vol0 is where I want (and assume) persistent-hdfs to go. But when I
> check the reported capacity, I get:
>
> $ persistent-hdfs/bin/start-all.sh
> $ persistent-hdfs/bin/hadoop dfsadmin -report
> Warning: $HADOOP_HOME is deprecated.
>
> Configured Capacity: 42275430400 (39.37 GB)
> Present Capacity: 26448744448 (24.63 GB)
> DFS Remaining: 26448601088 (24.63 GB)
> DFS Used: 143360 (140 KB)
> DFS Used%: 0%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 5 (5 total, 0 dead)
>
> Name: 10.46.11.156:60010
> Decommission Status : Normal
> Configured Capacity: 8455086080 (7.87 GB)
> DFS Used: 28672 (28 KB)
> Non DFS Used: 3165372416 (2.95 GB)
> DFS Remaining: 5289684992(4.93 GB)
> DFS Used%: 0%
> DFS Remaining%: 62.56%
> Last contact: Fri Feb 13 17:41:46 UTC 2015
>
>
> Name: 10.41.51.155:60010
> Decommission Status : Normal
> Configured Capacity: 8455086080 (7.87 GB)
> DFS Used: 28672 (28 KB)
> Non DFS Used: 3165364224 (2.95 GB)
> DFS Remaining: 5289693184(4.93 GB)
> DFS Used%: 0%
> DFS Remaining%: 62.56%
> Last contact: Fri Feb 13 17:41:46 UTC 2015
>
>
> Name: 10.38.30.254:60010
> Decommission Status : Normal
> Configured Capacity: 8455086080 (7.87 GB)
> DFS Used: 28672 (28 KB)
> Non DFS Used: 3165249536 (2.95 GB)
> DFS Remaining: 5289807872(4.93 GB)
> DFS Used%: 0%
> DFS Remaining%: 62.56%
> Last contact: Fri Feb 13 17:41:46 UTC 2015
>
>
> Name: 10.204.134.84:60010
> Decommission Status : Normal
> Configured Capacity: 8455086080 (7.87 GB)
> DFS Used: 28672 (28 KB)
> Non DFS Used: 3165343744 (2.95 GB)
> DFS Remaining: 5289713664(4.93 GB)
> DFS Used%: 0%
> DFS Remaining%: 62.56%
> Last contact: Fri Feb 13 17:41:46 UTC 2015
>
>
> Name: 10.33.15.134:60010
> Decommission Status : Normal
> Configured Capacity: 8455086080 (7.87 GB)
> DFS Used: 28672 (28 KB)
> Non DFS Used: 3165356032 (2.95 GB)
> DFS Remaining: 5289701376(4.93 GB)
> DFS Used%: 0%
> DFS Remaining%: 62.56%
> Last contact: Fri Feb 13 17:41:46 UTC 2015
>
>
> That's tiny. My suspicions are aroused when I see:
>
> $ ls /vol
> persistent-hdfs
>
> /vol is on the small root volume /dev/xvda1, not the large EBS volume
> /dev/xvds.
>
> I thought I'd be able to edit persistent-hdfs/conf/core-site.xml to change
> the volume:
>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/vol0/persistent-hdfs</value>  <!-- was /vol/persistent-hdfs -->
> </property>
>
> And then
>
> persistent-hdfs/bin/stop-all.sh && persistent-hdfs/bin/start-all.sh
>
> but when I do that, the persistent HDFS won't start, for whatever reason.
> When I run
>
> $ persistent-hdfs/bin/hadoop dfsadmin -report
>
> all I get is:
>
> 15/02/13 18:50:25 INFO ipc.Client: Retrying connect to server:
> ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010.
> Already tried 0 time(s).
> 15/02/13 18:50:26 INFO ipc.Client: Retrying connect to server:
> ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010.
> Already tried 1 time(s).
> 15/02/13 18:50:27 INFO ipc.Client: Retrying connect to server:
> ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010.
> Already tried 2 time(s).
>
> So it looks like I can't use the EBS volumes for persistent-hdfs. I was
> doing it before, so something must have changed in the last couple of weeks
> (last time I was using 1.1.0).
>
> Is this a bug? Has the behaviour of AWS changed? Am I doing something
> stupid? How do I fix it?
>
> Thanks in advance!
>
> Joe
>
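
Once the namenode is back up, a quick sanity check that blocks really
land on the EBS volumes (same assumed paths as above):

$ persistent-hdfs/bin/hadoop dfsadmin -report | grep 'Configured Capacity'
# With five 1000 GB volumes this should read on the order of 5 TB in
# total, not the 39.37 GB reported above.

$ persistent-hdfs/bin/hadoop fs -put /etc/hosts /sanity-check
$ du -sh /vol0/persistent-hdfs   # run on a slave; should grow as blocks land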
