I use Spark on EC2, but it's a CDH 5.3.3 distribution (starving developer 
version) installed through Cloudera Manager. Spark is configured to run on YARN. 

Regards
Sanjay

Sent from my iPhone

> On May 29, 2015, at 6:16 PM, roni <roni.epi...@gmail.com> wrote:
> 
> Hi,
> Any update on this? 
> I am not sure if the issue I am seeing is related.
> I have 8 slaves, and when I created the cluster I specified an EBS volume of 
> 100 GB for each.
> On EC2 I see 8 volumes created, each attached to the corresponding slave.
> But when I try to copy data onto it with
> 
> /root/ephemeral-hdfs/bin/hadoop fs -cp /intersection 
> hdfs://ec2-54-149-112-136.us-west-2.compute.amazonaws.com:9010/
> 
> it fails with:
> 
> 2015-05-28 23:40:35,447 WARN  hdfs.DFSClient (DFSOutputStream.java:run(562)) 
> - DataStreamer Exception
> 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /intersection/kmer150/commonGoodKmers/_temporary/_attempt_201504010056_0004_m_000428_3948/part-00428._COPYING_
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
> 
> 
> 
> It shows only 1 datanode, but for ephemeral-hdfs it shows 8 datanodes.
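> 
> In case it helps anyone reproduce, here is a quick way to compare what the 
> two namenodes see (a sketch assuming the standard spark-ec2 layout, with 
> separate ephemeral and persistent HDFS trees under /root):
> 
> # count datanodes registered with each namenode
> /root/ephemeral-hdfs/bin/hadoop dfsadmin -report | grep "Datanodes available"
> /root/persistent-hdfs/bin/hadoop dfsadmin -report | grep "Datanodes available"
> 
> For me the first reports 8 datanodes but the second only 1.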
> 
> Any thoughts?
> 
> Thanks
> 
> _R
> 
> 
>> On Sat, May 23, 2015 at 7:24 AM, Joe Wass <jw...@crossref.org> wrote:
>> I used Spark on EC2 a while ago, but recent revisions seem to have broken 
>> the functionality.
>> 
>> Is anyone actually using Spark on EC2 at the moment?
>> 
>> The bug in question is:
>> 
>> https://issues.apache.org/jira/browse/SPARK-5008
>> 
>> It makes it impossible to use persistent HDFS without a workaround on each 
>> slave node; a rough sketch of what I did is below.
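>> 
>> For anyone who wants to try it, this is roughly what I did on each slave 
>> (a sketch assuming the standard spark-ec2 layout; the mount point /vol0 and 
>> the data-dir path are from memory and may well differ on your AMI):
>> 
>> # find where the EBS volume is actually mounted
>> df -h
>> # check which data dir persistent-hdfs is configured to use
>> grep -A 1 dfs.data.dir /root/persistent-hdfs/conf/hdfs-site.xml
>> # if the configured data dir doesn't live on the EBS mount, link or edit it,
>> # e.g. (hypothetical paths):
>> ln -s /vol0 /vol
>> # then restart the persistent-hdfs daemons from the master
>> # (the scripts may be under sbin/ instead, depending on the Hadoop version):
>> /root/persistent-hdfs/bin/stop-dfs.sh
>> /root/persistent-hdfs/bin/start-dfs.sh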
>> 
>> No one seems to be interested in the bug, so I wonder whether other people 
>> simply aren't hitting this problem. If that's the case, any suggestions? 
>> 
>> Joe
> 
