Re: [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

2015-09-10 Thread Ewan Leith
The last time I checked, if you launch EMR 4 with only Spark selected as an 
application, HDFS isn't correctly installed.


Did you select another application like Hive at launch time as well as Spark? 
If not, try that.


Thanks,

Ewan
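Ewan's suggestion can also be tried from the AWS CLI rather than the web console. This is a minimal sketch that assumes the AWS CLI is installed and configured with credentials. The function name `launch_spark_hive_cluster`, the cluster name, the `emr-4.0.0` release label, the instance type and count, and the key pair are illustrative placeholders, not values from this thread:

```shell
# Hypothetical helper: launch an EMR 4.x cluster with Hive selected alongside Spark.
# Assumes the AWS CLI is installed and configured; every value below is a placeholder.
launch_spark_hive_cluster() {
  local key_name="$1"   # name of an existing EC2 key pair (placeholder)
  aws emr create-cluster \
    --name "spark-with-hive" \
    --release-label emr-4.0.0 \
    --applications Name=Hive Name=Spark \
    --ec2-attributes KeyName="$key_name" \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --use-default-roles
}
```

For example, `launch_spark_hive_cluster my-key-pair`. Listing `Name=Hive` in `--applications` is the CLI counterpart of ticking Hive next to Spark in the console.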


-- Original message --
From: Dean Wampler
Date: Wed, 9 Sep 2015 22:29
To: shahab
Cc: user@spark.apache.org
Subject: Re: [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar


If you log into the cluster, do you see the file if you type:

hdfs dfs -ls hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

(with the correct server address for "ipx-x-x-x"). If not, is the server
address correct and routable inside the cluster? Recall that EC2 instances have
both public and private host names & IP addresses.

Also, is the port number correct for HDFS in the cluster?

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com




Re: [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

2015-09-10 Thread shahab
Thank you all for the comments, but my problem still exists.
@Dean, @Ewan: yes, I do have the Hadoop file system installed and working.

@Sujit: the latest version of EMR (version 4) does not require manually copying
the jar file to the server. The blog post you pointed to refers to an older
version (3.x) of EMR. But I will try your solution as well.
@Neil: I think something is wrong with my fat jar file; I may be missing
some dependencies in it!

Again, thank you all

/Shahab

On Wed, Sep 9, 2015 at 11:28 PM, Dean Wampler wrote:

> If you log into the cluster, do you see the file if you type:
>
> hdfs dfs -ls hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
>
> (with the correct server address for "ipx-x-x-x"). If not, is the server
> address correct and routable inside the cluster? Recall that EC2 instances
> have both public and private host names & IP addresses.
>
> Also, is the port number correct for HDFS in the cluster?
>
> dean


Re: [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

2015-09-10 Thread Work
Ewan,

What issue are you having with HDFS when only Spark is installed? I'm not aware
of any issue like this.

Thanks,
Jonathan

On Wed, Sep 9, 2015 at 11:48 PM, Ewan Leith <ewan.le...@realitymine.com>
wrote:

> The last time I checked, if you launch EMR 4 with only Spark selected as an 
> application, HDFS isn't correctly installed.
> Did you select another application like Hive at launch time as well as Spark? 
> If not, try that.
> Thanks,
> Ewan

[Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

2015-09-09 Thread shahab
Hi,
I am using Spark on Amazon EMR. So far I have not succeeded in submitting the
application, and I'm not sure what the problem is. In the log file I see
the following:
java.io.FileNotFoundException: File does not exist:
hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

Even putting spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar into the fat jar
didn't solve the problem. I'm out of ideas now.
I want to submit a Spark application as a step, using the AWS web console. I
submit the application as:

spark-submit --deploy-mode cluster --class mypack.MyMainClass --master yarn-cluster s3://mybucket/MySparkApp.jar

Has anyone had a similar problem with EMR?

best,
/Shahab
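To reproduce this outside the web console, the same spark-submit arguments can be added as a step with the AWS CLI. This is a minimal sketch that assumes an EMR 4.x cluster, where a `Type=Spark` step runs spark-submit with the given `Args`, and a configured AWS CLI. The function `submit_spark_step` and the step name are hypothetical, and the cluster ID is a placeholder:

```shell
# Hypothetical helper: add the spark-submit invocation above as an EMR step via the CLI.
# Assumes an EMR 4.x cluster and a configured AWS CLI; the cluster ID is a placeholder.
submit_spark_step() {
  local cluster_id="$1"   # e.g. j-XXXXXXXXXXXXX (placeholder)
  # The step spec is single-quoted so the shell does not treat [...] as a glob pattern.
  aws emr add-steps \
    --cluster-id "$cluster_id" \
    --steps 'Type=Spark,Name=MySparkApp,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn-cluster,--class,mypack.MyMainClass,s3://mybucket/MySparkApp.jar]'
}
```

For example, `submit_spark_step j-XXXXXXXXXXXXX`. As a side note, in Spark 1.4 `--master yarn-cluster` already implies cluster deploy mode, so `--deploy-mode cluster` is redundant (but harmless) in this invocation.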


Re: [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

2015-09-09 Thread Dean Wampler
If you log into the cluster, do you see the file if you type:

hdfs dfs -ls hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

(with the correct server address for "ipx-x-x-x"). If not, is the server
address correct and routable inside the cluster? Recall that EC2 instances
have both public and private host names & IP addresses.

Also, is the port number correct for HDFS in the cluster?

dean
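The check described above can be scripted roughly as follows. This is a minimal sketch that assumes it runs on a cluster node with the `hdfs` client on the PATH. The helper names are hypothetical, and the host, port and application ID are placeholders to replace with the values from your own log. `hdfs getconf -confKey fs.defaultFS` prints the NameNode URI the cluster is configured with, which answers the host and port questions directly:

```shell
# Hypothetical helpers; host, port and application ID are placeholders from the log.

# Build the HDFS URI of a file in a YARN application's Spark staging directory
# (the /user/hadoop/.sparkStaging layout is the one shown in the error message).
staging_jar_uri() {
  local host="$1" port="$2" app_id="$3" jar="$4"
  printf 'hdfs://%s:%s/user/hadoop/.sparkStaging/%s/%s\n' \
    "$host" "$port" "$app_id" "$jar"
}

# Print the configured default file system, then list the staged jar.
# Returns 2 if the hdfs client is missing, otherwise the exit status of "hdfs dfs -ls".
check_staging_jar() {
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs client not found on PATH" >&2
    return 2
  fi
  echo "fs.defaultFS = $(hdfs getconf -confKey fs.defaultFS)"
  hdfs dfs -ls "$(staging_jar_uri "$@")"
}
```

For example, `check_staging_jar <private-host-name> 8020 application_123344567_0018 spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar`. Spark on YARN normally deletes the staging directory when the application ends (unless `spark.yarn.preserve.staging.files` is set to true), so the listing is only meaningful while the step is still running.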


On Wed, Sep 9, 2015 at 9:28 AM, shahab wrote:

> Hi,
> I am using Spark on Amazon EMR. So far I have not succeeded in submitting the
> application, and I'm not sure what the problem is. In the log file I see
> the following:
> java.io.FileNotFoundException: File does not exist:
> hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
>
> Even putting spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar into the fat jar
> didn't solve the problem. I'm out of ideas now.
> I want to submit a Spark application as a step, using the AWS web console. I
> submit the application as:
>
> spark-submit --deploy-mode cluster --class mypack.MyMainClass --master yarn-cluster s3://mybucket/MySparkApp.jar
>
> Has anyone had a similar problem with EMR?
>
> best,
> /Shahab
>