Problem running Spark shell (1.0.0) on EMR

2014-08-05 Thread Omer Holzinger
I'm having a similar problem to:
http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/browser

I'm trying to follow the tutorial at:

When I run: val file = sc.textFile("s3://bigdatademo/sample/wiki/")

I get:

WARN storage.BlockManager: Putting block broadcast_1 failed
java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;

I found a few other people raising this issue, but wasn't able to find a
solution or an explanation.
Has anyone encountered this? Any help or advice would be highly appreciated!

thank you,
  -- Omer
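
A NoSuchMethodError generally means the class loaded at runtime is not the
version the calling code was compiled against. One possible cause here, not
confirmed anywhere in this thread, is an older Guava jar sitting earlier on
the classpath. A minimal sketch for checking this from spark-shell follows;
the helper name inspect is hypothetical and not part of Spark, and the class
and method names are taken from the stack trace above.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark API): reports where a class was loaded
// from and whether it exposes a public method with the given name.
def inspect(className: String, methodName: String): (String, Boolean) = {
  val cls = Class.forName(className)
  val source = Option(cls.getProtectionDomain.getCodeSource)
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)
    .getOrElse("(bootstrap classpath or unknown)")
  (source, cls.getMethods.exists(_.getName == methodName))
}

// Class and method named in the NoSuchMethodError. Try keeps the shell
// usable if the class cannot be loaded at all.
println(Try(inspect("com.google.common.hash.HashFunction", "hashInt")))
```

If the second element of the result is false, the reported jar is the one
supplying a HashFunction without hashInt, which narrows down which classpath
entry to look at.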


Re: Problem running Spark shell (1.0.0) on EMR

2014-07-22 Thread Martin Goodson
I am also having exactly the same problem when calling from pyspark. Has
anyone managed to get this script to work?


-- 
Martin Goodson  |  VP Data Science
(0)20 3397 1240


On Wed, Jul 16, 2014 at 2:10 PM, Ian Wilkinson ia...@me.com wrote:

 Hi,

 I’m trying to run the Spark (1.0.0) shell on EMR and encountering a
 classpath issue.
 I suspect I’m missing something gloriously obvious, but so far it is
 eluding me.

 I launch the EMR Cluster (using the aws cli) with:

 aws emr create-cluster --name "Test Cluster" \
 --ami-version 3.0.3 \
 --no-auto-terminate \
 --ec2-attributes KeyName=... \
 --bootstrap-actions Path=s3://elasticmapreduce/samples/spark/1.0.0/install-spark-shark.rb \
 --instance-groups \
 InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium \
 InstanceGroupType=CORE,InstanceCount=1,InstanceType=m1.medium \
 --region eu-west-1

 then,

 $ aws emr ssh --cluster-id ... --key-pair-file ... --region eu-west-1

 On the master node, I then launch the shell with:

 [hadoop@ip-... spark]$ ./bin/spark-shell

 and try performing:

 scala> val logs = sc.textFile("s3n://.../")

 this produces:

 14/07/16 12:40:35 WARN storage.BlockManager: Putting block broadcast_0
 failed
 java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;


 Any help mighty welcome,
 ian




Problem running Spark shell (1.0.0) on EMR

2014-07-16 Thread Ian Wilkinson
Hi,

I’m trying to run the Spark (1.0.0) shell on EMR and encountering a classpath 
issue.
I suspect I’m missing something gloriously obvious, but so far it is eluding
me.

I launch the EMR Cluster (using the aws cli) with:

aws emr create-cluster --name "Test Cluster" \
--ami-version 3.0.3 \
--no-auto-terminate \
--ec2-attributes KeyName=... \
--bootstrap-actions Path=s3://elasticmapreduce/samples/spark/1.0.0/install-spark-shark.rb \
--instance-groups \
InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium \
InstanceGroupType=CORE,InstanceCount=1,InstanceType=m1.medium \
--region eu-west-1

then,

$ aws emr ssh --cluster-id ... --key-pair-file ... --region eu-west-1

On the master node, I then launch the shell with:

[hadoop@ip-... spark]$ ./bin/spark-shell

and try performing:

scala> val logs = sc.textFile("s3n://.../")

this produces:

14/07/16 12:40:35 WARN storage.BlockManager: Putting block broadcast_0 failed
java.lang.NoSuchMethodError: 
com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;


Any help mighty welcome,
ian