Kevin:
Can you describe how you got past the MetadataFetchFailedException?

> On Apr 16, 2016, at 9:41 AM, Kevin Eid <kevin.e...@mail.dcu.ie> wrote:
> 
> One last email to announce that I've fixed all of the issues. Don't hesitate 
> to contact me if you run into the same problems. I'd be happy to help.
> 
> Regards,
> Kevin
> 
>> On 14 Apr 2016 12:39 p.m., "Kevin Eid" <kevin.e...@mail.dcu.ie> wrote:
>> Hi all, 
>> 
>> I managed to copy my .py files from local to the cluster using SCP, and I 
>> managed to run my Spark app on the cluster against a small dataset. 
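>> 
>> A minimal sketch of that copy step, assuming an EC2 key pair; the key name, 
>> user, and target directory are hypothetical, and the host is the cluster 
>> master from the submit command further down the thread: 
>> 
>> scp -i my-key.pem *.py ec2-user@ec2-54-51-23-172.eu-west-1.compute.amazonaws.com:~/app/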
>> 
>> However, when I iterate over a 5 GB dataset I get the following: 
>> org.apache.spark.shuffle.MetadataFetchFailedException (please see the 
>> attached screenshots). 
>> 
>> I am deploying 3 m3.xlarge instances and using the following parameters when 
>> submitting the app: --executor-memory 50g --driver-memory 20g 
>> --executor-cores 4 --num-executors 3. 
>> 
>> Can you recommend other configurations (number of executors, and driver and 
>> executor memory), or do I have to deploy more and larger instances in order 
>> to run my app on 5 GB? Or do I need to add more partitions while reading the 
>> file, as in the sketch below? 
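>> 
>> A minimal sketch of the partitioning idea, assuming the input is a text file 
>> read through the RDD API; the file name and partition count are hypothetical: 
>> 
>> from pyspark import SparkContext
>> 
>> sc = SparkContext(appName="weather_predict")
>> # Ask for more input splits up front so each shuffle task handles a
>> # smaller slice of the 5 GB file.
>> lines = sc.textFile("s3n://mubucket/weather.csv", minPartitions=200)
>> # Or repartition an existing RDD before a wide (shuffling) operation.
>> lines = lines.repartition(200)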
>> 
>> Best, 
>> Kevin
>> 
>>> On 12 April 2016 at 12:19, Sun, Rui <rui....@intel.com> wrote:
>>> Which py file is your main file (primary py file)? Zip the other two py 
>>> files. Leave the main py file alone. Don't copy them to S3 because it seems 
>>> that only local primary and additional py files are supported.
>>> 
>>> ./bin/spark-submit --master spark://... --py-files <zip file> <main py file>
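>>> 
>>> For example, assuming hypothetical file names where main.py is the primary 
>>> file and utils.py and model.py are the other two (7077 is the default 
>>> standalone master port): 
>>> 
>>> zip deps.zip utils.py model.py
>>> ./bin/spark-submit --master spark://<master-host>:7077 --py-files deps.zip main.py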
>>> 
>>> -----Original Message-----
>>> From: kevllino [mailto:kevin.e...@mail.dcu.ie]
>>> Sent: Tuesday, April 12, 2016 5:07 PM
>>> To: user@spark.apache.org
>>> Subject: Run a self-contained Spark app on a Spark standalone cluster
>>> 
>>> Hi,
>>> 
>>> I need to know how to run a self-contained Spark app (3 Python files) on a 
>>> Spark standalone cluster. Can I move the .py files to the cluster, or 
>>> should I store them locally, on HDFS, or on S3? I tried the following 
>>> locally and on S3 with a zip of my .py files, as suggested here 
>>> <http://spark.apache.org/docs/latest/submitting-applications.html>:
>>> 
>>> ./bin/spark-submit \
>>>   --master spark://ec2-54-51-23-172.eu-west-1.compute.amazonaws.com:5080 \
>>>   --py-files s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@mubucket//weather_predict.zip
>>> 
>>> But I get: “Error: Must specify a primary resource (JAR or Python file)”
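>>> 
>>> That error appears because the command passes only --py-files and no main 
>>> script. A sketch of a corrected call, with main.py as a hypothetical local 
>>> primary file: 
>>> 
>>> ./bin/spark-submit \
>>>   --master spark://ec2-54-51-23-172.eu-west-1.compute.amazonaws.com:5080 \
>>>   --py-files weather_predict.zip main.py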
>>> 
>>> Best,
>>> Kevin
>>> 
>> 
>> 
>> 
>> -- 
>> Kevin EID 
>> M.Sc. in Computing, Data Analytics    
>> 
>> 
