Well, there are a number of performance tuning guidelines in dedicated
sections of the Spark documentation. Have you read and applied them?

Secondly, any performance problem within a distributed cluster environment
has two aspects:

1. Infrastructure
2. App Algorithms

You seem to be focusing only on the first (infrastructure), but what you
said about the performance difference between a single laptop and the
cluster points to a potential algorithmic inefficiency in your app, e.g. in
how it distributes data and performs parallel processing on it. On a single
laptop, data moves almost instantly between workers because all worker
instances run in the memory of a single machine, whereas on a cluster the
same data has to cross the network between nodes.
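
To make that point concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name, the cost formula, and every number in it (compute time, shuffle volume, link speeds, per-task overhead) are assumptions invented for illustration, not measurements from your cluster or anything Spark reports. The only figure taken from your mail is the cluster size (6 nodes x 4 CPUs = 24 workers).

```python
# Toy cost model. All figures below are hypothetical and chosen only to
# illustrate how data movement can dominate on a cluster.

def job_seconds(compute_s, workers, shuffle_gb, link_gb_per_s,
                per_task_overhead_s, tasks):
    """Estimate wall-clock time as parallel compute + data movement + task overhead."""
    compute = compute_s / workers                    # work split evenly across workers
    transfer = shuffle_gb / link_gb_per_s            # data exchanged between workers
    overhead = per_task_overhead_s * tasks / workers  # scheduling cost per task
    return compute + transfer + overhead

# Single laptop: 4 cores, workers exchange data through local memory
# (modeled here as a very fast 10 GB/s "link").
laptop = job_seconds(compute_s=120, workers=4, shuffle_gb=10,
                     link_gb_per_s=10.0, per_task_overhead_s=0.01, tasks=200)

# 6-node cluster with 4 CPUs per node: more workers, but the same 10 GB
# now crosses a network assumed to deliver roughly 0.1 GB/s.
cluster = job_seconds(compute_s=120, workers=24, shuffle_gb=10,
                      link_gb_per_s=0.1, per_task_overhead_s=0.05, tasks=200)

# Same cluster after an algorithmic change that cuts the shuffle to 1 GB.
cluster_less_shuffle = job_seconds(compute_s=120, workers=24, shuffle_gb=1,
                                   link_gb_per_s=0.1, per_task_overhead_s=0.05,
                                   tasks=200)

print(f"laptop:               {laptop:.1f} s")
print(f"cluster:              {cluster:.1f} s")
print(f"cluster, less shuffle: {cluster_less_shuffle:.1f} s")
```

With these made-up inputs the cluster loses to the laptop purely because of the transfer term, and wins once the amount of data exchanged between workers is reduced. The model ignores a lot (parallel links, serialization, disk, caching), so treat it as a way to reason about where the time may be going, not as a prediction.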

 

Regards,
Evo Eftimov

From: Manish Gupta 8 [mailto:mgupt...@sapient.com] 
Sent: Thursday, April 16, 2015 6:03 PM
To: user@spark.apache.org
Subject: General configurations on CDH5 to achieve maximum Spark Performance

 

Hi,

Is there a document/link that describes the general configuration settings
to achieve maximum Spark performance while running on CDH5? In our
environment, we have made a lot of changes (and are still making them) to
get decent performance; otherwise our 6-node dev cluster, with default
configurations, lags behind a single laptop running Spark.


Having a standard checklist (taking a base node size of 4 CPUs and 16 GB RAM)
would be really great. Any pointers in this regard would be really helpful.

We are running Spark 1.2.0 on CDH 5.3.0.

 

Thanks,

 

Manish Gupta

Specialist | Sapient Global Markets

 

