Thanks Evo. Yes, my concern is only with the infrastructure configuration.
Configuring YARN (NodeManager) + Spark is a must, and the default settings
never work for us. What really happens is that we make changes as and when an
issue surfaces because of one of the numerous default configuration settings,
and every time we have to google a lot to decide on the right values :)
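
Just to illustrate the kind of trial-and-error I mean, these are the sort of
properties we keep revisiting (the values below are our own guesses for a
small dev cluster, not vetted recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: typical Spark-on-YARN knobs we end up overriding
    // because the defaults (e.g. 512m executor memory in Spark 1.2) are too
    // low for real workloads.
    val conf = new SparkConf()
      .setAppName("our-job")
      .set("spark.executor.memory", "4g")
      .set("spark.yarn.executor.memoryOverhead", "512") // MB of headroom so YARN doesn't kill containers
      .set("spark.shuffle.consolidateFiles", "true")    // fewer shuffle files per node
    val sc = new SparkContext(conf)

And on the YARN side, yarn.nodemanager.resource.memory-mb and
yarn.scheduler.maximum-allocation-mb have to be kept consistent with these,
which is exactly the kind of cross-checking we would love a checklist for.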

Again, my issue is specific to running Spark on YARN in a CDH5 environment.

If you know of a link that discusses optimal configuration settings for
running Spark on YARN (CDH5), please share it.

Thanks,
Manish

From: Evo Eftimov [mailto:evo.efti...@isecc.com]
Sent: Thursday, April 16, 2015 10:38 PM
To: Manish Gupta 8; user@spark.apache.org
Subject: RE: General configurations on CDH5 to achieve maximum Spark Performance

Well, there are a number of performance tuning guidelines in dedicated
sections of the Spark documentation. Have you read and applied them?
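
For instance, one of the simplest recommendations there is switching to Kryo
serialization and registering your classes. A minimal sketch (MyRecord is a
placeholder for your own data type):

    import org.apache.spark.SparkConf

    case class MyRecord(id: Long, value: String)  // placeholder for your own type

    // Per the Spark tuning guide: Kryo is faster and more compact than the
    // default Java serialization for shuffled and cached data.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MyRecord]))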

Secondly, any performance problem within a distributed cluster environment
has two aspects:

1. Infrastructure
2. App Algorithms

You seem to be focusing only on #1, but what you said about the performance
difference between a single laptop and the cluster points to potential
algorithmic inefficiency in your app, e.g. in how it distributes data and
performs parallel processing. On a single laptop, data moves between workers
almost instantly because all worker instances run in the memory of a single
machine.
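
A classic illustration (hypothetical snippet, not your code, and it assumes
an existing SparkContext sc): both jobs below compute per-key counts and look
equally fast on a laptop, but on a cluster the first ships every record
across the network while the second combines partial sums on each node first:

    import org.apache.spark.rdd.RDD

    // One (word, 1) pair per token of some input file.
    val pairs: RDD[(String, Int)] =
      sc.textFile("hdfs:///some/input").flatMap(_.split(" ")).map(word => (word, 1))

    val slow = pairs.groupByKey().mapValues(_.sum)  // full shuffle of every record
    val fast = pairs.reduceByKey(_ + _)             // map-side combine, far less traffic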

Regards,
Evo Eftimov

From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Thursday, April 16, 2015 6:03 PM
To: user@spark.apache.org
Subject: General configurations on CDH5 to achieve maximum Spark Performance

Hi,

Is there a document/link that describes the general configuration settings
needed to achieve maximum Spark performance while running on CDH5? In our
environment we have made a lot of changes (and are still making them) to get
decent performance; otherwise our 6-node dev cluster with default
configurations lags behind a single laptop running Spark.

Having a standard checklist (assuming a base node size of 4 CPUs and 16 GB
RAM) would be really great. Any pointers in this regard would be really
helpful.
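
For example, on such a node we would love to know whether a split like the
following is sane (the numbers are purely our own guesses, not a vetted
recommendation):

    import org.apache.spark.SparkConf

    // Illustrative sizing for 4-core / 16 GB nodes: reserve ~1 core and
    // ~4 GB per node for the OS and Hadoop daemons, leaving ~3 cores and
    // ~12 GB per node for executors.
    val conf = new SparkConf()
      .set("spark.executor.instances", "6")  // one executor per node on our 6-node cluster
      .set("spark.executor.cores", "3")
      .set("spark.executor.memory", "10g")   // leaves room for the YARN memory overhead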

We are running Spark 1.2.0 on CDH 5.3.0.

Thanks,

Manish Gupta
Specialist | Sapient Global Markets

Green Boulevard (Tower C)
3rd & 4th Floor
Plot No. B-9A, Sector 62
Noida 201 301
Uttar Pradesh, India

Tel: +91 (120) 479 5000
Fax: +91 (120) 479 5001
Email: mgupt...@sapient.com

sapientglobalmarkets.com
