Hi Mitch,

I think this is normal. Network utilization is high while a shuffle is in
progress. After that, it should come down while each slave node does its
computation on the partitions assigned to it. At least that is my
understanding.
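
For example, the word-pair count I described in my original message below
boils down to something like the sketch here (simplified, not my actual
code; the HDFS path is a placeholder). The reduceByKey step is the shuffle
boundary:

  import org.apache.spark.SparkContext._  // reduceByKey on pair RDDs (Spark 1.x)
  import org.apache.spark.rdd.RDD

  // sc is the usual SparkContext; each element of the RDD is one sentence.
  val sentences: RDD[String] = sc.textFile("hdfs:///path/to/corpus")

  // Emit one ((wordA, wordB), 1) record per word pair in a sentence, then
  // sum per pair. reduceByKey repartitions the (pair, 1) records by key
  // across the slaves, which is when the network spikes; afterwards each
  // node reduces its own partitions locally, which is mostly CPU work.
  val pairCounts = sentences
    .flatMap { line =>
      val words = line.split("\\s+")
      for (i <- words.indices; j <- (i + 1) until words.length)
        yield ((words(i), words(j)), 1L)
    }
    .reduceByKey(_ + _)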

Best,
Julaiti


On Tue, Mar 3, 2015 at 2:32 AM, Mitch Gusat <mgu...@gmail.com> wrote:

> Hi Julaiti,
>
> Have you made progress in discovering the bottleneck below?
>
> While I suspect a configuration setting or a program bug, I'm intrigued by
> "network utilization is high for several seconds at the beginning, then
> drops close to 0"... Do you know more?
>
> thanks,
> Mitch Gusat (IBM research)
>
> On Tue, Feb 17, 2015 at 11:20 AM, Julaiti Alafate <jalaf...@eng.ucsd.edu>
> wrote:
>
> > Hi there,
> >
> > I am trying to scale up the data size that my application is handling.
> > The application runs on a cluster with 16 slave nodes, each with 60GB of
> > memory, in standalone mode. The data comes from HDFS, which is also on
> > the same local network.
> >
> > To understand how my program is running, I also have Ganglia installed
> > on the cluster. From previous runs, I know the stage that takes the
> > longest time is counting word pairs (my RDD consists of sentences from a
> > corpus). My goal is to identify the bottleneck of my application and then
> > modify my program or hardware configuration accordingly.
> >
> > Unfortunately, I haven't found much information on Spark monitoring and
> > optimization. Reynold Xin gave a great talk at Spark Summit 2014 on
> > application tuning from the task perspective; his focus was on tasks that
> > are oddly slower than the average. However, that doesn't solve my
> > problem, because in my case there are no tasks that run much slower than
> > the others.
> >
> > So I tried to identify the bottleneck from the hardware perspective. I
> > want to know what the limitation of the cluster is. I think that if the
> > executors were working hard, then CPU, memory, or network bandwidth (or
> > some combination of them) would be hitting the roof. But Ganglia reports
> > that the CPU utilization of the cluster is no more than 50%, and network
> > utilization is high for several seconds at the beginning, then drops
> > close to 0. From the Spark UI, I can see that the node with the maximum
> > memory usage is consuming around 6GB, while "spark.executor.memory" is
> > set to 20GB.
> >
> > I am very confused: the program is not running fast enough, yet the
> > hardware resources are not in short supply. Could you please give me
> > some hints about what determines the performance of a Spark application
> > from the hardware perspective?
> >
> > Thanks!
> >
> > Julaiti
>
>
