Check whether your partitioning results in balanced partitions, i.e. 
partitions of similar size. One possible reason for the performance 
difference you observed is that, after your explicit repartitioning, the 
partition on your master node is much smaller than the RDD partitions on the 
other two nodes.
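
A minimal sketch of such a balance check, in Python. It only summarizes a 
list of per-partition record counts; the helper name partition_skew and the 
sample counts are made up for illustration, and the commented-out PySpark 
lines assume you have an RDD named rdd on a live SparkContext. It compares 
record counts rather than bytes, and it does not tell you which node holds 
which partition.

```python
from statistics import mean


def partition_skew(sizes):
    """Summarize how evenly records are spread across partitions.

    `sizes` is a list of per-partition record counts (hypothetical helper,
    not part of Spark).
    """
    if not sizes:
        raise ValueError("sizes must not be empty")
    avg = mean(sizes)
    return {
        "partitions": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": avg,
        # Largest partition relative to the average; 1.0 means balanced.
        "max_over_mean": max(sizes) / avg if avg else float("inf"),
    }


# With a live SparkContext (assumption: `rdd` is your repartitioned RDD),
# the counts could be collected without materializing whole partitions:
#   sizes = rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
# For small data, rdd.glom().map(len).collect() gives the same counts.
sizes = [1000, 5200, 4800]  # made-up counts, for illustration only
report = partition_skew(sizes)
print(report)
```

A max_over_mean well above 1.0, or a min far below the mean, would suggest 
that the repartitioning is uneven and worth looking at before blaming the 
HDFS write itself.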

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Monday, April 20, 2015 12:57 PM
To: jamborta
Cc: user@spark.apache.org
Subject: Re: writing to hdfs on master node much faster

Which machines are HDFS data nodes -- just your master? That would explain it. 
Otherwise, is it actually the write that's slow, or is something else you're 
doing much faster on the master for other reasons? For example, are you 
shipping data via the master first in some local computation, so that the 
master's executor has the result much faster?

On Mon, Apr 20, 2015 at 12:21 PM, jamborta <jambo...@gmail.com> wrote:
> Hi all,
>
> I have a three node cluster with identical hardware. I am trying a 
> workflow where it reads data from hdfs, repartitions it and runs a few 
> map operations then writes the results back to hdfs.
>
> It looks like all the computation, including the repartitioning and 
> the maps, completes within similar time intervals on all the nodes, 
> except when it writes the results back to HDFS, where the master node 
> does the job much faster than the slaves (15s for each block as 
> opposed to 1.2 min for the slaves).
>
> Any suggestion what the reason might be?
>
> thanks,
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/writing-to-hdfs-on-master-node-much-faster-tp22570.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For 
> additional commands, e-mail: user-h...@spark.apache.org
>

