Which machines are HDFS datanodes? Just your master? That would
explain it, since an HDFS client running on a datanode writes the
first replica of each block locally, while tasks on the other nodes
would have to send every block over the network. Otherwise, is it
really the write that's slow, or is something else running much
faster on the master for other reasons? For example, are you shipping
data through the master first in some local computation, so that the
master's executor has its result much sooner?
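
If it helps with the first question, here is a rough sketch of tallying
which hosts actually hold the replicas of your output blocks. It assumes
you have already collected a block-to-hosts mapping yourself (for
example by reading the output of `hdfs fsck <path> -files -blocks
-locations`); the dict shape, block IDs and host names below are made up
for illustration.

```python
from collections import Counter


def replicas_per_host(block_locations):
    """Count how many block replicas each host stores.

    block_locations maps a block ID to the list of hosts holding a
    replica of that block (a hypothetical shape, filled in by hand).
    """
    counts = Counter()
    for hosts in block_locations.values():
        counts.update(hosts)
    return counts


# Made-up example: every replica sits on one host called "master".
block_locations = {
    "blk_1": ["master"],
    "blk_2": ["master"],
    "blk_3": ["master"],
}

counts = replicas_per_host(block_locations)
for host, n in sorted(counts.items()):
    print(host, n)
```

If only the master shows up in the tally, then only the master is
storing blocks, and the other two nodes are writing remotely.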

On Mon, Apr 20, 2015 at 12:21 PM, jamborta <jambo...@gmail.com> wrote:
> Hi all,
>
> I have a three-node cluster with identical hardware. I am trying a workflow
> that reads data from HDFS, repartitions it, runs a few map operations,
> and then writes the results back to HDFS.
>
> It looks like all the computation, including the repartitioning and the
> maps, completes within similar time intervals on all the nodes, except for
> the write back to HDFS, where the master node does the job much faster
> than the slaves (15s for each block as opposed to 1.2 min for the slaves).
>
> Any suggestions as to what the reason might be?
>
> thanks,
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/writing-to-hdfs-on-master-node-much-faster-tp22570.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
