Not sure what would slow it down, as the repartition completes equally fast on all nodes, implying that the data is available on all of them; after that there are a few computation steps, none of them local to the master.
On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen <so...@cloudera.com> wrote:
> Which machines are HDFS data nodes -- just your master? That would
> explain it. Otherwise, is it actually the write that's slow, or is
> something else you're doing much faster on the master for other
> reasons? Maybe you're actually shipping data via the master first
> in some local computation, so the master's executor has the result
> much faster?
>
> On Mon, Apr 20, 2015 at 12:21 PM, jamborta <jambo...@gmail.com> wrote:
> > Hi all,
> >
> > I have a three-node cluster with identical hardware. I am trying a
> > workflow where it reads data from HDFS, repartitions it, and runs a
> > few map operations, then writes the results back to HDFS.
> >
> > It looks like all the computation, including the repartitioning and
> > the maps, completes within similar time intervals on all the nodes,
> > except when it writes back to HDFS: the master node does the job much
> > faster than the slaves (15s for each block as opposed to 1.2 min for
> > the slaves).
> >
> > Any suggestion what the reason might be?
> >
> > Thanks
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/writing-to-hdfs-on-master-node-much-faster-tp22570.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
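To make Sean's hypothesis concrete: by default, when an HDFS client writes a block from a machine that is itself a DataNode, the first replica goes to the local disk; a client on a non-DataNode machine must ship every block over the network. The sketch below is a toy model of that placement rule, not HDFS source code -- the host names, block size, and bandwidth figures are made-up assumptions purely for illustration.

```python
# Toy model (NOT real HDFS code) of the default HDFS first-replica
# placement rule: write locally if the writer is a DataNode, otherwise
# ship the block over the network to a remote DataNode.
# All hostnames and throughput numbers below are assumed for illustration.

def first_replica_target(writer_host, datanodes):
    """Return the DataNode that receives the first replica of a block."""
    if writer_host in datanodes:
        return writer_host      # writer is a DataNode: local write
    return datanodes[0]         # otherwise: remote write over the network

def block_write_seconds(writer_host, datanodes,
                        block_mb=128, local_mb_s=200, net_mb_s=30):
    """Rough per-block write time under the model above (assumed rates)."""
    target = first_replica_target(writer_host, datanodes)
    rate = local_mb_s if target == writer_host else net_mb_s
    return block_mb / rate

# If only the master runs a DataNode, the master's executor writes each
# block at local-disk speed while the slaves are network-bound:
datanodes = ["master"]
print(block_write_seconds("master", datanodes))  # fast, local write
print(block_write_seconds("slave1", datanodes))  # slow, shipped to master
```

Under this model, the asymmetry disappears once every worker also runs a DataNode, since each executor then writes its first replica locally -- which matches the observed symptom of identical compute times but lopsided write times.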