Not sure what would slow it down: the repartition completes equally fast
on all nodes, implying that the data is available on all of them, and the
few computation steps that follow are not local to the master either.
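
One way to confirm where the time actually goes is to materialize the RDD
before the write, so that the save step is timed in isolation. A minimal
sketch, assuming a pre-computed RDD named `mapped` and a hypothetical
output path:

    import org.apache.spark.storage.StorageLevel

    // Materialize the upstream computation first, so the timed step
    // below covers only the HDFS write, not the repartition or maps.
    // `mapped` stands in for the RDD produced by the map steps (assumed).
    val result = mapped.persist(StorageLevel.MEMORY_AND_DISK)
    result.count()  // forces all upstream stages to run on the executors

    val start = System.nanoTime()
    result.saveAsTextFile("hdfs:///tmp/write-test")  // hypothetical path
    println(f"write took ${(System.nanoTime() - start) / 1e9}%.1f s")

If the write itself turns out to be the slow step, Sean's data-node
question below is the first thing to check: HDFS places the first replica
of each block on the writer's local data node when one exists, so if the
master is the only HDFS data node, its executor writes locally while the
slaves' writes all cross the network. Per-task durations for the save
stage are also visible in the Spark web UI.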

On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen <so...@cloudera.com> wrote:

> Which machines are HDFS data nodes -- just your master? That would
> explain it. Otherwise, is it actually the write that is slow, or is
> something else you are doing simply finishing faster on the master for
> some other reason? For example, are you shipping data via the master
> first in some local computation, so that the master's executor has the
> result much sooner?
>
> On Mon, Apr 20, 2015 at 12:21 PM, jamborta <jambo...@gmail.com> wrote:
> > Hi all,
> >
> > I have a three-node cluster with identical hardware. I am trying a
> > workflow that reads data from HDFS, repartitions it, runs a few map
> > operations, and then writes the results back to HDFS (a sketch of such
> > a pipeline appears after this thread).
> >
> > It looks like all the computation, including the repartitioning and
> > the maps, completes within similar time intervals on all the nodes,
> > except the write back to HDFS, where the master node does the job far
> > faster than the slaves (15 s per block, as opposed to 1.2 min on the
> > slaves).
> >
> > Any suggestions as to what the reason might be?
> >
> > thanks,
> >
> >
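
For reference, a minimal sketch of the kind of pipeline described in the
thread; the paths, partition count, and map functions below are
placeholders, not the original job:

    import org.apache.spark.{SparkConf, SparkContext}

    object RepartitionWrite {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-write"))

        // Hypothetical input/output paths and partition count.
        val input = sc.textFile("hdfs:///data/input")

        // Spread the data evenly across the executors, then run a few maps.
        val mapped = input
          .repartition(24)
          .map(_.trim.toLowerCase)                // placeholder map operations
          .map(line => s"${line.length}\t$line")

        // Write the results back to HDFS.
        mapped.saveAsTextFile("hdfs:///data/output")

        sc.stop()
      }
    }

Since saveAsTextFile writes one part-file per partition, and each
executor writes the partitions it holds, comparing per-node durations of
the final save stage isolates the HDFS write from the upstream
computation.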
